KubeCraftJobs

DevOps & Cloud Job Board

SRE Consultant

Jobs via Dice

Santa Clara, CA

On-site
Senior
Full Time
Posted January 03, 2026

Tech Stack

nvidia jenkins python elk kvm prometheus grafana golang kubernetes mysql appcast

Job Description

Dice is the leading career destination for tech experts at every stage of their careers. Our client, Cardinal Integrated Technologies Inc, is seeking the following. Apply via Dice today!

**Job Title: SRE Consultant (20309-1)**

**Location: Santa Clara, CA (Onsite 5 days a week)**

**Duration: 6-12+ Months Contract**

**Must Have Skills**

- Skill 1: Manage Nvidia's on-prem infrastructure. Maintain uptime, reliability and readiness of the on-prem engineering cloud spread across multiple data centers.
- Skill 2: Maintain KPI pipelines using Jenkins, Python and ELK.
- Skill 3: Bare-metal data center machine management tools like IPMI, Redfish, KVM.

**Requirements/Skills:**

- Manage Nvidia's on-prem infrastructure. Maintain uptime, reliability and readiness of the on-prem engineering cloud spread across multiple data centers.
- Guard service level agreements (SLAs) for critical engineering services. Implement monitoring, alerting, and incident response procedures to ensure adherence to defined performance targets. Perform root cause analysis and post-mortems of incidents for any threshold breaches.

**Observability**

- Set up and manage monitoring and logging tools such as Prometheus, Grafana, or the ELK Stack to oversee system health and performance. Maintain KPI pipelines using Jenkins, Python and ELK.
- Improve monitoring systems by adding custom alerts based on business needs.
- Help with capacity planning, optimization and better utilization efforts.

**Day-to-Day Support**

- Support user-reported issues. Monitor alerts and take necessary action.
- Actively participate in war rooms for critical issues.
- Create and maintain documentation for operational procedures, configurations, and troubleshooting guides.
- Bare-metal data center machine management tools like IPMI, Redfish, KVM, etc.
- Automation using Jenkins, Python, Go, Bash.
- Infrastructure tools like Kubernetes, MySQL, Prometheus, Grafana and ELK.
- Any familiarity with Nvidia hardware like GPUs and Tegras is a plus.

Best Regards,
**Asif Khan | Assistant Manager Recruitment | Cardinal Integrated Technologies**