KubeCraftJobs

DevOps & Cloud Job Board

PowerVS Support and Site Reliability Engineering Manager

IBM

Bengaluru East, Karnataka

Hybrid
Senior
Full Time
Posted January 09, 2026

Tech Stack

Docker, Kubernetes, Prometheus, New Relic, Instana, Jenkins, Tekton

Job Description

**Introduction**

At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You’ll work with diverse technologies and colleagues worldwide to deliver resilient, future-ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.

Site Reliability Engineers (SREs) apply software engineering principles to perform infrastructure management tasks more efficiently. They focus on reliability and resiliency, and build systems that proactively detect issues before they cause customer impact. They are responsible for maintaining a high-performance, secure, and stable infrastructure for our clients. Additionally, SREs resolve customer issues and problems detected through monitoring. They participate in datacenter build and configuration activities, perform tests, and deploy new features and capacity.

**Your Role And Responsibilities**

The PowerVS Support and SRE Manager will be responsible for support and operations across our cloud infrastructure serving IBM Power. You will manage the team and mentor support, maintenance, and operations SREs to increase the knowledge and performance of the collective team. In addition, you will interface with senior architects across several locations and businesses to ensure that the overall strategy is communicated and understood.

**Responsibilities**

- Lead the support, operations, and maintenance for Power Cloud infrastructure supporting a key client datacenter
- Have a strong understanding of the SRE model in order to build an SRE culture
- Create practices and procedures based on best practices for Cloud infrastructure support and operations
- Direct resources to diagnose and resolve complex system, application software, security, and related problems that impact systems and availability
- Proactively identify issues and improvement opportunities
- Provide support for production escalations and problem resolution for customers
- Lead the evaluation and evolution of tools, technologies, and programs with input from internal teams and external developers

**Preferred Education**

Master's Degree

**Required Technical And Professional Expertise**

- 5+ years of experience supporting production in Power and/or Cloud services
- 3+ years of experience in a leadership and/or management role
- Experience in Cloud, DevOps, Support, and Site Reliability Engineering

**Preferred Technical And Professional Experience**

- 5+ years of experience in software development
- Expertise in Microservice Architecture, Docker, Kubernetes, and other Cloud native technologies
- Debugging and monitoring knowledge of Cloud native applications using DevOps tools such as Prometheus, New Relic, Instana, and others
- Understanding of the DevOps lifecycle and associated tools such as Git and CI/CD tools like Jenkins, Tekton, Travis, and others
- Understanding of Cloud Computing (IaaS, PaaS, SaaS) and security principles
- Experience with Storage Cloud services