On-site
Senior
Posted January 06, 2026
Tech Stack
splunk
grafana
prometheus
docker
kubernetes
vagrant
python
golang
swift
amazon-web-services
google-cloud-platform
microsoft-azure
terraform
pulumi
ansible
chef
jenkins
Job Description
Imagine what you could accomplish here. Bring your passion, creativity, and dedication, and there will be no limit to what you can achieve. This is not just another SRE role - it’s a chance to help redefine how reliability engineering is practiced at hyper-scale. Our team is building the platforms that will autonomously operate Apple’s core information security systems, setting a new bar for how critical services are managed.
**Description**
We are seeking exceptional engineers who thrive at the intersection of reliability, software development and automation - individuals driven to push the boundaries of what’s possible. The ideal candidate has a strong foundation in modern SRE practices and a proven ability to design and implement software that solves operational challenges. You’ll break new ground using the most advanced tools and approaches available, developing automation that doesn’t just keep pace with scale but anticipates, reacts and stays ahead of it.
You will work closely with Security Engineering, Threat Detection, Incident Response and other internal functions to ensure the scalability, availability and security of the tools and infrastructure that support Apple’s cybersecurity mission.
Join us, and help build the future of self-managing systems at one of the most innovative companies in the world. Our team is highly collaborative, working closely with partner teams to deliver the best results for Apple. We strive to find the best solution while also considering the need to get things done efficiently for each engineering challenge we face. Good ideas are valued and rewarded.
**Responsibilities**
Operate, monitor, and triage all aspects of our production and non-production environments
Pioneer and implement the next-generation telemetry system for AIS services
Establish alert-handling procedures and runbooks, and collaborate with our global security team
Automate deployment and orchestration of services into the cloud environment as well as other routine processes
Actively participate in capacity planning and disaster recovery exercises
Interact with and support partner teams across the enterprise
Cultivate and maintain relationships with internal and external third-party vendors
**Preferred Qualifications**
Experience or experimentation building systems that leverage Agentic AI principles, tools, platforms and frameworks
Strong understanding of, and experience implementing, monitoring and observability tools such as Splunk, Grafana and Prometheus
Experience building and operating container orchestration systems and microservices (Docker, Kubernetes, Vagrant)
Experience administering and troubleshooting Linux systems including the usage of standard Linux utilities
Experience in shell scripting (e.g., bash/zsh) and system administration
Experience with measuring, analyzing, and optimizing system performance
Passion for high-quality code, tests, documentation and production services
Participation in an on-call rotation
Bachelor’s degree in Computer Science, or a related field, or equivalent practical experience
**Minimum Qualifications**
Proven experience in Site Reliability Engineering or a related field
Strong programming skills: Python, Go or Swift
Experience working with cloud compute environments like AWS, GCP or Azure
Experience with infrastructure as code (IaC), configuration management, CI/CD, and automation, e.g., Terraform, Pulumi, CloudFormation, Ansible, Chef, Puppet, Jenkins
Experience diagnosing and troubleshooting cloud deployment and CI/CD problems
Apple is an equal opportunity employer that is committed to inclusion and diversity. Apple provides reasonable accommodations to applicants with disabilities and in accordance with local requirements. Apple is a drug-free workplace.