Location: London (2 days a week in the office)
Type: Permanent
My client is seeking an experienced Site Reliability Engineering Manager to lead a high-performing team responsible for the stability, scalability, and performance of their on-premise infrastructure. The ideal candidate brings a strong background in Linux systems (especially Red Hat), automation with Python, and hands-on experience managing mission-critical environments.
You will play a pivotal role in guiding technical direction, mentoring engineers, and ensuring the reliability of the platforms used across the organization.
Key Responsibilities:
Lead and mentor a team of SREs, fostering a culture of reliability, ownership, and continuous improvement.
Architect and maintain highly available on-premise systems and services running on Linux (primarily RHEL).
Drive automation initiatives using Python to streamline operations and reduce manual interventions.
Collaborate with development and operations teams to define SLAs, SLOs, and incident management protocols.
Oversee system monitoring, alerting, and capacity planning to proactively prevent issues.
Manage upgrades, patching, and the lifecycle of Red Hat Linux systems and related infrastructure.
Develop and enforce best practices for CI/CD pipelines and Infrastructure as Code (IaC).
Partner with InfoSec and compliance teams to ensure systems meet regulatory and security requirements.
Requirements:
Proven experience leading or managing SRE, DevOps, or Systems Engineering teams.
Deep hands-on experience with Linux systems administration, especially Red Hat Enterprise Linux (RHEL).
Strong programming/scripting skills in Python for automation and tooling.
Experience with on-premise infrastructure and hybrid environments.
Excellent understanding of system internals, networking, monitoring tools, and performance tuning.
Familiarity with configuration management and automation tools (e.g., Ansible, Puppet, or Terraform).
Track record of designing reliable systems and implementing monitoring and alerting best practices.
Nice to Have:
Experience with container technologies (Docker, Kubernetes) in on-prem or hybrid cloud setups.
Exposure to ITIL practices or enterprise IT operations environments.
Red Hat certifications (RHCSA, RHCE) are a plus.