Salary:
£135,000-£150,000 per annum
Locations:
London, London, United Kingdom
Type:
Permanent
Published:
April 25, 2025
Contact:
Ike Feehi
Ref:
16018
Required Skills:
SRE, DevOps

Job Title: Site Reliability Engineering (SRE) Manager

Location: London (2 days a week in the office)
Type: Permanent 

About the Role

My client is seeking an experienced Site Reliability Engineering Manager to lead a high-performing team responsible for the stability, scalability, and performance of its on-premise infrastructure. The ideal candidate brings a strong background in Linux systems (especially Red Hat), automation with Python, and hands-on experience managing mission-critical environments.

You will play a pivotal role in guiding technical direction, mentoring engineers, and ensuring the reliability of the platforms used across the organization.


Key Responsibilities

  • Lead and mentor a team of SREs, fostering a culture of reliability, ownership, and continuous improvement.

  • Architect and maintain highly available on-premise systems and services running on Linux (primarily RHEL).

  • Drive automation initiatives using Python to streamline operations and reduce manual interventions.

  • Collaborate with development and operations teams to define SLAs, SLOs, and incident management protocols.

  • Oversee system monitoring, alerting, and capacity planning to proactively prevent issues.

  • Manage upgrades, patches, and lifecycle management for Red Hat Linux systems and related infrastructure.

  • Develop and enforce best practices for CI/CD pipelines and Infrastructure as Code (IaC).

  • Partner with InfoSec and compliance teams to ensure systems meet regulatory and security requirements.


Requirements

Must-Have:

  • Proven experience leading or managing SRE, DevOps, or Systems Engineering teams.

  • Deep hands-on experience with Linux systems administration, especially Red Hat Enterprise Linux (RHEL).

  • Strong programming/scripting skills in Python for automation and tooling.

  • Experience with on-premise infrastructure and hybrid environments.

  • Excellent understanding of system internals, networking, monitoring tools, and performance tuning.

  • Familiarity with configuration management and automation tools (e.g., Ansible, Puppet, or Terraform).

  • Track record of designing reliable systems and implementing monitoring & alerting best practices.

Nice-to-Have:

  • Experience with container technologies (Docker, Kubernetes) in on-prem or hybrid cloud setups.

  • Exposure to ITIL practices or enterprise IT operations environments.

  • Red Hat certifications (RHCSA, RHCE).
