Operative is a revenue accelerant for media companies around the world. No other software company in the AdTech space brings a comparable depth of experience to creating truly innovative software that performs across all platforms, revenue models and business units. We are a SaaS (Software as a Service) platform that helps clients manage advertisements in both the linear (TV) and digital space. We have been in the market for over two decades and have 1100+ employees across 12 offices around the globe. Operative is proud to play a pivotal role in the way advertising is bought, sold and managed across the media industry.

Job Summary

As a Site Reliability Engineer (SRE) at Operative, you will play a key role in designing, automating, and operating scalable and secure systems that support our SaaS offerings in Google Cloud Platform (GCP) and Amazon Web Services (AWS). You will be part of the CloudOps team with a mission to improve observability, automate operations, and drive infrastructure-as-code and deployment best practices across our product suite. You will collaborate closely with developers, product managers, and operations teams, acting as both a technical contributor and people enabler. The role blends hands-on engineering with leadership, and requires an understanding of modern DevOps/SRE practices in cloud-native environments.

Responsibilities

· Collaborate with ProdOps and engineering teams to support delivery objectives and remove project blockers.

· Champion SRE and CloudOps best practices across the engineering organization.

· Lead technical projects and contribute to infrastructure automation and CI/CD modernization.

· Design and implement highly available, resilient systems on GCP and AWS using Infrastructure as Code (IaC).

· Automate maintenance tasks for cloud-based systems to improve reliability and reduce manual effort.

· Ensure production systems meet SLAs (e.g., 99.99% uptime), with proactive monitoring, alerting, and capacity planning.

· Act as a point of escalation for complex infrastructure or deployment issues.

· Help ensure production environments meet compliance and audit standards (e.g., SOC2, ISO 27001).

· Participate in a 24x7x365 on-call rotation and incident response efforts.

· Write and maintain documentation for infrastructure and operational procedures.

Must-Have Skills

· 5+ years in a combination of SRE, CloudOps, DevOps, or system engineering roles supporting production environments

· Strong hands-on experience with Google Cloud Platform (GCP) services (e.g., Compute Engine, Cloud Run, GKE, Cloud Functions, Cloud Storage) and AWS services

· Solid skills in Infrastructure as Code (IaC) with tools like Terraform or Cloud Deployment Manager

· Deep experience with CI/CD pipelines, ideally with tools such as GitLab CI, Jenkins, or TeamCity

· Strong background in Linux system administration

· Proficiency with automation/configuration management using tools such as Ansible, Chef, or Puppet

· Proficiency in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Stackdriver/Cloud Operations)

· Hands-on experience with Docker and Kubernetes, such as GKE (Google Kubernetes Engine) or EKS

· Experience in security best practices for cloud platforms and production workloads

· Familiarity with version control systems such as Git

· Excellent scripting ability (e.g., Bash, Python, or PowerShell)

· Effective communication and collaboration skills across teams

Nice-to-Have Skills

· Experience with Kafka, AWS RDS, or other event streaming and database technologies

· Experience supporting microservices and service-oriented architectures (SOA)

· Familiarity with multi-cloud environments (e.g., Azure alongside GCP and AWS)

· Knowledge of cloud cost optimization strategies

· Previous experience helping organizations move toward true continuous deployment

· Experience with compliance automation or infrastructure audits (e.g., SOC2, ISO 27001)

· Familiarity with Agile processes and tools like Jira

· Experience with Netflix OSS tools or custom reliability frameworks

· Use of Cloud Functions or Serverless frameworks in GCP or AWS

Why join us?

· Operative is a technology-oriented product organization that believes in empowering its people.

· We use the latest tech stack and empower our engineers to learn, work and ideate on new technologies available in the market.

· We provide flexible work schedules and remote working to encourage work-life balance.

· We are an equal opportunities employer and recruit based on experience and skill set.

· We offer a competitive salary and benefits package.

Please apply online and upload your CV.

“Operative is a merit-first, equal opportunity employer; diverse applications are encouraged.”

Operative cares about your privacy and protecting your data. By submitting an application for a position with Operative, you acknowledge that you have read the following and consent to how Operative treats your data: 1) the Candidate Privacy Policy, available at https://www.operative.com/candidate-privacy-notice/ (or, if you are a candidate from Israel, the Candidate Privacy Notice (Israel), available at https://www.operative.com/candidate-privacy-notice-israel/), and 2) the Candidate Notice for Data Transfer and Retention, available at https://www.operative.com/candidate-notice/.

This job is no longer accepting applications
