Role Overview
We are looking for a Senior DevOps / SRE Team Lead who can own reliability and platform strategy, lead technical direction, and coordinate across engineering, product, and business teams.
This is a hands-on senior technical leadership role:
- You design systems, not just operate them
- You plan and drive initiatives, not just execute tickets
- You lead people and align teams, not just write YAML
Key Responsibilities
Strategy, Planning & Ownership
- Own DevOps / SRE roadmap aligned with product growth and business priorities
- Translate product and business requirements into reliability, scalability, and platform plans
- Define and evolve SLO / error budget strategy
- Prioritize initiatives across stability, velocity, and cost
System Design & Technical Leadership (Senior IC)
- Lead infrastructure and system design for scalable, reliable systems
- Review and drive architecture decisions across services and platforms
- Design CI/CD, deployment, and runtime architecture with long-term operability in mind
- Anticipate failure modes and design for resilience
Requirement Analysis & Problem Solving
- Work with Product, Backend, Data, and Security teams to analyze requirements
- Break down ambiguous problems into clear technical plans
- Balance trade-offs between speed, reliability, and complexity
- Act as the final technical decision-maker for DevOps/SRE domains
Cross-team Communication & Coordination
- Serve as the bridge between engineering, SRE, product, and business
- Communicate technical risks and plans clearly to non-infra stakeholders
- Drive alignment across teams on rollout plans, incidents, and priorities
- Lead incident communication and postmortem discussions
Team Leadership & Organization
- Lead and mentor DevOps / SRE engineers
- Set engineering standards, best practices, and review culture
- Help the team grow in technical depth and operational maturity
- Plan team capacity, on-call rotation, and skill development
Reliability & Operations Excellence
- Own incident response processes and escalation
- Establish runbooks, playbooks, and operational readiness reviews
- Drive continuous improvement through postmortems and metrics
- Ensure systems meet defined reliability and performance goals
7 years of experience required