Job Description

Role Overview

We are looking for a Senior DevOps / SRE Team Lead who can own reliability and platform strategy, lead technical direction, and coordinate across engineering, product, and business teams.

This is a hands-on senior technical leadership role:

  • You design systems, not just operate them
  • You plan and drive initiatives, not just execute tickets
  • You lead people and align teams, not just write YAML

Key Responsibilities

Strategy, Planning & Ownership
  • Own the DevOps / SRE roadmap, aligned with product growth and business priorities
  • Translate product and business requirements into reliability, scalability, and platform plans
  • Define and evolve SLO / error budget strategy
  • Prioritize initiatives across stability, velocity, and cost
System Design & Technical Leadership (Senior IC)
  • Lead infrastructure and system design for scalable, reliable systems
  • Review and drive architecture decisions across services and platforms
  • Design CI/CD, deployment, and runtime architecture with long-term operability in mind
  • Anticipate failure modes and design for resilience
Requirement Analysis & Problem Solving
  • Work with Product, Backend, Data, and Security teams to analyze requirements
  • Break down ambiguous problems into clear technical plans
  • Balance trade-offs between speed, reliability, and complexity
  • Act as the final technical decision-maker for DevOps/SRE domains
Cross-team Communication & Coordination
  • Serve as the bridge between engineering, SRE, product, and business
  • Communicate technical risks and plans clearly to non-infra stakeholders
  • Drive alignment across teams on rollout plans, incidents, and priorities
  • Lead incident communication and postmortem discussions
Team Leadership & Organization
  • Lead and mentor DevOps / SRE engineers
  • Set engineering standards, best practices, and review culture
  • Help the team grow in technical depth and operational maturity
  • Plan team capacity, on-call rotation, and skill development
Reliability & Operations Excellence
  • Own incident response processes and escalation
  • Establish runbooks, playbooks, and operational readiness reviews
  • Drive continuous improvement through postmortems and metrics
  • Ensure systems meet defined reliability and performance goals

Requirements

Required Qualifications

Experience
  • 7+ years in DevOps, SRE, or Platform Engineering
  • Experience acting as tech lead or owner for infra/reliability initiatives
  • Proven experience designing and operating production systems at scale
Senior Technical Skills
  • Strong Linux, networking, and distributed systems fundamentals
  • Solid system design skills (scalability, availability, failure handling)
  • Hands-on experience with:
    • Cloud platforms (AWS / GCP)
    • Kubernetes & container ecosystems
    • Infrastructure as Code and GitOps (Terraform / ArgoCD)
    • CI/CD pipeline design
  • Deep understanding of observability (metrics, logs, tracing)
SRE & Reliability Expertise
  • Experience defining and operating with SLIs / SLOs / SLAs
  • Error budget–driven decision making
  • Incident management and root cause analysis
  • Capacity planning and performance tuning
Leadership & Communication Skills
  • Strong ability to plan, explain, and align
  • Comfortable communicating with engineers, PMs, and business stakeholders
  • Experience mentoring engineers and leading technical discussions
  • Able to make decisions and take ownership under ambiguity

Nice to Have

  • Experience leading cross-region or multi-cloud systems
  • Security, compliance, or data privacy experience
  • Experience scaling teams or platforms from early stage to growth stage
  • Background in backend or distributed system development

What We Value

  • Ownership mindset
  • Clear thinking under pressure
  • Pragmatic decision making
  • Strong technical judgment
  • Respectful, transparent communication

Interview process

Google Meet / On-site Interview

1
7 years of experience required
1,500,000 ~ 2,500,000 TWD / year
Managing 1-5 staff
Partial Remote Work
華捷智能股份有限公司 Berry AI
Artificial Intelligence / Machine Learning
51 ~ 200 people

About us

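Berry AI was founded in February 2019. It is a subsidiary backed by the publicly listed company Flytech Technology and operates as an independent startup. The team is small, but we have spared no expense in building a dream team. Our technical members come from large multinational technology companies such as Microsoft, Appier, and HTC, and from well-known startups such as AI Labs, Umbo CV, Gogolook, and Cubo AI. The team includes top experts in deep learning, computer vision, algorithms, front-end and back-end software design, and hardware design.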


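Berry AI’s main customers include Burger King and Wendy’s, the world’s second- and third-largest burger chains, and Little Caesars, the world’s third-largest pizza chain. Berry began working with several U.S. customers in 2022; the pilot deployments were well received, and in 2023 the partnerships expanded to bring the product into more locations. In 2025, Berry signed a full-brand rollout agreement with the U.S. brand Zaxby’s, expected to cover nearly 1,000 stores, which shows that our product has earned strong customer trust in the U.S. market and that our technology is mature. Berry AI will become the leading AI technology provider for the U.S. fast-food industry.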

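We have a startup’s agility and room to make an impact, along with Flytech’s stable cash flow, hardware design experience, and customer relationships. We are a self-disciplined, down-to-earth team that is willing to roll up its sleeves, full of ideals and belief in Taiwan’s future. If you love seeking out challenges, dare to overturn Taiwan’s traditional wariness of startups, and are willing to go through thick and thin with us to build Taiwan’s next unicorn, please contact us. We need people like you!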

Founded in 2019, Berry AI is dedicated to leveraging computer vision and AI technologies to help quick-service restaurant (QSR) operators analyze service workflows and improve operations. Our team is made up of passionate AI and software engineers from top universities and major tech companies around the world. We are backed by Flytech Technology, a publicly listed company in Taiwan and one of the world’s top three POS manufacturers, providing us with stable financial resources, customer connections, and technical support.

Berry AI serves several of the world’s top ten fast-food chains. In 2025, we signed a full-brand rollout agreement with Zaxby’s, covering nearly 1,000 locations across the U.S., further demonstrating our solution's market trust and scalability. Both our business and team are growing rapidly.

Berry AI is based in Taiwan. Our technical team includes deep learning experts, senior industrial camera algorithm engineers, computer vision experts, and mechanical control and hardware design professionals.




Company Values


■ Stay humble

No individual is greater than the team; we succeed and fail together. Practice servant leadership and never stop learning.

■ Just do it

No task is too menial. Opt to get your hands dirty. Take action to improve the company, even if it is not in your job description.

■ Confront mistakes

Be willing to admit mistakes and fix them. Have the backbone to avoid excuses, reflect on lessons learned, share them, and keep moving forward.

■ Keep it practical

We don't have ping pong tables; we invest in computer monitors and keyboards. We don't make big statements for news headlines; we say what we can deliver.
We don't use complex industry jargon to appear sophisticated; we value simplicity.