Cake Job Search

Why do people love working at RichWell Co.Ltd.?
1. Flexible hours: no rush to clock in each morning. Sleeping in a little or avoiding the rush-hour commute are both fine.
2. Plenty of paid leave: you don't have to wait a full year before taking leave. We're more generous than the law requires, because time off should be enjoyable.
3. Great bonuses and benefits: year-end and performance bonuses, everything you'd expect. Your hard work will never go unrewarded.
4. Birthday surprises: the company remembers every important moment for you.
5. Regular team dinners/team building: we're not just coworkers but comrades who grow together, and eating and drinking together brings us closer.
6. Technical courses and internal knowledge-sharing sessions: whatever you want to learn, we support it, so you keep growing instead of falling behind!
About the role
We are building a reliability-first platform. Over the next 12 months, we will stabilize our Windows-based services, strengthen observability, and progressively containerize our services onto Kubernetes. You will be a key contributor driving self-service operations and data-driven reliability across the stack.
What you'll do
• Operational automation: Build self-service runbooks for Windows services (AWX/Rundeck), and implement Ansible/PowerShell DSC workflows, health checks, and safe rollbacks.
• Observability: Standardize metrics, logs, and traces (Prometheus/Grafana, windows_exporter, OpenTelemetry; ELK/Loki). Create golden-signal dashboards and actionable alerts.
• Reliability engineering: Participate in on-call, handle incidents and post-incident reviews (PIRs), and lead game days to institutionalize SOPs.
• Resilience: Design and implement backup and disaster recovery, capacity planning, and performance tuning.
• Long-term: Drive service containerization and Kubernetes adoption (Helm/Kustomize, Argo CD/Flux, ConfigMap/Secrets) with a strong focus on security and compliance.
Windows Server
Site Reliability Engineer
Prometheus/Grafana
1.6M ~ 2.2M TWD / year
4 years of experience required
No management responsibility
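- Job description:
- Explore a variety of IT solutions in support of future business expansion and planning.
- Design, build, and maintain GCP cloud infrastructure (GKE, GCE, Cloud SQL, etc.), with a focus on its reliability and scalability.
- Provide professional advice on issues raised by stakeholders or colleagues, and help resolve them.
- Work with internal decision-makers to understand their goals, then research and propose solutions.
- Manage foundational network services such as CDN and DNS to keep external connections stable and fast.
- Build and optimize monitoring, alerting, and logging systems.
- Set up and maintain CI/CD.
- Research new technologies.
- Handle basic computer equipment management.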
50K ~ 80K TWD / month
1 year of experience required
No management responsibility
Established in 1987 and headquartered in Taiwan, TSMC pioneered the pure-play foundry business model with an exclusive focus on manufacturing its customers’ products. As of 2024, TSMC serves more than 500 customers and manufactures over 11,000 products for high-performance computing, smartphones, the Internet of Things (IoT), automotive, and digital consumer electronics. It is the world’s largest provider of logic ICs, with an annual capacity of 16 million 12-inch equivalent wafers. TSMC operates fabs in Taiwan as well as manufacturing subsidiaries in Washington State, Japan and China, and the Company began construction on a specialty technology fab in Dresden, Germany, in 2024. In Arizona, TSMC is building three fabs, with the first starting 4nm production in 2025, the second by 2028, and the third by the end of the decade.
Manage and lead the design, implementation, and maintenance of AI infrastructure systems to ensure reliable operation of VNAP's AI prediction services and training environments.
Work with the IT/CIM infrastructure teams, which host CPU/GPU application servers and database services for VNAP (VM/K8S, Kafka, MongoDB, Oracle middleware, etc.), to ensure high availability and reliability through well-established monitoring metrics and alarms.
Design and implement infrastructure-as-code tooling such as Ansible and Terraform to establish auto-recovery mechanisms that minimize tool-idle and hold-lot impacts caused by system issues.
Develop and maintain applications in C#, Delphi, and Python on top of these infrastructure systems.
Work location: Hsinchu or Taoyuan
Hiring Organization: IMC
TGC Europe
40K+ TWD / month
3 years of experience required
No management responsibility
【Capsule】
At FunNow, we're building joyful experiences at the speed of now. As a Site Reliability Engineer, you'll play a crucial role in keeping our platform fast, resilient, and secure for millions of users booking spontaneous fun across Asia. But here's the twist: we don't just monitor uptime; we build with AI and automation. From Kubernetes tuning to auto-healing infrastructure, and from CI/CD pipelines to incident response, you'll be hands-on in evolving our DevOps culture. If you love scalable systems, believe in developer efficiency, and treat infrastructure as code, welcome aboard.
【Typical Accountabilities】
1. Design robust architectures that comprehensively improve system availability, scalability, and service quality.
2. Ensure stable service operation, monitor core service status, and troubleshoot issues quickly.
3. Analyze system performance bottlenecks in depth, then propose and implement improvements.
4. Maintain and optimize Kubernetes clusters (EKS/GKE), effectively handling resource pressure, node anomalies, and similar situations.
5. Maintain and improve CI/CD pipelines and automated deployment systems (GitHub Actions / ArgoCD) to significantly improve the engineering team's development efficiency.
6. Establish and continuously optimize system monitoring and alerting (Prometheus / Grafana / Alertmanager).
7. Assist with incident response and problem investigation.
8. Regularly participate in system inspections and audits, proactively proposing and implementing improvements.
9. Assist in maintaining and implementing fundamental security settings (e.g., IAM, resource permissions, encrypted storage).
10. Actively share your experience to help strengthen the team's engineering culture.
Negotiable
2 years of experience required
No management responsibility
The Infrastructure and Platform Engineering Department (IPED) in TSMC's Intelligent Manufacturing Center (IMC) is dedicated to enhancing system stability and developer productivity. As TSMC's production scale continues to expand, the core focus is on smart manufacturing through deep system integration to maintain high yield and efficiency. Within this team, you will help implement new cloud architectures (K8S) and related cloud-native technologies to achieve DevOps goals.
General job responsibilities for this position:
- Develop and maintain cloud-native systems based on the K8S architecture.
- Tune system configurations based on metrics and user feedback.
- Work closely with internal development teams and seek out new tools that can enhance productivity.
- Conduct proofs of concept (POCs) and assist internal development teams with implementation.
Negotiable
No requirement for relevant working experience
Managing staff numbers: not specified
WorldQuant develops and deploys systematic financial strategies across a broad range of asset classes and global markets. We seek to produce high-quality predictive signals (alphas) through our proprietary research platform to employ financial strategies focused on market inefficiencies. Our teams work collaboratively to drive the production of alphas and financial strategies, the foundation of a balanced, global investment platform.
WorldQuant is built on a culture that pairs academic sensibility with accountability for results. Employees are encouraged to think openly about problems, balancing intellectualism and practicality. Excellent ideas come from anyone, anywhere. Employees are encouraged to challenge conventional thinking and possess an attitude of continuous improvement.
Our goal is to hire the best and the brightest. We value intellectual horsepower first and foremost, and people who demonstrate outstanding talent. There is no roadmap to future success, so we need people who can help us build it.
Technologists at WorldQuant research, design, code, test, and deploy firmwide platforms and tooling while working collaboratively with researchers. Our environment is relaxed yet intellectually driven. We seek people who think in code and are motivated by being around like-minded people.
The Role:
We're seeking a Senior Site Reliability Engineer to join the team. You will build and operate the infrastructure and tooling behind WorldQuant's data ingestion pipelines: systems that onboard, validate, and deliver large-scale datasets to the firm's research platform.
This is a 70% build / 30% operate role. You'll spend most of your time engineering automation, observability, and developer tooling, while also participating in on-call rotations and incident response for production data pipelines.
You'll partner with engineering, analyst, and research teams to ensure reliability at scale. This requires excellent analytical skills, clear communication, and the ability to collaborate across teams.
What You'll Do:
Build (70%):
- Design and develop automation, monitoring, CI/CD, and reliability features for the data onboarding pipeline.
- Develop and maintain internal infrastructure and services that reduce toil and improve pipeline reliability.
- Build observability solutions (dashboards, alerting, log aggregation) using Grafana, the ELK stack, and Vector.
- Design and implement CI/CD pipelines, test automation, and release management workflows.
- Write infrastructure-as-code for provisioning, scaling, and managing platform components: Kubernetes and bare-metal hosts.
- Integrate and extend tools such as Redis, Celery, and MySQL.
Operate (30%):
- Keep production data pipelines healthy and respond to incidents.
- Participate in the on-call rotation, respond to production incidents, and drive post-mortems.
- Define and track SLOs/SLIs for pipeline reliability, latency, and data freshness.
- Diagnose platform performance and reliability issues, driving them to root cause.
- Create and maintain runbooks for common operational scenarios.
- Plan capacity and optimize resource utilization.
What You'll Bring:
- 8+ years of experience in SRE, DevOps, or platform engineering roles.
- Linux expertise: Power-user proficiency in Linux, with the ability to manage infrastructure, deploy services, and troubleshoot production systems.
- Python proficiency: Strong scripting and automation skills; experience building CLI tools, API clients, monitoring integrations, and operational tooling in Python.
- Kubernetes & containers: Deep hands-on experience with Kubernetes (deploying, scaling, debugging, and managing production workloads). Familiarity with Helm and resource management. Solid experience with Docker.
- Observability: Hands-on experience with monitoring stacks such as Grafana, Prometheus, ELK (Elasticsearch, Logstash, Kibana), or similar. Experience designing dashboards, alerts, and SLO-based reliability tracking.
- CI/CD & infrastructure-as-code: Experience designing and maintaining CI/CD pipelines (GitLab CI or similar), including test automation and release management. Familiarity with Ansible or similar IaC tools.
- Databases: Working knowledge of relational databases (MySQL/PostgreSQL), query tuning, and operational database management.
- Message queues & streaming: Experience with Kafka, Redis pub/sub, or Celery for event-driven architectures.
- Networking & APIs: Understanding of network fundamentals, DNS, load balancing, and REST/gRPC APIs.
- Incident management: Experience with on-call rotations, incident response, post-mortems, and runbook-driven operations.
- Leadership & management: Proven track record of leading a team (mentoring engineers, driving technical roadmaps, coordinating cross-team initiatives, and managing priorities). Comfortable owning team delivery and representing the team to stakeholders.
- AI-agent readiness: Openness to working alongside AI coding agents and LLM-powered tools as part of the development and operations workflow, treating AI as a force multiplier for automation, incident analysis, and toil reduction.
Nice to Have:
- Cloud platforms: Exposure to GCP or AWS for compute, storage, and managed services.
- Data tools: Familiarity with Apache Arrow, gRPC, or columnar data formats.
- Big data platforms: Familiarity with Hadoop or Apache Spark for large-scale data processing.
- Programming languages: C/C++, Golang, Scala, JavaScript.
- Financial services or data-intensive industry background.
- SRE culture: Familiarity with the principles in Google's SRE book (error budgets, toil tracking, blameless post-mortems).
What We Offer:
- A competitive and attractive compensation package with a clear career roadmap, where you feel challenged every day.
- A strong culture of learning and development: training courses, a library, speakers, and share-and-learn events.
- Learn from whoever sits next to you! At WQ you are surrounded by smart and talented people.
- Premium health insurance and an Employee Assistance Program.
- A generous time-off policy, recreation sabbatical leave (based on tenure), and Trade Union benefits for staff and family.
- Team building activities every month: local engagement events and employee clubs (football, ping-pong, badminton, yoga, running, PS5, movies, etc.).
- An annual company trip and occasional global conferences, with the opportunity to travel and connect with our global teams.
- Happy hour with tea breaks, snacks, and meals every day in the office!
By submitting this application, you acknowledge and consent to the terms of the WorldQuant Privacy Policy. The privacy policy offers an explanation of how and why your data will be collected, how it will be used and disclosed, how it will be retained and secured, and what legal rights are associated with that data (including the rights of access, correction, and deletion). The policy also describes legal and contractual limitations on these rights. The specific rights and obligations of individuals living and working in different areas may vary by jurisdiction. Copyright © 2025 WorldQuant, LLC. All Rights Reserved.
WorldQuant is an equal opportunity employer and does not discriminate in hiring on the basis of race, color, creed, religion, sex, sexual orientation or preference, age, marital status, citizenship, national origin, disability, military status, genetic predisposition or carrier status, or any other protected characteristic as established by applicable law.
Negotiable
No requirement for relevant working experience
Minimum qualifications:
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- 3 years of experience with software development in one or more programming languages.
- Experience in one or more of the following: C, C++, Java, Python, or Go.
Preferred qualifications:
- Master's degree in Computer Science or Engineering, or a related field.
- Experience in analyzing and troubleshooting large-scale distributed systems, cloud computing, and large databases.
- Knowledge of database internals and Google infrastructure.
About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google's services (both our internally critical and our externally visible systems) have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale that are unique to Google, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture of intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
To learn more: check out our books on Site Reliability Engineering or read a career profile about why a Software Engineer chose to join SRE.
Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.
Responsibilities
- Lead the design, implementation, and testing of reliability-focused improvements to systems and processes.
- Identify and carry out improvements to automation, monitoring/alerting, and infrastructure.
- Create, influence and review ongoing design, architecture, standards and methods for services and systems.
- Write postmortems and lead incident analysis, with a focus on broad patterns and potential fixes.
- Triage, mitigate, and resolve common incidents, and coordinate incident response for complex ones.
- Work effectively with other Site Reliability Engineers (SREs), developers, and cross-functional teams.
- Assist in training new team members on operational procedures and best practices.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Negotiable
No requirement for relevant working experience
At Google, we have a vision of empowerment and equitable opportunity for all Aboriginal and Torres Strait Islander peoples and commit to building reconciliation through Google’s technology, platforms and people and we welcome Indigenous applicants. Please see our Reconciliation Action Plan for more information.
Minimum qualifications:
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- 2 years of experience with software development in one or more programming languages.
Preferred qualifications:
- Master's degree in Computer Science or Engineering.
- 2 years of experience with designing, analyzing, and troubleshooting large-scale distributed systems.
About the job
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services (both our internally critical and our externally visible systems) have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale that are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture of intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
Google Play offers music, movies, books, apps and games for devices, powered by the cloud. It syncs across devices and on the web. As part of the Android and Mobile team, Googlers working on Google Play do everything from engineering our backend systems, to shaping product strategy, to forming great content partnerships. They make it possible for people to do things like buy an ebook or song on their Android phone, then have it instantly available on their laptop. The Google Play team enhances the Android ecosystem by giving developers and partners a premium store where they can reach millions of users.
Responsibilities
- Write product or system development code.
- Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
- Triage product or system issues and debug/track/resolve them by analyzing the sources of issues and the impact on hardware, network, or service operations and quality.
- Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
Negotiable
No requirement for relevant working experience
Minimum qualifications:
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- 8 years of experience with software development in one or more programming languages.
- 3 years of experience with managing people or teams.
- 3 years of experience with leading projects.
- 3 years of experience in designing, analyzing, and troubleshooting distributed systems.
Preferred qualifications:
- Master's degree in Computer Science or Engineering.
- Experience in problem solving and analyzing distributed systems.
- Experience with mobile development and application deployment.
- Experience with algorithms, data structures, analysis, and software design, or in Unix/Linux systems, Internet Protocol (IP) networking, performance, and application issues.
- Ability to perform technical analysis across code, networking, operating systems, and storage, while maintaining the cognitive and verbal agility to manage discussions with executive leadership.
- Ability to manage the strategy while providing technical guidance to the team, enabling them to execute and deliver products on time and within budget.
About the job
Android Site Reliability Engineering (SRE) manages the mission-critical infrastructure powering the global Android ecosystem of devices. The mission is to bridge the reliability gap between mobile and web platforms, building user trust through availability. We empower Product and Development teams to scale securely and seamlessly.
Responsibilities
- Lead software and systems engineers through planning, technical execution, and quality delivery.
- Grow engineering talent through mentoring and coaching strategies.
- Manage end-to-end availability and performance for mission-critical services while building automation to prevent recurrence.
- Collaborate with distributed partner teams and manage international on-call rotations.
- Align with Product and Developer teams to define and deliver Service Level Objectives (SLOs) that ensure reliability.
- Drive projects by leveraging existing frameworks and providing leadership in changing environments.
Negotiable
No requirement for relevant working experience
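Job responsibilities:
1. Infrastructure management: Implement, optimize, and handle day-to-day maintenance of IT infrastructure for Windows and Linux systems.
2. Cloud operations support: Manage cloud platform systems and ensure that all services meet service level objectives (SLOs) and availability requirements.
3. Backup and disaster recovery: Carry out data backup and recovery strategies for core systems, and plan and implement disaster recovery (DR) and business continuity plans (BCP).
4. Security compliance response: Work with the information security team to remediate and harden systems against potential threats to the cloud platform.
5. Operations automation: Drive cloud operations automation, and develop and maintain automated deployment and configuration management processes to reduce manual, repetitive work.
6. Monitoring and early warning: Perform system health checks, capacity monitoring, and performance tuning, and troubleshoot and escalate issues according to standard operating procedures (SOPs).
7. Technical documentation: Create and maintain technical architecture diagrams, standard manuals, and change records for the DevOps/SRE infrastructure.
8. Environment lifecycle maintenance: Support the stable operation of the Production, Staging, and Development (Dev) environments.
9. On-call: Based on project needs and system stability requirements, take part in the on-call rotation and handle urgent technical support requests on weekends or public holidays.
Working hours: Day shift / flexible, with on-call.
Leave policy: According to company regulations.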
40K ~ 70K TWD / month
3 years of experience required
No management responsibility
