We are seeking a Senior AI Engineer (AI-Ops / MLOps) to take a leading role in designing, deploying, and optimizing large-scale AI systems at MoMo. You will be responsible for building robust, production-grade infrastructure that enables fast iteration and reliable serving of LLMs and speech models. In this position, you will lead AI deployment strategy, mentor junior engineers, and collaborate with AI researchers to bridge the gap between prototyping and scalable production, ensuring that MoMo's AI innovations reach millions of users efficiently and safely.

Job Description

- Build end-to-end AI infrastructure pipelines on top of existing MoMo infrastructure, covering training, serving, and continuous monitoring
- Lead the design and implementation of high-throughput model serving systems (e.g., vLLM, Triton, TensorRT, ONNX Runtime) and model serving techniques (in-flight batching, speculative inference)
- Define model-level monitoring and alerting hooks integrated with MoMo's existing observability stack (Prometheus, Grafana)
- Collaborate with DevOps/CloudOps, AI research, data engineering, and backend teams to ensure smooth integration of AI models into MoMo's ecosystem
- Conduct performance benchmarking and capacity planning to maintain SLA compliance across AI services
- Drive inference optimization strategies (quantization, batching, caching, pruning, GPU/TPU resource tuning)
- Mentor and guide the AI-Ops/MLOps team on best practices in distributed serving, orchestration, and infrastructure reliability
- Evaluate new technologies and tools to improve scalability, cost efficiency, and model lifecycle automation

Job Requirements

- Bachelor's degree in Computer Science, Artificial Intelligence, Software Engineering, or a related discipline
- 3+ years of experience in AI infrastructure, DevOps, or MLOps roles, with at least 1 year at a senior level
- Familiarity with container orchestration (Kubernetes, Docker) and infrastructure-as-code tooling (Terraform, Helm)
- Strong proficiency in Python and Bash scripting; experience with Go or Rust preferred
- Proven experience deploying deep learning models (e.g., PyTorch, TensorFlow, JAX) at production scale
- Hands-on experience with CI/CD systems (Jenkins, Argo CD, GitHub Actions) and monitoring stacks (Prometheus, Grafana, ELK)
- Strong understanding of distributed systems, load balancing, and high-availability architectures
- Demonstrated ability to collaborate with researchers and translate prototypes into scalable, reliable production systems

Nice to Have

- Experience with LLM inference optimization (vLLM, LoRA adapters, quantization, multi-GPU scheduling)
- Familiarity with speech or vision models and multi-modal AI pipelines
- Experience integrating LLM and speech models into scalable inference microservices
- Knowledge of feature stores, data versioning, and real-time feature serving (Feast, Redis, Kafka)
- Background in security, privacy, or governance frameworks for AI systems