About the Role:

We are seeking an experienced Machine Learning Engineer to architect and own our end-to-end speech processing pipeline. In this role, you will bridge the gap between signal processing and generative AI, managing the flow from raw audio input to highly accurate transcription. You will be the primary owner of our ASR stack, with a specific mandate to optimize OpenAI's Whisper for a high-throughput, low-latency cloud production environment.

Key Responsibilities:

1. Production ASR Architecture: Design and build scalable serving infrastructure using tools such as Triton Inference Server, TorchServe, or vLLM to handle concurrent audio streams with minimal latency.
2. Whisper Optimization: Push the limits of Whisper models in production. Implement distillation, INT8 quantization, and framework-level acceleration (e.g., Faster-Whisper, CTranslate2) to drastically reduce inference time without sacrificing accuracy.
3. Advanced Noise Cancellation: Develop and integrate deep-learning-based noise suppression models to pre-process audio, ensuring high fidelity and a low Word Error Rate (WER) even in complex, noisy acoustic environments.
4. Model Fine-Tuning: Customize open-source ASR models (Whisper, Wav2Vec2, HuBERT) on our proprietary datasets to master domain-specific vocabulary, accents, and acoustics.
5. Tech Stack Evolution: Actively evaluate and integrate the latest open-source audio models from the Hugging Face ecosystem to keep our stack on the cutting edge.
Requirements:

At least 3 years of relevant experience.
This is an individual contributor role with no management responsibilities.
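For context on the Word Error Rate metric referenced in the noise-cancellation responsibility, here is a minimal sketch (not part of the posting; production pipelines typically use a library such as jiwer) of how WER is computed from a reference transcript and an ASR hypothesis via word-level edit distance:

```python
# WER = (substitutions + deletions + insertions) / number of reference words,
# computed with a standard dynamic-programming Levenshtein distance over words.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One dropped word out of a six-word reference yields WER = 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Lowering this number on noisy audio, without inflating latency, is the practical target the noise-suppression and Whisper-optimization responsibilities share.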