Resume | Sankar Mukherjee

Experience

Senior Machine Learning Engineer
Omilia August 2023 – Present
- Fine-tuned Whisper ASR with a LoRA adapter, achieving 9% WER on noisy food order data.
- Developed an artificial speech detection system using wav2vec2 (1.01 EER), deployed on NVIDIA PyTriton with 0.3 RTF.
- Built and deployed high-quality TTS voices in English, Greek, and Spanish (MOS 4.3, RTF 0.06), reducing synthesis costs by 67x compared to ElevenLabs.
- Led and mentored junior team members to accelerate the move from POC to production.
Machine Learning Engineer
Oxolo June 2022 – January 2023
- Developed a voice cloning model that clones any voice from 1 minute of audio.
- Created a speech emotion recognition model detecting 15 emotions with 95% accuracy.
- Designed and built MLOps infrastructure to support AI models at scale, including CI/CD, data processing, evaluation, and monitoring.
- Skills: REST API, EC2, S3, Pydantic, PyTorch, Docker, Git.
AI Engineer
GOODIX Technology INC April 2021 – October 2021
- Developed a neural network with a small memory footprint for speech enhancement on mobile devices.
- Implemented model improvements and compression techniques, including pruning and quantization.
- Skills: Distillation, Quantization, Pruning, Sparsification.
Visiting Researcher
Speech Processing Group, University of Crete August 2020 – December 2020
- Researched generative modeling, disentangled speech representation, and adversarial learning under Prof. Yannis Stylianou.
- Developed a zero-shot multi-speaker, multi-style TTS (MOS 3.62, style similarity 3.41), presented at Interspeech 2021.
Speech Scientist
Defined.ai September 2019 – April 2020
- Developed and deployed audio event detection models for noisy speech.
- Built multilingual acoustic models for ASR.
- Collaborated with senior stakeholders to align MLOps goals with business priorities.

Education

PhD in Bioengineering and Robotics, 2019
Istituto Italiano di Tecnologia November 2015 – February 2019
- Thesis on “Neural markers of speech convergence during conversation”, supervised by Prof. Luciano Fadiga.
MS in Speech Technology, 2014
Indian Institute of Technology Kharagpur June 2012 – May 2014
- Relevant coursework in Data Structures, Algorithms, and Digital Signal Processing.
- Thesis on “Bengali speech synthesis with natural prosody on mobile phone”.
B.Tech in Electronics and Communication Engineering, 2009
Jalpaiguri Govt. Engg. College May 2005 – June 2009

Skills & Hobbies

SOFTWARE & ML-OPS SKILLS

Python, MATLAB, Java

Linux, Bash scripting, TensorFlow, PyTorch, ONNX

Pydantic, scikit-learn, NumPy, SciPy, Matplotlib, Pandas

FastAPI, Streamlit, CI/CD, Docker, Triton Inference Server

VS Code, Git, GitLab, Azure, AWS, Spark, DVC

Awards

10th Christian Benoît Award

ISCA ∙ September 2019

Won the 10th Christian Benoît Award for the research project “Neuro-behavioral aware conversational agent” at Interspeech 2019.

Languages

English: 100%
Hindi: 100%
Bengali: 100%
Italian: 50%
French: 25%
