Sankar
Experience
Senior Machine Learning Engineer
Omilia
August 2023 – Present
Fine-tuned Whisper ASR with a LoRA adapter, achieving 9% WER on noisy food-order data.
Developed an artificial speech detection system using wav2vec2 (1.01% EER), deployed on NVIDIA PyTriton with 0.3 RTF.
Built and deployed high-quality TTS voices in English, Greek, and Spanish (MOS 4.3, RTF 0.06), reducing synthesis costs by 67x compared to ElevenLabs.
Led and mentored junior team members to accelerate the move from proof of concept (POC) to production.
Machine Learning Engineer
Oxolo
June 2022 – January 2023
Developed a voice cloning model that clones any voice from 1 minute of audio.
Created a speech emotion recognition model detecting 15 emotions with 95% accuracy.
Designed and built MLOps infrastructure to support AI models at scale, including CI/CD, data processing, evaluation, and monitoring.
Skills: REST API, EC2, S3, Pydantic, PyTorch, Docker, Git.
AI Engineer
Goodix Technology Inc.
April 2021 – October 2021
Developed a neural network with a small memory footprint for speech enhancement on mobile devices.
Implemented model improvements and compression techniques, including pruning and quantization.
Skills: Distillation, Quantization, Pruning, Sparsification.
Visiting Researcher
Speech Processing Group, University of Crete
August 2020 – December 2020
Researched generative modeling, disentangled speech representation, and adversarial learning under Prof. Yannis Stylianou.
Developed a zero-shot multi-speaker, multi-style TTS (MOS 3.62, style similarity 3.41), presented at Interspeech 2021.
Speech Scientist
Defined.ai
September 2019 – April 2020
Developed and deployed audio event detection models in noisy speech.
Built multilingual acoustic models for ASR.
Collaborated with senior stakeholders to align MLOps goals with business priorities.
Education
PhD in Bioengineering and Robotics, 2019
Istituto Italiano di Tecnologia
November 2015 – February 2019
Thesis on neural markers of speech convergence during conversation. Supervised by Prof. Luciano Fadiga.
MS in Speech Technology, 2014
Indian Institute of Technology Kharagpur
June 2012 – May 2014
Relevant coursework in Data Structures, Algorithms, and Digital Signal Processing.
Thesis on “Bengali speech synthesis with natural prosody on mobile phone”.
B.Tech in Electronics and Communication Engineering, 2009
Jalpaiguri Govt. Engg. College
May 2005 – June 2009
Skills & Hobbies
SOFTWARE & ML-OPS SKILLS
Python, MATLAB, Java
Linux, Bash scripting, TensorFlow, PyTorch, ONNX
Pydantic, scikit-learn, NumPy, SciPy, Matplotlib, Pandas
FastAPI, Streamlit, CI/CD, Docker, Triton Inference Server
VS Code, Git, GitLab, Azure, AWS, Spark, DVC
Awards
10th Christian Benoît Award
ISCA ∙ September 2019
Awarded for the research project “Neuro-behavioral aware conversational agent” at Interspeech 2019.
Languages
English: 100%
Hindi: 100%
Bengali: 100%
Italian: 50%
French: 25%