Experience

  1. Senior Machine Learning Engineer

    Omilia
    • Fine-tuned Whisper ASR with a LoRA adapter, achieving 9% WER on noisy food-order data.
    • Developed an artificial speech detection system using wav2vec2 (1.01 EER), deployed on NVIDIA PyTriton with 0.3 RTF.
    • Built and deployed high-quality TTS voices in English, Greek, and Spanish (MOS 4.3, RTF 0.06), reducing synthesis costs by 67x compared to ElevenLabs.
    • Led and mentored junior team members to accelerate the move from proof of concept (POC) to production.
  2. Machine Learning Engineer

    Oxolo
    • Developed a voice cloning model that clones any voice from 1 minute of audio.
    • Created a speech emotion recognition model detecting 15 emotions with 95% accuracy.
    • Designed and built MLOps infrastructure to support AI models at scale, including CI/CD, data processing, evaluation, and monitoring.
    • Skills: REST API, EC2, S3, Pydantic, PyTorch, Docker, Git.
  3. AI Engineer

    GOODIX Technology INC
    • Developed a neural network with a small memory footprint for speech enhancement on mobile devices.
    • Implemented model improvements and compression techniques, including pruning and quantization.
    • Skills: Distillation, Quantization, Pruning, Sparsification.
  4. Visiting Researcher

    Speech Processing Group, University of Crete
    • Researched generative modeling, disentangled speech representation, and adversarial learning under Prof. Yannis Stylianou.
    • Developed a zero-shot multi-speaker, multi-style TTS (MOS 3.62, style similarity 3.41), presented at Interspeech 2021.
  5. Speech Scientist

    Defined.ai
    • Developed and deployed audio event detection models in noisy speech.
    • Built multilingual acoustic models for ASR.
    • Collaborated with senior stakeholders to align MLOps goals with business priorities.

Education

  1. PhD in Bioengineering and Robotics, 2019

    Istituto Italiano di Tecnologia
    • Thesis on neural markers of speech convergence during conversation, supervised by Prof. Luciano Fadiga.
  2. MS in Speech Technology, 2014

    Indian Institute of Technology Kharagpur
    • Relevant coursework in Data Structures, Algorithms, and Digital Signal Processing.
    • Thesis on “Bengali speech synthesis with natural prosody on mobile phone”.
  3. B.Tech in Electronics and Communication Engineering, 2009

    Jalpaiguri Govt. Engg. College
Skills & Hobbies
SOFTWARE & ML-OPS SKILLS
Python, MATLAB, Java
Linux, Bash scripting, TensorFlow, PyTorch, ONNX
Pydantic, scikit-learn, NumPy, SciPy, Matplotlib, Pandas
FastAPI, Streamlit, CI/CD, Docker, Triton Inference Server
VS Code, Git, GitLab, Azure, AWS, Spark, DVC
Awards
10th Christian Benoît Award
ISCA ∙ September 2019
Won the 10th Christian Benoît Award for the research project “Neuro-behavioral aware conversational agent” at Interspeech 2019.
Languages
English: 100%
Hindi: 100%
Bengali: 100%
Italian: 50%
French: 25%