Akkodis is seeking an AI/ML Engineer - Speech Data Scientist for a contract position with a client located in Santa Clara, CA. The role involves measuring model performance, maintaining evaluation systems, collaborating on product features, and improving speech data handling processes.
Responsibilities:
- Measure and benchmark model performance
- Maintain TTS model evaluation system
- Analyze model accuracy and bias, and recommend next steps and improvements
- Improve processes for speech data processing, augmentation, filtering, and TTS training set preparation
- Build know-how on TTS datasets for training and evaluation
- Characterize performance and quality metrics across platforms for various speech AI components
- Collaborate with various teams on new product features and improvements of existing products
- Participate in developing and reviewing code, design documents, use case reviews, and test plan reviews
- Help innovate, identify problems, recommend solutions and perform triage in a collaborative team environment
Requirements:
- Master's degree (or equivalent experience) or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Math, Linguistics or Computational Linguistics
- 5+ years of experience
- Excellent programming skills in Python
- Strong fundamentals in programming, optimization, and software design
- Strong knowledge of ML/DL techniques, algorithms, and tools, with exposure to CNNs, RNNs (LSTMs), and Transformers
- Knowledge of deep learning applications in speech synthesis, LLMs, and speech-to-speech translation
- Hands-on experience with speech technologies such as speech synthesis and voice cloning
- Experience training speech models
- Experience with the PyTorch deep learning framework
- Exposure to basic speech digital signal processing and feature extraction techniques such as FFT, MFCC, and mel spectrograms
- General familiarity with version control and code review tools such as Git, Gerrit, and GitLab
- Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment
- Native or near-native fluency in at least one of the following languages or language varieties: Spanish, Mandarin, German, Japanese, Russian, French, UK English, Arabic, Hindi, Korean, Italian, or Portuguese
- Experience developing multilingual code-switched TTS, voice cloning, and cross-lingual voice cloning
- Experience developing WFST-based and neural-network-based text normalization and inverse text normalization
- Experience working with G2P systems for multiple languages
- Strong personal interest in learning, researching, and creating new technologies related to foreign languages, linguistics, phonetics, phonology and language technology
- Comfortable and motivated working in a fast-paced, highly collaborative, dynamic environment
- Strong C++ programming skills
- Familiarity with GPU-based technologies such as CUDA, cuDNN, and TensorRT
- Experience deploying machine learning models in data center, cloud, and embedded environments