Akkodis is seeking an AI/ML Engineer - Speech Data Scientist for a contract position with a client located in Santa Clara, CA. The role involves measuring model performance, maintaining evaluation systems, collaborating on product features, and improving processes for speech data handling.

Responsibilities:

Measure and benchmark model performance
Maintain TTS model evaluation system
Analyze model accuracy and bias, and recommend next steps and improvements
Improve processes for speech data processing, augmentation, filtering, and TTS training set preparation
Build know-how on TTS datasets for training and evaluation
Characterize performance and quality metrics across platforms for various speech AI components
Collaborate with various teams on new product features and improvements of existing products
Participate in developing and reviewing code, design documents, use case reviews, and test plan reviews
Help innovate, identify problems, recommend solutions, and perform triage in a collaborative team environment

Requirements:

Master's degree (or equivalent experience) or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Math, Linguistics or Computational Linguistics
5+ years of experience
Excellent programming skills in Python
Strong fundamentals in programming, optimization, and software design
Strong knowledge of ML/DL techniques, algorithms, and tools, with exposure to CNNs, RNNs (LSTM), and Transformers
Know-how in deep learning applications for speech synthesis, LLMs, and speech-to-speech translation
Hands-on experience with speech technologies such as speech synthesis and voice cloning
Experience training speech models
Experience with the PyTorch deep learning framework
Exposure to basic speech digital signal processing and feature extraction techniques such as FFT, MFCC, and mel spectrograms
General background in version control and code review tools such as Git, Gerrit, and GitLab
Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment
Native or near-native fluency in at least one of the following languages or variants: Spanish, Mandarin, German, Japanese, Russian, French, UK English, Arabic, Hindi, Korean, Italian, or Portuguese
Experience developing multilingual code-switched TTS, voice cloning, and cross-lingual voice cloning
Experience developing WFST-based and neural network-based text normalization and inverse text normalization
Experience working with G2P systems for multiple languages
Strong personal interest in learning, researching, and creating new technologies related to foreign languages, linguistics, phonetics, phonology and language technology
Comfortable and motivated working in a fast-paced, highly collaborative, dynamic work environment
Strong C++ programming skills
Familiarity with GPU-based technologies such as CUDA, cuDNN, and TensorRT
Background in deploying machine learning models on data center, cloud, and embedded systems
