SilverSearch, Inc. is a highly recognized organization seeking a Senior Machine Learning Engineer to design, build, and optimize production machine learning systems. The role focuses on developing and tuning inference pipelines for multimodal content so that text, image, and video data are processed efficiently.
Responsibilities:
- Designing, building, and optimizing ML-powered inference systems supporting text, image, and video workloads
- Developing scalable pipelines for embeddings, semantic search, vector retrieval, reranking, and multimodal processing
- Optimizing inference performance across transformer-based NLP and/or computer vision models, including tuning for latency, throughput, batching, concurrency, and memory efficiency
- Supporting large-scale distributed inference workloads across hybrid CPU/GPU environments and cloud infrastructure (AWS preferred)
- Building resilient asynchronous processing systems with strong observability, fault tolerance, logging, retries, caching, and performance monitoring
- Partnering with engineering and data science teams to continuously improve production model performance and deployment reliability
Requirements:
- Experience building and scaling inference pipelines in production environments
- Experience improving latency, throughput, memory utilization, and model-serving efficiency across distributed workloads
- Strong hands-on experience with frameworks such as PyTorch and TensorFlow, and with transformer models, semantic/vector search, embeddings, retrieval systems, distributed inference, and production ML optimization
- Experience designing, building, and optimizing ML-powered inference systems supporting text, image, and video workloads
- Experience developing scalable pipelines for embeddings, semantic search, vector retrieval, reranking, and multimodal processing
- Experience optimizing inference performance across transformer-based NLP and/or computer vision models, including tuning for latency, throughput, batching, concurrency, and memory efficiency
- Experience supporting large-scale distributed inference workloads across hybrid CPU/GPU environments and cloud infrastructure (AWS preferred)
- Experience building resilient asynchronous processing systems with strong observability, fault tolerance, logging, retries, caching, and performance monitoring
- Experience partnering with engineering and data science teams to continuously improve production model performance and deployment reliability
- Experience with video processing workflows
- Experience with multimodal AI systems
- Experience with large-scale inference environments