Domino Data Lab builds software that empowers AI-driven organizations to operate advanced data science solutions. The Staff Software Engineer will work on the Model Development Lifecycle Team to enhance machine learning capabilities and support organizations in developing and scaling AI models.
Responsibilities:
- Integrate model monitoring to provide a holistic view of deployment health and performance
- Enhance tagging capabilities across Domino entities to improve discoverability and tracking
- Expand LLM hosting capabilities to address customer needs for scale, performance, and logging
- Innovate within our Domino Apps offering by incorporating feature requests from major customers
Requirements:
- Hands-on experience developing and managing high-performance back-end systems in distributed computing environments
- Experience working closely with cross-functional teams to integrate back-end systems with front-end interfaces and third-party services
- Experience designing and implementing secure, scalable APIs (e.g., RESTful APIs, gRPC)
- Experience profiling and optimizing back-end performance, especially in cloud environments or containerized environments built on Docker and Kubernetes
- Experience using robust testing frameworks (unit, integration, end-to-end) and setting up CI/CD pipelines
- Familiarity with model registries, versioning, and lifecycle management tools such as MLflow or Kubeflow
- Experience with frameworks and platforms such as Apache Spark, Azure ML, or SageMaker
- Proficiency with cloud providers (AWS, Azure, GCP) and deploying services in these environments
- Expertise in languages such as Python, Java, Scala, or Go