Buzz Solutions applies AI to the analytics and maintenance of power grid infrastructure. The company is seeking an Applied Machine Learning Platform Engineer to join its computer vision team, with a focus on building and maintaining scalable training infrastructure and data pipelines.
Responsibilities:
- Design, build, and maintain scalable training infrastructure for computer vision workloads
- Implement and manage distributed training pipelines (multi-GPU, multi-node) to support large-scale model training and hyperparameter tuning
- Build and maintain robust data pipelines for ML development
- Design database schemas and storage strategies for managing large training datasets, annotations, and model artifacts
- Implement and manage feature stores, data versioning, and experiment tracking to support reliable model iteration
- Automate existing analysis workflows
- Maintain clear documentation for platform components, data contracts, and deployment processes
- Communicate infrastructure decisions, tradeoffs, and system limitations clearly to ML engineers and stakeholders
- Conduct thorough code reviews and write integration tests for ML pipelines