The Home Depot, a well-known retail company, is seeking a Staff Machine Learning Engineer to lead a team in designing and building machine learning systems. The role involves providing technical leadership, collaborating with product teams, and ensuring that scalable ML solutions are developed and deployed.
Responsibilities:
- Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable machine learning solutions
- Works with the Product Team to ensure user stories are developer-ready, easy to understand, and testable
- Configures commercial off-the-shelf solutions to align with evolving business needs
- Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
- Participates in learning activities around modern software design, machine learning, and development core practices (communities of practice)
- Proactively views articles, tutorials, and videos to learn about new technologies and best practices being used within other technology organizations
- Attends conferences and learns how to apply innovations and technologies where appropriate
- Researches and analyzes business trends and behavioral data to identify opportunities for improvement and new initiatives
- Leads the evaluation, development, and recommendation of specific technology products and platforms to provide cost-effective solutions that meet business and technology requirements
- Researches and designs best-fit infrastructure, network, database, security, and machine learning architectures for products
- Proactively creates and maintains tools for monitoring and support
- Participates in project planning and management across multiple efforts
- Develops formal training courses
- Fields questions from other product teams or support teams
- Monitors tools and participates in conversations to encourage collaboration across product teams
- Provides application support for software running in production
- Proactively monitors production Service Level Objectives for products
- Proactively reviews the Performance and Capacity of all aspects of production: code, infrastructure, data, message processing, and prediction quality
Requirements:
- Must be eighteen years of age or older
- Must be legally permitted to work in the United States
- 3 years of relevant work experience
- Strong experience designing, training, evaluating, and deploying machine learning models in production environments, including batch and real-time inference systems
- Experience with ML lifecycle management, including feature engineering, model versioning, experimentation, validation, and monitoring for data drift and model performance degradation
- Experience building and operating ML pipelines using cloud-native services, data platforms, and CI/CD practices for reproducible and reliable model deployment
- Strong understanding of applied statistics, model evaluation metrics, and tradeoffs between model accuracy, interpretability, latency, and operational cost
- Experience with algorithms such as clustering, forecasting, anomaly detection, and neural networks
- Experience with basic statistics and regression algorithms
- Experience in advanced machine learning techniques such as NLP, convolutional neural networks, autoencoders, and embedding generation and utilization
- Experience in training machine learning models with extremely large datasets
- Experience with data analysis and machine learning tools and libraries such as Jupyter Notebooks, Pandas, SciPy, Scikit-learn, Gensim, TensorFlow, and PyTorch, and experience integrating them into scalable software systems
- Experience in Google Cloud Platform and AI/ML-related components such as Vertex AI, BigQuery ML, and AutoML
- Experience in effective data engineering practices and big data platforms such as BigQuery, Data Store, etc
- Experience in a modern scripting language (preferably Python)
- Experience in writing SQL queries against a relational database
- Experience in version control systems (preferably Git)
- Experience in a Linux or Unix-based environment
- Experience in a CI/CD toolchain
- Experience in REST and effective web service design
- Experience in production systems design, including High Availability, Disaster Recovery, Performance, Efficiency, and Security
- Experience in cloud computing platforms and associated automation patterns, and the machine learning services they provide
- Experience in defensive coding practices and patterns for high availability
- Experience in A/B testing and effective REST design for scalable web services architecture
- Familiarity with advanced machine learning architectures and techniques such as GANs, GRUs, LSTMs, RNNs, CNNs, and style transfer
- 3 - 6 years of relevant work experience