Crossing Hurdles is seeking a Data Engineer for AI model training. The role involves evaluating AI-generated data engineering content for accuracy and scalability, reviewing technical outputs, and supporting AI model improvement through annotation, technical QA, and response ranking.
Responsibilities:
- Evaluate AI-generated data engineering content for technical accuracy, scalability, reliability, and production-readiness
- Review AI-generated analyses, explanations, pipeline designs, SQL queries, orchestration workflows, and implementation recommendations related to modern data engineering systems
- Challenge advanced AI systems with realistic data engineering prompts involving SQL optimization, Python workflows, ETL/ELT architecture, orchestration, warehouse/lakehouse design, and production data reliability
- Analyze AI-generated solutions involving data pipelines, distributed systems, batch and streaming workflows, schema design, transformation logic, observability, and analytics-ready datasets
- Identify technical inaccuracies, inefficient implementations, weak assumptions, missing constraints, scalability risks, unreliable workflows, and unsafe recommendations in AI-generated data engineering outputs
- Review and refine AI-generated prompts, responses, reference solutions, evaluation rubrics, and implementation guidance to ensure alignment with senior-level data engineering best practices
- Evaluate whether AI outputs appropriately account for data quality, schema evolution, pipeline reliability, lineage tracking, orchestration dependencies, performance optimization, and operational maintainability
- Assess AI-generated reasoning related to warehouse modeling, transformation strategies, distributed data processing, observability tooling, data contracts, and production debugging workflows
- Interpret and assess data engineering artifacts including SQL transformations, orchestration DAGs, pipeline configurations, warehouse schemas, lineage models, validation checks, and infrastructure workflows
- Compare and rank multiple AI-generated data engineering responses based on correctness, efficiency, clarity, scalability, operational reliability, and usefulness to engineering teams
- Provide structured feedback documenting reasoning gaps, unsupported assumptions, implementation flaws, scalability concerns, missing validations, and unclear technical communication
- Support benchmarking initiatives by designing, reviewing, validating, and calibrating data engineering tasks across varying levels of infrastructure complexity and operational scale
- Help improve AI communication standards for data engineering topics by ensuring outputs demonstrate systems thinking, production awareness, debugging discipline, and practical implementation guidance
- Ensure AI-generated content reflects sound engineering principles for pipeline reliability, warehouse design, orchestration patterns, schema management, and scalable data processing
- Support AI model improvement through annotation workflows, technical QA reviews, response ranking, implementation validation, and structured data engineering documentation processes
Requirements:
- Education: Bachelor's degree in Computer Science, Data Engineering, Information Systems, Statistics, Engineering, or a related technical field required; equivalent professional experience may also be considered
- At least 4 years of professional experience in data engineering with significant hands-on work designing, building, and maintaining production-grade data pipelines
- Deep understanding of SQL, data modeling, ETL/ELT architecture, orchestration frameworks, warehouse/lakehouse patterns, and modern data stack technologies
- Strong experience with platforms and tools such as dbt, Airflow, Snowflake, BigQuery, Databricks, Fivetran, or comparable modern data infrastructure ecosystems
- Strong knowledge of distributed data systems, batch and streaming workflows, schema design, data validation, data observability, lineage management, and pipeline reliability engineering
- Proven experience optimizing complex SQL queries, troubleshooting data quality issues, designing scalable transformations, and supporting analytics-ready or machine-learning-ready datasets
- Demonstrated ability to translate ambiguous business or technical requirements into durable data models, reliable pipeline designs, orchestration strategies, and implementation plans
- Excellent analytical thinking and attention to detail when evaluating pipeline correctness, transformation logic, data consistency, and production feasibility
- Strong written communication skills with the ability to explain complex data engineering concepts clearly and concisely for technical and cross-functional audiences
- Ability to evaluate AI-generated technical content for implementation quality, architectural soundness, operational reliability, and engineering realism
- Reliable remote work practices, careful handling of confidential material, and consistency across structured data engineering review workflows are required
- Experience evaluating scalability, performance optimization, fault tolerance, monitoring workflows, orchestration dependencies, and operational debugging strongly preferred
- Previous experience with AI data training, engineering annotation, technical QA, or evaluation of AI-generated technical content strongly preferred
- Familiarity with AI systems and tools such as ChatGPT, Gemini, Claude, Perplexity, or similar platforms preferred