You.com is building the AI search infrastructure that powers modern AI systems. The company is seeking a hands-on Data Engineer to help build and scale its modern data platform, developing reliable data pipelines and systems that ensure data quality and usability across the organization.
Responsibilities:
- Build and maintain scalable data pipelines (batch and streaming) using tools like Databricks, Spark, Kafka, and AWS services
- Design, develop, and optimize ETL/ELT workflows using dbt, PySpark, SQL, and tools like Fivetran
- Partner closely with marketing and growth teams to enable data use cases such as segmentation, campaign targeting, and lifecycle analytics
- Develop and maintain reverse ETL pipelines to sync data from the warehouse to tools like Salesforce, HubSpot, Braze, and other downstream systems
- Create and manage curated datasets to support analytics, reporting, and go-to-market initiatives
- Build and maintain dashboards and reporting layers to support marketing and business performance tracking
- Support AI/ML and agent-based applications by preparing and serving high-quality datasets for RAG pipelines and MCP (Model Context Protocol) integrations
- Monitor pipeline performance, troubleshoot issues, and ensure high data reliability and quality
- Implement data quality checks, validations, and alerting mechanisms across both ingestion and activation layers
- Collaborate with cross-functional teams to define data contracts and ensure consistency across systems
Requirements:
- 6+ years of experience in data engineering or a related field
- Strong hands-on experience with Databricks, AWS (S3, Glue, Athena, EMR, etc.), and Kafka
- Proficiency in Python (PySpark) and SQL for large-scale data processing
- Experience building and maintaining ETL/ELT pipelines (experience with dbt, Airflow, or similar tools preferred)
- Experience with data ingestion tools such as Fivetran (or similar)
- Familiarity with reverse ETL / data activation workflows and syncing data to tools like Salesforce, HubSpot, and Braze
- Exposure to or experience with AI/ML data pipelines, including RAG architectures, vector databases, or embeddings workflows
- Familiarity with agent-based systems, MCP integrations, or LLM-powered applications is a strong plus
- Experience working with marketing, product, or growth teams on data use cases (segmentation, attribution, campaign analytics, etc.)
- Understanding of data modeling and experience working with large-scale datasets (batch and streaming)
- Experience creating dashboards and supporting reporting workflows with BI tools for both internal and external audiences
- Strong problem-solving skills and ability to debug production data issues
- Strong communication skills and ability to work collaboratively across teams