Silverchair is the premier independent platform partner for scholarly and professional publishers, dedicated to expanding the reach of the world’s most valuable knowledge. The Data Engineer will build and maintain the data pipelines that turn scholarly publishing activity into insights for clients, keep data flowing reliably, and help resolve production data issues.
Responsibilities:
- Design, build, and maintain data pipelines that ensure reliable data flow from source systems through transformation layers to reporting
- Integrate data quality checks and validation into the pipeline workflow
- Implement error handling, logging, and retry capabilities to keep pipelines robust and recoverable
- Develop SQL and Python-based transformations that cleanse, enrich, and structure data for analytical use
- Design and implement dimensional models including fact tables and dimension tables
- Monitor and tune pipeline and query performance
- Use execution plans and profiling tools to identify bottlenecks and improve throughput and efficiency
- Troubleshoot and resolve production data issues using logs, monitoring tools, and systematic debugging
- Ensure pipelines run reliably and data is delivered on schedule
- Work closely with your scrum team and cross-functional partners across analytics, product, and engineering
- Document pipeline designs, data lineage, and business rules
- Participate in code reviews and contribute to team knowledge sharing
Requirements:
- 3-5 years of professional experience in data engineering or a closely related role
- Bachelor's degree in Computer Science, Data Science, Information Systems, or a related field, or equivalent practical experience
- Strong SQL skills including complex joins, CTEs, window functions, aggregations, views, functions, and stored procedures
- Ability to write clean, modular Python using functions and classes
- Experience designing dimensional models (star schema, fact/dimension tables)
- Hands-on experience building data pipelines with orchestration tools
- Production experience with Azure Data Factory and Azure Synapse Analytics (dedicated SQL pools, serverless SQL pools, and Spark pools) is required
- Understanding of data partitioning, shuffling, and distribution strategies
- Proficient with Git for branching, merging, and pull request workflows
- Comfortable working in an Agile/Scrum environment with CI/CD practices
- Microsoft DP-700 (Fabric Data Engineer Associate) or Databricks Data Engineer Associate certification is a nice-to-have
- Hands-on experience with modern lakehouse or unified analytics platforms (e.g., Databricks, Microsoft Fabric, Snowflake)
- Familiarity with Kafka-based event streaming (we use Confluent)
- Experience with Change Data Capture (CDC), incremental ingestion strategies, and preservation of historical data
- Familiarity with BI tools such as Power BI, including an understanding of how dimensional models support semantic models and reporting
- Comfortable using AI coding tools as part of your workflow (we use Claude Code)
- Ability to work Eastern Time business hours (8 a.m. to 5 p.m. ET)