US Tech Solutions is a global staff augmentation firm providing a wide range of on-demand talent and total workforce solutions. It is seeking an experienced Data Engineer to design, build, and optimize data pipelines and infrastructure that enable advanced analytics and data-driven decision-making.
Responsibilities:
- Develop and maintain scalable, reliable data pipelines for industrial data (e.g., real-time streams, time series, IoT and sensor data, and MES and ERP system data)
- Integrate data from diverse sources (databases, cloud, and on-premises systems) and engineer workflows for efficient ETL/ELT processing and data validation
- Collaborate with architects, data engineers, data scientists, analysts, and business stakeholders to define and deliver solutions
- Build and maintain data infrastructure in compliance with data governance and security best practices
Requirements:
- Bachelor's degree in computer science or a related field, with 3-5 years of experience as a Data Engineer
- Strong experience building, maintaining, and optimizing cloud-agnostic ETL/ELT data pipelines using Python, Pandas, and PySpark, and orchestrating workflows with tools such as Apache Airflow and the Kedro framework
- Advanced SQL/KQL query development and optimization across Oracle, MSSQL, and MySQL databases (hosted on-premises or via PaaS offerings)
- Strong understanding of cloud-agnostic data engineering patterns, including batch vs. streaming ingestion, schema evolution, data partitioning, and cost-optimized storage design
- Experience working with cloud object storage across providers (e.g., ADLS, S3, GCS) and designing reliable, scalable data lake or lakehouse solutions
- Experience developing and consuming RESTful APIs (e.g., FastAPI) for data services and integration
- Proficiency in Linux shell scripting for automation
- Experience with DevOps practices, including CI/CD and automated deployment for data pipelines, using tools such as Git and Docker
- Strong troubleshooting, process automation, and root-cause analysis skills
Technology stack:
- Data Ingestion Pipelines: Python, PySpark, Airflow, Kedro, Linux shell scripting
- API Development: Flask, FastAPI, RESTful design
- Data Storage & Querying: SQL (Oracle, MSSQL, MySQL), KQL
- Cloud Integration: Multi-cloud platforms (OCI, Azure, GCP); cross-cloud data sharing and integration using portable Spark platforms (e.g., Databricks)
- Platform: Databricks, C3.AI
- Real-Time Data Streaming: Kafka, Azure Event Hubs, EMQX
- Collaboration: Wiki, Azure DevOps Boards, MS Office 365