SoftStandard Solutions is looking for a Data Engineer to design, develop, and maintain scalable ETL/ELT pipelines for structured and unstructured data. The role involves building data lakes and data warehouses and collaborating with data science, analytics, and business teams to support reporting and AI/ML initiatives.
Responsibilities:
- Design, develop, and maintain scalable ETL/ELT pipelines for processing structured and unstructured data
- Build and optimize data architectures, data lakes, and data warehouses for large-scale analytics
- Develop data ingestion and transformation workflows using Python, SQL, Spark, and cloud technologies
- Work with big data tools such as Hadoop, Spark, Kafka, Databricks, and Snowflake for real-time and batch data processing
- Create and optimize complex SQL queries, stored procedures, and database schemas
- Implement data quality, validation, cleansing, and monitoring processes to ensure data integrity
- Collaborate with Data Scientists, Analysts, and Business teams to support reporting and AI/ML initiatives
- Develop and maintain cloud-based data solutions on AWS, Azure, or GCP
- Automate workflows and deployments using CI/CD pipelines and version control tools
- Monitor pipeline performance, troubleshoot issues, and optimize data processing efficiency
- Ensure compliance with security, governance, and data privacy standards
- Participate in Agile/Scrum ceremonies and contribute to end-to-end SDLC activities
Requirements:
- Strong programming experience in Python, SQL, and Shell scripting
- Hands-on experience with Spark/PySpark, Kafka, Hadoop, Databricks, or Snowflake
- Experience building ETL pipelines and data integration workflows
- Knowledge of relational and NoSQL databases such as PostgreSQL, MySQL, MongoDB, Cassandra, or Oracle
- Experience with cloud platforms like AWS, Azure, or GCP
- Familiarity with Airflow, Jenkins, Git, Docker, and Kubernetes
- Strong analytical, troubleshooting, and problem-solving skills
- Experience with real-time streaming and distributed processing systems
- Knowledge of data warehousing concepts and dimensional modeling
- Exposure to ML/Data Science workflows is a plus
- Relevant cloud or big data certifications preferred