Particle41 is seeking a talented and versatile Data Engineer to join our innovative team. As a Data Engineer, you will play a key role in designing, building, and maintaining robust data pipelines and infrastructure to support our clients' data needs.
Responsibilities:
- Design, develop, and maintain scalable ETL (Extract, Transform, Load) pipelines to process large volumes of data from diverse sources
- Build and optimize data storage solutions, such as data lakes and data warehouses, to ensure efficient data retrieval and processing
- Integrate structured and unstructured data from various internal and external systems to create a unified view for analysis
- Ensure data accuracy, consistency, and completeness through rigorous validation, cleansing, and transformation processes
- Maintain comprehensive documentation for data processes, tools, and systems while promoting best practices for efficient workflows
- Collaborate with product managers and other stakeholders to gather requirements and translate them into technical solutions
- Participate in requirement analysis sessions to understand business needs and user requirements
- Provide technical insights and recommendations during the requirements-gathering process
- Participate in Agile development processes, including sprint planning, daily stand-ups, and sprint reviews
- Work closely with Agile teams to deliver software solutions on time and within scope
- Adapt to changing priorities and requirements in a fast-paced Agile environment
- Conduct thorough testing and debugging to ensure the reliability, security, and performance of applications
- Write unit tests and validate the functionality of developed features and individual elements
- Write integration tests to ensure that the different elements of an application function as intended and meet requirements
- Identify and resolve software defects, code smells, and performance bottlenecks
- Stay updated with the latest technologies and trends in data engineering
- Propose innovative solutions to improve the performance, security, scalability, and maintainability of applications
- Continuously seek opportunities to optimize and refactor the existing codebase for better efficiency
- Stay up-to-date with cloud platforms such as AWS, Azure, or Google Cloud Platform
- Collaborate effectively with cross-functional teams, including testers and product managers
- Foster a collaborative and inclusive work environment where ideas are shared and valued
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field
- At least 3 years of proven experience as a Data Engineer
- Proficiency in the Python programming language
- Experience with SQL databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB)
- Strong understanding of programming libraries, frameworks, and technologies such as Flask and other API frameworks, data warehousing/lakehouse principles, databases and ORMs, data analysis, Databricks, pandas, Spark, PySpark, machine learning, OpenCV, and scikit-learn
- Familiarity with utilities and tools such as logging, requests, subprocess, regex, and pytest
- Familiarity with the ELK stack, Redis, and distributed task queues
- Strong understanding of data warehousing/lakehousing principles and concurrent/parallel processing concepts
- Familiarity with at least one cloud data engineering stack (Azure, AWS, or GCP) and the ability to quickly learn and adapt to new ETL/ELT tools across various cloud providers
- Familiarity with version control systems like Git and collaborative development workflows
- Competence in working on Linux OS and creating shell scripts
- Solid understanding of software engineering principles, design patterns, and best practices
- Excellent problem-solving and analytical skills, with keen attention to detail
- Effective communication skills, both written and verbal, and the ability to collaborate in a team environment
- Adaptability and willingness to learn new technologies and tools as needed