About this role
Role Overview
Pipeline Development: Design, build, test and maintain scalable data pipelines (batch and streaming) and ETL/ELT processes.
AI Infrastructure: Develop and maintain data pipelines focused on the Machine Learning lifecycle, integrating structured and unstructured data.
Quality and Governance: Ensure data quality, integrity and security by applying governance and data curation practices for use in predictive models and large language models (LLMs).
Performance Optimization: Monitor data flow performance and optimize complex queries to reduce costs and processing time.
Collaboration with AI Teams: Work closely with Data Scientists and Machine Learning Engineers to understand requirements and enable large-scale data consumption.
Requirements
2–4 years of proven experience working as a Data Engineer.
Strong skills in SQL (modeling, optimization and processing) and Python (data manipulation with Pandas, PySpark, etc.).
Hands-on experience with cloud platforms (AWS, GCP or Azure) and Data Warehouse services (BigQuery, Redshift or Snowflake).
Practical experience structuring unstructured data (text, PDFs, images) and integrating with vector databases (such as Pinecone, Milvus, Chroma, pgvector or Weaviate) to support semantic search and RAG (Retrieval-Augmented Generation) systems.
Experience with workflow orchestrators (preferably Apache Airflow).
Familiarity with relational and NoSQL databases.
Experience with APIs and integrating diverse systems.
Familiarity with natural language processing (NLP) concepts and embeddings.
Assertive Communication: Ability to interact with business and technical teams and explain technological limitations and possibilities clearly to non-technical stakeholders.
Critical Thinking and Business Awareness: Focus on addressing root causes of structural problems and prioritizing tasks that deliver the greatest value and cost efficiency to the company.
Proactivity/Autonomy and Ownership: Take ownership of pipelines, anticipate failures, actively propose improvements and document architectural decisions.
Collaborative Spirit: Empathy for data consumers’ needs and a willingness to share knowledge with the team.
Adaptability: Resilience to handle scope changes, new data sources or technology evolution without losing focus on delivery.
Tech Stack
Apache Airflow
Amazon Redshift
AWS
Azure
BigQuery
ETL
Google Cloud Platform
NoSQL
Pandas
PySpark
Python
SQL
Benefits
Care for your health: medical plan, dental plan, telemedicine and life insurance.
Customizable multi-benefit program (Flash).
Rest is essential: Paid time off.
Celebrate your day: Day off on your birthday!
We offer Gympass to support a healthy routine.
Autonomy and flexibility.
Workplace exercise and Quality of Life initiatives.
Training and development program, Academia X.
Start your self-awareness journey: Profiler and behavioral mapping.