Arva is a company focused on ecosystem modeling and measurement, and it is seeking a Data Engineer to build and scale its data and computational backbone. The role involves designing reliable data systems and maintaining production-grade data pipelines that integrate diverse datasets for biogeochemical modeling.

Responsibilities:

Design, implement, and maintain scalable data pipelines supporting ecosystem and biogeochemical modeling
Build reproducible workflows that generate standardized model inputs and manage outputs across space, time, and scenario analysis
Integrate heterogeneous datasets, including field data, management data, soil data, and weather data, into modeling pipelines
Develop and maintain cloud-based infrastructure to support modeling pipelines and optimization workflows
Implement data storage solutions using relational, spatial, and object-based databases
Support efficient data access and processing using platforms such as PostgreSQL, PostGIS, and cloud object storage
Ensure data quality, versioning, traceability, and auditability to support measurement, reporting, and verification requirements
Implement validation and monitoring processes to ensure reliability of model inputs and outputs
Support transparent, repeatable workflows suitable for regulatory and credit market review
Write clean, modular, and well-documented production code that supports maintainable and scalable data systems
Apply software engineering best practices including testing, version control, and documentation
Collaborate closely with Data Science and Technology teams to align data infrastructure with modeling, analytics, and production needs

Requirements:

3+ years of demonstrated experience building and maintaining data pipelines for large, complex, and heterogeneous datasets
Strong proficiency in Python and modern data engineering tools, with experience writing production-grade, testable code
Experience working with cloud platforms, with AWS strongly preferred
Familiarity with containerization tools such as Docker and with version control using Git and GitHub
Experience with relational and spatial databases, including PostgreSQL and PostGIS
Experience working with geospatial data formats and spatial data processing
Bachelor's or Master's degree or equivalent experience in Data Engineering, Computer Science, Environmental Informatics, or a related field
Experience supporting scientific or ecosystem modeling workflows
Familiarity with workflow orchestration tools such as Airflow or Prefect
