Arva, a company focused on ecosystem modeling and measurement, is seeking a Data Engineer to build and scale its data and computational backbone. The role involves designing reliable data systems and maintaining production-grade data pipelines that integrate diverse datasets for biogeochemical modeling.
Responsibilities:
- Design, implement, and maintain scalable data pipelines supporting ecosystem and biogeochemical modeling
- Build reproducible workflows that generate standardized model inputs and manage outputs across space, time, and scenario analysis
- Integrate heterogeneous datasets, including field data, management data, soil data, and weather data, into modeling pipelines
- Develop and maintain cloud-based infrastructure to support modeling pipelines and optimization workflows
- Implement data storage solutions using relational, spatial, and object-based databases
- Support efficient data access and processing using platforms such as PostgreSQL, PostGIS, and cloud object storage
- Ensure data quality, versioning, traceability, and auditability to support measurement, reporting, and verification requirements
- Implement validation and monitoring processes to ensure reliability of model inputs and outputs
- Support transparent, repeatable workflows suitable for regulatory and credit market review
- Write clean, modular, and well-documented production code that supports maintainable and scalable data systems
- Apply software engineering best practices including testing, version control, and documentation
- Collaborate closely with Data Science and Technology teams to align data infrastructure with modeling, analytics, and production needs
Requirements:
- 3+ years of demonstrated experience building and maintaining data pipelines for large, complex, and heterogeneous datasets
- Strong proficiency in Python and modern data engineering tools, with experience writing production-grade, testable code
- Experience working with cloud platforms, with AWS strongly preferred
- Familiarity with containerization tools such as Docker and version control with Git (for example, on GitHub)
- Experience with relational and spatial databases, including PostgreSQL and PostGIS
- Experience working with geospatial data formats and spatial data processing
- Bachelor's or Master's degree in Data Engineering, Computer Science, Environmental Informatics, or a related field, or equivalent experience
- Experience supporting scientific or ecosystem modeling workflows
- Familiarity with workflow orchestration tools such as Airflow or Prefect