Mapbox is the leading real-time location platform for a new generation of location-aware businesses. The Search Data team is looking for an Engineer to work with diverse datasets and build systems that support the ingestion of terabytes of data, while also mentoring other engineers and promoting a culture of operational excellence.
Responsibilities:
- Work with specialized telemetry and geospatial data sets including addresses, road networks, buildings, and points of interest (POIs)
- Build and support our batch and streaming ingestion systems, which process terabytes of data per day
- Interface with engineers from other teams to understand their needs for geospatial data and provide solutions
- Simplify and strengthen Mapbox’s processes and tools for designing, deploying, and monitoring data processing and querying workloads on AWS
- Document your work and decision-making processes, and lead presentations and discussions in a way that is easy for others to understand
- Mentor other software engineers to develop all aspects of their engineering skill sets, including participating in design and code reviews
- Promote a culture of operational excellence by meticulously testing and monitoring our systems and code, writing documentation, and being on-call to support the health of our services
- Reduce technical debt, share your knowledge, and invest in your teammates’ health and happiness, while optimizing application performance and accelerating feature velocity
- Uphold a culture of collaboration, transparency, creativity, inclusion, and data-driven decisions
Requirements:
- 5+ years of experience building scalable backend systems and data pipelines
- Hands-on experience with AWS technologies like Lambda, S3, Athena, Glue, and EMR
- Strong proficiency in SQL and Python
- Proficiency in at least one modern programming language or runtime (Node.js, Scala, or Java) suitable for backend services and data processing
- Demonstrated history of designing batch and real-time data processing systems, with well-developed judgment for implementing new data pipelines and establishing best practices around them
- Familiarity with Apache Spark or other Hadoop-based technologies
- Familiarity with CI/CD processes
- Experience introducing quality and operational metrics into ETL pipelines
- Experience integrating data with APIs and querying data through APIs
- Experience with AI tools in the software development lifecycle
- Experience with geospatial data analysis and processing
- Experience with Docker
- Experience with machine learning infrastructure