Alpaca is a US-headquartered self-clearing broker-dealer and provider of brokerage infrastructure for various financial services. The company is seeking a Senior Data Engineer to design and develop the data management layer for its platform, ensuring scalability and effective data handling for its growing customer base.
Responsibilities:
- Design and oversee key forward- and reverse-ETL patterns to deliver data to relevant stakeholders
- Develop scalable patterns in the transformation layer to ensure repeatable integrations with BI tools across various business verticals
- Expand and maintain the constantly evolving components of the Alpaca Data Lakehouse architecture
- Collaborate closely with sales, marketing, product, and operations teams to address key data flow needs
- Operate the system and manage production issues in a timely manner
Requirements:
- 7+ years of experience in data engineering, including 2+ years of building scalable, low-latency data platforms capable of handling >100M events/day
- Proficiency in at least one programming language, with strong working knowledge of Python and SQL
- Experience with cloud-native technologies like Docker, Kubernetes, and Helm
- Strong hands-on experience with relational database systems and open table formats on object storage, such as Apache Iceberg
- Strong hands-on experience with Google Cloud Platform and its various data-related services (Composer, Dataproc, Datastream, etc.)
- Experience in building scalable transformation layers, preferably through formalized SQL models (e.g., dbt)
- Ability to work in a fast-paced environment and adapt solutions to changing business needs
- Experience with ETL orchestrators / frameworks like Apache Airflow and Airbyte
- Production experience with streaming systems like Kafka
- Exposure to infrastructure, DevOps, and Infrastructure as Code (IaC) tools like Terraform
- Deep knowledge of distributed systems, storage, transactions, and query processing, including open-source distributed query engines like Trino (formerly PrestoSQL)