Design and operate sovereign data lake and warehouse architectures: schema design, data contracts, lineage tracking, freshness SLAs, governance frameworks, and multi-source ingestion pipelines.
Build end-to-end data science systems: feature engineering, model training and evaluation, inference pipeline deployment, monitoring, and feedback loops.
Develop novel statistical models, predictive algorithms, and optimization frameworks applied to operational data: labor efficiency, asset performance, energy consumption, and SLA adherence.
Identify patentable innovations in data processing architectures, model designs, and analytical methods; author invention disclosures and support patent prosecution alongside legal counsel.
Translate raw operational data from physical infrastructure (sensors, CMMS, BMS, field logs) into structured, queryable, and model-ready data products.
Establish data engineering best practices: dbt transformations, data quality tests, observability tooling, and documentation standards across the data platform.
Collaborate with AI/ML engineers on feature stores, embeddings, and model-ready data products; partner with software engineers to integrate analytical outputs into user-facing applications.
Drive data governance: access controls, PII handling, audit trails, and compliance with data sovereignty requirements.
Architect and execute the migration from fragmented, siloed operational data systems to a decentralized federated data model: implement federated query engines (e.g., Trino, Spark, or equivalent) and data virtualization layers that enable cross-domain analytics without requiring full data centralization, and design domain-oriented data products aligned with data mesh principles, preserving source-system ownership while enabling platform-wide discoverability and governed access.
Requirements
8+ years combining data science and data engineering in production environments, including at least 3 years at a senior IC level.
Deep expertise in the Python data ecosystem: pandas, NumPy, scikit-learn, PyTorch or TensorFlow, and statistical modeling libraries.
Proven experience designing and operating large-scale data warehouses or data lakes (Snowflake, BigQuery, Databricks, or equivalent).
Strong command of SQL and transformation tooling (dbt, Spark, or similar); experience with streaming data pipelines (Kafka, Kinesis, or equivalent).
Full-stack capability: able to take a data product from raw source through pipeline, model, API, and user-facing interface without hand-offs.
Experience with intellectual property in the data or software domain: invention disclosures, prior art research, or patent application involvement.
Strong technical writing skills: able to articulate novel methodologies clearly for patent disclosures and analytical documentation.