Filmore Development is focused on revolutionizing the construction equipment industry through advanced data and AI systems. The role involves predictive modeling, causal inference, and building data pipelines to enhance decision-making for dealers in the equipment lifecycle.
Responsibilities:
- Own the models that drive the business: machine-level service & parts propensity, trade-in / replacement timing, churn and win-back, lead scoring, lifetime value, residual value forecasting, parts demand
- Build forecasts at the right grain (machine, customer, region, SKU) for utilization, service demand, parts consumption, and revenue
- Distinguish what predicts from what causes
- Design and analyze experiments where we can; apply quasi-experimental and uplift methods where we can’t
- Translate the canonical model in Synapse Analytics into features that compound
- Maintain the OLTP plane in Azure Database for PostgreSQL (with pgvector) cleanly separated from the OLAP plane in Synapse; know which workloads belong where and why
- Use vision and long-context LLMs to turn unstructured documents (filings, work orders, spec sheets) into typed features the modeling pipeline can consume
- Build and maintain Python ingestion across public, third-party, and partner sources: government registries, filing systems, geospatial APIs, permit and contract systems
Requirements:
- Applied ML in production (5+ yrs). You've trained, deployed, and maintained models that drove business outcomes, not a thesis chapter, not a notebook that never shipped. Classification, regression, ranking, time-series, uplift: you've done the boring 80% (data quality, calibration, drift, retraining cadence) and you've got the war stories
- Modeling fluency (5+ yrs). Python (scikit-learn, XGBoost / LightGBM, PyTorch when warranted), statistical reasoning that goes beyond 'the model said so,' feature engineering that respects leakage and temporal validity. Survival analysis, hierarchical models, or causal inference is a strong plus; the equipment lifecycle is full of them
- Strong Python (5+ yrs). Production-grade service and pipeline code other engineers can read, extend, and trust six months later. Async, typing, packaging: the boring parts done right. Modeling code is real code here, not a notebook export
- SQL & Postgres (5+ yrs). Schema design, migrations, query optimization, materialized views, index strategy. You read EXPLAIN plans without flinching. Bonus: dbt for metric & feature definitions and a modern warehouse (Synapse, Snowflake, Databricks)
- Messy real-world data. Inconsistent schemas, pagination edge cases, auth flows, dynamic JS-rendered pages, document parsing. You've debugged a scraper at 2am because a vendor changed their HTML
- LLM APIs in production (2+ yrs). You've shipped real systems with Anthropic, OpenAI, or Gemini: designed extraction schemas, built agentic workflows, reasoned about cost/latency/accuracy at scale
- Modern data + agent stack familiarity. Temporal, LangChain, LangGraph, Pydantic AI, pgvector, MCP. We don't expect all of these — we expect you to learn the ones you don't
- You operate without supervision. We hand you a problem, not a ticket. You scope it, ship it, and tell us when we got the problem statement wrong
- You navigate ambiguity. The spec changes mid-week, the data is weird, and the customer feedback contradicts the design doc. You know when that's healthy startup velocity and when it's a signal something's broken
- You ship the smallest thing that proves the bet. Manual version first. Build the API only when it earns its place. Walk away from problems that don't move the dealer's P&L
- You're calibrated and biased toward action. When you don't know, you say so. When the data is wrong, you flag it. When an LLM output is suspect, you don't ship without guardrails. Then you keep moving
- You care about why this exists. Dealers run their businesses on tribal knowledge and relationships. We're building the platform layer to help them modernize without implementing. If that mission doesn't pull you forward, the rest of this won't
- You drive agentic IDEs as your primary loop. Claude Code, Cursor, or equivalent — not autocomplete, full agent sessions. You give the agent a problem, the right context, and the constraints, then review its work like a tech lead reviewing a strong junior. You know when to let it run and when to take the keyboard back
- You run agents in parallel. Multiple worktrees, multiple sessions, multiple branches in flight — one agent migrating a schema, another writing tests, another drafting docs. You've adapted your planning, review, and merge discipline to a world where throughput isn't bounded by what one human can type
- You design context, not prompts. You know an agent with the right files, schema, examples, and acceptance criteria does excellent work, and one with a clever prompt and no context does not. You write CLAUDE.md / agent specs / project rules the way you'd write a runbook — because you'll run them a hundred times
- You orchestrate agents like services. Typed I/O, structured outputs, retries, tool registries (MCP), golden-set evals, end-to-end observability. LangGraph workflows are version-controlled, tested, and instrumented like backend services. Not prompt engineering. Software
- You reason about the model layer in production. When Opus is worth the cost, when Haiku is enough, when Gemini's long context is the unlock, when an OSS model is the right call. Routing, failover, prompt caching, provider concentration risk — tradeoffs you've made for real, not in theory
- Strong plus: public/government/third-party data sources, enrichment pipelines with fallback logic, document extraction at scale