Data Scientist / Data Analytics Engineer

United States of America

Full Time

2 hours ago

Visa Sponsor

Key skills

Predictive modeling, Time-series forecasting, Classification, Anomaly detection, Optimization, Regression, Survival analysis, Clustering, Gradient boosting, Deep learning, Model selection, Hyperparameter tuning, Cross-validation, Model performance evaluation, Model monitoring, Drift detection, Model retraining, Model explainability, SHAP, Feature importance, Partial dependence, Operational analytics, KPI reporting, Data pipeline development, Batch data processing, Streaming data processing, AWS Redshift, AWS S3, AWS Glue, AWS Lambda, AWS Step Functions, AWS Kinesis, AWS MSK, AWS EMR, AWS Athena, AWS SageMaker, Medallion architecture, Star schema modeling, Dimensional modeling, Data curation, Data profiling, Data quality enforcement, Data contracts, Data lineage, CI/CD, Infrastructure as code, Terraform, CloudFormation, SQL, Advanced window functions, Performance tuning, Python, Pandas, Scikit-learn, Statsmodels, XGBoost, LightGBM, PyTorch, TensorFlow, Production data pipelines, Orchestration, Idempotency, Statistical methods, Hypothesis testing, Experimental design, Uncertainty quantification, Stakeholder partnership, Product mindset, Analytical rigor, ML, Deep Learning, scikit-learn, MLOps, MLflow, Data Engineering, Analytics, Looker, BI, Power BI, Redshift, Kafka, AWS, Lambda, S3, Glue, Athena, SageMaker, Kinesis, Leadership, Prototyping, Product Management, Customer Success

About this role

Transflo is seeking a Data Scientist / Data Analytics Engineer to design, build, and operationalize advanced analytics solutions that enhance its transportation and logistics operations. This role involves delivering predictive and operational analytics, managing data engineering tasks, and collaborating with various stakeholders to ensure the effectiveness of analytics products.

Responsibilities:

Design, train, validate, and deploy predictive models (regression, classification, time-series forecasting, survival analysis, clustering, anomaly detection, and gradient-boosted / deep learning approaches as appropriate to the problem)
Lead model selection, hyperparameter tuning, cross-validation, and rigorous performance evaluation using metrics aligned to business objectives (precision/recall trade-offs, MAPE, RMSE, lift, calibration, etc.)
Develop data products in areas relevant to transportation, including operational metrics, fraud signals, pricing analytics, industry trends, etc.
Establish model monitoring, drift detection, retraining cadence, and explainability practices (SHAP, feature importance, partial dependence) to keep production models trustworthy and operationally self-sustaining
Produce point-in-time analytics, KPI scorecards, and exception reporting to support daily operational decisions across dispatch, fleet, customer success, finance, and product teams
Partner with business stakeholders to translate questions into well-scoped analyses; deliver clear, defensible insights with documented assumptions and data lineage
Build and maintain reusable analytical datasets, semantic layers, and certified metrics so the organization works from a consistent source of truth
Build and maintain data pipelines (batch and streaming) on AWS using services such as Redshift, S3, Glue, Lambda, Step Functions, Kinesis / MSK, EMR, Athena, and SageMaker
Implement medallion (bronze / silver / gold) architecture patterns to progressively refine raw operational data into analytics-ready and ML-ready datasets
Apply star-schema (dimensional) modeling and related techniques to build performant, business-friendly data models in Redshift and the broader warehouse layer
Drive data selection, curation, profiling, and quality enforcement: define source-of-truth datasets, document lineage, and codify data contracts and validation tests
Collaborate with data engineering and platform teams on CI/CD for data and ML assets, infrastructure-as-code (e.g., Terraform / CloudFormation), and cost-aware design on AWS
Take customer-facing analytics features and products from idea to implementation — partnering with product management, design, and engineering to turn ambiguous business questions into shipped capabilities embedded in customer-facing applications
Contribute to product discovery: customer interviews, opportunity sizing, prototyping, and rapid iteration on analytical concepts before committing to full build-out
Own the analytical correctness of customer-facing metrics, models, and visualizations — including definitions, edge cases, performance under real-world data conditions, and how results are explained to non-technical end users
Define and instrument success metrics for shipped analytics features (adoption, engagement, accuracy in production, customer outcomes) and drive iterative improvements post-launch
Translate complex analytical results into clear narratives, visualizations, and recommendations for both technical and non-technical audiences, including executive leadership and customers
Partner cross-functionally with product, engineering, operations, and commercial teams to embed analytics into workflows, applications, and customer-facing products
Mentor analysts and engineers on statistical rigor, modeling best practices, and modern data architecture

Requirements:

Bachelor's degree in Statistics, Mathematics, or Supply Chain Management; a degree in Computer Science is also acceptable. Master's degree preferred but not required
Demonstrated professional experience in the transportation, trucking, freight, logistics, or broader supply chain industry, with working knowledge of the underlying operational data (loads, stops, shipments, ELD/telematics, TMS, dispatch, billing, etc.)
Proven track record of taking customer-facing analytics products or features from idea through implementation and launch — including product discovery, scoping, model and metric design, partnering with product/engineering, and supporting the feature in production with real customers. Candidates should be prepared to walk through at least one concrete example end-to-end
Strong applied experience building advanced analytical models end-to-end, including problem framing, data selection and curation, feature engineering, model training and validation, and deployment
Hands-on experience with AWS PaaS / analytics tooling, including Amazon Redshift and other relevant services such as S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, and SageMaker
Proficiency in SQL (advanced window functions, performance tuning on Redshift or comparable MPP warehouses) and at least one analytics-grade programming language — Python strongly preferred — with libraries such as pandas, scikit-learn, statsmodels, XGBoost/LightGBM, and PyTorch or TensorFlow as appropriate
Experience designing and operating production data pipelines, with a clear understanding of orchestration, idempotency, observability, and data quality
Solid grounding in statistical methods: hypothesis testing, experimental design, regression, time-series, and uncertainty quantification

Preferred qualifications:

Master's degree in Statistics, Mathematics, Operations Research, Supply Chain, Computer Science, or a closely related quantitative field
Experience implementing medallion architecture (bronze / silver / gold) in a cloud data lakehouse or warehouse environment
Experience designing star-schema dimensional models for analytics consumption
Experience with streaming and event-driven data (Kinesis, Kafka/MSK) for near-real-time analytics on transportation events
Experience deploying and monitoring ML models in production using SageMaker, MLflow, or equivalent MLOps tooling
Familiarity with BI / visualization tools (e.g., QuickSight, Power BI, Looker) and semantic layer / metrics layer concepts
Exposure to optimization and operations research techniques (linear / mixed-integer programming, routing, network flow) applied to transportation problems
Experience working with ELD/HOS data, telematics feeds, geospatial data, TMS / dispatch system data, or brokerage data, and a general understanding of transportation back-office operations and business processes