Design and implement scalable data warehouse and lakehouse architectures on the Cloudera platform.
Define enterprise data models, governance frameworks, security standards, and data quality practices.
Architect and optimize analytics solutions across SQL engines such as Impala and Hive, and on open table formats such as Apache Iceberg.
Design AI-powered analytics solutions leveraging LLMs, Retrieval-Augmented Generation (RAG), vector databases (such as PostgreSQL with pgvector, Qdrant, Milvus), and natural language query (NLQ) capabilities.
Lead the integration of AI/ML capabilities into enterprise data platforms and data pipelines.
Leverage vibe coding / AI-assisted development tools to accelerate development and improve productivity.
Build and optimize batch and near real-time data pipelines.
Collaborate with business stakeholders to translate business requirements into scalable data products and analytics solutions.
Establish best practices for performance optimization, data architecture, and AI-assisted development.
Mentor teams on modern data architecture and AI-enabled development methodologies.
Ensure data security, governance, and compliance within enterprise data platforms.
Requirements
Bachelor’s degree in Computer Science or equivalent and 5-6 years of related experience; OR Master’s degree and 3-5 years of related experience; OR PhD and 0-3 years of related experience.
Deep expertise in enterprise data warehousing, lakehouse architectures, and Cloudera-based data platforms.
Strong experience with Cloudera Data Platform (CDP), including HDFS, Hive, Impala, Kudu, and Cloudera data ingestion and processing frameworks.
Strong understanding of distributed data systems and Hadoop-based architectures.
Advanced SQL skills, including performance tuning and query optimization.
Proficiency in Python and data engineering frameworks.
Experience with dimensional and normalized data modeling.
Strong understanding of data governance, lineage, metadata management, and enterprise security.
Experience implementing AI/ML, LLM, vector database, and RAG-based solutions in production environments.
Familiarity with AI-assisted development tools (e.g., GitHub Copilot and LLM-powered workflows).
Strong communication, stakeholder management, and problem-solving skills.
Ability to align enterprise data architecture with business objectives in Finance, Sales, and Revenue Operations.
Ability to bridge traditional data platforms with modern AI capabilities.