Hugging Face is on a journey to democratize good AI and is seeking a Data/Infrastructure Advocate Engineer to bridge the gap between data infrastructure and the global community. This role involves promoting the Hugging Face Hub, collaborating with teams to enhance user interaction with data, and engaging with the open-source community.

Responsibilities:

Grow and nurture the open-source data/infra community: launch initiatives, collaborate with data-focused groups, organize events or challenges, and engage with communities like Apache Parquet, Open Table Formats, and data engineering forums to promote best practices and Hugging Face tools
Promote the Hugging Face Hub as the go-to platform for data storage, versioning, and collaboration, curating and showcasing datasets, benchmarks, and tools like Xet
Highlight use cases like efficient large-dataset updates, Parquet editing, and deduplication to demonstrate the Hub's value for data workflows
Create demos, benchmarks, and tools (for example Colab notebooks) that illustrate best practices for data storage and versioning, and experiment with Xet, Parquet, and other formats
Produce high-quality tutorials, blog posts, and videos that make complex topics accessible
Share insights on storage optimization, dataset versioning, and deduplication to empower developers
Actively participate in online communities (Discord, GitHub, forums) to highlight contributions, answer questions, and foster collaboration
Make sure datasets and tools released on the Hub are well-documented, with clear examples, benchmarks, and use cases

Requirements:

3+ years in developer relations or developer advocacy, ideally for data engineering, infrastructure, or ML tools and platforms
An established public presence as a technical voice, with a track record of regularly publishing data/infra/ML content and a demonstrable, engaged audience on LinkedIn and X (Twitter)
A portfolio of developer-facing content you can point to: tutorials, blog posts, videos, demos, benchmarks, or conference talks
Hands-on experience building and engaging open-source or developer communities (Discord, GitHub, forums)
Strong Python skills
Hands-on experience with data libraries such as pandas, pyarrow, and huggingface/datasets
Practical experience with storage systems and formats: Parquet, Open Table Formats, and S3
Working knowledge of dataset versioning, deduplication, and compression
Ability to explain complex technical topics clearly through writing, demos, or talks
Fluent written and spoken English
Experience with the Hugging Face Hub and datasets ecosystem, or with Xet
Open-source maintainer or contributor experience
Familiarity with large-scale data pipelines and data engineering workflows
Experience producing notebooks (for example Colab) for tutorials and benchmarks

Data/Infrastructure Advocate Engineer - US Remote
