(Senior) Data Manager / Data Governance for AI (f/m/d)
mbiomics GmbH
Company Summary:
mbiomics was founded in 2020 by an experienced founding team with the vision to deliver effective microbiome-based therapeutics that will revolutionize the treatment of many chronic diseases. We recently closed a Series A funding round, which enables us to start developing game-changing live-bacterial therapeutics (LBT). mbiomics leverages its tailored microbiome analysis platform to overcome challenges in LBT development by providing precision profiling data for improved bacterial consortia selection, patient stratification, and patient monitoring in clinical trials.
Position Summary:
You will be responsible for establishing mbiomics’ data governance framework while also building the data infrastructure to make research and clinical data usable, compliant, and AI/ML-ready. In this role you will set standards, implement cloud-based solutions, and collaborate closely with scientists to improve the capture, organization, and accessibility of our data.
As a growing startup, this position combines research and clinical data governance leadership with hands-on implementation. You will also play a key role in preparing mbiomics for the next generation of AI/ML applications by ensuring data is standardized, annotated, and interoperable, and by enabling teams to make effective use of enterprise AI tools.
Responsibilities/Duties:
Governance & Compliance
- Define and implement data governance policies (data lifecycle, data security, data integrity, metadata, standards, access, retention, lineage, alignment with infrastructure).
- Promote FAIR data principles across all R&D teams.
- Ensure compliance with data protection regulations (e.g., GDPR, the EU AI Act, and HIPAA where applicable).
- Create governance documentation and ensure adoption of good practices across the company.
Implementation & Infrastructure
- Build and maintain pipelines/workflows for ingesting, organizing, and validating data (GCP/BigQuery preferred).
- Automate repetitive tasks such as metadata capture, file organization, and ontology mapping.
- Put in place a data catalogue and documentation systems for discoverability and reusability.
AI Readiness & Enablement
- Ensure datasets are properly structured, annotated, and versioned for AI/ML model training.
- Collaborate with computational biology and data science teams to prepare training datasets.
- Evaluate and integrate enterprise AI tools (LLM copilots, agents, workflow assistants) to accelerate documentation, validation, and reporting.
- Help non-programmers use safe, structured workflows with AI assistants for data exploration and reporting.
Collaboration & Cross-Functional Role
- Work closely with scientists to implement structured data capture at the point of generation.
- Provide training and support to promote adoption of governance standards and AI-ready practices.
- Build lightweight dashboards or interfaces for exploration, QC, and usability.
- Serve as a bridge between research, computational biology, IT, and leadership.
- Communicate progress, risks, and needs clearly across stakeholders.
Required Skills and Competencies:
- Strong understanding of data governance frameworks, FAIR principles, and metadata standards.
- Hands-on experience with cloud-based data management (GCP preferred).
- Strong skills in Python/SQL for data wrangling and pipeline development.
- Experience preparing datasets for AI/ML model training.
- Knowledge of at least one data management framework (e.g., Amsterdam, DAMA-DMBOK).
- Comfort with enterprise AI tools (LLM copilots, agent frameworks, documentation assistants).
- Familiarity with modern ML tooling (e.g., PyTorch, HuggingFace, OpenAI API, RAG architectures) as well as traditional approaches.
- Comfortable working in research settings marked by ambiguity and iterative feedback cycles.
- Excellent communication and training skills for working with non-programmers.
- Highly organized, detail-oriented, and comfortable balancing strategy with hands-on execution.
- Desire to pursue professional development by acquiring new skills, presenting work at internal and external venues, and acting on feedback.
Preferred
- Background in life sciences, bioinformatics, or NGS data.
- Familiarity with workflow/containerization tools (Nextflow, Docker).
- Experience with clinical or regulatory data management.
- Knowledge of ontologies and controlled vocabularies in biology.
Nice to Have
- Familiarity with CDISC
Education and Experience
- College degree in a related field (CS, software engineering, bioinformatics).
- At least 6-12 months of industry experience.
- Experience with data analysis and workflow development.
Environment at mbiomics
- Experience the unique dynamics and spirit of a biotech start-up.
- Our team? A colorful mix of international talents, humor and brilliant minds.
- We offer flexible working hours and 30 days of vacation.
- Benefits include our job ticket offer, free coffee, fruit, and candy, and regular team events.