Posted:
June 05, 2026
Location:
Spain
Job Description
Accountabilities
- Own the Training Environment data architecture end-to-end: dataset design and schema for all ML training pipelines, including dialog corpora for LLM training, conversational steps for NLU models, annotated evaluation sets, and whole-call recordings for speech-to-speech model development.
- Define and govern data selection and sampling strategy: establish criteria that determine which production conversations have the highest training value, including diversity-optimized sampling, confidence-based filtering, edge-case prioritization, and deduplication strategies.
- Build and maintain the data catalog and dataset discovery infrastructure: enable ML engineers across LLM, NLU, Speech, and Agentic teams to find, understand, and use training data without friction.
- Define annotation pipeline architecture: establish requirements for data labeling — intent annotation, entity tagging, dialog act classification, task completion scoring, and age...
Apply for this Job
Submit your application for the Senior Data Architect position at Omilia.
Job Overview
Job Type:
Full time
Deadline:
July 15, 2026