Resources

AI training data glossary for enterprise model teams.

A concise reference for the terms buyers, model teams, and governance reviewers use when discussing training data, alignment, medical AI, speech AI, and provenance.

Definitions

Browse by AI workflow


Data Operations

Data Annotation

Data annotation is the process of adding labels, metadata, or structured judgments to raw data so machine learning models can learn from it. For AI teams, annotation quality determines whether a model learns the right signal or simply memorizes noisy patterns.

Alignment

RLHF

RLHF, or reinforcement learning from human feedback, is a model-alignment method that uses human preference judgments to teach AI systems which outputs are more helpful, accurate, safe, or appropriate. The quality of the human feedback strongly affects the quality of the aligned model.
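
As a rough illustration of how preference judgments become a training signal, the sketch below uses a pairwise (Bradley-Terry style) loss, which is one common choice for training a reward model and is not something this glossary prescribes. The `PreferencePair` record and the reward values are hypothetical; in practice the rewards would come from a learned model.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One human judgment: for this prompt, `chosen` was preferred over `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the preferred output higher
    and large when it scores the rejected output higher.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical example: the annotator preferred the more helpful answer.
pair = PreferencePair(
    prompt="How do I reset my password?",
    chosen="Open Settings, choose Security, then select Reset password.",
    rejected="I don't know.",
)
agrees_with_human = pairwise_preference_loss(2.0, 0.0)     # reward model agrees
disagrees_with_human = pairwise_preference_loss(0.0, 2.0)  # reward model disagrees
```

Noisy or inconsistent judgments flip which output counts as `chosen`, which pushes the loss in the wrong direction; this is one way to see why feedback quality matters.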

Fine-Tuning

SFT

SFT, or supervised fine-tuning, trains a model on curated input-output examples so it learns a desired task, style, domain, or reasoning pattern. Good SFT data is explicit, consistent, and aligned to the behavior the model should show in production.
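
A minimal sketch of what "explicit and consistent" can mean in practice, assuming SFT examples are stored as JSON Lines with hypothetical `prompt` and `response` fields (the field names and the storage format are assumptions, not part of this glossary):

```python
import json


def validate_sft_record(record: dict) -> list[str]:
    """Return a list of problems found in one input-output example."""
    problems = []
    for field in ("prompt", "response"):
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty {field!r}")
    return problems


def load_sft_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text, rejecting any example that fails validation."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        problems = validate_sft_record(record)
        if problems:
            raise ValueError(f"line {line_number}: {', '.join(problems)}")
        records.append(record)
    return records
```

A check like this only catches structural gaps; whether the responses match the behavior wanted in production still needs human review.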

Speech AI

Dual-Channel Audio

Dual-channel audio records two speakers or sides of a conversation on separate tracks, such as an agent and a customer in a call center. Because each speaker already has a dedicated track, this separation makes it easier to train automatic speech recognition (ASR), speaker diarization, conversation analytics, and voice AI systems.
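
The sketch below shows one way to separate the two tracks, assuming the recording is an uncompressed two-channel PCM WAV file with interleaved samples. The function names and file paths are hypothetical, and which channel holds which speaker depends on the recording setup.

```python
import wave


def deinterleave(frames: bytes, sample_width: int) -> tuple[bytes, bytes]:
    """Split interleaved two-channel PCM bytes into (first, second) channel bytes."""
    frame_size = 2 * sample_width  # one sample per channel in each frame
    first = bytearray()
    second = bytearray()
    for offset in range(0, len(frames), frame_size):
        first += frames[offset:offset + sample_width]
        second += frames[offset + sample_width:offset + frame_size]
    return bytes(first), bytes(second)


def split_stereo_wav(stereo_path: str, first_path: str, second_path: str) -> None:
    """Write each channel of a two-channel WAV file to its own mono WAV file."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel recording")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    channels = deinterleave(frames, width)
    for path, samples in zip((first_path, second_path), channels):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(samples)
```

Each resulting mono file can then be transcribed or labeled per speaker without a separate diarization step.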

Medical AI

DICOM

DICOM, or Digital Imaging and Communications in Medicine, is the standard used to store and exchange medical imaging data such as CT, MRI, X-ray, and ultrasound studies. For medical AI, DICOM data is valuable when it is de-identified, paired with reports or findings, and organized by modality and clinical context.

Governance

Data Provenance

Data provenance is the documentation of where training data came from, how it was collected or structured, and what metadata is available to support review. In AI, provenance helps teams evaluate dataset quality, licensing, compliance, and suitability for a model’s intended use.
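
As a sketch of what such documentation might look like in machine-readable form, the record below captures source, collection method, and license terms alongside a SHA-256 checksum of the data. The `ProvenanceRecord` fields are hypothetical and do not follow any particular provenance standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical minimal provenance entry for one dataset file."""
    dataset_name: str
    source: str
    collection_method: str
    license_terms: str
    sha256: str  # checksum tying the record to the exact bytes reviewed


def build_record(dataset_name: str, source: str, collection_method: str,
                 license_terms: str, payload: bytes) -> ProvenanceRecord:
    """Create a record whose checksum is computed from the dataset bytes."""
    return ProvenanceRecord(
        dataset_name=dataset_name,
        source=source,
        collection_method=collection_method,
        license_terms=license_terms,
        sha256=hashlib.sha256(payload).hexdigest(),
    )


def to_json(record: ProvenanceRecord) -> str:
    """Serialize the record with stable key order so it diffs cleanly in review."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)
```

The checksum lets a reviewer confirm that the file under review is the same one the record describes.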

Looking for governance language?

The glossary explains terminology; the FAQ and compliance pages explain how provenance and documentation fit enterprise review.