Curation for Model Outcomes
InfoBay designs datasets around the model behaviors a team wants to improve, such as reasoning, factuality, multilingual robustness, speech understanding, and domain accuracy.
- Pre-training corpus selection
- Supervised fine-tuning (SFT) and instruction dataset design
- Evaluation and benchmark dataset creation