
Open Thoughts: open data curation for reasoning models

Curating the best open reasoning datasets
A collaboration led by Bespoke Labs and the DataComp community
Our first goal is to curate a reasoning dataset to train state-of-the-art small reasoning models that surpass DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-7B on math and code reasoning benchmarks.
News
- [2025/02/16] OpenThinker on Ollama reaches 400k downloads.
- [2025/02/14] Chat with OpenThinker in the online playground.
- [2025/02/13] OpenThinker is now available on Ollama for easy local inference.
- [2025/02/12] We release OpenThinker-32B, the best open-data reasoning model.
- [2025/02/02] OpenThoughts-114k dataset is the #1 trending dataset on Hugging Face.
- [2025/01/30] Reasoning benchmarks are added to Evalchemy and compared to publicly reported scores.
- [2025/01/28] Open Thoughts launches with the OpenThoughts-114k dataset and OpenThinker-7B model.
- [2025/01/27] Bespoke-Stratos-17k dataset is the #2 trending dataset on Hugging Face.
- [2025/01/22] Bespoke-Stratos-17k dataset and Bespoke-Stratos-32B model are announced.
Results
The numbers reported in the tables below are evaluated with our open-source tool Evalchemy.
OpenThinker-32B vs other 32B models:
| Model Name | AIME24 | AIME25 I | MATH500 | GPQA-Diamond | LCBv2 All |
|---|---|---|---|---|---|
| OpenThinker-32B | 66.0 | 53.3 | 90.6 | 61.6 | 68.9 |
| LIMO-32B | 56.7 | 49.3 | 86.6 | 58.1 | 60.0 |
| s1-32B | 36.0 | 25.3 | 84.8 | 50.5 | 40.9 |
| s1.1-32B | 64.7 | 49.3 | 89.0 | 60.1 | 65.5 |
| DeepSeek-R1-Distill-Qwen-32B | 76.7 | 55.9 | 89.4 | 57.6 | 71.2 |
OpenThinker-7B vs other 7B models:
| Model Name | AIME24 | MATH500 | GPQA-Diamond | LCBv2 Easy | LCBv2 Medium | LCBv2 Hard | LCBv2 All |
|---|---|---|---|---|---|---|---|
| OpenThinker-7B | 31.3 | 83.0 | 42.4 | 75.3 | 28.6 | 6.5 | 39.9 |
| Bespoke-Stratos-7B | 22.7 | 79.6 | 38.9 | 71.4 | 25.2 | 0.8 | 35.8 |
| DeepSeek-R1-Distill-Qwen-7B | 60.0 | 88.2 | 46.9 | 79.7 | 45.1 | 14.6 | 50.1 |
| gpt-4o-0513 | 8.6 | 75.8 | 46.5 | 87.4 | 42.7 | 8.9 | 50.5 |
| o1-mini | 64.0 | 85.6 | 60.0 | 92.8 | 74.7 | 39.8 | 72.8 |
Note: The AIME24 dataset has a small sample size, resulting in high variance in evaluation accuracy. To mitigate this, we updated the code to compute the average score over five evaluation runs with different seeds. No system prompt is used, the maximum token length is set to 32,768, and temperature is 0.7.
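For concreteness, the sketch below shows the seed-averaging step described in the note; `run_eval` is a hypothetical placeholder for a single Evalchemy pass, not the project's actual evaluation code.

```python
# Sketch of the averaging protocol above: evaluate with five different seeds
# (temperature 0.7, max_tokens 32768, no system prompt) and report the mean.
from statistics import mean

SEEDS = [0, 1, 2, 3, 4]

def run_eval(benchmark: str, seed: int) -> float:
    """Hypothetical placeholder: run one evaluation pass with the given seed
    and return accuracy in percent (in practice, Evalchemy handles this)."""
    raise NotImplementedError

def averaged_score(benchmark: str) -> float:
    # The reported number is the mean over the five seeded runs.
    return mean(run_eval(benchmark, seed) for seed in SEEDS)
```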
We are fully open-source. Our model weights, datasets, data generation code, evaluation code, and training code are all publicly available.
| Model Name | Open Weights | Open Data | Open Code |
|---|---|---|---|
| OpenThinker-7B | ✅ | ✅ | ✅ |
| Bespoke-Stratos-7B | ✅ | ✅ | ✅ |
| DeepSeek-R1-Distill-Qwen-7B | ✅ | ❌ | ❌ |
| gpt-4o-0513 | ❌ | ❌ | ❌ |
| o1-mini | ❌ | ❌ | ❌ |
Installation
```bash
make install
poetry shell
```
Set the DeepSeek API key:

```bash
export DEEPSEEK_API_KEY=your_api_key
```

Set HF_ORG to your Hugging Face organization id, and set HF_PRIVATE=true if you want to push to a private repo:

```bash
export HF_ORG=your_org_id
export HF_PRIVATE=false
```
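As a quick illustration (not part of the repository), the snippet below shows one way these environment variables could be consumed when pushing a generated dataset to the Hugging Face Hub; the repository name and columns are made up for the example.

```python
# Illustrative only: read the exported variables and push a toy dataset.
import os

from datasets import Dataset

hf_org = os.environ["HF_ORG"]
hf_private = os.environ.get("HF_PRIVATE", "false").lower() == "true"

ds = Dataset.from_dict({"question": ["What is 17 * 24?"], "answer": ["408"]})
ds.push_to_hub(f"{hf_org}/example-reasoning-data", private=hf_private)  # hypothetical repo name
```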
Data Generation
Currently, we are generating data for the following domains:
- Code
- Math
- Science
- Puzzle
More detailed instructions on the data generation recipe are in open_thoughts/README.md; a simplified sketch of the overall idea follows.
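The sketch below illustrates the general shape of the recipe (question → DeepSeek-R1 reasoning trace → answer check), assuming DeepSeek's OpenAI-compatible API; the function names, example fields, and toy verifier are illustrative and not the actual open_thoughts pipeline.

```python
# Simplified illustration of distillation with verification; not the real pipeline.
import os

from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; DEEPSEEK_API_KEY is set above.
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

def generate_trace(question: str) -> str:
    """Request a step-by-step solution from DeepSeek-R1."""
    response = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

def is_verified(example: dict, trace: str) -> bool:
    """Toy verifier: check that the known answer appears in the trace.
    (Real verification would compare against ground truth or run test cases.)"""
    return example["answer"] in trace

examples = [{"domain": "math", "prompt": "What is 17 * 24?", "answer": "408"}]
dataset = []
for ex in examples:
    trace = generate_trace(ex["prompt"])
    if is_verified(ex, trace):  # keep only verified traces
        dataset.append({"question": ex["prompt"], "reasoning": trace})
```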
Training and Evaluation
Training and evaluation code coming soon.
Links
- Open Thoughts Launch Blog Post
- OpenThoughts-114k dataset
- OpenThoughts-Unverified-173k dataset
- OpenThinker-32B model
- OpenThinker-7B model
- Bespoke-Stratos Blog Post
- Bespoke-Stratos-17k dataset
- Bespoke-Stratos-32B model
- Bespoke-Stratos-7B model
Citation
```bibtex
@misc{openthoughts,
  author = {Open Thoughts Team},
  month = jan,
  title = {{Open Thoughts}},
  year = {2025}
}
```
About Us
We are a team of researchers and engineers from Bespoke Labs, Stanford, University of California Berkeley, University of Washington, UT Austin, Juelich Supercomputing Center (JSC), LAION, UCLA, UNC Chapel Hill, and Toyota Research Institute, united around building the best datasets (and thus the best models). See our previous works at datacomp.ai and mlfoundations.
Sponsors
Open Thoughts is supported by
- Bespoke Labs
- Lambda Labs
- NSF IFML
- UT Austin Machine Learning Lab
- Juelich Supercomputing Center
- Toyota Research Institute