AVID-2026-R0704
Description
Evaluation of the LLM MiniMax-M2.5 on the BBQ benchmark using Inspect Evals
Details
BBQ is a dataset designed to evaluate social biases in question-answering models. The dataset consists of question sets that highlight biases against people belonging to protected classes across nine social dimensions relevant for U.S. English-speaking contexts.
The LLM MiniMax-M2.5 was evaluated on this benchmark.
Metrics
The evaluation uses a multiple-choice approach where the model selects the best answer from given choices. The scoring method includes:
- Choice Accuracy: The fraction of questions for which the model chose the correct answer, reported on a 0 to 1 scale together with its standard error (stderr).
This metric helps in understanding how biased or fair a model is in answering socially relevant questions.
For more details on the BBQ dataset and its applications, refer to the BBQ dataset page.
| Scorer | Metric | Value |
|---|---|---|
| choice | accuracy | 0.896 |
| choice | stderr | 0.005 |
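To make the two reported numbers easier to interpret, here is a minimal Python sketch of how an accuracy and its standard error can be derived from per-question correctness, and how the reported values translate into an approximate 95% interval. It assumes the standard error is the usual standard error of the mean with a sample (n - 1) variance; the exact computation used by Inspect Evals is not stated in this report, and the function names `accuracy_and_stderr` and `normal_interval` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import Sequence


def accuracy_and_stderr(correct: Sequence[bool]) -> tuple[float, float]:
    """Return (accuracy, standard error of the mean) for per-question results.

    Assumption: stderr = sqrt(sample variance / n), with the sample
    variance computed using n - 1 in the denominator.
    """
    n = len(correct)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    scores = [1.0 if c else 0.0 for c in correct]
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(variance / n)


def normal_interval(mean: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval under a normal approximation (z = 1.96 for ~95%)."""
    return mean - z * stderr, mean + z * stderr


# Toy input (not data from this report): three correct answers out of four.
toy_accuracy, toy_stderr = accuracy_and_stderr([True, True, True, False])

# Values taken from the table above.
low, high = normal_interval(0.896, 0.005)
print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions, the reported accuracy of 0.896 with a standard error of 0.005 corresponds to an approximate 95% interval of about 0.886 to 0.906.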
References
Affected or Relevant Artifacts
- Developer: MiniMax
- Deployer: Together AI
- Artifact Details:
| Type | Name |
|---|---|
| Model | MiniMax-M2.5 |
Impact
- (none)
Other information
- Report Type: Measurement
- Credits:
- Date Reported: 2026-03-17
- Version: 0.3.2
- AVID Entry