AVID-2026-R0703
Description
Evaluation of the AI system gpt-5-nano on the BBQ benchmark using Inspect Evals.
Details
BBQ is a dataset designed to evaluate social biases in question-answering models. The dataset consists of question sets that highlight biases against people belonging to protected classes across nine social dimensions relevant for U.S. English-speaking contexts.
The AI system gpt-5-nano was evaluated on this benchmark.
Metrics
The evaluation is multiple-choice: for each question, the model selects the best answer from the given choices. Scoring uses a single metric:
- Choice Accuracy: The fraction of questions for which the model chose the correct answer, reported together with its standard error.
This metric helps in understanding how biased or fair a model is in answering socially relevant questions.
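As a rough illustration of the metric, choice accuracy and its standard error can be sketched as follows. This is a minimal sketch, not the Inspect Evals scorer: the function name `choice_accuracy` and the toy answer lists are hypothetical, and the standard error is assumed to be the standard error of the mean of per-question 0/1 scores using the sample variance.

```python
import math


def choice_accuracy(predicted, target):
    """Return (accuracy, standard error) for paired lists of choice labels.

    Hypothetical helper: each question scores 1.0 if the chosen label
    matches the target label, else 0.0.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target must be non-empty and equal in length")

    scores = [1.0 if p == t else 0.0 for p, t in zip(predicted, target)]
    n = len(scores)
    mean = sum(scores) / n

    # Assumption: standard error of the mean with sample variance (n - 1).
    if n < 2:
        return mean, 0.0
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(variance / n)


# Toy data for illustration only; not taken from the BBQ dataset.
predicted = ["A", "C", "B", "A"]
target = ["A", "C", "B", "B"]
accuracy, stderr = choice_accuracy(predicted, target)
print(f"accuracy={accuracy:.3f} stderr={stderr:.3f}")
```

With only four toy questions the standard error is large; it shrinks as the number of scored questions grows.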
For more details on the BBQ dataset and its applications, refer to the BBQ dataset page.
| Scorer | Metric | Value |
|---|---|---|
| choice | accuracy | 0.904 |
| choice | stderr | 0.005 |
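One way to read the two values together is as an approximate 95% confidence interval. The sketch below assumes a normal approximation (estimate ± 1.96 × standard error); the helper name `normal_ci` is hypothetical and the interval is not part of the original report.

```python
def normal_ci(estimate, stderr, z=1.96):
    """Approximate confidence interval under a normal approximation.

    Hypothetical helper; z=1.96 corresponds to roughly 95% coverage.
    The bounds are clipped to [0, 1] because accuracy is a proportion.
    """
    half_width = z * stderr
    return max(0.0, estimate - half_width), min(1.0, estimate + half_width)


# Values taken from the results table: accuracy 0.904, stderr 0.005.
low, high = normal_ci(0.904, 0.005)
print(f"approx. 95% CI: {low:.3f} to {high:.3f}")
```

Under these assumptions the reported accuracy is consistent with a value between roughly 0.894 and 0.914.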
References
Affected or Relevant Artifacts
- Developer: OpenAI
- Deployer: OpenAI
- Artifact Details:
| Type | Name |
|---|---|
| System | gpt-5-nano |
Impact
- (none)
Other information
- Report Type: Measurement
- Credits:
- Date Reported: 2026-03-11
- Version: 0.3.2
- AVID Entry