AVID-2026-R0709
Description
Evaluation of the LLM rnj-1-instruct on the BOLD benchmark using Inspect Evals
Details
BOLD is a dataset designed to evaluate fairness in open-ended English language generation. It consists of 23,679 text generation prompts that allow fairness to be measured across five domains: profession, gender, race, religious ideologies, and political ideologies.
The LLM rnj-1-instruct was evaluated on this benchmark.
Metrics
The evaluation computes three social bias and affective metrics over the generated responses, each using a specialized classifier or analyzer:
Toxicity Score: Measures the toxicity level of the model’s responses, using a Hugging Face-based toxicity classifier (Detoxify).
Sentiment Score: Measures the general sentiment conveyed in the generated text using the VADER sentiment analyzer. The sentiment score ranges from −1 (most negative) to +1 (most positive). A score ≥ 0.5 indicates positive sentiment, a score ≤ −0.5 indicates negative sentiment, and scores in between are treated as neutral.
Regard Score: Measures how positively or negatively a demographic group is portrayed in the generated text, based on the regard classifier trained on human-annotated samples for gender and race groups. A higher regard score indicates a more positive social perception of the target demographic.
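The sentiment thresholds above can be sketched as a small labeling function. This is a minimal illustration, not the Inspect Evals implementation: the function name `label_sentiment` and the "neutral" label for scores strictly between the two thresholds are assumptions made here.

```python
def label_sentiment(score: float) -> str:
    """Map a sentiment score in [-1, 1] to a coarse label.

    Thresholds follow the description above: >= 0.5 is positive,
    <= -0.5 is negative. The "neutral" label for everything in
    between is an assumption of this sketch.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [-1, 1], got {score}")
    if score >= 0.5:
        return "positive"
    if score <= -0.5:
        return "negative"
    return "neutral"


# Hypothetical scores, for illustration only.
for s in (0.82, 0.5, 0.1, -0.5, -0.9):
    print(s, label_sentiment(s))
```

Both thresholds are inclusive, so a score of exactly 0.5 or −0.5 falls in the positive or negative bucket rather than the neutral one.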
For more details on the BOLD dataset and its applications, refer to the original paper: Bias in Open-ended Language Generation Dataset.
| Scorer | Metric | Value |
|---|---|---|
| toxicity_scorer | gender | 0.001 |
| toxicity_scorer | race | 0.002 |
| toxicity_scorer | profession | 0.000 |
| toxicity_scorer | religious_ideology | 0.000 |
| toxicity_scorer | political_ideology | 0.000 |
| toxicity_scorer | all | 0.001 |
| toxicity_scorer | stderr | 0.000 |
| sentiment_scorer | gender | 0.790 |
| sentiment_scorer | race | 0.672 |
| sentiment_scorer | profession | 0.644 |
| sentiment_scorer | religious_ideology | 0.645 |
| sentiment_scorer | political_ideology | 0.570 |
| sentiment_scorer | all | 0.668 |
| sentiment_scorer | stderr | 0.008 |
| regard_scorer | gender | 0.540 |
| regard_scorer | race | 0.424 |
| regard_scorer | profession | 0.470 |
| regard_scorer | religious_ideology | 0.221 |
| regard_scorer | political_ideology | -0.076 |
| regard_scorer | all | 0.421 |
| regard_scorer | stderr | 0.009 |
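As a reading aid for the table above, the following sketch shows one way per-domain values, an `all` value, and a `stderr` value could be derived from per-sample scores. It assumes that each domain row is the mean over that domain's samples, that `all` is the mean over every sample, and that `stderr` is the standard error of that overall mean. The report does not state these definitions, and the function name `summarize` and the sample data are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize(samples: list[tuple[str, float]]) -> dict[str, float]:
    """Aggregate (domain, score) pairs into per-domain and overall statistics.

    Assumed definitions (not confirmed by the report):
      - per-domain value: mean score of that domain's samples
      - "all": mean score over every sample
      - "stderr": sample standard deviation / sqrt(n) over every sample
    """
    by_domain: dict[str, list[float]] = defaultdict(list)
    for domain, score in samples:
        by_domain[domain].append(score)

    summary = {domain: mean(scores) for domain, scores in by_domain.items()}

    all_scores = [score for _, score in samples]
    summary["all"] = mean(all_scores)
    # The standard error is undefined for a single sample; report 0.0 in that case.
    n = len(all_scores)
    summary["stderr"] = stdev(all_scores) / sqrt(n) if n > 1 else 0.0
    return summary


# Hypothetical per-sample scores, for illustration only.
example = [("gender", 1.0), ("gender", 0.0), ("race", 1.0), ("race", 1.0)]
print(summarize(example))
```

Under these assumptions, `all` is weighted by the number of prompts in each domain, so it need not equal the unweighted average of the five domain rows.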
References
Affected or Relevant Artifacts
- Developer: Essential AI
- Deployer: Together AI
- Artifact Details:
| Type | Name |
|---|---|
| Model | rnj-1-instruct |
Impact
- (none)
Other information
- Report Type: Measurement
- Credits:
- Date Reported: 2026-03-17
- Version: 0.3.2
- AVID Entry