AVID-2026-R0709

Description

Evaluation of the LLM rnj-1-instruct on the BOLD benchmark using Inspect Evals

Details

BOLD is a dataset designed to evaluate fairness in open-ended language generation in the English language. It consists of 23,679 different text generation prompts that allow fairness measurement across five domains: profession, gender, race, religious ideologies, and political ideologies.

The LLM rnj-1-instruct was evaluated on this benchmark.

Metrics

The evaluation computes three social bias and affective metrics over the generated responses, each using a specialized model:

Toxicity Score: Measures the toxicity level in the model’s responses, using a Hugging Face-based toxicity classifier (i.e. Detoxify).
Sentiment Score: Measures the general sentiment conveyed in the generated text using the VADER sentiment analyzer. The sentiment score ranges from −1 (most negative) to +1 (most positive). A score ≥ 0.5 indicates positive sentiment, while a score ≤ −0.5 indicates negative sentiment.
Regard Score: Measures how positively or negatively a demographic group is portrayed in the generated text, based on the regard classifier trained on human-annotated samples for gender and race groups. A higher regard score indicates a more positive social perception of the target demographic.
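
The sentiment thresholds described above can be sketched as a small helper. This is a minimal illustration and not the Inspect Evals implementation: the function name label_sentiment is hypothetical, and the input is assumed to be a VADER compound score that has already been computed elsewhere.

```python
def label_sentiment(compound: float, threshold: float = 0.5) -> str:
    """Map a compound sentiment score in [-1, 1] to a coarse label.

    Hypothetical helper: the 0.5 threshold follows the description above;
    the function itself is an illustration, not part of any library.
    """
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Example scores chosen for illustration only.
    for score in (0.82, 0.10, -0.64):
        print(score, label_sentiment(score))
```

Scores strictly between −0.5 and +0.5 fall into neither category, so the sketch labels them neutral.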

For more details on the BOLD dataset and its applications, refer to the original paper: Bias in Open-ended Language Generation Dataset.

Scorer	Metric	Value
toxicity_scorer	gender	0.001
toxicity_scorer	race	0.002
toxicity_scorer	profession	0.000
toxicity_scorer	religious_ideology	0.000
toxicity_scorer	political_ideology	0.000
toxicity_scorer	all	0.001
toxicity_scorer	stderr	0.000
sentiment_scorer	gender	0.790
sentiment_scorer	race	0.672
sentiment_scorer	profession	0.644
sentiment_scorer	religious_ideology	0.645
sentiment_scorer	political_ideology	0.570
sentiment_scorer	all	0.668
sentiment_scorer	stderr	0.008
regard_scorer	gender	0.540
regard_scorer	race	0.424
regard_scorer	profession	0.470
regard_scorer	religious_ideology	0.221
regard_scorer	political_ideology	-0.076
regard_scorer	all	0.421
regard_scorer	stderr	0.009
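
One way to read the table is to compare, for each scorer, the lowest-scoring and highest-scoring domain. The following is a minimal sketch under stated assumptions: the per-domain values are copied from the table above (the "all" and "stderr" rows are left out), and the names RESULTS and domain_spread are hypothetical.

```python
# Per-domain values copied from the table above ("all" and "stderr" omitted).
RESULTS = {
    "toxicity_scorer": {
        "gender": 0.001, "race": 0.002, "profession": 0.000,
        "religious_ideology": 0.000, "political_ideology": 0.000,
    },
    "sentiment_scorer": {
        "gender": 0.790, "race": 0.672, "profession": 0.644,
        "religious_ideology": 0.645, "political_ideology": 0.570,
    },
    "regard_scorer": {
        "gender": 0.540, "race": 0.424, "profession": 0.470,
        "religious_ideology": 0.221, "political_ideology": -0.076,
    },
}


def domain_spread(scores: dict[str, float]) -> tuple[str, str, float]:
    """Return (lowest domain, highest domain, gap between them)."""
    lowest = min(scores, key=scores.get)
    highest = max(scores, key=scores.get)
    # Round to three decimals to match the precision of the table.
    return lowest, highest, round(scores[highest] - scores[lowest], 3)


if __name__ == "__main__":
    for scorer, scores in RESULTS.items():
        lowest, highest, gap = domain_spread(scores)
        print(f"{scorer}: lowest={lowest}, highest={highest}, gap={gap:.3f}")
```

The gap is a plain difference of reported means and does not account for the standard errors listed in the table, so it should be read as a descriptive summary only.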

References

Inspect Evaluation Log for dataset: bold

Affected or Relevant Artifacts

Developer: Essential AI
Deployer: Together AI
Artifact Details:

Type	Name
Model	rnj-1-instruct

Impact

(none)

Other information

Report Type: Measurement
Credits:
Date Reported: 2026-03-17
Version: 0.3.3
AVID Entry