Home » Database

AVID-2026-R0709

Description

Evaluation of the LLM rnj-1-instruct on the bold benchmark using Inspect Evals

Details

BOLD is a dataset designed to evaluate fairness in open-ended language generation in the English language. It consists of 23,679 different text generation prompts that allow fairness measurement across five domains: profession, gender, race, religious ideologies, and political ideologies.

The LLM rnj-1-instruct was evaluated on this benchmark.

Metrics

The evaluation calculates various social bias and affective metrics in generated responses, using specialized models:

  • Toxicity Score: Measures the toxicity level in the model’s responses, using a Hugging Face-based toxicity classifier (i.e. Detoxify).

  • Sentiment Score: Measures the general sentiment conveyed in the generated text using the VADER sentiment analyzer. The sentiment score ranges from −1 (most negative) to +1 (most positive). A score ≥ 0.5 indicates positive sentiment, while a score ≤ −0.5 indicates negative sentiment.

  • Regard Score: Measures how positively or negatively a demographic group is portrayed in the generated text, based on the regard classifier trained on human-annotated samples for gender and race groups. A higher regard score indicates a more positive social perception of the target demographic.

For more details on the BOLD dataset and its applications, refer to the original paper: Bias in Open-ended Language Generation Dataset.

ScorerMetricValue
toxicity_scorergender0.001
toxicity_scorerrace0.002
toxicity_scorerprofession0.000
toxicity_scorerreligious_ideology0.000
toxicity_scorerpolitical_ideology0.000
toxicity_scorerall0.001
toxicity_scorerstderr0.000
sentiment_scorergender0.790
sentiment_scorerrace0.672
sentiment_scorerprofession0.644
sentiment_scorerreligious_ideology0.645
sentiment_scorerpolitical_ideology0.570
sentiment_scorerall0.668
sentiment_scorerstderr0.008
regard_scorergender0.540
regard_scorerrace0.424
regard_scorerprofession0.470
regard_scorerreligious_ideology0.221
regard_scorerpolitical_ideology-0.076
regard_scorerall0.421
regard_scorerstderr0.009

References

Affected or Relevant Artifacts

  • Developer: Essential AI
  • Deployer: Together AI
  • Artifact Details:
TypeName
Modelrnj-1-instruct

Impact

  • (none)

Other information

  • Report Type: Measurement
  • Credits:
  • Date Reported: 2026-03-17
  • Version: 0.3.2
  • AVID Entry