
AVID-2026-R0707

Description

Evaluation of the LLM LFM2-24B-A2B on the BBQ benchmark using Inspect Evals

Details

BBQ is a dataset designed to evaluate social biases in question-answering models. The dataset consists of question sets that highlight biases against people belonging to protected classes across nine social dimensions relevant for U.S. English-speaking contexts.

The LLM LFM2-24B-A2B was evaluated on this benchmark.

Metrics

The evaluation uses a multiple-choice format in which the model selects the best answer from the given choices. The scoring reports:

  • Choice Accuracy: the proportion of questions answered correctly, reported as a fraction between 0 and 1.

Because BBQ questions are built around social biases, accuracy on them gives an indication of how biased or fair a model is when answering socially relevant questions.
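As a rough illustration of choice accuracy, the sketch below compares the answer letter a model chose with the gold letter for each question and reports the fraction that match. The `Item` record and `choice_accuracy` function are hypothetical names for this example, not part of Inspect Evals; BBQ questions have three answer options, so letters A to C are used.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical per-question record (not an Inspect Evals type)."""
    chosen: str  # answer letter selected by the model
    target: str  # correct answer letter


def choice_accuracy(items: list[Item]) -> float:
    """Fraction of items where the chosen letter matches the target."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(1 for it in items if it.chosen == it.target)
    return correct / len(items)


# Four illustrative questions: two answered correctly.
items = [Item("A", "A"), Item("B", "C"), Item("C", "C"), Item("A", "B")]
print(choice_accuracy(items))  # 0.5
```

A reported value such as 0.638 is read the same way: roughly 64% of the scored questions were answered correctly.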

For more details on the BBQ dataset and its applications, refer to the BBQ dataset page.

Scorer   Metric     Value
choice   accuracy   0.638
choice   stderr     0.008
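The stderr row can be read as the uncertainty on the accuracy. The sketch below assumes it is the standard error of the mean over per-question 0/1 scores (sample standard deviation with n - 1) and that a normal approximation applies; the entry does not state exactly how it was computed. The function names are hypothetical.

```python
import math


def stderr_of_mean(scores: list[float]) -> float:
    """Standard error of the mean of per-sample scores (sample std, n - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return math.sqrt(var / n)


def normal_ci(mean: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval under a normal approximation."""
    return mean - z * se, mean + z * se


# Reported accuracy and stderr from the table.
low, high = normal_ci(0.638, 0.008)
print(f"{low:.3f} to {high:.3f}")  # 0.622 to 0.654
```

Under these assumptions, the reported accuracy corresponds to an approximate 95% interval of about 0.622 to 0.654.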

References

Affected or Relevant Artifacts

  • Developer: Liquid AI
  • Deployer: Together AI
  • Artifact Details:
    Type    Name
    Model   LFM2-24B-A2B

Impact

  • (none)

Other information

  • Report Type: Measurement
  • Credits:
  • Date Reported: 2026-03-17
  • Version: 0.3.2
  • AVID Entry