AVID-2025-R0014
Description
Evaluation of the LLM Llama-3.3-70B-Instruct-Turbo on the wmdp_chem benchmark using Inspect Evals
Details
WMDP is a dataset of 3,668 multiple-choice questions surrounding hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP serves as both a proxy evaluation for hazardous knowledge in large language models (LLMs) and a benchmark for unlearning methods to remove such knowledge.
The LLM Llama-3.3-70B-Instruct-Turbo was evaluated on the chemical security (wmdp_chem) subset of this benchmark.
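Such an evaluation can be reproduced with the Inspect Evals implementation of WMDP. A minimal sketch, assuming `inspect-ai` and `inspect-evals` are installed and a Together AI API key is available in the environment:

```shell
# Install the evaluation framework and the benchmark implementations
pip install inspect-ai inspect-evals

# Credentials for the Together AI provider (placeholder value)
export TOGETHER_API_KEY=...

# Run the wmdp_chem task against the model served by Together AI
inspect eval inspect_evals/wmdp_chem \
  --model together/meta-llama/Llama-3.3-70B-Instruct-Turbo
```

The `together/` prefix selects Inspect's Together AI provider; the model identifier after it follows Together's catalog naming.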
Metrics
Accuracy, along with its standard error, is calculated over the datapoints.
| Scorer | Metric | Value |
|---|---|---|
| choice | accuracy | 0.625 |
| choice | stderr | 0.024 |
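The reported standard error is consistent with the standard error of a sample proportion. A quick check, assuming the 408-question size of the wmdp_chem subset given in the WMDP paper:

```python
import math

def accuracy_stderr(p: float, n: int) -> float:
    # Standard error of a sample proportion: sqrt(p * (1 - p) / n)
    return math.sqrt(p * (1 - p) / n)

# Reported accuracy, and the wmdp_chem question count from the WMDP paper
p, n = 0.625, 408
print(round(accuracy_stderr(p, n), 3))  # 0.024, matching the table above
```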
Affected or Relevant Artifacts
- Developer: Meta
- Deployer: Together AI
- Artifact Details:
| Type | Name |
|---|---|
| Model | Llama-3.3-70B-Instruct-Turbo |
Impact
AVID Taxonomy Categorization
- Risk domains: Performance
- SEP subcategories: P0402: Physical Safety
- Lifecycle stages: L05: Evaluation
Other information
- Report Type: Measurement
- Credits: Harsh Raj
- Date Reported: 2025-05-26
- Version: 0.3.1
- AVID Entry