AVID-2025-R0012
Description
Evaluation of the LLM Llama-3.3-70B-Instruct-Turbo on the agentharm benchmark using Inspect Evals
Details
The agentharm benchmark is implemented in Inspect Evals; more details are available in the Inspect Evals documentation.
The LLM Llama-3.3-70B-Instruct-Turbo was evaluated on this benchmark, as sketched below.
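A minimal sketch of how such a run can be launched with the Inspect AI Python API, assuming `inspect-ai` and `inspect-evals` are installed and a `TOGETHER_API_KEY` is set in the environment; the Together AI model identifier shown is an assumption based on the deployer listed below, not a detail taken from this report.

```python
# Sketch: evaluating the model on agentharm via Inspect Evals.
# Assumes TOGETHER_API_KEY is set; the model identifier is an assumption.
from inspect_ai import eval
from inspect_evals.agentharm import agentharm

logs = eval(
    agentharm(),
    model="together/meta-llama/Llama-3.3-70B-Instruct-Turbo",
)
print(logs[0].results.scores)  # per-scorer metrics, e.g. combined_scorer
```

The equivalent command-line invocation would be `inspect eval inspect_evals/agentharm --model together/meta-llama/Llama-3.3-70B-Instruct-Turbo`.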
Metrics
Scoring details are available in the Inspect Evals documentation; a sketch of reading these metrics back from an evaluation log follows the table.
| Scorer | Metric | Value |
|---|---|---|
| combined_scorer | avg_score | 0.122 |
| combined_scorer | avg_full_score | 0.011 |
| combined_scorer | avg_refusals | 0.494 |
| combined_scorer | avg_score_non_refusals | 0.173 |
| combined_scorer | Disinformation_avg_scores | 0.129 |
| combined_scorer | Harassment_avg_scores | 0.133 |
| combined_scorer | Drugs_avg_scores | 0.278 |
| combined_scorer | Fraud_avg_scores | 0.064 |
| combined_scorer | Hate_avg_scores | 0.070 |
| combined_scorer | Cybercrime_avg_scores | 0.139 |
| combined_scorer | Sexual_avg_scores | 0.073 |
| combined_scorer | Copyright_avg_scores | 0.110 |
| combined_scorer | Disinformation_avg_refusals | 0.200 |
| combined_scorer | Harassment_avg_refusals | 0.500 |
| combined_scorer | Drugs_avg_refusals | 0.500 |
| combined_scorer | Fraud_avg_refusals | 0.458 |
| combined_scorer | Hate_avg_refusals | 0.750 |
| combined_scorer | Cybercrime_avg_refusals | 0.600 |
| combined_scorer | Sexual_avg_refusals | 0.833 |
| combined_scorer | Copyright_avg_refusals | 0.125 |
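For reference, a minimal sketch of how the metrics above can be read back from the log Inspect writes after a run; the log file path is hypothetical (Inspect writes timestamped logs under `./logs/` by default).

```python
# Sketch: reading scorer metrics from a completed Inspect evaluation log.
from inspect_ai.log import read_eval_log

log = read_eval_log("logs/agentharm.eval")  # hypothetical path
for score in log.results.scores:  # one entry per scorer (here: combined_scorer)
    for name, metric in score.metrics.items():
        print(f"{score.name}  {name}  {metric.value:.3f}")
```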
Affected or Relevant Artifacts
- Developer: Meta
- Deployer: Together AI
- Artifact Details:
| Type | Name |
|---|---|
| Model | Llama-3.3-70B-Instruct-Turbo |
Impact
AVID Taxonomy Categorization
- Risk domains: Performance
- SEP subcategories: P0400: Safety
- Lifecycle stages: L05: Evaluation
Other information
- Report Type: Measurement
- Credits: Harsh Raj
- Date Reported: 2025-05-26
- Version: 0.3.1