
AVID-2025-R0012

Description

Evaluation of the LLM Llama-3.3-70B-Instruct-Turbo on the agentharm benchmark using Inspect Evals.

Details

The agentharm benchmark is implemented in Inspect Evals. More details are available in the Inspect Evals documentation.

The LLM Llama-3.3-70B-Instruct-Turbo was evaluated on this benchmark.
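A run like this is typically launched with the Inspect CLI; a minimal sketch follows. The Together AI model identifier shown is an assumption and may differ from the one used for this report:

```shell
# Sketch: reproduce the evaluation with Inspect (requires the inspect-ai and
# inspect-evals packages, plus TOGETHER_API_KEY set in the environment).
# The model path below is an assumed Together AI identifier.
inspect eval inspect_evals/agentharm \
  --model together/meta-llama/Llama-3.3-70B-Instruct-Turbo
```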

Metrics

Scoring details are available in the Inspect Evals documentation.

Scorer           Metric                         Value
combined_scorer  avg_score                      0.122
combined_scorer  avg_full_score                 0.011
combined_scorer  avg_refusals                   0.494
combined_scorer  avg_score_non_refusals         0.173
combined_scorer  Disinformation_avg_scores      0.129
combined_scorer  Harassment_avg_scores          0.133
combined_scorer  Drugs_avg_scores               0.278
combined_scorer  Fraud_avg_scores               0.064
combined_scorer  Hate_avg_scores                0.070
combined_scorer  Cybercrime_avg_scores          0.139
combined_scorer  Sexual_avg_scores              0.073
combined_scorer  Copyright_avg_scores           0.110
combined_scorer  Disinformation_avg_refusals    0.200
combined_scorer  Harassment_avg_refusals        0.500
combined_scorer  Drugs_avg_refusals             0.500
combined_scorer  Fraud_avg_refusals             0.458
combined_scorer  Hate_avg_refusals              0.750
combined_scorer  Cybercrime_avg_refusals        0.600
combined_scorer  Sexual_avg_refusals            0.833
combined_scorer  Copyright_avg_refusals         0.125
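The aggregate metrics above relate to per-task results in a straightforward way: avg_score averages the graded score over all tasks, avg_full_score is the fraction of tasks with a perfect score, avg_refusals is the fraction of tasks the model refused, and avg_score_non_refusals averages scores over non-refused tasks only. A minimal sketch of that aggregation, assuming per-task records with a `score` in [0, 1] and a boolean `refused` flag (the field names are illustrative, not the scorer's actual internals):

```python
from statistics import mean

def summarize(results):
    """Aggregate per-task results into AgentHarm-style summary metrics.

    `results` is a list of dicts with (assumed) keys:
      score   - graded task score in [0, 1]
      refused - True if the model refused the task
    """
    scores = [r["score"] for r in results]
    non_refused = [r["score"] for r in results if not r["refused"]]
    return {
        # Mean graded score over all tasks, refusals included.
        "avg_score": mean(scores),
        # Fraction of tasks solved with a perfect score.
        "avg_full_score": mean(1.0 if s == 1.0 else 0.0 for s in scores),
        # Fraction of tasks the model refused outright.
        "avg_refusals": mean(1.0 if r["refused"] else 0.0 for r in results),
        # Mean score restricted to tasks the model did not refuse.
        "avg_score_non_refusals": mean(non_refused) if non_refused else 0.0,
    }
```

The per-category rows (e.g. Disinformation_avg_scores) would apply the same aggregation to the subset of tasks in each harm category.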

References

Affected or Relevant Artifacts

  • Developer: Meta
  • Deployer: Together AI
  • Artifact Details:
    Type   Name
    Model  Llama-3.3-70B-Instruct-Turbo

Impact

AVID Taxonomy Categorization

  • Risk domains: Performance
  • SEP subcategories: P0400: Safety
  • Lifecycle stages: L05: Evaluation

Other information

  • Report Type: Measurement
  • Credits: Harsh Raj
  • Date Reported: 2025-05-26
  • Version: 0.3.1
  • AVID Entry