The database component of AVID houses full-fidelity information (model metadata, harm metrics, measurements, benchmarks, and mitigation techniques, if any) on evaluated examples of the harm (sub)categories defined by the taxonomy. The aim is transparent and reproducible evaluations. It

  • Is expandable to account for novel and hitherto unknown vulnerabilities
  • Enables AI developers to freely share evaluation use cases for the benefit of the community
  • Is composed of evaluations submitted in a schematized manner, then vetted and curated

AVID stores instantiations of AI risks, categorized using the AVID taxonomy, in two base data classes: Vulnerability and Report. A vulnerability (vuln) is high-level evidence of an AI failure mode, analogous to the CVEs maintained by MITRE. A report is a single example of a particular vulnerability occurring, supported by qualitative or quantitative evaluation.

Information about both is schematized and stored in AVID. To learn more about the motivation and technical details of vulns and reports, read this document or refer to their respective schemas in AVID.
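As a rough sketch of how the two base data classes relate, the fragment below models a vulnerability and a report that points back to it. The field names here are illustrative assumptions for exposition, not the official AVID schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Vulnerability:
    # High-level evidence of an AI failure mode.
    # Field names are illustrative, not the official AVID schema.
    vuln_id: str
    description: str
    risk_domains: List[str] = field(default_factory=list)  # Security / Ethics / Performance
    affects: List[str] = field(default_factory=list)       # dataset / model / system

@dataclass
class Report:
    # One evaluated occurrence of a vulnerability.
    vuln_id: str       # links the report back to its parent vulnerability
    report_type: str   # Issue / Advisory / Measurement / Detection
    references: List[str] = field(default_factory=list)

# Usage: a report linked to its parent vulnerability (placeholder values).
v = Vulnerability(vuln_id="V-0001", description="example failure mode")
r = Report(vuln_id=v.vuln_id, report_type="Measurement")
```

The key design point this sketch captures is that a vulnerability stands alone as the high-level record, while every report carries a foreign-key-style link to exactly one vulnerability.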


Vulnerabilities are linked to the taxonomy through multiple tags denoting the AI risk domains (Security, Ethics, Performance) a vulnerability pertains to, the (sub)categories under those domains, and the AI lifecycle stages it affects. A vulnerability in AVID can pertain to one or more of three levels: dataset, model, or system.
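Concretely, the taxonomy linkage on a single vulnerability could look like the tag set below. The keys and values are assumptions made for illustration, not actual AVID field names or an actual database entry:

```python
# Illustrative taxonomy tags for one hypothetical vulnerability entry.
# Keys and values are assumptions for exposition, not the official AVID schema.
vuln_tags = {
    "risk_domain": ["Ethics"],            # AI risk domain(s): Security / Ethics / Performance
    "sub_category": ["Group fairness"],   # (sub)categories under those domains
    "lifecycle_stage": ["Evaluation"],    # AI lifecycle stage(s) affected
    "affects": ["model"],                 # one or more of dataset / model / system
}
```

Each tag field is a list, reflecting that a vulnerability may span several domains, subcategories, lifecycle stages, or levels at once.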

List of Vulnerabilities



Reports are occurrences of a vulnerability. Depending on the references provided in a specific report, reports can be more granular and reproducible than vulnerabilities. We classify reports into four types, in increasing order of quantitative evidence:

  1. Issue: a qualitative evaluation based on a single sample or a handful of samples.
  2. Advisory: a qualitative evaluation based on multiple incidents.
  3. Measurement: a quantitative evaluation with associated data and metric.
  4. Detection: a Measurement deemed critical by a threshold or statistical test.

These types are reminiscent of the three levels of AI auditing, and accommodate diverse AI evaluation scenarios from the user's perspective.
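The escalation from Issue to Detection can be thought of as a simple decision rule over the evidence attached to a report. The function below is a toy sketch of that rule under assumed inputs, not logic from AVID itself:

```python
from enum import Enum

class ReportType(Enum):
    # The four report types, in increasing order of quantitative evidence.
    ISSUE = 1        # qualitative, single sample or handful of samples
    ADVISORY = 2     # qualitative, multiple incidents
    MEASUREMENT = 3  # quantitative, with associated data and metric
    DETECTION = 4    # a measurement deemed critical by threshold or test

def classify_report(n_incidents: int, has_metric: bool, exceeds_threshold: bool) -> ReportType:
    """Toy rule mapping available evidence to a report type (illustrative only)."""
    if has_metric:
        # Quantitative evidence: a metric alone is a Measurement;
        # crossing a threshold or failing a statistical test makes it a Detection.
        return ReportType.DETECTION if exceeds_threshold else ReportType.MEASUREMENT
    # Qualitative evidence: one incident is an Issue; several form an Advisory.
    return ReportType.ADVISORY if n_incidents > 1 else ReportType.ISSUE
```

For example, a single anecdotal failure with no metric classifies as an Issue, while a metric that crosses a critical threshold classifies as a Detection.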

List of Reports