Tools for vulnerability discovery

AVID is pleased to host a small but growing collection of tools to enable evaluation of models, datasets, and algorithmic systems. Our tools are hosted on HuggingFace and described below.

IndieLabel

IndieLabel, developed by Michelle Lam and colleagues at the Stanford Human-Computer Interaction Group, is a novel and powerful tool that enables individual users without computational expertise to perform algorithmic audits. IndieLabel lets users investigate the Perspective API, a toxicity detection tool widely used for online content moderation. Previous work has shown that the Perspective API can sometimes over-flag, i.e., label benign comments as toxic, and that this happens especially often in discussions among or about marginalized communities. IndieLabel helps users investigate topics of particular interest to them and generate audit reports based on their findings. AVID's deployment, the only public-facing, live deployment of IndieLabel to date, also allows users to submit vulnerability reports directly to AVID.
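
Readers who want to probe the same behavior outside IndieLabel can query the Perspective API directly. The sketch below is ours, not part of IndieLabel: the request shape follows Google's publicly documented commentanalyzer v1alpha1 endpoint, while the API key and example comment are placeholders to replace with your own.

    import requests

    # Placeholder key; a real key is obtained by enabling the Perspective API
    # in a Google Cloud project.
    API_KEY = "YOUR_PERSPECTIVE_API_KEY"
    URL = ("https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
           f"?key={API_KEY}")

    def toxicity_score(text: str) -> float:
        """Return Perspective's TOXICITY summary score (0 to 1) for a comment."""
        payload = {
            "comment": {"text": text},
            "languages": ["en"],
            "requestedAttributes": {"TOXICITY": {}},
        }
        response = requests.post(URL, json=payload, timeout=10)
        response.raise_for_status()
        return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

    if __name__ == "__main__":
        # A benign, identity-mentioning sentence of the kind prior audits found
        # to be over-flagged; try variations and compare the scores.
        print(toxicity_score("I am a proud gay man."))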

BiasAware

BiasAware is a web app for evaluating gender bias in datasets. Developed in-house at AVID by a team of volunteers, the app allows anyone to upload their own dataset and run automatic evaluations of gender bias using three different methods. Datasets hosted on the HuggingFace Hub can also be evaluated easily. BiasAware also generates reports that can be submitted to AVID.
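
As a rough illustration of what a lexicon-based check of this kind might look like (this is our sketch, not necessarily one of BiasAware's three methods; the word lists and the imdb example dataset are arbitrary choices):

    from collections import Counter
    from datasets import load_dataset

    # Toy lexicons; a real evaluation would use far more complete term lists.
    MALE_TERMS = {"he", "him", "his", "man", "men", "boy", "father", "husband"}
    FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "girl", "mother", "wife"}

    def gender_term_counts(texts):
        """Count gendered terms across a collection of texts."""
        counts = Counter({"male": 0, "female": 0})
        for text in texts:
            for token in text.lower().split():
                token = token.strip(".,!?;:\"'")
                if token in MALE_TERMS:
                    counts["male"] += 1
                elif token in FEMALE_TERMS:
                    counts["female"] += 1
        return counts

    if __name__ == "__main__":
        # Any text dataset on the HuggingFace Hub can be pulled the same way.
        dataset = load_dataset("imdb", split="train[:1000]")
        print(gender_term_counts(dataset["text"]))

A large imbalance between the two counts is a signal worth investigating, not a verdict; the app's report formats such findings for submission to AVID.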

Plug-and-Play Bias Detection

Plug-and-Play Bias Detection, developed at AVID by Subho Majumdar, allows users to evaluate language models for several types of bias in just a few clicks. Models are loaded automatically from the HuggingFace Hub, and the following evaluation methods are available (illustrative code sketches follow each list):

For masked language models (e.g. BERT):

  • HONEST measures hurtful sentence completions across 10 different categories of harm.
  • WinoBias measures gender bias in coreference resolution.
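
For example, a HONEST-style check can be run locally with the HuggingFace evaluate library's honest measurement and a fill-mask pipeline. The two templates below are our own and the app's internals may differ; HONEST's real template set is much larger.

    import evaluate
    from transformers import pipeline

    # Assumptions: the `evaluate` library's "honest" measurement and a BERT
    # fill-mask pipeline; the two templates below are illustrative only.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased", top_k=5)
    honest = evaluate.load("honest", "en")

    prompts = ["the woman worked as a [MASK].", "the man worked as a [MASK]."]
    groups = ["female", "male"]

    # Top-5 completions for each template, as lists of words.
    completions = [[c["token_str"].strip() for c in fill_mask(p)] for p in prompts]

    # Share of completions judged hurtful, reported per group.
    result = honest.compute(predictions=completions, groups=groups)
    print(result["honest_score_per_group"])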

For generative language models (e.g. EleutherAI/gpt-neo-125M):

  • BOLD measures fairness across five categories (profession, gender, race, religious ideologies, and political ideologies).
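
A rough sketch of a BOLD-style comparison follows. The AlexaAI/bold dataset fields used below (domain, category, prompts) are taken from its dataset card as we understand it, and scoring continuations with the evaluate toxicity measurement is our simplification rather than the app's exact method.

    from collections import defaultdict

    import evaluate
    from datasets import load_dataset
    from transformers import pipeline

    # Assumptions: the AlexaAI/bold dataset on the Hub and the `evaluate`
    # toxicity measurement as a stand-in scoring model.
    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
    toxicity = evaluate.load("toxicity", module_type="measurement")

    bold = load_dataset("AlexaAI/bold", split="train")
    gender = bold.filter(lambda row: row["domain"] == "gender").shuffle(seed=0)

    # Generate a short continuation for a handful of prompts per demographic group.
    generations = defaultdict(list)
    for row in gender.select(range(20)):
        prompt = row["prompts"][0]
        text = generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
        generations[row["category"]].append(text[len(prompt):])

    # Compare the average toxicity of continuations across groups.
    for group, texts in generations.items():
        scores = toxicity.compute(predictions=texts)["toxicity"]
        print(group, round(sum(scores) / len(scores), 4))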

The app also generates vulnerability reports that can be submitted to AVID.

Call for collaborations

Know a tool we should showcase here? Or want to help us develop new ones? Please get in touch!