Rethinking AI Safety: Addressing Disability Harms in Language Models

In the rapidly advancing world of artificial intelligence, the introduction of DisaBench marks an important moment in addressing a critical oversight: the disability-related harms that general-purpose safety benchmarks for language models overlook. This isn't just an incremental step; it's a fundamental shift in how we approach AI safety.

A Comprehensive Approach

DisaBench presents a detailed taxonomy of twelve disability harm categories, co-created with both individuals living with disabilities and red teaming experts. This collaboration ensures that the evaluation is grounded in real-world experiences. What sets DisaBench apart is its taxonomy-driven evaluation methodology, which pairs benign and adversarial prompts across seven life domains. The dataset contains 175 prompts, resulting in 525 prompt-response pairs, each meticulously annotated by four evaluators who have firsthand disability experience.
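To make the pairing concrete, here is a minimal sketch of how one benign/adversarial entry might be represented. It is not the DisaBench schema: the class name, field names, and placeholder labels are hypothetical, and the figure of three responses per prompt is an assumption inferred only from the arithmetic 525 / 175 = 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPair:
    """One paired entry (hypothetical layout, not the released schema)."""
    harm_category: str  # one of the twelve taxonomy categories (placeholder label)
    life_domain: str    # one of the seven life domains (placeholder label)
    benign: str         # benign phrasing of the request
    adversarial: str    # adversarial phrasing of the same request


def expected_pairs(num_prompts: int, responses_per_prompt: int) -> int:
    """Number of prompt-response pairs if every prompt gets the same number of responses."""
    return num_prompts * responses_per_prompt


pair = PromptPair(
    harm_category="example-category",
    life_domain="example-domain",
    benign="<benign prompt text>",
    adversarial="<adversarial prompt text>",
)

# Assumption: three responses per prompt, which is consistent with the reported totals.
total = expected_pairs(175, 3)
print(total)
```

Keeping the category and domain on every entry is what would let an evaluation be sliced along the taxonomy rather than reported as a single aggregate score.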

Uncovering Hidden Harms

The findings from this initiative are eye-opening. Harm rates aren't uniform; they vary sharply by disability type, and these issues are likely to compound as AI moves into non-text modalities. This raises a pertinent question: are current AI models equipped to handle the nuanced needs of diverse users? Furthermore, the research highlights that terminology-driven harm is culturally and temporally bound, defying universal assessment. In simpler terms, a term that is harmful in one culture or era may not be in another. This complexity calls for a more nuanced approach to AI safety than the current standard.
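As a rough illustration of why disaggregated reporting matters, the sketch below computes a harm rate per disability type from annotated responses. Everything in it is an assumption made for illustration: the function name, the group labels, the toy votes, and the rule that a response counts as harmful when at least half of its annotators flag it. None of it is DisaBench data or the authors' aggregation method.

```python
from collections import defaultdict


def harm_rates_by_type(records, threshold=0.5):
    """Return {disability_type: share of responses judged harmful}.

    records: iterable of (disability_type, votes), where votes is a list of
    booleans, one per annotator. A response is counted as harmful when the
    fraction of True votes is at least `threshold` (an assumed rule).
    """
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for disability_type, votes in records:
        totals[disability_type] += 1
        if sum(votes) / len(votes) >= threshold:
            harmful[disability_type] += 1
    return {t: harmful[t] / totals[t] for t in totals}


# Toy annotations, invented purely for illustration (four annotators each).
toy_records = [
    ("type_a", [True, True, False, False]),
    ("type_a", [False, False, False, False]),
    ("type_b", [True, True, True, False]),
    ("type_b", [True, False, True, True]),
]

rates = harm_rates_by_type(toy_records)
for disability_type, rate in sorted(rates.items()):
    print(f"{disability_type}: {rate:.2f}")
```

A single pooled rate over these toy records would hide the gap between the two groups, which is the kind of difference a per-type breakdown is meant to surface.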

Beyond the Surface

Standard safety evaluations often catch overt failures but miss the subtle harms that only domain expertise can recognize. This is where DisaBench shines, as it acknowledges that disability harm is inherently personal, intersectional, and community-defined. It's not just about the data; it's about the full context of who a person is. Current benchmarks systematically miss these nuances, which could lead to significant negative impacts if left unaddressed.

Looking Forward

In a move towards openness and collaboration, the DisaBench team plans to release its dataset, taxonomy, and methodology through Hugging Face and an open-source red teaming framework. Integrating with existing safety pipelines in this way promises to enhance AI safety without the need for additional infrastructure. As AI continues to bridge the gap between the digital and the physical, how we address these complex issues will define the future of technology's impact on society. It's a call to action for the industry to re-evaluate and prioritize inclusivity in AI development.
