Can Embedded Racism In How LLMs Are Trained Be Managed?
You've got to be taught
To hate and fear,
You've got to be taught
From year to year,
It's got to be drummed
In your dear little ear
You've got to be carefully taught.
- Rodgers & Hammerstein in “South Pacific”
Readers of this website may notice I occasionally use AI tools to create the images that accompany these articles. One thing I’ve noticed is that when I ask for an image that includes groups such as scientists or technicians, I may need to explicitly ask in the prompt for a mix of genders and demographics; otherwise the displayed group may turn out to be unrealistically all white males.
It’s not surprising that how LLMs are trained impacts their output. This extends to how people’s speech is treated. A recent article in the Sept. 28, 2024 Science by Cathleen O’Grady, “AI makes racist decisions based on dialect,” describes research finding that speakers of “African American English” (AAE) were more likely than speakers of “Standard American English” to be characterized negatively by LLMs. From the Science article:
Just like humans, artificial intelligence (AI) is capable of saying it isn’t racist, but then acting as if it were. Large language models (LLMs) such as GPT4 output racist stereotypes about speakers of African American English (AAE), even when they have been trained not to connect overtly negative stereotypes with Black people, new research has found. According to the study—published today in Nature—LLMs also associate speakers of AAE with less prestigious jobs, and in imagined courtroom scenarios are more likely to convict these speakers of crimes or sentence them to death.
While I’m sure there will be pushback against attempts to inject non-racist filtering into the training of Large Language Models (look at how some companies are backing down from DEI training because it’s “racist”), perhaps another solution will be something like what the slaves in Percival Everett’s superb historical novel James did: “speak” in the manner expected by the recipient of the LLM’s output.
In James, a fanciful retelling of Huckleberry Finn from Jim’s perspective, slaves speak “like slaves” when around white folks in order to portray themselves as nonthreatening, subservient, and unequal. Among themselves, though, they speak “standard English,” just like the white folks. Problem solved!
Could a similar approach be followed in how output is “tuned” for presentation to consumers of AI-based analyses in order to avoid the perception of bias? Or would that by itself just add more artificiality to how content is processed in model training?
Shortly after posting the above article I received the following relevant announcement from NIST: U.S. AI Safety Institute Signs Agreements Regarding AI Safety Research, Testing and Evaluation With Anthropic and OpenAI.

Copyright (c) 2024 by Dennis D. McDonald.