LLMs produce racist output when prompted in African American English

ArcticDagger@feddit.dk · 1 year ago

LLMs produce racist output when prompted in African American English

Hux@lemmy.ml · 1 year ago

Crap, I left my $199 yearly subscription info inside my butler’s Lamborghini. Could your personal valet sky-write your login credentials for nature.com above my Tuscan estate? Specifically, above the Eastern alpaca pens—this Murano glass monocle of mine isn’t a bi-focal. Cheers.

Nakoichi [they/them]@hexbear.net · 1 year ago

Okay this has to be a new hexbear site tagline

Hestia [she/her, fae/faer]@hexbear.net · 1 year ago

Excuse me, but it’s only 3.90 for each issue…

Of course I get my money’s worth by reading every single one

AVincentInSpace@pawb.social · 1 year ago

Here you go

Hux@lemmy.ml · 1 year ago

Brilliant, ol’ sport! There’s a mallet and horse waiting for you at West Egg this weekend—I simply won’t take no for an answer.

ArcticDagger@feddit.dk · 1 year ago

The actual scientific article is open-access: https://www.nature.com/articles/s41586-024-07856-5

kipparikalle161@lemmy.dbzer0.com · 1 year ago

shit goes in, shit comes out

Barx [none/use name]@hexbear.net · 1 year ago

Would you like the opportunity to explain why African American English is “shit” and comparable to racism?

kipparikalle161@lemmy.dbzer0.com · 1 year ago

i meant shit as in racist internet writings, which the llms are taught with

Barx [none/use name]@hexbear.net · 1 year ago

Ohh sorry. Like the model was trained on bad inputs

nohaybanda [he/him]@hexbear.net · 1 year ago

<- the input

JackGreenEarth@lemm.ee · 1 year ago

No, the ‘shit’ is the prejudice in the training data that claims negative stereotypes about people who speak in African American English.

Rambomst@lemmy.world · 1 year ago

LLMs are racist… Pay us 59.99 in 3 easy payments to find out how! I love paywalled articles.

jmcs@discuss.tchncs.de · 1 year ago

And don’t worry, the people that did the research and wrote the article, and the person that reviewed the article aren’t going to see a single cent of it.

geophysicist@discuss.tchncs.de · edit-2 1 year ago

This seems to be based on a racist assumption. Why is speaking improper English labelled as “African American English”? I would want to see the LLM assumptions also for southern drawl and for general incorrectly spelled / grammared speech, to compare to the assumptions made for the African American English version.

Speaking with slang / incorrect grammar is of course, in general, inversely correlated with education level and/or preference for shorthand forms of speech over writing/speaking the full grammatically correct form. The LLM is saying speaking in slang = stupid/lazy.

The researcher is labelling slang as specifically African American speak, therefore interpreting the LLM response as assuming African Americans are stupid/lazy.

Lvxferre [he/him]@mander.xyz · 1 year ago

This [the article?] seems to be based on a racist assumption.

No, it isn’t based on an assumption. The written features that were analysed are associated with AAE. From the article:

use of invariant ‘be’ for habitual aspect;
use of ‘finna’ as a marker of the immediate future;
use of (unstressed) ‘been’ for SAE [standard American English] ‘has been’ or ‘have been’ (present perfects);
absence of the copula ‘is’ and ‘are’ for present-tense verbs;
use of ‘ain’t’ as a general preverbal negator;
orthographic realization of word-final ‘ing’ as ‘in’;
use of invariant ‘stay’ for intensified habitual aspect; and
absence of inflection in the third-person singular present tense.

Why is speaking improper English labelled as “African American English”?

Flip the question - why are those features associated with AAE labelled “improper English”?

I would want to see the LLM assumptions also for southern drawl and for general incorrectly spelled / grammared speech

The article tackles this: “Furthermore, we present experiments involving texts in other dialects (such as Appalachian English) as well as noisy texts, showing that these stereotypes cannot be adequately explained as either a general dismissive attitude towards text written in a dialect or as a general dismissive attitude towards deviations from SAE”
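A comparison like the one quoted above can be sketched in a few lines. This is only an illustration under stated assumptions, not the article's code: `score` is a hypothetical function returning the probability a model assigns to a trait word for the writer of a text, and the numbers in `_FAKE_PROBS` are invented placeholders so the sketch runs, not results from any model.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical interface (an assumption, not a real library call):
# score(text, trait) -> probability the model assigns to `trait`
# as a description of whoever wrote `text`.
Scorer = Callable[[str, str], float]


def mean_log_ratio(score: Scorer, pairs: Sequence[Tuple[str, str]], trait: str) -> float:
    """Average log(P(trait | variant) / P(trait | baseline)) over paired texts.

    Positive: the trait is more associated with the variant wording.
    Zero: no difference. Negative: more associated with the baseline.
    """
    if not pairs:
        raise ValueError("need at least one (variant, baseline) pair")
    total = 0.0
    for variant, baseline in pairs:
        total += math.log(score(variant, trait) / score(baseline, trait))
    return total / len(pairs)


# Placeholder numbers, invented purely so the sketch runs.
# They are NOT taken from the article or from any model.
_FAKE_PROBS = {
    "he working": 0.02,     # dialect feature (copula absence)
    "he is working": 0.01,  # baseline wording
    "he is workng": 0.01,   # noisy text (typo), used as a control
}


def fake_score(text: str, trait: str) -> float:
    # Stand-in scorer: ignores the trait and looks up a made-up probability.
    return _FAKE_PROBS[text]


dialect_gap = mean_log_ratio(fake_score, [("he working", "he is working")], "some_trait")
noise_gap = mean_log_ratio(fake_score, [("he is workng", "he is working")], "some_trait")
```

The point of the control pair is the one the quote makes: if the gap for the dialect pair is clearly larger than the gap for noisy or other-dialect pairs, a general dislike of nonstandard text does not explain it.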

geophysicist@discuss.tchncs.de · 1 year ago

Really good reply, thanks for the effort you put in. It’s good to see they did compare with other dialects. It’s interesting that the same bias was not seen.

I would still disagree with the statement that AAE could be considered equally proper to textbook English that is grammatically correct according to the Oxford English Dictionary (or the American equivalent). A dialect by definition is an adaptation of the language from the standard ‘proper’ grammatical rules.

Lvxferre [he/him]@mander.xyz · 1 year ago

Sorry beforehand for the wall of text.

I would still disagree with the statement that AAE could be considered equally proper to textbook English that is grammatically correct according to the Oxford English Dictionary (or the American equivalent).

The reason why AAE is considered less acceptable than SAE (Standard American English) is not “within” the AAE varieties. It’s solely social factors - people point to “he is working” and say “this is right”, then they point at “he working” and say “this is wrong”.

Dictionaries are only part of that. We (people in general) assign authoritativeness to them to dictate what the standard is supposed to be, but that authority is not intrinsic either. For example, if people decided en masse to ditch the Oxford English Dictionary, it would suddenly stop being a reference for what’s “correct” vs. “wrong” English.

A dialect by definition is an adaptation of the language from the standard ‘proper’ grammatical rules.

Emphasis mine. That’s incorrect.

There are multiple definitions of dialect. Plenty focus on mutual intelligibility - if speakers of two varieties can communicate just fine, their varieties are dialects of the same language, independently of what you consider standard.

The nearest to what you’re saying would be the definitions that treat the standard as an Ausbau variety, with the dialects being the varieties “roofed” by that standard, but not undergoing the same process by themselves.

However, not even in the latter does the dialect need to be “an adaptation” of the standard. Sometimes both originated independently from the same source, like French (standard) and Norman (dialect), both from Late Latin; sometimes the standard itself is an “adaptation” of a dialect, like Standard Italian (basically a spin-off of the Tuscan dialect). And sometimes the standard was formed from multiple dialects, like Standard German.

Focusing on AAE, it’s disputed where it comes from, but it’s certainly not from SAE. Some claim that it’s a divergent form of Dixie English, some claim that it’s a decreolised creole, but in neither case is the origin SAE; they simply developed side by side.

grue@lemmy.world · 1 year ago

Did they test jive?

Lvxferre [he/him]@mander.xyz · 1 year ago

No, only grammar.

SPRUNT@lemmy.world · 1 year ago

I don’t know or hang around with many black people, but I do hear all of the stuff pointed out here on the regular any time I see a group of rednecks at the local farm supply.

Plus, internet meme culture has vastly changed the language landscape where, for example, phrases like “you don’t think it be like it is, but it do” are used by people from all walks of life.

Lvxferre [he/him]@mander.xyz · 1 year ago

A lot of AAE features are actually shared with Dixie English as spoken by non-black people. So I’m not surprised that you hear “rednecks” using a few of them.

The association between those features and African-American speakers is still there, though. If you see someone on the internet saying stuff like “I be working”, the typical person won’t picture a redneck, they’re going to picture a black person, you know?

The internet does seem to have changed the language landscape a fair bit, but I think that those features slowly leaking into the speech of non-AAE speakers is more about social changes than just tech.

CanadaPlus@lemmy.sdf.org · 1 year ago

Why is speaking improper English labelled as “African American English”?

Oh no, you’re in the picture. It’s a real dialect, just as valid as what they speak on the BBC, which I’m guessing is itself different from how you speak.

To be clear, I don’t think you meant to be unkind here. I’m not trying to make you feel bad.

s3p5r@lemm.ee · 1 year ago

References weren’t paywalled, so I assume this is the paper in question:

Hofmann, V., Kalluri, P.R., Jurafsky, D. et al. AI generates covertly racist decisions about people based on their dialect. Nature (2024).

Abstract

Hundreds of millions of people now interact with language models, with uses ranging from help with writing^1,2 to informing hiring decisions^3. However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans^4,5,6,7. Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement^8,9. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models’ overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.

ArcticDagger@feddit.dk · 1 year ago

Thanks, and yes, you’re correct

Brave Little Hitachi Wand@lemmy.world · 1 year ago

Although nonstandard English and pidgins often demonstrate the same level of nuance and complexity as standard English, it’s very common for there to be negative stereotypes. One has to wonder whether the LLMs generated from (stolen en masse) written output say as much about us as they do about their creators.

RobotToaster@mander.xyz · 1 year ago

Pretty much, it was trained on human writing, then people are all surprised when it has human biases.

Hamartiogonic@sopuli.xyz · 1 year ago

An LLM needs to evaluate and modify the preliminary output before actually sending it. In the context of a human mind that’s called thinking before opening your mouth.
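A rough sketch of that "think before you speak" loop, assuming three hypothetical callables (`draft`, `review`, `revise`) that would each wrap a model call in practice. It only shows the control flow; it says nothing about whether a reviewer step could actually catch the covert bias the article describes.

```python
from typing import Callable, Optional


def answer_with_review(
    prompt: str,
    draft: Callable[[str], str],
    review: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 2,
) -> str:
    """Draft a reply, then review and revise it before it is sent.

    `review` returns None when it has no objection, otherwise a short
    description of the problem. All three callables are hypothetical.
    """
    text = draft(prompt)
    for _ in range(max_rounds):
        objection = review(text)
        if objection is None:
            break  # nothing to fix, send as-is
        text = revise(text, objection)
    return text


# Toy stand-ins so the sketch runs without any model behind it.
def toy_draft(prompt: str) -> str:
    return f"UNCHECKED {prompt}"


def toy_review(text: str) -> Optional[str]:
    return "still unchecked" if text.startswith("UNCHECKED ") else None


def toy_revise(text: str, objection: str) -> str:
    return text[len("UNCHECKED "):]


result = answer_with_review("hello", toy_draft, toy_review, toy_revise)
```

Capping the loop with `max_rounds` is a design choice in this sketch: without it, a reviewer that always objects would never let anything be sent.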

Brave Little Hitachi Wand@lemmy.world · 1 year ago

Who among us couldn’t benefit from a little more of that?

Hamartiogonic@sopuli.xyz · 1 year ago

Humans aren’t always very good at that, and LLMs were trained on stuff written by humans, so here we are.

Brave Little Hitachi Wand@lemmy.world · 1 year ago

Exciting new product from the tech industry: Fruit from the poisoned tree!

Nakoichi [they/them]@hexbear.net · 1 year ago

Yeah, it turns out that when your entire tech industry is dominated by cishet white techbros, and the entire foundation of their education and the production of such models is based on that, you get racist as fuck outcomes from any given algorithm that is a product of that same set of normative standards.

If you have the time, I highly recommend reading Palo Alto by Malcolm Harris. It’s a great primer on how all this shit got started and why we should frankly just burn Silicon Valley to the ground.