this post was submitted on 30 Aug 2024
74 points (84.3% liked)

Science

General discussions about "science" itself

Be sure to also check out these other Fediverse science communities:

https://lemmy.ml/c/science

https://beehaw.org/c/science

founded 2 years ago
all 36 comments
[–] ArcticDagger@feddit.dk 35 points 2 months ago

The actual scientific article is open-access: https://www.nature.com/articles/s41586-024-07856-5

[–] Hux@lemmy.ml 33 points 2 months ago (2 children)

Crap, I left my $199 yearly subscription info inside my butler’s Lamborghini. Could your personal valet sky-write your login credentials for nature.com above my Tuscan estate? Specifically, above the Eastern alpaca pens—this Murano glass monocle of mine isn’t a bi-focal. Cheers.

[–] AVincentInSpace@pawb.social 3 points 2 months ago (1 children)
[–] Hux@lemmy.ml 3 points 2 months ago* (last edited 2 months ago)

Brilliant, ol’ sport! There’s a mallet and horse waiting for you at West Egg this weekend—I simply won’t take no for an answer.

[–] furzegulo@lemmy.dbzer0.com 21 points 2 months ago

shit goes in, shit comes out

[–] Rambomst@lemmy.world 20 points 2 months ago (1 children)

LLMs are racist... Pay us 59.99 in 3 easy payments to find out how! I love paywalled articles.

[–] jmcs@discuss.tchncs.de 11 points 2 months ago

And don't worry, the people that did the research and wrote the article, and the person that reviewed the article aren't going to see a single cent of it.

[–] geophysicist@discuss.tchncs.de 17 points 2 months ago* (last edited 2 months ago) (2 children)

This seems to be based on a racist assumption. Why is speaking improper English labelled as "African American English"? I would want to see the LLM assumptions also for Southern drawl and for general incorrectly spelled / grammared speech, to compare to the assumptions made for the African American English version.

Speaking with slang / incorrect grammar is of course, in general, inversely correlated with education level and/or preference for shorthand forms of speech over writing/speaking the full grammatically correct form. The LLM is saying speaking in slang = stupid/lazy.

The researcher is labelling slang as specifically African American speak, therefore interpreting the LLM response as assuming African Americans are stupid/lazy.

[–] lvxferre@mander.xyz 22 points 2 months ago (4 children)

This [the article?] seems to be based on a racist assumption.

No, it isn't based on an assumption. The written features that were analysed are associated with AAE. From the article:

  • use of invariant ‘be’ for habitual aspect;
  • use of ‘finna’ as a marker of the immediate future;
  • use of (unstressed) ‘been’ for SAE [standard American English] ‘has been’ or ‘have been’ (present perfects);
  • absence of the copula ‘is’ and ‘are’ for present-tense verbs;
  • use of ‘ain’t’ as a general preverbal negator;
  • orthographic realization of word-final ‘ing’ as ‘in’;
  • use of invariant ‘stay’ for intensified habitual aspect; and
  • absence of inflection in the third-person singular present tense.

Why is speaking improper English labelled as “African American English”?

Flip the question - why are those features associated with AAE labelled "improper English"?

I would want to see the LLM assumptions also for southern drawl and for general incorrectly spelled / grammared speech

The article tackles this: "Furthermore, we present experiments involving texts in other dialects (such as Appalachian English) as well as noisy texts, showing that these stereotypes cannot be adequately explained as either a general dismissive attitude towards text written in a dialect or as a general dismissive attitude towards deviations from SAE"

[–] geophysicist@discuss.tchncs.de 7 points 2 months ago* (last edited 2 months ago) (1 children)

Really good reply, thanks for the effort you put in. It's good to see they did compare with other dialects. It's interesting that the same bias was not seen.

I would still disagree with the statement that AAE could be considered equally proper to textbook, grammatically correct according to the Oxford English dictionary (or the American equivalent). A dialect by definition is an adaptation of the language from the standard 'proper' grammatical rules.

[–] lvxferre@mander.xyz 5 points 2 months ago

Sorry beforehand for the wall of text.

I would still disagree with the statement that AAE could be considered equally proper to textbook, grammatically correct according to the Oxford English dictionary (or the American equivalent).

The reason why AAE is considered less acceptable than SAE (Standard American English) is not "within" the AAE varieties. It's solely social factors - people point to "he is working" and say "this is right", then they point at "he working" and say "this is wrong".

Dictionaries are only part of that. We (people in general) assign authoritativeness to them to dictate what the standard is supposed to be, but that authority is not intrinsic either. For example, if people decided en masse to ditch the Oxford English Dictionary, it would suddenly stop being a reference for what's "correct" vs. "wrong" English.

A dialect by definition is an adaptation of the language from the standard ‘proper’ grammatical rules.

Emphasis mine. That's incorrect.

There are multiple definitions of dialect. Plenty focus on mutual intelligibility - if speakers of two varieties can communicate just fine, their varieties are dialects of the same language, independently of what you consider standard.

The nearest to what you're saying would be the ones referring to the standard as an Ausbau variety, with the dialects being the varieties "roofed" by that standard, but not undergoing the same process by themselves.

However, not even in the latter does the dialect need to be "an adaptation" of the standard. Sometimes both originated independently from the same source, like French (standard) and Norman (dialect), both from Late Latin; sometimes the standard itself is an "adaptation" of a dialect, like Standard Italian (basically a spin-off of the Tuscan dialect). And sometimes the standard was formed from multiple dialects, like Standard German.

Focusing on AAE, it's disputed where it comes from, but it's certainly not from SAE. Some claim that it's a divergent form of Dixie English, some claim that it's a decreolised creole, but in neither case is the origin SAE; they simply developed side by side.

[–] grue@lemmy.world 5 points 2 months ago (1 children)
[–] lvxferre@mander.xyz 1 points 2 months ago

No, only grammar.

[–] SARGE@startrek.website 5 points 2 months ago (4 children)

I always love when cough "educated" people (usually just what they like to say when they mean "not black") go on about how "black people don't speak proper English!" because certain vowels can be dropped here or there, grammar shifts, the works. Most of us have heard AAE (also maybe heard it called "Ebonics" if you're a little older) at one point or another, and likely don't have an issue understanding what anyone is saying. A few things that skew more metaphorical or slang words might slip by but you get the gist.

That's the point of language. Convey information. If the information is conveyed, then language has done its job. Yay language.

If anyone wants to continue saying "it's not PROPER English" well... I have bad news for you. Neither is any other modern form of English. So many words have been borrowed, or stolen, sentence structures have changed, entire words change meaning. And that's just in the last 100 years.

English is an amalgamation of many different root languages, and has so many borrowed words and phrases, along with nearly every other modern language, can any of them still be said to be "proper"?

When I think of the difference between "proper English" and "improper English" I'm reminded of My Fair Lady. "The rain in Spain stays mainly in the plain" Eliza vs Henry Higgins (or 'enry 'iggins if you're feeling improper)

[–] lvxferre@mander.xyz 3 points 2 months ago

I agree with most of what you said so I'll focus only on a specific point, OK?

That’s the point of language. Convey information. If the information is conveyed, then language has done its job. Yay language.

There's another point of language, besides conveying information, that is relevant here: identity. Different people speak different varieties because this allows them to express "this is who I am, I speak like my peers".

That's why AAE speakers use their varieties in the first place - it's a way to tell the world "I'm black, this is who I am, I identify myself with other black people". And also why those varieties get such a stigma - because in the USA, they want people to feel bad for being who they are, if they're black. (NB: I'm not from the USA, but this is so fucking obvious that even an external observer gets it.)

[–] barsquid@lemmy.world 1 points 2 months ago

Habitual be is one of the greatest gifts any language has ever received and we are all richer for it.

[–] geophysicist@discuss.tchncs.de -1 points 2 months ago (1 children)

Of course there is a proper English. As defined by standard grammatical rules of the English language. A dialect is a variation upon that. I am not saying "black people don't speak proper English". There are plenty of black people who speak proper English, the same as there are plenty of white people who speak proper English and plenty of white people who speak dialects. I am saying that any and all dialects are not formal English, by definition of what the grammatical rules of the language are.

[–] theilleists@lemmy.world 3 points 2 months ago (1 children)

What is the governing body of your alleged "proper English"?

[–] geophysicist@discuss.tchncs.de 1 points 2 months ago* (last edited 2 months ago) (1 children)

You do understand that English has grammatical rules, right? That's not up for debate. It's not controversial to say that standard English would be following the standard spellings and grammatical rules for the language. If you genuinely don't know what those are, then please buy a dictionary or hire an English teacher. You could also look up what the rules are for the English language exam required to become a citizen. These are taught in schools around the world. About the only way the rules vary is whether you are learning American English or British English. Each of those two has the backing of a nation state.

[–] theilleists@lemmy.world 1 points 2 months ago* (last edited 2 months ago) (1 children)

I have a degree in linguistics. The most important thing it taught me is that there is a widely believed fiction, almost like a religion, underlying prescriptivist grammar. For the sake of social advancement, if you have both the means and the talent, it's generally necessary to learn a list of arbitrary but extremely complicated prestige markers for your language, to earn the approval of the self-appointed priestly caste of grammarians, in order to rub shoulders with the rich and powerful. An overly complex shibboleth.

It's a mechanism to oppress the lower classes while maintaining the pretense of pure meritocracy, by declaring arbitrarily that the dialect which is already spoken and written in the homes of the upper class children is proper, and all other dialects are improper, then implying that the "failure" of lower class children to acquire the prestige markers is an intellectual shortcoming, rather than the absence of privilege.

Can you buy books and hire tutors to learn these prestige markers? Of course. Is there general agreement among members of this cult about what their own rules are? Sure. If you choose not to use them, is your English "improper"? Absolutely not. It's different but equal, as long as your meaning is clear. I would wager that more than 90% of people do not go even one day without saying or writing some example of "improper" English, which is nevertheless understood perfectly well by the recipient. Successful transmission of the message is the only true test of linguistic legitimacy. Everything else is performative.

By the way, while it doesn't change much about this more fundamental basis for my opinion that "standard English" is an offensive fiction, neither British nor American English actually has the backing of a nation state. This is in contrast to, for example, French, which does. According to this article on language regulators, "The English language has never had a formal regulator anywhere, outside of private productions such as the Oxford English Dictionary." Prompting my rhetorical question to you earlier: Who is the governing body? There is none.

[–] geophysicist@discuss.tchncs.de 1 points 2 months ago (1 children)

I would argue there is a distinction between the prestige markers you're talking about and the more general grammatical rules that are followed. You can use grammatically correct English without subscribing to the over complexity that is, yes, possible.

A dialect can work locally, but there is a reason why many who have travelled abroad find themselves deliberately softening their dialect and shifting closer to proper English / Queen's English / whatever we're calling it. It's because their dialect is hard to understand for people who are not used to it. My girlfriend always comments how my dialect comes out of the woodwork when I'm back home with family.

Saying "I be lit" or "gimme a pint guvna" just will not be understood outside of the area or cultural group that it is spoken in. Therefore for business and/or travel you need a more standard form of English that everyone can understand. That's not some overly complex cultural dance we play to keep the powerful powerful, that's just basic requirements of communication.

[–] theilleists@lemmy.world 1 points 2 months ago* (last edited 2 months ago)

"Should of" instead of "should have."

"Me and her went" instead of "she and I went."

"Flustrated" instead of "frustrated."

"To who" instead of "to whom."

"For all intensive purposes" instead of "for all intents and purposes."

"Aks" instead of "ask."

"Literally" to mean "figuratively."

"Shoe-in" instead of "shoo-in."

A semicolon instead of a colon.

Using a preposition at the end of a sentence.

Splitting infinitives.

Starting a sentence with a conjunction.

Each a simple "error" to remember. But there are thousands of them. None make an appreciable difference in understanding. None would ruin a business deal or a meeting except in terms of lost social standing for getting it "wrong." This category of errors is what I believe to be meant by "improper English." This is in contrast to "incomprehensible English."

As I said, successful transmission of the message is the only true test of linguistic legitimacy. You're absolutely right. People are instinctively aware of when their dialectal quirks are going to cause a problem communicating with outsiders, and they code switch. They simplify. Ironically, the less familiar the interlocutor is with English, the more "improper" a native speaker's English might become. "My name? John. Your name?" Yet in so doing, they become more comprehensible because they've dropped the complex cultural dance which they are so often required by the powerful to perform.

[–] corsicanguppy@lemmy.ca -2 points 2 months ago (1 children)

"educated" people (usually just what they like to say when they mean "not black")

Please don't be racist. Education level is unconnected to race.

Please don't call other people racist, unless slander is your thing.

[–] SARGE@startrek.website 5 points 2 months ago

The whole point of that was to highlight how racist people like to call someone "educated" when what they really mean is "wow you sound white".

It was mocking the exact kind of person you seem to believe I am.

[–] SPRUNT@lemmy.world 1 points 2 months ago (1 children)

I don't know or hang around with many black people, but I do hear all of the stuff pointed out here on the regular any time I see a group of rednecks at the local farm supply.

Plus, internet meme culture has vastly changed the language landscape where, for example, phrases like "you don't think it be like it is, but it do" are used by people from all walks of life.

[–] lvxferre@mander.xyz 2 points 2 months ago

A lot of AAE features are actually shared with Dixie English as spoken by non-black people. So I'm not surprised that you hear "rednecks" using a few of them.

The association between those features and African-American speakers is still there, though. If you see someone on the internet saying stuff like "I be working", the typical person won't picture a redneck, they're going to picture a black person, you know?

The internet does seem to have changed the language landscape a fair bit, but I think that those features slowly leaking into the speech of non-AAE speakers is more about social changes than just tech.

[–] CanadaPlus@lemmy.sdf.org 9 points 2 months ago* (last edited 2 months ago)

Why is speaking improper English labelled as “African American English”?

Oh no, you're in the picture. It's a real dialect, just as valid as what they speak on the BBC, which I'm guessing is itself different from how you speak.

To be clear, I don't think you meant to be unkind here. I'm not trying to make you feel bad.

[–] s3p5r@lemm.ee 11 points 2 months ago (1 children)

References weren't paywalled, so I assume this is the paper in question:

Hofmann, V., Kalluri, P.R., Jurafsky, D. et al. AI generates covertly racist decisions about people based on their dialect. Nature (2024).

Abstract

Hundreds of millions of people now interact with language models, with uses ranging from help with writing^1,2^ to informing hiring decisions^3^. However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans^4,5,6,7^. Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement^8,9^. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models’ overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.

[–] ArcticDagger@feddit.dk 2 points 2 months ago

Thanks, and yes, you're correct

[–] Gradually_Adjusting@lemmy.world 9 points 2 months ago (1 children)

Although nonstandard English and pidgins often demonstrate the same level of nuance and complexity as standard English, it's very common for there to be negative stereotypes. One has to wonder whether the LLMs generated from (stolen en masse) written output say as much about us as they do about their creators.

[–] RobotToaster@mander.xyz 11 points 2 months ago* (last edited 2 months ago) (1 children)

Pretty much, it was trained on human writing, then people are all surprised when it has human biases.

[–] Hamartiogonic@sopuli.xyz 2 points 2 months ago (1 children)

An LLM needs to evaluate and modify the preliminary output before actually sending it. In the context of a human mind that’s called thinking before opening your mouth.

[–] Gradually_Adjusting@lemmy.world 4 points 2 months ago (1 children)

Who among us couldn't benefit from a little more of that?

[–] Hamartiogonic@sopuli.xyz 1 points 2 months ago (1 children)

Humans aren’t always very good at that, and LLMs were trained on stuff written by humans, so here we are.

[–] Gradually_Adjusting@lemmy.world 2 points 2 months ago

Exciting new product from the tech industry: Fruit from the poisoned tree!