Makes sense. AAVE is mostly a spoken thing, LLMs are mostly trained on the corpus of written text on the internet and in books. It's pretty rare for people to write in an AAVE style in those contexts.
They can't possibly encounter much of it in training material... Of course they're not going to like it.
I'm not from the USA, black, or a native English speaker, but thanks to my background in linguistics I can give you guys some further info.
AAE (African-American English), in a nutshell, is a group of English varieties used by some speakers from the USA and Canada. In a lot of respects they resemble geographical varieties, like the ones you'd see in plenty of other languages, but there's a key difference: it isn't used by people "of a certain region", but rather by people "of a certain race" (black people).
This is mostly, but not completely, spoken (hence the term AAVE: the "V" stands for "vernacular"); it also affects the way those people use the written language. So you often see AAE features in written English, like:
- Negative concord - for example, "I don't want to hear nothing about this shit, man."
- Habitual-be - for example, "They be talking about this everyday."
- bits of non-standard spelling, due to phonetic differences
- expressions and vocab used primarily by black people
What the article is saying is that LLMs are biased against those features. It's a rather strong bias, and it wasn't observed for the geographical variety used as a reference (Appalachian English). In other words: the LLM has been fed racist babble, and now it's regurgitating it.
I see, I imagine that's very different from most countries? People often speak their own local dialect; here, a northeasterner would informally speak a completely different Portuguese from someone from the south, regardless of race.
Yup, it's atypical even in the rest of the Americas. I think that the nearest equivalent in Portuguese would be the quilombola dialects, but even then it's way off - because those dialects are still geographically associated with their respective quilombos, not just with race.
Since they’re vernacular you’ll mostly hear them being spoken, they aren’t really written
AAVE is commonly "written" now because most writing is texts and social media comments. So even if they luck out and learn "proper" English, people are still going to type on their phones the same way they talk.
Even for white kids, most Gen Z slang is just taken from AAVE; when older people complain about not being able to read zoomer slang in texts or comments, what they're struggling with is heavily influenced by AAVE.
There's been bleed over for centuries, but with the Internet and social media it's merging faster, which is common for dialects of people that interact frequently.
Warning: I've edited the comment that you're replying to. I'm saying this for the sake of transparency, as you're clearly quoting the earlier version.
The key here is that AAVE is not written, but AAE is. That "V" is for vernacular, it excludes written English by definition.
Now, I'm not sure if those white kids are using AAE or simply borrowing things from AAE into their written English. I simply don't have data on that.
There’s been bleed over for centuries, but with the Internet and social media it’s merging faster, which is common for dialects of people that interact frequently
Varieties merging or splitting is rarely the result of just more contact between people; it's all about identity. If things are happening as you described them, it's simply that those white kids stopped seeing black people as "the others", to see them as "part of the same group as us".
For anyone that, like me, was confused what the hell is this language: https://en.wikipedia.org/wiki/African-American_English
Seems to be the proper name for the kind of language a stereotypical black character in a movie would use.
Can't say about real world, since I don't live in the USA.
I'd say this is exactly where the LLMs' problems with it come from. For most of us outside of the US, and even for a lot of people there, it's exactly that: a caricature of a lower-class black person. However, for many people it's a legit dialect of English they speak every day.
I don't live in America either, but I went on a cruise once and there were many Americans, including a black American couple who were very obviously urban. By which I mean, the wife wore high heels and a tight jeweled mini-skirt on a sea-kayaking excursion...clearly signalling that she hadn't spent much time outside of a city.
Anyway, I was shocked when they spoke exactly like The Jeffersons, with all the exaggerated whooping, non-stop vernacular, and stage-like mannerisms. It was so over-the-top that I honestly thought they were play acting, but after chatting with them for a while I realized that was just how they were. They were very nice people and clearly having a great time.
I was wondering where the V went...
Apparently African American Vernacular English (abbreviated AAVE, with each letter pronounced) is just one dialect, and there are a couple of others that fit under AAE? I never knew about any of those besides AAVE.
Seems to be proper name for the kind of language a stereotypical black character in a movie would use. Can’t say about real world, since I don’t live in the USA.
AAVE is the "relaxed" English you're talking about. And with the interconnectedness of the Internet, AAVE is kind of displacing the rest.
But honestly, from an etymological standpoint I think it makes sense to view AAVE as the base and then just have other flavors of it. From that link, they're trying to break it down into multiple distinct groups.
If I got this right, the main difference between AAE and AAVE is scope: AAVE is strictly the vernacular varieties, used in everyday informal settings, while AAE includes all those AAVE varieties plus African-American Standard English and a few regional varieties.
Not to be confused with African-American Vernacular English.
AAVE is what I'd say is more "the kind of language a stereotypical black character in a movie would use".
African-American Vernacular English[a] (AAVE)[b] is the variety of English natively spoken, particularly in urban communities, by most working- and middle-class African Americans and some Black Canadians.[4] Having its own unique grammatical, vocabulary and accent features, AAVE is employed by middle-class Black Americans as the more informal and casual end of a sociolinguistic continuum. However, in formal speaking contexts, speakers tend to switch to more standard English grammar and vocabulary, usually while retaining elements of the non-standard accent.[5][6] AAVE is widespread throughout the United States, but is not the native dialect of all African Americans, nor are all of its speakers African American.
Well, "not to be confused", but the same page says AAVE is just a dialect of AAE, so mostly not much of a difference, I think.
The difference here is mostly scope: AAE includes stuff like African-American Standard English (English as used by black people in more formal settings) and the written language, while AAVE refers only to the vernaculars.
Note that some don't even make this distinction, but I think that it's important.
So for those that didn't read the article, it basically explains how LLMs have a negative connotation about AAE. When asked to associate words with AAE written phrases, it used words like "aggressive". When given a normal English phrase and the same phrase but in AAE and then asked what jobs would suit this person, the LLM gave low income jobs for the AAE statement with broader options for the normal English one.
It's a serious problem because people that naturally write in AAE are most likely getting worse results. It stems mostly from old racist newspaper articles and similar things.
I bet it's honestly more from, like, 4chan and other modern online racist communities, where they would mock AAVE with racist caricatures. Agree with the rest, but if it's related to AAVE then I doubt the old newspapers were the source.
It's a serious problem because people that naturally write in AAE are most likely getting worse results
Person using LLM built on grammatical rules of the English language has subpar results when operating outside of those rules. More at 6.
Is this the new term for Ebonics? And is Ebonics offensive or inappropriate now?
Essentially, yes. Ebonics isn’t inherently offensive or inappropriate, as far as I can tell, but it has connotations that are not attached to AAE. Linguists avoid the term today, and modern uses of it tend to be derogatory.
Most LLMs support this. You just have to enable Jive mode.
Hey home, I can dig it.
African Americans have a weak bias against writing in African American English -> Colleges have a weak bias against accepting African Americans as graduate students -> Academic texts have a strong bias for text written by graduate students -> LLM training data has a bias for academic texts -> LLMs have a strong bias for writing like their training data.
The error occurs upstream a bit, don't point at the coders.
We should cancel this LLM guy
Yet another case of: garbage in - garbage out.
Word.
Token
What about YTVE though?