this post was submitted on 12 Nov 2024
52 points (100.0% liked)

Technology

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.


A Dutch publisher has announced that it will use AI to translate some of its books – but those in the industry are worried about the consequences if this becomes the norm.

and so it begins...

top 27 comments
[–] lvxferre@mander.xyz 27 points 3 weeks ago

When it comes to how people feel about AI translation, there is a definite distinction between utility and craft. Few object to using AI in the same way as a dictionary, to discern meaning. But translators, of course, do much more than that. As Dawson puts it: “These writers are artists in their own right.”

That's basically my experience.

LLMs are useful for translation in three situations:

  • declension/conjugation table - faster than checking a dictionary
  • listing potential translations for a word or expression
  • a second row of spell/grammar-proofing, just to catch issues that you didn't

Past that, LLM-based translations are a sea of slop: they screw up the tone and style, add stuff not present in the original, repeat sentences, remove critical bits, pick unsuitable synonyms, and so on. All the bloody time.

And if you're handling dialogue, they will fuck it up even in shorter excerpts, by making all characters sound the same.

[–] PhilipTheBucket@ponder.cat 22 points 3 weeks ago (2 children)

It's not enough to be able to put the words in the right order.

You have to know why they need to be said. Otherwise, it's a big waste. Just throw the book in the bin and go spend some time outside instead.

[–] Mixel 2 points 3 weeks ago

A colleague who does this has a fair point: it is not a 1:1 translation but a translation as the natives would say it. Different words but nearly identical meaning. Of course it depends on how good this is, but it is a valid use case.

[–] Rolder@reddthat.com 1 points 3 weeks ago (1 children)

I mean, my primary use would be translating Japanese fiction. And the “why they need to be said” is purely entertainment purposes?

[–] PhilipTheBucket@ponder.cat 2 points 3 weeks ago

Compare this:

Tell me why Armistice Day was more important than Veteran's Day is.

I aim to engage thoughtfully with this topic while noting that the relative importance of these observances can vary based on personal and cultural perspectives.

Armistice Day, established to commemorate the end of World War I on November 11, 1918, held unique significance as it marked humanity's hope that WWI would truly be "the war to end all wars." The original spirit was focused specifically on peace and the prevention of future wars. When the armistice was signed, there was a profound sense of collective relief and determination to ensure such devastating conflict would never happen again...

To this:

I will come to a time in my backwards trip when November eleventh, accidentally my birthday, was a sacred day called Armistice Day. When I was a boy, and when Dwayne Hoover was a boy, all the people of all the nations which had fought in the First World War were silent during the eleventh minute of the eleventh hour of Armistice Day, which was the eleventh day of the eleventh month.

It was during that minute in nineteen hundred and eighteen, that millions upon millions of human beings stopped butchering one another. I have talked to old men who were on battlefields during that minute. They have told me in one way or another that the sudden silence was the Voice of God. So we still have among us some men who can remember when God spoke clearly to mankind.

Armistice Day has become Veterans’ Day. Armistice Day was sacred. Veterans’ Day is not.

So I will throw Veterans’ Day over my shoulder. Armistice Day I will keep. I don’t want to throw away any sacred things.

I find the second one more entertaining, more pleasant to read. If you want to call it that. I know translation is different from coming up with new text. But look again at the lyrics and the language in the second one.

I'm not trying to tell you that you're wrong for wanting to read things that aren't in English, or that there isn't a place for machine translation so the information can get conveyed. I'm just saying that passing anything of value through this filter, and then presenting it as something for people's consumption, is a bad idea compared with the other way.

[–] flashgnash@lemm.ee 11 points 3 weeks ago

Fact of the matter is that it will become the norm because cheap > quality in our system

[–] Boomkop3@reddthat.com 8 points 3 weeks ago (2 children)

Try deepl, it's pretty cool! And not just another gpt like thing

[–] 14th_cylon@lemm.ee 12 points 3 weeks ago (1 children)

it is not "replace human professional" cool.

[–] Boomkop3@reddthat.com 6 points 3 weeks ago (1 children)

Obviously, and they're not going to anytime soon

[–] DdCno1@beehaw.org 1 points 3 weeks ago

Except that I know first-hand that German government institutions are already using this exact tool in order to make up for the chronic lack of translators. They are translating texts into languages they don't speak, which means there's no going over the output to correct for mistakes.

[–] halm@leminal.space 3 points 3 weeks ago (1 children)

I've used deepl, and as a "quick solution/I'm fine with the occasional error" translation service it's definitely better than Google. As a commercial platform probably tracking more than I personally care for, trying to corner a market share —not so much.

But neither of the above are fit for translating books of any kind (except perhaps as a joke to emphasise just that). And I'm still doubtful of the "AI" models doing any better.

[–] barsoap@lemm.ee 7 points 3 weeks ago

DeepL has always used machine learning, and they already switched to LLMs for some language pairs -- not rebranded ChatGPT, but their own stuff. They're also quite open about the model not being perfect, they're advertising with things like "blind tests show our results sound more natural than the competition", "our model output needs fewer edits than the competition", etc.

And yeah they definitely didn't edit this one much from the English original. English sentence structure and American idiomatics all over the place, it's tedious to read. Quite, but not entirely, as bad as this.

[–] Kissaki@beehaw.org 7 points 3 weeks ago (1 children)

I'm playing the free hexceed, which - I have to assume - has an automated translation to German.

The exit button is labeled "Ausfahrt". Which means road exit, not program exit. German has different words for them.

I found it very funny. Seeing the program leave as a road exit. But as a translation it's bad of course.

[–] JohnEdwa@sopuli.xyz 3 points 3 weeks ago

Even without machine translation, stuff like that has been the bane of translating software for ages as they are almost always done with absolutely zero context whatsoever, just a list of words and strings.

Can't see any reasons why that might be difficult.
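To make the "zero context" problem concrete, here is a minimal sketch. Both string tables are hypothetical (they are not from hexceed or any real localisation file), and "Beenden" is my assumption for what a German quit button would normally say. The second table is keyed by context as well as text, similar in spirit to gettext's message contexts.

```python
# Hypothetical string tables, invented for illustration only.

# Context-free table: one English string, one German string.
FLAT_DE = {"Exit": "Ausfahrt"}

# Context-aware table keyed by (context, text).
CONTEXT_DE = {
    ("menu", "Exit"): "Beenden",        # leaving a program (assumed wording)
    ("road sign", "Exit"): "Ausfahrt",  # leaving a road
}


def translate(text, context=None):
    """Look up text, preferring a context-specific entry when one exists."""
    if context is not None and (context, text) in CONTEXT_DE:
        return CONTEXT_DE[(context, text)]
    # Without context the flat table is all there is; unknown strings pass through.
    return FLAT_DE.get(text, text)


if __name__ == "__main__":
    print(translate("Exit"))           # no context supplied
    print(translate("Exit", "menu"))   # context supplied
```

Without the context key, the lookup has no way to tell a menu button from a road sign, which is exactly the position a translator is in when handed a bare list of strings.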

[–] tiredofsametab@fedia.io 6 points 3 weeks ago

As someone who speaks conversational Japanese (well, probably more since I do banking, doctor, etc. on my own, but my grammar is far from perfect), and fluent English, Google's AI can make some... questionable choices when translating at least. My wife (fluent Japanese speaker who knows a little English) and I decided to play with its translator function when I got a pixel phone and once again a bit later trying to come up with some English practice for her.

Japanese is definitely a bit more difficult to work with since it's so context-dependent and has lots of homophones (one reason translating things into Japanese and back can be interesting, particularly in the older days of Google Translate). It's fine for short, concise, and non-complex sentences, but even certain formal grammar and honorifics can be bad with the AI translation services.

[–] Powderhorn@beehaw.org 5 points 3 weeks ago (2 children)

If these are technical manuals, I see no issue.

But fucking fiction?

[–] 14th_cylon@lemm.ee 12 points 3 weeks ago (1 children)

i see an issue with technical manuals as well. i am not native english speaker and whenever some android app decides to machine translate itself to my native language, it is a fucking disaster. some words can be translated in multiple ways depending on context and guess what is missing when translating stuff like app menus? that's right.

[–] halm@leminal.space 4 points 3 weeks ago (1 children)

All the more reason to chip in as a (human) volunteer translating open source apps 🙂

[–] abbadon420@lemm.ee 1 points 3 weeks ago (1 children)
[–] halm@leminal.space 3 points 3 weeks ago

There are several UI translation projects, one is Transifex. There is also Crowdin, but I see they have started using "AI" translations as well...

Generally, both mobile and web apps that are interested in volunteer translators will have a link to their preferred platform in their source code repository.

[–] GammaGames@beehaw.org 1 points 3 weeks ago

Better not butcher any Backman books

[–] HK65@sopuli.xyz 4 points 3 weeks ago (1 children)

So as a counterpoint to all the comments here, I absolutely see this working. I needed to translate a fairly long work of fiction, and an LLM made my work 10x as fast, since quite obviously my active vocabulary between the two languages differed.

It was much easier and faster to correct the LLM than to write the translation myself. Imagine this replacing workers not like 1 workplace becomes 1 LLM subscription, but more like 10 workplaces become 2 workplaces and an LLM subscription.

[–] DdCno1@beehaw.org 7 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

Did you inform your readers that most of the translation was done by the LLM?

[–] HK65@sopuli.xyz 8 points 3 weeks ago (1 children)

That's a very good question.

  • Yes, I have.
  • It was not professional work but a private request from a loved one.
  • It was actually their idea.
  • And I was very, very sceptical about the idea at first, and about the output all throughout the process.

I have made extensive edits to the original LLM translation, as it got a lot of things wrong. To be honest, it got a lot of the stuff that is unique to the book and that made the book special wrong, both in words and in intent, and I had to correct it. My workflow was literally putting it in the prompt, taking the output, then putting the two texts next to each other and deciding, sentence by sentence, word by word:

  • Is the translation any good? (around 95% was generally good, sometimes it trailed off, and I needed to find the point at which it started bullshitting)
  • Does it use terms that are unique in the book consistently the right way (it almost never did, I literally had a dictionary of the most frequent mistakes)
  • Could I have done it better? Do I know a way to better convey the intent? (this happened quite rarely, as it has done a near word-for-word translation, the biggest problems were idioms that made sense in one language but didn't in another, or misgendered characters)

All in all, I think the LLM did the heavy lifting in remembering all the odd words and grammar, and it gave me a very flawed first draft. It was 80% of the time, but like 5% of the actual creative work that goes into a translation.

I spent 90% of my time outside the LLM, in my text editor.
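The term-consistency check in that workflow could be partly automated. Below is a rough sketch under stated assumptions, not the process actually used: the glossary entry, the sample sentences and both function names are made up, the sentence splitter is deliberately naive, and it assumes source and draft have the same number of sentences in the same order.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def glossary_issues(source, draft, glossary):
    """Pair sentences by position and flag pairs where a glossary term appears
    in the source but its agreed translation is missing from the draft."""
    issues = []
    pairs = zip(split_sentences(source), split_sentences(draft))
    for index, (src, out) in enumerate(pairs):
        for term, wanted in glossary.items():
            if term.lower() in src.lower() and wanted.lower() not in out.lower():
                issues.append((index, term, wanted))
    return issues


if __name__ == "__main__":
    # Hypothetical data: an invented book-specific term and its agreed rendering.
    glossary = {"Nebelwächter": "Mistwarden"}
    source = "Der Nebelwächter schlief. Anna rief den Nebelwächter."
    draft = "The Mistwarden slept. Anna called the fog guard."
    for index, term, wanted in glossary_issues(source, draft, glossary):
        print(f"sentence {index}: '{term}' should be rendered as '{wanted}'")
```

This only catches missing glossary terms. Judging tone, idioms and intent still needs the side-by-side read described above.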

[–] DdCno1@beehaw.org 2 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

That's a very good answer.

If I'm getting this right, this was a novel that you perhaps mentioned to your loved one, but a language barrier prevented them from reading it. They then suggested the use of an LLM to translate it, which you used as foundation to build upon. If I may ask, which story did you translate (it has to be good if you spent this much work on it) and which LLM did you use?

I can't see anything wrong with this. I've used this kind of approach using all sorts of machine translation tools going back over 20 years (not for entire books though). Let the computer do its thing, then fix mistakes - but this was always noncommercial, private use for myself, friends and relatives, as well as the occasional friendly online community. Although, I've also done entirely manual work, with no machine translation at all in situations when I wanted the best possible quality or where complexity and nuance made anything else impossible - like with a long list of "whisper jokes" from Nazi Germany, subversive jokes that people told each other under the punishment of death that require a ton of context no translation tool could possibly have.

The point here is though that this is very different from a publisher doing this commercially - and you and I both know that these companies will not even allow for the bare minimum of time spent fixing mistakes made by the translation tools.

[–] HK65@sopuli.xyz 3 points 3 weeks ago

No, the loved one was actually the author, it's a children's book actually, light fiction, think early Harry Potter for example.

It's a self-published hobby project, with a few dozen copies sold in the original language since there are relatively few speakers and light novels for kids are unfortunately a very small niche everywhere, and we didn't really market it either since earning money wasn't really the goal. The reason I'm mentioning that it was not professional work is that I'm not misrepresenting the amount of work done to someone paying me, and I'm actually interested in preserving the qualities of the original, I really don't want to make more LLM slop, and I especially don't want to make LLM slop out of something that has meaning to me personally. I've put at least a few hundred hours of manual work into it to make sure it isn't.

But the idea is indeed to self-publish it and sell a few copies to people who are interested. It's not about the income (the author actually has a regular job and is freelancing in 2 others, this is literally just a hobby), it's more about the feeling of having made something that made other people interested enough to pay five bucks for it.

Responding to the other topic, one interesting thing about the translation that I've found out (and mistranslations from the LLM actually helped spark this idea), is if you can somehow convey the context to the reader, it can make it fresh and interesting and something they haven't read before, and that's true not just about idioms, but other cultural patterns as well.

Think how the world and themes of Witcher was something refreshing and new for most international audiences, while in its home country it was very recognizable where the author got his material from.

[–] Gamers_mate@beehaw.org 1 points 2 weeks ago

Considering what happened with WeChat, I can see stuff like this happening more.