Who should read this?

Welcome. This is the very first issue of the Babylon Newsletter. If you are reading this, you are part of a small group I wanted to invite personally — because we are working together on a project, met at a conference, or share an interest in developing and using language technology.

Every Tuesday I will share what I observe or learn about AI language technology. Topics include AI solutions for transcription, translation, dubbing, voice synthesis, and subtitling. Plus: writing, content creation, publishing.

The insights will usually point to recently published articles or videos. Occasionally I will mention or link to older findings or research. Good information does not become obsolete just because it was published two weeks ago.

From time to time I will reflect on how these developments affect publication strategies — for media companies, institutions, and commercial organizations working with content. The idea is to make sense of it all and to identify best practices.

My background is in journalism and innovation, so the focus is less on technical depth and more on patterns: when a technology gets much better, what use cases emerge? Where is the field heading? I try to understand the market logic based on what I see and experience — and, of course, I want to hear your views in comments or emails. Good use of language technology should benefit humans. That is a bit idealistic, yes. But why give up good goals so early in the game?

The Number

7,170

This is the approximate number of living languages in the world today. Of those, around 40 percent are considered endangered. Africa alone has more than 1,400 distinct language varieties.

Yet AI language systems today focus almost entirely on English, Mandarin, Spanish, and a handful of other well-documented languages. The rest are massively underrepresented in training data.

The takeaway: Language tools are getting very good, very fast — but unevenly. Some languages and their speakers (and economies) will benefit enormously. For the rest, the risk is that AI does not close the gap but widens it. Work is needed to make the case for investing in training data and language models for underrepresented languages.

Bonus: If you click on the source link below, you’ll see an amazing map of where the more than 7,000 languages are spoken.

Source: Ethnologue

Interesting

Alibaba's Qwen3.5 Raises the Economic Bar for Multilingual AI

Alibaba released Qwen3.5 — an open-weight model, meaning the weights are publicly available for anyone to download and run, unlike proprietary systems where the model stays locked. Released under an Apache 2.0 license, it covers 201 languages and dialects (up from 82 in the previous generation) and runs at approximately 60 percent lower cost than its predecessor. Benchmark performance is claimed to be comparable to leading closed-source models. Smaller variants followed on February 24–25 and March 2.

Why this matters: This introduces a freely available model with broad language coverage at a fraction of the cost of proprietary APIs. For certain use cases, the rationale for paying premium prices narrows. For media organizations and language service providers evaluating their stack, 201 languages at low cost changes baseline assumptions. The competitive advantage is no longer who has the best model — it is who can integrate and adapt it most effectively for their specific context.

Caveat: Benchmark performance and production performance are not the same. Independent evaluation on real-world language tasks, especially for non-English and low-resource languages, is not yet available.

Source: Qwen Blog

Reinforcement Learning as a Way to Reduce AI Hallucinations

AI models make things up. They generate text that sounds confident but was never in the source. Apple published research on a method that goes beyond detecting whether a hallucination occurred — it locates exactly which words or phrases are wrong. The system is called RL4HS and uses reinforcement learning to train models to pinpoint incorrect spans of text. On standard benchmarks it outperformed previous detection approaches.

Why this matters: The better AI transcription and translation gets, the harder it becomes to spot the errors that remain. A hallucinated phrase in a translated document reads naturally — there is nothing visually wrong. A human reviewer might catch it — but cost pressure is already pushing organizations toward fully automated pipelines with no review step at all. This matters most in medical, legal, or statistical contexts, where a single wrong phrase can do real damage. RL4HS is research, not a deployed product. Whether it holds up outside the lab is still open.

Source: Apple Machine Learning Research (with a link to the scientific article)

Tool Spotlight

ElevenLabs Conversational AI: What a Polished Demo Actually Shows

Two weeks ago I attended a webinar by ElevenLabs where they demonstrated their Conversational AI platform. The scenario: a stranded airline passenger calling customer support to rebook a flight. The support representative was AI.

The demo was technically impressive. The system spoke with natural pacing, handled interruptions, adjusted emotional register, and retrieved live options from an airline database in real time. It felt like a real conversation — and like the conversation a stranded passenger would actually hope for: one with a path to a solution.

Why this matters: These systems are trained on curated scenarios with defined policy boundaries. The model, the training data, and the rules all have to work together for this approach to succeed. There is real potential to change how call centers work. But the gap between a controlled demo and a deployment handling real-world variability — accents, background noise, ambiguous requests, languages beyond English — remains substantial. The technology is real and will very likely keep improving quickly. The right question for any organization considering deployment is not "did the demo work?" It is: what happens in the difficult cases, in a second language, under poor audio conditions?

In the right hands, with a well-defined and customer-centric approach, this technology can genuinely improve the experience. But deployed badly — with overly restricted guardrails, no flexibility when the standard solution fails, no ability to book a hotel if no flight is available — it could create something even more hellish and Kafkaesque than the phone trees it replaces.

Event

Coming this Friday: AI News Chatbots in Practice (Webinar)

On March 20 (14:00–15:30 CET), ChatEurope is hosting a webinar with European media organizations already running AI-powered news assistants. Speakers from Aftonbladet, Ouest-France, and the EBU will share what actually happened when they deployed — hallucination rates, user experience problems, lessons learned. The framing statistic is worth noting: EBU research found that 45% of AI-generated news answers contain at least one significant issue. The gap between a working demo and a trustworthy production system is the real topic here.

I will be presenting as part of the ChatEurope team, alongside colleagues from AFP and Druid AI. Our role in the project is focused on evaluation.

Full disclosure: ChatEurope is one of the three projects I am involved in. See short disclaimer in the footer for more information.

Register here: Chat Europe Webinar (LinkedIn)

Notable links

  • Liquid content and how it might open new perspectives for publishers: Content usage is shifting — increasingly, AI delivers the answer directly, and many publishers are seeing a dramatic drop in clicks. One potential approach for the future is “liquid content”. Three articles below describe how this could work. Start with Florent Daudens’ take, then move on to Shuwei Fang. For background there is an older article about the currently most evolved real-world example: O’Reilly, which teamed up with an AI platform called Miso.

  • Calibrate Your AI Intuition

    Can you tell the difference between AI-generated and human-written text? The New York Times built a quiz where you pick which of two texts you prefer — then reveals whether it came from a human or an AI. Worth trying before you read the result. Note: I used a gift link so you can get past the paywall. I hope it works for you: The New York Times

  • Why Casey Newton Does Not Like Grammarly Using His Name: Grammarly added an AI mode promising to refine your writing in the style of a specific writer. Journalist Casey Newton found he was listed there, too — but he was never asked or contacted about the use of his name (and writing style). After considerable backlash, Grammarly removed the subscription feature this week. For anyone working with AI: the question of whose voice and style is being used, and who consented, is not going away. Simply grabbing other people’s work is not a way forward. The Guardian

🏆 Tool of the week

Simple Bench

What it does: A set of test questions designed to check whether humans with high-school knowledge outperform LLMs.

Learning: Simple Bench publishes a ranking based on these tests, although the human baseline so far comes from only a small number of people. So far, people outperform even the most advanced models.

How to use: You can answer the questions yourself, which is an interesting experience. As a tool and ranking, it is helpful when you want a quick assessment of AI models.

Link: https://simple-bench.com/

Let me know what you thought of this very first issue. Hit reply. Tell me what you are working on, what tools you are testing, what questions need answering. If you have colleagues or friends who work in language technology, please forward this issue to them.

Babylon Newsletter

About this newsletter & a short disclaimer:

I am Mirko Lorenz. I work as an innovation manager for Deutsche Welle. I am involved in three projects: plain X (media localization platform), ChatEurope (AI chatbot network for European news partners), and MOSAIC (EU-funded multilingual media infrastructure). I will cover them when there is something worth reporting.
