Babylon Newsletter, Issue #7

Issue #7 · April 28, 2026

THE NUMBER

79 billion

That is how many words of localised content Booking.com runs through machine translation every year: roughly 200 million words a day, across 45 languages and more than 100 content types. The figure comes from Mik Szajna, Booking.com's Head of Localisation, on DeepL's New Fluency podcast on April 21. Source: DeepL Blog.
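A quick back-of-the-envelope check on how the yearly and daily figures relate. This is only a sketch: it assumes a 365-day year and even daily volume, neither of which is stated in the podcast, and the variable names are my own.

```python
# Figures as quoted above; the 365-day year and even daily load are assumptions.
WORDS_PER_YEAR = 79_000_000_000
QUOTED_WORDS_PER_DAY = 200_000_000
DAYS_PER_YEAR = 365

# Daily volume implied by the yearly figure.
implied_per_day = WORDS_PER_YEAR / DAYS_PER_YEAR

# Yearly volume implied by the rounded daily figure.
implied_per_year = QUOTED_WORDS_PER_DAY * DAYS_PER_YEAR

print(f"Implied by 79 billion a year: {implied_per_day:,.0f} words a day")
print(f"Implied by 200 million a day: {implied_per_year:,} words a year")
```

The yearly figure works out to about 216 million words a day, and 200 million a day to 73 billion a year, so the daily number is best read as a rounded figure.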

Why care? At this volume, localisation stops being a content problem and becomes an infrastructure challenge. Having a human in the loop for every segment is no longer a realistic option. What the podcast does not fully discuss: how on earth do they catch the little problems, such as place names and regional spellings? That is the headache language people have at the moment.

The bigger picture: AI localisation is expanding into more languages. The really interesting part: being able to read in your own language has real value. We know because Booking.com ran a very large A/B test on exactly that, switching localisation off for a small share of users across all 45 languages for several weeks.

Szajna's summary: "Can't read, won't buy."

THIS WEEK

Story 1: Live speech translation: three vendors in fourteen days

Three vendors released live speech translation offerings in the last fortnight: Zoom Voice Translator (Slator, April 24), DeepL Voice-to-Voice (April 16, covered in Issue #4), and Google Meet's mobile rollout (April 2026).

What's new: All three platforms now offer real-time speech translation inside meetings, shipped within fourteen days of each other. None of them does it the same way.

Why care? Meetings will change because of this. The next adjustment will come when the AI is part of the discussion — translating back and forth, possibly summarising, possibly speaking. Worth watching how that feels in practice before assuming it works.

Reality check: None of the three has been independently benchmarked. DeepL claims it outperforms Google and ChatGPT. Zoom and Google emphasise reach. Treat all three claims as vendor-stated and check for yourself.

What we know, what we don't: We know the pace of releases is increasing. We do not know which of the new offerings actually works in language pairs that are not English-anchored — German to Polish, Italian to Turkish, Arabic to French. That is where the field will be decided. But benchmark data is scarce.

Story 2: Google opens its translation model — coverage claim and the catch

What's new: Google released TranslateGemma on April 21 — an open-weight translation model in three sizes (4B, 12B, 27B parameters), Apache-style licensing, available on Hugging Face, Kaggle, and Vertex AI.

Why care? This is the second time in six months that a major lab has opened its translation weights, after Alibaba's Qwen3.5 in February (Issue #1). The competitive question is no longer who has the best model. It is who can adapt one fastest for their own use case — and at what cost.

Reality check: The headline coverage claim is 55 languages at production quality plus another 500 at lower quality. Those are Google's own benchmarks; no independent check exists yet. The 55 is roughly where commercial APIs already sit. The 500 is where the interesting question lies, and where the model has the most to prove.

What we know, what we don't: We know the licensing is permissive, and the architecture is documented. We do not know what "lower quality" means in practice for the 500 languages — usable for gisting, usable for first-pass localisation, or only usable as training data for something better.

Story 3: Translators become data labellers: the $9.3B pivot

What's new: Slator analysts have sized the Data-for-AI market at $9.3B and named it the dominant growth opportunity for language services companies — the firms that used to sell translation to enterprises now sell translators' work into AI training pipelines. One notable deal: Diuna, a Polish language services company, acquired Alingua on April 15 — both mid-sized, the deal explicitly aimed at expanding AI data services in the US market. Sources: Slator analyst feature, April 14 and analyst report, April 24.

Why care? This is the structural flip side of the translator displacement story (Issue #2). Translators do not disappear, but they move from the front office to a new back office that feeds machines. This can be meaningful work, if quality is achieved. But my best guess is that many experienced practitioners will not like this shift at all. Management has a double task: keep the focus on language quality while making the job transition bearable. Difficult.

The shift is also visible on the buyer side. A small Slator reader poll from March 27 (n=25) shows that more than half of respondents now use two or more AI models in production. The trend matches the Crowdin enterprise survey of 152 B2B professionals from March (Issue #3): multi-provider strategies are now the norm, not the exception.

Chart: Slator poll, "How many different AI models are you / your company using for translation in production?" (March 27, 2026, n=25)

Reality check: Slator frames this as a growth opportunity. The frame is correct for company shareholders. For the translators inside those companies, the picture looks different: the same hours, the same expertise, but a different type of work.

What we know, what we don't: We know the market size, the acquisition pattern, and the strategic direction. We do not know how the labour share is being divided — whether translators producing data for AI training are being paid as translators, as labellers, or under repurposed existing contracts.

TALK OF THE WEEK

Miso.ai and the new shape of an information subscription

Miso.ai is a small San Francisco company, founded in 2017 by Lucky Gunasekara and Andy Hsieh out of Cornell Tech. They run AI answer engines on top of publisher archives, about 75 sites so far, including Macworld, Nursing Times, CIO.com, PCWorld, and Outside. They have raised modest venture funding to date; named investors include 500 Global, Susa Ventures, and O'Reilly AlphaTech Ventures, the venture arm of publisher O'Reilly Media. They were visible at the 2026 Perugia Journalism Festival, which is why many European readers are hearing about them now.

The mechanism: Miso pays publishers a royalty calculated per citation — when a source contributes to an answer, the source's share of that answer's value gets attributed and paid.
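To make the per-citation idea concrete, here is a minimal sketch of a pro-rata split. Miso has not published its actual formula, so everything here is an assumption for illustration: the function name split_royalty, the idea of a numeric contribution weight per source, and the simple proportional rule are all hypothetical.

```python
def split_royalty(answer_value: float, weights: dict[str, float]) -> dict[str, float]:
    """Split one answer's value across cited sources, pro rata by weight.

    Hypothetical illustration only: the weights and the proportional rule
    are assumptions, not Miso's published method.
    """
    if any(weight < 0 for weight in weights.values()):
        raise ValueError("contribution weights must not be negative")

    total = sum(weights.values())
    if total == 0:
        # No source contributed, so nothing is attributed.
        return {source: 0.0 for source in weights}

    return {source: answer_value * weight / total for source, weight in weights.items()}


# Example: an answer valued at 1.0 (any currency unit) that draws
# three parts from one publisher and one part from another.
payouts = split_royalty(1.0, {"publisher_a": 3, "publisher_b": 1})
print(payouts)
```

In this example publisher_a receives 0.75 and publisher_b 0.25. The hard part in practice is not the division but deciding what the weights are, which is exactly the piece that has not been independently verified.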

The mechanism is not the story. The business model is. The reference deployment is O'Reilly Media — which is also a Miso investor, worth keeping in mind. Miso's AI answer layer is embedded in O'Reilly's $49-a-month subscription, available to 2.5 million paying subscribers. O'Reilly publicly described the partnership as "really lucrative" at the Press Gazette Future of Media Technology conference in 2025.

If an AI-powered information pool answers professional questions reliably, the buyer is no longer a household choosing between three to five news subscriptions at €10 a month. The buyer is a professional or a company with a budget. That is a different market with different price elasticity. The pool competes with other pools, not with the New York Times.

The most interesting implication is at the other end. This setup might initiate a race to the top, a departure from the attention economy where outrageous and false content still makes money.

Future information pools are valuable in proportion to the quality of what is in them — exclusive, current, written and maintained by humans, not a copy of copies recycled across the AI universe. These are the first signals of how publishers might make money in the AI era. The model is not fully clear yet. The work is for risk-takers — teams combining writers and developers — willing to explore.

Note: All operational claims here come from Miso, O'Reilly, or Gunasekara's 2026 Perugia Journalism Festival talk. Independent verification of payout volumes and per-author shares does not yet exist publicly.

GOOD TO KNOW

INTERPOL is hiring leaders to run language services as an AI-led function

Two senior positions in Lyon, posted April 8 and April 9, application deadline April 28: Head of Global Language Services and Head of French Language Department. The mandate is to transition from full translation to AI-assisted revision across the four current working languages — Arabic, English, French, Spanish.

The four working languages describe how INTERPOL is currently set up. The operational reality is wider: 196 member countries, every category of serious crime, every language pairing the cases bring with them. The two job ads hint at the scale of multilingual work this implies. It will not be solved right away.

Spotlight: an open-source investigative agent for journalists

Tom Vaillant (Buried Signals) released Spotlight in late April 2026 — an open-source system that turns an AI agent into an investigative workflow: start with a lead, generate a reporting methodology, run investigator and fact-checker loops, keep evidence on the local machine. It runs on the reporter's laptop when the case is sensitive, and it is not tied to a specific model provider. spotlight.buriedsignals.com

A single reporter can now run an end-to-end investigative workflow that used to need a small team. That changes what local journalism can do.

The site itself is worth a look: what would once have been a rough open-source MVP release now looks like a polished launch from a design agency. The standard for what one person can ship is shifting.

Designing for trustworthiness in AI-based fact-checking — DW and EBU paper

Lalya Gaye (EBU), with Anna Schild and Eva Lopez (DW Research and Cooperation Projects), published a peer-reviewed paper in the Springer LNCS series on how to design AI fact-checking tools that fit existing journalistic workflows and support user trust by design — a useful framework if you build, procure, or evaluate verification tools. Disclosure: Anna and Eva are my colleagues at DW. Read the paper.

The demand-side shock — Shuwei Fang in The Economist

David Caswell, posting on LinkedIn, called this article an account of "the largest expansion in the market for information in almost 600 years." Shuwei Fang (Harvard Kennedy School / Shorenstein Center) argues the AI conversation has been almost entirely about supply — content scraped, journalism displaced, IP captured. The demand side is the missing half: AI is creating a new category of audience (machines asking on behalf of humans) and surfacing demand from people who could not previously articulate what they needed. The asset that matters in this market is not clicks or time spent. It is the demand signal itself, owned by whoever holds the user relationship. Pairs directly with this issue's Miso.ai piece. Read it in The Economist (paywall).

ON THE CALENDAR

Data Makers Fest · 4–6 May 2026 · Porto · datamakersfest.com
A practitioner festival for data and AI work: three days, around 1,000 attendees, focused on real implementations rather than vendor pitches.

EBU HORIZONS · 5–6 May 2026 · Geneva · ebu.ch
European public-service broadcasters on AI and distribution.

re:publica · 18–20 May 2026 · Berlin · re-publica.com
Europe's largest digital society festival. Useful for tracking where the wider digital culture conversation is heading.

BEFORE YOU LEAVE

Dual-format publishing, also called Generative Engine Optimization (GEO). What it means: content is no longer written for one audience. It is written for human readers and for AI systems that read on their behalf, such as answer engines, briefings, and agents. This is not as easy to organise as it sounds. What do you tell your journalists, or your in-house AI system, about which version to produce? Best guess: the human version will be shorter, so that readers do not skip it as TL;DR. The AI version will be longer, more detailed, and highly structured: not pleasant to read, but easy for data-hungry systems to parse and share.

ABOUT & DISCLOSURE

I am Mirko Lorenz. I work on language technology projects at Deutsche Welle in Germany. Three projects I am involved in as innovation manager — you will hear about all of them here:

plain X — media localisation platform (DW Innovation / Priberam). plainx.com
ChatEurope — AI chatbot network for 15 European news partners. chateurope.eu
MOSAIC — EU DIGITAL EUROPE-funded multilingual media infrastructure. mosaic-media.eu

I cover all three honestly — including when competitors do something better or when our approach has limits.

AI use: I use Claude (Anthropic) to research and edit this newsletter. Responsibility for stated facts, names, and links is entirely mine.

See you next week!

babylon-newsletter.com · Weekly updates

7,000 languages. AI works for 20.