The digital locks to make AI botflies pay
Technology is responding to the era of content harvesting and AI companies will have to pay for their base materials.
With the internet's age of relative honour over content use coming to a close, and the rules of copyright and fair use in flux, placing our content under digital lock and key is now the order of the day, to stop mosquito-like parasitic bots sucking up AI training data for free.
We've written recently about the potential future of content marketplaces, and no sooner do we discover that work on the technical protocols to make such a thing possible is under way in one place than we find it is under way in several others. The need for protection is being met with solutions, regardless of what is occurring in the legal sphere.
A couple of new examples of this effort have sailed into our field of view in the past few days. The Really Simple Licensing (RSL) standard is a very promising project that already has outfits such as Reddit, Quora and Yahoo onboard.
"The RSL Standard gives publishers and platforms a clear, scalable way to set licensing terms in the AI era," said Steve Huffman, CEO of Reddit in an RSL press release. "The RSL Collective offers a path to do it together. Reddit supports both as important steps toward protecting the open web and the communities that make it thrive."
One of the key personnel behind RSL is Eckart Walther, co-creator of the RSS syndication standard. If adoption is a true indicator of success, he clearly has a solid track record.
RSL is designed to be added to the robots.txt protocol, with machine-readable licensing and royalty terms essentially setting out the terms and conditions for the website content it sits in front of.
The organisation itself does a better job of explaining it briefly, so here you go:
RSL is an open, XML-based document format for defining machine-readable licensing terms for digital assets, including websites, web pages, books, videos, images, and proprietary datasets. It enables publishers, authors, and application developers to:
Define licensing and compensation terms, including free, pay-per-crawl, and pay-per-inference, to use digital assets for AI training, web search, and other applications
Create public, standardized catalogs and licensing terms for digital assets
Enable clients to automate licensing and paying for legal access to digital assets
Define and implement standardised licensing and royalty agreements
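To make "machine-readable licensing terms" a little more concrete, here is a minimal sketch of how a crawler might read such terms before touching a page. The element and attribute names (`licensing`, `content`, `license`, `use`, `payment`, `amount`) are our own illustrative assumptions and are not taken from the RSL specification, so treat this as a picture of the idea rather than the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical licensing document. The element and attribute names are
# illustrative assumptions, not copied from the RSL specification.
SAMPLE_TERMS = """
<licensing>
  <content url="https://example.com/articles/">
    <license use="ai-training" payment="pay-per-crawl" amount="0.01" currency="USD"/>
    <license use="search" payment="free"/>
  </content>
</licensing>
"""


def load_terms(xml_text):
    """Map (content URL, intended use) to the payment terms declared for it."""
    root = ET.fromstring(xml_text.strip())
    terms = {}
    for content in root.findall("content"):
        url = content.get("url")
        for lic in content.findall("license"):
            terms[(url, lic.get("use"))] = {
                "payment": lic.get("payment"),
                # A missing amount is treated as zero, i.e. nothing to pay.
                "amount": float(lic.get("amount", "0")),
                "currency": lic.get("currency"),
            }
    return terms


terms = load_terms(SAMPLE_TERMS)
training = terms[("https://example.com/articles/", "ai-training")]
print(training["payment"], training["amount"], training["currency"])
```

The point of the sketch is that the same file can say different things to different visitors: free for search, paid for AI training.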
Essentially, RSL puts a solid door, with the terms of access clearly spelled out on it, between content and crawler. Importantly, with RSL alone that door can still be breached by bad actors. However, the RSL team is working with Fastly, the content delivery network specialist, to create a technical lock on that door. The lock would admit AI bots only if they had agreed to the licensing conditions in the RSL file.
Arguably, if a standard such as RSL saw proper widespread adoption, any such breach would become a matter of collective concern for everyone using it. As Doug Leeds, former CEO of IAC Publishing and Ask.com, who now sits on the RSL Publisher Advisory Council, told The Verge: "All participants in the collective rights organisation participate in the enforcement of any infringement."
Apes together strong, indeed. Perhaps the lock won't be needed, or am I being naive?
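The door-and-lock idea can be sketched in a few lines. This is not how Fastly or RSL implement it; the function name `admit`, the bot list and the token set are all hypothetical, and a real gatekeeper would need something sturdier than a user-agent string, which is trivially spoofed.

```python
def admit(user_agent, token, known_ai_bots, valid_tokens):
    """Decide whether a request gets through the door.

    Hypothetical rule: ordinary visitors pass untouched, while anything
    identifying as a known AI bot must present a token showing it has
    accepted the licensing terms.
    """
    agent = user_agent.lower()
    is_ai_bot = any(name.lower() in agent for name in known_ai_bots)
    if not is_ai_bot:
        return True
    return token in valid_tokens


# Illustrative values only: the bot name and token are made up.
BOTS = ["ExampleAIBot"]
TOKENS = {"licence-abc123"}

print(admit("Mozilla/5.0 (Windows NT 10.0)", None, BOTS, TOKENS))
print(admit("ExampleAIBot/1.0", None, BOTS, TOKENS))
print(admit("ExampleAIBot/1.0", "licence-abc123", BOTS, TOKENS))
```

In this toy version a human reader gets in, an unlicensed bot is turned away, and a bot holding a valid token is admitted.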
The other side needs to buy in, of course. By the other side, we mean the cash-engorged strip miners who've been busy "transforming" the sweat of our publishing brows into outrageous promises that investors are still, for the most part, finding attractive.
As Matthew Prince of Cloudflare pointed out recently, you only need someone big to accept they must pay for content, and then everyone remotely reputable will follow suit.
Another interesting solution in the stop-stealing-our-content movement comes from ProRata. The most interesting part of their offering is providing content attribution for AI-produced answers. That is to say, the sources from which the answer is drawn can be identified. They have built a proof-of-concept answer engine of their own, https://chat.gist.ai/, and are working with a number of larger publishers in order to bring their technology to bear and help restore some order to the pirates' playground enjoyed currently by AI bots.
Bit by bit, solutions to the situation we have found ourselves in are becoming known and becoming usable. Make no mistake, this is simply a restoration of the natural order of things, whereby ownership and creative rights mean something again and judgement is not based on some Will-o'-the-Wisp promises about AGI, and never mind what gets wrecked on the illusive path to it.
Reddit Pro levels up: new tools and smarter reading
Reddit just rolled out a fresh batch of free tools for publishers via Reddit Pro, letting them spy on how their articles perform, sync RSS feeds, and get some AI hints on where to grow their stories in the sprawling Reddit jungle. After a sneaky test with names such as The Atlantic and NBC News, Reddit is now opening the beta doors to more publishers. It hasn't forgotten about ordinary users either, smoothing the news-reading ride and removing awkward toggling between reading and commenting. And as a bonus, subreddit member counts are gone, replaced by a sleek seven-day visitor tally that shows who's really dropping by.
No more Mr. Nice Publisher
AI bots are taking everything they can, and publishers are done playing nice. A new study from ImmuniWeb shows that media and academic institutions are leading the charge in blocking AI crawlers, with 83% of leading newspapers and 74% of academic journals closing the doors. There are now more than 250 lawsuits pending in the US alone, in which tech giants are accused of scraping, copyright infringement, and using pirated data. The options are clear: either deploy bot-blocking defences or prepare for a long and expensive date with the legal system.
Judge puts Anthropic's settlement on ice
Judge William Alsup has stayed Anthropic's record-breaking $1.5 billion piracy settlement with writers, criticising it for leaving key details unresolved, such as the exact list of authors and works involved, how claims would be handled, and how class action members would be notified. The case has even led one writer to recommend that the US invoke the Defense Production Act to end such copyright lawfare. Judge Alsup has pushed back and demanded clearer terms and protections to prevent future lawsuits before giving it the green light. Poorly defined settlements make for bad law.
PR call or robocall?
Press Gazette continues its digging into dubious sources and mystery citations, and looks at the claims of PR firms turning to the AI well for their stories.
Slicing up the AI saucisson
Some French publishers are handing journalists a cut of AI licensing cash thanks to "neighbouring rights" laws. Think of it as the government telling publishers "sharing is caring". Big names such as Le Monde are now splitting up to 25% of their AI earnings with newsroom unions. NiemanLab asks if such arrangements could work elsewhere, such as in the US. We won't hold our breath on that one. Still, it's illustrative of the developing fight over the AI pie.
News Corp and AI: no rushing, just results
News Corp Australia has been cautious in its adoption of AI in the newsroom. A specially tasked ethics board has established six key pillars that any and all use of AI must adhere to in order to be viable for the business. We particularly like "You hit publish, the AI does not."
Meta's VR meltdown
Whistleblowers have yet again pulled back the curtain on Meta, this time exposing how the company allegedly muzzled its own safety research, particularly into whether its VR and AR playgrounds might be harming kids. Insiders say that Meta hit the brakes on youth safety studies, scrubbing data, silencing questions, and rerouting research. The end goal was to avoid leaving a paper trail that might lead to lawsuits or accountability. American senators are not amused, and with the Kids Online Safety Act approaching in the US, Meta is reaching for the same old excuses: selective leaks, false narratives, and the good old "we did nothing wrong".
Guess less, doubt more
OpenAI wants to shed some light on why GPT-5 still confidently makes things up, and says it's partly because the model was trained on polished language without ever being told what's true and what isn't. But the real problem, according to the company, is the tests. The current evaluations reward lucky guesses but don't penalise confident nonsense. OpenAI now wants to change the rules: punish bold wrong answers and reward honest uncertainty.
Warner Bros. takes aim at Midjourney
Warner Bros. Discovery just dropped a copyright bomb on Midjourney, which allowed its users to whip up everyone's favourite Batman, Scooby Doo, and Bugs Bunny without so much as a licensing nod. Following in Universal's and Disney's footsteps, Warner Bros. is accusing Midjourney of ignoring copyright laws, while Midjourney is choosing to ignore it all and continue churning out AI art like nothing's happening, for now at least. However, with every new copyright battle, the AI playground is getting less and less fun.
The news traffic drought
Google's AI Overviews are quietly cutting into the traffic of news sites, with some publishers seeing click-throughs plunge by up to 89%. It turns out that users are content with letting AI Overviews do all the heavy lifting, skipping the click and leaving publishers to scramble for revenue. In a predictable "uh-oh" moment, media executives are warning that Google's new AI Mode could crank this even further, creating a full-blown crisis. Publishers are gearing up with lawsuits as well as fresh tactics - think better content and chasing audiences on other platforms - while Google insists that it still sends billions of clicks their way and claims their engagement is off the charts.