Summary of The Battle Over Books3 Could Change AI Forever

Summary The Battle Over Books3 Could Change AI Forever | WIRED www.wired.com

3,618 words - html page - View html page

One Line

Copyright activists are demanding the removal of Books3, an AI training data set widely used by major corporations, from the internet.

Slides

Slide Presentation (11 slides)

The Battle Over Books3 Could Change AI Forever

Source: www.wired.com - html - 3,618 words - view

Introduction

• Copyright activists are demanding the removal of Books3, an AI training data set widely used by major corporations, from the internet.

Books3 - A Controversial Generative AI Training Set

• Created by independent AI researcher Shawn Presser

• Includes around 196,000 books

• Used by big companies like Meta and Bloomberg

Copyright Activists' Efforts

• The Rights Alliance is working to remove Books3 from the internet

• Some progress has been made in taking it down from certain platforms

• The goal is to protect the rights of creators and their work

Balancing Rights and Access to Information

• The debate raises questions about the balance between creators' rights and access to information

• Copyright law exists to protect creators, but AI training on copyrighted material challenges this balance

• The future of AI and copyright is being shaped by this battle

Impact on Big Corporations vs. Smaller Companies and Researchers

• Cracking down on data sets like Books3 may benefit big corporations that have already used it

• It may hinder smaller companies and researchers from entering the field of generative AI

• The lack of clarity in the law benefits the biggest players

The Need for Transparency

• AI companies like Meta are becoming less transparent about their training sets

• Increased scrutiny on data sets makes it harder for creators to know when their work is being used

• Data transparency regulations may change the landscape of AI training

Opt-In Model for Generative AI Training

• There is a movement to shift generative AI training into an opt-in model

• Only work in the public domain or freely given would be included in data sets

• This would upend the current playing field and give more control to creators

Lawsuits and Legal Uncertainty

• Lawsuits have been filed against companies like Meta and OpenAI for copyright infringement

• Legal experts are uncertain about the outcome of these cases

• The pirated origins of data sets like Books3 may or may not be relevant to the issue of fair use

The Future of AI and Copyright

• The battle over Books3 is about what the balance between copyright and AI should look like

• Stephen King believes generative AI training on copyrighted material is inevitable

• Copyright lawyers and activists continue to fight for control and protection for creators

Conclusion

• The Battle Over Books3 has far-reaching implications for the future of AI and copyright

• It raises important questions about the rights of creators and access to information

• The outcome of this battle will shape the AI industry and who controls it

Key Points

Books3 is a controversial generative AI training set at the center of a copyright dispute.
The data set, created by independent AI researcher Shawn Presser, includes around 196,000 books and has been used by big companies like Meta and Bloomberg.
Copyright activists, such as The Rights Alliance, are working to remove Books3 from the internet and have made some progress in taking it down from certain platforms.
The debate surrounding Books3 raises questions about the balance between the rights of creators and the access to information in the age of AI.
Some argue that cracking down on data sets like Books3 may benefit big corporations and hinder smaller companies and researchers from entering the field of generative AI.

Summaries

18 word summary

61 word summary

Controversy surrounds the popular AI training set, Books3, as copyright activists advocate for its removal. Created by researcher Shawn Presser, it has been used by major companies like Meta and Bloomberg. Critics argue that using copyrighted material in AI training sets disregards artists' rights. The outcome will impact the AI industry's future and the balance between creators' rights and information access.

129 word summary

Books3, a popular AI training set, is stirring controversy as copyright activists push for its removal from the internet. Created by independent researcher Shawn Presser, it has been used by major companies like Meta and Bloomberg to train their language models. Critics argue that using copyrighted material in AI training sets disregards artists' rights. Presser built Books3 to approximate a data set used to train OpenAI's GPT-3, suspecting that OpenAI's set was sourced from a shadow library like Library Genesis. Books3, part of Eleuther's data set called The Pile, gained popularity but faced takedown notices from The Rights Alliance. The Authors Guild demands compensation and some writers have filed lawsuits for copyright infringement. The outcome of this battle will determine the future of the AI industry and the balance between creators' rights and access to information.

435 word summary

Books3, a popular generative AI training set, is causing controversy as copyright activists seek to remove it from the internet. Created by independent AI researcher Shawn Presser, Books3 has been utilized by major companies like Meta and Bloomberg to train their language models. Critics argue that using copyrighted material in AI training sets disregards the rights of artists.

Presser and his collaborators set out to reverse-engineer a data set similar to one used by OpenAI for its GPT-3 model, suspecting that OpenAI's set came from an online shadow library like Library Genesis. Presser scraped books from a shadow library called Bibliotik, using a script written by Aaron Swartz, and amassed a collection of 196,000 books, which he named Books3.

Books3 was released online as part of the nonprofit artificial intelligence collective Eleuther's larger data set called The Pile and quickly became popular for training AI models. However, The Rights Alliance, a Danish anti-piracy group, is determined to remove Books3 from the internet. They have filed takedown notices and are pursuing legal action against organizations hosting the data set. They have also contacted companies like Meta and Bloomberg, who have used Books3 to train their models.

The Authors Guild has organized an open letter demanding compensation for the use of copyrighted data sets by generative AI companies. Some writers have even filed lawsuits against companies like Meta for copyright infringement. However, legal experts are uncertain about the outcome of these cases and believe that companies may be able to argue fair use.

The controversy surrounding Books3 raises important questions about the balance between creators' rights and the collective right to access information in the age of AI. Some suggest that generative AI training should shift to an opt-in model, using only works in the public domain or freely given. Efforts are being made to persuade AI companies to respect artists' wishes and provide transparency about their training data sources.

The outcome of this battle could have significant implications for the AI industry, determining who controls the data sets used to train AI models and whether smaller companies and researchers have access to them. It also highlights the need for greater clarity in copyright law and regulations surrounding AI training materials.

Ultimately, the decision about whether generative AI training on copyrighted material is acceptable will shape the future of the industry. Some argue that it is inevitable and can benefit smaller companies and researchers, while others believe it disregards the rights of creators. The resolution of this battle will determine the direction of AI development and the balance between creativity and access to information.

475 word summary

The Battle Over Books3 Could Change AI Forever. Copyright activists are trying to remove a popular generative AI training set called Books3 from the internet. The set was created by independent AI researcher Shawn Presser and has been used by big companies like Meta and Bloomberg to train their language models. However, critics argue that using copyrighted material in AI training sets disregards the rights of artists and should not be allowed.

Presser and his team set out to recreate the GPT-3 model released by OpenAI in 2020. They reverse-engineered a data set similar to one used by OpenAI, suspecting that OpenAI's set came from an online shadow library like Library Genesis. Presser used a script written by Aaron Swartz to scrape books from a shadow library called Bibliotik, amassing a collection of 196,000 books. He named this corpus Books3.

Books3 was released online as part of the nonprofit artificial intelligence collective Eleuther's larger data set called The Pile. It became a popular training data set for AI models. However, the Danish anti-piracy group The Rights Alliance is determined to remove Books3 from the internet. They have filed takedown notices against organizations hosting the data set and are pursuing legal action to block sites that host it. They have also contacted companies like Meta and Bloomberg, which have trained their models using Books3.

The Authors Guild has organized an open letter to generative AI companies using copyrighted data sets, demanding compensation for the use of their writings. Many writers have signed the letter and some have filed lawsuits against companies like Meta for copyright infringement. However, legal experts are uncertain about the outcome of these cases and believe that companies may be able to argue fair use.

The controversy over Books3 raises questions about the balance between creators' rights and the collective right to access information in the age of AI. Some believe that generative AI training should shift to an opt-in model, where only works in the public domain or freely given are used in data sets. Efforts are being made to persuade AI companies to respect artists' wishes and provide transparency about their training data sources.

The outcome of this battle could have significant implications for the AI industry. It could determine who controls the data sets used to train AI models and whether smaller companies and researchers have access to them. The fight over Books3 also highlights the need for greater clarity in copyright law and regulations surrounding AI training materials.

In the end, the decision about whether generative AI training on copyrighted material is acceptable will shape the future of the industry. Some argue that it is inevitable and can benefit smaller companies and researchers, while others believe it disregards the rights of creators. The resolution of this battle will determine the direction of AI development and the balance between creativity and access to information.

Raw indexed text (22,983 chars / 3,618 words / 406 lines)

The Battle Over Books3 Could Change AI Forever | WIRED

Kate Knibbs

Culture

Sep 4, 2023 6:00 AM

The Battle Over Books3 Could Change AI Forever

Copyright activists are on a mission to wipe a popular generative AI training set from the internet. Success could alter the industry, and who controls it.

Photograph: Henrik Sorensen/Getty Images

After OpenAI released GPT-3 in July 2020, independent artificial intelligence researcher Shawn Presser and a few of his fellow machine-learning enthusiasts set a challenge for themselves: Could they recreate it? "We were like, 'OK, there's actually not that much standing in the way of us doing this ourselves,'" Presser says. So what if OpenAI had deep pockets and a head start?

That summer, they pored over papers about GPT-3, strategizing in marathon Discord chats about how to best approximate its training data sets. Presser honed in on the books they needed. Suspecting that one of OpenAI's data sets was sourced from an online shadow library like Library Genesis, which offers a vast repository of pirated text, he decided to reverse-engineer what he saw as a potentially similar corpus.

It was the right moment for Presser to dive into a new project. Unemployed, he struggled with making it to work on time. He'd get dressed, then fall asleep on the couch. Eventually, he'd get a narcolepsy diagnosis. At the time, he just felt frustrated. He wanted to contribute to society.

Books3 started as a passion project by a Midwestern guy going through a weird time. "I poured my soul into the work," he says. He saw it as aligned with the open source movement, a way to democratize access to the kind of data sets OpenAI was already using. Some of his collaborators went on to found the nonprofit artificial intelligence collective Eleuther, and Books3 was released as part of Eleuther's larger data set, The Pile. But Presser remains, at core, a bit player on the fringes of the generative AI boom.

Despite his obscurity, the data set Presser created is now at the center of a roiling controversy over the future of artificial intelligence. Books3 swiftly became a popular training data set, and not just among academic researchers and Eleuther; big companies, including

While Presser sees Books3 as a contribution to science, others view his data set in a far less flattering light, and see him as sincere but deeply misguided. For critics, Books3 isn't a boon to society. Instead, it's emblematic of everything wrong with generative AI, a glaring example of how both the rights and preferences of artists are disregarded and disrespected by the AI industry's main players, and something that straight-up shouldn't exist.

To that point, one small Danish anti-piracy group is on a mission to wipe Books3 from the internet. The Rights Alliance, which represents the interests of creative workers in Denmark, is taking a multifaceted approach to its quest to obliterate Presser's data set. And it is making a surprising amount of progress, especially considering it has only a handful of people working on the project from its Copenhagen headquarters.

After spending a week sifting through the data set ("tedious," says Rights Alliance head of content protection and enforcement Thomas Heldrup, the leader of the crusade), they discovered at least 150 works by authors they represented. Heldrup decided to file Digital Millennium Copyright Act (DMCA) takedown notices against the organizations hosting Books3, including The Eye. These efforts paid off. The Eye did, indeed, take the data set down, as did the research data-sharing site Academic Torrents. This did not permanently remove the data from the internet, of course. But it did make it harder to find.

(It also didn't necessarily change any minds within these organizations. Academic Torrents director Joseph Paul Cohen complied with the takedown notice, but he says he doesn't understand the intentions behind it. "The greatest authors have read the books that came before them, so it seems weird that we would expect an AI author to only have read openly licensed works," he says.)

Rights Alliance isn't stopping there. It also wants to block sites that host Books3 through the European court system. And in addition to pursuing the data set's distributors, Rights Alliance has companies that have already trained their language models using Books3 in its sights, and it has contacted both Meta and Bloomberg on the issue. While Meta has not responded, Heldrup says that Bloomberg did, and that the company told Rights Alliance it does not plan to train future versions of its BloombergGPT using Books3.

Meanwhile, in the US, the Authors Guild has organized an open letter to generative AI companies using copyrighted data sets like Books3. "It is only fair that you compensate us for using our writings, without which AI would be banal and extremely limited," the letter states. It's been signed by more than 10,000 writers, many of whom have works that are contained in Books3. The Guild is also discussing a licensed version of The Pile (which includes Books3) with Eleuther. "The goal is to ensure that going forward the AI companies only use licensed data sets," Authors Guild CEO Mary Rasenberger says via email.

Some of these writers are taking the matter into their own hands. In a high-profile lawsuit filed against Meta, comedian Sarah Silverman and other authors allege that the company infringed their copyrights by training its set of large language models on Books3. (Silverman and the writers are also suing OpenAI in a similar case.)

Matthew Butterick, himself a writer and programmer, is one of the lawyers representing Silverman and the other authors in both lawsuits. Along with his co-counsel Joseph Saveri, Butterick has become one of the go-to plaintiffs' lawyers in the nation on cases involving copyright and AI. He sees the widespread practice of training AI on copyrighted data as outrageous, and finds it infuriating that this behavior gets defended with claims that it's democratizing access to information. "Open source doesn't mean you took a bunch of people's shit and gave it away for free," he says. "That's theft."

Many legal experts WIRED has spoken with range from uncertain to skeptical that these court cases will succeed. Some believe that companies like Meta may be able to successfully invoke fair use, a doctrine allowing use of copyrighted materials without permission under certain circumstances, to argue that what they've done is aboveboard. (Several also said they believe that if Presser ever had legal action brought against him, he could also claim fair use.) It's unclear if courts would see the pirated origins of data sets like Books3 as relevant to the issue of fair use.

To draw a parallel, if Sarah Silverman were suing a human writer for infringing on the copyright for her memoir The Bedwetter (say, someone who wrote a suspiciously similar book called The Bedwetter, Too), how said writer had originally read her work might not factor into the verdict. Whether the defendant had purchased a signed copy or flagrantly shoplifted a dog-eared paperback wouldn't matter during arguments over whether The Bedwetter, Too was a derivative rip-off or a transformative parody. Butterick, for his part, thinks the provenance can factor in: "It speaks to the intentionality of your conduct."

Presser knew people would get upset about Books3. "We almost didn't release the data sets at all because of copyright concerns," he says. "We thought that there would possibly be some backlash."

Looking back, Presser admits that he could've considered the implications a bit more. ("Authors, I hear you.") But he still insists that releasing Books3 was the right thing to do. In his eyes, it leveled the playing field for smaller companies, researchers, and ordinary people who wanted to create large language models. He believes people who want to delete Books3 are unintentionally advocating for a generative AI landscape dominated solely by Big Tech-affiliated companies like OpenAI. "If you really want to knock Books3 offline, fine. Just go into it with eyes wide open. The world that you're choosing is one where only billion-dollar corporations are able to create these large language models," he says.

This is a view shared by many copyright lawyers. "If you're OpenAI or Meta, you have the resources to litigate this until the end of time," says Kieran McCarthy, a lawyer specializing in data-scraping issues. "A small organization is not going to have the resources to do that. So this lack of clarity in the law right now is benefiting the biggest players."

Butterick disagrees. "A lawsuit can stop them," he says. "If we prevail."

One thing everyone WIRED spoke with could agree upon? All this increased scrutiny on data sets has made AI's big players shy away from transparency. Meta is the prime example. It openly shared the data sets used to train the first version of its ChatGPT competitor Llama, including Books3. Now, it's tight-lipped about what is used for newer versions. "It behooves these companies to be opaque about their sources," McCarthy says. Knowing they're likely to face lawsuits if they fess up to using copyrighted material in their data training sets is a powerful deterrent. This, in turn, will make it harder for writers to know when their copyright is potentially infringed.

Right now, it's up to AI companies whether or not to disclose where their training sets come from. Without that information, it's next to impossible for people to prove that their data was used, let alone ask for it to be removed. While the European Parliament has passed a draft law of AI regulations that would require increased data transparency, those regulations are not yet in effect, and other regions lag far behind.

This fight cuts to the heart of the often vicious disagreements about what role AI should have in our world. Copyright law exists to balance the rights granted to creators with the collective right to access information, at least in theory. The battle over Books3 is about what this balance should look like in the age of AI.

Presser believes that if OpenAI has access to this kind of data set, the public deserves access to it too. From this perspective, attempts to crack down on Books3 may end up calcifying the industry, preventing smaller companies and researchers from entering without doing much to stop the current big players.

Pam Samuelson, a copyright lawyer who co-directs the Berkeley Center for Law and Technology, concurs that a crackdown might benefit big corporations that have already been using the data sets. "You can't do it retroactively," she says. She also thinks regulations may change the landscape of where big players congregate. Countries like Israel and Japan have already adopted lax stances on AI training materials, so tighter rules in the EU or US may promote what she calls "innovation arbitrage," where AI entrepreneurs flock to the nations friendlier to their ideas.

The heart of this fight boils down to whether we accept that generative AI training on copyrighted material is an inevitability. This is the stance Stephen King recently took after finding out that his work is in Books3. "Would I forbid the teaching (if that is the word) of my stories to computers? Not even if I could. I might as well be King Canute, forbidding the tide to come in. Or a Luddite trying to stop industrial progress by hammering a steam loom to pieces," he wrote.

Idealists who want to wrest back control for creators, like Butterick and Heldrup, aren't yet willing to give up the fight. There's a movement to make generative AI training shift into an opt-in model, where only work that is in the public domain or freely given goes into the data sets. "It doesn't have to just be about scraping data sets off the web without permission," emerging technology researcher Eryk Salvaggio says. If AI companies are pushed to scrap the work they've made on copyrighted materials and begin anew, it would certainly upend the current playing field. (Less certain? Whether it's remotely possible.)

In the meantime, there are already stopgap efforts to persuade generative AI groups to respect the wishes of people who wish to keep their work out of data sets. Spawning, a startup devoted to this type of tool, has a search engine called Have I Been Trained? that currently allows people to check if their visual work has been used in AI training data sets; it is planning to add support for video, audio, and text next year. It also offers an API that helps companies honor opt-outs. So far, StabilityAI is one of the major players to adopt it, although Spawning CEO Jordan Meyer is optimistic that companies like OpenAI and Meta might one day get on board. And Meyer recently made contact with another potential collaborator: Shawn Presser.

After everything, Presser does want to help creative types feel they have some control over where their work ends up. "I think it's totally reasonable for people to be able to say, 'Hey, don't use my stuff,'" he says. "That's like a basic sort of tenet of the internet."

Kate Knibbs is a senior writer at WIRED, covering culture. She was previously a writer at The Ringer and Gizmodo.

Topics

Books

artificial intelligence

OpenAI

ChatGPT

Intellectual Property

machine learning
