Another copyright-based action against an AI company: Concord Music, Universal Music and others v. Anthropic PBC

Through its AI model Claude 2, Anthropic allegedly infringes the copyright in many songs (in their literary component). Hence the lawsuit filed by several music publishers (among the largest in the world, it would seem).

The Verge reports it today, 19 October (article by Emilia David), where you will also find a link to the complaint initiating the proceedings.

I reproduce below only the passages on how Claude 2's training and output work, and then on where the infringement is said to lie.

<<6. Anthropic is in the business of developing, operating, selling, and licensing AI technologies. Its primary product is a series of AI models referred to as “Claude.” Anthropic builds its AI models by scraping and ingesting massive amounts of text from the internet and potentially other sources, and then using that vast corpus to train its AI models and generate output based on this copied text. Included in the text that Anthropic copies to fuel its AI models are the lyrics to innumerable musical compositions for which Publishers own or control the copyrights, among countless other copyrighted works harvested from the internet. This copyrighted material is not free for the taking simply because it can be found on the internet. Anthropic has neither sought nor secured Publishers’ permission to use their valuable copyrighted works in this way. Just as Anthropic does not want its code taken without its authorization, neither do music publishers or any other copyright owners want their works to be exploited without permission.
Anthropic claims to be different from other AI businesses. It calls itself an AI “safety and research” company, and it claims that, by training its AI models using a so-called “constitution,” it ensures that those programs are more “helpful, honest, and harmless.” Yet, despite its purportedly principled approach, Anthropic infringes on copyrights without regard for the law or respect for the creative community whose contributions are the backbone of Anthropic’s infringing service.
As a result of Anthropic’s mass copying and ingestion of Publishers’ song lyrics, Anthropic’s AI models generate identical or nearly identical copies of those lyrics, in clear violation of Publishers’ copyrights. When a user prompts Anthropic’s Claude AI chatbot to provide the lyrics to songs such as “A Change Is Gonna Come,” “God Only Knows,” “What a Wonderful World,” “Gimme Shelter,” “American Pie,” “Sweet Home Alabama,” “Every Breath You Take,” “Life Is a Highway,” “Somewhere Only We Know,” “Halo,” “Moves Like Jagger,” “Uptown Funk,” or any other number of Publishers’ musical compositions, the chatbot will provide responses that contain all or significant portions of those lyrics>>.

<<11. By copying and exploiting Publishers’ lyrics in this manner—both as the input it uses to train its AI models and as the output those AI models generate—Anthropic directly infringes Publishers’ exclusive rights as copyright holders, including the rights of reproduction, preparation of derivative works, distribution, and public display. In addition, because Anthropic unlawfully enables, encourages, and profits from massive copyright infringement by its users, it is secondarily liable for the infringing acts of its users under well-established theories of contributory infringement and vicarious infringement. Moreover, Anthropic’s AI output often omits critical copyright management information regarding these works, in further violation of Publishers’ rights; in this respect, the composers of the song lyrics frequently do not get recognition for being the creators of the works that are being distributed. It is unfathomable for Anthropic to treat itself as exempt from the ethical and legal rules it purports to embrace>>

How AI training works:

<<54. Specifically, Anthropic “trains” its Claude AI models how to generate text by taking the following steps:
a.  First, Anthropic copies massive amounts of text from the internet and potentially other sources. Anthropic collects this material by “scraping” (or copying or downloading) the text directly from websites and other digital sources and onto Anthropic’s servers, using automated tools, such as bots and web crawlers, and/or by working from collections prepared by third parties, which in turn may have been harvested through web scraping. This vast collection of text forms the input, or “corpus,” upon which the Claude AI model is then trained.
b.   Second, as it deems fit, Anthropic “cleans” the copied text to remove material it perceives as inconsistent with its business model, whether technical or subjective in nature (such as deduplication or removal of offensive language), or for other reasons. In most instances, this “cleaning” process appears to entirely ignore copyright infringements embodied in the copied text.
c.   Third, Anthropic copies this massive corpus of previously copied text into computer memory and processes this data in multiple ways to train the Claude AI models, or establish the values of billions of parameters that form the model. That includes copying, dividing, and converting the collected text into units known as “tokens,” which are words or parts of words and punctuation, for storage. This process is referred to as “encoding” the text into tokens. For Claude, the average token is about 3.5 characters long.4
d.   Fourth, Anthropic processes the data further as it “finetunes” the Claude AI model and engages in additional “reinforcement learning,” based both on human feedback and AI feedback, all of which may require additional copying of the collected text.
55.   Once this input and training process is complete, Anthropic’s Claude AI models generate output consistent in structure and style with both the text in their training corpora and the reinforcement feedback. When given a prompt, Claude will formulate a response based on its model, which is a product of its pretraining on a large corpus of text and finetuning, including based on reinforcement learning from human feedback. According to Anthropic, “Claude is not a bare language model; it has already been fine-tuned to be a helpful assistant.”5 Claude works with text in the form of tokens during this processing, but the output is ordinary readable text>>.
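The “encoding” step described in the quoted paragraph 54(c) — splitting collected text into tokens, which for Claude average about 3.5 characters — can be illustrated with a minimal sketch. The functions `encode` and `avg_token_length` below are my own illustrative names; real LLM tokenizers use learned subword vocabularies, not this simple word/punctuation split.

```python
import re

def encode(text: str) -> list[str]:
    """Illustrative tokenizer: split text into words and punctuation
    marks, roughly the "tokens" the complaint describes. Real LLM
    tokenizers instead use learned subword (e.g. byte-pair) vocabularies."""
    return re.findall(r"\w+|[^\w\s]", text)

def avg_token_length(tokens: list[str]) -> float:
    """Average number of characters per token, the statistic the
    complaint cites (about 3.5 characters for Claude)."""
    return sum(len(t) for t in tokens) / len(tokens)

tokens = encode("Every Breath You Take")
print(tokens)                    # ['Every', 'Breath', 'You', 'Take']
print(avg_token_length(tokens))  # 4.5
```

Even this toy split lands near the 3.5-characters-per-token figure the complaint quotes, since English words plus punctuation average a few characters each.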


<<First, Anthropic engages in the wholesale copying of Publishers’ copyrighted lyrics as part of the initial data ingestion process to formulate the training data used to program its AI models.
Anthropic fuels its AI models with enormous collections of text harvested from the internet. But just because something may be available on the internet does not mean it is free for Anthropic to exploit to its own ends.
For instance, the text corpus upon which Anthropic trained its Claude AI models and upon which these models rely to generate text includes vast amounts of Publishers’ copyrighted lyrics, for which they own or control the exclusive rights.
Anthropic largely conceals the specific sources of the text it uses to train its AI models. Anthropic has stated only that “Claude models are trained on a proprietary mix of publicly available information from the Internet, datasets that we license from third party businesses, and data that our users affirmatively share or that crowd workers provide,” and that the text on which Claude 2 was trained continues through early 2023 and is 90 percent English-language.6 The reason that Anthropic refuses to disclose the materials it has used for training Claude is because it is aware that it is copying copyrighted materials without authorization from the copyright owners.
Anthropic’s limited disclosures make clear that it has relied heavily on datasets (e.g., the “Common Crawl” dataset) that include massive amounts of content from popular lyrics websites such as,, and, among other standard large text collections, to train its AI models.7
Moreover, the fact that Anthropic’s AI models respond to user prompts by generating identical or near-identical copies of Publishers’ copyrighted lyrics makes clear that Anthropic fed the models copies of those lyrics when developing the programs. Anthropic had to first copy these lyrics and process them through its AI models during training, in order for the models to subsequently disseminate copies of the lyrics as output.
Second, Anthropic creates additional unauthorized reproductions of Publishers’ copyrighted lyrics when it cleans, processes, trains with, and/or finetunes the data ingested into its AI models, including when it tokenizes the data. Notably, although Anthropic “cleans” the text it ingests to remove offensive language and filter out other materials that it wishes to exclude from its training corpus, Anthropic has not indicated that it takes any steps to remove copyrighted content.
By copying Publishers’ lyrics without authorization during this ingestion and training process, Anthropic violates Publishers’ copyrights in those works.
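The “cleaning” pass just described (deduplication, filtering of material the trainer wishes to exclude) can be sketched roughly as follows. The function `clean_corpus` and its blocklist mechanism are my own illustrative assumptions, not Anthropic's actual pipeline; the point mirrored from the complaint is that such a pass filters duplicates and unwanted language but performs no check for copyrighted content.

```python
def clean_corpus(documents: list[str], blocklist: set[str]) -> list[str]:
    """Illustrative corpus-cleaning pass: drop exact duplicates and
    documents containing blocked terms. Note that, as the complaint
    stresses, nothing here screens for copyrighted material."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for doc in documents:
        if doc in seen:          # deduplication
            continue
        seen.add(doc)
        if any(term in doc.lower() for term in blocklist):
            continue             # subjective filtering (e.g. offensive language)
        cleaned.append(doc)
    return cleaned

docs = ["lyric text", "lyric text", "some offensive text", "other text"]
print(clean_corpus(docs, {"offensive"}))  # ['lyric text', 'other text']
```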
Third, Anthropic’s AI models disseminate identical or near-identical copies of a wide range of Publishers’ copyrighted lyrics, in further violation of Publishers’ rights.
Upon accessing Anthropic’s Claude AI models through Anthropic’s commercially available API or via its public website, users can request and obtain through Claude verbatim or near-verbatim copies of lyrics for a wide variety of songs, including copyrighted lyrics owned and controlled by Publishers. These copies of lyrics are not only substantially but strikingly similar to the original copyrighted works>>

<<Claude’s output is likewise identical or substantially and strikingly similar to Publishers’ copyrighted lyrics for each of the compositions listed in Exhibit A. These works that have been infringed by Anthropic include timeless classics as well as today’s chart-topping hits, spanning a range of musical genres. And this represents just a small fraction of Anthropic’s infringement of Publishers’ works and the works of others, through both the input and output of its AI models.
Anthropic’s Claude is also capable of generating lyrics for new songs that incorporate the lyrics from existing copyrighted songs. In these cases, Claude’s output may include portions of one copyrighted work, alongside portions of other copyrighted works, in a manner that is entirely inconsistent and even inimical to how the songwriter intended them.
Moreover, Anthropic’s Claude also copies and distributes Publishers’ copyrighted lyrics even in instances when it is not asked to do so. Indeed, when Claude is prompted to write a song about a given topic—without any reference to a specific song title, artist, or songwriter—Claude will often respond by generating lyrics that it claims it wrote that, in fact, copy directly from portions of Publishers’ copyrighted lyrics>>.

<<In other words, Anthropic infringes Publishers’ copyrighted lyrics not only in response to specific requests for those lyrics. Rather, once Anthropic copies Publishers’ lyrics as input to train its AI models, those AI models then copy and distribute Publishers’ lyrics as output in response to a wide range of more generic queries related to songs and various other subject matter>>.

The Artificial Intelligence Act moves forward. Draft presented to COREPER ahead of the EU Council

The regulation on artificial intelligence continues on its path.

On 2 November 2022 the draft prepared for COREPER, ahead of the EU Council's examination, was circulated.

A link to the latest version (with the changes highlighted) is available, for instance, here, taken from , which also collects all the other provisional texts in useful chronological order.

This act should not be confused with the proposal for a regulation on civil liability for artificial intelligence adopted on 20 October 2020 by the EU Parliament, entitled <European Parliament resolution of 20 October 2020 with recommendations to the Commission on a civil liability regime for artificial intelligence (2020/2014(INL))>, no. P9_TA(2020)0276, A9-0178/2020.

Of this latter act I reproduce only the provision on liability for non-high-risk AI systems (it reverses the burden of proof compared with the usual reading of our Art. 2043 of the Italian Civil Code):

<< Article 8  Fault-based liability for other AI-systems

1. The operator of an AI-system that does not constitute a high-risk AI-system within the meaning of Article 3(c) and Article 4(2), and that as a result has not been listed in the Annex to this Regulation, shall be subject to fault-based liability for any harm or damage caused by a physical or virtual activity, device or process driven by the AI-system.

2. The operator shall not be liable if he or she can prove that the harm or damage was caused without his or her fault, relying on either of the following grounds:

a) the AI-system was activated without the operator's knowledge while all reasonable and necessary measures to avoid such activation outside of the operator's control were taken, or

b) due diligence was observed by performing all of the following actions: selecting a suitable AI-system for the task and skills involved, putting the AI-system duly into operation, monitoring the activities and maintaining operational reliability by regularly installing all available updates.

The operator shall not be able to escape liability by arguing that the harm or damage was caused by an autonomous activity, device or process driven by his or her AI-system. The operator shall not be liable if the harm or damage was caused by force majeure.

3. Where the harm or damage was caused by a third party that interfered with the AI-system by modifying its functioning or its effects, the operator shall nonetheless be liable for the payment of compensation if such third party is untraceable or impecunious.

4. Upon their request, the producer of an AI-system shall have the duty of cooperating with the operator or the affected person and of providing them with information, to the extent warranted by the significance of the claim, in order to allow for the identification of the liabilities>>.

The Council has now also adopted its common position: see the press release of 6 December 2022.

The United Kingdom too denies a patent for an invention generated by artificial intelligence (on the Thaler-DABUS case)

The UK Court of Appeal, 21 September 2021, case No: A3/2020/1851, Thaler v. Comptroller General of Patents, Trade Marks and Designs, upheld by majority the refusal of the patent application.

Dr Thaler's long procedural battle before many patent offices and courts around the world thus suffers another setback (see the one earlier this month in Virginia, USA, mentioned in my post of yesterday).

According to Lord Justices Arnold and Laing, DABUS is not an inventor (an inventor must be a human being), nor has Thaler (hereinafter: T.) indicated any title (derivative or otherwise) entitling him to be named as such himself.

Birss LJ agrees on the first point but not on the second: in his view, (i) T. indicated in good faith the person whom he believes to be the inventor, § 58, and (ii) as the builder of the machine he is entitled, by accession, I would say, to the rights over its output, that is, the patent exclusivity, § 82 (Arnold LJ denies that the accession doctrine can be invoked: § 130 ff.).

On point (ii) I take no position, except to say that applying accession to intangibles is admissible, albeit by analogy (it being unquestionably laid down by positive law only for res).

As to point (i), there is a plain error. Section 13(2)(a) of the Patents Act, where it says <<identifying the person or persons whom he believes to be the inventor or inventors>>, does indeed mean the indication of whoever the applicant believes to be the inventor, but always provided that this is a natural person.

The EU Commission on Artificial Intelligence (AI)

Two Commission documents on AI have recently been released.

1) the White Paper On Artificial Intelligence – A European approach to excellence and trust of 19 February 2020, COM(2020) 65 final.

It recalls other interesting documents:

– the Commission Communication <<Artificial Intelligence for Europe>> of 25 April 2018, COM(2018) 237 final;

– the documents produced by the High-Level Expert Group on Artificial Intelligence, above all: (i) the Ethics Guidelines for Trustworthy Artificial Intelligence (AI) of 8 April 2019 (see also its document on the definition of Artificial Intelligence, 8 April 2019), and (ii) the Policy and investment recommendations for trustworthy Artificial Intelligence of 26 June 2019.

2) the Technical Report of the European Commission Joint Research Centre (JRC) on the crucial issues of Robustness and Explainability of Artificial Intelligence, 2020, authors: HAMON Ronan, JUNKLEWITZ Henrik, SANCHEZ MARTIN Jose Ignacio.