The problem of the legitimacy of using training data to develop artificial intelligence

The U.S. District Court for the Northern District of California, 11 May 2023, Case 4:22-cv-06823-JST, Doe 1 et al. v. GitHub et al., rules (for now) on the suit brought by owners of software uploaded to the GitHub platform (owned by Microsoft) against GitHub itself and against OpenAI for unlawful use of their software (in breach of statutes and of contractual clauses).

It is not hard to predict that this kind of dispute will become increasingly frequent.

The facts:

<<In June 2021, GitHub and OpenAI released Copilot, an AI-based program that can “assist software coders by providing or filling in blocks of code using AI.” Id. ¶ 8. In August 2021, OpenAI released Codex, an AI-based program “which converts natural language into code and is integrated into Copilot.” Id. ¶ 9. Codex is integrated into Copilot: “GitHub Copilot uses the OpenAI Codex to suggest code and entire functions in real-time, right from your editor.” Id. ¶ 47 (quoting GitHub website). GitHub users pay $10 per month or $100 per year for access to Copilot. Id. ¶ 8.
Codex and Copilot employ machine learning, “a subset of AI in which the behavior of the program is derived from studying a corpus of material called training data.” Id. ¶ 2. Using this data, “through a complex probabilistic process, [these programs] predict what the most likely solution to a given prompt a user would input is.” Id. ¶ 79. Codex and Copilot were trained on “billions of lines” of publicly available code, including code from public GitHub repositories. Id. ¶¶ 82-83.
Despite the fact that much of the code in public GitHub repositories is subject to open-source licenses which restrict its use, id. ¶ 20, Codex and Copilot “were not programmed to treat attribution, copyright notices, and license terms as legally essential,” id. ¶ 80. Copilot reproduces licensed code used in training data as output with missing or incorrect attribution, copyright notices, and license terms. Id. ¶¶ 56, 71, 74, 87-89. This violates the open-source licenses of “tens of thousands—possibly millions—of software developers.” Id. ¶ 140. Plaintiffs additionally allege that Defendants improperly used Plaintiffs’ “sensitive personal data” by incorporating the data into Copilot and therefore selling and exposing it to third parties. Id. ¶¶ 225-39>>.

Many violations are alleged, which is what makes the case interesting. Some claims, however, have been dismissed for the time being because the allegations were insufficiently specific, though with leave to amend.

The case continues: we shall see.

(news and link to the decision from Kieran McCarthy on Eric Goldman's blog)