In November 2022, programmer-lawyer Matthew Butterick and the legal team at the Joseph Saveri Law Firm filed a proposed class-action lawsuit over GitHub Copilot, an AI tool that generates code inside a programmer's code editor (The Verge). Copilot is powered by OpenAI technology and trained on publicly available code hosted on GitHub, a code hosting platform.
The lawsuit targets Microsoft, GitHub, and OpenAI, accusing the companies of relying on software piracy. Both Microsoft and GitHub have defended themselves by claiming that the lawsuit has two defects: lack of injury and lack of an otherwise viable claim. Similarly, OpenAI says that the plaintiffs "fail to plead violation of cognizable legal rights." These companies argue that the plaintiffs base their claims on hypothetical events and do not specifically describe the harm inflicted upon them.
Some observers say this case is the first class-action lawsuit to challenge the training and output of AI systems, and that it will not be the last. Computer scientist Tim Davis has accused GitHub Copilot of copying his code: he posted side-by-side screenshots showing that Copilot's output matched his own publicly posted code nearly verbatim.
In response, some have questioned whether other developers may already have copied and republished his code independently of Copilot, arguing that the problem is not exclusive to Copilot but applies to AI trained on open-source software in general. Others argue that US copyright law does not protect ideas or concepts, so a copyright claim over code protects only its specific expression, not the creator's underlying ideas (The Verge).
Beyond the case against Microsoft, GitHub, and OpenAI, there has been much debate over the legality of generative AI systems as a whole. Much of the debate centers on how these systems are trained and whether that training amounts to copyright infringement. Most of these systems work by "identifying and replicating patterns in data," meaning they generate new content from huge amounts of material gathered from the web across a variety of domains, whether text, code, or imagery (The Verge).
In deciding whether a use qualifies as fair use, several factors must be weighed, including whether the software changes the nature of the material in some way (known as "transformative" use) and whether the output threatens to compete with the original content. Because these factors balance differently in different scenarios, it is hard to say whether all AI generators are guilty of copyright infringement.
Despite the legal challenges, Microsoft plans to invest billions of dollars to continue its partnership with OpenAI. It is also rumored that the company will soon bring AI technology to its other products, including Word, PowerPoint, and Outlook. The outcome of the lawsuit, however, may hinder these plans.
The legal troubles have also spurred development of AI resources designed specifically to avoid accusations of copyright infringement. For example, "The Stack," a dataset from the BigCode project, includes only permissively licensed code and gives developers an easy way to have their data removed upon request. A model like this could be adopted throughout the industry in the future (BigCode).
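The filtering approach described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not BigCode's actual pipeline): it keeps only files under an assumed set of permissive licenses and drops files from repositories whose owners have opted out. All names and the license list are illustrative assumptions.

```python
# Hypothetical sketch of license- and opt-out-based dataset filtering,
# loosely inspired by The Stack's described approach. Illustrative only.

# Assumed set of permissive licenses; a real pipeline would use a
# curated, much longer list.
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-3-clause"}

def filter_corpus(files, opted_out_repos):
    """Keep only permissively licensed files from repos that have not opted out.

    `files` is a list of dicts with "repo", "license", and "path" keys;
    `opted_out_repos` is a set of repo names whose owners requested removal.
    """
    kept = []
    for f in files:
        # Exclude files without a recognized permissive license.
        if f["license"].lower() not in PERMISSIVE_LICENSES:
            continue
        # Honor removal requests from developers.
        if f["repo"] in opted_out_repos:
            continue
        kept.append(f)
    return kept

# Example: only the MIT-licensed file from a non-opted-out repo survives.
corpus = [
    {"repo": "alice/utils", "license": "MIT", "path": "util.py"},
    {"repo": "bob/tool", "license": "GPL-3.0", "path": "tool.py"},
    {"repo": "carol/lib", "license": "Apache-2.0", "path": "lib.py"},
]
kept = filter_corpus(corpus, opted_out_repos={"carol/lib"})
```

The key design point is that exclusion happens at dataset-construction time, before any model training, which is what makes honoring removal requests straightforward.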
As AI continues to evolve, the copyright lawsuit continues to develop as well. Although it is hard to predict how the case will affect the further development of AI resources, it could set a precedent for the entire field of generative AI.
https://www.bigcode-project.org/docs/about/the-stack/
https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data