[This post was authored by Vishwa Patel, a student at Gujarat National Law University.]
Carl Malamud is an American technologist, archivist and a strong believer in the notion that ‘knowledge should be without any barriers’. Recently, the journal Nature published an article titled ‘The plan to mine the world’s research papers’, which sheds some light on Carl’s project (called ‘Data Depot’). The project intends to increase access to information which could otherwise be accessed only by paying huge sums of money. Essentially, Carl’s idea is to unlock the scientific literature which has been paywalled.
Carl and his team, consisting of Indian researchers, have collected the text and images of 73 million articles from different sources over the past year. This collection will be stored in a 576-terabyte (TB) storage facility at Jawaharlal Nehru University (“JNU”). Researchers will be able to extract particular text through software which shall help them in their ongoing research. However, at this juncture, it is important to note that researchers will not be able to download full scientific papers. The software will analyse the text in the database of scientific papers and mine only the information which appears to be relevant to the search query entered by the researcher.
Carl’s idea of Data Depot might be ‘good news’ for researchers for the reasons mentioned below –
- Data Depot shall make a researcher’s job easier, as it will enable him/her to extract highly relevant information on the subject of research. Going through every line of the available literature is not only tedious but also sometimes impossible because of the sheer volume of literature. The software shall solve this problem for the researcher.
- The huge cost of accessing research papers and articles which have been placed behind publishers’ paywalls shall be eliminated, subject to the availability of the JNU Data Depot.
After getting a brief idea of Carl’s project, readers may be able to draw similarities between it and Sci-Hub, a database which not only enables one to access research papers that are otherwise available only on payment, but also helps to ease one’s research process.
What Does JNU’s Data Depot Actually/Technically Do?
JNU’s Data Depot uses an automated analytical technique called ‘Text and Data Mining’ (hereinafter referred to as TDM). Neither the Indian Copyright Act, 1957 nor the U.S. Copyright Act defines the term TDM. However, the European Union’s Directive on Copyright in the Digital Single Market defines TDM as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”.
The EU Directive’s definition might sound a bit technical. To simplify, TDM is a technique through which a large body of text and data in digital format is analysed by a computer system in order to extract the required information.
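To make the idea concrete, the following is a minimal sketch of snippet-style mining, written in Python. It is not the Depot’s actual software, whose workings the Nature article does not describe. The function name `mine_snippets`, the `max_chars` cap and the toy corpus are all assumptions made purely for illustration.

```python
import re

def mine_snippets(corpus, query, max_chars=200):
    """Return (doc_id, snippet) pairs for sentences that mention the query.

    corpus: dict mapping a document id to its full text.
    max_chars: hypothetical cap on how much of a sentence is returned.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    for doc_id, text in corpus.items():
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                # Only a capped snippet is returned, never the full document.
                results.append((doc_id, sentence[:max_chars]))
    return results

# Toy corpus standing in for the stored articles (invented text).
corpus = {
    "paper-1": "Chlorophyll absorbs red light. The sample was stored at 4 C.",
    "paper-2": "We measured chlorophyll content in leaves. Results varied by season.",
}

snippets = mine_snippets(corpus, "chlorophyll", max_chars=40)
for doc_id, snippet in snippets:
    print(doc_id, "->", snippet)
```

In a sketch like this, the researcher never receives the underlying document, only the matching fragments. How large those fragments are allowed to be (here, the assumed `max_chars` value) is a design choice of the software.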
Legality Of The Depot
There are two different schools of thought: one which believes that the JNU Data Depot does not infringe the existing copyright laws of India, and the other which believes the contrary. The arguments of the former school can be found here, here and here, while the arguments of the school which believes that the Depot infringes the existing copyright regime can be found here. The author is of the view that the Depot can be characterized as infringing the existing copyright laws of the country. The article published in Nature nowhere specifies the size of the snippets that the TDM software will return to the researcher. If it returns a snippet consisting of just three lines of the original article, the portion taken might be considered non-substantial. But what if the snippet is so long that it displays a full page or two of the article? What if the snippet consists of a chart in which the publisher holds copyright? Certain authors, however, are of the view that the use of TDM to display snippets would be shielded by the ‘fair dealing’ provision contained in Section 52 of the Indian Copyright Act, 1957. According to them, the Depot is covered by the said provision because i) it is used for a ‘non-commercial’ purpose, and ii) it is for ‘private use’.
However, the question which arises at this point is whether applying the ‘fair dealing’ provision in this manner is unfair to the publishers of journals. Most journals sought to be included by Malamud in the Depot charge for access, which means that the ‘JNU Data Depot’ is indirectly allowing researchers to access these research papers/articles without any cost (the only difference being that the TDM software merely extracts and provides the required snippets to the researchers). The argument that the TDM software only shows snippets is flawed in my opinion, as the researcher still derives the same utility which he/she would have derived had he/she been given access to the complete paper/article.
Further, the article published in Nature states the following:
“Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in.”
The word ‘currently’ in this statement suggests that there is a real risk of the Depot being opened up for commercial purposes in the future. Additionally, the risk of a data breach always hovers over such storage of and access to data. Consider a situation where the database storing the research papers/articles is breached by an unauthorized person or organization and the data is leaked. Deciding the liability for the leak and for the resulting copyright infringement shall be a difficult task. Such a situation may cause major economic loss to the publishers of these journals.
A supplementary but relevant question is whether the articles/papers collected by Malamud and his team were accessed through paid journal subscriptions or other lawful means. The article in Nature is unclear on this point. If some of the articles were acquired not through paid subscriptions but through unlawful channels, then this would infringe the rights of the publishers.
From the above discussion it is evident that the project has a high potential to be termed as infringing the copyright of journal publishers and hence, it would not be a wise decision to proceed further with the project under the shield of the ‘fair dealing’ provision. It should also be noted that the project might face legal challenges other than infringement of intellectual property rights, such as breach of the contractual agreements between the publishers and the university, but such discussions are beyond the scope of this post. After analysing Carl’s project from these multiple perspectives, the author concludes that the project is violative of the existing copyright regime.