Big Data and Copyright (Part I)
Copyright intersects with Big Data in several respects. From the computer software used in data collection and processing, to the data sets (collections of data), to the outputs generated via Big Data technologies, this article explores how Big Data can benefit from copyright protection.
According to the Berne Convention, copyright protects literary and artistic works that must first fulfil the “originality” requirement. Depending on the jurisdiction, such works may also have to fulfil the requirement of “fixation” and/or “human intellectual creations”.
An original work, in contrast to copies, reproductions, plagiarism, or derivative works, is a work that is created by the author and reflects the author’s own intellectual creation.
Works, as the objects of copyright, are expressions of an author’s ideas and emotions. Intangibility of the object is the essential characteristic that distinguishes intellectual property rights from other property rights, and the object of copyright shares this characteristic. However, such intangible objects can usually be fixed in a tangible form.
Article 2.2 of the Berne Convention provides that “it shall, however, be a matter for legislation in the countries of the Union to prescribe that works in general or any specified categories of works shall not be protected unless they have been fixed in some material form.”
Let us take China as an example. Article 2 of the Regulations for the Implementation of the Copyright Law defines “works” as “intellectual creations with originality in the literary, artistic or scientific domain, insofar as they can be reproduced in a tangible form”, which sets out the requirements of both “originality” and “fixation”.
Further, the mainstream view of Chinese scholars is that only results of human intellectual activities can be called “creation”.
Software
The TRIPS Agreement recognizes computer software as “literary works” under the Berne Convention.
In China, the protection of software is regulated by the Copyright Law and by specific regulations such as the Regulations on Computer Software Protection. Protection applies to both computer programs and their related documentation, but does not extend to the ideas, processes, operating methods, mathematical concepts, etc. used in software development.
Software applied in data collection and processing can therefore receive copyright protection in China if it meets the aforementioned requirements.
Data sets
Databases in the Big Data context are typically unstructured and non-relational (NoSQL). Compared to traditional relational (SQL) databases, which store data in structured tabular form, NoSQL databases are generally table-less, highly flexible and usually operate at a larger scale.
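The structural difference described above can be illustrated with a minimal sketch. It is only an assumption-laden illustration: the table name, field names and sample values are hypothetical, and a plain Python list of dictionaries stands in for a document store rather than any real NoSQL engine.

```python
import sqlite3

# Relational (SQL): every row must fit a schema that someone designed in advance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, taken_at TEXT, celsius REAL)")
conn.execute(
    "INSERT INTO readings VALUES (?, ?, ?)",
    ("s-01", "2020-01-01T00:00", 21.5),
)
rows = conn.execute("SELECT sensor_id, celsius FROM readings").fetchall()

# Document-style (NoSQL-like): each record describes itself, and nothing
# forces two records to share the same fields. Hypothetical sample data.
documents = [
    {"sensor_id": "s-01", "celsius": 21.5},
    {"user": "anon-7", "clicks": [3, 8], "device": {"os": "android"}},
    {"text": "free-form comment", "lang": "en"},
]

# Collect the distinct field layouts that appear in the document collection.
distinct_layouts = {frozenset(doc) for doc in documents}

print(rows)
print(len(distinct_layouts))
```

In the relational case a person has chosen which columns exist and how they are arranged; in the document case records are simply accumulated as they arrive, each with its own layout.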
Structured and relational databases may meet the originality criterion for compilations, which requires originality in the selection or arrangement of their content, and thus trigger copyright protection.
NoSQL databases, given their nature, are rarely selected or arranged in a way that meets the threshold of originality.
It can be observed that the pursuit of “volume” and “variety” will inevitably deviate from the orientation set by “originality”. Databases that pursue data integrity will find it difficult to meet the originality requirement and thus to obtain copyright protection.
In addition, Big Data tends to rely on cloud computing and involves dynamic data sets, which are almost impossible to “fix in a tangible form”. Therefore, in jurisdictions like China, such data sets may fail the “fixation” test.
Applications of Big Data
As we dive deeper into this topic, we touch upon data-driven technologies such as text and data mining (TDM), machine learning and artificial intelligence (AI).
Big Data resources can usually produce visual outputs generated through data-driven technologies. These final products can be presented in a factual manner with raw data, or as more “creative” outputs through the further implementation of AI technologies.
So, will these final products qualify for copyright protection?
First, since these outputs are visualizations of data processing, they can be expressed in a material form. It is therefore reasonable to conclude that they will meet the “fixation” requirement.
Second, it appears that these outputs will possess originality, either as compilations (outcomes of the selection and arrangement of raw data according to an algorithm) or as more creative works (articles, poems, paintings, etc.).
That being said, legislators are more cautious about whether machine-generated content can be protected by copyright, as most jurisdictions require the creative process to involve at least some degree of human intervention.
In our next article, we will elaborate on copyright protection for the final products generated by the application of Big Data technologies and the copyright issues in the application of TDM.
Emma Qian
HFG Law&Intellectual Property