Who owns a trained machine learning model and its output?

By Dérick Swart on 5 January 2022

In basic, non-technical terms, a machine learning ("ML") algorithm can be used to process (i.e. "train" on) sample data to derive from it a model that can recognise familiar patterns in new data. The ML model can then help to make decisions or predictions, or provide insights, that traditional algorithms would not be able to deliver.

This note explores the ownership of trained ML models and their output in South African law.

Approach and caveats

This is a fascinating legal topic and this note serves to unpack the issues at a high level on a first pass.  

Our intellectual property ("IP") legislation almost entirely predates the era of artificial intelligence. My intent here is not to get into the ongoing academic debates about the authorship and ownership of IP created by non-human authors.

Also, machine learning is a broad and rapidly expanding field of data science.  This note deals with a typical scenario, which will not fit all cases.  

The various forms of intellectual property typically involved and their default ownership

1. Conceptual approach

Identifying the problem to be solved and the manner in which it can likely be solved using data science could well have value and could constitute proprietary information. This could originate from approaches developed by a data scientist, or a customer posing a very specific thesis, or the two working in collaboration.

The main IP entitlement involved here would likely fall in the domain of know-how and/or trade secrets, which in South African law are governed for the most part by the common law rather than by statute.

It would be a factual question who originated the know-how or trade secrets in this regard. The creator(s) would have a proprietary claim to such IP, provided it is of commercial value and not in the public domain.

2. Machine learning algorithm 

Once an approach has been decided on, an ML algorithm has to be developed or deployed.

The Copyright Act of 1978 defines a "computer program" as "a set of instructions fixed or stored in any manner and which, when used directly or indirectly in a computer, directs its operation to bring about a result". 
 
This sounds like an algorithm to me and for purposes of this note, I am going to accept that an algorithm can accordingly constitute a work protectable at copyright law (although technically minded people will cringe, since the concepts of "computer program" and "algorithm" are vastly different in the real world).

Authorship of a "computer program" (and with it, by default, first ownership) vests in "the person who exercised control over the making of the computer program". The interpretation of this phrase has received considerable attention from our courts and an analysis of the current legal position falls outside the scope of this note, except to say that it is a substantive (and not only technical) test to establish whether the developer or customer exercised control, or both of them!

The Patents Act of 1978 provides that a "program for a computer" or "mathematical method" is not patentable.  Typically this is interpreted to include a prohibition on patenting "algorithms", except where they are part of achieving a so-called "technical result", but that also falls outside the scope of this note.  

Thus, I conclude for current purposes that the algorithm is likely a work protected in terms of copyright law and that its author is the person who exercised control over its making (the first owner can of course be different to the author, for instance when dealing with employees).

3. Training data

Training an ML model can be done in a supervised manner (the input and the expected output are both provided for correlation) or an unsupervised manner (only the input data is provided).
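For technically minded readers, the difference between the two training modes can be illustrated with a minimal Python sketch. It is not drawn from any particular ML library: the function names `fit_supervised` and `fit_unsupervised` and the sample figures are hypothetical and chosen purely for illustration. The supervised function receives both the inputs and the expected outputs, while the unsupervised function receives the inputs only and has to find the groupings itself.

```python
from statistics import mean


def fit_supervised(inputs, expected_outputs):
    """Learn a straight line y = slope * x + intercept from labelled pairs."""
    x_bar, y_bar = mean(inputs), mean(expected_outputs)
    sxx = sum((x - x_bar) ** 2 for x in inputs)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(inputs, expected_outputs))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_unsupervised(inputs, rounds=10):
    """Group one-dimensional inputs into two clusters; no expected output is given."""
    centres = [min(inputs), max(inputs)]  # hypothetical starting guess
    for _ in range(rounds):
        groups = ([], [])
        for x in inputs:
            nearest = 0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
            groups[nearest].append(x)
        # Move each centre to the average of the inputs assigned to it.
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return centres


# Supervised: inputs AND expected outputs are supplied.
slope, intercept = fit_supervised([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Unsupervised: only the inputs are supplied.
centres = fit_unsupervised([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
```

In both cases the person who compiled the sample figures and the person who wrote the functions may be different people, which is why the dataset and the algorithm are treated separately in this note.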

A compilation of data (i.e. a dataset) is a literary work in its own right in terms of the Copyright Act and as such generally protected against copying and adaptation. The author of such a dataset is "the person who first makes or creates the work".

The data contained in the dataset can itself have separate authorship and ownership (aside from the compilation). By default, in respect of text data, the author would be determined as set out above in relation to literary works, but in respect of photographs, it would be "the person who is responsible for the composition of the photograph".

4. Trained ML model

The result of the training process is a trained ML model, which has derived from the dataset patterns that can be recognised in new data. The trained ML model would typically not include recognisable copies of the original data. Rather, it would contain information at an aggregated level, such as correlations and covariances, from which reverse engineering to identify individual data points is highly improbable.
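A rough sketch of this point, under simplifying assumptions: the hypothetical `train` function below reduces a small dataset to four aggregate figures (two means, a variance and a covariance), and the hypothetical `predict` function uses only those figures on new data. This is a deliberately simple linear model chosen for illustration; other types of ML model store very different (and far more numerous) parameters, so the sketch should not be read as describing every case.

```python
from statistics import mean


def train(rows):
    """Reduce (x, y) rows to a handful of aggregate parameters."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    n = len(rows)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in rows) / (n - 1)
    # Only these aggregates are kept; the rows themselves are not stored.
    return {"mean_x": x_bar, "mean_y": y_bar, "var_x": var_x, "cov_xy": cov_xy}


def predict(model, x):
    """Apply the trained model to new data using the aggregates alone."""
    slope = model["cov_xy"] / model["var_x"]
    return model["mean_y"] + slope * (x - model["mean_x"])


training_rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # illustrative figures
model = train(training_rows)
prediction = predict(model, 10.0)
```

The `model` dictionary holds four numbers regardless of how many rows were used for training, which is the sense in which the trained model is something distinct from the dataset it was derived from.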

The trained ML model would, similar to the ML algorithm, probably constitute a "computer program" in terms of the Copyright Act.  

5. Output

Here things become a little trickier.  

Can a non-human qualify as an author (in respect of copyright) or inventor (in respect of patents)?  Since copyrights and patents are creatures of statute, there can be no entitlement to these forms of IP outside of the (dated) legislation.  

A detailed analysis of this topic falls outside the scope of this note, but I will make the following comments:

  • The Copyright Act provides that the author of a literary, dramatic, musical or artistic work or computer program which is computer-generated, means the person by whom the arrangements necessary for the creation of the work were undertaken.  This is at least helpful where a non-human author does not work fully autonomously.  
  • The South African patent office has recently accepted a patent application where the inventor was cited as an artificial intelligence machine. This does appear at odds with international precedent though.
  • It is entirely possible that the output from an ML model cannot qualify for statutory intellectual property and in that case, contractual language should govern the ownership and usage of the proprietary rights that may vest in the output.

Remarks

Use of training data

The use of training data would of course always have to be lawful, which essentially means it has to be used with the permission of the owner thereof.  For instance, a developer cannot hold on to training data that is owned by the customer to retrain its ML models, unless the developer has secured a licence to do so.  

Training data may include personal information as set out in the Protection of Personal Information Act. Where personal information is involved, the processing thereof would need to be lawful. Given the very broad definition of "processing" in the legislation, training an ML model on training data that contains personal information will likely fall within the ambit of the Act.

Importance of clear agreements

What is clear from the above is that the authorship and ownership of ML models involve a complex bundle of rights. Before you can attempt to draft simple and elegant language that addresses ownership and licensing of ML models in agreements, a proper understanding from both a technical and legal perspective is required.

The issues raised here can for the most part be governed between the parties to a contract. These terms must clearly set out ownership and rights of usage. This should include legally effective assignment and licensing language to the extent necessary.

It is important to know when we potentially move beyond the bounds of statutory intellectual property, because in that case effective contract language is key to regulating the rights and entitlements of parties to a contract.

Acknowledgement

My thanks go to Professor Kanshukan Rajaratnam (Director of the School for Data Science and Computational Thinking at Stellenbosch University) for his technical input on this blog post.












Please note that our blog posts are informal commentaries on developments in the law as at the time of publication and not legal advice. You should place no reliance on our blog posts; we look forward to discussing your particular matter with you.