Machine Learning Funds and Investment Malpractice

Posted:

9 March 2018

Author(s):

Marcos López de Prado

As a consequence of recent advances in pattern recognition, big data and supercomputing, machine learning (ML) can today accomplish tasks that until recently only expert humans could perform. An area of particular interest is the management of investments, for several reasons. Some of the most successful hedge funds in history happen to be algorithmic or process-driven. One key advantage of process-driven portfolio managers is that their decisions are objective and can be improved over time. A second advantage is that processes can be automated, leading to substantial economies of scale. A third advantage is that process-driven investing addresses the all-important concern of conflicts of interest, which are so pervasive among financial institutions.

The next wave of financial automation involves not only following rules but, more importantly, making judgment calls (see this book for numerous examples). As emotional beings, subject to fears, hopes and agendas, humans are not particularly good at making fact-based decisions, especially when those decisions involve conflicts of interest. In these situations, investors are better served when a machine makes the calls, based on facts learned from hard data. This applies not only to investment strategy development, but also to virtually every area of financial advice: granting a loan, rating a bond, classifying a company, recruiting talent, predicting earnings, forecasting inflation, etc. Furthermore, machines will comply with the law, always, when programmed to do so. Customers' interests are protected by the same process across the entire clientele. If a dubious decision is made, investors can go back to the logs and understand exactly what happened. From a legal standpoint, it is much easier to improve an algorithmic investment process than one relying entirely on humans.

At the same time, algorithmic investments present their own set of legal challenges. In 'The 10 Reasons Most Machine Learning Funds Fail', I identify some of the main reasons why most ML-based investments fail. Investors should be aware of the specific issues surrounding ML-driven investments, so that they can make informed decisions and hold investment managers accountable. In particular, ML investment managers may be liable for gross negligence or malpractice when they engage in practices that are known to be scientifically wrong or unethical.

For example, one of the most pervasive mistakes in financial research is to take some data, run it through an ML algorithm, backtest the predictions, and repeat the sequence until a nice-looking backtest shows up. Academic journals are filled with such pseudo-discoveries, and even large hedge funds constantly fall into this trap. It does not matter if the backtest is a walk-forward, out-of-sample test. The fact that we are repeating a test over and over on the same data will likely lead to a false discovery. This methodological error is so notorious among statisticians that they consider it scientific fraud, and the American Statistical Association warns against it in its ethical guidelines (American Statistical Association [2016], Discussion #4). At the standard significance level (false positive rate) of 5%, it takes on average about 20 such iterations (1/0.05, if the trials are independent) to discover a (false) investment strategy.
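The arithmetic behind the "about 20 iterations" figure can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are mine, not part of the original argument: each backtest is treated as an independent test of a strategy with no true skill, so its p-value is uniformly distributed and falls below 5% with probability 5%. The function names are hypothetical. Under those assumptions, the expected number of trials until the first false positive is 1/0.05 = 20, and the chance of at least one false positive within 20 trials is 1 - 0.95^20, or roughly 64%. Real backtests on the same data are correlated, so the exact numbers will differ in practice.

```python
import random


def prob_at_least_one_false_positive(n_trials, alpha=0.05):
    """Chance that at least one of n independent null tests looks significant."""
    return 1.0 - (1.0 - alpha) ** n_trials


def expected_trials_until_false_positive(alpha=0.05):
    """Mean of a geometric distribution with success probability alpha."""
    return 1.0 / alpha


def simulate_campaigns(n_campaigns=20000, n_trials=20, alpha=0.05, seed=42):
    """Monte Carlo check: fraction of research campaigns with a false discovery.

    Assumption: under the null hypothesis a p-value is uniform on [0, 1),
    so each trial is modeled as one uniform draw compared against alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_campaigns):
        # A campaign "succeeds" if any of its backtests looks significant.
        if any(rng.random() < alpha for _ in range(n_trials)):
            hits += 1
    return hits / n_campaigns


if __name__ == "__main__":
    print("Expected trials until a false positive:",
          expected_trials_until_false_positive())
    print("P(at least one false positive in 20 trials):",
          round(prob_at_least_one_false_positive(20), 4))
    print("Simulated fraction of campaigns with a false discovery:",
          round(simulate_campaigns(), 4))
```

The simulated fraction should land close to the analytical value, which is the point: a researcher who keeps re-running backtests will more likely than not "find" a strategy after 20 attempts, even when nothing is there to be found.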

Although there are no laws specifically prohibiting backtest overfitting (yet), investors may have a legal case against this widespread investment malpractice that professional associations of mathematicians have deemed unethical. Such offenders are abusing the public trust earned by bona fide scientists. This is but one example of the reasons why ML funds fail to perform as advertised. As legal analysts and regulators learn more about these unethical or negligent practices, laws and regulations may be passed to finally curtail some of these abuses.

Dr. Marcos López de Prado is Research Fellow, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA. The opinions expressed in this article are the author's and do not necessarily represent the views of the institutions and organizations he is affiliated with.
