Friday, June 23, 2017

Bursting the Big Data Bubble... with Theory

There was an article in the June 2017 issue of Significance titled "Bursting the big data bubble." Unfortunately I don't have paid access, but here is the teaser:

"In the financial world, big data is hailed as a potential game changer for predicting stock market performance. But without adequate safeguards, big data analyses may result in spurious correlations, misguided predictions and disappointing returns."

You don't need to know anything about big data to know that building models, developing strategies, or coding algorithms to predict stock returns (at least well enough to consistently earn above-average returns in a portfolio over the long term) is a steep uphill climb against a mountain of economic theory.

Not being a financial economist, I'll speak broadly and provide references with more detail below. According to the theory of rational expectations and the efficient market hypothesis, all unexploited profit opportunities are eliminated because prices reflect all publicly available information. Prices follow a random walk and, for all practical purposes, are not predictable. Even when prices diverge from fundamental values, according to the theory, the divergence can't be predicted.
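To make the random walk claim concrete, here is a toy simulation of my own (not something from the article or the references). It assumes daily returns are independent Gaussian noise with zero mean, and the 1% volatility, the sample size, and the function names are all made up for illustration. If returns really behave this way, yesterday's return tells you essentially nothing about today's, which shows up as a lag-1 autocorrelation near zero.

```python
import random


def simulate_returns(n, sigma=0.01, seed=0):
    """Draw n independent zero-mean Gaussian returns (an assumed toy model)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def lag1_autocorr(x):
    """Sample correlation between each value and the value that follows it."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


returns = simulate_returns(5000)

# Build a price path from the returns: each price is the previous one
# scaled by (1 + return), which is the random walk the theory describes.
prices = [100.0]
for r in returns:
    prices.append(prices[-1] * (1.0 + r))

rho = lag1_autocorr(returns)
print(f"lag-1 autocorrelation of returns: {rho:.4f}")
```

Real return series are messier than this (volatility clusters, for one), so treat it only as a picture of what "not predictable from past prices" means in the simplest case.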

One exception is inside information. A trader with inside information (i.e. information that is not publicly available, and which is illegal to trade on) would have an edge and could act on it to make profitable trades. I do wonder, however, whether a firm could have an edge if it developed a proprietary algorithm that makes *better* use of public information. Is a novel interpretation of public information the next best thing to inside information?

I'm not sure. This may well have worked early on for some quant funds. However, I still think that in the long run other firms could replicate the strategy, eliminating the unexploited profit opportunities. A citizen data scientist with a good understanding of statistics and a willingness to crack a book can learn to implement, using open source packages in R and Python, the same advanced algorithms that someone with two PhDs may have been hired by a quant fund a few years back to code from scratch.

I think this is echoed somewhat in a recent Chat with Traders podcast with Matthew Hoyle, where he discussed the fact that strategies have a short shelf life. What is valuable is the ability and energy to look at new and interesting things and put it all together with a sense of business development and a desire to explore.

Fong, W. M. (2017), Bursting the big data bubble. Significance, 14: 20–23. doi:10.1111/j.1740-9713.2017.01035.x

See also:

Masters in Business with Barry Ritholtz Guest: Andrew Lo of MIT

In Praise of the Citizen Data Scientist

Efficient Capital Markets: A Review of Theory and Empirical Work. Eugene F. Fama
The Journal of Finance. Vol. 25, No. 2, Papers and Proceedings of the Twenty-Eighth Annual Meeting of the American Finance Association New York, N.Y. December, 28-30, 1969 (May, 1970), pp. 383-417

- a classic paper reviewing work related to efficient capital markets theory.

Hou, Kewei and Xue, Chen and Zhang, Lu, Replicating Anomalies (June 12, 2017). Charles A. Dice Center Working Paper No. 2017-10; Fisher College of Business Working Paper No. 2017-03-010. Available at SSRN.

- the above reviews some of the market anomalies literature, finding that many studies fall short in terms of methodology.


The anomalies literature is infested with widespread p-hacking. We replicate this literature by compiling a large data library with 447 anomalies. With microcaps alleviated via NYSE breakpoints and value-weighted returns, 286 anomalies (64%) including 95 out of 102 liquidity variables (93%) are insignificant at the 5% level. Imposing the t-cutoff of three raises the number of insignificance to 380 (85%). Even for the 161 significant anomalies, their magnitudes are often much lower than originally reported. Among the 161, the q-factor model leaves 115 alphas insignificant (150 with t < 3). In all, capital markets are more efficient than previously recognized.
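The p-hacking point can be illustrated with a rough sketch of my own. This is not the replication procedure used in the paper, just a toy multiple-testing simulation. It generates 447 "strategies" (the count borrowed from the passage above) whose monthly returns are pure noise with a true mean of zero, then counts how many clear the usual 5% cutoff (|t| > 1.96) versus the stricter t-cutoff of three. The 240-month sample length, the 5% monthly volatility, and the variable names are assumptions made only for this example.

```python
import math
import random
import statistics

N_STRATEGIES = 447  # count borrowed from the passage above
N_MONTHS = 240      # assumed sample length: 20 years of monthly returns


def t_stat(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


rng = random.Random(42)
t_stats = []
for _ in range(N_STRATEGIES):
    # Pure noise: by construction no strategy here has any real edge.
    noise_returns = [rng.gauss(0.0, 0.05) for _ in range(N_MONTHS)]
    t_stats.append(t_stat(noise_returns))

hits_5pct = sum(abs(t) > 1.96 for t in t_stats)
hits_t3 = sum(abs(t) > 3.0 for t in t_stats)

print(f"noise strategies 'significant' at the 5% level: {hits_5pct}")
print(f"noise strategies with |t| > 3: {hits_t3}")
```

Even with nothing real to find, roughly 5% of the noise strategies should pass the conventional cutoff by chance alone, and far fewer survive the higher hurdle. That is the intuition behind raising the bar when hundreds of candidate anomalies have been tested.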
