Big Data Is a Big Deal But How Much Data Do We Need?
by Nikos Askitas
(June 2016)
published in: AStA Wirtschafts- und Sozialstatistisches Archiv, 2016, Journal of the German Statistical Society
[journal version]

The more conservative among us believe that "Big Data is a fad that will soon fade out", and they may in fact be partially right. By contrast, others, especially those who dispassionately note that digitization is only now beginning to deliver its payload, may beg to differ. I argue that, all things considered, Big Data will likely cease to exist, although this will happen less because it is a fad and more because all data will eventually be Big Data. In this essay, I pose and discuss the question of "how much data do we really need", since everything in life, and hence the returns from data increments, ought to obey some kind of law of diminishing returns: the more the better, but at some point the gains are not worth the effort or become negative. Accordingly, I discuss small and large, specific and general examples to shed light on this question. I do not explore the answers exhaustively; rather, I aim to provoke thought in the reader. The main conclusions, nonetheless, are that, depending on the use case, both a deficit and an abundance of data may be counterproductive; that individuals, data experts, firms and society have different optimization problems, so nothing will free us from having to decide how much data is enough; and that the greatest challenges that data-intensive societies will face are positive reinforcement, feedback mechanisms and data endogeneity.
Text: See Discussion Paper No. 9988