Why Anderson is Right about the End of Theory
Anderson is correct in his assessment that big data has replaced the need for theory and models. Models have always been approximations of reality, and their inconsistencies make the understanding they offer less reliable. Past models also lacked the volume of data needed to make them as accurate as they could be. The big data revolution, marked by high volume, velocity, variety, resolution, and flexibility, captures the whole picture rather than a snapshot, and in doing so subverts the need for models and their inaccuracy. With data science techniques, people now have a direct view into the inner workings of the world through the data itself rather than through a model. Data has become intuitive, speaking for itself rather than relying on a sometimes inconsistent translator.
Anderson offers a multitude of supporting arguments to show that models are no longer needed to understand causal links, since correlation now does the same job. For example, he examined how Google Translate can translate between languages without the system actually knowing or understanding them. Google Translate was not taught each individual language; it was shown enormous amounts of data and picked up on the patterns, allowing it to translate without linguistic models. Anderson made this point even stronger by noting that, given enough data, the same method could learn a fictional language from Star Trek. Furthermore, data alone, rather than models, has been used to identify purchasing habits and predict future purchases. One example came from Target, which predicted from a woman's purchase history that she was pregnant and sent her baby-product offers before her family knew. A model or theory alone would not have connected purchases of seemingly unrelated everyday items to pregnancy, showing data's unique ability to surface unexpected links and act on them.
An observer might argue that these are quaint examples with no real impact on the larger world or on something as important as scientific discovery. They would be wrong, however. Anderson mentioned in his article that data has been used in science to understand DNA and discover new species. This is a field dominated by deductive reasoning, which prizes the hypothesis as the route to groundbreaking discovery. That method, however, was shown to be unnecessary by J. Craig Venter, who used high-speed computers to analyze DNA sequences in bulk, the big data in this case, and sequence entire ecosystems. In doing so, he discovered species the scientific community had never seen before. He achieved this not by spending days in a muddy field searching for something new, nor by proposing a theory that such species might exist, but by looking straight past models and at the data itself. These examples, and the ease with which data can be put to use, form the strongest argument that big data subverts the need for models and hypotheses.
Why Kitchin is Correct about the Need for Theory
Kitchin is correct in his rebuttal of Anderson's idea that theory and hypotheses are no longer needed, a claim that marks a clear departure from the scientific method that has carried human understanding this far. Impressed by the new insight big data brings, Anderson is too quick to throw out deduction in favor of pure induction stripped of context. This view of science is unsustainable because it lacks critical context for, and understanding of, the links it presents. The method fails to account for the potential for erroneous correlations to appear that have no true merit. Anderson prefers these weak correlational links over the strong causal links that conventional deductive reasoning has so long produced.
Anderson’s viewpoint holds that the data being examined is free from the chains of bias and offers a clear view of the world as it is. This is simply wrong. Sampling bias still enters through the choices data collectors make. They must decide which variables to collect and examine, which in itself introduces bias into a system touted as pure. Bias has dire consequences for data and leads to unsound conclusions, a real threat in the pure empiricist thinking Anderson proposes. Moreover, data without context is just noise, which allows fake correlations to arise. Because big data sets contain millions of entries or more, weak correlational links with no merit are bound to appear. What seems impossibly improbable becomes very probable at a large enough sample size. This is the same way people can come to believe that The Art of War is somehow prophetic when analyzing its letter sequences, when in reality they are seeing random links that exist by chance. By focusing purely on data in this same way on larger issues, researchers are more likely to fall into the same trap, but on issues with serious consequences for academia and the wider world.
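To make the point about chance correlations concrete, the short simulation below is a minimal sketch in Python using NumPy; the dataset size, variable count, and significance cutoff are illustrative assumptions, not figures from Anderson or Kitchin. It generates columns of pure noise and counts how many variable pairs nonetheless look "significantly" correlated.

```python
import numpy as np

# Minimal sketch: a hypothetical dataset of pure noise, used to show how
# "significant" correlations appear by chance once enough pairs are compared.
rng = np.random.default_rng(0)
n_rows, n_vars = 1000, 200                      # assumed sizes, chosen only for illustration
data = rng.normal(size=(n_rows, n_vars))        # every column is independent random noise

corr = np.corrcoef(data, rowvar=False)          # pairwise correlations between all columns
upper = corr[np.triu_indices(n_vars, k=1)]      # keep each pair of variables once

# With about 1000 samples, |r| > 0.08 roughly corresponds to a nominal p < 0.01,
# even though the variables are unrelated by construction.
spurious = np.sum(np.abs(upper) > 0.08)
print(f"{spurious} of {upper.size} noise-only pairs look 'significant'")
```

With roughly 20,000 variable pairs, on the order of a couple hundred pass the nominal cutoff even though every relationship is random by construction, which is exactly the trap of correlation without context.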
These points illustrate the ineffectiveness of relying purely on big data outside the natural sciences, including in the social sciences and humanities. By using data as is, without context, researchers are prone to unsound conclusions that make experts in the field roll their eyes. Anderson argues that big data removes the need for context or interpretation. Many leaders in the social sciences disagree. Kitchin notes that there is a difference between the ability to identify random links and the ability to truly understand them. That understanding is key; otherwise it is impossible to know whether a linkage is due to pure chance or has actual validity for the wider world. This point goes hand in hand with the notion that human behavior is too complex to predict from data alone. There may be slight patterns in human behavior, but the masses are not governed by a ruling hand that forces their behavior into strict patterns, which makes purely data-driven predictions less valid. For these reasons, an abductive reasoning method is crucial, as it blends data-driven science with theory.