
DATA-150

Discussion of “Don’t Forget People in the Use of Big Data for Development”

Promises of Big Data

Joshua Blumenstock presents a wide range of promises that big data holds when applied to real-world situations. He explains that, although imperfect, certain signals can indicate someone’s ability to pay back debt when they lack a conventional credit score. Those signals include how many friends someone has on Facebook compared to those in their near vicinity, as well as their number of international calls. Though these factors may seem odd, as if drawn at random from a hat, big data allows people to uncover indicators in a world filled with potential factors. As the article later makes clear, these predictors are far from perfect, but they hold promise for a future in which impoverished people can get the financial assistance and trust they need. If applied properly and alongside other context, they could have a widespread impact on those in need, and thus should be researched further. The promises of data do not stop at finding people who will pay back loans, however: data models also allow scientists to monitor crop yields and track malnutrition in children. This work, if used to its full potential, has an incredible humanitarian benefit that should not be overlooked, because it would allow aid to be better directed. Again, the process at this stage is not perfect and needs work, but that is true of most new methods that have not been around long enough to mature. I look forward to seeing how these promises will improve the world we live in.

Pitfalls of Big Data

In addition to his analysis of the promises of big data, Blumenstock illustrates its current limitations and pitfalls. One issue, which is a common throughline in society, is that data is controlled by the elite, or at least by those wealthy enough to have phones, electricity, and similar resources. Because of this, some people are left out of the benefits of data. Beyond that imbalance, data techniques are currently created in a bubble. Many new techniques have not been tested across a variety of situations, so they can be unreliable when extrapolated. The example Blumenstock points to is that international calling is a better predictor of debt repayment in Rwanda than in Afghanistan. For this reason, it is important to keep in mind that a correlation can only be trusted for the population it was derived from. Applying it elsewhere is an extrapolation that can lead to unusable conclusions. Alongside extrapolation across populations is the problem that models can have a short lifespan. Blumenstock shows an example in which the correlation depended on the time of year. Models therefore need to be more flexible and should only make predictions within the parameters they can be confident in. Most concerning was Blumenstock’s discussion of a “social score” that limits people’s use of public transport if their score is too low. This is an application of data that goes too far and harshly penalizes people. I find it wrong to use data at such an individual level to punish people. Data should be used on larger problems, rather than to disproportionately limit the liberty of individuals based on factors they are unaware of.

A Humble Data Science

At the end of his article, Blumenstock paints the picture of a “humbler data science.” He describes collaboration between data scientists and others, including government officials and humanitarian groups, to add context to the numbers. As we have seen with COVID-19, it is easy to forget that those numbers represent people, each a life that is much more than one more row in a spreadsheet. This contextualization is paramount, for without it the data are meaningless, along with the models built from them. Another potential solution in the article with which I agree is combining conventional methods of data collection, in the article’s instance poll data, with newfound techniques. Doing so gives us more accurate data, since it comes from two different sources, and it also gives us a control against which to compare the newfound techniques and test their reliability. Even if using both methods increases the cost of the operation, it provides a greater benefit to the team and lays the foundation for a future in which the new techniques reign supreme and are more accurate than their predecessors.