Protecting Yourself from Black-Hat Applications of Data Science, as a Data Scientist
Recently a far-reaching scandal broke out when a reporter exposed a data science company called Cambridge Analytica (C.A.). According to the information gathered, the company used a dataset harvested via Facebook, enriched with a lot of additional data from the Facebook graph, to influence the 2016 US presidential election. It is important to note that the project was not exploratory (e.g. aimed at finding insights about the voters); rather, it aimed at steering the voters’ views on a certain candidate in order to benefit the other candidate, who was the company’s client.
Personally, I’m not invested in US politics and don’t have any strong views on the matter, which is why I chose to omit the names of the politicians involved. As a data science professional, however, I find what C.A. did shameful and unethical on many levels. Examples like this show that, just like everything else in applied science, data science can be used for malicious purposes too, something every data scientist ought to be aware of and avoid whenever possible.
A topic like this one concerns not just data scientists but anyone working alongside them, since it would be naive to believe that this whole fiasco was the result of a few data science professionals acting on their own. As the corresponding footage shows, the black-hat approach to data analytics was initiated by the company’s head, who was quite forthcoming about what the company was trying to do. That doesn’t make the data scientists working there innocent victims, but the responsibility for this dark project is at least shared among everyone there, not just them. Considering that it wasn’t a huge company, it’s quite unlikely that the data scientists were unaware of the unethical and immoral agenda their work was serving. And had they refused to cooperate with this plan, that could at the very least have slowed things down.
So, how can we, as data science professionals, guard ourselves against situations like the C.A. scandal? First of all, we can avoid working for people who don’t have a moral compass and who are looking at how the data products we develop can be used to covertly drive certain behaviors in ways that would be punishable if exposed. If the leaders of a project are shady individuals who don’t mind hurting others in order to make their clients happy, that’s a red flag.
The data itself could be another warning sign. If it is collected through unethical means and used in ways that compromise people’s privacy, then that’s a tell-tale sign that there is something fishy going on. Another such sign is the kind of insights discovered through the project (in this case, the categorization of the people involved into four groups that relate to some intimate aspects of their personalities). If we are not comfortable sharing these insights with those people (assuming there is no NDA in place prohibiting that) because it just feels wrong, then we shouldn’t be digging up those insights to start with.
Finally, if the data products don’t serve the people whose data lies behind them, even indirectly, then that’s another red flag. The products we create should be something we can talk about openly (without giving away any sensitive know-how behind them, of course), without feeling ashamed or guilty about their purpose.
Naturally, these few suggestions are but the tip of the iceberg of a very large topic: the ethics of our profession. I cannot hope to do this topic justice through a blog article, or even a video like the one I made on this topic last year. However, it’s good to remember that we are not powerless against the malicious use of data science by people who are either immoral or amoral, caring only for themselves at the expense of the well-being of others. We may not always be able to stop their agenda, but we can at least identify an unethical project and refuse to contribute to it. Besides, there are many things we can do with data science, so why not focus on the more beneficial ones instead?
Zacharias Voulgaris, PhD
Passionate data scientist with a foxy approach to technology, particularly related to A.I.