It is well known in the data science community that Big Data has a bias problem. It is incredibly susceptible to the pre-existing biases and prejudices in our society. This is easy to understand: we humans produced this data, so it reflects the worst parts of our opinions back at us.
The issue arises when we build ML models and other decision-making software intended for frequent use in our day-to-day lives. The purpose of these systems is to make repeated decisions more efficiently, but if they are fed datasets carrying the same biases as our society, they will reflect those biases back at us.
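To make this concrete, here is a minimal sketch of how a model trained on biased historical decisions learns to reproduce that bias. The data is entirely synthetic and hypothetical; the scenario (a hiring model where past decisions penalised one group) is illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute: 0 = group A, 1 = group B.
group = rng.integers(0, 2, size=n)
# A genuinely job-relevant score, identically distributed in both groups.
skill = rng.normal(0.0, 1.0, size=n)

# Historical hiring decisions: driven by skill, but with a flat penalty
# applied to group B. This is the "pre-existing bias" baked into the data.
hired = (skill - 0.8 * group + rng.normal(0.0, 0.5, size=n)) > 0

# Train a model on the biased labels.
X = np.column_stack([skill, group])
model = LogisticRegression(max_iter=1000).fit(X, hired)

# Compare two equally skilled candidates (skill = 0), one from each group.
probs = model.predict_proba([[0.0, 0], [0.0, 1]])[:, 1]
print(f"P(hired | group A) = {probs[0]:.2f}")
print(f"P(hired | group B) = {probs[1]:.2f}")  # markedly lower
```

The model is not malfunctioning in any statistical sense: it has faithfully learned the penalty present in its training labels, which is exactly the problem.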
Caroline Criado Perez is a feminist author, journalist and activist. Perez's best-selling book 'Invisible Women' shines a light on how little data is collected and studied about women, because the 'male' body is treated as the default, 'normal' human body.
As a result, women aren't included in the data used to make important decisions in all areas of life, from the medical industry to the design of public transport.
These data gaps have measurable consequences:
Joy Buolamwini's project 'Gender Shades' revealed that the gender classification products of IBM, Microsoft and Face++ were significantly more accurate for lighter-skinned men than for anyone else.
To test these products, Buolamwini curated a benchmark dataset (the 'Pilot Parliaments Benchmark') designed to be equally split by sex, with subjects drawn from three African countries and three European countries.
These were the results: all three products performed best on lighter-skinned men, with error rates of at most 0.8%, and worst on darker-skinned women, with error rates of up to 34.7%.
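The methodological core of such an audit is straightforward: compute accuracy not just overall but disaggregated by subgroup, since an impressive aggregate figure can hide a badly failing intersection. Below is a minimal sketch of that idea; the predictions.csv file and its column names are hypothetical placeholders, not Buolamwini's actual data or code.

```python
import pandas as pd

# Hypothetical audit file: one row per test image, with the subgroup
# attributes alongside the true and predicted labels.
df = pd.read_csv("predictions.csv")  # columns: skin_type, gender, true_label, predicted_label

df["correct"] = df["true_label"] == df["predicted_label"]
print(f"Overall accuracy: {df['correct'].mean():.1%}")

# Error rate at each intersection of skin type and gender. This is the
# disaggregation that exposed the disparities in Gender Shades.
error_by_subgroup = 1 - df.groupby(["skin_type", "gender"])["correct"].mean()
print(error_by_subgroup.sort_values(ascending=False))
```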
Lauren Klein and Catherine D'Ignazio's book 'Data Feminism' discusses how challenging power can help mobilize data science and push back against unequal power structures.
They argue that it is important to consider who you are trying to persuade with your data. If you want to show that there is bias or inequality in your community, it is not your neighbours you need to convince that the inequality exists, but those in positions of power and members of dominant groups, who through their background and influence bear some responsibility for helping to fix it.
To move from data ethics to data justice, we should first look critically at the concepts we use to describe these problems:
- Concepts like 'ethics' and 'bias' are a useful starting point, because they can identify a source of problems in technical systems.
- Concepts like 'justice' and 'oppression' take us further, because they help us acknowledge "structural power differentials" and work towards removing their influence.