In the tech world, we often hear of algorithms that can predict, with reasonable accuracy, the gender of the person who wrote a particular document, tweet, or other text. Is this inherently unethical? The questions below offer a jumping-off point for arriving at a decision.
Every project carries the potential for bias to be introduced. Some may ask how this is possible if an algorithm is doing all the work, but that view overlooks the people behind every algorithm. A model is trained on data produced and labeled by people whose thoughts, feelings, and opinions can carry over into the training material. Does the training data perpetuate gender stereotypes or other biases? One way to start answering that question is to audit what the model actually learns, as in the sketch below.
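The following is a minimal sketch of such an audit, assuming a scikit-learn text classifier; the tiny corpus and labels are invented purely for illustration, and a real audit would use the project's own training data.

```python
# Sketch: inspect which terms a gender classifier leans on most heavily.
# The documents and labels below are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "loved the recipe and the gardening tips",
    "great analysis of the football transfer market",
    "thoughts on parenting and work-life balance",
    "deep dive into engine tuning and lap times",
]
labels = ["f", "m", "f", "m"]  # invented labels for the sketch

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
clf = LogisticRegression().fit(X, labels)

# Rank features by learned weight: terms at either extreme are the ones the
# model relies on most, and are where gender stereotypes tend to surface.
terms = vectorizer.get_feature_names_out()
ranked = sorted(zip(clf.coef_[0], terms))
print("most 'f'-associated terms:", [t for _, t in ranked[:5]])
print("most 'm'-associated terms:", [t for _, t in ranked[-5:]])
```

If the highest-weighted terms map onto familiar stereotypes rather than genuine signal, that is a prompt to revisit how the training data was collected and labeled.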
Another element to consider is privacy. When collecting information about authors' genders, how is that data used within the project? Was consent obtained from the individuals who provided the data, and was the intended use communicated to them? If the data were exposed, would it cause harm? Could the data be anonymized and still yield meaningful results? One simple step in that direction is sketched below.
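Here is a minimal sketch of one possible anonymization step, assuming the records are keyed by author name. Hashing names with a secret salt (the salt value and record layout here are hypothetical) keeps documents by the same author linkable for analysis while removing the name itself.

```python
# Sketch: replace author names with salted pseudonyms before analysis.
import hashlib

SECRET_SALT = "replace-with-a-secret-value"  # assumption: kept separate from the dataset

def pseudonymize(author_name: str) -> str:
    """Return a short, stable pseudonym derived from the author's name."""
    digest = hashlib.sha256((SECRET_SALT + author_name).encode("utf-8")).hexdigest()
    return digest[:12]

records = [
    {"author": "Jane Doe", "text": "example document text"},
    {"author": "John Roe", "text": "another example document"},
]
anonymized = [{"author": pseudonymize(r["author"]), "text": r["text"]} for r in records]
print(anonymized)
```

Pseudonymization of this kind reduces exposure but is not a guarantee of anonymity; writing style and metadata can still re-identify authors, which is part of why the consent questions above matter.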
It is also important to consider social and political context when analyzing gender through computational text analysis. Do the results perpetuate power dynamics between socially constructed gender roles? If so, they could reinforce what is already ingrained in our society. Constructs also change over time: have historical and cultural contexts been taken into account so the results are not misread? And since gender does not stand on its own, did the experiment take an intersectional approach? Other social categories, such as race, social class, and sexuality, are highly intertwined with gender.