Response Blog Post 1

Can AI accurately determine gender when we conduct different kinds of studies? Can AI help improve gender and racial equality? Allen Jiang’s article is clear and compelling, showing an approach to detecting gender with a machine-learning model trained on Twitter data. I appreciate Jiang’s methodology and explanation but still doubt the initial questions, “Can you guess the gender better than a machine” and “What are business cases for classifying gender.” From my perspective, pregnancy, for example, is not a gender-specific topic, and optimizing advertising costs could also be achieved by detecting non-binary customers’ genders. These doubts reflect the definitions of gender I constructed from the week 2 readings, Language and Gender by Penelope Eckert and Sally McConnell-Ginet and We Should All Be Feminists by Chimamanda Ngozi Adichie, to name a few. Sex is undoubtedly a different concept from gender, and treating gender as a spectrum on a scalar structure would challenge Jiang’s model, particularly in terms of data organization.

Specifically, Jiang primarily selects Twitter accounts belonging to famous people. How could he avoid a “celebrity effect” in the data collecting and cleaning process? Methods like improving data diversity come to mind. I am also confused about why Jiang included follower data: could followers be a genuine feature for gender detection, or a confounding variable? I previously raised a question about the significance of repetition in experiments to reduce error, which assumes a single correct result as the goal. This article reminds me of the importance of cross-validation in assessing a model’s performance across different scenarios, as in the sketch below. The key question I propose here is how to reduce dependence on a single model built from one dataset with specific features.
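
To make the cross-validation point concrete, here is a minimal sketch, not Jiang’s actual pipeline: the features are synthetic placeholders standing in for hypothetical profile attributes (follower counts, tweet-text features, and so on), and scikit-learn’s stratified k-fold cross-validation scores the classifier on several different splits rather than one train/test partition.

```python
# Minimal cross-validation sketch (not Jiang's actual pipeline).
# Synthetic features stand in for hypothetical profile attributes
# such as follower counts or tweet-text features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data: 500 "accounts" with 10 numeric features and binary labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# Score the same model on five different train/test splits
# instead of trusting a single partition of one dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", scores.round(3))
print("mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```

Low variance across folds would at least suggest the model is not overly tied to one particular slice of the data, which is the dependence I worry about above.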

Ben Schmidt’s Gendered Language in Teacher Reviews focuses on a more diverse group sharing the same occupation, “teacher.” Upon revisiting the process, I discovered that Schmidt writes, “Gender was auto-assigned using Lincoln Mullen’s gender package. There are plenty of mistakes–probably one in sixty people are tagged with the wrong gender because they’re a man named ‘Ashley,’ or something.” The data source of Lincoln Mullen’s package is “names and dates of birth, using either the Social Security Administration’s data set of first names by year of birth or Census Bureau data from 1789 to 1940.” (https://lincolnmullen.com/blog/gender-package-now-on-cran/) Unfortunately, Schmidt is no longer maintaining the teacher reviews site, but updating this data visualization remains crucial, as the association between names and gender conventions is constantly changing. Using data from the 1940s to train a model for detecting gender in contemporary society may not yield ideal results, as the sketch below illustrates.
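
To show why the reference period matters, here is a small sketch of the general name-plus-birth-year lookup idea. It is not Lincoln Mullen’s gender package (which is an R library), and the proportions are invented placeholders rather than real SSA or Census figures; the point is only that the same name can be tagged differently depending on which era’s data backs the lookup table.

```python
# Illustrative sketch only: not Lincoln Mullen's gender package,
# just the general idea of assigning gender from a name plus a reference period.
# All proportions below are invented placeholders, not real SSA/Census figures.

NAME_TABLE = {
    ("ashley", "1789-1940"): {"female": 0.35, "male": 0.65},  # hypothetical
    ("ashley", "1980-2000"): {"female": 0.98, "male": 0.02},  # hypothetical
}

def assign_gender(name: str, period: str, threshold: float = 0.5) -> str:
    """Return the more frequent gender for a name within a reference period."""
    props = NAME_TABLE.get((name.lower(), period))
    if props is None:
        return "unknown"
    return "female" if props["female"] >= threshold else "male"

# The same name flips depending on which era's data is used.
print(assign_gender("Ashley", "1789-1940"))  # -> male   (older reference data)
print(assign_gender("Ashley", "1980-2000"))  # -> female (more recent data)
```

Rebuilding the lookup table from more recent name data would be one straightforward way to keep a visualization like Schmidt’s current.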

I continued with two more articles to help me think about the question of bias in AI. One is an interview with Dr. Alex Hanna about her work at the Distributed AI Research Institute on AI technology, bias, and constraints. I recommend this interview, especially her answers to the questions about independent AI research whose purposes are not funded by tech companies. (https://www.sir.advancedleadership.harvard.edu/articles/understanding-gender-and-racial-bias-in-ai)

The other is a super interesting but quite alarming reading titled Natural Selection Favors AIs over Humans (https://arxiv.org/abs/2303.16200).

Its author, Dan Hendrycks, is the director of the Center for AI Safety. He discusses natural selection and Darwinian logic as applied to artificial agents. If biased natural selection is also incorporated into such large-scale computational studies and models, would the consequences be unbearable for us as humans? If natural selection operates within AI, where would that leave the human position?