
Rethinking textual data practices as feminist intervention

If, as Sara Ahmed writes, feminism is “a building project” (Living a Feminist Life, 14), then a critical point of feminist intervention in the textual analysis pipeline is in the selection, handling, and use of textual data. The text tokens — and the methods we use to parse, tag, and untangle them — are the nails and hammers, respectively, in that building project.

Every stage of identifying and preparing data (i.e., prior to its analysis) offers opportunities for feminist intervention. When working with an existing dataset (a readymade), we as researchers can interrogate its provenance, genealogy, potential for misuse, and more. Asking these questions can clue us into the assumptions that shaped its collection. 

If we are collecting data ourselves, we should look to domain experts to help us circumvent our own biases and situate knowledge within its local context; otherwise, we risk harming the communities the data claim to represent. When cleaning data, we have an opportunity to preserve difference and nuance by resisting the tendency to aggressively standardize text just to make it “easier” to analyze.
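To make that last point concrete, here is a minimal sketch in Python (with invented example strings and function names, not drawn from any particular project) of two cleaning passes over the same snippets: an aggressive pass that flattens case, diacritics, and punctuation, and a lighter pass that leaves that variation intact.

```python
# Hypothetical sketch: two cleaning passes over the same text snippets.
# The aggressive pass flattens orthographic variation (case, diacritics,
# punctuation); the lighter pass keeps those differences intact.
import unicodedata

samples = ["Beyoncé's new single", "finna head out", "SO excited rn!!"]

def aggressive_clean(text: str) -> str:
    # Strip diacritics, lowercase, and drop anything non-alphanumeric.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())

def light_clean(text: str) -> str:
    # Only trim surrounding whitespace; spelling, case, and accents survive.
    return text.strip()

for s in samples:
    print(aggressive_clean(s), "|", light_clean(s))
```

The aggressive pass makes the strings uniform and “easy” to count, but it erases exactly the kinds of markers (emphasis, accents, vernacular spelling) that carry meaning for the people who wrote them.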

We also have opportunities to rewrite or redistribute data in ways that actively oppose oppression — gender-based or otherwise — by revising or adding metadata, documenting and explaining datasets to promote more informed use, and even constructing alternative datasets that subvert dominant ideologies. These actions can not only bring our research efforts in line with a feminist aim, but also create the conditions for other researchers to do the same.

Building human context into how we understand and process textual data (response blog post to week 12 readings)

In the article “Automatically Processing Tweets from Gang-Involved Youth: Towards Detecting Loss and Aggression,” a team of computer scientists and social workers describe their use of natural language processing (NLP) to analyze tweets from known gang members in Chicago. Their goal is to understand when a tweet conveys a feeling of loss or aggression — which might precede an act of violence — with the hope that this automatic detection can support community outreach groups’ intervention efforts. 

The team realizes it cannot simply apply existing NLP techniques, given how significantly the textual data (i.e., the vernacular and slang found in the tweets) differ from the language those systems are trained on. Existing part-of-speech taggers, for instance, cannot reliably parse the abbreviations and non-standard spellings found in the tweets.
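To get a feel for this mismatch, here is a rough, hypothetical sketch that runs NLTK’s general-purpose tagger on an invented tweet-style sentence (not the study’s data). Because the tagger was trained on standard edited English, the tags it assigns to tokens like “finna” or “rn” are essentially guesses.

```python
# Hypothetical illustration: a general-purpose tagger applied to invented
# tweet-style text. The tagger's training data is standard English, so
# non-standard tokens like "finna" and "rn" are tagged unreliably.
import nltk

# Cover both older and newer NLTK resource names for the default tagger.
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)

tweet = "finna pull up rn no cap"   # invented example, not from the dataset
tokens = tweet.split()              # naive whitespace tokenization
print(nltk.pos_tag(tokens))
```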

To create a more accurate model, the researchers hand-label the tweets, turning to domain experts, including social workers and two “informants” (18-year-old African American men from a Chicago neighborhood where violence is more frequent), to interpret each tweet. They call this act of decoding a “deep read” (2198), and the computer scientists work closely with these domain experts to understand the data and tune the model appropriately.
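For readers curious what the downstream modeling step can look like in miniature, here is a hypothetical sketch of training a classifier on a handful of expert-labeled examples. The tweets and labels below are invented placeholders, and the tf-idf plus logistic regression setup is purely illustrative, not the authors’ actual system.

```python
# Hypothetical sketch of the general workflow: fit a classifier on a small
# set of expert-labeled texts. The examples and labels are invented
# placeholders; the model choice is illustrative, not the paper's system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled_tweets = [
    ("miss you every day bro", "loss"),
    ("rest easy lil bro", "loss"),
    ("better watch what happen next time", "aggression"),
    ("on my way to the gym", "other"),
]
texts, labels = zip(*labeled_tweets)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["thinking of you rip"]))
```

In the actual study, of course, the labels come from the “deep read” process described above, which is exactly what keeps the model accountable to the community context rather than to the researchers’ assumptions.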

I feel this represents a feminist approach to working with textual data, as it places primary importance on respecting the context of the data and the people behind it. Using existing NLP tools off the shelf, or forgoing the step of consulting domain experts, would not only introduce significant errors into the results, but could also misrepresent and even harm the communities the researchers are trying to support. As Catherine D’Ignazio and Lauren Klein write in Data Feminism, “Refusing to acknowledge context is a power play to avoid power. It’s a way to assert authoritativeness and mastery without being required to address the complexity of what the data actually represent” (ch. 6).

Notably, the authors of the research article conclude by pointing to future research directions, explicitly stating that they aim to “extend our corpus to include more authors, more time periods, and greater geographical variation.” This suggests an iterative, process-oriented approach to modeling the data, in line with Richard Jean So’s article “All Models Are Wrong,” in which he encourages researchers to assume from the start that a model is wrong, while treating it as useful for exploration and improvable through iteration (669). The researchers working with this Twitter data could (and should) continue to refine their model to better represent and engage with the people whose data they are modeling.