Book Review: Digital Lethargy: Dispatches from an Age of Disconnection by Tung-Hui Hu

Tung-Hui Hu is an Associate Professor of English at the University of Michigan, a published poet, and a former network engineer. His first book, A Prehistory of the Cloud, published in 2016, illustrates how the cloud grew out of older digital networks and “examines the gap between the real and the virtual in our understanding of the cloud.” (Hu, 2016)

In his most recent published work, Digital Lethargy, Hu examines the exhaustion and lethargy shared by digital users in an age of digital capitalism, where concepts of ‘liveliness’ and ‘agency’ are sold to users for the benefit of big tech and advertisers, specifically through data. Digital platforms created by tech industrialists gather data from “active users” through their clicks, likes, status updates, and even their racial and gender identities in order to generate profits from advertisers.

“Under digital capitalism, ‘being yourself’ is the dominant set of codes for how we understand ourselves and others. It is a form of empowered individualism, where we equate a user account with personhood, and we equate choice with agency.” (Hu, 2022, viii)

This results in users carrying the heavy burden of ‘performative liveliness’, in which personhood is intrinsically linked to the digital self, leaving little to no room for identity outside of digital capitalism. Instead of shaming feelings or acts of lethargy, Hu argues that lethargy within a digital capitalist system can itself be a form of resistance. It can be a response to a disconnect: in a digital system where communications are tracked, collected, and analyzed as data, the lack of relation to others can indicate participants’ refusal to adhere to scripted and algorithmic modes of “connectedness”.

Hu makes the point that privileged concepts of ‘agency’ and ‘choice’ presented by digital platforms are “not afforded to all populations in the same ways.” (Hu, 2022, xiv) In digital capitalism, microwork typically falls to those who, given the circumstances they were born into, have little choice but to take on long, menial tasks for little pay.

“‘Robotic’ work takes place in countries like the Philippines and India and Mexico, whose populations are already stereotyped in the West as being hardworking and technically competent people ‘inherently’ fit for manual labor, for being given commands and executing them.”

Hu, 2022, xiv

This system in which money is made from “propping up the active user at the expense of the passive server” reflects what political theorist Cedric Robinson terms racial capitalism. The racial differences of regional or ethnic groups are exaggerated in order to justify employing such groups for ‘inferior work’, such as transcription (Rev), ‘digital janitor’ work (CrowdFlower), or filtering out graphic, offensive content on social media platforms and language models like ChatGPT.

It seems the theme consistent throughout the book is the idea that individuals, whether digital or physical, do not and cannot exist within a vacuum. Even in the so-called liberal, utopian system of “the Internet”, where users can construct visual identities of themselves without the limitations of a physical body, markers such as race and gender are still taken into account. Hu illustrates the paradox of tech industrialists’ attempts at inclusivity, striving for more race- and gender-neutral workspaces, with the 2016 controversy in which it became public that Facebook gave advertisers an option “to exclude ads to Black and Latinx groups.” Facebook’s response was to make visible its internal advertising preferences that capture race as a ‘multicultural affinity’.

“These responses indicate that the technology industry understands race and gender as identity markers– in other words, data values– that can be chosen or are user preferences. Yet by tagging persons of color with interests in ‘African-American culture,’ ‘Asian culture’, or ‘Latino culture’, algorithms contrast them with the default values of whiteness. The inclusion of these ‘affinities’ only reinforce the power a system of classification exerts over those perceived as different: as the poet Édouard Glissant puts it, ‘I understand your difference… I admit you to existence, within my system.’”

Hu, 2022

In Digitizing Race: Visual Cultures of the Internet, published in 2008, Lisa Nakamura explores the concepts of race and gender existing in cyberspace, not just as data, but as artifacts of visual culture and modes of communication between communities with ‘low social power’, such as women and people of color. 

Even though these two books were published at least a decade apart, there are common themes in how they address digital culture. Nakamura touches upon Omi and Winant’s theory of racial formation, from which she posits her own theory of digital racial formation, “which would parse the ways that digital modes of cultural production and reception are complicit with this ongoing process”, and draws on the concept of visual capitalism, coined by Lisa Parks in Satellite and Cyber Visualities: Analyzing The Digital Earth Project, as “a system of social differentiation based on users’/viewers’ relative access to technologies of global media.” Nakamura wishes to describe access to the Internet and other technologies not as ‘binary’, but instead as a spectrum.

“The problematic that I wish to delineate here has to do with parsing the multiple gradations and degrees of access to digital media, and the ways that these shadings are contingent on variables such as class position, race, nationality, and gender.” 

Nakamura, 2008

Both authors turn to visual media to break down how users respond to the digital and visual capitalism we currently live within, and how users operate given the power relations created through binaries such as “user/server”, “spectator/owner”, and “object/representation”. Nakamura provides examples of visual representation on the Internet pre-2008, such as AIM buddy icons, online racial profiling on a website called www.alllooksame.com, and web forums dedicated to providing emotional support to pregnant and conceiving women.

Hu analyzes contemporary and performance art, such as Unfit Bits Metronome (2015) by Tega Brain and Surya Mattu, in which a Fitbit fitness-tracking device is strapped to a metronome. He states, “For, Brain and Mattu observe, the ‘healthy, active lifestyle’ is an economic privilege, one that is out of reach (and literally unaffordable) to many working consumers, who may ‘lack sufficient time for exercise or have limited access to sports facilities.’”

Fitbit fitness tracking device strapped to metronome

Hu also analyzes the 2011 film Sleeping Beauty, directed by Julia Leigh, in which he compares the main character’s endurance as a passive and unconscious sex worker to the passivity of users constricted within digital modes of relationality.

The main character, Lucy, takes on a job as a sex worker to make ends meet as a full-time student and waitress. The sex work involves her taking a sleeping drug, getting into bed, and sleeping until the evening is over. Throughout the film, however, she is deadpan and seemingly nihilistic about the situation she finds herself in. In one scene, she is shown lighting a stack of bills on fire right after getting paid.

Screenshot of Sleeping Beauty, 2011

Viewers are challenged by the character’s lack of resistance, or even lack of internal struggle displayed on the screen. Hu states, “lethargy embraces the potential of being an ‘object being’ as one that relieves a subject of the burden of having to perform aliveness, individuality, and interactivity– all ‘human’ attributes that are actually gendered, ableist, or racially coded.” (Hu, 2022, 120)

Nakamura states in her Introduction:

“So rather than focusing on the idea that women and minorities need to get online, we might ask: How do they use their digital visual capital? In what ways are their gendered and racialized bodies a form of this new type of capital? What sort of laws does this currency operate under? It doesn’t change everything, but what does it change? This brings us back to the privilege of interactivity and its traditional linkage with the creation of a newly empowered subject.”

Hu states in his Introduction:

“For a server, lethargy is the exhaustion of having only a partial claim on selfhood: of needing to ‘be yourself’ for other people, or alternately of having to suppress it; of being what feminist scholars Neda Atanasoski and Kalindi Vora call “human surrogates,” rather than full humans. And yet this is the same problem that afflicts users: a feeling of selfhood as something out-of-reach, burdensome, or even unwanted that trails the feeling of sovereignty like a leaden shadow.”

Both books were interesting, contrasting digital data with digital culture and the implications that follow, weaving through concepts of gender and race. As gender has consistently been established as a performative aspect of one’s identity, one can question how the performative aspect of gender is translated in visual media in contrast with data: how do you define data that is gendered on a spectrum of male and female, when the performative aspects are tied to a culture that is ever-changing? While one can argue that gender ‘doesn’t matter’, it becomes apparent that even on digital platforms where one can afford to be an anonymous, participating presence in a community, the lived experience of that individual, in reality, is affected by physical attributes such as race and gender, and that is a reality that cannot be ignored.

Book Review: More Than a Glitch

In Feminist Text Analysis this semester I had the opportunity to read “More Than a Glitch: Confronting Race, Gender, and Ability Bias in Tech” by Meredith Broussard. I thought I knew a fair amount about bias in the technology space, but my eyes have been opened to problems I didn’t even know existed. Many think that the fix to these biases is simply to build better technology, as if technology were the only answer. This is what has been coined “technochauvinism”. It is akin to what Catherine D’Ignazio and Lauren Klein, authors of Data Feminism, call “Big Dick Data”: the belief that more data is always better and that data can never be wrong1. However, Broussard makes the argument that while technology, namely AI (artificial intelligence), can be helpful, it can also be detrimental to society. To promote equality and equity, the right answer to these issues may be to not use technology at all. She asks, “Why use inferior technology to replace capable humans when humans are doing a good job?”

Societal issues cannot be solved by tech alone, and tech can even deepen them. These issues, or “glitches”, within software, AI, etc., come across as simple fixes, but this may not be the case. Broussard explains this as the difference between social and technological fairness. To put it simply, computers are just “machines that can do math”2. While they can compute at a high level to produce an answer, they do not have feelings or experiences and therefore cannot be the entire solution to these highly complex problems. We can align this idea with the concept of “resistant reading” we spoke about this semester. What are the alternatives? What can we do to challenge the norm and provide better results for our communities?

Humans write code, and code may contain faults. AI is therefore not a neutral technology. These faults often come at the expense of already marginalized groups. Examples mentioned include, but are not limited to, predictive policing, AI facial recognition software, Google searches, testing technology for schools, lack of accessibility, reinforcing gender binaries, and even automated soap dispensers.
The intersection of race, gender, and technological advances is the main theme when being critical of these technologies. Technologies that need, or claim to need, race or gender as a data point have traditionally stored it as a boolean or a fixed select-one list within a user interface. We know that gender, and even so-called biological sex, is socially constructed. Before this was in the cultural zeitgeist, people were not aware that you could change your gender, and many databases made these fields uneditable. These ideals still persist today in coding as well as in legacy systems. Programmers are taught to optimize code in order to save memory when building programs, and a boolean is cheaper in memory than a string of text. The concept of elegant code therefore ends up enforcing the gender binary and promoting cis-heteronormativity (the sketch below illustrates the two modeling choices). Even the biggest names in tech, like Microsoft and Google, that promote themselves as LGBTQIA+ allies3 sometimes have trouble recognizing ze, hir, xie, etc. as acceptable words, or yield no results for them in the dictionaries of their respective word processing software.
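To make this concrete, here is a minimal sketch (my own illustration, not code from Broussard’s book; the class and field names are hypothetical) of the two modeling choices described above:

# A minimal sketch contrasting two ways a programmer might model gender.
from dataclasses import dataclass

# The "memory-efficient" legacy pattern: a single boolean enforces a binary.
@dataclass
class LegacyUser:
    name: str
    is_female: bool  # anyone outside the binary simply cannot be represented

# A more flexible pattern: store the user's own words and allow edits later.
@dataclass
class InclusiveUser:
    name: str
    gender: str = ""    # free-text self-description
    pronouns: str = ""  # e.g. "ze/hir", "they/them"

# The legacy schema forces a lossy choice; the flexible one does not.
print(LegacyUser(name="Alex", is_female=False))
print(InclusiveUser(name="Alex", gender="nonbinary", pronouns="ze/hir"))

The boolean version does use slightly less memory, but that “elegance” is exactly what hard-codes the binary into the system.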

Race, medicine, and technology is yet another area where these glitches take place. As mentioned previously, many software systems only allow a user to check off one race from a list when identifying themselves. However, multi-racial people exist! What are they to do? How are they supposed to identify in these scenarios? People don’t fit into the neat little boxes decided and created by software engineers. One example of this happening is with electronic medical records (EMRs). As soon as race is entered into these charts, the type of care that is received is unfortunately often linked to the color of someone’s skin. It is known that, historically, complaints of pain from Black women have often been ignored, whether from conscious or unconscious bias4. Social factors are also at play here, which is why so many more Black women die from birth-related events than other groups5; it’s not just from the prejudice of doctors. Not all technology works equally. Pulse oximeters, a very common device that measures a person’s blood oxygen level, often give false readings for those with darker skin tones6. Why would the FDA or any governing body decide it’s ok to sell and distribute this tech? Most likely it wasn’t tested on these underserved populations and therefore showed no issue in the compliance process. You can’t provide results for something that is never tested. The same can be said for AI technology.

At its core, AI is a way to provide high-level statistics. Data scientists train algorithmic models on datasets, and from those datasets the model is able to predict an outcome, with some probability, for new data that it is fed. What happens when the training data is missing important information? The model will be flawed and can potentially hurt those affected by it. As an example, Broussard ran her own breast cancer scans through an AI to see if she could detect the cancer herself instead of relying on a doctor. While she was able to, it took immense trial and error, required outside help, and consumed hundreds of hours to get the right answer. On the other hand, her doctor was able to tell her in minutes from looking at a simple scan. Sometimes it just doesn’t make sense to use these predictive technologies to replace expert humans.
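The train-then-predict loop, and the way a gap in the training data becomes a gap in the predictions, can be sketched in a few lines (this is my own toy example, not one from the book):

# Toy example: a model trained on data from one group fails on a group it never saw.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data drawn almost entirely from one population, where one simple rule holds.
X_train = rng.normal(loc=0.0, size=(500, 2))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# New data from an underrepresented population where a different rule applies.
X_new = rng.normal(loc=2.0, size=(200, 2))
y_new = (X_new[:, 1] > 2.0).astype(int)

print("accuracy on data like the training set:", model.score(X_train, y_train))
print("accuracy on the group missing from training:", model.score(X_new, y_new))

The model looks excellent on data that resembles what it was trained on and falls apart on the group it never saw, which is the core of Broussard’s warning.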

Finally, we need to be more careful in how AI is created and need to be transparent about how it works. AI is often described as a black box. Therefore Broussard suggests further governmental action as well as action by individuals. As a single person, it is also possible to be critical of these systems. Broussard calls this “Bullshit Detection”7. We can use the following three questions to be critical about AI or whatever software is being advertised to us.

Who is telling me this?

How do they know it is possible to achieve results?

What are they trying to sell me?

Additionally, all tech companies need to be held accountable for their actions, which is where algorithmic auditing comes in handy. Similar to accounting audits, there are organizations dedicated to understanding algorithms and providing feedback to companies for them to manage risk. Major players in that field include, but are not limited to, Cathy O’Neil and Julia Angwin.

O’Neil is the author of Weapons of Math Destruction and founded ORCAA8, a more traditional auditing advisory company focused on understanding big tech and algorithms in order to help companies mitigate risk. Angwin founded The Markup9, a news organization dedicated to watching and investigating big tech. What I found most interesting about The Markup is that they provide documentation to their readers on how to replicate their studies. This is exactly what is meant by increasing transparency in the tech space, especially for algorithmic issues.
    Ultimately, I agree with Broussard in challenging new technologies to make sure they are suited for the common good. She sums this up nicely by stating, “And if we must use inferior technology, let’s make sure to also have a parallel track of expert humans that is accessible to everyone regardless of economic means”10.


    Footnotes

    1  D’Ignazio, C., & Klein, L. (2020). 6. The Numbers Don’t Speak for Themselves. In Data Feminism. Retrieved from https://data-feminism.mitpress.mit.edu/pub/czq9dfs5

    2  Broussard, Meredith. More than a Glitch: Confronting Race, Gender, and Ability Bias in Tech. MIT Press, 2023.

    3 https://unlocked.microsoft.com/pride/, https://pride.google/

    4 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4843483/

    5 https://www.cdc.gov/healthequity/features/maternal-mortality/index.html

    6 https://hms.harvard.edu/news/skin-tone-pulse-oximetry

    7 Broussard, Meredith. More than a Glitch: Confronting Race, Gender, and Ability Bias in Tech. MIT Press, 2023.

    8 https://orcaarisk.com/

    9 https://themarkup.org/

    10 Broussard, Meredith. More than a Glitch: Confronting Race, Gender, and Ability Bias in Tech. MIT Press, 2023.

    Roundtable 2 Abstract: Feminist Theories and Humanistic Computing

    Our roundtable today loosely follows several themes within the larger question of what a feminist Text Analysis might look like for disciplines in the Humanities.

    We begin with Atilio Barreda’s argument for a more strategic application of transfer learning as a model of a self-consciously feminist textual analysis that would recognize and account for the situated and contingent status of Machine Learning.  

Similarly, Zico Abhi Dey, in his recent examination of OpenAI, implies that open-source language models, with their shift in scale and attention to computational costs over efficiency, might provide a feminist alternative to flawed Large Language Models.

    Our other panelists, Livia Clarete, Elliot Suhr, and Miaoling Xue, bring a much-needed multi-lingual perspective to current Text Analysis and the applications of Feminist Data principles. 

    Clarete examines communication styles in English- and Portuguese-language healthcare systems and asks how feminist notions of care and power-relations might inform our use of linguistic analysis and corpus analysis studies. 

    Suhr explores how the biases in data collection, language models, and algorithmic functions can exacerbate disproportions of power in dominant and minoritized languages and suggests that an intersectional feminist framework is essential to unpacking these issues. 

    Xue approaches another aspect of Humanistic computing—the construction of the historical past—by looking specifically at what is lost and what can be gained by applying current Western feminist models of digital archival reconstruction when approaching a corpus that differs in space, time, and significantly language. She concludes by considering the implications for representations of women in narrative history and the occluded labor of women in the production of texts, particularly in the so-called “invisible” work of editorial notation and translation.

    Taken together, these discussions animate and ground what Sara Ahmed calls “the scene of feminist instruction” which she identifies as “hear[ing] histories in words; . . . reassembl[ing] histories in words . . . attending to the same words across different contexts” (Ahmed 2016) and which could equally be a description of a responsible and informed feminist text analysis itself.

    Participants: Atilio Barreda, Bianca Calabresi, Livia Clarete, Zico Abhi Dey, Elliot Suhr, Miaoling Xue

    Book Review: Meredith Broussard’s Artificial Unintelligence: How Computers Misunderstand the World

    I started my reading of Meredith Broussard’s Artificial Unintelligence: How Computers Misunderstand the World with a set of haunting questions: “Will AI eventually replace my job?” and “Should we be worried about the future where AI might dominate the world?” The entire reading experience was enlightening and inspiring, with one of the most direct takeaways being that as a woman, I should not fear studying STEM, despite societal notions during my childhood that my brain may not be qualified to process complex logic and math. Broussard starts the book by illustrating some personal experiences like dissecting toys/computers to learn, providing women’s perspectives in the male-dominated STEM field, and writing criticisms on our current definitions of artificial intelligence. Each chapter stands alone as an independent piece, yet they all contribute to a cohesive narrative that guides readers from understanding the limitations of AI to the concept of “technochauvinism” and how this misplaced faith in tech superiority can lead to a harmful future.

The book starts with a critique of technochauvinism. She gives a wonderful description in the first chapter: “The notion that computers are more ‘objective’ or ‘unbiased’ because they distill questions and answers down to mathematical evaluation; and an unwavering faith that if the world just used more computers, and used them properly, social problems would disappear and we’d create a digitally enabled utopia.” (8) I suggest juxtaposing this statement with one from Chapter 7, where she writes, “Therefore, in machine learning, sometimes we have to make things up to make the functions run smoothly.” (104) Comparing the two arguments, I understand Broussard to be saying that a clean, repeatable, large-scale structure and mathematical reasoning do not guarantee a valid model in all contexts. When abstract aspects of human life, like emotions, memories, values, and ethics, are reduced to numeric representations along a single dimension, and often a misleading one, data and ML algorithms that are “unreasonably effective” (119) at calculation will intensify bias, discrimination, and inequity.

This book is quite scathing in its criticism of the technochauvinist male leaders of the tech industry; the sixth chapter contains quite direct praise and criticism. She recalls the mindset and working methods that tech entrepreneurs and engineers have been using to guide the world since the 1950s: bold innovation and disregard for order are two sides of the same coin. What intrigues me is how feminist voices can lead to specific interventions in such an environment. This book was written in 2018, and as of 2023, with the increasingly rapid development of AI, we are witnessing a ‘glass cliff’ phenomenon for female tech leaders. Consider the news about the newly appointed Twitter CEO Linda Yaccarino and the concept of the glass cliff: women are often promoted to leadership roles during a crisis and are, therefore, set up for failure.

In the book’s first chapter, Broussard emphasizes what counts as a failure and why we should not fear failure in scientific learning. I find it fascinating to connect Chapter 6 with the recent news about the “glass cliff,” which reminds us to consider the definition of failure dialectically. The glass cliff for women leaders further alerts us that failure can be interpreted one-sidedly from data, and that the intervention of women in a technochauvinist environment might itself be framed as a failure. It raises questions about how we can move beyond data and computational methods to consider feminist interventions in technological development.

Regarding feminist interventions, the example of self-driving robot cars in Chapter 8 provides a unique insight. We talked about Judith Fetterley’s concept of resistant reading in class, and about scalar definitions of gender at the beginning of the course. One difference between the two kinds of robot-driving algorithms mentioned by Broussard reminds me of both discussions. I will try to explain my idea here and make some connections.

1. Both resistant reading and the scalar definitions of gender try to find alternative interpretations and to respect non-mainstream experiences.
2. A robot car model by CMU that uses the idea of the Karel problem (a grid-world robot exercise used to teach programming) challenges the preexisting assumption that we need to build a robot car that mimics human perception. Instead, they propose using the machine as a machine: collect data to build 3D maps, do the calculations quickly, and feed the results back into driving along a grid-like path. This is just like how humans invented airplanes (we did not build a mechanical bird but discovered another model of flying).

If you think about 1 and 2 together: the workflow described above breaks down the idea of reconstructing human intelligence with a machine, or of replacing humans with machines, and offers an alternative way of using machines and algorithms. This narrower definition of machine learning/AI is also a restrained use of the machine (a toy sketch of the grid idea follows below). I believe this case represents an instance of not excessively breaking the rules and not advocating technochauvinism. They respect humanism yet still achieve innovative results. It could probably also serve as an abstract example of feminist intervention.
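To make the “drive on a grid” idea concrete, here is a toy sketch (my own illustration, not CMU’s actual system): once the environment has been reduced to a map of drivable and blocked cells, path planning becomes an ordinary search problem rather than an imitation of human perception.

# Toy example: plan a route with breadth-first search over a precomputed grid map.
from collections import deque

def shortest_path(grid, start, goal):
    """Return a list of cells from start to goal, avoiding obstacles (1s)."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        r, c = path[-1]
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(path + [(nr, nc)])
    return None  # no route exists

# A small "3D map" flattened into drivable (0) and blocked (1) cells:
city = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(shortest_path(city, start=(0, 0), goal=(2, 3)))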

    And last, let us return to an example of textbook distribution in Chapter 5. Broussard believes that we have overestimated the role of algorithms and artificial intelligence in textbook distribution, resulting in many students without access to books. The sophistication of textbook database models cannot prevent the chaos of input data, and the disorder of textbook distribution and management cannot be solved by large models. Therefore, she visited several schools and discovered the management chaos behind big data. This action reminds me of the data cleaning phase we mentioned earlier, where we would fill in data to maintain a controllable cleaning structure. This kind of on-site investigation for problem identification might be considered another form of data cleaning. Although it seems to bring chaos to the big data model, her visit accurately identified the root cause. Therefore, if human problems are not solved, technology ultimately cannot solve societal problems.

    Overall, Broussard raises many challenging questions, shares her perspective as a woman in STEM, and presents an argument for a balanced approach to technology.

    Book Review – “Raw Data” Is an Oxymoron

Data, in the modern-day world, became such a phenomenon that people began (and continue) claiming that it provides an objective set of facts (the fundamental stuff of truth itself) and that, consequently, methods and tools for its analysis would truly revolutionize everything and allow us a “raw” and “unfiltered” understanding of the world around us.


    The book “Raw Data is an Oxymoron” stands as an answer to that claim, critiquing and analyzing the fundamental element of the process itself – data. While much has changed ever since the book was released in 2013 (like the advent of large language models and other AI-powered automated tools released daily), it can be taken as a rather early work on the subject to uncover what data really is.


The book is actually a compilation of essays providing different perspectives on data through different lenses of observation. While it is divided into chapters, each an essay by a different author or authors, every set of two chapters can be taken as a different “topic” or “section” of the book (as outlined in the introduction).


The first section takes a historical and cultural look at some of data’s early beginnings (including how the word came to be, its meanings, and its development over time) and offers a critique of early modern arithmetic (I believe this chapter is included because data is tied to math, but it felt a little out of place because it delved more into a critique of arithmetic than of data). The second section looks at the formulation, creation, and collection of data in different disciplines (one being economics, the other astronomy). The third section collects essays on databases, data aggregation, and information management, exploring the ethical implications of data and how it can be used to influence and manipulate. The fourth and final section relates to the use of data in society: a sort of exploration of the future of data, how it came to impact people in the (then) current-day world, and the challenges that large (and ever-growing) sets of data pose as data itself seems to be becoming an organism of its own.


As the title suggests, what the book discusses through its essays and chapters is the idea that data is never “raw” and is rather “cooked”. Even considered alone, data is not raw, in the sense that it is a reflection of a world which itself has its biases and issues, and so those are transferred over. Data is also “conceptualized” and “collected”, and that process carries the biases of the humans, or human-made tools, collecting it. Data must also be manipulated, cleaned, structured, and processed before any analysis can be performed and observations can be made; these steps end up “cooking” the data we first begin with. When an analysis is then performed using particular models run through particular tools, both the models and the tools have biases embedded in them and contexts they do not consider, and that applies to any field, whether economics, astronomy, or others. Finally, even after the analysis is done, the results have to be interpreted, which brings in the biases of the humans doing the interpreting.


So, what we essentially understand is that data can never be raw or objective: at every stage it contains, grows, and compounds its limitations and biases. It is important to keep this in mind, since the analysis and conclusions we draw from data have wide implications, whether in public policy, politics, education, or business.


The book connects to our class conversation at every stage we have discussed. By simply replacing “data” with “text” – the element of critique in our class – we can still apply the fact that text itself, and the processes performed to collect, process, clean, analyze, and interpret it, all carry hidden biases, which gives rise to the need to approach the problem from a feminist perspective.


Two quotes really showed the duality that data (text, in our class discussions) possesses: “Data require our participation. Data need us.” and “Yet if data are somehow subject to us, we are also subject to data.” This connects to our discussions that data requires the labor and participation of humans and does not exist without some form of human intervention, and, quite equally, that we are subject to data itself. Being subject to data, we are ourselves shaped by it. This is further discussed in an essay with the line, “The work of producing, preserving, and sharing data reshapes the organizational, technological, and cultural worlds around them.”


Another interesting idea I found can be summarized by the quote, “When phenomena are variously reduced to data, they are divided and classified, processes that work to obscure – or as if to obscure – ambiguity, conflict, and contradiction.” That is a point of concern to us, especially as feminist scholars critiquing the topic: when we reduce real-world phenomena to data, we are in many cases simply oversimplifying things and losing the depth and complexity of the topic (for example, classification algorithms can force real-world phenomena into a binary when they actually sit on a spectrum).


The chapter on economic theory and data also ties into our conversations about how “all models are wrong, but some are useful”, showing that economists’ use of data and the modeling they do are actually just approximations of reality, only useful “for the formulation in which they appear.”


Something that I had not thought about earlier but now completely agree with can be summarized in the following sentence from the book: “Raw data is the material for informational patterns still to come, its value unknown or uncertain until it is converted into the currency of information.” It made me think about how data is only valuable and useful to us when it can be converted into the “currency of information”, in other words when it provides us with information that is useful or profitable (in whichever sense the word applies, whether purely in business or even in academia). This suggests that data has, in most cases, become a utilitarian as well as a capitalist tool.


    Also, the book does touch on the topic of how power dynamics influence the production, analysis, and interpretation of data and how that can be used to influence and manipulate as well as exert power – which is an important factor to consider when building a feminist critique.


While the book is now dated, since many more technological leaps (and dangers) have emerged within the last decade, it does provide a good introduction to the topic, especially given that it is a relatively early work in the field of data studies. It offers a rather introductory, theoretical understanding of data, but it does not provide a practical framework that can be used to develop strategies to deconstruct and counteract the issues being raised (something that later works like D’Ignazio and Klein’s Data Feminism achieve).


    However, it does produce early thoughts and perspectives that can be applied to the landscape of computational text analysis.


For example, the central idea of the book – that data is “cooked”, or in other words constructed – is a core topic in computational text analysis (as well as for feminist scholars), and it prompts all CTA work to be well documented and thought through in terms of how biases and assumptions influence it.


The book discusses the importance of context when it comes to understanding data, which is crucial and a hotly debated topic in the field.


It also touches not just on how humans influence data, but on how data evolves to the point of almost mimicking a living organism itself, one that begins to make us its subjects and to influence and control us. This is important to consider in modern computational text analysis, especially from a feminist perspective, when algorithms and AI dominate the field and influence society in every manner. However, the book itself does not consider the current-day state of the topic.


Finally, the book draws on ideas from a diverse set of disciplines, be it an anthropological account, an economic analysis, a humanities critique, or scientific development. That matters for computational text analysis, which is itself highly interdisciplinary and can benefit from the incorporation of insights from different disciplines to provide a more nuanced understanding. Building a feminist text analysis likewise requires us to bring diversity to the perspectives of participants.

    Response Blog Post – originally posted 4/8/23

I’ve often wondered why anyone would need to build an algorithm that would produce the gender or ethnicity of the author of a text. To me, it feels a bit creepy and teeters on the verge of a Big Brother reality. One of the assigned readings that relates to this was a Medium article entitled “AI Ethics Identifying Your Ethnicity and Gender” by Allen Jiang.

    Blog articles are meant to be approachable to all audiences and even though this is a highly divisive topic, at first I thought the author did a good job of explaining AI and how it can be used to understand ethnicity and gender. Jiang gave a few business case examples as to why one would do this including, but not limited to, better customer experience and better customer segmentation.

However, one of the first sentences, as discussed in class, is as follows: “This is an analogous question to: if we had complete discretion, would we teach our children to recognize someone’s ethnicity.” The author uses this comparison as the rationale for this type of AI model. On the surface, one could glance at this and continue reading without question. But taking a minute to think, this analogy is not valid.

The author is equating the human experience to computers. We teach children to be accepting of all people, even when they may look different from themselves. We don’t ask a child to point out their friend’s race or ethnicity. While we need to be aware of our surroundings in order to be sensitive at a higher level, this is not what this AI model is being used for. Additionally, humans learn in a completely different manner than computers. Human learning is based on experiences and emotions. Computers don’t have emotions or experiences. Computers are programmed to make decisions, which is what the author equates with learning, based on decision trees, dictionaries, and other methods. This is procedural knowledge, not experiential.

Even if the author were to use the data collected (assigning gender to text from celebrity tweets) for business purposes, what would be the impact? One impact could be reinforcing stereotypes and gender binaries. The results from the experiment could mislead the business, so that it may not truly understand its customers’ needs, wants, and preferences. Additionally, looking at the results of the experiment, the accuracy rate is only 72%, just 22 percentage points above the 50% baseline of simply guessing whether a tweet was written by a man or a woman. Ultimately, a poor model leads to a poor proxy.

    Perhaps one day I’ll come across a compelling argument for why AI would be helpful in detecting gender or ethnicity but it’s not today.

    Weapons of Math Destruction – a deep dive into the recidivism algorithm

Automation has been occupying more and more space in the productive system of today’s society, and algorithms are at the core of it. They are logical sequences of instructions that enable autonomous execution by machines. The expansion of these calculation methods is largely due to software able to collect an ever more significant amount of data from different sources. It is embedded in users’ everyday tools such as Google search, social media networks, the movie and music recommendations of Netflix and Spotify, personal assistants, video games, and surveillance and security systems.

Computers are much more efficient at repeating endless tasks without making mistakes. They connect information faster and establish protocols to log input and output data. They don’t get distracted, tired, or sick. They don’t gossip or miscommunicate with each other. The Post-Accident Review Meeting on the Chernobyl Accident (1986) found that poor team communication and sleep deprivation were major issues behind the disaster.

In 2018, the Blue Brain project revealed that brain structures are able to process information in up to 11 dimensions. Computers, on the other hand, process zillions of dimensions and are able to uncover patterns that the human brain could not imagine. The concept of big data goes beyond the number of cases in a dataset; it also involves the number of features/variables used to describe a phenomenon.

Of all the advantages of computers, the most important one is their inability to be creative – at least so far. If I go to bed trusting that my phone is not planning revenge or plotting against me with other machines, it is because computers don’t have wills of their own. Computers don’t have an agenda. Human beings do. Public opinion has become more aware of the impact of automation on the global economy. According to a 2019 Pew Research study, 76% of Americans believe that work automation is more likely to increase inequality between rich and poor, 48% believe it will hurt workplaces, and 85% favor limiting machines to dangerous or unhealthy jobs.

Computers uncover patterns; they don’t create new ones. Machines use data to find patterns from past events, which means their predictions will replicate the current reality. If the reliance is on algorithms, the world will continue as it is. In Weapons of Math Destruction, published in 2016, Cathy O’Neil explores the societal impact of algorithms and adds a new layer to how automation has propagated inequality by feeding biased data to models. O’Neil introduces the concept of “weapons of math destruction,” referring to big data algorithms that perpetuate existing inequality. She highlights three main characteristics of WMDs: they are opaque, making it challenging to understand their inner workings and question their outcomes; they are scalable, allowing biases to be magnified when applied to large populations; and they are difficult to contest, often used by powerful institutions that hinder individuals from challenging their results. Extending her own example, if we based educational decision-making policies on college data from the early 1960s, we would not see the same level of female enrollment in colleges as we do today. The models would have primarily been trained on successful men, thus perpetuating gender and racial biases.

This post explores one of the examples she gives in her book: the recidivism algorithm. One illustrative case was published in May 2016 by the nonprofit ProPublica. The article “Machine Bias” denounced the impact of biased data used to predict the probability of a convicted person committing new crimes in the risk scores of the commercial software Correctional Offender Management Profiling for Alternative Sanctions (COMPAS). The algorithms used to predict recidivism were logistic regression and survival analysis; both models are also used to predict the probability of success of medical treatment among cancer patients.

“The question, however, is whether we’ve eliminated human bias or simply camouflaged it with technology. The new recidivism models are complicated and mathematical. But embedded within these models are a host of assumptions, some of them prejudicial. And while Walter Quijano’s words were transcribed for the record, which could later be read and challenged in court, the workings of a recidivism model are tucked away in algorithms, intelligible only to a tiny elite.”

To calculate risk scores, COMPAS analyzes data and variables related to substance abuse, family relationships and criminal history, financial problems, residential instability, and social adjustment. The scores are built using data from several sources, but mainly from a survey of 137 questions. Some of the questions include “How many of your friends have been arrested”, “How often have you moved in the last twelve months”, “In your neighborhood, have some of your friends and family been crime victims”, “Were you ever suspended or expelled from school”, “How often do you have barely enough money to get by”, and “I have never felt sad about things in my life”.
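As a rough illustration of how answers like these can be turned into a score, here is a toy logistic regression sketch (my own example with made-up data; COMPAS’s actual model is proprietary and far more elaborate):

# Toy example: turning survey-style answers into a recidivism "risk score".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical features loosely echoing the survey themes: friends arrested,
# residential moves in the last year, financial-stress indicators.
X = rng.poisson(lam=[2, 1, 3], size=(1000, 3)).astype(float)

# Synthetic labels: whoever scores high on these proxies is marked as having
# reoffended, which is exactly how historical bias gets baked into training data.
y = (X.sum(axis=1) + rng.normal(scale=2, size=1000) > 7).astype(int)

model = LogisticRegression().fit(X, y)

# The "risk score" for one new defendant is just the predicted probability.
defendant = np.array([[4, 2, 5]])
print("predicted recidivism risk:", model.predict_proba(defendant)[0, 1])

Because the labels themselves encode who has historically been policed and arrested, the resulting “score” inherits those assumptions, which is precisely O’Neil’s point.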

According to the New York State Division of Criminal Justice Services in 2012, the “[COMPAS-Probation Risk] Recidivism Scale worked effectively and achieved satisfactory predictive accuracy”. The Board of Parole currently uses the score for decision-making. Data compiled by the nonprofit Vera Institute show that 40% of people were granted parole in New York in 2020. In 2014, Connecticut reached a parole grant rate of 67%, Massachusetts 63%, and Kentucky 52%.

Former U.S. Attorney General Eric Holder commented about the scores that “although these measures were crafted with the best of intentions, I am concerned that they inadvertently undermine our efforts to ensure individualized and equal justice […] may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”

Race, nationality, and skin color were often used in making such predictions until about the 1970s, when it became politically unacceptable, according to a survey of risk assessment tools by Columbia University law professor Bernard Harcourt. Despite that, these tools still target underprivileged communities unable to access welfare. In 2019, the poverty rates of African-Americans and Hispanic-origin groups were 18.8% and 15.7%, respectively, compared to 7.3% for white people.

Assessments of social projects have shown a decrease in violence among vulnerable communities assisted by income-transfer programs in different parts of the world. In the US, the NGO Advance Peace conducted an 18-month program in California targeting the community members most at risk of perpetrating gun violence or being victimized by it. The program includes trauma-informed therapy, employment, and training. The results show a 55% decrease in firearm violence after the implementation of the program in Richmond. In Stockton, gun homicides and assaults declined by 21%, saving $42.3M–$110M in city expenses over the two-year program.

In this sense, using algorithms will propagate the current system. Predictions reinforce a dual society in which the wealthy are privileged to receive personalized, humane, and regulated attention, as opposed to vulnerable groups that are condemned to the results of “smart machines”. There is no transparency in those machines, and no effort from companies or governments to educate public opinion about how the decisions are made. In effect, a scoring system is created to evaluate the vulnerable. Social transformation will come from new policies directed at reducing inequality and promoting well-being.

    Abstract – Identifying Sexism in Text

Examining social and cultural questions using computational text analysis carries significant challenges. Texts are socially and culturally situated; they reflect the ideas of both authors and their target audiences. Such interpretation is hard to incorporate into computational approaches.

Research questions should be pursued through data selection, conceptualization, and operationalization, and end with analysis and interpretation of results.

    The person who produces sexist language is not given any space for productive change, but may simply become more entrenched in their position (Post-feminist analysis, 236)

Attempts at reforming sexism in language can fail if they simply focus on the eradication of certain phrases and words.

    Because of this, approaching digital text collection from a literary textual critic’s perspective might require questioning the context behind the digital text. Rather than working on raw text and relying on results produced by machine processing, it will make more sense to understand the environment, reason and validity of the information provided in the text.

Instead of taking data at face value and looking toward future insights, data scientists can first interrogate the context, limitations, and validity of the data under use. This being said, a feminist strategy for considering context is to consider the cooking process that produces “raw” data (Klein and D’Ignazio, “The Numbers Don’t Speak for Themselves”, 14).

Researchers too easily attribute phraseological differences to gender when in fact other intersecting variables might be at play. As far as gender can be counted as performative language use in its social context, it is important to avoid dataset and interpretational bias. (“These Are Not the Stereotypes You Are Looking For”, 15)

• D’Ignazio, Catherine, and Lauren Klein. Data Feminism, chapters “3. On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints” and “6. The Numbers Don’t Speak for Themselves.” Published March 16, 2020.
• Nguyen, Dong, Maria Liakata, Simon DeDeo, Jacob Eisenstein, David Mimno, Rebekah Tromble, and Jane Winters. “How We Do Things With Words: Analyzing Text as Social and Cultural Data.” Published online August 25, 2020.
• Koolen, C., and A. van Cranenburgh (2017). “These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution.” In Proceedings of the First Ethics in NLP Workshop (pp. 12–22).

Abstract Proposal: Men in Wonderland: The Lost Girlhood of the Victorian Gentleman

Catherine Robson published Men in Wonderland: The Lost Girlhood of the Victorian Gentleman in 2001 with Princeton University Press. She is an assistant professor of English at the University of California, Davis, where she specializes in nineteenth-century British literature and culture. In this book, she explores the fascination with little girls in Victorian culture through 19th-century literature by British male authors. In doing so, she reveals the link between the idealization of little girls and a widespread fantasy of male development. Robson’s argument is that the concept of ‘little girls’ during this era offered an adult male the best opportunity to reconnect with his own lost self.

The individual authors whose work is explored in her book include Wordsworth, De Quincey, Dickens, Ruskin, and Carroll. Alongside these works of literature, she draws comparisons with cultural artifacts of the era, including conduct books, government reports, fine art, and popular journalism.

Alongside Robson’s close reading of this literature, one could add a text analysis of these works to reveal patterns and findings that coincide with Robson’s own analysis. Such distant readings of childhood and masculinity in the Victorian era could contribute to our own contemporary Western understanding of masculinity and femininity in pop culture.

    Unveiling the Patient Journey: A Gender Perspective on Chronic Disease-Centered Care

    Abstract

Healthcare is an industry in which customers want to deal with humans. No machines: they want to connect with real people in their most vulnerable moments. In this context, women are more likely to be at the center of the patient journey, from taking their loved ones to a doctor’s appointment to being the primary caregiver for kids with chronic diseases (such as asthma).

The burden of care still falls on women in the form of unpaid work. However, it’s not any better for men. In 2019, the Cleveland Clinic conducted a survey which found that 72% of respondents preferred doing household chores like cleaning the bathroom to going to the doctor; 65% said they avoid going to the doctor for as long as possible; 20% admitted they are not always honest with their doctors about their health. On average, men die younger than women in the United States: American women had a life expectancy of 79 years in 2021, compared to 73 for men (CDC, 2022).

The goal of this research is to explore gender differences in the patient journey by applying a corpus linguistic approach to manually create and annotate a dataset about chronic disease in Portuguese and English using social media data from Facebook, Instagram, YouTube, and Twitter. I then apply text analysis methods to describe the dataset, and finally compare the classification results of generative AI with those of traditional machine learning text analysis.

This analysis also weighs the benefits and detriments of performing such an analysis. Despite the investment of language model resources, it is valuable to use AI to uncover gender inequalities. The final goal is to open a discussion about how to lift the burden from women while also empowering men to feel comfortable about their own health, and to open space for discussing new methods that explore different gender classifications.

Goal: This proposal describes a study of how corpus linguistics and text analysis methods can be used to support research on language and communication within the context of healthcare, using social media data about chronic disease in English and Brazilian Portuguese.

The specific goals involve:

1. Performing a literature review based on previous studies and benchmark datasets in the healthcare field – process finished.
2. Creating a dataset with social media posts from 2020 to 2023 from social networks such as Twitter, YouTube, Facebook, and local media channels. The dataset is composed of around 7k posts and specifies the patient’s gender, type of treatment/medication, and number of likes and comments – process already finished: dataset here.
3. Categorizing the corpus according to gender and the patient journey framework, from initial symptoms to diagnosis, treatment, and follow-up care – process already finished: dataset here.
4. Documenting the dataset and creating a codebook explaining the categories and the criteria for the categorization process – process already finished: codebook and the connection of the ontologies.
5. Applying categorization based on GPT-3 results – in progress.
6. Comparing the manual classification with the GPT-3 results (a minimal sketch of this comparison follows below).
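A minimal sketch of step 6, assuming the manual and GPT-3 labels are available as two parallel lists (the labels and values below are hypothetical, not the project’s real data):

# Comparing manual annotations against GPT-3 assigned labels (toy data).
from sklearn.metrics import accuracy_score, cohen_kappa_score, classification_report

manual_labels = ["diagnosis", "treatment", "symptoms", "follow-up", "treatment"]
gpt3_labels   = ["diagnosis", "treatment", "treatment", "follow-up", "treatment"]

print("agreement (accuracy):", accuracy_score(manual_labels, gpt3_labels))
print("Cohen's kappa:", cohen_kappa_score(manual_labels, gpt3_labels))
print(classification_report(manual_labels, gpt3_labels, zero_division=0))

Reporting chance-corrected agreement such as Cohen’s kappa alongside raw accuracy makes the comparison between the manual codebook and the generative model easier to interpret.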

    Literature review

    Several linguistic analyses and corpus analysis studies have investigated the patient journey in healthcare, exploring different aspects of communication between patients and healthcare providers, patient experience, and clinical outcomes. One area of research has focused on the use of language by healthcare providers to diagnose and treat patients. For example, a study by Roter and Hall found that physicians used a directive communication style, using commands and suggestions, more often than a collaborative communication style when interacting with patients. This style can create a power imbalance between the physician and patient, potentially leading to dissatisfaction or miscommunication. 

Another area of research has investigated patient experience and satisfaction. A corpus analysis study by Gavin Brookes and Paul Baker examined patient comments to identify factors influencing patient satisfaction with healthcare services during cancer treatment. They found that factors such as communication, empathy, and professionalism were key drivers of patient satisfaction.

    Finally, several studies have investigated the use of language in electronic health records (EHRs) to improve patient care and outcomes. A corpus analysis study by Xi Yang and colleagues examined the use of EHRs and found that natural language processing techniques could effectively identify relevant patient information from unstructured clinical notes.

Overall, the literature of linguistic analyses and corpus analysis studies on the healthcare patient journey suggests that communication and language play a critical role in patient care and outcomes. Effective communication between patients and healthcare providers, as well as clear and concise language in patient education materials and EHRs, can lead to improved patient satisfaction, empowerment, and self-management.

    Method overview

• Data collection: collecting data based on keywords on social media;
• Coding data: using qualitative coding and annotation;
• Data analysis: performing linguistic and statistical analysis (a minimal sketch of the pipeline follows below).
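A minimal sketch of this pipeline, assuming hypothetical keywords and posts rather than the project’s real data:

# Toy pipeline: keyword-based collection, light qualitative coding, frequency analysis.
from collections import Counter
import re

KEYWORDS = {"asthma", "inhaler", "diagnosis"}   # assumed search terms

posts = [
    "Finally got my son's asthma diagnosis after months of appointments.",
    "New inhaler technique video!",
    "Unrelated post about the weather.",
]

# 1. Data collection: keep only posts matching at least one keyword.
collected = [p for p in posts if KEYWORDS & set(re.findall(r"\w+", p.lower()))]

# 2. Coding data: attach a manual annotation (a placeholder label here).
coded = [{"text": p, "journey_stage": "diagnosis"} for p in collected]

# 3. Data analysis: a basic word-frequency description of the coded corpus.
tokens = [w for item in coded for w in re.findall(r"\w+", item["text"].lower())]
print(Counter(tokens).most_common(5))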

    References

    “Doctors Talking with Patients—Patients Talking with Doctors: Improving Communication in Medical Visits.” Clinical and Experimental Optometry, 78(2), pp. 79–80

    Yang, X., Chen, A., PourNejatian, N. et al. A large language model for electronic health records. npj Digit. Med. 5, 194 (2022). https://doi.org/10.1038/s41746-022-00742-2

    Peterson KJ, Liu H. The Sublanguage of Clinical Problem Lists: A Corpus Analysis. AMIA Annu Symp Proc. 2018 Dec 5;2018:1451-1460. PMID: 30815190; PMCID: PMC6371258.

Adolphs, S., Brown, B., Carter, R., Crawford, C. and Sahota, O. (2004) ‘Applying Corpus Linguistics in a health care context’, Journal of Applied Linguistics, 1, 1: 9–28.

    Adolphs, S., Atkins, S., Harvey, K. (forthcoming). ‘Caught between professional requirements and interpersonal needs: vague language in healthcare contexts’. In J. Cutting (ed.) Vague Language Explored Basingstoke: Palgrave

Skelton, J. R., Wearn, A. M., and Hobbs, F. D. R. (2002) ‘“I” and “we”: a concordancing analysis of how doctors and patients use first person pronouns in primary care consultations’, Family Practice, 19, 5: 484–488.

Biber, D. and Conrad, S. (2004) ‘Corpus-Based Comparisons of Registers’, in C. Coffin, A. Hewings, and K. O’Halloran (eds) Applying English Grammar: Functional and Corpus Approaches. London: Arnold.