Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society — Sun-Ha Hong (NYU Press, 2020)
Starting from the question, “What counts as knowledge in our data-driven society?” Sun-Ha Hong argues that technologies of datafication are not only transforming the conditions for producing, accessing, wielding, and validating information—they are also articulating a new set of normative demands: that we, as liberal subjects, come to know our social/political world and ourselves within that world through these technologies, regardless of the technologies’ imperfections and contradictions.
Hong maintains that we are called on to live, reason, and act through “data-driven insights.” Confronted with technological systems of power and control so vast and complex that “transparency” becomes burdensome rather than empowering, the public becomes overwhelmed by surveillance dragnets and driven to speculation and paranoia. At the same time, provided with a surfeit of recursive data about our desires, habits, and even our bodies, we are encouraged to privilege machinic forms of intelligence and analysis over human experience and affect. In so doing, we are rewarded for tailoring our behaviors to be compatible with the machines surrounding us—and thus compatible with the institutions behind the machines. For Hong, “datafication is turning bodies into facts: shaping human life, desire, and affect into calculable and predictable forms, and in doing so, changing what counts as the truth about those bodies in the first place” (4).
Operating at the nexus of sociology of knowledge, history of science, and critical data studies, Technologies of Speculation grounds its examination of the epistemological consequences of datafication in two sites: the 2013 Snowden affair (revelations of mass-scale government surveillance) and automated self-tracking (e.g., Fitbit). Though these research sites seem distant at first glance, Hong uncovers remarkable commonalities between state surveillance on the one hand and self-surveillance on the other. In both, the process of turning bodies into facts results not in crystalline objectivity but in troubling gaps and asymmetries: “If data expands the vistas of human action and judgement, it also obscures them, leaving human subjects to work ever harder to remain legible and legitimate to the machines whose judgment they cannot understand” (7).
So how does this often-philosophical investigation of data-driven, algorithmic intelligence and its discontents relate back to our realm of text analysis? We might start by considering how powerful regimes like the “intelligence community” confront uncertainty when saturated in data. In Chapter 5, “Bodies into Facts,” Hong explores the fragile, improvisational techniques used to fabricate so-called “insights” out of massive reams of data and metadata. Attending to public discourse on post-9/11 counter-terrorism and surveillance, he argues that beneath the guise of objectivity, datafication is implemented through “inconsistent and locally specific strategies governing how uncertainties are selected and legitimated as predictive insights” (115). Hong specifies three techniques by which fabrications achieve the status of knowledge:
- Subjunctivity: “As-if” reasoning, in which a hypothetical or unproven situation is operationalized for decision-making (e.g., acting to prevent the terrorist threat that could have happened; acting as if you are being surveilled).
- Interpassivity: Something or someone else knows or acts for the subject, in their stead. A kind of subjective outsourcing whereby one is not responsible for a statement or belief (e.g., a rumor heard elsewhere), yet that belief or statement—neither disavowed nor claimed as one’s own—nonetheless provides the basis for actions or opinions. And the other that knows and acts for the subject is not just other people—it can also be a machine.
- Zero-Degree Risk: The dilemma of calculating risk out of uncountable streams of data with the goal of total de-risking: guaranteeing that a single outcome (e.g., a terrorist attack) never occurs. Statistics, probabilities, and equations are invoked to mitigate fundamental, radical uncertainty and to legitimate costly regimes of surveillance and control (e.g., the NYPD counter-terrorism budget and the attacks that “might have been”).
Though Hong focuses explicitly on terrorist threats and state surveillance, we can consider how these techniques of fabrication may be operative in computational text analysis. For instance, “subjunctivity” undergirds many classificatory schemes. Cameron Blevins and Lincoln Mullen’s package for inferring gender from historical first-name data works as if gender were a binary, a limitation they explore in depth in the article introducing the method (Blevins and Mullen). More fundamentally, tokenization generally requires us to treat texts as if they lacked punctuation or line breaks. We must make locally specific choices about whether capitalized words should be treated differently, choices that then reshape the predictive insights we can make about frequency, association, topics, and sentiments. Similarly, training a model might be an instance of “interpassivity” when the model learns from a training set annotated by hands other than our own. In turn, we encounter issues analogous to risk calculation when weighing questions of model fit and overfit.
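The “as-if” character of tokenization can be made concrete in a short sketch. The toy sentence and the two normalization regimes below are my own illustrative assumptions; the point is only that each regime fabricates a different set of “facts” about the same text:

```python
import re
from collections import Counter

text = "The Terror of data! The terror, the data-driven terror."

# As-if choice 1: act as if case and punctuation carry no meaning.
tokens_folded = re.findall(r"[a-z]+(?:-[a-z]+)?", text.lower())
freq_folded = Counter(tokens_folded)

# As-if choice 2: preserve capitalization, so "The"/"the" and
# "Terror"/"terror" are counted as distinct word types.
tokens_cased = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)?", text)
freq_cased = Counter(tokens_cased)
```

Under the first regime, “the” and “terror” each occur three times; under the second, no word occurs more than twice. Any downstream claim about frequency or association inherits whichever subjunctive choice we made here.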
But given the scant references to “text” qua text in Technologies of Speculation, it seemed to me that the monograph provided few, if any, specific new avenues of research or new methodologies we might employ in text analysis. Instead, Sun-Ha Hong asks us to step back and consider the moral dimensions of the kind of knowledge production in which we are involved. He writes in the acknowledgements that “this book is about values: the subservience of human values to the rationality of technology and capital and, specifically, the transmutation of knowledge from a human virtue to raw material for predictive control” (201). From this standpoint, methods that may once have appeared value-neutral become morally charged. The study of authorship and style through quantification—sentence length, punctuation, vocabulary size—submits a humanist endeavor to machinic rationality. It isolates “style” from meaning and social/political context, reduces a lifetime of thought to a numeric signature, and eschews genealogies of interpretation and contestation among human readers, while at times obscuring its own algorithmic inner workings.

Consider another case: binary classification using TensorFlow neural networks. Say we wanted to take a large corpus of prose paragraphs and sort it into “prose poems” and “not prose poems.” We have training data: a wealth of prose poetry ranging from Baudelaire to Rosmarie Waldrop, intermixed with paragraphs from magazine pieces, novels, and other texts. But after we train the model, we cannot ask it why or how it chose to call one paragraph a poem and another not. Its reasoning is obscured, and in Hong’s framework, the knowledge of what constitutes poetry has been transmuted into “raw material for predictive control,” where before, for better or worse, it was thought “a virtue.” Perhaps this is all beside the point, but at a time when humanism can feel under threat, when we are hailed by machine intelligence to shape our behaviors according to the whims of the algorithm, the stakes of computational text analysis can feel heightened, even dangerous, after reading Hong’s book.
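To see how little survives the quantification of style, here is a minimal sketch of the stylometric reduction described above, in plain Python. The sample passage and the three-feature “signature” are my own illustrative assumptions, not anything proposed by Hong:

```python
import re
import string

def stylometric_signature(text: str) -> dict:
    """Reduce a passage to a handful of numbers: the 'numeric
    signature' that stylometry substitutes for meaning and context."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    punctuation = [ch for ch in text if ch in string.punctuation]
    return {
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "punctuation_count": len(punctuation),
        "vocabulary_size": len({w.lower() for w in words}),
    }

# An invented sample paragraph, standing in for any passage of prose.
passage = ("The archive was vast. It promised everything, and it explained "
           "nothing. We counted the words, the commas, the sentences; the "
           "meaning, we set aside.")

signature = stylometric_signature(passage)
```

Whatever the passage says, all that remains afterward is a dictionary of numbers, ready to serve as a feature vector for the kind of classifier discussed above—and equally silent about why those numbers should count as knowledge of style.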
Works Cited
- Blevins, Cameron, and Lincoln Mullen. “Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction.” Digital Humanities Quarterly, vol. 9, no. 3, Dec. 2015.
- Hong, Sun-ha. Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society. New York University Press, 2020.