Author Archives: Laird Gallagher

Book Review: Technologies of Speculation

Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society — Sun-Ha Hong (NYU Press, 2020)

Starting from the question, “What counts as knowledge in our data-driven society?” Sun-Ha Hong argues that technologies of datafication are not only transforming the conditions for producing, accessing, wielding, and validating information—they are also articulating a new set of normative demands: that we, as liberal subjects, come to know our social/political world and ourselves within that world through these technologies, regardless of the technologies’ imperfections and contradictions.

Hong maintains that we are called on to live, reason, and act through “data-driven insights.” Confronted with technological systems of power and control so vast and complex that “transparency” becomes burdensome rather than empowering, the public becomes overwhelmed by surveillance dragnets and driven to speculation and paranoia. At the same time, provided with a surfeit of recursive data about our desires, habits, and even our bodies, we are encouraged to privilege machinic forms of intelligence and analysis over human experience and affect. In so doing, we’re rewarded for tailoring our behaviors to be compatible with the machines surrounding us—and thus compatible with the institutions behind the machines. For Hong, “datafication is turning bodies into facts: shaping human life, desire, and affect into calculable and predictable forms, and doing so, changing what counts as the truth about those bodies in the first place” (4).

Operating at the nexus of sociology of knowledge, history of science, and critical data studies, Technologies of Speculation grounds its examination of the epistemological consequences of datafication by focusing on two sites: the 2014 Snowden affair (revelations of mass-scale government surveillance) and automated self-tracking (e.g., Fitbit). Though these research sites seem distant at first glance, Hong uncovers remarkable commonalities between state surveillance on the one hand and self-surveillance on the other. In both, the process of turning bodies into facts results not in crystalline objectivity, but in troubling gaps and asymmetries: “If data expands the vistas of human action and judgement, it also obscures them, leaving human subjects to work ever harder to remain legible and legitimate to the machines whose judgment they cannot understand” (7).

So how does this often-philosophical investigation of data-driven, algorithmic intelligence and its discontents relate back to our realm of text analysis? We might start by considering how powerful regimes like the ‘intelligence community’ confront uncertainty when saturated in data. In Chapter 5, “Bodies into Facts,” Hong explores the fragile and improvisational techniques that are used to fabricate so-called “insights” out of massive reams of data and metadata. Attending to public discourse on post-9/11 counter-terrorism and surveillance, he argues that beneath the guise of objectivity, datafication is implemented through “inconsistent and locally specific strategies governing how uncertainties are selected and legitimated as predictive insights” (115). Hong specifies three techniques by which fabrications achieve knowledge status:

  • Subjunctivity: “As-if” reasoning, where a hypothetical or unproven situation is operationalized for decision-making (e.g., acting to prevent the terrorist threat that could have happened; acting as if you are being surveilled).
  • Interpassivity: Something or someone else knows or acts for the subject, in their stead. A kind of subjective outsourcing whereby one is not responsible for a statement/belief (e.g., a rumor heard elsewhere), yet that belief/statement—neither disavowed nor claimed as one’s own—nonetheless provides the basis for actions or opinions. But the other that knows and acts for the subject is not just other people—it can also be a machine.
  • Zero-Degree Risk: The dilemma of calculating risk out of uncountable streams of data with the goal of total de-risking: preventing a single outcome (e.g., a terrorist attack). Statistics, probabilities, and equations are invoked to mitigate fundamental, radical uncertainty and legitimate costly regimes of surveillance and control (e.g., the NYPD counter-terrorism budget and the attacks that ‘might have been’).

Though Hong focuses explicitly on terrorist threats and state surveillance, we can consider how these techniques of fabrication may be operative in computational text analysis. For instance, “subjunctivity” undergirds many classificatory schemes. Cameron Blevins and Lincoln Mullen’s package for inferring gender based on historical first-name data works as if gender is a binary, a limitation they explore in depth within their article introducing the method (Blevins and Mullen). More fundamentally, tokenization processes generally require us to treat texts as if they lack punctuation or line-breaks. We must make locally specific choices as to whether words that are capitalized should be treated differently, which will then reshape the predictive insights we can make about frequency, association, topics, and sentiments. Similarly, the process of training a model might be an instance of “interpassivity” when that model is working on a training set that has not been annotated by our own hand. In turn, we encounter issues analogous to risk calculation when weighing questions of model fit and overfit.
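To make the tokenization point concrete, here is a minimal sketch (plain Python, with an invented example sentence) of how two “as-if” preprocessing choices yield different frequency counts from the same text:

```python
import re
from collections import Counter

text = 'The Sublime, they wrote, is sublime. "Sublime!"'

# Choice 1: act as if capitalization matters — "Sublime" and "sublime"
# are counted as distinct word types.
case_sensitive = Counter(re.findall(r"[A-Za-z]+", text))

# Choice 2: act as if capitalization never matters — fold to lowercase
# before counting, merging the two types into one.
case_folded = Counter(re.findall(r"[a-z]+", text.lower()))

print(case_sensitive["Sublime"], case_sensitive["sublime"])  # 2 1
print(case_folded["sublime"])                                # 3
```

Neither choice is wrong, but each is a locally specific decision that reshapes every downstream count, association, and “insight.”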

But given the scant references to “text” qua text in Technologies of Speculation, it seemed to me that the monograph provided few, if any, specific new avenues of research or new methodologies we might employ in text analysis. Instead, Sun-Ha Hong asks us to step back and consider the moral dimensions of the sort of knowledge production in which we are involved. He writes in the acknowledgements that “this book is about values: the subservience of human values to the rationality of technology and capital and, specifically, the transmutation of knowledge from a human virtue to raw material for predictive control” (201). From this standpoint, methods that may have once appeared value-neutral become morally charged. The study of authorship and style through quantification—sentence length, punctuation, vocabulary size—submits a humanist endeavor to machinic rationality. It isolates “style” from meaning and social/political context, reducing a lifetime of thought to a numeric signature, eschewing genealogies of interpretation and contestation among human readers while at times obscuring its own algorithmic inner workings. Consider another case: binary classification using TensorFlow neural networks. Let’s say we wanted to take a large corpus of prose paragraphs and sort it into “prose poems” and “not prose poems.” We have training data: a whole bunch of prose poetry ranging from Baudelaire to Rosmarie Waldrop, intermixed with paragraphs from magazine pieces, novels, and other texts. But after we train the model, we cannot ask it why or how it chose to call one paragraph a poem and another not.
Its reasoning is obscured, and in Hong’s framework, the knowledge of what constitutes poetry has been transmuted into “raw material for predictive control,” where before, for better or worse, it was thought “a virtue.” Perhaps this is all beside the point, but at a time when humanism can feel under threat, when we are hailed by machine intelligence to shape our behaviors according to the whims of the algorithm, the stakes of computational text analysis can feel heightened, even dangerous, after reading Hong’s book.
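The opacity is easy to demonstrate even without TensorFlow. The sketch below stands in for the setup imagined above with a tiny hand-rolled naive Bayes classifier; the four training sentences and their labels are invented for illustration. The point is not accuracy but opacity: what the model “knows” about poetry exists only as bare numbers.

```python
import math
from collections import Counter

# Invented toy corpus: 1 = "prose poem", 0 = "not prose poem".
train = [
    ("the moon dissolves in the harbor of my sleep", 1),
    ("grief is a small blue door left open all winter", 1),
    ("quarterly revenue rose four percent on strong demand", 0),
    ("the committee will meet tuesday to review the budget", 0),
]

# Tally word frequencies per class.
counts = {0: Counter(), 1: Counter()}
totals = {0: 0, 1: 0}
for text, label in train:
    for word in text.split():
        counts[label][word] += 1
        totals[label] += 1

vocab = set(counts[0]) | set(counts[1])

def log_prob(text, label):
    # Laplace-smoothed log-likelihood of the text under one class.
    score = 0.0
    for word in text.split():
        score += math.log((counts[label][word] + 1) / (totals[label] + len(vocab)))
    return score

def classify(text):
    return 1 if log_prob(text, 1) > log_prob(text, 0) else 0

print(classify("the moon is a blue door"))  # prints 1 — but it cannot say why
```

Asked why this sentence is a “poem,” the model can only point to its weights; the criteria a human reader would argue over have been dissolved into arithmetic.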

Works Cited

  • Blevins, Cameron, and Lincoln Mullen. “Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction.” Digital Humanities Quarterly, vol. 9, no. 3, Dec. 2015.
  • Hong, Sun-ha. Technologies of Speculation: The Limits of Knowledge in a Data-Driven Society. New York University Press, 2020.

HDTMT: Locating the beautiful, picturesque, sublime and majestic

Locating the beautiful, picturesque, sublime and majestic: spatially analysing the application of aesthetic terminology in descriptions of the English Lake District

https://doi.org/10.1016/j.jhg.2017.01.006

Authors/Project Team:
Christopher Donaldson – Lancaster University
Ian N. Gregory – Lancaster University
Joanna E. Taylor – University of Manchester

WHAT IT IS

An investigation of the geographies associated with the use of a set of aesthetic terms (“beautiful,” “picturesque,” “sublime,” and “majestic”) in writing about the English Lake District, a region in the northwest of England with a long and prestigious history of representation in English-language travel writing and landscape description, notably in the 18th and 19th centuries. The Lake District has been a particular focus within the field of spatial humanities for well over a decade, motivated in part by “an awareness of the braided nature of the region’s socio-spatial and cultural histories; and an understanding of this rural, touristic landscape as a repeatedly rewritten and imaginatively overdetermined space” (Cooper and Gregory 90).

Focusing on the four aforementioned terms, which exemplify a new language of landscape appreciation emerging in late 18th century British letters, Donaldson and his co-authors intend to “demonstrate what a geographically orientated interpretation of aesthetic diction can reveal about the ways regions like the Lake District were perceived in the past” (44).

Through this case study, the authors introduce the method of “geographical text analysis,” which they locate at the nexus of aesthetics, physical geography, and literary study. The project combines corpus linguistics with geographic information systems (GIS) in a novel fashion.

Primary Data Source:

  • Corpus of Lake District Writing, 1622–1900 (GitHub)

The corpus contains 80 manually digitized texts totaling over 1.5 million word tokens.

Natural language processing (NLP) techniques were used to identify place names and assign these names geographic coordinates—a method called “geoparsing.” But the project members also went beyond what was possible at the time with out-of-the-box NLP libraries and geoparser tools in order to deeply annotate the texts, linking place-name variants and differentiating a wide range of topographical features. As such, the corpus “forms a challenging testbed for geographical text analysis methods” (Rayson et al.).

What you’d need to know to conduct “geographical text analysis”:

Step 1: Geoparsing

If your corpus is not already annotated, you will need to “geoparse”—convert place-names into geographic identifiers.

Geoparsing involves two stages of NLP:

  • Named Entity Recognition (NER) – a method for automatically extracting placenames from text data
  • Named Entity Disambiguation (NED) – a method for linking the extracted and identified terms with existing knowledge, enabling cross-referencing and connections to metadata such as geo-spatial information.
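The two stages can be sketched in a few lines of Python. This is a toy gazetteer lookup, not a real geoparser (tools like the Edinburgh Geoparser handle ambiguity, variant spellings, and context); the coordinates below are approximate and purely illustrative.

```python
# Toy gazetteer: place name -> (latitude, longitude), coordinates approximate.
GAZETTEER = {
    "Windermere": (54.37, -2.94),
    "Skiddaw": (54.65, -3.15),
    "Derwentwater": (54.57, -3.15),
}

def geoparse(text):
    """Return (placename, (lat, lon)) pairs found in the text."""
    found = []
    for name, coords in GAZETTEER.items():
        if name in text:                   # stage 1: recognition (NER)
            found.append((name, coords))   # stage 2: resolution to coordinates (NED)
    return found

sentence = "The prospect from Skiddaw toward Derwentwater is sublime."
print(geoparse(sentence))
```

A real NED stage must also choose between candidate referents (is “Grasmere” the lake or the village?), which is why substring matching against a flat dictionary is only a starting point.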

Step 2: Collocation analysis

The authors go about identifying the specific geographies associated with “beautiful,” “picturesque,” “sublime,” and “majestic” by noting when those terms appear alongside placenames. Thus, the authors develop a dataset of placename co-occurrences or “PNCs” extracted from their corpus. They then assess the frequency of co-occurrence to determine the statistical significance of the association between a given place and one of the aesthetic terms.
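A minimal sketch of the co-occurrence step might look like the following. The window size, word lists, and example sentence are all invented for illustration, and this version only counts raw co-occurrences; the authors additionally test the statistical significance of each association, which a real implementation would add (e.g., a log-likelihood test from corpus linguistics).

```python
import re
from collections import Counter

AESTHETIC = {"beautiful", "picturesque", "sublime", "majestic"}
PLACES = {"windermere", "skiddaw", "derwentwater"}  # toy gazetteer
WINDOW = 5  # tokens on either side — an arbitrary choice for illustration

def pnc_counts(text):
    """Count placename co-occurrences (PNCs) within the token window."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter()
    for i, tok in enumerate(tokens):
        if tok in PLACES:
            lo, hi = max(0, i - WINDOW), i + WINDOW + 1
            for other in tokens[lo:hi]:
                if other in AESTHETIC:
                    pairs[(tok, other)] += 1
    return pairs

text = ("The majestic summit of Skiddaw rose before us; "
        "the shores of Windermere were beautiful beyond compare.")
print(pnc_counts(text))
```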

Step 3: Spatial analysis

With the statistically significant PNCs identified, the authors use geoparsing tools to assign latitude/longitude (mappable) coordinates to each PNC. This enables researchers to analyse the spatial distribution of PNCs through GIS software such as ArcGIS, creating standard dot maps as well as density-smoothed maps. They also use Kulldorff’s spatial scan statistic (traditionally an epidemiological method) to identify clusters.

With sophisticated GIS, they can map the spatial coordinates of the PNCs onto topographical and geological datasets, enabling a rich understanding of how places described as “majestic,” for example, map onto different elevations or different geological formations.

Digital terrain models (DTMs) and digital elevation models (DEMs) are vector and raster datasets that can be imported into GIS tools if they are not already included. National geological surveys provide geology data as GIS lines and polygons that can be matched with PNC spatial metadata.
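To give a flavor of the scanning idea, the sketch below counts how many PNC points fall within a radius of a candidate centre using great-circle distance. This is only the counting step behind such cluster detection, not Kulldorff's statistic itself (which compares observed counts against expected counts via a likelihood ratio); the four points and the centre are invented, with approximate Lake District coordinates.

```python
import math

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

# Invented PNC points (lat, lon), loosely placed around the Lake District.
points = [(54.37, -2.94), (54.36, -2.93), (54.65, -3.15), (54.45, -2.97)]

def in_radius(centre, radius_km):
    """PNC points falling inside the candidate scanning circle."""
    return [p for p in points if haversine_km(centre, p) <= radius_km]

cluster = in_radius((54.37, -2.94), 5)  # points within 5 km of the centre
print(len(cluster), "of", len(points), "points fall in the candidate circle")
```

A full scan statistic would slide such circles of varying radius across the map and flag circles whose counts are improbably high under a uniform model.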

Results

Donaldson et al.’s geographic analysis yields some striking findings on how the four aesthetic terms are applied to the Lake District landscape, which the authors summarize as follows:

As we have seen, whereas beautiful and, more especially, picturesque are often associated with geographical features set within, and framed by, their environment, majestic is more typically associated with features that rise above or extend beyond their surroundings. Sublime, true to Burke’s influential conception of the term, stands apart from these other terms in being associated with formations that are massed together in ways that make them difficult to differentiate […] The distinctive geographies associated with the terms beautiful and picturesque, on the one hand, and majestic and sublime, on the other, confirm that the authors of the works in our corpus were, as a whole, relatively discerning about the ways they used aesthetic terminology.

(Donaldson et al. 59)

References Cited:

Cooper, David, and Ian N. Gregory. “Mapping the English Lake District: A Literary GIS.” Transactions of the Institute of British Geographers, vol. 36, no. 1, 2011, pp. 89–108.

Donaldson, Christopher, et al. “Locating the Beautiful, Picturesque, Sublime and Majestic: Spatially Analysing the Application of Aesthetic Terminology in Descriptions of the English Lake District.” Journal of Historical Geography, vol. 56, Apr. 2017, pp. 43–60. ScienceDirect, https://doi.org/10.1016/j.jhg.2017.01.006.

Rayson, Paul, et al. “A Deeply Annotated Testbed for Geographical Text Analysis: The Corpus of Lake District Writing.” Proceedings of the 1st ACM SIGSPATIAL Workshop on Geospatial Humanities, Association for Computing Machinery, 2017, pp. 9–15. ACM Digital Library, https://doi.org/10.1145/3149858.3149865.