
Book Review: What Is ChatGPT Doing … and Why Does It Work? 

https://www.goodreads.com/book/show/123245371-what-is-chatgpt-doing-and-why-does-it-work

ChatGPT is a sophisticated computational model of natural language processing that is capable of producing coherent and semantically meaningful text through incremental word addition. It accomplishes this not by scanning an extensive corpus for every occurrence of the text in question, but by drawing on statistical patterns learned from vast quantities of human-authored text. From those patterns, the system generates a ranked list of potential subsequent words, each accompanied by its corresponding probability. The fascinating thing is that when we give ChatGPT a prompt like “Compose an essay,” all it does is ask, “Given the text so far, what should the next word be?” over and over again, adding one word at a time. However, if the model always used the highest-ranked word, the essay it produced would not be creative. The system therefore uses a “temperature” parameter to decide how often lower-ranked but still probable words are chosen. “Temperature” is one of the model’s tunable parameters, commonly set between 0 and 1, where 0 produces the flattest essay with no creativity and values near 1 produce the most creative.
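To make the role of “temperature” concrete, here is a minimal illustrative sketch in Python (not ChatGPT’s actual implementation); the candidate words and probabilities are invented for the example.

```python
import numpy as np

def sample_next_word(words, probs, temperature=0.8):
    """Pick the next word from a ranked list of candidate continuations.

    A temperature of 0 always returns the top-ranked word; higher values
    flatten the distribution so lower-ranked words are chosen more often.
    """
    probs = np.asarray(probs, dtype=float)
    if temperature == 0:
        return words[int(np.argmax(probs))]          # greedy choice
    logits = np.log(probs) / temperature             # rescale in log space
    rescaled = np.exp(logits - logits.max())
    rescaled /= rescaled.sum()
    return np.random.choice(words, p=rescaled)

# Invented candidates for continuing "The best thing about AI is its ability to ..."
candidates = ["learn", "predict", "understand", "create", "dream"]
probabilities = [0.40, 0.25, 0.15, 0.12, 0.08]
print(sample_next_word(candidates, probabilities, temperature=0.2))  # usually "learn"
print(sample_next_word(candidates, probabilities, temperature=1.0))  # more varied output
```

At a temperature of 0 the function always returns the top-ranked word, while higher values let lower-ranked words through more often.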

The foundation of neural networks is numerical. Therefore, in order to utilize neural networks for textual analysis, it is necessary to establish a method for representing text numerically. This concept is fundamental to ChatGPT, which employs an embedding to represent words and pieces of text as sets of numbers. To create this kind of embedding, we must examine vast quantities of text to determine how similar the contexts are in which various words appear. Discovering word embeddings requires beginning with a trainable task involving words, such as word prediction. For instance, consider the “the ___ cat” problem: among the 50,000 most common words used in English, “the” is number 914 and “ cat” (with a space before it) is number 3542, so the input is {914, 3542}. The output should be a list of approximately 50,000 numbers representing the probabilities of each potential “fill-in” word. The embedding itself is obtained by intercepting the values at an internal layer, just before the neural network reaches its conclusion about which words are appropriate.
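As a rough, hypothetical illustration of this idea (a toy model built with TensorFlow/Keras, not ChatGPT’s actual architecture), the sketch below maps word IDs to embedding vectors in its first layer and outputs a probability distribution over a 50,000-word vocabulary; “intercepting” that internal layer yields the embedding.

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 50_000   # size of the word list (e.g., most common English words)
EMBED_DIM = 64        # length of each word's embedding vector (chosen arbitrarily here)

# A toy fill-in-the-blank model: two context word IDs in, a probability
# distribution over the whole vocabulary out.
inputs = tf.keras.Input(shape=(2,), dtype="int32")            # e.g., {914, 3542} for "the ___ cat"
embedded = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
context = tf.keras.layers.Flatten()(embedded)
hidden = tf.keras.layers.Dense(128, activation="relu")(context)
outputs = tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)

# (Training on a large corpus of real word contexts would go here.)

# "Intercepting" the embedding layer: a second model that stops at the
# internal layer gives us the numerical representation of each input word.
embedding_model = tf.keras.Model(inputs, embedded)
vectors = embedding_model.predict(np.array([[914, 3542]]))
print(vectors.shape)   # (1, 2, 64): one 64-number vector per input word
```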

Human language and the cognitive processes involved in its production have always appeared to be the pinnacle of complexity. However, ChatGPT has shown that a completely artificial neural network, with a large number of connected nodes loosely resembling the neurons of a brain, can generate human language with amazing fidelity. The underlying reason, the book suggests, is that language is fundamentally simpler than it appears, and ChatGPT effectively captures the essence of human language and the reasoning behind it. In addition, ChatGPT’s training has “implicitly discovered” whatever linguistic (and cognitive) patterns make this possible.

The syntax of a language and its parse trees are two well-known examples of what can be considered “laws of language”: there are (relatively) clear grammatical norms for how words of various types can be combined, such as that nouns may take adjectives before them and verbs after them, while two nouns usually cannot sit immediately next to each other. ChatGPT has no explicit “knowledge” of such principles, but through its training it implicitly “discovers” and subsequently applies them. The crucial point is that a neural net can be taught to generate “grammatically correct” sequences, and there are several ways to deal with sequences in neural nets, including transformer networks, which is the approach ChatGPT uses. Just as Aristotle “discovered syllogistic logic” by studying many instances of rhetoric, ChatGPT, with its 175 billion parameters, may be discovering comparable regularities by studying vast quantities of online text.
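As a toy illustration (my own hypothetical example, not anything from the book), the sketch below generates sentences from a handful of explicit grammar rules of exactly the kind ChatGPT is never given; the model must instead infer equivalent regularities from examples alone.

```python
import random

# A tiny explicit grammar: the kind of rule ChatGPT never sees directly.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "Adj", "Noun"], ["Det", "Noun"]],
    "VP":   [["Verb", "NP"]],
    "Det":  [["the"], ["a"]],
    "Adj":  [["small"], ["curious"]],
    "Noun": [["cat"], ["essay"], ["network"]],
    "Verb": [["writes"], ["chases"]],
}

def generate(symbol="S"):
    """Expand a grammar symbol into a grammatical word sequence."""
    if symbol not in GRAMMAR:               # terminal word
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part))
    return words

print(" ".join(generate()))   # e.g., "the curious cat chases a network"
```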

Book Review: The Digital Black Atlantic as a resource for feminist textual analysis.

The Digital Black Atlantic, ed. Roopika Risam and Kelly Baker Josephs (U Minnesota P: 2021), print and digital editions.

https://www.upress.umn.edu/book-division/books/the-digital-black-atlantic

Part of the Debates in the Digital Humanities series edited by Matt Gold and Lauren Klein, The Digital Black Atlantic gathers a wide group of experts in Africana studies from across the globe to consider the intersection of digital humanities and the study of African diasporas from a post-colonial perspective.  Using Paul Gilroy’s foundational 1993 concept as a way to approach the long history of “the interstices of Blackness and technology” in order to work towards “a recognizable language and vocabulary . . . that spans the breadth of interdisciplinary scholarship in digital studies and digital humanities—including disciplines as varied as literary studies, history, library and information science, musicology, and communications,” editors Roopika Risam and Kelly Baker Josephs take “the Black Atlantic” in its broadest global sense as method and “object of study” (x).  As such, the volume offers a helpful counterpart to questions raised by Feminist Text Analysis, including the interchange between theory and practice, even as its ultimate impact goes well beyond a feminist application.  Acknowledging that this is a reductive reading, then, this book review will suggest how would-be feminist-text-analysis practitioners might find useful theoretical and practical examples in this important collection.

Grouped into four sections following Risam and Josephs’s Introduction—Memory, Crossings, Relations, Becomings—the twenty essays provide a range of methodologies and disciplinary areas from which to learn.  Several focus on soundscapes and music; others on mapping and data visualizations, including video representations and immersive 3D simulations; still others emphasize the need for qualitative analysis in the construction of digital knowledges and for mindfulness of community applications and engagement.  While all the pieces have something to offer feminist digital humanists, I will focus on four that are particularly suggestive in relation to the questions and challenges of Feminist Text Analysis raised by our course.

Amy E. Earhart’s “An Editorial Turn: Reviving Print and Digital Editing of Black-Authored Literary Texts” emphasizes the need to engage both print and digital media in the project of textual recovery of a minoritized group’s writings.  The essay’s focus on the limits of facsimile editions, particularly when texts and authors were under particular pressure to accommodate resistant reception, reminds us of the multiple mediations and lives of a single work and the importance of reconstructing their contexts for an understanding of the text’s potential intervention into socio-political conditions.  Using two examples of information lost in productions of the facsimile series, the Collected Black Women’s Narratives, Earhart notes how the desire for consistency, similarity, or, put another way, homophily, “conceals the ways that the materiality of the texts indicates differences in authorial authority, notions of radicalness, and even difference in the gaze on the Black female body” (34) in works by Susie King Taylor and Louisa Picquet.  She argues that a well-edited digital edition has the potential to allow “materials to be presented more fully because it does not face the space or economic constraints of print publication,” including attention to color images, covers, and frontispieces, as well as prefaces written by white male editors and publishers. Her final plea for making “careful editing a long overdue priority” particularly resonates for the recovery of Black female-authored texts, which make up the bulk of her surprising examples of neglected works by, or unremarked interventions in the texts of, such celebrated writers as Zora Neale Hurston, Nella Larsen, and Toni Morrison, including the retitling of Passing and Paradise, respectively.

In “Austin Clarke’s Digital Crossings,” Paul Barrett demonstrates how “the productive acts of translation required to move between the digital and the textual and that are inherent in the digital interpretive act” (85) demand that we recognize both the promise and the limits of literary textual analysis tools like topic modeling when approaching authors like Clarke, whose works exemplify the “acts of crossing” inherent in the Black Atlantic diasporic imaginary.  Revealing how “[t]hese acts of crossing . . . run counter to the intuition of topic modelling, which attempts to identify the thematic structure of a corpus and isolate themes from one another to make them readily identifiable” (86), Barrett provides a table of Topic Proportions and Topical Keywords that underlines “the incommensurability between Clarke’s aesthetics of crossing and topic modeling algorithms” (88).  Rather than marking this endeavor as a “failure of method,” Barrett uses it to identify the need for a “methodology of digital humanities research that emerges out of an engagement with Black Atlantic politics and textuality” (86), the need for a new set of questions, or “a need to conceive of the method differently” (88)—a self-described “reflexive approach to topic modeling” (88) which feminist text analysis at its best also advocates.  Like Earhart’s, Barrett’s close analysis indicates the consistent imbrication of race and gender identities in their textual examples: Barrett’s investigation into Clarke’s nation language and Creolization finds that “speaking in nation language is a decidedly masculine pursuit in Clarke’s work” (89).  Barrett ends by summarizing “three important dimensions of what might be conceived of as a critical digital humanities,” dimensions we might consider equally urgent for a feminist digital practice of recovery and analysis: “rendering a text worldly, resisting the positivism of computational logic by working to represent the presence of absence, and recognizing that the act of ‘making it digital’ is actually a re-formation of the text into something new” (90-91).

Anne Donlon’s “Black Atlantic Networks in the Archives and the Limits of Finding Aids as Data” offers important comparisons between metadata applied at different times and to different archives, comparisons that complicate the search for cultural network histories of specific groups.  “People researching minoritized subjects not overtly represented in collections have learned to read between the lines and against the grain to find their subjects” (168); similarly, “digital methods and tools might offer increased access and possibilities for understanding archival collections in new ways, but they do not do so inherently” (169).  Donlon proceeds to chart her own experiences working with collections in the Manuscripts, Archives, and Rare Book Library at Emory as a case study in how to revise one’s expectations and outcomes when confronted with disparities and lacunae in the archive. She explains, “Rather than try to read these networks as representative of cultural and historical networks, I came to read them on a more meta level, as representative of how collections are described and arranged” (177).  Suggesting that “Perhaps, then, we could develop methods to read data from finding aids against the grain . . . to identify bias . . . [to] imagine new structures to describe collections” (177), Donlon extends Barrett’s “presence of absence” (op. cit.) to the bibliographical textual corpus itself, seeing it, too, as a site for the reproduction of power imbalances and the reification of prior canon-building.

Finally, Kaiama L. Glover and Alex Gil’s exchange, “On the Interpretation of Digital Caribbean Dreams,” offers a welcome corrective to a familiar tension between the “theory” brought by literary critics to the texts and the “tools” provided by digital designers and makers or, as Barrett puts it, “the difficult dialogue between the texts we study and the digital tools we use” (90).  Barrett’s suggestion that “dwelling in the space between the incommensurability of the text and the digital tools suggests the possibility of a worldly digital humanities practice that eschews traditional forms of humanities and humanism” (90) gains traction in Gil’s emphasis on deploying minimal computing for his and Glover’s joint project.  Rather than a division of labor in which the “dreams of the humanist too easily . . . end . . . up piped into existing visualization frameworks or . . . some D3 templates . . . in [this] case, the interpreters made the work interesting to themselves by exploiting the fortuitous intersection between cultural analytics and knowledge architectures” (231).  By engaging “interpreters” and “dreamers” as equals rather than as “implementers” (232) and “auteurs” (231), and by resisting ready-made tools in favor of a slower but ultimately more streamlined “infrastructure, workflow, and pursestring” (231), Gil and Glover show another dimension and set of rewards for self-reflexive approaches to digital humanities projects.  Or as Glover puts it, “it involved . . . getting me to see that there is no magic in my PC, that anything we were able to make would involve knowledge, transparency, ethics, error, and labor” (228).  Ultimately, it is the acknowledgement of those conditions that offers the best hope for a truly feminist textual analysis.

Coda on the relative virtues of the print and digital editions:

Functionality: the Manifold edition allows annotations and highlights to be seen and shared, including by our program colleagues and classmates (Majel Peters!), and gives easy access to links to other sites and references.

The print copy is a pleasure to read, with quality paper, a legible layout, well-designed space, and attractive typography.  It also keeps its extradiegetic materials easily at hand, with endnotes and a bibliography at the end of each essay.  However, it has no searchable index, unlike the digital edition.

It should be said that the cover, opening pages, and section divisions are designed for print but retained in the digital edition, leading to some moments of static and illogical design.  For example, the cover’s suggestive and striking iridescent bronze lettering only comes alive with a light source or manual handling, and it ceases to function in digital form.

How Did They Make That: The DECM Project

Title: Digging into Early Colonial Mexico (DECM) Historical Gazetteer

What it is:

A searchable digital geographic dictionary of 16th-17th-century Mexican toponyms, their present-day place names, and geographical coordinates.

It includes a GIS dataset with shapefiles containing geographic information for early colonial administrative, ecclesiastical, and civic localities (from provinces to dioceses to villages) for interactive mapping.

It deploys, for its historical data, the 16th-century Relaciones Geográficas de Nueva España, compiled between 1579 and 1585 in Madrid and based on a 1577 questionnaire sent to civic sites in New Spain, as well as modern editions of the RG, related secondary studies, and similar compilations for the province of Yucatán.

How (well) it works:

I’ve been obsessed with this database and the larger project of which it is a part—“Digging Into Early Colonial Mexico”—since I discovered it in 2021.  https://www.lancaster.ac.uk/digging-ecm/

It applies a range of computational techniques, including Text Mining, Geographic Information Systems, and Corpus Linguistics, to render newly usable an early exemplar of imperial technology, the printed administrative survey, which was compiled in the late 16th century into a multivolume, multimodal, multilingual work.  Through semi-automated access, applied language technologies, and geospatial analysis, this early modern textual corpus and its several thousand pages become uniquely functional despite their resistance to easy translation into contemporary digital form.

It also exemplifies the interdisciplinary and inter-institutional possibilities of Digital Humanities, bringing together specialists from a number of research centers, universities, and academic disciplines to share knowledge and professional experience in new ways.  And its acknowledgement of manual disambiguation as a necessary part of the process confirms the need for the slow, careful, historically and contextually aware analog-to-digital transformation foregrounded in current theories of feminist and post-colonial digital praxis.

What you’d need to have, know, or use:

Comprehensive primary sources in print, PDF, or digital editions.

Present-day cartographic, geographical, or toponymic databases, e.g., GeoNames, the National Geospatial-Intelligence Agency (NGA) place names database, and the Getty Thesaurus of Geographic Names. (See Murrieta-Flores 2023 for topographic and toponymic catalogues and databases specific to the colonial New Spain period.)

Adobe Acrobat or Google Drive for OCR conversion from PDF image format to machine-readable text (.txt) format.

Excel for shapefiles, tables, combined information (xy coordinates, notes, bibliographical references), and metadata.

ArcGIS or an equivalent desktop GIS tool for shapefiles: to join colonial toponyms to current place names for linguistic and spatial disambiguation (which may require manual input and review) and to create layers (see the sketch just below).
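As a rough, hypothetical illustration of the linguistic disambiguation step (not the DECM team’s actual workflow, which relies on ArcGIS joins and manual review), the Python sketch below fuzzy-matches colonial spellings against a modern gazetteer list; the names and coordinates are approximate and included only for the example.

```python
import difflib

# Invented modern gazetteer entries: name -> (approx. latitude, longitude)
modern_gazetteer = {
    "Tepeaca": (18.96, -97.90),
    "Cholula": (19.06, -98.30),
    "Cuauhtinchan": (18.94, -97.95),
}

# Colonial-era spellings of the kind found in 16th-century sources
colonial_toponyms = ["Tepeaca", "Churultecal", "Quauhtinchan"]

for toponym in colonial_toponyms:
    # Fuzzy string matching suggests candidate modern equivalents;
    # anything below the cutoff is left for manual disambiguation.
    matches = difflib.get_close_matches(toponym, modern_gazetteer, n=1, cutoff=0.6)
    if matches:
        name = matches[0]
        print(f"{toponym!r} -> {name!r} at {modern_gazetteer[name]}")
    else:
        print(f"{toponym!r} -> no confident match; flag for manual disambiguation")
```

As the project’s own documentation acknowledges, many matches still require manual disambiguation, which is exactly what the low-confidence branch above is meant to flag.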

Alternatives or additions in extreme cases:

A Named Entity Recognition tool such as Recogito (for extracting place names from modern, European, monolingual documents), together with a Natural Language Processing platform like Tagtog for model training (a generic extraction sketch follows this list).

ArcMap and Google Earth for advanced spatial disambiguation
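For readers who want a sense of what automated place-name extraction looks like outside Recogito, here is a minimal, hypothetical sketch using the off-the-shelf spaCy library and its small Spanish model; it is a generic illustration, not the DECM project’s pipeline, and historical spellings would still defeat it without custom training of the kind Tagtog supports.

```python
import spacy

# Requires: pip install spacy && python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")

# Invented sentence in the style of a relacion geografica
text = ("El pueblo de Tepeaca esta a ocho leguas de la ciudad de los Angeles, "
        "en la provincia de Tlaxcala.")

doc = nlp(text)
for ent in doc.ents:
    if ent.label_ == "LOC":   # keep only place-name entities
        print(ent.text, ent.label_)
```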

Useful links:

For more on the making or potential applications of the DECM Historical Gazetteer:

Murrieta-Flores, P. (2023). The Creation of the Digging into Early Colonial Mexico Historical Gazetteer: Process, Methods, and Lessons Learnt. Figshare. https://doi.org/10.6084/m9.figshare.22310338

https://github.com/patymurrieta/Digging-into-Early-Colonial-Mexico

https://docs.google.com/document/d/1yC5-lDeN-piIJaDC2kAVfIqi1YqBeN2A8_Ft-Cskyq4/edit

https://storymaps.arcgis.com/stories/9c6efb33ef2b4afdab3c9c6865dbb4cc

For additional inspiring projects coming out of this database:

https://www.lancaster.ac.uk/digging-ecm/2019/07/pathways-to-understanding-16th-century-mesoamerica/

Response Blog Post: Week 10

In a lecture on mass media, one of my undergraduate professors posited a question: “Does media shape culture, or does culture shape media?” As we would come to learn throughout the course, the answer wasn’t either but both.

The same is true for technology. In the series foreword to “Pattern Discrimination” by Apprich et al., the authors quote Friedrich Kittler: “Media determine our situation.” In the essays that follow, the authors address points along the continuum between technological determinism and social constructionism while explicating issues ranging from homophily in network science to the politics of pattern recognition.

Throughout the Text Analysis course we’ve learned about various tools and methodologies. We’ve also been encouraged to think critically about data, its collection, and its use. And while some of the dystopian futures presented in the series remain speculative, others (especially regarding pattern bias) are present here and now, affecting and shaping our lives.

In “Queerying Homophily,” part of the series, Wendy Chun states: “It is critical that we realize that the gap between prediction and reality is the space for political action and agency.” There are also other spaces, at the personal and interpersonal level, where we can contribute to how society and technology are shaped, experienced, and lived. Decisions are made at every level of data collection, manipulation, and programming. There are also decisions we make in how we interact, petition, and talk about our workspaces, communities, and experiences.

This week, we were introduced to a basic TensorFlow template for creating a predictive model for text. The immediate possibilities were exciting; however, as the articles emphasize, excitement should be tempered by thought and action by caution, and we should seek “unusual collaborations that both respect and challenge methods and insights, across disciplines and institutions,” as Chun puts it.

As tools proliferate, how we consider and utilize them becomes more and more important. But perhaps more practically, how we view reality (the communities, language, and environment that surround us) is of equal importance. Chun says: “Rather than similarity as breeding connection, we need to think, with Ahmed, through the generative power of discomfort.” Productive discomfort holds the potential to create more human and inclusive patterns.

HDTM – Atilio

Development of ASKode

I created a simple web app, ASKode, to help me code: I can ask the ChatGPT API a question about a small coding project and it will reply.

This open-source coding assistant aids in the process of learning and understanding code, making technology more approachable and accessible to beginners or to those with non-traditional backgrounds in coding. Such initiatives can help to disrupt the gender and racial disparities in the tech world and work towards a more diverse and inclusive tech community.

GITHUB LINK

ASKode’s development involved several steps, beginning with identifying the main requirement for the application: it should be able to answer coding-related questions based on the user’s local codebase.

  1. Choosing the Technology Stack: The first step was choosing the technology stack for the project. As ASKode is a relatively simple app with no front-end, Node.js was chosen for its simplicity and compatibility with OpenAI’s GPT-3 API.
  2. Setting up the OpenAI API: The next step was integrating GPT-3 into the application. This required obtaining an API key from OpenAI and setting it up to be used in the application.
  3. Creating the Routes: Once the tech stack was in place and the API was set up, I created the necessary route in the application: the ‘ask’ route.
  4. Developing the Answer Generation Process: The core of ASKode is the answer generation process. When a POST request is sent to the ‘ask’ route, the application extracts the question from the request body and sends it to GPT-3. The model generates an answer based on the question and the content of the user’s local codebase, and this answer is then returned to the user (a rough sketch of this flow appears after this list).
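As a rough, hypothetical sketch of that flow, written here in Python purely for illustration (ASKode itself is built on Node.js), the logic behind the ‘ask’ route might look something like this; the model name, file extensions, and prompt format are assumptions, not details from the actual app.

```python
import os
import pathlib
import requests

API_KEY = os.environ["API_KEY"]                # OpenAI API key, as in ASKode's setup
CODE_DIR = pathlib.Path("/path/to/your/code")  # in ASKode this path comes from the command line

def answer(question: str) -> str:
    """Build a prompt from the local code files and ask the model for an answer."""
    # Concatenate the code files so the model can "see" the codebase
    # (which file types to include is an assumption).
    code = "\n\n".join(
        p.read_text(errors="ignore")
        for p in CODE_DIR.rglob("*")
        if p.is_file() and p.suffix in {".py", ".js"}
    )
    prompt = f"Here is my codebase:\n{code}\n\nQuestion: {question}\nAnswer:"

    # Call OpenAI's completions endpoint; the model name is an assumption.
    resp = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "text-davinci-003", "prompt": prompt, "max_tokens": 300},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"].strip()

print(answer("What is the purpose of this function?"))
```

A real application would also need to guard against oversized codebases, since the full file contents are placed directly in the prompt.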

Usage of ASKode

To use ASKode, follow these steps:

  1. Clone the Repository: Clone the ASKode repository to your local machine.
  2. Navigate to the Root Directory: Navigate to the root directory of the project in your command line.
  3. Install the Dependencies: Run the command “npm install” to install the required dependencies.
  4. Set up the API Key: Set the environment variable “API_KEY” to your OpenAI API key.
  5. Set the Local Directory: Set the path to your local directory containing the code files by passing it as a command line argument when launching the app: npm run start -- /path/to/your/code.
  6. Access the App: Go to the home page using http://127.0.0.1:5000/ or http://localhost:5000/.
  7. Use the ‘ask’ Route: To use the app, send a POST request to the ‘ask’ route with a JSON body containing the question you want to ask, like so: { "question": "What is the purpose of this function?" }. The app will use GPT-3 to generate an answer to the question based on the code files in the specified directory (a sample request is sketched after this list).
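For example, such a request could be sent from a small script; the sketch below uses Python’s requests library and assumes the ‘ask’ route is mounted at /ask on the local server (the exact path and response format depend on the implementation).

```python
import requests

# Assumes ASKode is running locally on port 5000 and serves the 'ask' route at /ask.
resp = requests.post(
    "http://localhost:5000/ask",
    json={"question": "What is the purpose of this function?"},
)
print(resp.text)  # the generated answer returned by the app
```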