Daily Archives: April 24, 2023

Blog Post: AI, Correlation, Homophily, and Topic Modeling

I have read the first three chapters of Artificial Unintelligence. Broussard discusses the problems with applying computer technology to every aspect of our lives, as well as the limitations of artificial intelligence. She gives the name “technochauvinism” to the belief that computers can always solve every human problem. Her AlphaGo example is pretty straightforward: it tells us that computers work very well, or “intelligently,” in a highly structured system, such as Go games recorded in the “Smart Game Format.” However, AI and computers can lead to algorithmic injustice and discrimination because of the data or instructions their creators feed them.

Chun discusses the concepts of “correlation,” “homophily,” and “network science” in her article “Queerying Homophily” in Pattern Discrimination. I also found her explanation in the interview “Discriminating Data: Wendy Chun in Conversation with Lisa Nakamura” amazing: she explains how eugenics and big data treat correlation as a predictor of the future, which actually turns out to close the future. As I understand it, “closing the future” means closing off alternatives for the future that cannot be predicted from current and past data. I understand homophily as something amplified when people share their previous knowledge, experience, behavior, preferences, etc. Echo chambers and bubbles are generated by this homophily-based method of data collection and analysis, which further reinforces the segregation in our society.

I am trying to connect the two readings above with this week’s Topic Modeling articles. My question about topic modeling is how to avoid oversimplifying context, figurative language, and nuance when discovering topics. If an alternative reading is possible, how can it be represented when deciding on the topics? For example, while the Latent Dirichlet Allocation (LDA) method can give you the probability that a document belongs to each topic, is it possible to address something like the impossible topics, the farthest options, or any other kinds of correlations?
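To make this question a little more concrete, here is a toy sketch in Python of what looking at the “farthest” topics could mean. It is not real LDA training: the topic-word probabilities in TOPICS are made-up numbers, the names topic_mixture, alpha, and floor are my own, and the estimate is a simplified EM-style fold-in rather than the inference a library such as gensim performs. It only shows that a model assigning a share to every topic also lets us read the list from the bottom up.

```python
from collections import Counter

# Hypothetical topic-word probabilities (invented for illustration only).
TOPICS = {
    "shipping": {"ship": 0.4, "port": 0.3, "captain": 0.2, "land": 0.05, "court": 0.05},
    "land_sales": {"land": 0.5, "acre": 0.3, "sale": 0.1, "port": 0.05, "ship": 0.05},
    "court": {"court": 0.4, "witness": 0.3, "prisoner": 0.2, "land": 0.05, "ship": 0.05},
}


def topic_mixture(tokens, topics, alpha=0.1, iterations=50, floor=1e-6):
    """Estimate one document's topic shares while the topics stay fixed.

    alpha is a smoothing constant that keeps every topic above zero;
    floor is the probability assumed for words a topic has never seen.
    """
    counts = Counter(tokens)
    names = list(topics)
    theta = {k: 1.0 / len(names) for k in names}  # start from equal shares
    total = sum(counts.values())
    for _ in range(iterations):
        expected = {k: alpha for k in names}
        for word, n in counts.items():
            # How strongly each topic "claims" this word, given current shares.
            weights = {k: theta[k] * topics[k].get(word, floor) for k in names}
            norm = sum(weights.values())
            for k in names:
                expected[k] += n * weights[k] / norm
        denom = total + alpha * len(names)
        theta = {k: expected[k] / denom for k in names}
    return theta


doc = ["ship", "captain", "port", "ship", "land"]
theta = topic_mixture(doc, TOPICS)

# Read the result from the least likely topic upward.
ranked = sorted(theta.items(), key=lambda item: item[1])
for name, share in ranked:
    print(f"{name}: {share:.3f}")
```

The least likely topic still receives a small share because of the smoothing, so “impossible” here really means “least supported by the words we counted,” which is exactly the kind of simplification I am worried about.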

The readings for this week remind me of a conversation my friend and I had a long time ago. We think math is a very romantic field because it starts with an agreement between two people that 1 + 1 = 2. If we did not have a mutual agreement like this, the world would change accordingly. For example, I read that someone tried to convince ChatGPT that 1 + 1 = 3, and the machine eventually accepted it after a bit of a struggle. To summarize my post, I believe we need to revisit concepts like “homophily” and “correlation” to keep space for agreement and mutual understanding, going beyond the aim of finding superficial similarities.

Response blog on topic modeling

Anyone who is invited to a buffet, or planning to attend one, should have an idea of what is on the menu. That background knowledge is what I am referring to. It is perhaps more important than the computational process, or even the decision about how many latent topics are waiting to be discovered within a seemingly large pool of documents, a.k.a. the corpus. Magic is fascinating but overwhelming if the magician cannot communicate with the audience. In the case of topic modeling, the magic is the machine, which has computational ability but cannot express itself on its own (not sentient!!!). There is a role for a magician, an expert who makes the magic enchanting for the audience by putting the result in context.

Let me provide an example. A while back, I ran a topic modeling experiment on the DigitalNZ archive of historical newspapers. The result is the following interactive illustration of topics. I set out to uncover the 20 topics that were most prevalent during the New Zealand Wars in the 1800s.

[Image: LDA topic visualization]

The interactive visualization is available at the following URL:

https://zicoabhidey.github.io/pyldavis-topic-modeling-visualization#topic=0&lambda=1&term=

I played the role of the magician to demystify the result and present it to a broader audience who are by no means historians. I used my intuition, and borrowed the superpower of Google to support it, to derive the bowl that represents each topic. Below is the result I came up with. I could not produce all 20 of the topics I was hoping to find.

Topics | Explanation
“gun”, “heavy”, “colonial_secretary”, “news”, “urge”, “tax”, “thank”, “mail”, “night” | Implying political movement and communication during the pre-independence declaration period.
“bill”, “payment”, “say”, “issue”, “sum”, “notice”, “pay”, “deed”, “amount”, “person” | Business-related affairs after the independence declaration.
“distance”, “iron”, “firm”, “dress”, “black”, “mill”, “cloth”, “box”, “wool”, “bar” | Representing industrial affairs mostly related to garments.
“vessel”, “day”, “take”, “place”, “leave”, “fire”, “ship”, “native”, “water”, “captain” | Representing maritime activities or war from a port city like Wellington.
“land”, “acre”, “company”, “town”, “sale”, “road”, “country”, “plan”, “district”, “section” | Representing real-estate-related activities.
“year”, “make”, “receive”, “take”, “last”, “state”, “new”, “colony”, “great”, “give” | No clear association.
“sail”, “master”, “day”, “passage”, “auckland”, “port”, “brig”, “passenger”, “agent”, “freight” | Representing shipping activities related to Auckland port.
“say”, “go”, “court”, “take”, “kill”, “prisoner”, “try”, “come”, “witness”, “give” | Representing judicial activities and crime news.
“boy”, “pull”, “flag_staff”, “mount_albert”, “white_pendant”, “descriptive_signal”, “lip”, “battle”, “bride”, “signals_use” | Representing traditional stories about Maori myth and legend regarding Mount Albert.
Table 1: Some of the topics and explanations from the gensim LDA model

Topics | Explanation
‘land’, ‘company’, ‘purchase’, ‘colony’, ‘claim’, ‘price’, ‘acre’, ‘make’, ‘system’, ‘title’ | Representing real-estate-related activities.
‘native’, ‘man’, ‘fire’, ‘captain’, ‘leave’, ‘place’, ‘officer’, ‘arrive’, ‘chief’, ‘make’ | Representing news regarding the New Zealand Wars.
‘government’, ‘native’, ‘country’, ‘settler’, ‘colony’, ‘man’, ‘act’, ‘people’, ‘law’ | Representing news about the sovereignty treaty signed in 1835.
‘mile’, ‘water’, ‘river’, ‘vessel’, ‘foot’, ‘island’, ‘native’, ‘side’, ‘boat’, ‘harbour’ | Representing maritime activities from a port city like Wellington.
‘settlement’, ‘company’, ‘make’, ‘war’, ‘place’, ‘port_nicholson’, ‘settler’, ‘state’, ‘colonist’, ‘colony’ | Representing news about Port Nicholson during the war in Wellington in 1839.
Table 2: Some of the topics and explanations from the gensim Mallet model
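
One rough way to check my labels is to count how many top terms a topic from one model shares with a topic from the other. The sketch below, in Python, uses Jaccard overlap (shared words divided by all distinct words) on a few rows copied from the two tables. The short labels such as "real estate" and "war" are only my shorthand for the explanations above, and the function name jaccard is my own.

```python
def jaccard(words_a, words_b):
    """Share of distinct words that two top-term lists have in common."""
    a = {w.lower() for w in words_a}
    b = {w.lower() for w in words_b}
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)


# Top terms copied from Table 1 (gensim LDA model).
lda_topics = {
    "real estate": ["land", "acre", "company", "town", "sale",
                    "road", "country", "plan", "district", "section"],
    "maritime": ["vessel", "day", "take", "place", "leave",
                 "fire", "ship", "native", "water", "captain"],
}

# Top terms copied from Table 2 (gensim Mallet model).
mallet_topics = {
    "real estate": ["land", "company", "purchase", "colony", "claim",
                    "price", "acre", "make", "system", "title"],
    "war": ["native", "man", "fire", "captain", "leave",
            "place", "officer", "arrive", "chief", "make"],
    "maritime": ["mile", "water", "river", "vessel", "foot",
                 "island", "native", "side", "boat", "harbour"],
}

# Score every LDA topic against every Mallet topic.
overlaps = {
    (lda_label, mallet_label): jaccard(lda_words, mallet_words)
    for lda_label, lda_words in lda_topics.items()
    for mallet_label, mallet_words in mallet_topics.items()
}

for (lda_label, mallet_label), score in sorted(
    overlaps.items(), key=lambda item: item[1], reverse=True
):
    print(f"LDA {lda_label!r} vs Mallet {mallet_label!r}: {score:.2f}")
```

Shared words are only a crude signal. A high score suggests two topics draw on similar vocabulary, but deciding whether they mean the same thing still needs someone who knows the history.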

After working long hours on this project, I am super delighted to have produced an interactive visualization of the Mallet model, which is hard to do. Still, there is always a feeling of disappointment that I did not have the knowledge of a historian. A historian with special knowledge of New Zealand’s history might have judged the topics better.

Attending a buffet without knowing the menu is like sailing a boat without a compass, but isn’t that what distant reading is? A calculated leap of faith.