
Abstract for Round Table

Artificial Intelligence (AI) systems are developing rapidly, with advanced Deep Neural Networks (DNNs) that behave like biological neurons. Similarities between humans and AI are therefore only to be expected. For this to happen, AI needs to be trained on and exposed to real-world data, and this is where the problem of bias arises. The datasets on which these systems are trained are not sufficiently diverse (for example, in facial recognition systems), and they are gender-biased as well. The worst part is that a model can show high accuracy and still be biased, which usually goes unnoticed because of the prevailing supremacy of certain groups of people. Even Data Feminism's "Data is Power" chapter discusses failed systems in the computational world that result from an unequal distribution of power benefiting a small group of people at the expense of everyone else.
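To make the claim about hidden bias concrete, here is a minimal, hypothetical sketch in Python (entirely synthetic data, with an invented group label) of how a single overall accuracy score can conceal a large gap between groups:

```python
import numpy as np

# Entirely synthetic evaluation set: 900 examples from group A, 100 from group B.
# "group" is an invented protected attribute used only for illustration.
rng = np.random.default_rng(0)
group = np.array(["A"] * 900 + ["B"] * 100)
y_true = rng.integers(0, 2, size=1000)

# Simulate a model that is right 95% of the time on group A
# but only 60% of the time on group B.
correct = np.where(group == "A",
                   rng.random(1000) < 0.95,
                   rng.random(1000) < 0.60)
y_pred = np.where(correct, y_true, 1 - y_true)

print("Overall accuracy:", (y_pred == y_true).mean())   # roughly 0.91
for g in ("A", "B"):
    mask = group == g
    print(f"Accuracy for group {g}:", (y_pred == y_true)[mask].mean())
```

Because the majority group dominates the dataset, the overall score stays above 90 percent even though the model fails the minority group far more often; this is exactly how such bias goes unnoticed.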

We can see various examples of AI models adopting gender bias and replicating outdated views (at least not how we want our society to progress). For example, if the training dataset does not contain enough contributions from women, then there will be corresponding holes in the AI's knowledge. So if an AI wired with such bias becomes the standard, that is a big problem. If AI fails to understand the fundamental power differentials between women and men, is feminist text analysis even possible using deep neural networks without bias? Maybe, if feminist approaches are introduced at the initial phase of training an AI model, there is still some hope. However, I also see the opposite side: if biases are unavoidable in real life, how can we keep them from becoming an unavoidable aspect of new technologies? After all, AI is created by humans, modeled on the human brain, and trained on data created by humans, which makes the problem even more complex. The solution I see here is a need for diverse data, for which the binary system needs to be wiped out.

Blog Post 2: Supervised Learning Readings

By definition, supervised learning is generally used to classify data or make predictions, whereas unsupervised learning is generally used to understand relationships within datasets. Supervised learning is therefore much more resource-intensive, because it requires labelled data. The assigned readings give various examples of supervised learning, such as spam detection as part of an email firewall, distinguishing between conglomerate and nonprofit novels, and Spotify's song-recommendation model. These differences made me compare the notebook we did previously with the one we did today. In unsupervised learning, we do not have any labelled training dataset; having one is the advantage of supervised learning and is what makes it the better predictor.
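As a minimal sketch of the spam-detection example (toy emails invented for illustration; scikit-learn's Naive Bayes stands in for whatever classifier an actual email firewall would use):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labelled training set: the labels are what make this "supervised".
emails = ["win a free prize now", "meeting agenda attached",
          "free money click here", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)      # word-count features

model = MultinomialNB().fit(X, labels)    # learn from labelled examples

# Predict the label of a new, unseen email.
new = vectorizer.transform(["claim your free prize"])
print(model.predict(new))                 # likely ['spam']
```

The `labels` list is the resource-intensive part: every training example had to be tagged by a human before the model could learn anything.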

In "Against Conglomeration: Nonprofit Publishing and American Literature After 1980," Sinykin and Roland discuss how 'multiculturalism' evolved in the world of literature. It was started by the government to include the diverse population that defined the new America; however, in the process of establishing 'multiculturalism,' authors ended up being sorted under specific titles and reputations (African American / Asian American / Indian American), even though those authors had no wish for such prejudiced and racist labels, which created categories in the name of diversity. But is this really multiculturalism? Aren't we categorizing people according to their race and expecting them to create their work on a cultural basis? Nonprofits did this because there was money to be gained, and it was the government that promoted the practice, which then became standardized because of that profit. However, despite all these downsides, we cannot deny that nonprofits gave chances to those who were considered outsiders (non-white people) in the literary field.

We are experiencing a similar situation in the current period, where nonprofits are collecting data to improve society and reduce discrimination. However, they face many challenges in doing so. For example, machine learning (ML) algorithms are built to screen candidates for hiring. To make unbiased decisions, such an algorithm has to be taught not to discriminate by gender or race. Yet under a supervised learning process, these algorithms would need data on gender and race in order to measure that unbiasedness. In reality, then, it is very hard to remove gender- and race-specific data, because it is required to fight discrimination, even though it is often misused in exactly this place. As Ben Schmidt states in his article, "the most important rule for thinking about artificial intelligence is that its deleterious effects are most likely in places where decision makers are perfectly happy to let changes in algorithms drive changes in society. Racial discrimination is the most obvious field where this happens." This, therefore, is a prime area for feminist scholars to work on. I have made a similar argument in the Notebook as well.
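A small, hypothetical sketch of the point about needing demographic data: auditing a hiring model's decisions for disparate impact is only possible if group labels were recorded in the first place (all data invented):

```python
import pandas as pd

# Hypothetical hiring-model outputs: 1 = recommended for interview.
candidates = pd.DataFrame({
    "gender":      ["F", "M", "F", "M", "M", "F", "M", "F"],
    "recommended": [ 0,   1,   0,   1,   1,   1,   0,   0 ],
})

# Selection rate per group: the audit is only possible because
# the gender column exists.
rates = candidates.groupby("gender")["recommended"].mean()
print(rates)

# A common fairness check, the "four-fifths rule", compares the
# lowest selection rate to the highest.
print("Disparate impact ratio:", rates.min() / rates.max())
```

Dropping the gender column would make the model "blind," but it would also make this audit impossible, which is precisely the tension described above.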

In conclusion, supervised learning is a genuinely useful branch of machine learning when used properly; otherwise, it can create societal problems such as discrimination and racialization by sorting people and things into categories.

Blog Post 1: Topic Modelling

Topic modeling is a machine learning technique that automatically analyzes text data to discover clusters of co-occurring words across a set of texts. It is called 'unsupervised' machine learning because it does not require a predefined list of tags or training data that has already been classified by humans. Topic modeling helps identify common themes in texts. A text can carry multiple perspectives, which creates the problem of being unable to address it at all its possible levels simultaneously; topic modeling helps work toward that goal.
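A minimal sketch of LDA topic modeling on a toy corpus (texts invented for illustration; real work would use far more documents), showing that no labels are supplied anywhere:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus: no tags, no labels -- this is what makes it unsupervised.
docs = ["the moon rises over the quiet sea",
        "waves crash against the shore at night",
        "the senate passed the new budget bill",
        "voters debate the tax bill and budget"]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Ask LDA to discover 2 latent topics from word co-occurrence alone.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-4:]]
    print(f"Topic {i}:", top)
```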

After reading the various assigned articles, I can see both the advantages and the disadvantages of LDA topic models. Lisa Rhody's article compares LDA to produce at a market, but what I am wondering is: isn't produce at a market a very simple concept compared to LDA? The size of a topic reflects an estimate of how much of that kind of topic is present (in the poetry). However, wouldn't it be a problem if the algorithm somehow misinterpreted certain co-occurring words as something else and hence produced a false estimate of the topics? Though the authors respond that LDA does a pretty good job with its method of discovery, there is still no guarantee of 100 percent accuracy. This may therefore lead to some loss of authenticity, or of accuracy, in evaluating themes.

I want to make some comparisons with our previous readings. We worked with clustering algorithms, which are also unsupervised machine learning, similar to topic modeling. However, looking at the contrasting side, typical clustering algorithms like K-means rely on a distance measure between data points, whereas the LDA topic model does not perform any distance measuring. This means that LDA lacks the ability to describe the relation of topics to one another and instead just performs a probability estimate. Matthew Jockers's article makes a similar point when it states that "the manner in which the computer (or dear Hemingway) does the calculation is perhaps less elegant and involves a good degree of mathematical magic". This suggests how narrative structure (of poetry, for example) is lost, along with the relations between topics, when we calculate only the probabilities of the topics/themes.
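A hedged sketch of that contrast, assuming the same kind of toy corpus as above: K-means makes hard, distance-based cluster assignments, while LDA returns a probability distribution over topics for each document:

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the moon rises over the quiet sea",
        "waves crash against the shore at night",
        "the senate passed the new budget bill",
        "voters debate the tax bill and budget"]

# K-means: hard assignments driven by a distance measure (Euclidean on TF-IDF).
X_tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_tfidf)
print("K-means cluster per document:", km.labels_)   # one cluster ID each

# LDA: soft assignments -- a probability distribution over topics, no distances.
X_counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_counts)
print("LDA topic proportions per document:")
print(lda.transform(X_counts).round(2))              # each row sums to 1
```

The K-means output is a single cluster ID per document, while the LDA output is a row of proportions, which is the "probability test" described above.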

In conclusion, it would not be wrong to say that topic modeling is more of an "exploratory" than an "explanatory" method of data analysis. Topic modeling can reveal patterns and prompt questions, but it is less suited to testing and confirming them.