Artificial intelligence: 4 times more heroes than heroines in literature

The research was carried out by two scientists from the USC Viterbi School of Engineering, Mayank Kejriwal and Akarsh Nagaraj. Kejriwal, a researcher at USC’s Information Sciences Institute (ISI), was encouraged to take on a literature-related topic by recent scientific papers on gender bias and by his own expertise in natural language processing (NLP). Nagaraj is a machine learning engineer.

What books have been studied?

The study was limited to 3,000 titles published through Project Gutenberg. The project was started in 1971 by Michael Hart, often called the inventor of the e-book because he was the first to share a digital version of a printed text with other users, and a famous one at that: the American Declaration of Independence. By 2016, more than 50,000 e-books had been made available through Project Gutenberg, and the scientists chose its collection so as not to be accused of bias, since Project Gutenberg offers a very wide range of titles. The genres studied varied, from science fiction and adventure to other fiction and poetry. Based on their analyses, the researchers found that men outnumber women in these books by a ratio of 4:1. Importantly, however, most of the titles available through the project were published before 1924 (they can be distributed free of charge because their copyright has expired), so the study focuses on books that are at least a century old.

Artificial intelligence used to draw attention to the issue of inequality

In the study, the scientists used, among other things, an NLP technique called named entity recognition (NER), which automatically identifies and classifies names in a text. For example, it can determine whether a given fragment names a person, a place or a thing, which helps the researchers pick out the characters and establish whether each is a hero or a heroine. The researchers also counted how many feminine and masculine pronouns appear in the analyzed books. Another technique was to check how many of the main characters in specific publications are female. Interestingly, the disproportion between the number of male and female characters in a book decreases when its author is a woman: in the past, women wrote about women more often than men did.
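The article does not describe the researchers’ actual word lists or code. As a rough illustration only, here is a minimal Python sketch, assuming a small hand-picked list of English pronouns, of how gendered pronouns in a text could be counted and turned into a masculine-to-feminine ratio like the 4:1 figure reported above. The names `MASCULINE`, `FEMININE`, `count_gendered_pronouns` and `masculine_to_feminine_ratio` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pronoun lists for illustration; the study's real lists are not given.
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def count_gendered_pronouns(text: str) -> Counter:
    """Count masculine and feminine pronouns in a text, ignoring case."""
    counts = Counter()
    # Crude tokenizer: runs of ASCII letters only (a real pipeline would use an NLP library).
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in MASCULINE:
            counts["masculine"] += 1
        elif token in FEMININE:
            counts["feminine"] += 1
    return counts


def masculine_to_feminine_ratio(text: str) -> float:
    """Return the masculine/feminine pronoun ratio, or infinity if no feminine pronouns occur."""
    counts = count_gendered_pronouns(text)
    if counts["feminine"] == 0:
        return float("inf")
    return counts["masculine"] / counts["feminine"]


if __name__ == "__main__":
    sample = "He rode out at dawn. His sister watched him go; she said nothing to him."
    print(count_gendered_pronouns(sample))  # Counter({'masculine': 4, 'feminine': 1})
    print(masculine_to_feminine_ratio(sample))  # 4.0
```

Pronoun counting like this is cheap but blunt: it cannot tell who a pronoun refers to, which is one reason the researchers also relied on NER to identify individual characters.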

As Kejriwal pointed out:

Sexism in literature is a fact. And when we read books that are already classics and realize that they contain four times fewer heroines than heroes, it has a subliminal effect on us as consumers of culture.

Nagaraj added that books are a window into the past, and the books of the authors studied give us insight into how people perceived the world and how that perception has changed over time.

The researchers have been criticized, among other things, for not including transgender and non-binary people in their work. They say they partly agree with the criticism, but explain that, in their view, transgender and non-binary people were almost completely ignored in literature, so it would be difficult to find research material. They also noted that there are not yet effective tools for recognizing the singular “they” in texts about people who use that pronoun.

NLP technology also avoids the bias that can arise in human surveys, and it allowed the researchers to find words associated with a specific gender in the text. Words associated with women included “weak”, “pretty”, “stupid” and “nice”, while those associated with men included “leadership”, “imperious” and “strong”. The researchers hope their work will highlight the importance of interdisciplinary research: in this case, AI technology was used to draw attention to social issues such as inequality.
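The article does not explain how these gender-associated words were found. One simple approach, sketched here in Python purely as an assumption, is to count which candidate words appear within a few tokens of a masculine or feminine pronoun in the same sentence. The function `gendered_cooccurrences`, its `window` parameter and the candidate list are hypothetical; a real pipeline would also use a part-of-speech tagger to keep only adjectives.

```python
import re
from collections import Counter

# Hypothetical pronoun lists for illustration only.
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def gendered_cooccurrences(text: str, candidates: set, window: int = 4) -> dict:
    """Count candidate words found within `window` tokens of a gendered pronoun.

    The window never crosses a sentence boundary (., ! or ?), so a word in one
    sentence is not credited to a pronoun in the next.
    """
    assoc = {"masculine": Counter(), "feminine": Counter()}
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, tok in enumerate(tokens):
            if tok in MASCULINE:
                gender = "masculine"
            elif tok in FEMININE:
                gender = "feminine"
            else:
                continue
            # Look at neighbours on both sides of the pronoun, clipped to the sentence.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in candidates:
                    assoc[gender][tokens[j]] += 1
    return assoc


if __name__ == "__main__":
    text = "She was pretty and kind. He was strong and proud."
    words = {"pretty", "kind", "strong", "weak"}
    result = gendered_cooccurrences(text, words)
    print(dict(result["feminine"]))
    print(dict(result["masculine"]))
```

Summed over thousands of books, counts like these show which words tend to cluster around each gender. They measure association, not intent, and a small window keeps the results noisy but local.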

Yesterday and today

It is worth recalling that we have already written about a study confirming the overrepresentation of male characters in books. An analysis of 3,280 books published between 1960 and 2020 showed that there are more boys than girls in American children’s literature, although the proportions shift each year and the number of female characters is growing.

Interestingly, the authors of the popular “Good Night Stories for Rebel Girls” conducted an experiment with a mother and her daughter. The two visited a bookstore and first removed from the shelves the books with no male characters (as you can see in the video below, there were only two such titles), then the books without a single female character, of which they found as many as 74. Next, the mother and daughter removed the books in which female characters appear but do not say anything, setting aside another 67. Finally, they removed the books in which a princess waits for a prince to rescue her. How many books are left at the end? See for yourself!

