Computational analysis of the body in European fairy tales

"The paper ‘Computational analysis of the body in European fairy tales’ by Scott Weingart and Jeana Jorgensen is in the journal Literary and Linguistic Computing, and is available to read for free for a limited time." I actually discovered it through "Understanding ‘the body’ in fairy tales" on the OUP Blog. Here's a lengthy excerpt from there. I need to read the article itself more carefully after reading this summary, but my initial thoughts, and what I will look for as I read, appear below the excerpt. Don't miss those! And add your own, for this is fun stuff. Really!
The biggest news to hit the streets recently combined the power of Google, a few Harvard mathematicians, and five million digitized books covering the last two centuries. They dubbed their computational study of culture “culturomics”, and several more research projects have grown in its wake.

This type of research has traditionally been limited by inadequate technology, incomplete data, and the scarcity of scholars well-versed in both computation and traditional humanities research. That scene is now changing, due largely to efforts from both sides of the cultural divide, the humanities and the sciences. It is in this context that we undertook a study of European fairy tales, yielding interesting and occasionally unexpected results.

An analysis of over 10,000 references to people and body parts in six collections of Western European fairy tales can reveal quite a bit. Understanding fairy tales pays off twofold: they reveal the popular culture and beliefs of the past, while simultaneously showing what cultural messages are being transferred to modern readers. There is no doubt that the Disney renditions of classic fairy tales both reflect assumptions of the past and helped shape the gender roles of the present.

One finding from this analysis dealt with the use of adjectives when describing bodies or body parts in the stories. The most frequently-used adjectives cluster around the themes of maturation, gaining and maintaining beauty or wealth, and the struggle for survival, all concepts that still have a prominent place in our culture.

The use of age in these stories is of particular interest. While young people are described more than twice as frequently as old, the word old (and similar words indicating old age) appears more frequently than the word young (and related terms). That means the tellers of these stories rarely find it necessary to mention when someone is young, but often feel the need to describe the age of older people.

In fact, old people tend to attract more adjectives than their younger counterparts in general. If someone is going to be described in any way at all, whether it be about their beauty or their age or their strength, it’s far more likely that those descriptions are attached to the old rather than the young. This trend also holds true with regards to gender; men are described significantly less frequently than women. Combining these facts, it appears that although old women are brought up relatively infrequently, they are described much more frequently than would be expected.

The fact that women are described more frequently than men fits with a common feminist theory suggesting Western culture treats the male perspective as universal, unmarked, public, and default. Extending that theory further, the fairy tale analysis reveals that the young perspective is also default and unmarked. Older people and especially older women must be described in greater detail and with greater frequency, marking them as old or as women or both, because otherwise the character is assumed as young and masculine, maintaining those traits which are considered defaults.

These results just scratch the surface of what can be discovered using the automated and quantitative analysis of cultural data. As technology and data sources improve, there will be an increasing number of studies which combine algorithms and statistics with traditional humanistic theories and frameworks. The holy grail, which we are reaching ever-closer to, is the successful bridging of traditional close reading approaches of humanistic inquiry and the distant reading quantitative methods being developed by researchers like Franco Moretti and the Google Ngrams Team. This is another step on that path.

Scott B. Weingart is an Information Science Ph.D. student at Indiana University studying the history of science. Dr. Jeana Jorgensen is a recent graduate of Indiana University who specializes in folklore and gender studies. This work is from a paper they co-presented at Digital Humanities 2011, for which they won the Paul Fortier Prize for best young researchers at the conference.
Keep in mind, I haven't read through the entire original article carefully as I write this. The following are my thoughts as I read and consider it based on the blog entry and the abstract. Sometimes it helps me to get my critical thoughts gathered first so I am not just automatically agreeing with a viewpoint.

Although I find myself agreeing overall with the overarching theory and conclusions, I would like to know more about the source materials. I am interested in how the diminutive forms of words in other languages are handled, for example. I run into those quite frequently in my reading and translating. A literal translation of a title gives us "Little Snow White", for example, but most English translations simply give "Snow White". Translated texts are so very tricky and, let's face it, most of the European fairy tales were not originally recorded in English. So all of this includes the factor that English translators bring their own cultures and expectations to translation, as well as expected language norms and the demands of art and storytelling for entertaining and educating their intended audience.

But, yes, the female characters, young or old, are described more frequently in physical terms. The ageism is a little more problematic for me, since the tales usually designate both pretty clearly, and age implies experience, wisdom, and helpfulness more often than not. So "old" is a shortcut to an understanding of the character, just as "young" implies inexperience, with a challenge to be overcome.

My experiences are anecdotal since I haven't done any enumeration of terms and their frequency. This is an interesting spark for some great conversations!
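
For anyone who wants to try that kind of enumeration, here is a minimal sketch in Python, assuming a plain-text English translation of a tale. The word lists and the function name `count_age_terms` are my own illustrative assumptions, not the terms or method used in the paper, and a simple word count like this says nothing about who or what each word describes.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only; the paper's actual
# term lists are not reproduced here.
OLD_TERMS = {"old", "elderly", "aged", "ancient"}
YOUNG_TERMS = {"young", "youthful"}


def count_age_terms(text):
    """Count occurrences of 'old' and 'young' marker words in a text."""
    # Lowercase the text and split it into alphabetic words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for word in words:
        if word in OLD_TERMS:
            counts["old"] += 1
        elif word in YOUNG_TERMS:
            counts["young"] += 1
    return counts


sample = "The old woman gave the young girl an old key, and the aged king smiled."
print(count_age_terms(sample))
```

Running the same function over the source-language text and over its translation would be one rough way to see how much a translator adds or drops, though diminutives like the one in "Little Snow White" would need their own handling.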

  1. You made an excellent linguistic point about the possible effects of using translated texts. I followed your link to the article, and footnote #4 ends:

    "Where possible, we use translations done by scholars who are folklorists, guaranteeing some amount of cultural sensitivity to context. Additionally, it is possible to treat translations like any other folkloric text: an equally valid variant. Folklore is always in flux, always being translated between languages and cultures, so working with tales in translation is acceptable so long as the translation is trusted."

    So, the analysis would appear to be on secondary English translations, which would have fundamentally different linguistic conventions for coding age and gender, as you mentioned. No translator would render every gender- or age-related suffix, particle, or suggestive word choice as an English adjective, and they would make the same audience-driven word choices as any editor, even when trying to be 'faithful' to the source text.

    Wonderful post. Thank you.