Text Mining and Historical Scholarship

In Graphs, Maps, Trees, Franco Moretti gives an exceptional introduction to text mining. His book raises both the possibilities and the shortcomings of this new tool, which are noted below.


Text Mining gives us the ability to…

  • Analyze large amounts of data.
  • Search through more niche markets.
  • Recognize new patterns.
  • Ask questions we may not be able to answer.
  • Break texts down further and make them more searchable.
  • Track how users view content to see what is important.


The shortcomings of text mining include…

  • Less focus on individual texts.
  • Can only give you representations of data, not interpretations of data.
  • Tries to fit everything within one framework. What if there is no framework? What if life is random? What if the framework changes?
  • Less focus on politics, because this is a fleeting concern of the present.
  • Possibly takes away some of the unique historical context of each work.
  • More books were produced than the ones counted to make the graphs.
  • Publication numbers do not always reflect who actually read the books. For instance, more than one person could read the same book.
  • Hard to explain novelty and uniqueness.
  • Takes away from human agency.

Moretti makes an interesting point that the ability to analyze large amounts of data affects what type of scholarship gets produced. These tools allow one to focus on the overall historical context rather than only the few major events usually studied in history. He shows how these major events are usually connected to much larger patterns. Historians need to analyze the strengths and weaknesses of text mining just as they would for any other source they use.

Graphs, Maps, Trees: Abstract Models for Literary History by Franco Moretti

Sometimes, I think, the graphs give a false sense of objectivity to his data. Moretti’s Literary Genre graph looks very objective, showing the number of books in three different genres over time. However, is there still subjectivity in this? For example, the classification of a book can be subjective if it sits on the edge of two genres. Moretti starts out by saying he is a Marxist, and text mining enables him to generate material that supports his ideological bias. Burke had a good point when he said text mining does not do a good job of showing uniqueness and human agency. I believe humans have free will, even if limited by their circumstances, and this is an important part of history. Thus, text mining is a valuable tool for historians, but historians should use many different tools and sources to gain the best picture of reality they can. It is interesting that the fundamentals of the historical profession, like having a variety of sources and analyzing the strengths and weaknesses of those sources, are still important even when discussing relatively recent advances in digital history.



One response to “Text Mining and Historical Scholarship”

  1. Such a good point about the genres: this was also the location of discomfort for me; cf. also the section of our Theibault reading that dealt with nodes in human network analysis. What looks objective still depends on subjective decisions about analytical categories, and here our ideologies still lie hidden. Thanks for bringing this up!
