The text analysis resources here cover topics such as installing programming languages (like R and Python), running exploratory scripts for word tokenization and word counts, and more advanced approaches like topic modeling and word embedding models.
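As a rough illustration of what an exploratory tokenization-and-count script can look like, here is a minimal Python sketch using only the standard library. The function names `tokenize` and `word_counts`, the sample sentence, and the simple regular-expression tokenizer are assumptions made for this example; they are not taken from any of the resources listed on this page, and real projects often use a dedicated tokenizer instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    A token here is any run of ASCII letters, digits, or apostrophes;
    this is a deliberately simple rule chosen for illustration.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(text):
    """Return a Counter mapping each token to its frequency."""
    return Counter(tokenize(text))


# Hypothetical sample text, used only to show the two functions in action.
sample = "The cat sat on the mat. The mat was flat."
counts = word_counts(sample)

# Print each token with its count, most frequent first.
for token, n in counts.most_common():
    print(token, n)
```

The same two steps (split text into tokens, then tally them) underlie most of the more advanced methods listed below, which typically start from a table of word counts per document.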

Getting Started

Python

R

Topic Modeling

  • Journal of Digital Humanities’s Special Issue – A special issue of JDH, published in 2012, devoted to topic modeling in the humanities.
    • Topic Modeling: A Basic Introduction – Introductory article by Megan R. Brett from JDH’s special issue explaining the basic concepts of topic modeling.
    • Words Alone – Article by Ben Schmidt on the limitations of Latent Dirichlet Allocation (LDA).
  • Topic Modeling Made Just Simple Enough – An introduction to topic modeling written by Ted Underwood of the University of Illinois, Urbana-Champaign.
  • Guided Tour – A comprehensive guide to topic modeling with many links by Scott Weingart of Carnegie Mellon University.
  • MALLET – Website for downloading and installing MALLET, an open-source, Java-based package for Latent Dirichlet Allocation (LDA).
    • Topic Modeling Tutorial – Tutorial by Shawn Graham, Scott Weingart, and Ian Milligan on setting up a command line environment for using MALLET.
    • Mallet R Package – Ben Schmidt’s R package wrapping MALLET.
    • GUI Tools that use MALLET
      • Google’s Topic Modeling Tool – A graphical user interface for doing topic modeling.
      • Serendip – A system for visualizing topic models by Eric Alexander and Joe Kohlmann of the University of Wisconsin-Madison.
  • Topic Modeling Toolbox – An alternative to MALLET for LDA topic modeling from Stanford University.

Word Embedding Models

Other Text Analysis Tools and Resources