Some places to get texts

Plain text

TEI-Encoded

Resources from Laura Nelson’s “Analyzing Complex Digitized Data”

Demonstration Corpora, by Alan Liu

  • U.S. Presidents’ Inaugural Speeches
  • Abraham Lincoln Speeches and Letters
  • Documenting the American South
    • The Church in the Black Community
    • First-Person Narratives of the American South (African Americans, women, enlisted men, Native Americans, ex-slaves, etc.)
    • North American Slave Narratives
  • Sunday School Books in 19th Century America
  • The Grange Visitor (Michigan newspaper)
  • Historic American Cookbooks
  • Adult British Fiction – 1880s (by gender)
  • Children’s Fiction – 1880s (by gender) (I have formatted some of these data; ask me)
  • William Wordsworth writings
  • Book summaries and film summaries from Wikipedia
  • U.S. patents related to the humanities
  • List of sites containing full text books

Springboard List of Free Datasets for Data Science

Corpora from Miriam Posner’s crowdsourced document

Nicolas Iderhoff’s Collection of NLP Datasets