Can natural language processing help us get an overview of a corpus of text? If so, how? What can we learn, or what useful information can we extract, from a set of governmental speeches? Taking, for instance, the annual State of the Union (SOTU) addresses (available from 1790 to 2016), can we detect patterns in the stylistic or rhetorical evolution of these speeches? Can we discover similarities between presidents? Do these similarities reflect their political party affiliations?

Using simple tools, we can observe that, over time, presidents have aimed to reach a larger audience through a more familiar tone. When inspecting the specific vocabulary of each president, we can automatically detect important issues related to each presidency (e.g., slavery with A. Lincoln, jobs and taxes with B. Obama). Such an extraction technique allows us to summarize a presidency in one or a few sentences. When trying to assign each address to its respective presidency, some speeches are rather challenging to classify correctly. Understanding the reasons behind such difficulties is more pertinent than trying to achieve a higher accuracy rate.
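As an illustration of how a president's "specific vocabulary" might be detected, the sketch below scores how over-represented each word is in one speaker's speeches relative to the whole corpus, using a simple z-score on relative frequencies. This is a minimal, assumed implementation for the reader's benefit, not necessarily the measure used in the work described above.

```python
from collections import Counter
import math


def specific_vocabulary(target_tokens, corpus_tokens, min_count=2):
    """Rank words by how over-represented they are in the target
    text relative to the full corpus.

    A word occurring c times in a target of n_t tokens is compared
    with its expected count n_t * p, where p is its relative
    frequency in the corpus; the score is a binomial z-score.
    (Illustrative only; the talk's exact specificity measure may differ.)
    """
    target = Counter(target_tokens)
    corpus = Counter(corpus_tokens)
    n_t, n_c = len(target_tokens), len(corpus_tokens)
    scores = {}
    for word, count in target.items():
        if count < min_count:
            continue                       # skip rare, unstable words
        p = corpus[word] / n_c             # corpus-wide relative frequency
        expected = n_t * p                 # expected count in the target
        variance = n_t * p * (1 - p)
        if variance > 0:
            scores[word] = (count - expected) / math.sqrt(variance)
    # Highest z-score first: the most characteristic vocabulary
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Toy example: "slavery" dominates the target but not the corpus,
# so it surfaces as the most specific word.
target = ["slavery"] * 5 + ["union"] * 2
corpus = target + ["tax"] * 20 + ["union"] * 10
ranked = specific_vocabulary(target, corpus)
print(ranked[0][0])  # the word most specific to the target speeches
```

Applied per presidency over the SOTU corpus, the top-ranked words give exactly the kind of one-line summary mentioned above (e.g., slavery-related terms for Lincoln).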

Prof. Jacques Savoy is a full Professor of computer science at the University of Neuchâtel, Switzerland. He received a Ph.D. in quantitative economics from the University of Fribourg (Switzerland) in 1987. From 1987 to 1992 he was a member of the faculty of computer science at the University of Montreal (Canada). His research interests cover mainly natural language processing, and particularly information retrieval for languages other than English (European, Asian, and Indian), as well as multilingual and cross-lingual information retrieval. For many years, he contributed to and participated in various evaluation campaigns addressing these research questions, such as TREC (Washington, DC), CLEF (Europe), NTCIR (Tokyo), and FIRE (India). His current research focuses on statistical modeling and evaluation of natural language processing tasks such as text clustering and categorization, as well as authorship attribution. Application-wise, he is working on the automatic analysis of political speeches (both governmental and electoral) with a particular focus on the United States.