Word Embedding for the Historian: Employing LSI to Understand How Words Were Historically Used
- Author(s):
- Lisa Baer-Tsarfati (see profile)
- Date:
- 2020
- Group(s):
- CSDH-SCHN 2020
- Subject(s):
- Great Britain, History, Computational linguistics, Digital humanities, Research, Methodology
- Item Type:
- Conference paper
- Conf. Title:
- CSDH-SCHN 2020
- Conf. Org.:
- CSDH-SCHN
- Tag(s):
- Latent Semantic Analysis, Semantic Text Analysis, Vector Space Modeling, Word Embedding Models, British history, Computational linguistics, Digital humanities research and methodology, Gender history
- Permanent URL:
- http://dx.doi.org/10.17613/n4m6-6j18
- Abstract:
- Historians are often confronted with the challenge of defining words or ideas in an historically appropriate manner. Language evolves; words lose some meanings and gain others over time, and it is important, when examining the past, for the historian to ensure that their analysis accurately reflects the language in use during the chosen period of study. This paper explores the use of semantic text analysis and vector space modeling as a method for excavating an historically appropriate understanding of the ways in which word meanings were conceptualized in the past. It argues that word embedding can be used not only to understand the semantic features connecting the words within a text or multiple texts to one another, but also as a means for characterizing words based on their semantic distance within a corpus of primary source texts. Built upon a theoretical framework that combines discourse analysis with computational linguistics, this paper expands upon Deerwester et al.’s method of Latent Semantic Indexing (LSI) to apply the methodology to the examination of period language. As LSI is based on the principle that words that are used in the same contexts tend to have similar meanings, it is possible to employ LSI in the semantic characterization of words and concepts within an unstructured corpus of historical discourse texts. In demonstrating this, this paper explores the mathematical underpinnings of Latent Semantic Indexing, discusses vector space modeling, and then presents an LSI/word embedding case study based upon my doctoral research on the relationship between ambition, gender, class, and control in sixteenth- and early seventeenth-century Scotland and England.
- Metadata:
- xml
- Status:
- Published
- Last Updated:
- 3 years ago
- License:
- All Rights Reserved
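
The principle stated in the abstract, that words used in the same contexts tend to have similar meanings, can be illustrated with a minimal sketch of LSI. The four-document corpus, the raw-count weighting, and the choice of k = 2 latent dimensions are all assumptions made for illustration; none of them comes from the paper's corpus or pipeline. The sketch builds a term-document matrix, truncates its singular value decomposition, and compares words by cosine similarity in the reduced space.

```python
import numpy as np

# Hypothetical toy corpus (invented for illustration, not from the paper).
# "ambition" and "aspiration" never appear in the same document,
# but both appear alongside "pride" and "sin".
docs = [
    "ambition pride sin",
    "aspiration pride sin",
    "harvest grain field",
    "harvest grain market",
]

# Build a term-document count matrix X (rows: terms, columns: documents).
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        X[index[word], j] += 1.0

# Truncated SVD: keep the k largest singular values (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # each row is a term in the latent space


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity(w1, w2):
    """Cosine similarity of two vocabulary words in the latent space."""
    return cosine(term_vectors[index[w1]], term_vectors[index[w2]])


if __name__ == "__main__":
    raw = cosine(X[index["ambition"]], X[index["aspiration"]])
    print("raw space:   ambition vs aspiration =", round(raw, 3))
    print("latent space: ambition vs aspiration =",
          round(similarity("ambition", "aspiration"), 3))
    print("latent space: ambition vs harvest    =",
          round(similarity("ambition", "harvest"), 3))
```

In the raw count space the two words share no documents, so their similarity is zero; in the reduced space they land on the same direction because they share the context words "pride" and "sin". A real historical corpus would typically call for term weighting, spelling normalization, and a much larger k, none of which is shown here.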