• Word Embedding for the Historian: Employing LSI to Understand How Words Were Historically Used

    Author(s):
    Lisa Baer-Tsarfati
    Date:
    2020
    Group(s):
    CSDH-SCHN 2020
    Subject(s):
    Great Britain, History, Computational linguistics, Digital humanities, Research, Methodology
    Item Type:
    Conference paper
    Conf. Title:
    CSDH-SCHN 2020
    Conf. Org.:
    CSDH-SCHN
    Tag(s):
    Latent Semantic Analysis, Semantic Text Analysis, Vector Space Modeling, Word Embedding Models, British history, Computational linguistics, Digital humanities research and methodology, Gender history
    Permanent URL:
    http://dx.doi.org/10.17613/n4m6-6j18
    Abstract:
    Historians are often confronted with the challenge of defining words or ideas in an historically appropriate manner. Language evolves; words lose some meanings and gain others over time, and it is important, when examining the past, for the historian to ensure that their analysis accurately reflects the language in use during the chosen period of study. This paper explores the use of semantic text analysis and vector space modeling as a method for excavating an historically appropriate understanding of the ways in which word meanings were conceptualized in the past. It argues that word embedding can be used not only to understand the semantic features connecting the words within a text or multiple texts to one another, but also as a means for characterizing words based on their semantic distance within a corpus of primary source texts. Built upon a theoretical framework that combines discourse analysis with computational linguistics, this paper extends Deerwester et al.’s method of Latent Semantic Indexing (LSI) to the examination of period language. As LSI is based on the principle that words that are used in the same contexts tend to have similar meanings, it is possible to employ LSI in the semantic characterization of words and concepts within an unstructured corpus of historical discourse texts. To demonstrate this, the paper explores the mathematical underpinnings of Latent Semantic Indexing, discusses vector space modeling, and then presents an LSI/word embedding case study based upon my doctoral research on the relationship between ambition, gender, class, and control in sixteenth- and early seventeenth-century Scotland and England.
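
The principle the abstract describes (words used in the same contexts tend to have similar meanings) can be illustrated with a minimal sketch. This is not the paper's implementation: the four-document toy corpus, the choice of `k = 2` dimensions, and the helper name `similarity` are all assumptions made for illustration. The sketch builds a term-document count matrix, reduces it with a truncated singular value decomposition (the core step of LSI), and compares words by cosine similarity in the reduced space.

```python
import numpy as np

# Hypothetical toy corpus; NOT drawn from the paper's primary sources.
docs = [
    "ambition pride sin",
    "ambition pride vice",
    "virtue honour duty",
    "virtue honour grace",
]

# Vocabulary and a word -> row index lookup.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document matrix: rows are words, columns are documents, entries are raw counts.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1

# Truncated SVD: keep only the k largest singular values (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # each row is a word in the k-dimensional latent space


def similarity(a: str, b: str) -> float:
    """Cosine similarity between two words in the reduced (latent) space."""
    va, vb = term_vectors[index[a]], term_vectors[index[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))


print("ambition ~ pride :", round(similarity("ambition", "pride"), 3))
print("ambition ~ virtue:", round(similarity("ambition", "virtue"), 3))
print("sin ~ vice       :", round(similarity("sin", "vice"), 3))
```

In this toy corpus "sin" and "vice" never appear in the same document, yet they end up close together in the reduced space because both co-occur with "ambition" and "pride". That indirect association is the kind of semantic proximity the abstract refers to. A real historical corpus would need tokenization, spelling normalization, and term weighting, none of which are shown here.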
    Metadata:
    Status:
    Published
    Last Updated:
    3 years ago
    License:
    All Rights Reserved

    Downloads

    Item Name: baer-tsarfati-word-embedding-for-the-historian.pdf
    Activity: Downloads: 123