Corpus Linguistics Impacts Founding Era Meaning

Written by Thomas Leahy

Modern lawyers are required to keep up with emerging legal technologies in order to stay competitive and adequately serve their clients, but recent technological innovations have also begun impacting traditionally analogue fields, like originalist constitutional interpretation. Originalist scholarship that focuses on the “original public meaning” of a constitutional or statutory term has often been criticized for the inherent uncertainty or impracticability that comes with trying to ascertain the meaning of a word as it was used centuries in the past. In response to this criticism, originalist legal scholars have sought more empirical ways of determining original public meaning, including some scholars who have begun advocating for a methodology driven by “corpus linguistics.”

The burgeoning field of corpus linguistics in legal scholarship encompasses a variety of methodologies that use data and technology to find the original meaning of a constitutional or statutory term or phrase. Application of the methodology is made possible by the increasing digitization of historical documents and continual advances in data analytics. To find the original public meaning of a term using corpus linguistics, a scholar undertakes a keyword-coded search of a “corpus” (a vast body of text or other dataset) that contains compiled texts from the relevant time period. A researcher interested in the original meaning of a term may code the search to return, for example, the chosen term as a “key word in context” (showing the search term in the context of its usage) or may look for “collocates” (words used in proximity to the term of interest). These types of searches aim to let a legal scholar quickly and empirically determine how a word was used during a particular time period.
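
As a rough illustration of the two search types just described, here is a minimal Python sketch. It does not reflect how any particular corpus tool is built. The function names (tokenize, kwic, collocates), the four-word window, and the two sample sentences are all assumptions invented for this example, and the sentences are not historical quotations.

```python
import re
from collections import Counter

# Hypothetical stand-in for a corpus: two invented sentences, NOT historical quotations.
SAMPLE_TEXTS = [
    "The merchants carried on commerce with foreign nations by sea.",
    "Commerce among the several states depends on trade and navigation.",
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def kwic(texts, term, window=4):
    """Key word in context: list (left context, term, right context) for each hit."""
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, tok, right))
    return hits


def collocates(texts, term, window=4):
    """Count the words that appear within `window` tokens of each hit for `term`."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == term:
                counts.update(tokens[max(0, i - window):i])
                counts.update(tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    for left, word, right in kwic(SAMPLE_TEXTS, "commerce"):
        print(f"{left:>30} [{word}] {right}")
    print(collocates(SAMPLE_TEXTS, "commerce").most_common(5))
```

A researcher would read the context lines to see how the term was actually used, and would scan the collocate counts for words that recur near it. A real corpus interface would likely layer date ranges, genre filters, and frequency statistics on top of this basic idea.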

Research institutions have already compiled various datasets for this purpose, and some of them are easily accessible to the public. For example, Brigham Young University has assembled a “Corpus of Historical American English,” which includes over 400 million words and allows users to break down results by decade, to filter by placement relative to other words, and to view words in context. BYU is also currently developing a Corpus of Founding Era American English, which would be an enormous research asset for original public meaning originalists.

Recent examples of this type of empirical analysis have explored the original public meaning of Constitutional terms like “commerce,” “emolument,” and “officers of the United States.” While corpus linguistics may be garnering significant attention in some branches of originalist academia, the judiciary has not yet substantially engaged with the scholarship on this issue. There is some indication, however, that corpus linguistics methodology can be persuasive to the courts. For example, Justice Thomas’s dissenting opinion in Gonzales v. Raich (2005) cited Professor Randy Barnett’s corpus-linguistics-driven investigation into the original public meaning of the word “commerce” (as it was used in the Commerce Clause) to support his interpretation of the word. In State v. Rasabout (Utah 2015), Justice Lee of the Utah Supreme Court wrote an extensive concurring opinion in which he argued that corpus linguistics methodology should be used to determine the meaning of the word “discharge” in the context of an ambiguous firearm-related statute. Justice Lee is a strong advocate for the use of corpus linguistics, and his opinion serves as a persuasive response to critics of the methodology.

Although a few other courts have also considered evidence derived from corpus linguistics methodologies when analyzing original meaning, the field is still developing, and the practice is likely to receive significant attention in the future. Its current status serves as a reminder that as technology develops, the way we practice law, and even interpret the Constitution, may need to change as well.