LLMs can contribute to interpreting the ordinary meaning of legal texts, says US concurring opinion
In a concurring opinion, Judge Kevin Newsom of the United States Court of Appeals for the Eleventh Circuit concluded that, while large language models (LLMs) were imperfect, they could potentially contribute to interpreting the ordinary meaning of legal terms.
The case arose from a robbery in which the defendant pointed a gun at a cashier. The courts’ analysis focused on whether the term “physically restrained” applied to those facts.
The United States District Court for the Middle District of Florida enhanced the defendant’s sentence under s. 2B3.1(b)(4)(B) of the U.S. Sentencing Guidelines, which provides for a two-level enhancement where a victim was physically restrained.
Newsom agreed with the appeals court’s majority that the prior rulings in United States v. Jones, 32 F.3d 1512 (11th Cir. 1994), and United States v. Victor, 719 F.3d 1288 (11th Cir. 2013), required affirming the enhancement of the sentence.
However, Newsom questioned whether those cases were correctly decided. He expressed doubts that the phrase “physically restrained” should apply where there was no physical contact and a weapon merely ensured a victim’s compliance.
In his concurring opinion in United States v. Deleon, Newsom explored how LLMs might contribute to determining the ordinary meaning of legal texts, especially in cases involving multi-word phrases like “physically restrained” whose definitions may not be readily available in standard dictionaries.
Newsom explained that he ran a small experiment in which he queried LLMs about the meaning of “physically restrained.” Two LLMs – OpenAI’s ChatGPT and Anthropic’s Claude – provided definitions that aligned with the conventional understanding of the phrase and that referenced the use of physical force or devices to limit a person’s movement, he said.
Newsom also noted slight variability in the LLMs’ responses. When asked the same question multiple times, the models provided slightly different but substantively similar answers, he said. For instance, Claude’s responses differed in length and level of detail, but the core meaning remained consistent.
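Newsom’s opinion describes queries put to the chatbots themselves rather than through any programming interface, but the repeated-query comparison he describes could, for illustration, be reproduced with the two vendors’ public Python SDKs. This is only a sketch: the model names, prompt wording, and number of runs below are assumptions for demonstration, not details taken from the opinion.

```python
# Hypothetical sketch: ask two LLMs the same "ordinary meaning" question several
# times and print the answers side by side to observe variability across runs.
from openai import OpenAI               # pip install openai
import anthropic                        # pip install anthropic

PROMPT = 'What is the ordinary meaning of "physically restrained"?'
RUNS = 3                                # repeat the query to see how answers vary

openai_client = OpenAI()                # reads OPENAI_API_KEY from the environment
claude_client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

for i in range(RUNS):
    gpt_resp = openai_client.chat.completions.create(
        model="gpt-4o",                 # assumed model name, not from the opinion
        messages=[{"role": "user", "content": PROMPT}],
    )
    claude_resp = claude_client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model name, not from the opinion
        max_tokens=500,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- run {i + 1} ---")
    print("ChatGPT:", gpt_resp.choices[0].message.content)
    print("Claude: ", claude_resp.content[0].text)
```

Comparing the printed answers across runs mirrors the pattern Newsom reports: wording and length shift from run to run, while the core definition stays stable.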
“Because LLMs are trained on actual individuals’ uses of language in the real world, it makes sense that their outputs would likewise be less than perfectly determinate—in my experience, a little (but just a little) fuzzy around the edges,” wrote Newsom. “What’s important, though—and I think encouraging—is that amidst the peripheral uncertainty, the LLMs’ responses to my repeated queries reliably revealed what I’ve called a common core.”
Because the LLMs’ variability reflected realistic speech patterns, it could make the models more accurate predictors of ordinary meaning, Newsom said.
Newsom reiterated that, while LLMs should not replace traditional tools like dictionaries or established interpretive methods, they could play a supplementary role in understanding the ordinary meaning of legal terms, particularly when dealing with composite phrases such as “physically restrained.”