Allens unveils landmark gen-AI Australian law benchmark

The benchmark found that LLMs provided inaccurate and inconsistent legal advice

Miriam Stiel

Allens has unveiled an Australian law benchmark for generative AI – a “first-of-its-kind” initiative according to the firm.

The Australian law benchmark is modelled after the LinksAI English law benchmark released by Linklaters last October. The initiative trialled large language models’ (LLMs) ability to tackle Australian law-related questions; moreover, a structured framework was developed to test, compare, and monitor the development of generative AI capabilities over time, the firm said.

As per the benchmark, LLMs should not be used to generate Australian legal advice without expert human supervision because of their inaccuracy and inconsistency, even though models at the level of GPT-4 could summarise well-understood areas of law. The benchmark highlighted citation issues among the models it tested; it also revealed that the LLMs were often inaccurate, prone to hallucinations, and displayed “a general inability to discern authoritative sources”, Allens said.

Among the LLMs tested, none was considered consistently reliable when it came to addressing complex legal concerns. GPT-4 was the top performer, but while its benchmarking score exceeded 50%, the tool “did not demonstrate the competency expected of a mid-level associate”, Allens noted.

The benchmark also found that 52% of GPT-4's responses to legal queries received a substance score of 1 or 2, which indicated mostly incorrect answers or answers that had several errors. Moreover, even when the LLMs were asked to cite sources, 32% of the generated responses “either lacked the underlying case law or legislation or included fabricated information”, the firm said.

“While we're seeing some impressive developments in AI technology applied to law, our findings underline that there is still considerable progress needed before these tools can be relied upon fully without human oversight”, IP head Miriam Stiel said.

She pointed out that while AI was expected to improve in terms of capability and accuracy in this area, “it's crucial to remember that providing accurate legal advice is just one facet of a lawyer's role, which also involves the exercise of judgement and risk analysis, to assist clients in their strategic and commercial decision making”.