Reasoning as the Engine Driving Legal Arguments


Machine-Learning Results

In our experiments, ML algorithms have the hardest time classifying reasoning sentences, compared with other sentence types. Nevertheless, trained models can still provide useful predictions about sentence type. We trained a Logistic Regression model on a dataset of 50 BVA decisions created by Hofstra Law’s Law, Logic & Technology Research Laboratory (LLT Lab). After preprocessing, that dataset contains 5,797 manually labeled sentences, 710 of which are reasoning sentences. In a multi-class scenario, the model classified reasoning sentences with precision = 0.66 and recall = 0.52. A neural network (“NN”) model that we later trained on the same BVA dataset, and tested on 1,846 sentences, produced comparable results: the precision for reasoning sentences was 0.66, and the recall was 0.51.
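For readers who want to see the shape of such an experiment, here is a minimal sketch of the multi-class setup, assuming TF-IDF features feeding a scikit-learn logistic regression (the feature representation is our assumption, not something specified above). The toy sentences and labels are hypothetical stand-ins for the 5,797 labeled sentences in the LLT Lab dataset.

```python
# A minimal sketch of the multi-class sentence classifier, assuming a
# TF-IDF + LogisticRegression pipeline. The toy sentences and labels
# below merely stand in for the manually labeled LLT Lab dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

sentences = [
    "An April 2010 VA examiner diagnosed bilateral tinnitus.",
    "Service treatment records show no complaints of hearing loss.",
    "Service connection requires evidence of a current disability.",
    "A claim is denied when the preponderance of the evidence is against it.",
    "The Board finds that the veteran's tinnitus began during active service.",
    "The Board concludes that the criteria for service connection are met.",
    "The Board assigns greater probative weight to the VA examination report.",
    "The private opinion is less persuasive because it ignored the service records.",
]
labels = [
    "evidence", "evidence",
    "legal-rule", "legal-rule",
    "finding", "finding",
    "reasoning", "reasoning",
]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # sparse lexical features
    LogisticRegression(max_iter=1000),    # softmax over the sentence types
)
model.fit(sentences, labels)

# In a real experiment the report would be computed on held-out decisions;
# scoring the training sentences here only shows the API.
print(classification_report(labels, model.predict(sentences)))
```

A per-class report like this is where precision and recall figures of the kind quoted above come from: one row per sentence type, so the numbers for reasoning sentences can be read off directly.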

It is tempting to dismiss such ML performance as too low to be useful. Before doing so, however, it is important to investigate the nature of the errors made and the practical cost of an error in a given use case.

Practical Error Analysis

Of the 175 sentences that the neural network model predicted to be reasoning sentences, 59 were misclassifications (precision = 0.66). The confusion involved several other sentence types: of the 59 sentences misclassified as reasoning sentences, 24 were actually evidence sentences, 15 were finding sentences, and 11 were legal-rule sentences (the remaining 9 belonged to other types).
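An error breakdown like this can be read off one column of a confusion matrix. Here is a brief sketch, assuming scikit-learn; the label arrays are tiny placeholders for the 1,846 test sentences, and the label set shown is our assumption rather than the dataset's full typology.

```python
# Sketch: break down the sentences *predicted* as "reasoning" by their
# true labels, as in the error counts above. y_true and y_pred are
# placeholders; the label set is an assumption for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix

LABELS = ["evidence", "finding", "legal-rule", "reasoning", "citation", "other"]

y_true = np.array(["evidence", "reasoning", "finding", "reasoning"])    # placeholder
y_pred = np.array(["reasoning", "reasoning", "reasoning", "evidence"])  # placeholder

cm = confusion_matrix(y_true, y_pred, labels=LABELS)

# Rows are true labels and columns are predictions, so the "reasoning"
# column holds the true labels of every sentence predicted as reasoning.
col = LABELS.index("reasoning")
for row, label in enumerate(LABELS):
    if label != "reasoning" and cm[row, col]:
        print(f"predicted reasoning, actually {label}: {cm[row, col]}")
```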

Such confusion is understandable if the wording of a reasoning sentence closely tracks the evidence being evaluated, the finding being supported, or the legal rule being applied. An evidence sentence might also use words or phrases that signify inference, but the inference reported in the sentence is not that of the trier of fact; it is part of the content of the evidence itself.

As an example of a false positive (a precision error), the trained NN model mistakenly predicted the following to be a reasoning sentence, when it is actually an evidence sentence. (The model originally assigned a background color of green, which the expert reviewer manually changed to blue; the screenshot is taken from LA-MPS, a software application developed by Apprentice Systems.)

Example of an evidence sentence, text highlighted with blue background color, misclassified by the NN model as a reasoning sentence.
Image by Vern R. Walker, CC BY 4.0.

While this is an evidence sentence that primarily recites the findings reflected in the reports of a Department of Veterans Affairs (VA) examiner, the NN model classified it as stating the reasoning of the tribunal itself, probably due in part to the occurrence of the words ‘The Board notes that.’ The model’s prediction scores, however, indicate that the confusion was a reasonably close call (see below the sentence text): reasoning sentence (53.88%) vs. evidence sentence (44.92%).
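One practical use of such scores is to flag low-margin predictions for expert review, of the kind the manual corrections above illustrate. A sketch of that idea, assuming a fitted scikit-learn pipeline exposing predict_proba (as in the earlier sketch); the margin threshold is an illustrative choice, not a figure from our experiments.

```python
# Sketch: flag sentences whose top two class probabilities are close
# (e.g., 53.88% vs. 44.92% above) so a human can review them. `model` is
# assumed to be a fitted scikit-learn pipeline; the 0.15 margin is
# illustrative only.
import numpy as np

def flag_close_calls(model, sentences, margin=0.15):
    proba = model.predict_proba(sentences)     # shape (n_sentences, n_classes)
    top_two = np.sort(proba, axis=1)[:, -2:]   # two highest scores per sentence
    close = (top_two[:, 1] - top_two[:, 0]) < margin
    return [s for s, flagged in zip(sentences, close) if flagged]
```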

As an example of a false negative (a recall error), the NN model misclassified the following sentence as an evidence sentence, when it is clearly a reasoning sentence (the model originally assigned a background color of blue, which the expert reviewer manually changed to green):

Example of a reasoning sentence, text highlighted with green background color, misclassified by the NN model as an evidence sentence.
Image by Vern R. Walker, CC BY 4.0.

This sentence refers to the evidence, but it does so in order to explain the tribunal’s reasoning that the probative value of the VA evidence outweighed that of the private treatment evidence. The prediction scores for the possible sentence roles (shown below the sentence text) reveal that the NN model erroneously predicted this to be an evidence sentence (score = 45.01%), although the reasoning-sentence role also received a relatively high score (33.01%).

In fact, the wording of sentences can make their true classification highly ambiguous, even for lawyers. An example is whether to classify the following sentence as a legal-rule sentence or as a reasoning sentence:

No further development or corroborative evidence is required, provided that the claimed stressor is “consistent with the circumstances, conditions, or hardships of the veteran’s service.”

Given the immediate context within the decision, we manually labeled this sentence as stating a legal rule about when further development or corroborative evidence is required. But the sentence also contains wording consistent with a trier of fact’s reasoning about the specifics of a case. Based on the sentence wording alone, even lawyers might reasonably classify this sentence in either category.

The cost of a classification error depends on the use case and the type of error. For the purpose of extracting and presenting examples of legal reasoning, the precision and recall noted above might be acceptable to a user. A precision of 0.66 means that about two of every three sentences predicted to be reasoning sentences are predicted correctly, and a recall of 0.51 means that about half of the actual reasoning sentences are detected. If high recall is not essential, and the goal is a helpful illustration of past reasoning, such performance might suffice.

An error might be especially low-cost if it consists of confusing a reasoning sentence with an evidence or legal-rule sentence that still provides insight into the reasoning at work in the case. If the user is interested in viewing different examples of possible arguments, then a sentence classified as reasoning, evidence, or legal rule might still be part of an illustrative argument pattern.

Such low precision and recall would be unacceptable, however, if the goal is to compile accurate statistics on the occurrence of arguments involving a particular kind of reasoning. We would have very little confidence in descriptive or inferential statistics based on a sample drawn from a set of decisions in which the reasoning sentences were automatically labeled by such a model.
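A back-of-the-envelope calculation shows why. With recall r, a model finds only a fraction r of the true reasoning sentences, and with precision p it dilutes those hits with false positives, so the expected number of sentences it labels as reasoning is roughly true_count × r / p rather than true_count. A sketch using the NN figures above (the 710 count comes from the training dataset and is reused here purely for illustration):

```python
# Sketch: expected number of sentences *labeled* "reasoning" by a model
# with the stated precision and recall, versus the true count. Uses the
# NN scores above; the 710 figure is illustrative (it is the training-set
# count, not a test-set count).
def expected_predicted_count(true_count, precision, recall):
    true_positives = true_count * recall   # reasoning sentences the model finds
    return true_positives / precision      # plus the false positives it adds

print(expected_predicted_count(710, precision=0.66, recall=0.51))  # ~549 vs. 710 actual
```

In this illustration, an automatic labeling pass would understate the prevalence of reasoning sentences by roughly a quarter, and about a third of the labels it did assign would be wrong, which is why statistics built on such labels inspire little confidence.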
