
Observing the Effects of Automating the Judicial System with Behavioral Equivalence

Joseph A. Blass[1]*

Building on decades of work in Artificial Intelligence (AI), legal scholars have begun to consider whether components of the judicial system could be replaced by computers. Much of the existing scholarship in AI and Law has focused on whether such automated systems could reproduce the reasoning and outcomes produced by the current system. This scholarly framing captures many aspects of judicial processes, but overlooks how automated judicial decision-making would likely change how participants in the legal system interact with it, and how those changes would affect societal interests outside that system that care about its processes.

This Article demonstrates how scholarship on legal automation comes to leave out perspectives external to the process of judicial decision-making. It analyzes the problem using behavioral equivalence, a computer science concept that assesses systems’ behaviors according to the observations of specific monitors of those systems. It also introduces a framework to examine the various observers of the judicial process and the tradeoffs they may perceive when legal systems are automated. This framework will help scholars and policymakers more effectively anticipate the consequences of automating components of the judicial system.


The possibility that components of the judicial system might be replaced by computers has recently moved from the realm of science fiction into the halls of academia. And why not? Research in Artificial Intelligence (AI) and Law has steadily produced results for decades,[2] advances in data-driven Machine Learning systems are being used for legal analysis and prediction,[3] and computers are generating court filings[4] and assessing defendants’ risk if released.[5] When legal scholars ponder what other judicial decisions might be made by computers,[6] they are not engaging in speculative fiction, but only imagining today’s and tomorrow’s technology.

Much of the work in both AI and legal scholarship argues—or at least, implicitly assumes—that if an AI system generates the same legal decision (or argument, or analysis) as does the current system, then that AI system might reasonably replace the humans who currently perform that judicial reasoning and decision-making. This conclusion assumes that the legal outcomes—and their justifications—being simulated are the only relevant outputs of the legal process to be replaced. Under this view, if the subsystem that generates a legal outcome—usually, a judge—can be replaced without changing that outcome or the reasoning that justifies it, then the overall system has not been significantly affected by the replacement.

This Article demonstrates why that assumption is wrong, using methods from computer science that analyze the effects of replacing components of systems. While other scholars have discussed what components of the judicial system might be replaced by computers (and whether they should be), this Article is the first to approach the issue by examining what it means for one system to behave identically to or differently from another. The analysis focuses attention on the outputs of the judicial system beyond the decision made in a case. A difference-detecting method will empower scholars and policymakers to discern unintended consequences of making changes to the judicial system, particularly where they might otherwise assume the effects of the changes will be limited to the specific components that are altered.

When computer scientists consider whether to replace one system with another, they ask whether the two systems are behaviorally equivalent, i.e., whether their behavior is interchangeable.[7] Behavioral equivalence is not an objective standard but a subjective one: whether two systems are behaviorally equivalent depends on who is observing those systems,[8] and only those things being observed are included in the equivalence analysis. For example, consider washing dishes by hand versus in the dishwasher: to the sleeping child who takes a clean glass from the cupboard the next day, the processes are behaviorally equivalent; to the parent doing the washing, they are clearly distinguishable. Furthermore, behavioral equivalence demonstrates that if truly everything about some process is observed, then no replacement to that process can be equivalent to it.[9] The judicial system presents exactly such a case, where no change can be made that will not be visible to some stakeholder in society who might care about that change.[10] This means that any claim that some replacement system is faithful to the original must be ignoring the ways in which it is not. Focusing only on the legal decisions that a system makes risks ignoring the downstream effects of replacing it with an automated “equivalent.”

But the impossibility of true behavioral equivalence does not mean policymakers should throw up their hands and avoid trying to make positive changes. A behavioral equivalence analysis is useful even where a replacement system is intentionally different from the original, because a full accounting of the effects of a change allows decision-makers to properly understand and weigh its consequences. The key is to be sure to understand what is being changed, and for the changes to be made deliberately. Recognizing that whatever system might replace the current one cannot, by definition, be equivalent to it is an important first step toward determining the kind of system that is desirable, in light of the tradeoffs involved.

A behavioral equivalence analysis can help identify those tradeoffs. Although behavioral equivalence is not a crystal ball that perfectly predicts unintended consequences, it shows where and why those consequences are likely to appear, often from parties that interact with the changed system but are not the direct target of the changes. For example, attempts to amend the criminal justice system are common,[11] but the changes made often result in such unintended consequences. The Sentencing Reform Act of 1984 was designed to reduce judicial discretion and eliminate disparities in sentencing,[12] but it unexpectedly led to most criminal defendants taking plea deals rather than going to trial and facing sentences under those guidelines.[13] Similarly, computational risk predictors are meant to use data-driven analyses to help judges more objectively grant defendants bail or probation rather than prison, but these predictors have been used improperly by judges to determine how punitive a prison sentence should be.[14] Civil tort reform that limited noneconomic payouts in medical malpractice cases to reduce the incidence of such cases and control insurance premiums[15] reduced overall payments,[16] but it also led to dramatically higher awards for economic damages, limiting the overall reduction of awards.[17]

In all of these instances, a process driven by human judgment was replaced by one designed to reduce discretion and guesswork, but the change failed to take into account the characteristics and perspectives of the people whose conduct and decisions serve as inputs to those systems, which led to unintended consequences. Sentencing reform focused on judges and the disparities and discretion in their sentencing decisions, but did not consider how the prospect of a guaranteed sentence would affect prosecutors’ and defendants’ decision-making.[18] The adoption of risk assessment software focused on defendants and sought to translate data about those defendants into a score usable by judges, but it also introduced new biases into judicial decision-making and insufficiently constrained how judges would use those scores.[19] And although economic damages are more calculable than pain and suffering losses, tort reform to limit non-economic damages in malpractice cases did not account for attorneys innovating new ways of calculating economic losses—calculations that juries would fail to closely examine.[20] In each instance, the unintended consequences arose in a different part of the system than that which was targeted by the change.

This Article provides a framework to evaluate the consequences of replacing components of the legal system.[21] It demonstrates that it is theoretically possible to engineer perfect substitutes for those components only if one limits where one looks for the effects of those changes. But if one instead takes a broader view of the legal system, any substitutions are guaranteed to result in observable changes to that system. Focusing on the different observers of the legal system will allow policymakers to anticipate the hidden effects of making intentional changes to the system. The Article describes the tradeoffs that might arise from replacing parts of the legal system, which fall into four categories: informational access tradeoffs, which concern the information that emanates from a legal process; reasoning tradeoffs, which concern the mechanisms by which a legal decision is made; outcome tradeoffs, which involve changing the actual results of legal processes; and process tradeoffs, which concern how participants can interact with the system. This framework can help policymakers anticipate unintended consequences and factor them into their decision-making by drawing attention to the interests that will perceive the consequences of any changes being made.

This Article proceeds in three parts. Part I introduces the scholarship in Law and AI on automating legal decision-making. It draws out the assumption in this scholarship that the primary factor to attend to when considering automating some part of the judicial system is whether legal reasoning and decision-making remain unchanged. It then introduces the concept of behavioral equivalence and explains why behavioral equivalence must be evaluated from the perspective of some observer. Finally, it explores the implications of observer-dependent evaluations by asking how to evaluate behavioral equivalence in a non-deterministic domain like a courtroom, why an all-seeing observer defeats behavioral equivalence, and how observers’ limitations enable malicious actors who understand the inner workings of a system to take advantage of it.

Part II considers several possible observers that might evaluate behavioral equivalence, beginning with the judicial system itself.[22] It shows that with this observer, behavioral equivalence can be assessed using the concept of appealability. If a new substitute process differs in some way from the old one, and if the legal outcome using that new process can be appealed specifically because of that difference, then the judicial system distinguishes between the processes. But if the difference cannot be the basis of an appeal, the judicial system does not distinguish the processes: the difference introduced by the replacement is irrelevant to how the process operates within the larger legal system.[23] Using the judicial system as an observer is congruent with the assumption that only legal outcomes matter in considering legal automation. This explains why AI and legal scholarship have thus far implicitly assumed behavioral equivalence is possible: it has largely focused on a single observer.

To illustrate why this single observer fails to capture all the changes that might matter, Part II brings in principles of procedural justice. Focusing on procedural justice demonstrates why only observing legal outcomes—as relying on the judicial system as the observer does—may be short-sighted. Doing so ignores the observations made by outside observers who are relevant to the judicial system’s role in the body politic but are irrelevant to the internal operations of the system itself. After examining the various societal interests that observe different parts of the legal system, Part II concludes that these observers are collectively equivalent to the all-seeing observer, and that they therefore defeat any hope of achieving behavioral equivalence when replacing components of the legal system.

Although true behavioral equivalence may be impossible in the eyes of society at large, an analysis grounded in specific observers can help evaluate the consequences of making changes to the legal system. Part III begins by describing the tradeoffs involved in making such changes. It then examines which of these tradeoffs are implicated by scholars’ proposals for legal automation but are overlooked by a focus only on legal outcomes and reasoning.

We begin with the AI & Law literature and behavioral equivalence.

Behavioral Equivalence and AI & Law

As AI further permeates modern life, the third decade of the twenty-first century promises exciting changes, including to the judicial system. AI researchers have long studied computational modeling of legal reasoning, and legal scholars have begun contemplating which parts of the legal process might be replaced or improved by computer systems. This research largely focuses on recreating the outcomes and reasoning techniques of current legal processes, with the (at times, implicit) assumption that an automated system that works the same way as the current system could replace it.

But what does it mean for two systems to work the same way? Computer scientists asking this question examine the systems’ performance through the lens of behavioral equivalence. Behavioral equivalence allows one to turn away from the philosophical question of what something is and towards the grounded question of what it does. Two systems[24] display equivalent behavior (meaning they are behaviorally equivalent) when they yield the same output given the same inputs.[25] A behavioral equivalence analysis involves determining what counts as the outputs of those systems, and observing them. Insights from behavioral equivalence are not limited to computer science, and behavioral equivalence therefore provides a method by which to examine the effects of replacing legal subsystems—by computer algorithms, or new analog systems—and to reason about the tradeoffs created by making those replacements.

This Part begins by introducing scholarship in AI and law that contemplates replacing some aspect of the legal system with a computer. It then introduces behavioral equivalence, explains why equivalence is always relative to some observer, and explores the implications of the observer. Throughout, it illustrates how to apply these concepts to legal systems.

Scholarship on Automating Legal Decision-Making

For over thirty years, computer scientists in the AI & Law research community have created computational models of legal reasoning. While these researchers have never argued that their systems ought to eventually replace human decision-makers,[26] the existence of these models and other AI advances invite the public and members of the legal academy to imagine that they might one day be used in legal decision-making. Indeed, computer systems have recently become sufficiently advanced that legal scholars focused on AI[27] have begun to imagine what sorts of legal reasoning are ripe for being replaced by AI. Work from both research communities reveals a common assumption that the decisions of a legal system, and the reasoning with which it arrives at those decisions, lie at the core of faithful models of the judicial system.

AI & Law

A survey of landmark research in AI & Law—including work on precedential reasoning by analogy, with rules, and using machine learning, and on models of legal argumentation—shows that AI & Law research generally focuses on modeling the reasoning in or outcomes of legal cases. Understanding what these systems cover is necessary to understand the blind spots in the literature and their implications for judicial automation.[28]

The HYPO family of algorithms, developed by Professor Ashley and colleagues, uses a library of resolved cases to reason about new cases.[29] These algorithms are tailored to particular domains[30] and represent cases as collections of legal factors—legally salient concepts identified by researchers—along with the outcome of the case (for the cases in the library).[31] A HYPO-style algorithm first retrieves the most similar case—the one sharing the most factors with the new case—from the library and proposes its outcome for the new case. It then responds by retrieving the most similar case with the opposite outcome and proposing that the differences in factors across the retrieved cases are salient.[32] Finally, it responds to the counterargument. HYPO algorithms thus model at least three components of legal reasoning: reasoning from case precedents, legal argumentation, and deriving verdicts.
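The retrieve-and-respond steps described above can be sketched in code. This is a minimal illustration only, with invented case names and factors, not a reconstruction of Professor Ashley's actual systems:

```python
# A toy sketch of HYPO-style retrieval (case names and factors are
# invented): each case is a set of legal factors plus an outcome, and
# the most on-point case shares the most factors with the new facts.

CASE_LIBRARY = [
    {"name": "A v. B",
     "factors": {"secret_disclosed", "security_measures"},
     "outcome": "plaintiff"},
    {"name": "C v. D",
     "factors": {"secret_disclosed", "info_reverse_engineerable"},
     "outcome": "defendant"},
    {"name": "E v. F",
     "factors": {"security_measures", "employee_breach"},
     "outcome": "plaintiff"},
]

def most_on_point(new_factors, library, outcome=None):
    """Return the library case sharing the most factors with the new
    case, optionally restricted to cases with a given outcome."""
    candidates = [c for c in library
                  if outcome is None or c["outcome"] == outcome]
    return max(candidates, key=lambda c: len(c["factors"] & new_factors))

new_case = {"secret_disclosed", "security_measures", "employee_breach"}

# Step 1: cite the most similar case and propose its outcome.
cited = most_on_point(new_case, CASE_LIBRARY)
# Step 2: respond with the most similar case having the opposite outcome.
opposite = "defendant" if cited["outcome"] == "plaintiff" else "plaintiff"
counter = most_on_point(new_case, CASE_LIBRARY, outcome=opposite)
# Factors on which the two retrieved cases differ are candidates for
# distinguishing them.
distinctions = cited["factors"] ^ counter["factors"]
```

The sketch captures only the retrieval and counterargument steps; the third step, responding to the counterargument, would reuse the same machinery.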

Others have used AI techniques to predict case outcomes from precedents using logical rules extracted from cases (sometimes also relying on factors). Professor Horty developed a model that weighs rules extracted from precedents to determine which rules should apply to a new case.[33] This model captures not only what rules apply but why, to distinguish otherwise-applicable precedents. Professor Verheij uses case models: sets of cases in formal logic that are collectively logically consistent, different, and mutually compatible, and which together encode logical rules governing a body of law.[34] These cases can be applied as rules or by analogy.[35] Like HYPO, these approaches model both the verdict in a case and its justification.[36]

AI & Law research has not only focused on modeling judges’ reasoning and decision-making. For example, researchers have studied how to build coherent stories that connect a case’s facts to the rules that resolve it.[37] Instead of predicting outcomes, such approaches explain how outcomes are derived. Others have studied how to evaluate formal argument structures, in case-based reasoning systems[38] and using logic.[39] The focus is again on modeling reasoning, but of litigants, not judges. Research on argumentation examines not only the structure and form of arguments, but how to manage and resolve them. For example, Professor Prakken has described a formal model that captures litigants’ discourse, along with an adjudicator who manages the dialogue and tracks whether burdens are shifted or have been met.[40] This model can capture the reasoning of parties and judges, but it also models the process itself.

Finally, several researchers have used Neural Networks to predict case outcomes from case facts.[41] One such approach also identified intermediate factors that contributed to the outcome.[42] Although this system generated both predictions and the facts underlying them, its reasoning is unlike that which humans use to solve similar cases; instead, its data-driven approach leverages human annotation and computer-derived measures of similarity.[43]

Though researchers sometimes suggest their systems could help lawyers, pro se litigants, and adjudicators make and analyze arguments,[44] none claim their systems can or should replace human decision-makers, or that their models capture all of the information within the legal system.[45] A recurring theme in the literature is that modeling work can help illustrate and formalize how legal reasoning works, to help the legal world understand itself.[46] But these and other advances in AI naturally invite legal scholars and the public to consider how AI systems might be given a role in legal decision-making. In fact, legal scholars have begun thinking along these lines; we now turn to them.

Law & AI

As with researchers in AI & Law, legal scholars who consider what parts of the legal system could be automated and what it would mean to do so tend to focus on automating the system’s internal processes. They also look to those same internal processes to discover the consequences of that automation, a view which this Article argues will fail to detect many such consequences.[47] We examine several proposals in turn.

Several legal scholars have imagined that computer systems could independently make factual determinations. Professor Gowder argues that a machine learning system could eventually be used to determine whether, for example, something is a vehicle for the purposes of a “no vehicle in the park” rule.[48] Professor Gowder argues that such a system could never entirely replace human judges because human judgment is necessary both to fill in gaps in—or to change—the law, and for the requirements of procedural justice.[49] Professor Livermore argues that, far from leaving gaps for a human to fill, a Deep Learning system that classifies potential vehicles in parks might solve legal indeterminacy by allowing policymakers to write new kinds of laws, such that something is a vehicle under the law exactly if the algorithm says it is.[50] Professor Livermore argues that such a system would eliminate the need for human participation in resolving certain kinds of disputes.[51] And Professor Genesereth suggests that the technology in our pocket could collect and use data to inform us that we are violating the law (for example, our cell phones could tell us if we are speeding)—a use he characterizes as taking the role of a friendly police officer advising us on the law.[52] While Professor Genesereth imagines a benign, helpful legal advisor, the same technology could be used as an unfriendly police officer that writes users a ticket.

Legal AI systems could do more than make factual determinations. Professor McGinnis and Steven Wasick argue for dynamic rules, rules and standards that change depending on empirical data, without human intervention.[53] Professor Coglianese and David Lehr argue that administrative agencies could use Machine Learning like any other tool, not only to adjudicate disputes but in crafting regulations.[54] And Professor Volokh has argued that an AI which passes the Turing test, such that it could write a persuasive judicial opinion indistinguishable from that of a judge, should be allowed to be a judge in actual cases.[55]

These pieces paint a picture of an emerging focus in the legal academy regarding what legal processes could be automated: factual determinations, crafting rules, determining whether and how some rule applies, and composing judicial opinions.[56] As with the research in AI & Law,[57] the focus is largely on what occurs in the courtroom: the interior systems of the legal process.[58] That list surely covers a great deal of the legal process, and perhaps an automated system that perfectly models these elements could replace the humans who currently do so, with minimal effect upon the overall system.[59]

This Article argues that an exclusive focus on reproducing reasoning or outcomes leaves out important considerations. By attending only to perspectives internal to the judicial system—how judges and litigants reason and argue—it leaves out those external to the operations of that system, that is, the perspectives of interests who are not involved in court cases. Thus, a system that automates legal decision-making “works the same way” as the current judicial system only to the extent that legal decision-making is the only part of the judicial process. And as this Article will show, it is impossible to build a system that will “work the same way” as the current one in the eyes of everyone who cares about how the legal system works.[60] To see why, we must examine what it means for two things to work the same way. Behavioral equivalence is a natural lens through which to examine this question because it is the tool with which computer scientists assess the consequences of replacing one program with another, and this Article contemplates replacing legal processes with computerized ones.

Before discussing behavioral equivalence and the law, note two basic assumptions this Article adopts to limit the scope of its analysis. This Article explores what it means to substitute components of the legal system with computational systems that mimic their operation, and how to examine the consequences of doing so, but is not concerned with how to accomplish that substitution. It therefore does not address two questions crucial to doing so: “Is this actually technologically possible?” and “Would this be legal?”

The first assumption holds that it will one day be possible to perfectly mimic any given legal process’s computation of legal conclusions as an input-output system.[61] That is, given some input to a legal subprocess, this Article assumes technology will one day exist that computes the same output as does the current system.[62] For example, given some system that rules on whether evidence is admissible, this assumed system would generate the same answer and explanation as the judge who would otherwise rule on it. Whether this is actually true for any given legal process is a separate question orthogonal to this Article. This assumption is revisited in Part III.

Second, this Article assumes such a system could be legitimately authorized. The issues of authorization and democratic delegation of authority[63] implicated by this discussion are not so obviously insurmountable that it is useless to even consider what it means to replace current systems with digital substitutes.[64] This question of how an automated system could properly be authorized is a serious one to be dealt with more directly in future work.

We begin with these assumptions so as to be able to describe and discuss behavioral equivalence in the ideal case.

Behavioral Equivalence

Court systems are not about to be replaced by computers.[65] But the judicial system is not a single process. It is a set of interconnected processes that interact with and hand off to each other,[66] and computers have already been used to replace some subprocesses. For example, automated recidivism predictors are used to predict whether criminal defendants should receive bail, supervised release, or a prison sentence—determinations that used to be made by humans.[67] And as discussed above, scholars in both law and AI have described and developed systems that might one day replace aspects of human judicial decision-making.[68] Because this Article examines the consequences of replacing elements of the judicial system with algorithms, it is natural to use tools from computer science that evaluate such consequences.

Behavioral equivalence is a concept developed in engineering research that has been especially studied within the Computer Science literature concerning programming languages (PL).[69] (PL as a field refers to the study of principles underlying the programming languages with which users can write programs; the field, which is capitalized and abbreviated in this Article, is distinct from its objects of study.)[70] Two systems are behaviorally equivalent if they behave in the same way,[71] but what does it mean to behave the same way? To avoid the sometimes-tricky question of precisely defining behavior,[72] PL researchers ask whether two systems are observed to behave the same way. This subtle change in formulation avoids defining behavior and instead asks whether two processes are indistinguishable, which focuses on the perception of that behavior by the system within which a process is embedded.[73]

To illustrate the concept of behavioral equivalence, imagine two programs that perform addition and are given the equation “47 + 85.” The first one computes the way children are taught to in school, adding digits from right to left using memorized single-digit sums and carrying the one as needed. It first adds “7+5” (which it knows is “12”), and records a “2” in the rightmost position. It then adds “4 + 8 + 1” and records the result. The second program instead increases the left number by “1” and correspondingly decreases the right until the latter is “0,” at which point it returns the left number: “47 + 85” becomes “48 + 84,” then “49 + 83,” and so on until it has “132 + 0.” Both programs return the same answer—the first in three operations and the second in eighty-five. To a user who observes only the programs’ outputs, they are behaviorally equivalent. But crucially, these programs are not equivalent to the programmer who wonders why her computer runs more slowly when using the second program and who checks the programs’ memory usage. Thus when one asks whether a component of some larger system is behaviorally equivalent to a potential replacement, one is asking whether the old component can be replaced with the new without the relevant observer being able to tell the difference after the switch. Whether a process is equivalent therefore depends on the observer: to a novice user the two addition algorithms are behaviorally equivalent, but not to the expert attuned to her computer’s performance.
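The two addition programs can be rendered as code. This is a toy illustration; each function also reports how many primitive operations it performed, so that the "expert" observer's view is visible alongside the "novice" observer's:

```python
# Two addition programs: behaviorally equivalent to an observer who
# sees only outputs, distinguishable to one who counts operations.

def add_schoolbook(a: int, b: int) -> tuple[int, int]:
    """Add digit by digit from right to left, carrying as needed.
    Returns (sum, number of single-digit additions performed)."""
    result, carry, place, ops = 0, 0, 1, 0
    while a > 0 or b > 0 or carry:
        digit_sum = (a % 10) + (b % 10) + carry  # one single-digit addition
        ops += 1
        result += (digit_sum % 10) * place
        carry = digit_sum // 10
        a, b, place = a // 10, b // 10, place * 10
    return result, ops

def add_counting(a: int, b: int) -> tuple[int, int]:
    """Increment a and decrement b until b reaches zero.
    Returns (sum, number of increment/decrement steps)."""
    ops = 0
    while b > 0:
        a, b, ops = a + 1, b - 1, ops + 1
    return a, ops

total1, steps1 = add_schoolbook(47, 85)   # (132, 3)
total2, steps2 = add_counting(47, 85)     # (132, 85)
# Same observable answer; very different internal behavior.
```

To the user who reads only `total1` and `total2`, the programs are indistinguishable; the programmer who inspects `steps1` and `steps2` tells them apart immediately.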

Behavioral equivalence provides a way to focus on what matters while ignoring what does not: it asks what something does, not what it is. Behavioral equivalence helps to abstract information, to strip away irrelevant details.[74] For example, filing a lawsuit on the morning of the last day before the statute of limitations runs is behaviorally equivalent to filing it that afternoon. Morning and afternoon are different times, but for the purposes of filings, they are the same: What matters is not the filing time, but the date.[75] Or consider the legal subprocess to determine whether evidence is hearsay. One can imagine a variety of alternative ways of making that determination, and any such replacement could be evaluated for its behavioral equivalence to the current system.[76]
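The filing-deadline example makes the observer-relative character of equivalence easy to state precisely. In the sketch below (dates and the observation function are invented for illustration), two filings are equivalent relative to an observer exactly when that observer's view of them is identical:

```python
from datetime import datetime

# Behavioral equivalence is relative to an observer. The deadline
# rule "observes" only a filing's date, not its time of day.

def court_observes(filed: datetime) -> str:
    """The deadline rule's view of a filing: its date only."""
    return filed.date().isoformat()

def equivalent(f1, f2, observe) -> bool:
    """Two filings are behaviorally equivalent relative to an
    observer iff that observer's view of them is identical."""
    return observe(f1) == observe(f2)

morning = datetime(2024, 6, 3, 9, 15)
afternoon = datetime(2024, 6, 3, 16, 40)

# Indistinguishable to the deadline rule...
same_to_court = equivalent(morning, afternoon, court_observes)
# ...but distinguishable to an observer of the full timestamp.
same_to_clock = equivalent(morning, afternoon, lambda f: f.isoformat())
```

The same pair of filings is equivalent under one observation function and inequivalent under another, which is the core of the subjectivity this Part describes.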

Currently, judges decide the admissibility of evidence.[77] Judges can offload this work to a clerk, checking the clerk’s work and following the clerk’s reasoning, rather than completing the work themselves.[78] The clerk working through the admissibility analysis and the judge thoroughly checking it is equivalent to the judge doing the work herself, assuming she would have come to the same conclusion.[79] Alternatively, the judge might simply rubber-stamp the clerk’s work without examining the analysis, only glancing over it to ensure the order is complete and to see the bottom line she will declare. Critics might reasonably have reservations about endorsing such a practice,[80] but the ruling would be as binding as if the judge had done the analysis herself.[81]

We can also imagine a computer algorithm that does the admissibility analysis and outputs both a decision and some minimum explanation of it. The rules of evidence might be encoded as logical rules, information about the evidence translated into a logical format, and the information fed to the rule-based system.[82] Or a dataset of real-world evidentiary rulings could be used to train a machine learning system to make those decisions.[83] Or the rules of evidence could be used to generate a dataset of fake but realistic evidentiary rulings to train such a system.[84] As in the current regime, these systems would take in information and output an admissibility decision.
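The first of those theorized systems, rules of evidence encoded as logical rules, might look something like the following. This is a deliberate caricature for illustration, not an encoding of the Federal Rules of Evidence, and the field names are invented:

```python
# A toy rule-based hearsay check: encoded rules take in information
# about a statement and output a decision plus a minimal explanation.

def admissible_over_hearsay(statement: dict) -> tuple[bool, str]:
    """Apply simplified hearsay rules to a statement, returning a
    decision and an explanation mirroring the judge's bottom line."""
    if not statement["made_out_of_court"]:
        return True, "Admissible: made in court, so not hearsay."
    if not statement["offered_for_truth"]:
        return True, ("Admissible: not offered to prove the truth "
                      "of the matter asserted.")
    if statement.get("exception"):
        return True, (f"Admissible: hearsay, but within the "
                      f"{statement['exception']} exception.")
    return False, ("Inadmissible: out-of-court statement offered "
                   "for its truth; no exception applies.")

decision, explanation = admissible_over_hearsay({
    "made_out_of_court": True,
    "offered_for_truth": True,
    "exception": "excited utterance",
})
```

Like the judge or clerk it would stand in for, the function produces both a ruling and a justification, which is what a behavioral equivalence analysis would then compare against the current regime's outputs.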

What would have to be true to conclude that one of those theorized computer systems is behaviorally equivalent to the judge or her clerk doing the analysis (assuming the judge signs off on the system’s output, as with the clerk’s)? One could examine whether the system perfectly mimics an individual judge: for admissibility questions where the judge’s ruling is known, does the system output the same ruling? More importantly, for new admissibility questions, could an observer tell whether a given decision came from the system or from the judge? Because individual judges may display some variability in their decision-making, an analyst might instead gather several decisions from both the judge and the computer to see whether the decisions could be matched to their authors. Judgments vary not only within judges but across them, so yet another possibility is to ask whether the system’s judgments are consistent with a range of judges’ judgments. Thus the analyst could poll five judges plus the computer about whether evidence should be admitted, and see whether the computer’s judgments could be picked out.
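One simple way to operationalize the last of these tests is sketched below, using three judges rather than five for brevity. The rulings are invented, and pairwise agreement rate is only one of many possible measures of consistency:

```python
from itertools import combinations

# Sketch of the panel test: does the computer's ruling pattern fall
# within the range of disagreement already present among the judges?

panel = {
    "judge_1":  ["admit", "admit",   "exclude", "admit",   "exclude"],
    "judge_2":  ["admit", "exclude", "exclude", "admit",   "exclude"],
    "judge_3":  ["admit", "admit",   "exclude", "exclude", "exclude"],
    "computer": ["admit", "admit",   "exclude", "admit",   "admit"],
}

def agreement(a, b):
    """Fraction of questions on which two deciders rule the same way."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

judges = ["judge_1", "judge_2", "judge_3"]
human_pairs = [agreement(panel[a], panel[b])
               for a, b in combinations(judges, 2)]
computer_scores = [agreement(panel["computer"], panel[j]) for j in judges]

# Under this observer, the computer blends in if its agreement with
# each judge is no lower than the judges' minimum agreement with
# one another.
blends_in = min(computer_scores) >= min(human_pairs)
```

On these invented rulings the computer disagrees with each judge no more than the judges disagree among themselves, so an observer applying this particular measure could not pick it out, even though its ruling pattern matches no single judge exactly.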

While this Article contemplates replacing components of the legal system with computer systems, behavioral equivalence also helps to analyze the effects of swapping those components with any replacements, not just automated ones. Clerks’ duties provide a helpful illustration. For example, the cert pool system at the Supreme Court, where clerks prepare memoranda regarding certiorari petitions for many Justices at once, replaced a regime wherein each Justice’s clerks prepared memoranda for every certiorari petition, allowing clerks more time to attend to other duties.[85] Every petition still gets a memo and every Justice still receives one, but the memoranda are no longer tailored to individual Justices. Though it frees up clerks’ time, the cert pool system has been credited with reducing the Court’s caseload.[86] If the observer evaluating behavioral equivalence only checks whether Justices get memos, the cert pool system might be equivalent to the old regime; if it observes secondary effects, it might distinguish them. Similarly, before judges regularly had clerks[87] they drafted all their own opinions, but the practice of having clerks draft opinions for judges is now widely accepted.[88]

If a clerk’s work is treated as behaviorally equivalent to a prior practice, it is because the work is seen as in fact coming from the judge.[89] Indeed, the judge is supposed to check the clerk’s work sufficiently closely that it truly becomes the judge’s work.[90] But would anyone be able to know whether a judge is checking the clerk’s work, or is rubber-stamping it without even reading it? If a judge goes into her chambers alone with a clerk’s draft opinion and emerges with the opinion entirely unchanged, is there a difference between the judge who read the opinion carefully and was so impressed that she didn’t change a word, and one who played solitaire and did not glance at the opinion?[91] What if the clerk was particularly competent, and was so good at predicting a judge’s reasoning and style that over time the judge has come to find that she need not edit the clerk’s work, because any change would make the opinion less like the one the judge would have written herself? Every word is written by the clerk; the judge reads and agrees with them, changes not a one, and affixes her signature. It seems there is no meaningful difference between that situation and the one where the judge fixes errors in the clerk’s draft. But what if over time, the judge comes to conclude that there is no point to her reading the opinions, because she knows she will not want to change a word? Why should anyone care if the judge reads it, if the reading is simply a side effect that does not affect the decision-writing process?[92]

The question is facetious—of course people would care if they learned a judge did not bother even to review opinions written by her clerk.[93] They might also conclude the opinion lacked legitimacy since the clerk was not Senate-confirmed with Article III protections of salary and tenure (or appointed as a judge under state law). But the issue is not whether this would happen, or what the consequences would be. The question that nags is: how would anyone know?

This returns us to the key insight that behavioral equivalence is only definable relative to some observer.[94] The judge will know what she did but no one else will, not even the clerk, if the judge assures the clerk he simply did a perfect job drafting the opinion. The case will proceed as though the judge herself wrote the opinion, because no relevant observers will be able to tell the difference between the legitimate process and the illegitimate one. The illegitimate process is behaviorally equivalent to the legitimate one because two processes are behaviorally equivalent exactly if the observer cannot distinguish between them. The processes having different internal mechanisms or side effects does not defeat behavioral equivalence if the observer is blind to those factors.[95] Furthermore, the observer must be defined within the outermost bounds of the larger system: if some observer outside the system can distinguish between the two processes but cannot transmit that information inside the system, then those two processes are still behaviorally equivalent for the purposes of that system.[96] The judge may have a wise pet cockatiel in her chambers that knows if the judge is working or not, but because the cockatiel has no way of communicating that information, it does not defeat behavioral equivalence.

In summary, asking “Do these two systems behave the same way?” can involve an unwieldy inquiry because it assumes a clear understanding of what constitutes behavior.[97] Behavioral equivalence reframes the question as “Can these two systems be distinguished?”, which emphasizes the observer of the system rather than the system itself. And this focus on the observer yields further insights about evaluating whether two systems are equivalent.

Implications of the Observer

That behavioral equivalence is only ever relative to an observer carries three implications: to the omniscient observer a process has no equivalent; two systems need not be strictly identical to be behaviorally equivalent to a non-omniscient observer; and a clever malefactor who knows how a system works can manipulate it undetected by the observer. We examine each proposition in turn.

  1. The Omniscient Observer Defeats Behavioral Equivalence

An observer who can observe everything about some process—an omniscient observer—defeats behavioral equivalence, such that a process is only equivalent to itself.[98] In our earlier arithmetic example, an observer who witnessed the steps the two addition algorithms went through would be able to distinguish them by their internal states. Similarly, an observer who peered into the judge’s chambers could distinguish between the judge reading the draft opinion and playing solitaire. A judge who knew she was being watched could hold up the draft opinion and pass her eyes over it to fool that witness into thinking she is working, but an omniscient observer—one who observes truly everything about a process—could read the firings of her synapses and distinguish her pretense from actual reading.

Just as a single omniscient observer defeats behavioral equivalence, so too does the union of all possible observers.[99] Our discussion of behavioral equivalence focuses on the observer because it directs our attention to that which is being observed. If every part of some system’s behavior is observable, then all possible observers collectively observe everything, just as a single omniscient observer does. For example, a neuroscientist could strap the judge into a brain-scanner and, from the next room, determine if—though not what—the judge is reading. Sitting inside the room is the clerk, who can see what paper the judge’s eyes are focusing on. Neither alone can tell if the judge is reading the clerk’s draft opinion, but their combined observations can. Together, they are equivalent to the omniscient observer who sees everything about some process.

  2. Behavioral Equivalence Does Not Require Absolute Fidelity

If the observer is not omniscient then behavioral equivalence demands less than strict identity, because two systems that differ only in ways the observer cannot detect are behaviorally equivalent. This implies that if any part of the legal system is nondeterministic,[100] an algorithm could still be behaviorally equivalent to it so long as the observer sees them as operating the same way.[101] For nondeterministic systems, behavioral equivalence does not demand the same outputs given the same inputs. Instead, it requires that both systems either compute an output by using the same rules to pass through the same set of intermediate states (though not necessarily the same actual states),[102] or that the outputs of both systems fall along the same probability distribution.[103] The legal system already tolerates the same legal processes leading to different outcomes, for example, when a judge makes increasingly harsh parole decisions as lunch approaches, then becomes more lenient again after eating.[104] It also already tolerates observable differences across sets of outcomes, for example, when different judges at the same court have observably different patterns of rulings or are more or less favorable to plaintiffs.[105] Different rulings within and across judges at the same court are still treated as equivalent within the legal system.
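The distributional standard for nondeterministic processes can be illustrated with a simulation. The two "sentencing" processes and their probabilities below are entirely invented: they use different internal mechanisms and never produce matching runs, yet their outputs follow the same distribution, so an observer of outcomes alone cannot tell them apart.

```python
# Two nondeterministic processes with different internals but the same
# output distribution. All processes and probabilities here are invented.
from collections import Counter
import random

def empirical_distribution(process, trials, seed):
    """Estimate a process's output distribution by repeated sampling."""
    rng = random.Random(seed)
    counts = Counter(process(rng) for _ in range(trials))
    return {outcome: n / trials for outcome, n in counts.items()}

def process_a(rng):
    # Mechanism 1: draw a real number and threshold it.
    return "lenient" if rng.random() < 0.5 else "harsh"

def process_b(rng):
    # Mechanism 2: flip a discrete coin. Different internals, same distribution.
    return "lenient" if rng.choice([0, 1]) == 0 else "harsh"

dist_a = empirical_distribution(process_a, 100_000, seed=1)
dist_b = empirical_distribution(process_b, 100_000, seed=2)

# Indistinguishable up to sampling noise, despite different mechanisms.
same_distribution = all(abs(dist_a[o] - dist_b.get(o, 0)) < 0.01 for o in dist_a)
```

An observer limited to outcomes sees the two mechanisms as one process, which is the sense in which a nondeterministic legal process and its replacement can be equivalent without ever agreeing case by case.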

Thus, for those parts of the legal system that are nondeterministic, it is already acceptable for an observer not to be able to differentiate between different outputs from the same input. When this nondeterminism occurs inside the brains of humans, it occurs out of sight of the legal system and any observers of it. When it is visible (e.g., the assignment of judges to cases), different instantiations of a process are still meaningfully considered to be the same process.[106] A case proceeding through a district court takes but one of many possible branching paths as variables in the court case are resolved: the case will get only one of several possible judges; only a dozen of many possible jurors; and only one of a variety of outcomes. If it travelled a different path, nearly everything about how the case proceeded might be different. Nonetheless, no matter what path the case travelled, it has received the same legal process: any of these paths that do not involve a legal error are treated as equivalent to each other. This is true even when litigants try to control or account for the variables, for example through forum shopping, calibrating arguments to judges, and other tailoring of cases based on the variables in court.[107] When litigants affect the process, they simply become additional sources of nondeterminism and complexity within that process. Unless the case is resolved in a way that is traceable to the peculiar idiosyncrasies of a judge (a judicial signature, of sorts), an observer will be unable to tell from the case’s outcome what process the litigants received.

Assessing behavioral equivalence in a nondeterministic system means asking whether an observer would see the system’s outputs as equally acceptable or probable under the old regime, not whether they are the same. The bar to achieve behavioral equivalence might thus be lower than one might assume, since it does not require perfect mimicry. Regardless, this Article assumes that even nondeterministic legal processes can be modeled, and the equivalence of such systems assessed.[108]

  3. Knowledgeable Bad Actors Can Take Advantage

One downside of behavioral equivalence being relative to an observer is that a bad actor who understands the internals of how a system works might take advantage of that knowledge and manipulate the system, out of sight of and undetected by the observer.[109] For example, imagine a computer program that stores users’ private information (say, social security numbers) and only reveals such information when given some password. Unbeknownst to anyone but the programmer, the program stores this information on a computer’s hard drive such that one user’s ID is stored in memory location 1, then that user’s SSN is in memory location 2, then the next user’s ID is in memory location 3 and their SSN in location 4, etc. This is generally a perfectly secure way to store this information. But if some malicious actor knows the program is written this way, he will know that if some user’s ID is stored at memory location N, he can find the corresponding SSN by reading the contents of memory location N+1, without going through the secure program and providing a password.[110] Observers may also have blind spots, parts of the system that should be observed but are not, and those blind spots can be exploited.[111] Exploiting unmonitored blind spots is like a bank robber drilling directly into a bank vault from the storefront next door, bypassing the security measures in the bank itself.[112] And though a banker is likely to eventually notice an empty vault, other back-door exploits (like information security leaks) might go entirely unobserved and never be detected.
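The memory-layout exploit just described can be modeled in a few lines of Python. All data and names here are fake: memory is a simple list in which IDs and SSNs alternate, and an attacker who knows that layout reads around the password check entirely, invisibly to any observer watching only the guarded interface.

```python
# Toy model of the exploit: alternating ID/SSN cells in "memory," with a
# guarded accessor and a bad actor who knows the layout. All data is fake.
memory = ["alice", "111-11-1111", "bob", "222-22-2222"]  # ID, SSN, ID, SSN, ...
PASSWORD = "s3cret"

def lookup_ssn(user_id, password):
    """The intended, guarded interface to the store."""
    if password != PASSWORD:
        raise PermissionError("wrong password")
    return memory[memory.index(user_id) + 1]

def exploit(user_id):
    """What the knowledgeable bad actor does: find the ID cell at location N
    and read location N + 1 directly, bypassing the password check."""
    n = memory.index(user_id)
    return memory[n + 1]
```

To an observer monitoring only calls to `lookup_ssn`, a system being exploited this way is behaviorally equivalent to one that is not.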

Thus, if an algorithmic replacement for some aspect of the legal system cannot be distinguished from the current system by the relevant observer,[113] that replacement might still be exploited by someone who knows its internal operations, like a defendant blackmailing a judge’s trusted clerk to persuade the judge a case should be dismissed. Some such bad actor might get the system to consistently rule in her favor.

A related issue could arise if the new system was computationally faithful to the old system but was more transparent about its operations. This might lead savvy participants to provide different inputs to make their preferred outcomes more likely than they previously could have. For example, a judge might find an expert witness exceedingly persuasive and always rule for the side that engages her, but if the judge publicly stated “I’ll never rule against this persuasive witness,” litigants might start hiring her in every possible circumstance. In such a case, a change alters the overall system not by changing its operation but by changing what inputs are given to it.

Calibrating inputs to the legal system to get one’s desired output already happens, of course, and does not stop at jurisdiction shopping.[114] Litigants may tailor their arguments to judges based on what they think those judges will find persuasive.[115] A defense strategy at trial or whether a criminal defendant takes a plea deal can depend on which judge was assigned to the case.[116] And sadly, there is evidence to support the perception that litigants receive the justice they pay for.[117] The issue is not that introducing a behaviorally equivalent algorithm to replace some part of the legal system will allow people to manipulate that system to their advantage; that is already happening. The issue is that the observer’s blind spots may introduce new ways to manipulate the system. Some blind spots overlook backdoor exploits; others overlook whether the system transmits information that can be used to carefully calibrate inputs to the system that manipulate it towards a desired outcome.

Having explained the basic concept of behavioral equivalence and the effect of it always being evaluated relative to some observer, this Article now returns to the consequences of replacing aspects of the judicial process with computer systems, by asking what the observer evaluating behavioral equivalence in the legal domain should be, and what it will observe.

Potential Observers Evaluating Behavioral Equivalence in the Law

Part I explained how to consider replacing part of a legal process with a different process,[118] and that when two processes are behaviorally equivalent, it means some observer cannot distinguish between them. This Part explores who the relevant observer of the legal system might be, and what they might observe. Attending to observers is critical not only when one wants to ensure a system’s replacement leaves its overall function unchanged, but when one seeks to change how a system works, because the observer and its observations define where changes will be detected. That means the observer defines what counts as a change: a change the observer cannot detect is perceived as no change having been made at all. Carefully specifying the observer is therefore essential to any proposal to alter the legal system, whether the goal is to preserve its functionality or to change it.

Several observers are considered, beginning with the judicial system itself, then moving to other observers in society at large. Each has its advantages and disadvantages and corresponds to different benchmarks for how faithful a computational replacement to the current legal system ought to be. Using the judicial system as the observer might make behavioral equivalence possible, but at the cost of threatening the system’s legitimacy by ignoring the ripple effects from changing the legal system that are felt outside the courthouse walls.

The Judicial System as the Observer

Research in AI & Law has focused on modeling reasoning, argumentation, and decision-making in the legal system;[119] research in Law & AI has imagined automating fact-finding and other legal reasoning and analysis, and establishing computer-augmented legal standards.[120] When scholars describe systems as reproducing such behaviors, behavioral equivalence instructs that they must mean displaying the same behavior in the eyes of some observer.[121] So what observer sees the outputs of such systems and can evaluate any changes for behavioral equivalence? That is, is there an observer that observes enough about the legal system to determine whether systems designed to faithfully replicate current behavior do so, while others designed to effect change lead to the desired results?

The judicial system itself, including the judiciary and its processes, might serve as such an observer. The judicial system is the natural observer for the changes proposed by scholars in Section I.A because those changes concern processes interior to the judicial system itself: the rules applied in cases, the arguments made therein, the reasoning used to resolve them and the final decisions governing them. Because these subprocesses are all internal to the legal process, the larger judicial system is not only well-positioned to observe and regulate them, but already has mechanisms to do so, most importantly the appeals process. These could be used to monitor changes made by automation. Using the judicial system as the observer would also demand relatively less fidelity from potential replacement components. This Section describes what the judicial system’s observations would entail, addressing first the question of how it would make observations to determine behavioral equivalence, then what it would observe.

  1. How the Judicial System Could Observe Equivalence

An observer of a system captures information about that system, information that is potentially transmitted back into the system to further affect its processing.[122] The judicial system processes and disposes of cases, and those processes, outcomes, and their justifications are among its outputs.[123] These are further processed by higher courts through appeals, whose decisions are in turn fed back into the judicial system. The appeals process is therefore already a mechanism by which the judicial system observes its own outputs. The judicial system as the observer could therefore assess whether two processes are equivalent by asking whether a difference between them could be the basis of an appeal (and potentially a reversal). Under this view, if two processes are identical but for a single difference, and the parties could not have appealed the case under the old system had that difference occurred, then the judicial system treats them as equivalent. Thus a replacement system is behaviorally equivalent to that which it replaces if its operation and output would not be reversed on appeal if they occurred under the old system.[124] Note that the question is not whether the outcome under the new system would be reversed (although that is salient), but whether anything that happened (or failed to happen) differently in the new process would lead to reversal had it occurred under the old one.
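The proposed test can be stated schematically. In the sketch below, `would_be_reversible` is a stand-in for an entire body of appellate doctrine, which of course could not actually be reduced to a simple predicate; the example "doctrine" and case records are invented for illustration.

```python
# Schematic of the appeals-based equivalence test: a replacement process is
# equivalent to the old one iff no difference between them would have been
# reversible error under the old regime. Everything here is illustrative.
def equivalent_per_appeals(old_run, new_run, would_be_reversible):
    """Collect every step where the runs differ; the processes are
    equivalent iff no difference would ground a reversal."""
    differences = [
        (step, old_run.get(step), new_run.get(step))
        for step in old_run.keys() | new_run.keys()
        if old_run.get(step) != new_run.get(step)
    ]
    return not any(would_be_reversible(d) for d in differences)

# Invented "doctrine": which judge heard the case is unreviewable, but
# denying a party the opportunity to object is reversible error.
def would_be_reversible(diff):
    step, _, _ = diff
    return step == "opportunity_to_object"

old = {"judge": "A", "opportunity_to_object": True, "ruling": "admit"}
new = {"judge": "algorithm", "opportunity_to_object": True, "ruling": "admit"}
equivalent = equivalent_per_appeals(old, new, would_be_reversible)
```

On this formulation, swapping in the algorithm is equivalent (the identity of the decision-maker is not appealable), but a version of the new process that denied a party the chance to object would not be.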

This formulation of the observer and its observations is appealing because it corresponds to how the judicial system currently manages itself. For example, a party cannot appeal a case simply because he was unhappy with which judge was randomly assigned to hear it. Similarly, judges are given discretion over many aspects of managing a trial, including ruling on admissibility of evidence.[125] Parties may have difficulty appealing a case over a ruling on admissibility if the judge has carefully explained the reasoning underlying her exercise of discretion, even if the case turns on the presence or absence of that evidence or the reviewing judges might have ruled differently. In such cases, the system has determined that the parties have received a legally indistinguishable process from the one where the admissibility decision went the other way, even if the outcomes would be enormously different.

Under this formulation, the judicial system already knows how to evaluate behavioral equivalence in the legal process and has been doing so since before the founding of the country; it would just apply that evaluation process to new systems and ask whether cases under the new system would have been appealable under the old system. Furthermore, under this definition many things are already known to be behaviorally equivalent because they are currently not subject to appeal, while others are known not to be equivalent. For example, any judgments that require explanations could not be made by a system that does not generate any;[126] nor could a system exercise discretion that judges lack. That said, the insight that current legal doctrine already defines certain equivalence relationships previews a criticism of this formulation, that the current legal system’s doctrine governing appeals ignores too much. For example, while statistics about general trends may be used to draw inferences about specific cases in some instances, this is not true generally;[127] an automated system that significantly changed a pattern of decisions across cases but whose individual decisions were always acceptable might therefore be seen as behaviorally equivalent to the current system.[128]

Nonetheless, using the formulation of non-appealability as equivalence, we can imagine some potentially valid replacement systems from the constraints placed on current components of the legal system. For example, jurors may already be struck from a jury pool for a variety of reasons,[129] so a computer system that helped determine which jurors to strike (for reasons other than those forbidden by law)[130] would not be the basis of an appeal and therefore would be behaviorally equivalent to the lawyer doing it herself, even if juries ended up with noticeably different compositions compared to now.[131] Lawyers who meet certain criteria are treated as legally interchangeable subject to certain constraints;[132] a party represented by a computer system that met these criteria and was approved to practice law could presumably not appeal based on its competent representation—if they could, one would ask why the computer had been approved to practice law.[133] Computers already help judges make bail and sentencing determinations.[134] A judge could similarly get assistance from a computer to help determine how to rule on a range of motions, and those rulings could be upheld as not being an abuse of discretion, even if they differ from how the judge would have ruled without the computer’s input.

Thus far our discussion has focused on what happens within a trial, but the idea of the judicial system as the relevant observer determining behavioral equivalence could apply beyond the walls of the courtroom. In fact, the judicial system already routinely uses this concept without framing it as such. This is a sort of de jure equivalence: binding arbitration is legally behaviorally equivalent to litigation, even though the practice is controversial and its outcomes are not equivalent to those of court cases.[135] Similarly, when states conform to federal standards to receive federal funds, the determination that the standards are met amount to a declaration that the states’ programs are equivalent for the purposes of federal law, no matter their differences.[136] International treaty law performs a similar role in determining compliance with the rules by which nations agree to be governed.[137] What these observers (and the appeals system) have in common is that they are top-down, with the observers clearly defining what they will be observing. While our discussion has focused on who the observer is, what is being observed crucially defines what will be seen as behaviorally equivalent. We now turn to these observations.

  2. What the Judicial System Would Observe

Recall that, under the formulation of the judicial system as observer, a replacement to some component of the legal process is behaviorally equivalent to it if the replacement does not lead to reversible error. Therefore an effective observer within the judicial system must monitor every source of such errors. One obvious place to start is on the outcomes of cases based on their facts: if a party is ruled against but the evidence is insufficient to support a finding of liability, the case will be reversed.[138] This narrow frame treats the judicial system as a pure input-output system, a black box. But this formulation is insufficient: even determining that the facts could not support the judgment requires peering inside the system to examine the rule or standard used to reach that judgment.[139] The observer must therefore also have access to the rules and reasoning used to derive outcomes from inputs.[140]

But even the inputs, outputs, and rules are not enough, because the intermediate states those inputs go through, and the process by which they are transformed, can also be the subject of an appeal. The observer must track not only the rules themselves, but how they are applied. Our judicial system is not one large system that cranks over facts and spits out a decision along with its justification, but many subsystems yoked together, and the process by which they transform and operate over things from one step to the next matters as well as what is transmitted across those steps. When a criminal defendant is denied the opportunity to object to evidence as being obtained through an illegal search, the denial of due process (the right to object) might lead to an appeal even if the evidence is otherwise admissible. The judicial system observes not only the decision to admit the evidence, but the process by which that decision was made. Thus when the judicial system observes itself, it is not enough to know what conclusion some system has come to and why, but how. Because the judicial system protects due process, the process by which rules are applied, and not only the rules and decisions themselves, must be considered an observable output of the system that factors into any determination of behavioral equivalence.

The inputs to a legal process include not only filings and evidence but the participation of the parties, and due process requirements govern what is owed to those participants. An observer evaluating behavioral equivalence in the judicial system must therefore attend not only to inputs, outputs, rules, and process, but participation too. For example, if the government were to use a new automated system to determine when to take away something in which a person has an interest, that system must give the person the opportunity to respond (and must attend to the response).[141] Nor would an automated decision-maker be allowed to deny a criminal defendant the right to an attorney.[142] These violations constitute reversible error in our current system; if “cannot constitute reversible error” is the standard for behavioral equivalence, the observer must include due process amongst its observations.

  3. Left Unobserved: Procedural Justice and Harmless Error

Due process moves our discussion beyond questions of reversibility and to core issues regarding the role of a legal system in society and how its citizens accept it. Due process implicates procedural justice: the justice inherent in the system itself rather than only its outcomes.[143] Due process does not act only to guarantee the correct outcome—indeed, it sometimes requires letting a person the court knows is guilty walk free—but also to confer legitimacy upon and public trust in the proceedings.[144] In fact, legal systems may be more capable of ensuring equitable and fair dispute resolution than determining with certainty the facts in dispute.[145] This evenhanded and fair treatment is crucial to the legal system’s legitimacy.[146]

The problem with treating the judicial system’s appeals process as the relevant observer is that sometimes process violations are deemed “harmless errors” and do not lead to reversal. Our formulation of that observer would therefore sometimes treat a proceeding with a process violation as behaviorally equivalent to one without. This observer might thus confer the blessing of behavioral equivalence upon a system that, through compounding harmless errors, lacks legitimacy in the eyes of the people whose behavior it regulates.

Harmless error occurs when a reviewing court finds that a lower court made an error, but the error did not affect the outcome of the case or overly offend principles of justice and therefore does not require reversal.[147] For example, when a court instructs the jury to apply the wrong standard to determine guilt, the error is reversible if the jury convicts using too low a standard, but not if the jury convicts using too high a standard.[148] Under the reversibility formulation of behavioral equivalence, that rule means that applying too high a standard of guilt when finding a defendant guilty (but not when acquitting them) is exactly the same as applying the right standard.

The harmless error rule was developed to promote judicial efficiency by only reversing a case for an error that has a substantial effect on the rights of the parties.[149] The Supreme Court has furthermore said that “the central purpose of a criminal trial is to decide the factual question of the defendant’s guilt or innocence,”[150] meaning that when a trial record establishes guilt, errors should be considered harmless[151] even when the process error includes the violation of a constitutional right. But because harmless errors do not lead to reversal, the judicial system would treat a replacement system that increased the incidence of such errors as behaviorally equivalent to one that eliminated them.

It is a sad observation that a system which routinely applies the wrong rules and standards against overwhelmingly evidently guilty criminal defendants would be legally indistinguishable from the legal system we have now. It also suggests that the judicial system’s appeals process is the wrong observer for evaluating behavioral equivalence, because it would treat too many things as equivalent to each other when they patently are not. Appeals might not be the only way the judicial system could observe its own operations: even though harmless error is not reversible, for example, a court can take note of whether an automated system led to more such errors. But the judicial system’s current tolerance for significant differences across processes and outcomes suggests that courts may be too insensitive to such differences to be the only observers of whether a changed legal system is faithful to the original. Litigants whose rights may be burdened by such changes (and their lawyers) have a great interest in detecting whether changing a legal process changes its outcomes. And why stop at litigants and litigators? The legal system is embedded within a larger society, and broadening our analysis to include other interests within that society reveals that many changes might affect those interests in ways the judiciary might not observe. We therefore must consider these other societal interests as potential observers of whether replacements to the legal system are behaviorally equivalent to that which they replace.

Other Observers

The judiciary’s appeals system is the wrong observer, indifferent as it is to certain errors and extra-judicial interests. Modern society furnishes many possible alternate candidates interested in the operation of the judicial system; one or more of these, or the union of all of them, might be the relevant observer. These interests may care about those parts of the legal process that the appeals system treats as irrelevant to disposing of a case, and would not describe a system that did not reproduce them as equivalent to the current one.[152] Furthermore, focusing on these external observers can draw attention to the unintended, downstream consequences of making changes to the system.

If there are many such observers, why only consider the perspective of one? The proper observer may be the union of all the possible observers in society interested in the operation of the legal system. This mega-observer—what this Article calls society at large—assesses behavioral equivalence by asking whether any societal constituent observes the part of the system being changed, can differentiate the new version, and cares about the difference.[153] Society’s observations must be attended to because the legal system acts on society’s authorization. It is literally authorized through the adoption of the Constitution that vests the judicial power in the judiciary and through the elected Congress which created and populated lower courts that exercise that power (and the corresponding authorization of State courts). But the legal system is also meaningfully authorized by society because society’s institutions exist to ensure justice.[154] If that authorized system is significantly modified in a way that is felt by the body politic and that alters it from that which was authorized, then the new system will lack legitimacy.[155]
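In behavioral-equivalence terms, the society-at-large observer can be sketched as a conjunction over individual observers: two systems are equivalent to the union only if every constituent observer, given the slice of output it can see, cannot tell them apart. The following is a minimal illustration of that idea; the two "systems," the observers, and what each observer sees are all hypothetical simplifications.

```python
# Behavioral equivalence relative to an observer: an observer sees only a
# projection of a system's output, and two systems are equivalent to that
# observer when their projected outputs always match.

def equivalent_to(observer, sys_a, sys_b, inputs):
    """True if `observer` cannot distinguish sys_a from sys_b on `inputs`."""
    return all(observer(sys_a(x)) == observer(sys_b(x)) for x in inputs)

def equivalent_to_union(observers, sys_a, sys_b, inputs):
    """Society at large: equivalent only if EVERY observer sees equivalence."""
    return all(equivalent_to(o, sys_a, sys_b, inputs) for o in observers)

# Hypothetical systems: identical verdicts, but the replacement omits the
# explanation that accompanied each decision under the original.
old = lambda case: {"verdict": case % 2, "explanation": "because of interests a, b, c"}
new = lambda case: {"verdict": case % 2, "explanation": ""}

appeals_court = lambda out: out["verdict"]                      # sees only outcomes
interested_public = lambda out: (out["verdict"], out["explanation"])

cases = range(10)
print(equivalent_to(appeals_court, old, new, cases))                              # True
print(equivalent_to_union([appeals_court, interested_public], old, new, cases))   # False
```

The point of the sketch is structural: adding any observer to the union can only shrink the set of replacements that count as equivalent, which is why the society-at-large observer is so demanding.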

Unfortunately, as this Section will demonstrate, no aspects of the legal system can be assumed to be unobserved by society at large. Because the legal process is fully embedded within the society in which it operates, a replacement system that generates the same explanations and intermediate representations as the current system in a way that satisfies litigants will still be perceived by—and may affect the interests of—other observers outside the legal system. Under this view, the legal system is like a kidney, cleansing for the body that which passes through it. Looking only at the system itself and its effects on those who pass through it is like taking the perspective of the blood being cleaned. To the blood, a dialysis machine may be equivalent to a kidney. Not so to the person attached to the machine.

Examining the interests of stakeholders in society at large draws focus back to procedural justice, the justice inherent in a system, rather than substantive justice, the just outcomes that system produces. The interests in society at large may care less about the final allocation of rights and duties than about how the allocation happens.[156] It is especially important that procedures be fair because though the procedures of the American legal system are designed to arrive at the right result, they may not do so perfectly[157] (and indeed, convictions get overturned).[158] A computer is unlikely to always arrive at the right result through perfect procedures,[159] because it would need perfect knowledge to guarantee those results and never make the errors the current legal system makes.[160] Even were such a system possible, the populace subject to it—to whom the subjective experience of fair treatment by the legal system matters, regardless of their cases’ outcomes—might not perceive it as just.[161]

There are many observers beyond the judicial system that might care about the effects of altering that system. Unless there is a reason to ignore one,[162] all these interests should be attended to when considering such changes. We examine several such interests, and the implications of considering the observations of society at large, or the union of all these interests together.

  1. Observers at Large and Their Observations

Litigants’ observations are closely related to those of the judicial system; they overlap but are not equivalent to them. Litigants care not only about the outcome of the case, but about how they can argue it, and their interests go beyond what might lead to reversible error. For example, the legal system is largely unconcerned with who the attorney representing the party is,[163] but the litigant will care about his attorney’s performance as a litigator, and also about whether she is rude or unresponsive.[164] Indigent criminal defendants are not always granted their requests for new representation, but would surely see a system that always denied those requests as meaningfully different from the current one that only sometimes denies them. Litigants care about being heard and allowed to present their case, not just the verdict, even when no due process interests are implicated.[165] Similarly, judges care not only about the outcome of a case and the rules with which the outcome is derived, but also whether those rules are just.[166] Judges might therefore be less satisfied with a rule that says “X therefore Y” than with one that says “because of interests a, b, and c, X therefore Y,” even though these rules are logically equivalent.[167] Lower-court judges who criticize in their opinions the rule deciding a case prove that judges can apply a rule with which they disagree, but clearly care about the rules they apply.[168]

Society extends beyond the courthouse walls and includes many more people and groups who will never be involved in a lawsuit than those who are,[169] but who may still care about the process and procedures of justice. These include, at a minimum, members of the general public, the media, governmental institutions, corporations, interest groups, and other institutions. Exhaustively describing all such interests is beyond this piece, but even the partial listing that follows illustrates the extent to which society’s observations pierce and peer into the legal system.

Members of the general public may care about the specific outcomes of cases, just as litigants do, and about the rules applied, just as judges do; therefore there is a societal interest in keeping court proceedings public.[170] People also care about justice being applied fairly even when they have no interest in a specific case.[171] People might also be interested in a case for the facts made public in court proceedings.[172] What happens in courtrooms also has secondary effects on society about which the public may care. Because the public does not want the police to torture criminal suspects, it wants to ensure confessions obtained in violation of Miranda v. Arizona are excluded from criminal cases.[173] Members of the public who value not imprisoning defendants for years at a time before and during their trials will be interested to learn about court systems’ creative accounting of time to handle backlogs of criminal cases.[174] Like judges, members of the public care not only about the rules applied in a trial but also about the justification for those rules and the social policies those justifications support. For example, an anti-abortion small-government conservative may feel better about a ruling striking down abortion restrictions that is grounded in preventing government overreach rather than in respecting a pregnant person’s right to make choices about their own body; a pro-choice progressive who believes the government should provide citizens’ healthcare and that abortion rights protect personal autonomy might feel the reverse. Not only the outcomes and the rules generating them but also the reasons for those rules matter to the public. Furthermore, individuals are not the only members of the public whose values are implicated by what happens within the legal system: so are those of various institutions that process and manage social values, such as political parties, religious organizations, and universities. Just as individuals care about what happens in courtrooms, as a microcosm of what happens in society and a reflection of society’s rules and policies, so too do these institutions.

Interest groups care about the extent to which their interests are advanced or threatened within the legal system. Interest groups include corporate interests concerned with how courtrooms interpret and apply regulations, enforce accountability with legal rules, and interpret and enforce contracts;[175] public-interest groups that engage in impact litigation;[176] labor unions;[177] and professional groups like bar associations. Because these groups protect particular interests, they are invested both in the outcomes of cases that affect them and the rules that govern their behavior. These groups, along with the arbiters and promulgators of social values, are attuned to shifts in legal doctrines that change judicial decision rules. These groups may therefore intervene in cases directly, file amicus curiae briefs, and advance cases designed to change those doctrines.

All these interests rely on the media to identify and disseminate information about what happens inside the judicial system. Aside from the media’s own interests, other members of society rely on having media that observe how the legal system implicates those members’ interests. This interest often extends beyond the courtroom. For instance: when ProPublica revealed racial disparities in bail recommendations made by a software system, public outrage stemmed not only from the disparities, but from the company’s lack of transparency about its technology.[178]

Finally, the government observes all of this and can respond by changing the way the legal system operates and the rules it applies.[179] The government is also directly made up of the people, and members of the legislative or executive branches sometimes leave to join the judiciary,[180] and vice versa.[181] Because the government both observes the legal system—directly and through the interests it serves—and can modify it, for behavioral equivalence purposes this analysis has arrived at a snake eating its own tail. If any part of the system is of interest to some stakeholder in society, that stakeholder might convince the government to modify the system. Even if the judiciary’s constitutional foundations do not change, it is clear that no part of the judiciary goes unobserved both internally and externally: the rare parts of the legal system that are constitutionally immune from Congressional interference (for example, the right to counsel) are observed internally by the judicial system;[182] the rest are observed and potentially changed by Congress.[183]

Again, no single one of these observers should be attended to, but they all should be: unless there is a reason to ignore some observers’ perspectives,[184] an automated system and the judicial process it replaces do not “work the same way” unless all these observers see them as behaviorally equivalent. Similarly, if the replacement is designed to change the system or change is unavoidable, the effects should be examined from the perspectives of all these observers. The proper “external” observer to detect and evaluate changes to the legal system is therefore the union of all observers within society, including the judiciary. This union of all societal observers—society at large—captures all aspects of the legal system observed by some interest in society.

It is possible to disentangle the observations of each of the observers in society at large. In fact, taking these observers and their observations one at a time is crucial to taking stock of the consequences of making changes to the judicial system and deciding whether those consequences are worth the change.[185] That these observers can be interrogated individually, however, does not suggest that only one matters at a time. As components of a larger society, to say “the academy cares about X” or “the welders’ union cares about X” or “Jane cares about X” is equivalent to saying “some part of society cares about X.” And because the government—representing the people—observes and can change any part of the system protected (and observed) by the judiciary, we should conclude that these collective observations capture everything that occurs within the legal system.[186]

  2. The Omniscient Societal Observer

If the observer monitoring and evaluating the legal system is society at large—the collective observation of every interest observing that system—then that agglomerated interest resembles the omniscient observer to whom no replacement can be equivalent to the original process.[187] A judge could offload the task of ruling on the admissibility of evidence onto a computer that perfectly mimicked the judge’s own decisions, but the clerk who was accustomed to participating in that analysis, and the litigator who assumed he could help persuade the judge with a clever word, would not see that system as equivalent. Nor would the judge, who would be applying the computer’s judgment rather than her own, making her a passenger in her own courtroom. It is possible that some areas of the legal system are truly observed by no one in society, but it is not clear what those would be: even what kind of coffee is served in the jury pool waiting room is observed by someone with preferences about it. For a part of the legal system to be truly unobserved, it would have to either be something literally no member of society cares about,[188] or it would have to be managed with no human input or oversight at all or by humans who had no capacity to report on it back to the rest of society.[189]

Again, the point is not that everything in the legal system ought to be preserved as it is, but simply to note that if anything in the legal system changes, some member of society will notice it changed and might have preferences about that which was changed.[190] Having noticed the change, that member of society would not view the changed system as strictly behaviorally equivalent to the original system, and therefore neither will the society at large that includes that member. The danger is not that the judicial system might change, but that the change’s impacts will only be considered from a narrow range of perspectives. The literature at the intersection of legal and AI scholarship[191] has thus far focused on perspectives internal to the judicial system: judges, advocates, litigants, and potential litigants (i.e., people subject to the jurisdiction of laws seeking to understand their rights and obligations). These are important perspectives to consider, arguably the most important ones. When considering the effects of automating some component of the legal system, scholars and policymakers should surely look first to the effects on the direct participants in that system. But the analysis should not end there. A view that defines the current judicial system in terms of its rules, reasoning, and outcomes will ignore the effects of changes on participants and other observers of the system.

We thus arrive at an uncomfortable situation. On the one hand, this Article has described an observer that can analyze whether replacement components to the judicial system are behaviorally equivalent to the original components, and can monitor and correct those changes using the judicial system’s rules surrounding reversal on appeal.[192] But that observer will happily treat as behaviorally equivalent systems that members of society at large will not see as faithful replacements to their predecessors.[193] And if the observer is instead the union of those members of society with an interest in any part of the legal system, there may be no part of the legal system that could be replaced while leaving the overall system equivalent in the eyes of the observer. If we use the first observer, behavioral equivalence is achievable but might undermine society’s faith in our legal system; if we use the second, behavioral equivalence is impossible. Because society at large’s concerns about the legal system should not be discarded, we must conclude that developing a system that is truly behaviorally equivalent to the current legal system is impossible.

But that’s fine—just because some changes might be observed and cared about does not mean they are not worth making. Someone might care if the coffee in the jury pool room changed and conclude that the system generating that coffee had changed, but that doesn’t mean that the courthouse must stick with the same brand of coffee. Someone might also notice if racial disparities in charging decisions disappeared and would rightfully laud that change. In both cases, whoever effects the change must decide to do so in light of the preferences of society at large’s interests, and not only rely on courts to catch any undesirable changes. Recognizing that behavioral equivalence is an impossible goal, regardless of whether it is desirable, means that any changes made are guaranteed to have observable consequences which must be reckoned with. It also means that if anyone claims their system perfectly replicates some part of the legal process, they must be ignoring the perspective of some observer in society at large. Instead of asking whether a new system is exactly behaviorally equivalent to the old one, policymakers should evaluate the extent to which it is, carefully consider the differences, and use that analysis to consider the tradeoffs involved in making any such changes.

Evaluating Legal Automation

The judicial system will not remain static, nor should it. Behavioral equivalence does not weigh in favor of or against any change, but can help to detect and evaluate those changes. This Part frames that analysis by focusing on where changes will be observed and what tradeoffs they may implicate. But note that the first assumption with which this Article began—that a computer will someday be able to perfectly mimic any given legal subprocess—has been weakened: if the society-at-large observer detects the difference between any subprocess and its replacement, perfect mimicry is impossible. Instead, let us assume only that the reasoning, participatory processes, and outcomes of every legal subprocess might one day be emulated.[194]

Using the judicial system’s observations is an insufficient lens through which to evaluate behavioral equivalence, given society’s interests in the legal process that go beyond those observations. Nonetheless, that analysis is a useful floor for evaluating changes to the legal system, because at a minimum the legal system must be able to determine whether a case using a new system warrants reversal. It is also important to know if a change leads to markedly different legal outcomes, because that change should only be made if those differences are desirable (or at least acceptable). This Part begins by examining what is involved in detecting differences even in the simple case. It then turns to the tradeoffs policymakers will face in deciding what changes to make when true behavioral equivalence is an impossible goal. It finishes by examining the tradeoffs implicated by other scholars’ systems as described in Section I.A.

Detecting Differences

Simply detecting the change in outcomes may be a challenge. Recall that our formulation for evaluating behavioral equivalence in the eyes of the judicial system asks whether anything that happens under a new system would have led to reversal under the old system, i.e., whether a difference between the old and new system is appealable.[195] Such differences must be detectable for this formulation to have teeth. It would be circular to define differences as detectable based on whether an overall process is appealable, because the appeal depends on having detected the difference. Defining differences as “anything detectable” risks the conclusion that any difference the observer fails to detect does not—and should not—matter.

For processes like determining admissibility of evidence, a replacement component could be run alongside the current one, and have its outputs compared to the current one’s outputs given the same inputs.[196] But processes that operate over humans, not data, cannot happen in parallel because humans are not duplicable. Witnesses cannot be asked the same questions twice in a row (by a human and by a computer) and be expected to respond precisely the same way, because their second response will be influenced by their having been asked twice. It would be difficult to attribute differences in witnesses’ responses to the system rather than to the way it was being tested. Nor could an entire trial be run more than once.[197] For anything not run on the old and new systems concurrently, the result under one system will be unknown and outcomes will not be directly comparable.
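For the data-driven case, the parallel comparison described above can be sketched as a shadow test: the candidate system receives the same inputs as the incumbent, and every disagreement is logged for review. The sketch below is a hypothetical; the simplified admissibility rules and the case records are invented for illustration.

```python
# Shadow testing: run the replacement alongside the current decision-maker
# on identical inputs and record every disagreement for later review.

def shadow_test(incumbent, candidate, inputs):
    """Return (input, old_ruling, new_ruling) for every divergence."""
    disagreements = []
    for x in inputs:
        old_ruling, new_ruling = incumbent(x), candidate(x)
        if old_ruling != new_ruling:
            disagreements.append((x, old_ruling, new_ruling))
    return disagreements

# Hypothetical rulers: admit evidence unless it is hearsay, with the
# incumbent (but not the candidate) honoring hearsay exceptions.
incumbent = lambda ev: "admit" if (not ev["hearsay"]) or ev["exception"] else "exclude"
candidate = lambda ev: "admit" if not ev["hearsay"] else "exclude"

evidence = [
    {"id": 1, "hearsay": False, "exception": False},
    {"id": 2, "hearsay": True,  "exception": False},
    {"id": 3, "hearsay": True,  "exception": True},   # candidate wrongly excludes
]
for item, old_ruling, new_ruling in shadow_test(incumbent, candidate, evidence):
    print(item["id"], old_ruling, "->", new_ruling)    # prints: 3 admit -> exclude
```

This kind of side-by-side logging is exactly what becomes unavailable for processes that operate over humans rather than data, as the next passage explains.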

A statistical analysis could show whether the system computes outputs within some acceptable range.[198] This would reveal whether a system performed acceptably overall, but not whether its performance was true to the old system in a specific case, and so might not affect whether any given case could be appealed.[199] For processes with clear standards to which the system must conform, behavioral equivalence can be evaluated directly by determining whether the standard has been met.[200] But this describes few parts of the legal process,[201] because standards that depend on a judge exercising judgment and discretion lack such a clear specification. Where a generous standard of review like abuse of discretion applies, the judicial system might treat even outcomes that greatly diverge from those of the old system as behaviorally equivalent to them.
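The aggregate analysis just described might look like the following sketch, which compares favorable-outcome rates under the old and new systems using a crude normal approximation. The tolerance, the data, and the idea of "motion grant rates" as the measured output are all assumptions chosen for illustration, not a prescribed methodology.

```python
import math

def rates_within_tolerance(old_outcomes, new_outcomes, tolerance=0.05, z=1.96):
    """Crude two-proportion comparison: is the new system's favorable-outcome
    rate within `tolerance` of the old one's, accounting for sampling noise
    via a normal approximation at roughly 95% confidence?"""
    p1 = sum(old_outcomes) / len(old_outcomes)
    p2 = sum(new_outcomes) / len(new_outcomes)
    se = math.sqrt(p1 * (1 - p1) / len(old_outcomes)
                   + p2 * (1 - p2) / len(new_outcomes))
    # Acceptable only if the gap, even at the top of its confidence
    # interval, stays within the policy-chosen tolerance.
    return abs(p1 - p2) + z * se <= tolerance

# Hypothetical aggregate data: 1 = motion granted, 0 = motion denied.
old = [1] * 4800 + [0] * 5200    # 48% grant rate under the old process
new = [1] * 4700 + [0] * 5300    # 47% grant rate under the replacement
print(rates_within_tolerance(old, new))   # True
```

As the text notes, passing such a check says the system performs acceptably in the aggregate; it says nothing about fidelity in any particular case.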

It thus appears that outcomes could change a great deal under a new system and the judiciary’s appeals process would treat them as equivalent to the old system. But the judicial system attends to more than outcomes, so detecting differences can be supported by the roles of explanation and due process in legal decision-making. Any explanation accompanying a decision under a new system can be evaluated against the reasoning employed under the old system; any unexplained decision is observably different from one the old system would have explained. Requiring and evaluating a quality explanation any time one would be expected under the old system could ensure most differences remain detectable. This will not help where decisions are currently not accompanied by explanations,[202] when a policymaker decides they need not be, or when there is a risk that the new system’s explanation may not correspond to its actual reasoning.[203] Regardless, the decisions of new systems designed to mimic the behavior of old ones should be audited,[204] so that policymakers may decide whether any differences detected are acceptable or not.[205] Finally, litigants can be relied on to protect their own participation and due process rights, and therefore will notify an appeals court when a system abridges those rights.

Part II discussed a system introducing harmless errors as a problem when society is the observer, but not when the judicial system is the observer: because harmless errors do not lead to reversal, the legal observer that sees differences based only on reversals will see a system that introduces harmless errors as behaviorally equivalent to the old system. The current judicial system seems to treat harmless errors as constant,[206] such that having a dozen harmless errors is equivalent to having only one.[207] But the possibility that a new system will introduce increasing numbers of such errors should give policymakers pause, and provides an opportunity to reevaluate the assumption that harmless errors accumulate harmlessly.

Even a new system that faithfully mimics the old may receive different inputs than the current one does. If the new system is more transparent or predictable it may be more easily influenced to users’ ends than is the current system.[208] People will attempt to manipulate such a legal system to their benefit, just as they already do through contracts requiring binding arbitration using the drafter’s preferred arbitrator,[209] jurisdiction shopping,[210] or by tailoring arguments to judges.[211] Thus even a behaviorally equivalent system might lead to differences in outcomes from the current system, not because it works differently but because people work it differently.[212] These differences might only be detectable through post-hoc analysis across cases to identify shifting trends in the kinds of claims brought and legal arguments made.[213]

In summary: if the behavior of some component of the legal system is governed by a detailed, top-down standard, or if its potential replacement generates as much explanation as the original—and assuming harmless errors are constant—then the component and its replacement just might generate enough information for the legal system to assess whether they are behaviorally equivalent.[214] But behavioral equivalence is only an end in itself if building a completely faithful replacement process is desirable. Perhaps the new system is meant to be different from the original in some way. Or perhaps it is fine to accept that pure behavioral equivalence is impossible, either because the above constraints are missing or because the relevant observer is society at large, which observes all changes. Thinking in terms of specific observers can nonetheless help evaluate the tradeoffs those changes introduce.

Considering Tradeoffs

Considering who the relevant observers are and what they observe about a system that may be changed can help assess the ramifications of those changes. This is true when any change to the system is made, not just when the replacement is a computer.[215] Thus, for any given subprocess that is being considered for replacement, the policymaker should consider who the stakeholders are and what outputs of that process they see. For example, even if harmless errors are invisible to the legal system’s appeals process, they may matter to court observers or to lawyers who account for such errors in their strategies. This approach naturally guides policymakers towards perceiving (and weighing) the consequences of the change that matter to each stakeholder.

If every aspect of a process matters to some interested party, then no replacement system, analog or digital, can be behaviorally equivalent to the current system to all observers.[216] But change may be desirable, and not every observation from every observer should matter. It should be fine to switch the coffee in the jury pool room. Recognizing that whatever system replaces the current one cannot, by definition, be equivalent to it (at least, not for everyone) supports conscious, deliberate decision-making about the changes introduced. Knowing that change is unavoidable frees decision-makers to focus not on whether they want change, but on which changes they want and how those changes will be perceived, and helps policymakers focus on what tradeoffs can be countenanced.

The tradeoffs described below fall into four categories: informational access, process, reasoning, and outcome. Informational access tradeoffs concern how much information comes out of a system, and who has access to it. Process tradeoffs change the legal process itself, the steps through which participants in the legal system move as they travel through it. Reasoning tradeoffs change how a system reasons its way to a conclusion. Outcome tradeoffs change legal outcomes.[217]

  1. Informational Access Tradeoffs

Replacing components of the legal system might change what information that system reveals, and to whom. Informational access tradeoffs occur when a change to the system makes it less (or more) transparent than the current system. For example, juries are often referred to as black boxes, but jurors do sometimes speak out after trials or grand jury empanelings.[218] This information can have an impact on the real world, as when in 2020 a grand juror publicly contradicted the Kentucky Attorney General’s claims about the indictment process after the killing of Breonna Taylor,[219] prompting protests and demands for the AG’s resignation. If an algorithm rather than a grand jury assessed whether evidence could support an indictment, there might be no opportunity for that information to be revealed to the public. Similarly, records of trials include a courtroom transcript. If court stenographers were replaced by voice-to-text transcription software, errors in the software might create errors in the transcript, which would lead to incomplete explanations of judges’ rulings, confusion about legal arguments advanced at trial, or even errors about witness testimony.[220] When judges publish opinions, they do so in English meant to be comprehensible at least to judges and lawyers, and possibly others.[221] A computer-generated opinion in a logical form might encode the same information, but would be less comprehensible to as wide an audience. Finally, courtrooms are generally open to the public; any change that moved any part of the legal process out of public courtrooms would reduce the information coming out of that process.

  2. Process Tradeoffs

Though constitutional due process rights set a floor on how much process can be stripped from the legal system,[222] some changes may lead to process tradeoffs, which alter how cases proceed and affect participation in them. For example, pleading a motion orally versus in a written brief changes whether the litigant can react to a judge’s response in real time or clear up a misunderstanding. A process change might also affect who may participate in the legal system. Changes that make the legal process more efficient or cheaper might increase participation, but such benefits might be inequitably distributed. For example, an automated system that reads filings for administrative purposes could be more efficient in general but less accurate than a human at deciphering hand-written filings, negatively impacting pro se litigants.[223]

Process tradeoffs also include the risk that knowledgeable bad actors can take advantage of a system.[224] Someone who can predict what output they will get given some input may carefully calibrate those inputs to guarantee their preferred outcome, which would invite abuse. For example, if someone determined a set of words consistently led an evidentiary admissibility system to invoke the residual exception to the rule against hearsay,[225] that litigant would defeat the hearsay rule in general. And computer systems’ vulnerabilities differ from those of humans. When the process changes, so too might the ways people can participate in and therefore affect it, even when due process rights are not implicated. Unfortunately, automated legal decision-making has often burdened people’s ability to participate in the decision-making process.[226]
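The manipulation risk described above can be made concrete with a small sketch: if a system's outputs are predictable from its inputs, a party can simply search input space for a phrasing that guarantees the favorable ruling. The "admissibility scorer" below, its trusted words, and the candidate phrasings are all hypothetical, invented to illustrate the probing strategy rather than any real system.

```python
# Probing a predictable decision system: enumerate candidate inputs and
# keep every one that forces the adversary's preferred output.

def probe_for_exploit(system, candidate_inputs, target="admit"):
    """Return every input the adversary could submit to obtain `target`."""
    return [x for x in candidate_inputs if system(x) == target]

# Hypothetical admissibility scorer that overweights certain trust words.
TRUST_WORDS = {"sworn", "contemporaneous", "corroborated"}

def admissibility_system(statement):
    words = set(statement.lower().split())
    return "admit" if len(words & TRUST_WORDS) >= 2 else "exclude"

phrasings = [
    "a note the witness wrote later",
    "a sworn contemporaneous note",                  # triggers the quirk
    "a corroborated sworn contemporaneous account",  # so does this
]
print(probe_for_exploit(admissibility_system, phrasings))
```

A human judge might notice the second and third phrasings describe the same note as the first; a system with a fixed, discoverable decision rule will not, which is the vulnerability the text identifies.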

  3. Reasoning Tradeoffs

The judicial system reasons its way to conclusions through the thoughts and analyses of judges, juries, and other adjudicators. When a change alters how conclusions are reached, the reasoning tradeoff may implicate procedural justice.[227] Indeed, this has already happened.[228] If avoiding reasoning tradeoffs is desirable, engineers should design replacement systems to mimic the reasoning of the current system as closely as possible. For example, if debate is an important element of jury deliberations, juries might only be replaced by multi-agent systems that can encode disagreement and argument.[229] Some AI systems are designed to perform rule-based or analogy-based reasoning, or to combine the two.[230] And where reasoning matters little compared to accuracy, perhaps machine learning techniques that resist inspection and interpretation are fine for certain aspects of legal reasoning.[231]

One reasoning tradeoff computers might introduce is a certainty tradeoff: a system that tells users how probable it finds an outcome is different from humans seeking to establish whether a threshold has been reached. For example, one study found that judges generally rated a 90% certainty as being “beyond a reasonable doubt,”[232] but there is reason to doubt that judges or jurors use numerically precise standards in reasoning about guilt or liability.[233] If grainy footage shows twenty-five prisoners in a prison yard, with twenty-four beating a guard to death and one trying to stop them, charging a random prisoner from the yard with murder yields a 96% chance of guilt.[234] A computer system that indicated a defendant was 96% likely to be guilty might seem trustworthy, but if its calculation relied only on the defendant’s proximity to an event, it would not be seen as dispensing justice.[235]
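The arithmetic of the prison-yard hypothetical shows how bare statistical odds can clear a numeric threshold without any individualized evidence; the sketch below just restates the passage's numbers (the 90% figure is the threshold judges reportedly endorsed in the cited study).

```python
# The prison-yard hypothetical: 24 of 25 prisoners beat the guard, so
# charging a random prisoner from the yard yields a high bare probability
# of guilt, with no evidence particular to that defendant.

GUILTY_IN_YARD = 24
TOTAL_IN_YARD = 25
REASONABLE_DOUBT_THRESHOLD = 0.90   # certainty judges rated "beyond a reasonable doubt"

p_guilt = GUILTY_IN_YARD / TOTAL_IN_YARD
print(p_guilt)                                # 0.96
print(p_guilt > REASONABLE_DOUBT_THRESHOLD)   # True, yet conviction would feel unjust
```

The gap between the two print statements is the certainty tradeoff: a probability output can exceed any numeric threshold while resting on nothing but the defendant's proximity to the event.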

  3. Outcome Tradeoffs

Outcome tradeoffs occur when a change in a system leads cases to be decided differently than under the old system.[236] These are the worst tradeoffs to have happen by accident—changes to the legal system should not unexpectedly change legal outcomes—but may be the most desirable ones to achieve on purpose. There are as many situations in which outcome tradeoffs would be the goal of altering the system as there are problems with legal outcomes under the current system: racial disparities in criminal charging and sentencing;[237] bias and disbelief against victims of sexual assault;[238] long delays in criminal process;[239] and many more. Unfortunately, legal automation again has a bad track record in this area.[240] Change is not always bad, but it should be made deliberately, especially when it alters the dispensation of justice.

  4. Evaluating Tradeoffs

Tradeoffs will generally occur across those categories, not within them. For example, a machine learning system designed to make quick, first-cut assessments on whether a lawsuit is well-pleaded may make the legal system more efficient and therefore open to more litigants but generate less comprehensible decisions, thereby improving process at the expense of reasoning and informational access. A system to reduce variations in outcomes across judges might apply rules predictably but rigidly and change judgments relative to the current system, trading outcomes for reasoning.

Policymakers can attend to the effects of replacing components of the legal system, and therefore properly consider the tradeoffs implicated, by first enumerating the specific, immediate changes wrought by the replacement. The areas where those changes occur can then be examined for secondary changes, to capture downstream effects of the primary change. Then, for each change and each relevant interest in society, the analyst asks what that interest will perceive of that change. Each alteration is thus treated as a potential observation, and the analyst asks what observers might make of it. Having a list of interests like that described above in Section II.B will help guide this analysis, but such lists will likely be incomplete, so a mechanism by which interests can identify themselves will be helpful.[241] Although predicting the specific effect of the change for each interest may be difficult, this method can help generate potential tradeoffs implicated by the change.

Reasoning about these tradeoffs is not merely an academic exercise. One controversy over the recidivism prediction software accused of racial bias is that, because the software is a trade secret, its workings were kept from both the prisoners being denied bail and the judges issuing the denial.[242] Judges jailed defendants pending trial because a computer told them to, without being told why; the reasons were withheld not because the algorithm was incomprehensibly complex, but to protect its owner’s intellectual property.[243] The goal was to improve decision-making regarding who receives bail (an outcome tradeoff),[244] but regardless of whether that goal was achieved, it came at the expense of defendants understanding why they were sent to jail (an informational access and reasoning tradeoff). In turn, the system’s operating out of sight makes it difficult to know whether a negative outcome tradeoff has occurred, i.e., whether the system is perpetuating racial disparities.[245] That this has already occurred is unfortunate, and it raises concerns about whether lawmakers will weigh the tradeoffs involved in modifying the legal system, as this Article urges them to do.

Tradeoffs in Proposed Systems from the Academy

The foregoing discussion helps to assess the potential tradeoffs involved in implementing the systems discussed in Section I.A.[246] What follows is not criticism of the systems or their designers: models cannot capture everything about the processes they are modeling.[247] But in considering whether to implement a model in the real world it is crucial to attend to what the model leaves out, because therein will lie the tradeoffs.

Much of the research in AI & Law focuses on modeling judicial decision-making: how the judge uses the evidence adduced at trial[248] to dispose of a case. One strain of research within this literature has sought to replicate both legal outcomes and the reasoning used to reach them.[249] HYPO[250] and rule-based precedential reasoners[251] implement different theories of precedential reasoning, but all seek to capture the mechanisms of such reasoning to replicate real-world case outcomes. This design decision means these systems should minimize reasoning and outcome tradeoffs.[252] And because they reason explicitly, step by step, they might also minimize informational access tradeoffs, provided their implementations can publicize that reasoning to those who currently observe it.

Nonetheless, these systems would involve process tradeoffs. All these systems are designed to work their way to the “best” possible answer; indeed, they rely on the idea that there is a right answer at all. HYPO algorithms involve argumentation and the distinguishing of cases, but the algorithm itself performs these steps, not a lawyer.[253] Professor Horty’s system involves weighing conflicting rules and precedents, but that weighing happens before the consideration of a case and leaves no room for argumentation.[254] Professor Verheij’s system not only disallows argumentation but also requires that cases in a case model not contradict one another.[255] All these systems take in information and reason their way to an answer without external input; all would require changes to afford participants in the judicial system the same process they have now.

These systems could be modified to reduce process tradeoffs. HYPO algorithms evaluate and distinguish precedents; this mechanism could be modified to evaluate arguments and precedents furnished by litigants. Professor Horty’s system considers reasons that some rules outweigh others; the system could additionally consider reasons and rules provided by litigants.[256] But current implementations of these systems primarily model the reasoning a judge performs using facts and precedents, so adopting such systems as they are would reduce the role litigants have in shaping that analysis.

A Deep Learning system could theoretically reproduce a pattern of legal outcomes in a body of law at scale.[257] However, in addition to the process tradeoffs precedential reasoners share, such models also implicate reasoning and informational access issues. Deep Learning systems reason in fundamentally different ways from humans, and their reasoning is often uninspectable and unexplainable.[258] Even when they generate explanations, the explanations may be untrustworthy.[259] Branting et al.’s approach predicts which facts in a case text correspond to legally relevant features; they argue that these features can explain outcomes by pointing to the factors in a case that contributed to its outcome.[260] Identifying these intermediate features could mitigate, but will not eliminate, the reasoning and informational access tradeoffs. Those tradeoffs are intertwined here: because a Deep Learning system’s reasoning is hidden within its network, that reasoning may not track human reasoning, and it is not observable to the outside world. In systems like HYPO every reasoning step is inspectable, but a Deep Learning system is closer to a black box. These approaches offer significant benefits: they scale better than other approaches, and Branting and colleagues argue that a system like theirs could help pro se litigants understand and frame their claims.[261] But as a legal decision-maker, such systems implicate process, reasoning, and informational access tradeoffs.[262]

Not all the AI research described above necessarily involves process tradeoffs. Professor Prakken’s formal model of argumentation preserves the role of litigants and adjudicators in litigation.[263] Indeed, formal models of argumentation are motivated in part by the desire to preserve due process and the adversarial system.[264] Because Professor Prakken has proposed a formal model rather than implemented a system, it is difficult to evaluate what tradeoffs an implementation would bring. For example, the model does not specify how the adjudicator determines when the burden of persuasion has been met.[265] Without a specification of the judge’s decision-making, we cannot know what reasoning, outcome, or informational access tradeoffs that replacement judge might bring. Nor does the model account for fact-finding and witness questioning, which may implicate process and informational access tradeoffs. But the model’s representation of a trial as a dialogue between advocates, managed by an adjudicator, and its separation of these roles should minimize process tradeoffs.

Professors Gowder and Livermore suggest that machine learning systems might one day be used to eliminate ambiguity in open-textured legal terms, or at least provide a reliable way of resolving ambiguities.[266] Professor Livermore argues that such a system could lead to new kinds of statutes, wherein, for example, an object is a vehicle under a “no vehicles in the park” statute exactly if some specified neural network system says so.[267] Such classifiers could involve process and outcome tradeoffs by eliminating argument and persuasion in close cases: there may be putative vehicles for which both prosecutor and defense can mount a strong case, and which would count as a vehicle for being-in-the-park purposes depending only on which side presents a more finely-crafted argument. A Deep Learning system will eliminate that ambiguity, changing both the outcome and the process to reach it. Professor Gowder limits his proposed system to the elimination of factual ambiguity, to preserve the due process rights of litigants to argue what the law should be to the judge, and to preserve the legitimacy of the system by having the judge continue to determine what the rules should be and what justice entails.[268] He also suggests that litigants might argue about which features of a system go into the machine learning model, which reduces process tradeoffs by maintaining a role for argumentation.[269] Such a system might implicate reasoning tradeoffs if it can tell you that something is a vehicle, but not why. For complex factual determinations it may also implicate informational access tradeoffs: providing an object’s features and a judgment of whether it is a vehicle—the inputs and outputs—may give as much information as the current system; not so when the system must synthesize and transform its inputs into intermediate representations to reach its conclusion. On the other hand, it may provide a beneficial informational access tradeoff, if the public can learn what counts as a vehicle without getting a ticket.

Professor Genesereth has argued that if consumer technology has access to data that could be used to determine whether behavior is unlawful, it should advise its users of that information.[270] Such technology might eventually take on the role of enforcer. If it does, the informational access tradeoffs might benefit users: they would understand exactly why they were suffering an enforcement action. The outcome tradeoffs might benefit society, if additional enforcement of laws is good.[271] If the system uses the same reasoning as would an officer writing a ticket, there is no reasoning tradeoff. But there would be an enormous process tradeoff: gone would be the ability to duck the law, argue one’s way out of a ticket, or get the ticket dismissed by going to court and not having the officer show up to defend it.[272] Having the arm of the law sitting in every consumer’s pocket would dramatically change the process by which laws are enforced.

Professor McGinnis and Steven Wasick argue that cheap computation and large-scale data could enable dynamic rules, rules that change depending on real-world conditions.[273] Such rules will constrain different behaviors under different circumstances, leading to outcome tradeoffs. If dynamic rules closely match the rules and standards they replace (but with shifting criteria for when they apply), they might not implicate reasoning and process tradeoffs. But the authors argue that one advantage of these rules is to “thwart judicial discretion” and apply more mechanistically than standards,[274] suggesting reasoning tradeoffs are desirable. Dynamic rules might also lead to informational access tradeoffs if the rules’ changing standards are not communicated clearly to those whose behavior is governed.

Professor Coglianese and David Lehr argue that administrative agencies can use Machine Learning like any other tool, including for adjudication and rulemaking.[275] They describe several associated risks which fit within this Article’s framework.[276] These include the risks that algorithms might introduce biases into agency decision-making—implicating reasoning and outcome tradeoffs—and that individuals challenging agency decisions will be unable to effectively scrutinize them—an informational access tradeoff.[277] They also raise concerns regarding using quantitative rather than qualitative standards, determining acceptable error rates, emotional and dignitary costs to participants, and privacy concerns,[278] which respectively correspond to reasoning, outcome, process, and informational access tradeoffs.

Finally, Professor Volokh argues that an AI that writes judicial opinions that pass the Turing test should be permitted to serve as a judge.[279] The Turing test is likely the wrong test of intelligence and capability because it depends on humans perceiving an author as human and intelligent, and humans are prone to anthropomorphizing and to perceiving intelligence where none exists.[280] Natural language generation systems have already proven themselves better at generating convincing-seeming language than at displaying an understanding of the world.[281] Some natural-language generation system may soon be able to produce text that humans—even experts—will be convinced was written by a judge about a case, but I expect that close examination will reveal faults in reasoning and fact-finding that might be forgiven if attributed to a human judge, but not to a computer judge.

Professor Volokh suggests the standard should be whether human experts are persuaded by the opinion, not only convinced a human generated it.[282] But just as humans want to believe their interlocutors are human and read intelligence into automatically generated texts, humans can be persuaded by all sorts of arguments, including those that contain logical fallacies.[283] Professor Volokh allows that there may be risks involved in such a system,[284] but he does not count among them flawed reasoning that nonetheless passes his test. Yet a system that generates faulty-but-persuasive reasoning would implicate not only reasoning tradeoffs (at least relative to a human whose reasoning is not faulty), but outcome tradeoffs too, if faulty reasoning leads to mistaken outcomes. Additionally, the reasoning the AI system relies on may not be the same as the reasoning it puts on the page.[285] If the AI derives outcome X using the internal rule “A → X,”[286] but in its opinion states that it relied on the rule “B → X,” which rule did it use? If “B → X” is persuasive to the reader, does it matter that the system actually used “A → X” in its reasoning? Assuming that human judges do not disguise the grounds for their decisions, an AI system that did so would involve informational access tradeoffs as well as reasoning tradeoffs. That said, these criticisms mostly concern Professor Volokh’s test of a successful system, not its capabilities. If an opinion-writing system accurately represented its reasoning and took over only opinion-writing, the only informational access tradeoffs involved might be the lost conversations between judge and clerk, and the process tradeoffs might arise only from limiting the judge to trial management rather than adjudication.
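The mismatch between an internal rule and a stated rule can be sketched concretely. The toy code below is hypothetical (no real opinion-writing system is claimed to work this way): the decision turns entirely on fact A, while the written opinion cites only B.

```python
# Hypothetical toy model of an opinion-writer whose stated ground
# differs from the rule it actually applied.
def decide(facts: set) -> str:
    # Internal rule actually used: A -> X
    return "X" if "A" in facts else "not-X"

def write_opinion(outcome: str) -> str:
    # Stated rule in the opinion: B -> X
    return f"Because B holds, the outcome is {outcome}."

facts = {"A", "B"}
opinion = write_opinion(decide(facts))

# The opinion is persuasive on its face, but removing B would not change
# the outcome, while removing A would, revealing the hidden ground.
assert decide({"A"}) == "X"      # A alone suffices
assert decide({"B"}) == "not-X"  # B alone does not
```

Only an observer with access to the system's internals, not just the published opinion, could detect the divergence, which is why the mismatch implicates informational access as well as reasoning.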

The foregoing discussion shows that not only will there be tradeoffs involved in automating components of the judicial system, but the consequences will often appear in areas other than those that the replacements target for change. Even if a replacement leaves a system’s internal processes intact, it may change how people interact with that system or the information it transmits. Those considering making changes to the legal system must attend to these unintended consequences.


Behavioral equivalence is a useful concept because it shows that asking whether two systems work the same way is an insufficiently precise inquiry. The behavior of any system can only be assessed through some observer making observations about that system. Two systems may be equivalent from one observer’s perspective and quite different according to others. By focusing attention on observers and their observations, behavioral equivalence transforms the question from “are these systems the same?” to “who is looking at these systems and what do they see?”
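For readers who think in code, the observer-relativity of behavioral equivalence can be illustrated with a minimal, hypothetical sketch: two procedures that an outcome-only observer cannot distinguish, but that a reasoning-level observer can.

```python
# Hypothetical sketch: two systems that reach identical outcomes through
# different internal processes. Each returns its result together with a
# trace of its reasoning steps.
def system_a(x: int):
    return x * 2, ["applied rule: double the input"]

def system_b(x: int):
    return x + x, ["applied rule: add the input to itself"]

for x in range(5):
    out_a, trace_a = system_a(x)
    out_b, trace_b = system_b(x)
    assert out_a == out_b      # outcome observer: the systems are equivalent
    assert trace_a != trace_b  # reasoning observer: they are not
```

Whether the two systems are "the same" thus depends entirely on whether the observer sees only outputs or also traces, which is the question the following paragraphs pose about the judicial system.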

Because equivalence is inherently relative to the observer, the frame of reference policymakers adopt will determine how difficult it is to create a system that is behaviorally equivalent to the current one. When scholars and policymakers focus on the reasoning and outcomes of judicial decision-making, they implicitly treat the judicial system as the observer, an observer whose observations are confined to that which remains within the judicial system. Under this view, whether some new process is equivalent to the old one can be evaluated by asking whether differences in their behavior would be appealable if produced by the old process. This focus on appealable differences would allow the system to tolerate a variety of changes, such as increases in harmless error, by declaring processes de jure equivalent even when they are not de facto so.

But ignoring differences that are invisible to the judicial system is misguided if what matters is not only what the judicial system observes about itself, but also what others see in it. And if all those outside perspectives matter—if society at large is the observer—behavioral equivalence is an impossible goal because every change will be detected by some observer. This realization is liberating, because it allows policymakers to abandon any idea that technology can perfectly mimic some part of the legal process for every observer. It also reminds scholars and policymakers that if they think some new system is behaviorally equivalent to the current one, they are likely leaving out someone’s observations of that system. By assuming that any change made will matter to some interest in society, attention can be focused on what the impacts of a change will be, and who will perceive them. Expanding the frame of reference beyond the legal system and to society at large invites a careful analysis of the consequences of change and helps to consider how proposed changes will affect processes, outcomes, reasoning, and informational access.

Components of the legal system may one day be replaced by computer algorithms, and there may be significant benefits to doing so. But the consequences of making those changes will likely extend beyond whatever specific issues the systems are designed to address. Legal outcomes are not the only outputs of the legal system, and the other outputs matter to interests beyond the system’s participants. Scholars and policymakers must attend to these secondary changes—and to the observers who will perceive them—in order to consider properly the full consequences of any change. This is true regardless of whether the replacement is by a computer algorithm or simply a different system run by humans. Change is not inherently good or bad, but a change that advances some interest at the expense of another should be made deliberately and with careful consideration of that tradeoff.

  1. * Law and Science Fellow, Northwestern University Pritzker School of Law and McCormick School of Engineering. Many thanks to Ronald Allen, Shari Seidman Diamond, Christos Dimoulas, Paul Gowder, Sarah Lawsky, and Deborah Tuerkheimer for helping me develop these ideas and providing valuable feedback on previous drafts. Thanks to Dan Linna, Jason Hartline, Maria Amparo Grau Ruiz, Chenhao Zhang, and the Science of Law and Computation Reading Group at Northwestern for the discussions that germinated this piece. And a special thank you to all the fine editors on the South Carolina Law Review for their hard work and patient diligence editing this piece. All errors and stylistic idiosyncrasies are my own.
  2. . Discussed infra Section I.A.1.
  3. . Discussed infra Section I.A.1.
  4. . See infra note 132.
  5. . See Julia Angwin et al., Machine Bias, ProPublica (May 23, 2016), [] (describing a widely used machine-learning based risk assessment tool). Although their use is on the rise, most risk assessment systems do not use Machine Learning. See Megan Stevenson, Assessing Risk Assessment in Action, 103 Minn. L. Rev. 303, 316 (2018).
  6. . See infra Section I.A.2.
  7. . See infra Section I.B.
  8. . See infra Section I.B.
  9. . See infra Section I.C.
  10. . See infra Section II.B.
  11. . Indeed, these efforts are near constant. In late 2018, President Trump signed a piece of criminal justice reform legislation. See Pub. L. No. 115-391, 132 Stat. 5194 (codified at 18 U.S.C. §§ 3631–3635, 4050, 4322). President Biden—who shepherded the largest criminal justice reform legislation through Congress in the 90s, see Pub. L. No. 103-322, 108 Stat. 1796—campaigned in part on criminal justice reform. The Biden Plan for Strengthening America’s Commitment to Justice, []. While at the time of this Article’s publication Congress’s latest attempt to pass a criminal justice reform bill appears to have stalled, history suggests it will try again before too long, and efforts are ongoing in several states.
  12. . See Melissa Hamilton, McSentencing: Mass Federal Sentencing and the Law of Unintended Consequences, 35 Cardozo L. Rev. 2199, 2206 (2014).
  13. . See Ronald F. Wright, Trial Distortion and the End of Innocence in Federal Criminal Justice, 154 U. Pa. L. Rev. 79, 130–31 (2005).
  14. . See Angwin et al., supra note 4. Such systems have been accused of exacerbating racial disparities in the criminal justice system. See id. (describing biases in one recidivism predictor, COMPAS, including that it made more errors rating as high risk Black defendants who did not go on to reoffend than it did White defendants and more errors rating as low risk White defendants who did go on to reoffend than it did Black defendants). ProPublica’s conclusions have been criticized, reflecting the difficulty in measuring an algorithm’s fairness. See generally Deborah Hellman, Measuring Algorithmic Fairness, 106 Va. L. Rev. 811 (2020) (analyzing the difficulties of determining whether an algorithm is fair, including the determination of what fairness requires).
  15. . Catherine M. Sharkey, Unintended Consequences of Medical Malpractice Damages Caps, 80 N.Y.U. L. Rev. 391, 394 (2005).
  16. . Id. at 419–22.
  17. . Id. at 425–28.
  18. . Wright, supra note 12, at 130–31.
  19. . See Angwin et al., supra note 4 (describing how judges may use the scores when deciding how punitive a sentence to hand down, although the scores are not meant to calibrate how punitive a sentence to assign).
  20. . See Sharkey, supra note 14, at 429–31.
  21. . Note that this Article generally uses “the judicial system” to refer to the judiciary and its processes and “the legal system” to refer to the broader set of legal actors, social institutions, and processes within which the judicial system operates. For example, as this Article uses the terms, the judicial system does not include police, jails, or bar associations, but the legal system does.
  22. . An “observer” need not be one or more humans, but anything capable of perceiving the outputs of some system. For example, both a human driver and her automatic windshield wipers observe the process of the windshield getting wet, though only the human can differentiate between rainfall and a garden hose. Essentially, an observer of a system captures information emanating from that system. See Matthias Felleisen et al., Semantics Engineering with PLT Redex 59 (2009).
  23. . Again, this Article uses “the judicial system” to refer to the judiciary and its processes and “the legal system” to refer to the broader legal context, processes, and actors within which the judicial system works.
  24. . Usually two subsystems within a larger system.
  25. . Ian Horswill, What Is Computation?, XRDS: Crossroads, ACM Mag. for Students, Spring 2012, at 8, 10 (“[T]he principle of behavioral equivalence [holds that] if a person or system reliably produces the right answer, they can be considered to have solved the problem regardless of what procedure or representation(s) they used.”).
  26. . See infra notes 43–45 and accompanying text.
  27. . Just as AI researchers studying law form the AI & Law community, this Article calls legal scholars studying AI the Law & AI community.
  28. . See infra Section II.B.
  29. . See Kevin D. Ashley, Toward a Computational Theory of Arguing with Precedents: Accom[m]odating Multiple Interpretations of Cases, 2 Int’l Conf. on Artificial Intelligence & L. 93 (1989); Edwina Rissland et al., BankXX: Supporting Legal Arguments Through Heuristic Retrieval, 4 Artificial Intelligence & L. 1, 3–6 (1995); Kevin D. Ashley, Designing Electronic Casebooks that Talk Back: The CATO Program, 40 Jurimetrics 275 (2000); Stefanie Bruninghaus & Kevin D. Ashley, Progress in Textual Case-Based Reasoning: Predicting the Outcome of Legal Cases from Text, 21 Int’l Conf. on Artificial Intelligence & L. 1577, 1577–78.
  30. . See generally Edwina L. Rissland et al., Case-Based Reasoning and Law, 20 Knowledge Eng’g Rev. 293 (2005). These systems have most commonly (though not exclusively) been used in the domain of trade-secrets law. Id.
  31. . Some HYPO algorithms use dimensions instead of factors. Id. Dimensions are essentially weighted factors that encode not only what factors are present in a case, but the extent to which they are. Id.
  32. . See Vincent Aleven & Kevin D. Ashley, Evaluating a Learning Environment for Case-Based Argumentation Skills, 6 Int’l Conf. on Artificial Intelligence & L. 170, 171 (1997). Similarity is based on the number of shared factors between cases.
  33. . John F. Horty, Rules and Reasons in the Theory of Precedent, 17 Legal Theory 1, 7–17 (2011).
  34. . Bart Verheij, Formalizing Arguments, Rules and Cases, 16 Int’l Conf. on Artificial Intelligence & L. 199, 200 (2017).
  35. . Id. at 199.
  36. . My own AI research involves synthesizing precedent cases to extract legal principles that can be applied to resolve future cases. Joseph Blass, Analogical Reasoning, Generalization, and Rule Learning for Legal Reasoning about Common Law Torts (2022) (unpublished manuscript) (on file with author).
  37. . See, e.g., Alison Chorley & Trevor Bench-Capon, AGATHA: Automated Construction of Case Law Theories Through Heuristic Search, 10 Int’l Conf. on Artificial Intelligence & L. 45 (2005); Floris Bex et al., What Makes a Story Plausible? The Need for Precedents, in 235 Frontiers in Artificial Intelligence and Applications 23 (Katie M. Atkinson ed., 2011).
  38. . See, e.g., David B. Skalak & Edwina L. Rissland, Arguments and Cases: An Inevitable Intertwining, 1 Int’l Conf. on Artificial Intelligence & L. 3 (1992); Adam Wyner & Trevor Bench-Capon, Argument Schemes for Legal Case-Based Reasoning, 20 JURIX Int’l Conf. on Legal Knowledge & Info. Sys. 139 (2007).
  39. . See, e.g., Henry Prakken & Giovanni Sartor, Law and Logic: A Review from an Argumentation Perspective, 227 Artificial Intelligence 214, 214 (2015).
  40. . Henry Prakken, A Formal Model of Adjudication Dialogues, 16 Int’l Conf. on Artificial Intelligence & L. 305, 305 (2008).
  41. . See, e.g., Ilias Chalkidis et al., Neural Legal Judgment Prediction in English, 57 Ass’n for Computational Linguistics 4317, 4318–21 (2019); L. Karl Branting et al., Scalable and Explainable Legal Prediction, 29 Int’l Conf. on Artificial Intelligence & L. 213, 224–30 (2021).
  42. . Branting et al., supra note 40, at 221–24.
  43. . Id. at 223. Machine Learning has also led to advances in legal case retrieval and annotation, which can help lawyers and pro se litigants retrieve relevant case law and identify passages in those cases that contain legal knowledge useful to them. See, e.g., Huihui Xu et al., Using Argument Mining for Legal Text Summarization, in 334 Frontiers in Artificial Intelligence and Applications 184, 184–85 (Serena Villata et al. eds., 2020). Case retrieval and annotation is explicitly a human-augmentation task. Id. at 191.
  44. . See, e.g., Branting et al., supra note 40, at 229–30; Rissland et al., supra note 29, at 1.
  45. . One legal scholar argues that AI precedential reasoning systems can never fully model legal reasoning because studying only resolved cases cannot resolve all future cases and because the legal system is irreducibly complex. See Ronald J. Allen, Taming Complexity: Rationality, The Law of Evidence, and the Nature of the Legal System, 12 Law, Probability & Risk 99, 107 (2013).
  46. . See, e.g., Ronald J. Allen, Artificial Intelligence and the Evidentiary Process: The Challenges of Formalism and Computation, 9 Int’l Conf. on Artificial Intelligence & L. 99, 101 (2001) (“[AI & Law research] has the potential of helping those of us in the law to better understand what terms such as ‘discretion’, ‘judgment’ and ‘precedent’ may mean . . . .”); see also Ashley, supra note 28, at 275 (describing CATO, an AI system designed to help first-year law students learn to perform precedential reasoning).
  47. . See infra Section II.B.2.
  48. . Paul Gowder, Is Legal Cognition Computational? (When Will DeepVehicle Replace Judge Hercules?), in Computational Legal Studies 215, 220–21 (Ryan Whalen ed., 2020).
  49. . Id. at 223; see also the discussion on procedural justice infra Section II.B.
  50. . Michael A. Livermore, Rule by Rules, in Computational Legal Studies, supra note 47, at 238, 250–52.
  51. . Id.
  52. . Michael Genesereth, Computational Law: The Cop in the Backseat (2015), [].
  53. . John O. McGinnis & Steven Wasick, Law’s Algorithm, 66 Fla. L. Rev. 991, 1040 (2015).
  54. . Cary Coglianese & David Lehr, Regulating by Robot: Administrative Decision Making in the Machine-Learning Era, 105 Geo. L.J. 1147, 1154 (2017).
  55. . Eugene Volokh, Chief Justice Robots, 68 Duke L.J. 1135, 1138 (2019).
  56. . Scholars have also described technological approaches to existing legal processes that would give humans tools to better do their jobs, not eliminate them. For example, “smart contracts” are contracts written in formal logic that can be executed and enforced without human intervention. See, e.g., Lauren Henry Scholz, Algorithmic Contracts, 20 Stan. Tech. L. Rev. 128, 134–36 (2017). And researchers have created a programming language to help create and analyze property conveyances. Shrutarshi Basu et al., Property Conveyances as a Programming Language, 2019 ACM SIGPLAN Int’l Symp. on New Ideas, New Paradigms, & Reflections on Programming & Software 128, 129.
  57. . See supra Section I.A.1.
  58. . Professor Coglianese and David Lehr’s proposal regarding administrative decision-making is an exception. See supra note 53 and accompanying text. Nonetheless, their argument in favor of automated adjudication is analogous to Professor Genesereth’s and Professor Volokh’s. See supra notes 51, 54 and accompanying text.
  59. . Some of these scholars are proponents of automating legal processes, while others take no position on the value of legal automation and simply explore what it entails. For views critical of automated legal decision-making but arguing that AI can support human legal decision-making, see Frank Pasquale, A Rule of Persons, Not Machines: The Limits of Legal Automation, 87 Geo. Wash. L. Rev. 1, 46–48 (2019) (arguing that law, as a social institution, is inherently not automatable and that AI should augment, not replace, human legal work); Ryan Calo & Danielle K. Citron, The Automated Administrative State: A Crisis of Legitimacy, 70 Emory L.J. 797, 837–40 (2021) (arguing that automating administrative agencies’ decision-making undermines the justifications for agency deference and therefore their legitimacy but that AI systems can improve access to, and the quality of, traditional administrative action); Harry Surden, Machine Learning and Law, 89 Wash. L. Rev. 87, 102–14 (2014) (arguing that Machine Learning systems cannot reliably predict legal outcomes but could support lawyers’ work in a variety of ways).

    Though this Article limits its analysis to scholarship proposing an active role for AI in automating the legal system and legal decision-making, a wealth of scholarship has examined the risks to, and potential impacts on, the law of AI and automation. See, e.g., Danielle K. Citron, Technological Due Process, 85 Wash. U. L. Rev. 1249, 1278 (2008); Robert Brauneis & Ellen P. Goodman, Algorithmic Transparency for the Smart City, 20 Yale J.L. & Tech. 103, 116–18 (2018); Pauline T. Kim, Data-Driven Discrimination at Work, 58 Wm. & Mary L. Rev. 857, 883–92 (2017); Solon Barocas & Andrew D. Selbst, Big Data’s Disparate Impact, 104 Calif. L. Rev. 671, 671 (2016); Kate Crawford & Jason Schultz, Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms, 55 B.C. L. Rev. 93, 106–09 (2014); Joshua A. Kroll et al., Accountable Algorithms, 165 U. Pa. L. Rev. 633, 633 (2017); Mireille Hildebrandt, Law as Information in the Era of Data-Driven Agency, 79 Mod. L. Rev. 1, 24–25 (2016).

  60. . See infra Section I.C.1.
  61. . Neural networks have been proven to be theoretically universal function approximators, meaning a sufficiently large network initialized with the right start conditions and trained on the right dataset could learn any function that generated the dataset. See, e.g., George Cybenko, Approximation by Superpositions of a Sigmoidal Function, 2 Mathematics of Control, Signals & Sys. 303, 312 (1989); Franco Scarselli & Ah Chung Tsoi, Universal Approximation Using Feedforward Neural Networks: A Survey of Some Existing Methods, and Some New Results, 11 Neural Networks 15, 19 (1998). That said, it is possible that the legal system is too complex to accurately simulate. See Allen, supra note 44, at 108.
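    Cybenko’s result concerns functions built as weighted sums of sigmoidal units. A minimal sketch of that idea follows; the network size, weight scales, and target function are illustrative choices of mine, not drawn from the cited papers. It fits a superposition of fifty sigmoids to sin(x) by adjusting only the output weights:

```python
import numpy as np

# Illustrative sketch of universal approximation by superpositions of sigmoids.
# All parameters here (50 units, scale 2.0, target sin) are arbitrary choices.
rng = np.random.default_rng(0)

x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)  # sample inputs
y = np.sin(x)                                        # target function values

# Hidden layer: 50 sigmoidal units with random input weights and biases.
W = rng.normal(scale=2.0, size=(1, 50))
b = rng.normal(scale=2.0, size=(1, 50))
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))               # sigmoid activations

# Fit only the linear output weights by least squares, forming a weighted
# superposition of the sigmoids that approximates the target.
c, *_ = np.linalg.lstsq(H, y, rcond=None)
approx = H @ c

max_error = float(np.max(np.abs(approx - y)))
print(max_error)  # the worst-case gap between the superposition and sin(x)
```

Adding hidden units can only shrink the least-squares error, which is the intuition behind the universality results; nothing in this sketch, of course, speaks to whether a process as complex as legal decision-making is learnable in practice.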
  62. . See infra Section I.C.2 for a discussion on what it means to compute “the exact same” output when outputs are nondeterministic.
  63. . See Richard H. Fallon, Jr., Legitimacy and the Constitution, 118 Harv. L. Rev. 1787, 1828–29 (2005).
  64. . But see Calo & Citron, supra note 59.
  65. . Court systems could likely never be replaced across the board, barring a constitutional amendment getting rid of judges, see U.S. Const. art. III, § 1, and jury trials, see U.S. Const. amends. VI, VII.
  66. . See, e.g., Steps in a Trial, Am. Bar Ass’n (Sept. 9, 2019), [] (describing procedural steps in a trial).
  67. . See Angwin et al., supra note 4, and discussion supra note 13.
  68. . See supra Section I.A.
  69. . See, e.g., John C. Mitchell, On Abstraction and the Expressive Power of Programming Languages, 21 Sci. Comput. Programming 141, 144 (1993); Felleisen et al., supra note 21, at 58. Behavioral equivalence is also a concept within systems engineering, which applies, for example, when considering whether to replace one component of an assembly line with another. Jan Willem Polderman & Jan C. Willems, Introduction to Mathematical Systems Theory: A Behavioral Approach 225–26 (1998). In PL, behavioral equivalence is often referred to as “observational equivalence.” See infra note 73 and accompanying text.
  70. . See James Hiram Morris, Jr., Lambda-Calculus Models of Programming Languages 6–7 (Dec. 13, 1968) (Ph.D. dissertation, Massachusetts Institute of Technology), []. There are many different programming languages that differ in user-friendliness and can express different kinds of computation more or less simply. The multitude of languages also sustains itself, given the difficulty and cost involved in converting a system to a new language.
  71. . Felleisen et al., supra note 21, at 58–60.
  72. . Defining behavior in PL is tricky if programs are nondeterministic or when asynchronous programs running concurrently depend on each other’s state or outputs. Imagine program A’s operation will change depending on program B, which currently has no output. It might be impossible to tell whether program B will never compute an output or has just not computed one yet. That is, it might be impossible to tell what program B’s behavior is. But program A can proceed based on its observation of B’s behavior, i.e., that it has no output. See Matthew Hennessy & Robin Milner, On Observing Nondeterminism and Concurrency, in 85 Automata, Languages & Programming 299, 300 (J.W. de Bakker & J. van Leeuwen eds., 1980). See infra Section I.C.2.
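    The situation note 72 describes can be made concrete with a toy sketch (the programs, queue, and timeouts here are invented for illustration): program A polls a shared queue and can observe only that B has produced no output so far, never that B will produce none.

```python
import queue
import threading
import time

# Toy illustration (not from the cited paper): observer A cannot distinguish
# "B has produced no output yet" from "B will never produce output"; A can
# only act on what it has observed within its waiting window.

def program_b(out: queue.Queue, delay: float) -> None:
    time.sleep(delay)   # B computes for an amount of time unknown to A
    out.put("result")

def program_a(inbox: queue.Queue, timeout: float) -> str:
    try:
        return f"saw B's output: {inbox.get(timeout=timeout)}"
    except queue.Empty:
        # A's observation of B's behavior so far: no output. Whether B would
        # *ever* produce one is unobservable within the timeout.
        return "saw no output from B"

q: queue.Queue = queue.Queue()
threading.Thread(target=program_b, args=(q, 1.0), daemon=True).start()
print(program_a(q, timeout=0.1))  # "saw no output from B"
print(program_a(q, timeout=2.0))  # "saw B's output: result"
```

The two calls observe different behaviors from the very same program B; only the observer’s window changed, which is the sense in which behavior is relative to observation.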
  73. . Felleisen et al., supra note 21, at 58–59. Focusing on whether observations of behavior are equivalent, rather than behavior itself, is called “observational equivalence” and is subtly different from behavioral equivalence. See Morris, supra note 70, at 49–54. Behavioral equivalence is the ideal equivalence relation one would want to assess, whereas observational equivalence is often the closest equivalence that can actually be assessed. Felleisen et al., supra note 21, at 59–61. In deterministic systems, observational equivalence and behavioral equivalence can be the same. See Hennessy & Milner, supra note 72, at 300. This Article uses “behavioral equivalence” for consistency.

    PL researchers have described other equivalence relations. See generally Cédric Fournet & Georges Gonthier, The Join Calculus: A Language for Distributed Mobile Programming, in Applied Semantics: International Summer School, APPSEM 2000, at 268, 288–313 (2000). One useful equivalence is contextual equivalence, which asks whether two different subprograms behave the same way when plugged into a larger program. See Morris, supra note 70, at 17–18. An analogous inquiry in the law might ask how the larger legal system assesses similarity of legal decision-making processes that are embedded within that system. See infra Section II.A.

  74. . See Mitchell, supra note 69, at 147 (describing how behavioral equivalence in fact requires abstraction to strip away irrelevant implementation-specific details).
  75. . In PL this is referred to as an abstraction context. Id.
  76. . A replacement system might be designed to improve on, not mimic, the current system. Behavioral equivalence still provides a framework for analyzing the changes in terms of observers and their observations. See infra Part IV.
  77. . Fed. R. Evid. 104(a).
  78. . See Fed. Jud. Ctr., Judicial Writing Manual: A Pocket Guide for Judges 10 (Alvin B. Rubin et al. eds., 2d ed. 2013) (“In the writing process itself, judges use their law clerks in different ways. . . . Some assign the writing of the first draft to a law clerk in routine cases only; others have clerks write first drafts in even the most complex cases, having found that working from a draft makes the task of writing the opinion easier.”).
  79. . In this example, the clerk and the judge are more than simply legally equivalent because the judge verifies that the clerk arrives at the same conclusion the judge would have reached, and for the same reasons. However, even if the clerk persuades a judge who might have disagreed, the two processes are still legally equivalent.
  80. . See infra notes 89–93 and accompanying text.
  81. . Some Supreme Court clerks have claimed that Justices sometimes published clerks’ drafts without revision. See David J. Garrow, “The Lowest Form of Animal Life”?: Supreme Court Clerks and Supreme Court History, 84 Cornell L. Rev. 855, 865 n.72 (1999) (describing Justice Murphy’s dependence on his law clerks to draft his opinions).
  82. . See generally Kevin D. Ashley, Ontological Requirements for Analogical, Teleological, and Hypothetical Legal Reasoning, 12 Int’l Conf. on Artificial Intelligence & L. 1 (2009).
  83. . See Branting et al., supra note 40, at 217. For a primer on Deep Learning systems aimed at legal readers, see Joseph Blass, Algorithmic Advertising Discrimination, 114 Nw. U. L. Rev. 415, 426–36 (2019).
  84. . Generating such data would require an accurate simulation of how the rules of evidence are supposed to work. See, e.g., Gül Varol et al., Learning from Synthetic Humans, 2017 IEEE Conf. on Comput. Vision and Pattern Recognition 4627.
  85. . Lyle Denniston, Commentary: The Court’s Caseload, SCOTUSblog (Oct. 21, 2005, 5:13 PM), [].
  86. . See id. Because memos are prepared for an audience of many Justices, they are more likely to recommend rejecting petitions that will not appeal to everyone rather than recommend that an individual Justice accept a petition in an area of personal interest to the Justice.
  87. . The statute establishing Supreme Court clerkships dates to 1919, but the practice predates that. See Perry Dane, Law Clerks: A Jurisprudential Lens, 88 Geo. Wash. L. Rev. Arguendo 54, 54 & n.1 (2020).
  88. . See Fed. Jud. Ctr., supra note 78, at 10. The practice is not uncontroversial. See Dane, supra note 87, at 58–60. Judge Posner credited his retirement in part to being unhappy about the power of staff attorneys over cases involving pro se litigants. See Richard A. Posner, Reforming the Federal Judiciary: My Former Court Needs to Overhaul Its Staff Attorney Program and Begin Televising Its Oral Arguments 148–49 (2017). He objects that judges on the Seventh Circuit considering such cases first see a draft opinion from the staff attorney’s office on how to resolve the case, rather than a bench memo exploring all relevant legal areas, and “tend to rubber stamp” that draft opinion. Posner concludes that the outcomes of pro se cases are thus generally decided by staff attorneys, not judges. Id. at 6. Judge Posner’s interpretation of the staff attorneys’ power has been disputed. See, e.g., Zoran Tasić, Reforming Richard Posner: The Former Federal Judge Needs to Overhaul His Assessment of the Seventh Circuit’s Staff Attorney Program and Correct the Errors in His Book 4–10 (Oct. 4, 2017) (unpublished manuscript), []. Regardless of Judge Posner’s specific critique, the longstanding acceptance of both the cert pool and clerks drafting opinions means neither practice is likely to be found illegitimate or illegal and demonstrates that replacing subprocesses of the legal system for the sake of efficiency is already tolerated.
  89. . See Dane, supra note 87, at 80–81 (attributing jurisprudential power to the name attached to a decision, regardless of the drafter).
  90. . See Fed. Jud. Ctr., supra note 78, at 10.
  91. . This is only a thought experiment; I have found no evidence it has ever happened.
  92. . Side effects are discussed infra Section II.B.
  93. . The Federal Judicial Center says, “Judges should not simply edit draft opinions. No matter how capable the clerk, the opinion must always be the judge’s work.” Fed. Jud. Ctr., supra note 78, at 11. Analogously, Rule 26 of the Federal Rules of Civil Procedure mandates that an expert witness’s report must be drafted by the witness. Fed. R. Civ. P. 26(a)(2)(B). A report drafted by the attorney is excludable even if the expert reviews, agrees with, and signs it. Numatics, Inc. v. Balluff, Inc., 66 F. Supp. 3d 934, 942–43 (E.D. Mich. 2014).
  94. . An observer can be a system instead of a human. See supra note 21.
  95. . See Marco Patrignani et al., Formal Approaches to Secure Compilation: A Survey of Fully Abstract Compilation and Related Work, 51 ACM Computing Survs. 1, at 9–11 (2019).
  96. . See Morris, supra note 70, at 49; see also Mitchell, supra note 69, at 147.
  97. . Of course, when a system’s behavior is clearly specified, the inquiry may be straightforward.
  98. . See Mitchell, supra note 69, at 154–55. In Mitchell’s PL example, he notes that if the label of a program variable matters to the larger system, then the exact same program written using different variable names looks like a different program to the larger system. Id.
  99. . The omniscient observer and the union of all possible observers are theoretical and may not exist in real-world cases. Nonetheless, though nothing we know of observes quarks, the implications of an observer that does can be explored.
  100. . Nondeterminism does not necessarily mean probabilistic.
  101. . The behavior of nondeterministic systems may be difficult to fully specify, so they will only be observationally equivalent. This Article uses the term behavioral equivalence for simplicity. See supra notes 72–73 and accompanying text.
  102. . Nondeterministic programs, such as asynchronous concurrent programs, can be assessed for behavioral equivalence by asking whether they share a starting state, possible intermediate states, and transition rules to govern state change. See Hennessy & Milner, supra note 72, at 301–02. The states passed through may change depending on factors external to the program, such as whether some other program terminates. The programs are still observationally equivalent if the same circumstances in the same state would lead to the same transition, even if different transition rules fire at run-time, leading to different intermediate states passed through and different outcomes. See id.
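    The state-and-transition comparison note 102 describes can be sketched as follows. This is a simplified, deterministic toy of my own construction, not Hennessy and Milner’s formal definition: two systems count as equivalent if, starting from matched states, the same observable transitions are always available, regardless of state names or the path actually taken at run-time.

```python
# Toy comparison of labelled transition systems: each system maps a state to
# the labelled moves available from it, e.g. {"idle": {"coin": "ready"}}.

def equivalent(sys_a, start_a, sys_b, start_b):
    """Walk both systems in lockstep; they are equivalent if every matched
    pair of states offers the same set of observable labels (deterministic
    case: one successor per label)."""
    stack, matched = [(start_a, start_b)], set()
    while stack:
        a, b = stack.pop()
        if (a, b) in matched:
            continue
        matched.add((a, b))
        moves_a = sys_a.get(a, {})
        moves_b = sys_b.get(b, {})
        if set(moves_a) != set(moves_b):  # different observable options
            return False
        for label in moves_a:
            stack.append((moves_a[label], moves_b[label]))
    return True

# Two machines with differently named states but identical available moves.
coin_machine_1 = {"idle": {"coin": "ready"},
                  "ready": {"tea": "idle", "coffee": "idle"}}
coin_machine_2 = {"s0": {"coin": "s1"},
                  "s1": {"tea": "s0", "coffee": "s0"}}
# A machine that, after a coin, only ever offers tea.
stingy_machine = {"s0": {"coin": "s1"}, "s1": {"tea": "s0"}}

print(equivalent(coin_machine_1, "idle", coin_machine_2, "s0"))  # True
print(equivalent(coin_machine_1, "idle", stingy_machine, "s0"))  # False
```

The first two machines differ only in internal state names, so an observer interacting through the labels cannot tell them apart; the third offers a different menu of transitions from its second state and is therefore observably different.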
  103. . For a method to assess behavioral equivalence in a probabilistic nondeterministic setting, see Mitchell Wand et al., Contextual Equivalence for a Probabilistic Language with Continuous Random Variables and Recursion, 2 Proc. ACM on Programming Languages 1 (2018); Jean Goubault-Larrecq et al., A Probabilistic Applied Pi-Calculus, 5 Asian Conf. on Programming Languages & Sys. 175, 182–84 (2007). For a discussion of probabilistic behavioral equivalence, see Patrignani et al., supra note 95, at 36–37.
  104. . See Shai Danziger et al., Extraneous Factors in Judicial Decisions, 108 Proc. Nat’l Acad. Sci. 6889, 6892 (2011) (describing a systematic pattern in Israeli judges’ rulings over the course of the day).
  105. . James M. Anderson et al., Measuring Interjudge Sentencing Disparity: Before and After the Federal Sentencing Guidelines, 42 J.L. & Econ. 271, 301–02 (1999); Ryan W. Scott, Inter-Judge Sentencing Disparity After Booker: A First Look, 63 Stan. L. Rev. 1, 41 (2010).
  106. . That is, a party could not appeal a loss simply because of which judge was assigned to the case, unless the judge was biased and refused to recuse herself. See 28 U.S.C. § 455(b)(1).
  107. . See generally Lynn M. LoPucki & Walter O. Weyrauch, A Theory of Legal Strategy, 49 Duke L.J. 1405, 1463–68 (2000) (discussing different methods of forum shopping). See also infra notes 112–15 and accompanying text.
  108. . See supra notes 102–03 and accompanying text; see also Roberto Segala & Nancy Lynch, Probabilistic Simulations for Probabilistic Processes, 2 Nordic J. Comput. 250, 263 (1995); Paris C. Kanellakis & Scott A. Smolka, CCS Expressions, Finite State Processes, and Three Problems of Equivalence, 86 Info. & Computation 43, 49 (1990).
  109. . This downside does not suggest that behavioral equivalence is the wrong method by which to detect changes in a system. Indeed, because equivalence is always relative to the observer assessing the equivalence, the blind spot problem is inescapable.
  110. . This example is taken from Patrignani et al., supra note 95, at 5.
  111. . Id. at 11.
  112. . See Sir Arthur Conan Doyle, The Red-Headed League, in The Adventures of Sherlock Holmes (1892) (involving exactly such a scheme; the story apparently inspired a real-life heist in 1971).
  113. . Part II infra discusses who “the relevant observer” is.
  114. . See LoPucki & Weyrauch, supra note 107, at 1463–68 (discussing various forum-shopping methods).
  115. . See Richard A. Posner, The Federal Judiciary: Strengths and Weaknesses 226 (2017) (“[A] court’s announcing in advance . . . who the members of the panel will be that will hear a particular case . . . is likely to cause the lawyers in the case to focus on the particular leanings of panel members.”). Many Supreme Court watchers commented on how litigators targeted arguments towards what they believed Justice Kennedy, then the Court’s swing vote, would find persuasive. See, e.g., Ilya Shapiro, Justice Kennedy: The Once and Future Swing Vote, CATO (Nov. 13, 2016), []; Adam Liptak, In Health Case, Appeals to a Justice’s Idea of Liberty, N.Y. Times (Mar. 30, 2012), []. And textualist arguments, previously unusual in LGBTQ+ plaintiffs’ arguments, helped persuade Justice Gorsuch and Chief Justice Roberts to find that Title VII’s prohibition against sex discrimination protected gay and transgender employees. See Dale Carpenter, Textualism’s Redeemers, Volokh Conspiracy (June 15, 2020, 9:30 PM), [].
  116. . Dane Thorley, Randomness Pre-Considered: Recognizing and Accounting for “De-Randomizing” Events When Utilizing Random Judicial Assignment, 17 J. Empirical Legal Stud. 342, 366 (2020) (“[L]awyers may have certain perceptions . . . regarding how punitive or difficult certain judges are and consequentially advise their clients to take a plea deal more often under some judges than others.”).
  117. . See Marc Galanter, Why the “Haves” Come Out Ahead: Speculations on the Limits of Legal Change, 9 Law & Soc’y Rev. 95, 97–104 (1974); see also Donald R. Songer et al., Do the “Haves” Come Out Ahead over Time? Applying Galanter’s Framework to Decisions of the U.S. Courts of Appeals, 1925–1988, 33 Law & Soc’y Rev. 811, 813–14 (1999) (reviewing empirical research showing that more resources lead to better judicial outcomes: more experienced lawyers get better outcomes, and so charge more for their services; richer litigants pay more for legal services, hiring better lawyers).
  118. . Part I supra focuses on computational replacements, but the principles it describes are not limited to these.
  119. . See supra Section I.A.1.
  120. . See supra Section I.A.2.
  121. . See supra Section I.B.
  122. . Recall that an observer that cannot transmit information back into a system is useless for the purposes of that system. See supra notes 94–96 and accompanying text.
  123. . The facts of a case, rules governing them, procedures applied to them, reasoning disposing of them, and their outcomes are not all the outputs of the judicial system; others are discussed infra Section II.B.
  124. . This formulation describes an equivalence relationship, not a causal one: saying that a difference between two processes cannot be the basis of an appeal is the same as saying the two are behaviorally equivalent. It is not the case that two processes being behaviorally equivalent causes them to be unappealable, nor the reverse.
  125. . Fed. R. Evid. 104(a).
  126. . See Blass, supra note 83, at 428.
  127. . Compare Katz v. Regents of the Univ. of Cal., 229 F.3d 831, 835 (9th Cir. 2000) (holding that showing statistical evidence of disparate impact will support a claim for employment discrimination), with McCleskey v. Kemp, 481 U.S. 279, 294–96 (1987) (holding that statistical evidence of racial bias in death penalty convictions cannot support an inference of racial bias in an individual case).
  128. . This risk, discussed infra Sections II.B and III.A, might warrant reevaluating the McCleskey rule.
  129. . Peremptory challenges that strike jurors without an explanation are protected by law. 28 U.S.C. § 1870; Fed. R. Crim. P. 24(b). See also Douglas Blake Dykes, Articulation of Non-Race Based Reasons for Peremptory Challenges After Batson v. Kentucky, 17 Am. J. Trial Advoc. 245, 251–54 (1993).
  130. . For example, striking a juror based on their race is not permitted. Batson v. Kentucky, 476 U.S. 79, 87 (1986); see also Dykes, supra note 129, at 255–56.
  131. . The question is whether the judicial system would see the changed jury composition as the basis of an appeal; if not, the new jury is equivalent to the old one. Jury composition might matter to observers beyond the judicial system. See infra Section II.B.
  132. . There is no legal difference between a person hiring lawyer A or lawyer B, if both are qualified, admitted to the bar, and present no conflicts of interest or disability that prevent them from discharging their responsibilities to their client.
  133. . This idea is less far-fetched than it might seem. AI & Law researchers have developed systems to generate legal arguments. See Vincent Aleven & Kevin D. Ashley, supra note 30, at 170; Henri Prakken et al., A Formalization of Argumentation Schemes for Legal Case-Based Reasoning in ASPIC+, 25 J. Logic & Computation 1141, 1141 (2013). Companies such as DoNotPay automatically generate legal documents used in court cases. DoNotPay, []. And an AI system trained on Multistate Bar Exam multiple choice questions for an undergraduate class project got nearly 40% of questions correct (60% is a passing score). Lucia Zheng, Improving Language Model Performance on the United States Uniform Bar Exam 5–6 (2020), [].
  134. . See Angwin et al., supra note 4.
  135. . There is little rigorous empirical analysis of differences in outcomes between arbitration and litigation. See David S. Schwartz, Mandatory Arbitration and Fairness, 84 Notre Dame L. Rev. 1247, 1284 (2009). But see Jean R. Sternlight, Panacea or Corporate Tool?: Debunking the Supreme Court’s Preference for Binding Arbitration, 74 Wash. U. L.Q. 637, 680 (1996) (arguing why binding arbitration is not necessarily better for all parties involved, especially when a smaller actor is going up against a larger one). That parties can distinguish arbitration from litigation even where the judicial system does not speaks to the critique of this formulation. See infra Section II.B.
  136. . See Cong. Rsch. Serv., R44797, The Federal Government’s Authority to Impose Conditions on Grant Funds 4–7 (2017).
  137. . Kal Raustiala, Compliance & Effectiveness in International Regulatory Cooperation, 32 Case W. Res. J. Int’l L. 387, 391 (2000) (“In the international context, compliance is often specified as ‘an actor’s behavior that conforms to a treaty’s explicit rules.’”).
  138. . Fed. R. Civ. P. 50(a)(1), (e). 
  139. . A directed verdict requires not only knowing what facts came out at trial, but whether those facts are a “legally sufficient evidentiary basis to find for the party.” Fed. R. Civ. P. 50(a)(1). 
  140. . Knowing that a set of facts is sufficient to justify an outcome given a set of rules does not guarantee that those rules were in fact used to derive that outcome. Judges might decide on an outcome for whatever reasons they please and then, in their opinion, apply rules to facts to justify the outcome. This is already a known risk in certain data-driven AI systems. See Dylan Slack et al., Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods, 2020 AAAI/ACM Conf. on AI, Ethics, and Soc’y 180, 180 (“[E]xtremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques . . . into generating innocuous explanations which do not reflect the underlying biases.”). 
  141. . See Mathews v. Eldridge, 424 U.S. 319, 334 (1976); Cleveland Bd. of Educ. v. Loudermill, 470 U.S. 532, 544 (1985). Agencies have a poor track record of doing so. See Calo & Citron, supra note 59, at 819.
  142. . See U.S. Const. amend. VI.
  143. . John Rawls, A Theory of Justice 73–77 (2d ed. 1999).
  144. . See Rosales-Mireles v. United States, 138 S. Ct. 1897, 1910 (2018) (discussing social science research on the importance of fair procedure and the role of fairness in justice). Another view holds that due process promotes accuracy overall, even at the expense of accuracy in certain cases. Proponents of legal automation who hold this view might argue that automated legal decision-makers will make due process obsolete by eliminating judicial errors. But no error-free automated system has ever been developed, and assuming one involves speculation beyond this Article’s assumption that an automated legal system might replicate human performance, including its errors.
  145. . See Charles S. Chapel, The Irony of Harmless Error, 51 Okla. L. Rev. 501, 512–13 (1998).
  146. . See Josh Bowers & Paul H. Robinson, Perceptions of Fairness and Justice: The Shared Aims and Occasional Conflicts of Legitimacy and Moral Credibility, 47 Wake Forest L. Rev. 211, 215 (2012).
  147. . Chapel, supra note 145, at 502.
  148. . See Virginia-Carolina Tie & Wood Co. v. Dunbar, 106 F.2d 383, 387 (4th Cir. 1939).
  149. . 28 U.S.C. § 2111; see also Chapel, supra note 145, at 502.
  150. . Delaware v. Van Arsdall, 475 U.S. 673, 681 (1986).
  151. . See Rose v. Clark, 478 U.S. 570, 579 (1986).
  152. . The question here is whether some member of society can differentiate between processes and has preferences across them, not how policymakers should analyze what differences to care about. See discussion infra Part III. 
  153. . Only those changes some societal constituent notices and cares about should count, not all the changes that are noticed. This is because if no interest cares about some change, then no interest will attempt to feed information related to it back into the system, making any observation of the difference meaningless. See supra notes 94–96 and accompanying text. There may be no change that some societal observer would notice but not care about. See infra Section II.B.2.
  154. . See Rawls, supra note 143, at 3–4.
  155. . See Fallon, supra note 63, at 1828; see also Calo & Citron, supra note 59, at 832–35 (arguing that when administrative agencies automate, they often undermine the justifications for their authorization and therefore their legitimacy).
  156. . Rawls, supra note 143, at 76–77. Outcomes may be irrelevant to justice if they are generated by fair procedures. Robert Nozick, Anarchy, State, and Utopia 151 (1974) (“Whatever arises from a just situation by just steps is itself just.”).
  157. . Rawls, supra note 143, at 74–75.
  158. . See Nozick, supra note 156, at 96.
  159. . This is called “perfect procedural justice.” See Rawls, supra note 143, at 76. Perfect procedural justice requires that just outcomes be knowable. Id.
  160. . This Article cannot assume a system that perfects legal processes. See Rosales-Mireles v. United States, 138 S. Ct. 1897, 1910 (2018). Regardless, perfect procedural justice might be theoretically impossible because it assumes a just outcome is knowable, and it is not clear every case has a single, knowable just outcome.
  161. . E. Allan Lind & Tom R. Tyler, The Social Psychology of Procedural Justice 61–92 (1988).
  162. . See infra note 184 and accompanying text.
  163. . Assuming the attorney is competent, free from conflicts of interest, etc.
  164. . Cf. Morris v. Slappy, 461 U.S. 1, 14 (1983) (establishing that the Sixth Amendment does not guarantee the right to a “‘meaningful relationship’ between an accused and his counsel”).
  165. . Lind & Tyler, supra note 161, at 65 (finding that “procedural fairness [is] more important than distributive fairness in determining attitudes toward the court”).
  166. . Larry Alexander & Emily Sherwin, Demystifying Legal Reasoning 71–72 (2008); Richard A. Posner, Reasoning by Analogy, 91 Cornell L. Rev. 761, 773–74 (2006) (reviewing Lloyd L. Weinreb, Legal Reason: The Use of Analogy in Legal Argument (2005)) (arguing judges engage in “reasoning by analogy” to justify departing from clearly established rules when they conflict with the judge’s policy analysis).
  167. . Justifications for rules also help judges handle conflicting rules and determine when rules should be overruled or departed from. Alexander & Sherwin, supra note 166, at 50–60.
  168. . See, e.g., Fuller v. Daniel, 438 F. Supp. 928, 929 (N.D. Ala. 1977) (“The Court has serious doubts about the fairness of this procedure. . . . Nevertheless, the Court is of the opinion that this action must be dismissed . . . .”); U.S. Asphalt Refin. Co. v. Trinidad Lake Petroleum Co., 222 F. 1006, 1012 (S.D.N.Y. 1915) (“[T]he decisions cited show beyond question that the Supreme Court has laid down the [controlling] rule . . . . It was within the power of that tribunal to make this rule. Inferior courts may fail to find convincing reasons for it; but the rule must be obeyed . . . .”).
  169. . Statistics for lifetime likelihood of interacting with the legal system are few. See generally Am. Acad. of Arts and Sci., Measuring Civil Justice for All: What Do We Know? What Do We Need to Know? How Can We Know It? (2021), []. While criminal records serve as only a rough proxy for number of arrests and do not account for people’s interactions with civil courts, over 70 million people have criminal records indexed by the FBI. Matthew Friedman, Just Facts: As Many Americans Have Criminal Records as College Diplomas, Brennan Ctr. for Just. (Nov. 17, 2015), []. Roughly 400,000 cases are filed annually in federal trial courts, and over 100 million cases are filed annually in state trial courts. Inst. for the Advancement of the Am. Legal Sys., FAQs: Judges in the United States 3 (2014), []. And even an estimate of the number of people who directly interact with the legal system does not account for the family and community members who can be seen as having interacted with the legal system through the person who participated in the case.
  170. . Nixon v. Warner Commc’ns, 435 U.S. 589, 602 (1978) (describing a “presumption . . . in favor of public access to judicial records”). See generally David S. Ardia, Court Transparency and the First Amendment, 38 Cardozo L. Rev. 835 (2017) (explaining the importance of transparency in the judicial system).
  171. . The Supreme Court has suggested that the purpose of “adversarial testing [is to] ultimately advance the public interest in truth and fairness.” Polk Cnty. v. Dodson, 454 U.S. 312, 318 (1981).
  172. . See Louis D. Brandeis, Other People’s Money and How the Bankers Use It 62 (Richard M. Abrams ed., 1967) (1914) (“Publicity is justly commended as a remedy for social and industrial diseases. Sunlight is said to be the best of disinfectants . . . .”).
  173. . Miranda v. Arizona, 384 U.S. 436, 444–46 (1966).
  174. . Court backlogs can have tragic effects. For example, 16-year-old Kalief Browder was wrongfully accused of stealing a backpack and spent three years in pretrial detention on Rikers Island before his case was eventually dismissed. Jennifer Gonnerman, Before the Law, New Yorker (Oct. 6, 2014) [hereinafter Gonnerman, Before the Law], []. Browder committed suicide soon after his release. Jennifer Gonnerman, Kalief Browder, 1993–2015, New Yorker (June 7, 2015), [].
  175. . See, e.g., Renee M. Jones, Legitimacy and Corporate Law: The Case for Regulatory Redundancy, 86 Wash. U. L. Rev. 1273, 1324–33 (2009) (describing how corporate interests used targeted litigation to neuter a variety of SEC regulatory efforts).
  176. . See generally Scott L. Cummings & Deborah L. Rhode, Public Interest Litigation: Insights from Theory and Practice, 36 Fordham Urb. L.J. 603 (2009) (arguing that public interest litigation “is an imperfect but indispensable strategy of social change”).
  177. . Unions not only bargain on behalf of members and mediate disputes with employers but also represent individual union members in employment disputes. See generally Catherine L. Fisk, Union Lawyers and Employment Law, 23 Berkeley J. Emp. & Lab. L. 57 (2002).
  178. . See Angwin et al., supra note 4; Anne L. Washington, How to Argue with an Algorithm: Lessons from the COMPAS ProPublica Debate, 17 Colo. Tech. L.J. 131, 148–53 (2018).
  179. . Lobbying legislators appears to be effective. John M. de Figueiredo & Brian Kelleher Richter, Advancing the Empirical Research on Lobbying 11–14 (Nat’l Bureau of Econ. Rsch., Working Paper No. 19698, 2013). Lobbying is dominated by corporate interests, id., but grassroots lobbying can be effective too. Daniel E. Bergan, Does Grassroots Lobbying Work? A Field Experiment Measuring the Effects of an E-Mail Lobbying Campaign on Legislative Behavior, 37 Am. Pol. Rsch. 327, 327 (2009). Executive agencies respond to notice-and-comment processes, though they are driven by political agendas. See William F. West, Formal Procedures, Informal Processes, Accountability, and Responsiveness in Bureaucratic Policymaking: An Institutional Policy Analysis, 64 Pub. Admin. Rev. 66 (2004).
  180. . For example, President Taft became Chief Justice Taft, and Justice Robert Jackson had been Attorney General. And in Marbury v. Madison, Chief Justice Marshall ruled on a commission he had sealed as Secretary of State.
  181. . For example, Merrick Garland, a federal judge and former nominee for the Supreme Court, became Attorney General in 2021.
  182. . See supra Section II.A.2.
  183. . See generally Lauren C. Bell, Monitoring or Meddling? Congressional Oversight of the Judicial Branch, 64 Wayne L. Rev. 23 (2018).
  184. . Certainly some observers’ perspectives should be ignored by legal decision-makers as not being worth attending to, either because the observer itself is not worth attending to (for example, racist organizations that do not care how the legal system works, so long as it benefits their race), or because the observations concern something the policymaker is justified in ignoring (for example, the brand of coffee served in the jury pool room, see infra Section II.B.2.).
  185. . Discussed further infra Part III.
  186. . The FISA Court provides a case study in how no aspect of the judicial system goes entirely unobserved by every interest in society. It operated largely in secret, authorized sweeping operations that collected information about Americans not approved for surveillance, and was authorized and reauthorized by Congress even as senators expressed concerns about it. After Edward Snowden publicized its operations, public interest groups sued, and laws were passed restricting its authority. See Emily Berman, The Two Faces of the Foreign Intelligence Surveillance Court, 91 Ind. L.J. 1191, 1194–98 (2016); Walter F. Mondale et al., No Longer a Neutral Magistrate: The Foreign Intelligence Surveillance Court in the Wake of the War on Terror, 100 Minn. L. Rev. 2251, 2259–69 (2016). The FISA Court and bulk-surveillance intelligence methods differ from most judicial proceedings: people the court approves for surveillance do not know they are the subjects of legal proceedings, and the court does not operate as a normal Article III court (including in its appeal and oversight system). But besides the politicians observing the court, the individual citizens working within it observed it, and one of those citizens exposed it to the media. Once the media broadcast the existence of the program, the public and interest groups like the ACLU focused on it. All this provided the impetus for change by Congress, which only a few years earlier had reauthorized the program. Even this most secret of legal proceedings was observed by interested members of society who broadcast the information to others.
  187. . See supra Section I.C.1.
  188. . See supra note 152 and accompanying text. As noted supra note 184, policymakers might reasonably ignore some interests for not deserving to be included in society-at-large’s observations. But unless the change matters only to those interests, the change itself should be assumed to be observed by society at large.
  189. . See supra note 95 and accompanying text. This dystopian idea is best left unexplored.
  190. . See supra Section I.C.1.
  191. . See supra Section I.A.
  192. . See supra Section II.A.1.
  193. . See supra Section II.A.2.
  194. . This weak version of the assumption corresponds to having the judicial system searching for reversible error be the observer evaluating behavioral equivalence.
  195. . See supra Section II.A. It is possible that there is a better way for the legal system writ large to detect changes; I welcome others’ formulations.
  196. . When these determinations are nondeterministic, this is trickier but still possible. See supra notes 102–107 and accompanying text. If something less than exact fidelity is required, some variation in the replacement’s performance relative to the judge’s may be tolerable.
  197. . Criminal defendants are protected from double jeopardy. See U.S. Const. amend. V. And civil litigants cannot relitigate because of res judicata. 18 Edward H. Cooper, Federal Practice and Procedure § 4401 (3d ed. 2021). Such a system could be tested by a mock trial without any legal weight behind it, but this is unlikely. Participating in a weightless practice run before trial would defeat the purpose of the exercise, since participants could change their strategies based on their opponents’ in the practice run, thereby changing the inputs to the two systems. It is hard to imagine participants relitigating their cases after the fact for no possible benefit.
  198. . See supra Section I.B.
  199. . Statistics about trends usually cannot be used for inferences about specific cases. See cases cited supra note 126.
  200. . See supra notes 134–36 and accompanying text.
  201. . Some rules of civil procedure might be good candidates.
  202. . Sometimes courts issue decisions without explaining them; a reviewing court makes its best inference about the basis for the decision. See Dart Cherokee Basin Operating Co., LLC v. Owens, 574 U.S. 81, 95 n.7 (2014) (“Caution is in order when attributing a basis to an unreasoned decision. But we have not insisted upon absolute certainty when that basis is fairly inferred from the record.”). Thanks to court stenographers and parties’ filings, even unwritten decisions from the bench may be explainable.
  203. . See Slack et al., supra note 139; see also Gary Marcus & Ernie Davis, GPT-3, Bloviator: OpenAI’s Language Generator Has No Idea What It’s Talking About, MIT Tech. Rev. (Aug. 22, 2020), [] (“GPT-3 seems to have an impressive ability to produce human-like text. . . . But accuracy is not its strong point . . . . which means you can never really trust what it says.”).
  204. . Subjecting new systems to additional observations does not complicate a behavioral equivalence analysis, which focuses on the content, not quantity, of observations.
  205. . Note that this works when new systems are designed to fix problems with the old system as well: the new system must be audited to make sure that the desired difference appears.
  206. . See generally Eric W. Weisstein, Constant Function, MathWorld, [] (“A constant function is [a] function . . . whose value does not change as its parameters vary.”).
  207. . Chapel, supra note 144, at 505 (“It is not uncommon for an appellate court to acknowledge multiple errors in a single trial and conclude each is harmless because the record established guilt.”).
  208. . See Andrew Burt, The AI Transparency Paradox, Harv. Bus. Rev. (Dec. 13, 2019), [] (“[T]he more a model’s creators reveal about the algorithm, the more harm a malicious actor can cause.”).
  209. . See Sternlight, supra note 134, at 680–86 (explaining why and how contract drafters use arbitration clauses to their advantage).
  210. . See LoPucki & Weyrauch, supra note 106, at 1463–68.
  211. . See supra note 114 and accompanying text.
  212. . The adversarial system might mitigate this problem, because each party has a strong incentive to ensure its opponent does not successfully manipulate the dispute-resolution process.
  213. . See, e.g., Anthony Niblett et al., The Evolution of a Legal Rule, 39 J. Legal Stud. 325, 346–47 (2010) (tracing the evolution of tort claims for economic loss).
  214. . I have focused on replacing a single component of the legal process at a time; replacing sets of them simultaneously would increase the difficulty of detecting differences and assigning blame for them.
  215. . For instance, given the ink spilled on the subject, this analysis would have been useful when Congress passed the Federal Arbitration Act of 1925, Pub. L. No. 68-401, 43 Stat. 883 (codified as amended at 9 U.S.C. §§ 1–16).
  216. . See supra Section II.B. That said, the legal system can declare nonidentical systems de jure, if not de facto, equivalent. See supra notes 134–36 and accompanying text.
  217. . Changes may also affect whether the system is perceived as legitimate, but these authorization tradeoffs will be a concern anytime the legal system is modified through automation. Because authorization has been assumed in this Article thus far, authorization tradeoffs are not discussed here. See Calo & Citron, supra note 58, at 817, for a discussion of the crisis of legitimacy facing administrative agencies that automate decision-making processes.
  218. . See, e.g., Sophie Tatum, ‘I Did Not Want Paul Manafort to Be Guilty, but He Was,’ Says Juror Who Supports Trump, CNN (Aug. 23, 2018, 1:45 PM), [] (describing a juror discussing jury deliberations and her own biases).
  219. . Dylan Lovan, 2nd Breonna Taylor Grand Juror Criticizes Proceedings, AP News (Oct. 22, 2020), []. Several grand jurors unsuccessfully filed a petition to impeach the AG for misrepresenting the grand jury’s findings. Joe Sonka & Morgan Watkins, Impeachment Committee Wants Daniel Cameron to Reconsider Not Billing Petitioners, Louisville Courier J. (Mar. 5, 2021, 5:21 PM), [].
  220. . See Joshua Y. Kim et al., A Comparison of Online Automatic Speech Recognition Systems and the Nonverbal Responses to Unintelligible Speech (Apr. 29, 2019) (unpublished manuscript), []. These results support claims by transcription companies that automated transcription is cheaper but less accurate than human transcription.
  221. . See Posner, supra note 114, at 224–25 (criticizing judicial writing as complicated and verbose). See also People v. Kelly, 146 P.3d 547, 548, 550–53 (Cal. 2006), for a case that turned on the adequacy of a written opinion, required by California’s State Constitution, and discussed the role of written opinions in American law.
  222. . See Mathews v. Eldridge, 424 U.S. 319, 332 (1976) (describing due process requirements preceding a deprivation of a property interest). Unfortunately, legal automation has a poor record of protecting these due process rights. See Calo & Citron, supra note 58, at 819 & n.152.
  223. . See Estelle v. Gamble, 429 U.S. 97, 106 (1976) (“The handwritten pro se document is to be liberally construed. . . . [A] pro se complaint, ‘however inartfully pleaded,’ must be held to ‘less stringent standards than formal pleadings drafted by lawyers’ . . . .” (quoting Haines v. Kerner, 404 U.S. 519, 520 (1972))).
  224. . See supra Section I.C.3.
  225. . Fed. R. Evid. 807.
  226. . See generally Citron, supra note 58; Calo & Citron, supra note 58, at 821.
  227. . For example, the verdict “Because you killed a person and not in self-defense, you committed murder” is quite different from a machine computing “Based on case inputs and model priors, you committed murder with 97% probability.” People care not only about what happens to them but also about why. See supra Section II.B.
  228. . See supra notes 11–13 and accompanying text (discussing how risk assessment software and sentencing guidelines removed decision power from judges and added bias to carceral decision-making).
  229. . See, e.g., McBurney et al., Desiderata for Agent Argumentation Protocols, 1 Int’l Joint Conf. on Autonomous Agents & Multiagent Sys. 402, 403–04 (2002).
  230. . See supra Section I.A.1. Which representational formalisms permit which kinds of reasoning is beyond the scope of this Article.
  231. . See Blass, supra note 82, at 428 (explaining why certain machine learning techniques produce often uninterpretable results).
  232. . C.M.A. McCauliff, Burdens of Proof: Degrees of Belief, Quanta of Evidence, or Constitutional Guarantees?, 35 Vand. L. Rev. 1293, 1324–27 (1982) (reporting a survey of judges’ ratings of the probabilities associated with standards of proof).
  233. . Elisabeth Stoffelmayr & Shari Seidman Diamond, The Conflict Between Precision and Flexibility in Explaining “Beyond A Reasonable Doubt”, 6 Psych. Pub. Pol’y & L. 769, 778–83 (2000) (showing that judges resist assigning percent certainties to standards of proof and describing the benefits of flexible standards).
  234. . Thank you to Prof. Jay Koehler for this thought experiment.
  235. . Charles Nesson, The Evidence or the Event? On Judicial Proof and the Acceptability of Verdicts, 98 Harv. L. Rev. 1357, 1378–82 (1985).
  236. . These tradeoffs are those described in Section II.B. and Section III.A., so examples are not revisited here.
  237. . See Sonja B. Starr & M. Marit Rehavi, Mandatory Sentencing and Racial Disparity: Assessing the Role of Prosecutors and the Effects of Booker, 123 Yale L.J. 2, 48 (2013) (finding that “charging decisions appear to be the major driver of sentencing disparity”); Crystal S. Yang, Free at Last? Judicial Discretion and Racial Disparities in Federal Sentencing, 44 J. Legal Stud. 75, 76–77 (2015) (reviewing racial disparities in sentencing, which have increased when controlling for offender and crime attributes).
  238. . Deborah Tuerkheimer, Incredible Women: Sexual Violence and the Credibility Discount, 166 U. Pa. L. Rev. 1, 3 (2017) (describing how police, prosecutors, and the structure of sexual-assault law itself act to discredit victims of sexual assault).
  239. . See Gonnerman, Before the Law, supra note 174.
  240. . See Calo & Citron, supra note 58, at 818–32.
  241. . This is analogous to the notice-and-comment process.
  242. . See Frank Pasquale, Secret Algorithms Threaten the Rule of Law, MIT Tech. Rev. (June 1, 2017), [].
  243. . See id.
  244. . See Stevenson, supra note 4, at 305.
  245. . As noted supra note 13, commentators disagree whether these systems are racially biased.
  246. . Because those systems are not ready to implement in the real world, this assessment considers idealized implementations.
  247. . Indeed, my own research in AI involves building models of precedential reasoning, and I am sympathetic to the research goals of doing so. See Blass, supra note 35. Adopting my AI system for judicial purposes—which I would oppose—would implicate the same tradeoffs I have described. Designed to model how lawyers and judges derive and apply rules from precedents, my system aims to minimize reasoning and informational-access tradeoffs. However, there is currently no role for participants to play in shaping the analysis performed, and no guarantee that the system will consistently derive the same outcomes as humans do. Therefore, using my system for litigation would involve process and possibly outcome tradeoffs.
  248. . The evidence adduced in a trial is itself the result of a process operating over inputs, not all of which are trustworthy. See Allen, supra note 44, at 105. Systems that require factual inputs will have to solve the separate problem of what those inputs are.
  249. . See supra notes 28–33 and accompanying text. Imagining replacing judicial reasoning with these systems involves assuming that the problem described supra note 248 has been solved. Facts would have to be determined and input into the system, which might implicate additional tradeoffs.
  250. . See sources cited supra note 28.
  251. . See supra notes 32–33 and accompanying text.
  252. . These systems would require adaptation to handle nonprecedential legal reasoning.
  253. . See supra notes 28–31 and accompanying text.
  254. . See supra note 32 and accompanying text.
  255. . See supra notes 33–34 and accompanying text.
  256. . See Horty, supra note 32, at 12–26.
  257. . See Branting et al., supra note 40; Chalkidis et al., supra note 40.
  258. . See Blass, supra note 82, at 428.
  259. . See supra note 203 and accompanying text.
  260. . See Branting et al., supra note 40, at 229.
  261. . Id.
  262. . A separate question concerns the effect of automating attorneys, which represents a smaller change to the legal system than automating decision-makers. AI is closer to being able to serve as a legal adviser to a pro se litigant than as a judge. Systems like those of Xu et al., supra note 42, may continue to transform lawyers’ practice of law and allow more litigants to do without lawyers. This would affect lawyers and entail a process tradeoff; it may also implicate reasoning and outcome tradeoffs if litigants using those systems present different cases than those who hire lawyers.
  263. . See Prakken, supra note 39, at 326.
  264. . See Prakken & Sartor, supra note 38, at 240.
  265. . Prakken, supra note 39, at 316 (“These burdens are assumed to have been determined by the judge between the argumentation and decision phase and are given as input to the decision phase.”).
  266. . See supra notes 47–50 and accompanying text.
  267. . Livermore, supra note 49, at 250–51. Professor Livermore suggests that the specific weights of a trained neural network be written directly into the statute, which may be problematic. He identifies one potential risk: that bad actors could game the system by tweaking vehicles until they just pass the classifier. But a bigger problem is that once the classifier is set in law, its mistakes cannot be corrected by retraining the network on new examples but only by amending the law. If the statute allowed for updating the network given additional data, that problem would be solved.
  268. . See Gowder, supra note 47, at 220–23.
  269. . Adding missing features and retraining models is costly, so this solution may face practical hurdles.
  270. . See supra note 51 and accompanying text.
  271. . Perfect enforcement is not necessarily a good thing. Cf. Michael L. Rich, Should We Make Crime Impossible?, 36 Harv. J.L. & Pub. Pol’y 795, 833–46 (2013).
  272. . Whether this is good or bad may depend on the crime. Any phone with a GPS and a maps app could already determine whether its owner was jaywalking; it is not clear it should fine users for doing so.
  273. . McGinnis & Wasick, supra note 52, at 1040–45.
  274. . Id. at 1046.
  275. . Coglianese & Lehr, supra note 53, at 1147–48.
  276. . See id. at 1215–20.
  277. . Id. at 1216–17.
  278. . Id. at 1218–20.
  279. . Volokh, supra note 54. Recently, Prof. Susskind, chair of the advisory group on AI for the Lord Chief Justice of England, said an AI system that was 95% accurate at predicting case outcomes could be used to conclusively dispose of cases in places with large case backlogs. Padraig Belton, Would You Let a Robot Lawyer Defend You?, BBC News (Aug. 15, 2021), [].
  280. . This is called the Eliza Effect, after one of the first chatbots developed. ELIZA was designed to mimic a psychotherapist, turning statements into questions, expressing sympathy, etc. The developer’s secretary spoke to it extensively, as if it were human. See The ELIZA Effect, 99% Invisible (Dec. 10, 2019), transcript []. Eliza Effects are well known to researchers studying natural language understanding and generation; some excited reactions to GPT-3 and other language models may be examples. See Marcus & Davis, supra note 203.
  281. . The ELIZA Effect, supra note 280.
  282. . Volokh, supra note 54, at 1152–54. This would establish a new standard of review: judicial opinions currently need not persuade anyone to be binding, not even a reviewing court, if the unpersuasive parts fall within the judge’s discretion.
  283. . Shelly Chaiken, The Heuristic Model of Persuasion, in 5 Social Influence: The Ontario Symposium 3 (1987) (presenting empirical evidence that “people exert little cognitive effort in judging the validity of a persuasive message and, instead, may base their agreement with a message on a rather superficial assessment of a variety of extrinsic persuasion cues”).
  284. . Volokh, supra note 54, at 1167–77. He discusses risks of bias, a lack of legitimacy, and security risks.
  285. . See Slack et al., supra note 139.
  286. . “A → X” is a logical representation for a rule and should be read as “If A, then X.”