Logic Engine 2026-07-01 · ~11 min read

Statute is code, we just haven’t compiled it yet.

Logical English is how LodgeiT Labs turns Division 7A, FBT, CGT, and the rest of the statutory apparatus into cryptographically-auditable calculators — without asking an LLM to read the law and hope for the best.

The Div 7A moment

A few days ago we sat down with a Division 7A worked example. $100,000 unsecured loan, seven-year term, current-year benchmark rate. A pre-lodgement repayment of $30,000. Standard mid-tier accountant question: what’s the minimum yearly repayment for the current income year?

The reason we picked it was because Division 7A is where the wheels come off almost every general-purpose reasoning system we can throw at statutory logic. The section reference chain is s 109E for the MYR formula, s 109N for the compliance conditions, s 109D for what counts as a loan in the first place, and then a fan-out into s 109R if the repayment is re-borrowed, s 109T if there’s a look-through under interposed-entity rules, and Subdivision EA the moment a trust distribution enters the picture.

Four things went wrong when we benchmarked general-purpose LLM pipelines against this scenario. They’re worth naming, because they’re not solvable by better prompts.

1. Boolean routing collapse on cascading toggles

When the 2026 Bendel decision moved from AAT to Full Federal Court to High Court, the operative question — is an unpaid present entitlement a loan for s 109D — kept getting the same answer with different citation authorities behind it. A vector retrieval system indexed on 2023 AAT text will return semantically-confident-but-stale citations today. A neurosymbolic system that carries an explicit witness chain ([AAT_2023 SUPERSEDED, FFC_2025 AFFIRMED_BY, HCA18_2026 OPERATIVE]) rolls the citation pointer forward without moving the truth value.

2. Look-through state mutation

Under s 109T, when a payment traverses an interposed entity, the identity of the deemed lender changes. The taxpayer’s tax result depends on knowing that in the reconstructed transaction, the loan is now deemed to be from the ultimate holding entity, not the intermediary that actually wrote the cheque. An LLM cannot rebind a variable across a transaction network because its token stream is append-only. A logic engine does variable-rebinding by definition.

3. Catastrophic advisory inversion

This one was the scariest. A prompt asked about executing a re-borrow on a Division 7A loan. The pipeline returned an instruction to “execute the re-borrowing as documented” — which is precisely the anti-avoidance trap that s 109R exists to close. The correct answer is “this transaction, if executed, will be treated as if the original repayment never occurred, and the deemed dividend crystallises.” The pipeline gave the opposite answer with high confidence.

Worse, when we probed with the phrase “prohibited actions”, the semantic-search retrieval also pulled the Commonwealth Criminal Code’s Nazi-symbols statute into the context window — because the words matched. The context pollution is unbounded. Nothing in the retrieval architecture knows the difference between tax law and criminal law.

4. Denominator hallucination

Given a fact pattern involving multiple transactions and a partially-nullified asset transfer, the pipeline computed a denominator by summing every dollar figure in the context window rather than resolving which values had been nullified by s 109R. The result was $1,050,000 where the correct answer was $500,000. The mistake wasn’t small. It wasn’t a rounding error. It was a factor-of-two error caused by the system having no way to represent which facts are alive in the current state and which have been retracted.

None of these failures is fixable by a better prompt. They’re architectural. Statutory reasoning has state, has scoped identity, has retractions, and has a citation surface that changes under you while the underlying proposition stays constant. An LLM has none of those primitives.

What Logical English actually is

Logical English is a controlled natural-language layer over Prolog. A qualified reader — a CA who has done a rotation through a tax team, say — can read the rules in English and confirm they represent the statute. Under the hood, the rules compile to first-order logic that a deterministic engine executes.

This is a fragment of a Division 7A rule, expressed in the shape we’re authoring against:

the taxpayer must repay the minimum yearly repayment
    for a Division 7A loan
    if the loan is an amalgamated loan
    and the loan has been made in a prior income year
    and the loan is not fully repaid at the start of the current year
    and the required amount is computed under section 109E of the ITAA 1936.

the required amount is computed under section 109E of the ITAA 1936
    if the amount = (opening balance * benchmark rate)
                    / (1 - (1 + benchmark rate) ^ (- remaining term)).

That is executable. Not because a large language model has read it and formed an opinion. Because the underlying Prolog resolves the goal against the fact graph and either produces a witness — a proof tree showing which clauses fired and in what order — or it fails.

When the reasoning is right, the proof tree is the audit artefact. When the reasoning is wrong, the failure is deterministic and reproducible. There is no “however, in practice...” escape hatch, no confidence score, no plausible-but-incorrect narrative overlaid on top. The output is a value or a rejection.

Why controlled natural language, not raw Prolog

Because the load-bearing check is human. A working tax accountant needs to be able to read a rule and say “yes, that matches how s 109E actually works.” Making them read Prolog would put a competent-domain-expert gate behind a syntax barrier that has nothing to do with the law. LE compiles to Prolog; the Prolog runs the engine; but the artefact of record — the thing the human ratifier reads and signs off on — is English constrained by grammar rather than freeform.

The architecture, in five parallel workstreams

Building a statute-grade Logical English corpus is not one project. It’s five, running in parallel. If you sequence them, you get a corpus that either doesn’t exist yet, or exists but has no way to be trusted.

A — Substrate

The plumbing. LE 2.0 runtime, the CI harness that runs swipl le_command.pl --command=verify on every commit, the test-file convention that pairs every .le with a .le.tests, and the Prolog server that exposes the corpus behind a metered API.

B — Curation

A standing discipline, not a sprint. As new statutory rulings and amendments land, decide what to lift into the LE corpus and what to leave as retrieval-only surface. Not every provision needs to be certified — only the ones that show up in Helm workflows often enough to matter. The certification rate is the load-bearing telemetry: what fraction of retrieved statute is also certified in LE?

C — Manual authoring

The quality floor. Hand-authored LE rules across the load-bearing families — Division 7A, depreciation TR 2024/4, FBT, CGT small-business concessions. Each rule paired with a test file containing at least three scenarios that exercise the load-bearing edges. This is where the taste of the domain shows through — you can tell a hand-authored rule from a machine-derived one within thirty seconds of reading.

D — Machine-assisted authoring

The scaling answer, and the risk vector. Once we have hand-authored rules across enough statute families to teach the pattern, a proposer pipeline (LLM plus statute retrieval) can draft candidate LE rules from primary source. Every candidate goes through a disagreement detector: an independent cross-family model reads the same statute cold and produces its own draft. If the two drafts converge on the operative predicate, the candidate is worth a human ratifier’s time. If they diverge, the candidate halts.

The disagreement detector is the immune system, not the brain. It doesn’t decide what the law is. It detects when our model and another model diverge, and flags the divergence for a human. That distinction is load-bearing.

E — Test harness

Runs across A through D. Every LE rule has scenario tests. Every scenario test has an expected proof-tree witness. Every commit re-runs the full corpus. When we ship an update to the depreciation rules and a Division 7A test starts failing, we find out at commit time, not at close time.

The trust tiers

Not every LE rule earns the same trust. Four tiers, with different consequences downstream.

Tier	Provenance	Downstream treatment
Tier 1	Hand-authored, ratified by named domain expert (CA/CTA-grade) and by us	Certified silently in Helm output; proof tree counts as adjudicated
Tier 2	Hand-verified mechanical lift from an existing LE corpus	Certified with tier-2 disclaimer in the synthesis narrative
Tier 3	Machine-proposed, single-ratifier, ≥10 back-test scenarios consensus	Displayed with tier-3 “machine-derived; please review proof tree” affordance in the operator-approval UI; source statute chunks attached; explicit accountant acceptance required
Tier 4	Machine-proposed, unratified, draft only	Never used at runtime; visible only in the corpus repo as drafts pending review

The tier is not decoration. It travels with the rule into every downstream calculation and shows up in the operator’s approval interface. When a partner at a mid-tier firm reviews a client’s Division 7A analysis, the difference between “silently certified” and “machine-derived, please review” is the difference between signing and drilling deeper.

Where the review bounties fit

Tier 1 requires named domain-expert ratifiers. That is not a corpus problem; that is a people problem. Which is why the review bounties exist.

Every accepted bounty verdict is a Tier-1 anchor. A Chartered Accountant reads the Division 7A brief, works through the forensic questions, returns a structured ACCEPT / REJECT / FIX verdict, and gets named credit in the public reviewer registry. The verdict hash is bound to the specific calculator version the reviewer ratified. That binding travels with the engine into every downstream calculation the calculator drives.

Ten bounties, ten anchors, ten domains. A working reviewer registry that gives us the Tier-1 substrate we need to trust the corpus. The bounties are not a marketing exercise. They are how we build the ratifier layer of the LE trust-grade architecture.

Where TaxGenii fits

The obvious question, if you’ve been following: what about retrieval? What about TaxGenii, the ATO-legislation knowledge base?

The split is clean.

TaxGenii is the discoverability layer. Every statute, every ruling, every determination. Ask it a natural-language question, get relevant statutory chunks back with citations. Vector search over the corpus, augmented with graph queries over the citation network. Wide coverage. Probabilistic. Useful for scoping, education, drafting.

Logical English is the certification layer. Not everything gets certified. Only what shows up often enough in Helm workflows to justify the ratifier cost. Narrow coverage. Deterministic. Used for the calculator engines and the final report.

Both are needed. If Helm only had LE, it would fail closed on 90% of client questions because they touch statute we haven’t certified yet. If Helm only had TaxGenii, it would fail open on the load-bearing 10% that drives the client’s tax result. The interplay — retrieval discovers, LE certifies — is the shape of the reasoning layer.

Retrieval

TaxGenii

Wide coverage. Probabilistic. Every statute, every ruling. Used for scoping and education.

Certification

Logical English

Narrow coverage. Deterministic. Only the provisions Helm hits often. Used for calculators.

Where L402 fits

Every LE query has a computational weight. Clauses touched, branching factor, resolution depth. When an agent asks the engine to resolve a Division 7A question against a specific fact graph, the answer is worth exactly what it cost to compute. Not a subscription. Not a monthly seat. The specific work of resolving the specific proposition.

L402 is how we price and settle that. Sub-cent Lightning payments, macaroon-authenticated access, no minimum. The rule fires, the invoice settles, the caller gets the proof tree. When machines are talking to machines, this is the payment rail that matches the compute granularity.

It is not incidental that the same cryptographic ethos runs through both. The LE proof tree carries its own proof of what was resolved. The L402 pre-image carries its own proof of what was settled. The audit chain from statutory anchor to calculated result to financial settlement traces through the same primitives. Break that unity at any layer, and you’ve traded a proof for a promise.

The upstream, and the copyright

Logical English is an open-source project maintained by LogicalContracts. LodgeiT Pty Ltd is listed among the initial copyright holders on the LE 2.0 codebase, alongside the academic and commercial contributors who built the language and its toolchain. We are neither the sole steward nor a passive consumer — we are a stakeholder with skin in the ecosystem and an obligation to contribute back.

Our contributions to date have been focused on the Australian CGT cluster, the toolchain (a CLI-level improvement making the verify workflow agent-friendly), and the pattern library that shows how to structure LE rules against statute with proper witness-chain provenance. The strategic direction is:

Contribute upstream what generalises — toolchain, pattern conventions, cluster examples.
Keep in our own corpus what is jurisdiction-specific and load-bearing to our commercial engine — the specific AU rules that drive Helm workflows.
Certify against Tier 1 ratifiers everything we depend on in production, whether upstream or ours.

What that looks like today

Where we stand as of publication:

The AU CGT cluster is present in the upstream LE corpus and validates cleanly under the current toolchain.
The Division 7A minimum-yearly-repayment engine has a working calculator with statute-anchored logic, currently in the first cycle of Tier-1 ratifier review.
Depreciation TR 2024/4 effective-life logic is being lifted into LE across the review-bounty programme.
FBT (car statutory-vs-operating, car parking), hire-purchase Division 240, and CGT small-business concessions are all in the near-term LE authoring queue, gated by ratifier acquisition through the bounty programme.
The disagreement detector for Workstream D is designed but not yet running — that switches on when the Tier-1 base is thick enough to give the detector meaningful ground truth.

Try it now

Logical English on GitHub

The Logical English language and its toolchain are maintained openly. Read the source, run the verify command against the CGT cluster, or clone the pattern library and try authoring against a statute you know.

LodgeiT Pty Ltd is an initial copyright holder on the LE 2.0 codebase; our contributions live alongside those of the academic and commercial contributors who built the language.

📜 Open LogicalEnglish2 on GitHub Legacy LE (1.x) Become a ratifier →

New to LE? Start with the CGT cluster in moreExamples/. Each rule has a companion .le.tests file that exercises the load-bearing edges. Running swipl le_command.pl --command=verify <file>.le should print ALL GOOD :-).