ENGINEERING · MAR 24, 2026

110,227 Tests. Zero Failures. We Tried to Break Our Own Compiler.

Seven levels of adversarial testing. Every monomer, every backend, every language, every edge case. 110,227 attempts to find a single failure. We found none.

The Question We Asked Ourselves

When you build a compiler that transpiles code from 10 input languages to 14 output targets, that certifies programs mathematically, and that compiles itself, one question towers above everything else:

How do you know it actually works?

The industry standard answer is "we tested it pretty well." We looked at that answer and decided it was an embarrassment. "Pretty well" is not engineering. So we did something different.

What Are Abyssal Tests?

We call them abyssal tests because they go all the way down — to the absolute bottom of the system. These are not your typical integration tests that verify "the button works." These are tests designed to destroy our own compiler. Every atomic operation. Every value combination. Every backend. Every control flow pattern. We tried to break it. Systematically.

110,227 Tests Across 7 Categories

The tests span 7 categories: individual monomer operations, multi-family compositions, cross-target consistency, determinism verification, real execution with verified I/O, security and abuse resistance, and regression coverage. Every single test verifies a concrete, specific property. None of these are randomly generated. Each one exists because it targets a specific execution path that could fail — and we made sure it does not.

What We Tried to Break

Level 1: Individual Operations

Every monomer in the full catalog was tested with boundary values: 0, 1, 127, 128, 255, and every dangerous combination of them. ADD8(255, 1) must produce wrap-around: not a crash, not undefined behavior, wrap-around. DIV8(x, 0) must produce a controlled error, not a segfault. SHL(1, 7) must produce 128. No exceptions. No "it depends." No platform-specific behavior.
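To make the boundary checks concrete, here is a minimal sketch in Python. It assumes unsigned 8-bit semantics for ADD8, DIV8, and SHL. The function names, the MonomerError type, and the check loop are hypothetical stand-ins for illustration, not the actual BRIK64 implementation.

```python
# Hypothetical Python stand-ins for three monomers named in this post.
# Assumption: all three operate on unsigned 8-bit values (0..255).

class MonomerError(Exception):
    """Controlled error raised instead of crashing."""

def add8(a: int, b: int) -> int:
    # 8-bit addition with wrap-around: 255 + 1 wraps to 0.
    return (a + b) & 0xFF

def div8(a: int, b: int) -> int:
    # Division by zero is reported as a controlled error, never a crash.
    if b == 0:
        raise MonomerError("DIV8: division by zero")
    return (a // b) & 0xFF

def shl(a: int, n: int) -> int:
    # Left shift, truncated to 8 bits.
    return (a << n) & 0xFF

BOUNDARY = [0, 1, 127, 128, 255]

def check_boundaries() -> int:
    """Run every boundary pair through ADD8 and DIV8; return the check count."""
    checks = 0
    for a in BOUNDARY:
        for b in BOUNDARY:
            assert add8(a, b) == (a + b) % 256
            checks += 1
            if b == 0:
                try:
                    div8(a, b)
                except MonomerError:
                    pass  # expected: controlled error
                else:
                    raise AssertionError("DIV8 by zero must raise")
            else:
                assert div8(a, b) == a // b
            checks += 1
    assert shl(1, 7) == 128
    return checks + 1
```

Calling check_boundaries() walks all 25 boundary pairs for both binary operations plus the single SHL case.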

Level 2: Compositions

Here is where most compilers fall apart. An individual monomer can work perfectly and fail catastrophically when composed with another. We generated chains of 2, 3, 4, 5, and 6 operations mixing families: arithmetic with logic, logic with strings, strings with float, float with trigonometry. If ADD8 works and SIN works, does SIN(ADD8(1,2)) work?

Yes. In every single case. Every combination. Every permutation.
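A composition sweep of this kind can be sketched as follows. This is an illustration under stated assumptions: ADD8 and SIN are named in this post, while the other two operations (AND8_0F, FMUL_HALF), the family labels, the seed values, and the "result is a finite number" property are hypothetical choices made for the example. It enumerates every chain of 2 to 4 operations that mixes at least two families.

```python
import itertools
import math

def add8(a, b):
    # Assumed unsigned 8-bit addition with wrap-around.
    return (int(a) + int(b)) & 0xFF

# (family, name, unary function). Only ADD8 and SIN come from the post;
# the remaining entries are hypothetical placeholders.
OPS = [
    ("arithmetic", "ADD8_2", lambda x: add8(x, 2)),
    ("logic", "AND8_0F", lambda x: int(x) & 0x0F),
    ("float", "FMUL_HALF", lambda x: float(x) * 0.5),
    ("trig", "SIN", lambda x: math.sin(x)),
]

def mixed_chains(min_len=2, max_len=4):
    """Yield every chain of operations that mixes at least two families."""
    for n in range(min_len, max_len + 1):
        for chain in itertools.product(OPS, repeat=n):
            if len({family for family, _, _ in chain}) >= 2:
                yield chain

def run_chain(chain, seed):
    """Feed the seed through each operation in order."""
    value = seed
    for _, _, fn in chain:
        value = fn(value)
    return value

def check_all(seeds=(0, 1, 127, 128, 255)):
    """Evaluate every mixed chain on every seed; return the number of checks."""
    checked = 0
    for chain in mixed_chains():
        for seed in seeds:
            result = run_chain(chain, seed)
            assert isinstance(result, (int, float)) and math.isfinite(result)
            checked += 1
    return checked
```

The specific question from the text, SIN(ADD8(1,2)), corresponds to math.sin(add8(1, 2)) in this sketch.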

Level 3: Cross-Target

The same PCD program must produce correct code in JavaScript, Python, Rust, Go, C, C++, PHP, and Java. Each monomer generates idiomatic code in the target language — not a transliteration, but native semantics appropriate to that language. And here is the hard part: all backends must produce the same result for the same input. Identical outputs. Across languages.

2,864 tests verify this for monomer combinations alone. Every single one passes.
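The cross-target property can be illustrated with a small differential check. In this sketch the "backends" are three hypothetical Python functions that compute ADD8 in different idioms; they stand in for generated code in different target languages, which would normally need each language's toolchain to run. A deliberately divergent saturating variant is included to show what a mismatch looks like.

```python
import ctypes

# Hypothetical stand-ins for per-target generated code. Each computes
# 8-bit wrapping addition in a different idiom.
def add8_mask(a, b):
    return (a + b) & 0xFF

def add8_modulo(a, b):
    return (a + b) % 256

def add8_ctypes(a, b):
    # ctypes integer types truncate without overflow checking.
    return ctypes.c_uint8(a + b).value

def add8_saturating(a, b):
    # Deliberately wrong "backend": saturates instead of wrapping.
    return min(a + b, 255)

BACKENDS = {
    "mask": add8_mask,
    "modulo": add8_modulo,
    "ctypes": add8_ctypes,
}

def cross_target_mismatches(backends, inputs):
    """Return every input for which the backends disagree."""
    mismatches = []
    for args in inputs:
        results = {name: fn(*args) for name, fn in backends.items()}
        if len(set(results.values())) != 1:
            mismatches.append((args, results))
    return mismatches

ALL_PAIRS = [(a, b) for a in range(256) for b in range(256)]
```

With the three agreeing backends, cross_target_mismatches(BACKENDS, ALL_PAIRS) returns an empty list. Adding add8_saturating to the dictionary makes every overflowing pair show up as a mismatch.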

Level 4: Determinism

This is the most important property of BRIK64: the same input produces the same output. Always. Not "usually." Not "in most cases." Always. No garbage collection pausing between two runs. No JIT optimizing differently the second time. No scheduler reordering operations behind your back.

Every program is compiled twice. Hashes are compared. If they differ by a single bit, the test fails. 600 determinism tests. Zero failures.
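The compile-twice, compare-hashes check is simple to sketch. The two compile functions below are hypothetical toys, not the BRIK64 compiler: one is a pure function of its input, the other appends random bytes to simulate a build that embeds nondeterministic data. SHA-256 is an assumption; the post does not say which hash is used.

```python
import hashlib
import os

def toy_compile(source: str) -> bytes:
    # Hypothetical deterministic "compiler": normalizes whitespace only.
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    return "\n".join(lines).encode("utf-8")

def flaky_compile(source: str) -> bytes:
    # Hypothetical nondeterministic build: appends random bytes,
    # much like an embedded timestamp or build ID would.
    return toy_compile(source) + os.urandom(16)

def is_deterministic(compile_fn, source: str) -> bool:
    """Compile the same source twice and compare SHA-256 digests."""
    first = hashlib.sha256(compile_fn(source)).hexdigest()
    second = hashlib.sha256(compile_fn(source)).hexdigest()
    return first == second
```

Comparing digests rather than raw bytes keeps the test log small while still failing on a single differing bit.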

Level 5: Real Execution

Roughly the first 100,000 tests verify code generation: that the compiler produces valid, compilable code. The remaining 10,000 or so go further. They verify real execution: that the generated code, when actually run, produces the correct values. Not just valid syntax. Correct answers.

ADD8(1, 2) must not only generate code that compiles — it must produce 3 when executed. SIN(0) must produce 0.0. A loop that accumulates 10 times must produce exactly 10.

These tests execute the BIR (BRIK Intermediate Representation) with known input values and verify that the output is exactly what the mathematics predicts. Not approximately. Exactly.
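As an illustration of execution-level checks, here is a toy evaluator over a tuple-based expression tree. The tree format and the evaluate and accumulate helpers are hypothetical and far simpler than the real BIR; the three expected values are the ones stated above.

```python
import math

def evaluate(node):
    """Evaluate a toy expression tree such as ("SIN", ("ADD8", 1, 2))."""
    if isinstance(node, (int, float)):
        return node
    op, *args = node
    vals = [evaluate(arg) for arg in args]
    if op == "ADD8":
        # Assumed unsigned 8-bit addition with wrap-around.
        return (vals[0] + vals[1]) & 0xFF
    if op == "SIN":
        return math.sin(vals[0])
    raise ValueError(f"unknown operation: {op}")

def accumulate(times: int, step: int = 1) -> int:
    """A loop that adds `step` to an accumulator `times` times via ADD8."""
    total = 0
    for _ in range(times):
        total = evaluate(("ADD8", total, step))
    return total

# (program, expected value), compared with exact equality.
CASES = [
    (("ADD8", 1, 2), 3),
    (("SIN", 0), 0.0),
]

def run_cases():
    for program, expected in CASES:
        assert evaluate(program) == expected
    assert accumulate(10) == 10
    return len(CASES) + 1
```

The comparisons use == rather than a tolerance, matching the "exactly, not approximately" requirement for these particular inputs.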

Level 6: Security and Abuse

What happens when someone deliberately tries to attack the compiler? SQL injection in a PCD variable name. XSS in a string literal. Path traversal in a filesystem argument. Unicode homoglyphs designed to confuse the parser. We threw everything we could think of at it.

484 regression and security tests verify that the system rejects or correctly handles every single malicious case. The compiler is not just correct — it is hostile to attackers.
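Two of these attack classes can be sketched as input validators. The rules below are assumptions for illustration only (ASCII-only identifiers, paths confined to a sandbox root); the post does not describe the actual checks, and the function names are hypothetical.

```python
import posixpath
import re

# Assumed rule: identifiers are ASCII letters, digits, and underscores,
# and do not start with a digit. The explicit A-Z/a-z classes reject
# Unicode homoglyphs such as Cyrillic "а" (U+0430).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_safe_identifier(name: str) -> bool:
    return IDENT.fullmatch(name) is not None

def is_inside_root(root: str, user_path: str) -> bool:
    """True if user_path, resolved against root, stays inside root."""
    root = posixpath.normpath(root)
    resolved = posixpath.normpath(posixpath.join(root, user_path))
    return resolved == root or resolved.startswith(root + "/")

MALICIOUS_NAMES = [
    "x'; DROP TABLE users; --",    # SQL injection attempt
    "<script>alert(1)</script>",   # XSS attempt
    "p\u0430yload",                # Cyrillic homoglyph for "a"
]

MALICIOUS_PATHS = ["../../etc/passwd", "/etc/passwd", "data/../../secret"]
```

Every entry in MALICIOUS_NAMES fails is_safe_identifier, and every entry in MALICIOUS_PATHS fails is_inside_root("/sandbox", ...), while ordinary inputs such as "counter_1" and "data/in.txt" pass.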

Level 7: Regression

Every bug we found and fixed during development became a permanent, immortal test case. The array overflow that caused a segfault in ELF generation. The variable scoping in if blocks that did not propagate to the outer scope. The ENV function that did not exist as a monomer and returned garbage.

These bugs can never come back. Not tomorrow. Not next year. Not ever. Their tests are embedded in the artifact forever.

What We Did NOT Find

This is the part that matters most. After 110,227 deliberate, systematic attempts to break our own system:

0 failures in core operations. Every certified monomer, Φ_c = 1. The mathematical certification holds under adversarial conditions.

0 determinism failures. Same input, same output. Always.

0 uncontrolled crashes in the compilation pipeline.

0 cross-target inconsistencies. All backends produce the same result for the same input. Write once, run anywhere, and get the same answer everywhere.

Why This Is Possible

The secret is not that we are better testers than everyone else. It is that the operation space is finite. And that changes everything.

A conventional program has a virtually infinite state space: any combination of calls to any function with any argument. Exhaustively verifying a 1,000-line Python program is computationally impossible. Nobody will ever do it. It cannot be done.

A PCD program is composed of exactly 128 atomic operations. Each one has a known signature, a known domain, and a known range. You can verify every combination because the space is finite. This is not cleverness. This is architecture.

It is the same reason you can formally verify a digital circuit with 128 gates but you cannot formally verify a modern processor with a billion transistors. We made the deliberate architectural decision to keep the component space finite. And that decision is what makes exhaustive verification not just viable — but inevitable.
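To see what a finite domain buys you, consider a single binary 8-bit operation: it has only 256 × 256 = 65,536 possible input pairs, so it can be checked against its mathematical definition on every one of them. The sketch below does this for a hypothetical bitwise ADD8 implementation, assuming unsigned 8-bit wrap-around semantics.

```python
def add8_bitwise(a: int, b: int) -> int:
    # Hypothetical implementation under test: ripple-carry addition
    # using only bit operations, truncated to 8 bits.
    while b:
        carry = (a & b) << 1
        a = (a ^ b) & 0xFF
        b = carry & 0xFF
    return a

def add8_reference(a: int, b: int) -> int:
    # Mathematical definition: addition modulo 2^8.
    return (a + b) % 256

def exhaustive_check():
    """Compare implementation and reference on all 65,536 input pairs."""
    checked = 0
    mismatches = []
    for a in range(256):
        for b in range(256):
            if add8_bitwise(a, b) != add8_reference(a, b):
                mismatches.append((a, b))
            checked += 1
    return checked, mismatches
```

Running exhaustive_check() covers the whole input space of the operation, which is a stronger statement than any sample of boundary values.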

The Result

110,227 tests. 0 failures. This is not a marketing claim. It is not a rounded number. It is a verifiable fact. Every test is in the repository. Every one runs on every commit. Every one produces the same result today that it produced yesterday and will produce tomorrow and will produce a decade from now.

Because that is what "deterministic by construction" means. Not a promise. A mathematical property.

Run the Corpus

git clone https://github.com/brik64/brik64-demos.git
cd brik64-demos
./run_demo.sh adversarial-corpus

The abyssal tests cover the full monomer catalog, 14 backends, 10 input languages, control flow, multi-family compositions, determinism, real execution, security, and regression. The code and the tests are part of the same verifiable, immutable artifact. Run them yourself. The numbers do not change.