manu·martínez-almeida

Systems

NoiseLang: A Language Where Every Value Is a Distribution

Manu Martínez-Almeida

In a signals-and-noise course during my telecommunications degree, I spent a lot of evenings writing probability by hand. Expectations, variances, the odds of two random variables landing in some region. The math was clean on paper and inert on the page. I kept wishing I could type it the way I wrote it in my notebook and have a machine run it.

That wish became NoiseLang. I started it about nine years ago, never finished it, and recently brought it back as something far more ambitious than I could have built alone the first time.

Everything is a distribution

The whole language hangs on one idea. Every value is a probability distribution. A plain number is the degenerate case, a point mass with all its weight on a single value, and a random variable is the general case. Constants and random variables are the same kind of object, and every operator maps distributions to distributions.

Two binders keep the randomness honest. X ~ unif_int(1, 6) is a stochastic node that draws a fresh random variable, and ~ is the only thing in the language that ever draws. Y = X + 3 is a deterministic node, a transform that adds no new randomness.

A name is one fixed node, the same way X is one X across a page of math. So X + X is 2X and X - X is exactly 0, with no hidden re-draw. Independence comes from separate draws, the way you’d write X₁..Xₙ iid on paper.

X ~ unif_int(1, 6)
Y ~ unif_int(1, 6)
X + Y                 # two independent dice, a real 2d6 distribution
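The identity rule can be sketched outside NoiseLang in a few lines of Python. This is an illustration only, not how the engine is built: `Draw`, `Apply`, and `sample` are hypothetical names, and the per-draw cache stands in for whatever mechanism keeps one name bound to one node.

```python
import random
from collections import Counter

class Draw:
    """Stochastic node: one fresh random variable (the role `~` plays)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

class Apply:
    """Deterministic node: a transform of existing nodes, no new randomness."""
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args

def sample(node, rng, cache):
    """Evaluate one joint draw; each node is realized at most once per draw."""
    if isinstance(node, (int, float)):
        return node  # a plain number is a point mass
    key = id(node)
    if key not in cache:
        if isinstance(node, Draw):
            cache[key] = rng.randint(node.lo, node.hi)
        else:
            cache[key] = node.fn(*(sample(a, rng, cache) for a in node.args))
    return cache[key]

rng = random.Random(0)
X = Draw(1, 6)
Y = Draw(1, 6)
same = Apply(lambda a, b: a - b, X, X)      # X - X: one node used twice
two_dice = Apply(lambda a, b: a + b, X, Y)  # X + Y: two separate draws

# A fresh cache per call means a fresh joint draw of every node.
diffs = {sample(same, rng, {}) for _ in range(10_000)}
sums = Counter(sample(two_dice, rng, {}) for _ in range(10_000))
print(diffs)         # {0}: X is never re-drawn within one evaluation
print(sorted(sums))  # the totals that showed up for the two dice
```

The cache is the whole trick: within one draw, asking for X twice returns the same realized value, while X and Y are distinct nodes and so stay independent.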

Nothing samples until a query forces it. P(event) runs the expression over millions of draws and returns an estimate with a standard error attached, while everything upstream stays symbolic until that moment.

Bday = unif_int(1, 365)
days ~[23] Bday          # 23 independent draws
P(has_duplicates(days))  # the birthday paradox, about 0.507
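For readers who want to check that figure by hand, here is a rough Python equivalent of the query. It is a sketch under assumptions: `estimate` and `birthday_event` are made-up helpers, the standard error uses the usual binomial formula sqrt(p(1 - p)/n), and the sample count is far smaller than what the engine would run.

```python
import math
import random

def has_duplicates(values):
    return len(set(values)) < len(values)

def estimate(event, n, rng):
    """Monte Carlo estimate of P(event) together with its standard error."""
    hits = sum(event(rng) for _ in range(n))
    p = hits / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se

def birthday_event(rng, people=23, days=365):
    # One trial: 23 independent birthdays, then check for a repeat.
    return has_duplicates([rng.randint(1, days) for _ in range(people)])

# Closed form for comparison: 1 - (365/365)(364/365)...(343/365)
exact = 1.0 - math.prod((365 - i) / 365 for i in range(23))

p, se = estimate(birthday_event, 20_000, random.Random(1))
print(f"estimate {p:.4f} +/- {se:.4f}, exact {exact:.4f}")
```

The standard error shrinks as 1/sqrt(n), which is why a few million draws are needed before the third decimal place can be trusted.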

Why it sat for nine years

The design was the easy part. A parser and a tree-walking interpreter for this is a weekend. The version I wanted was the problem.

Every interesting Noise program ends in the same place, evaluating an expression over a few million random draws. A toy interpreter walks the syntax tree once per draw, which is pointer-chasing death. The version worth having compiles that expression into a fused machine-code kernel, hides the RNG latency, and fans out across every core, in the browser too.

That’s a compiler, a JIT, a WASM backend, and a pile of careful numerical code. For a nights-and-weekends project, it stayed permanently out of reach.

Building the ambitious version with an agent

What changed is how I build now. I used an AI agent, working from a written spec and a suite of golden tests, to implement the parts that used to make this project too big for one person: the Cranelift JIT, the WASM emitter, the inlined RNG and transcendental approximations, the multi-core reduction. I set the architecture and the correctness bar, and the agent did the volume of meticulous work that nine-years-ago me never had the evenings for.

It’s the clearest example I have of a side project becoming feasible because the cost of careful implementation dropped. The same shift drives my day job on applied AI, and it’s strange and wonderful to feel it land on a hobby from college.

One IR, three backends

Under the hood, ~ and the distribution constructors build an append-only DAG called the RvGraph, with structural sharing so X + X is one draw of X. That graph is the single source of truth, and three backends lower it: an interpreter, the Cranelift JIT for native code, and the WASM emitter for the browser.

A shared module holds the one definition of what the graph means, so the two code generators stay thin and can’t drift apart. Every path falls back to the interpreter for anything it can’t profitably compile, and the results stay bit-identical across backends and core counts.
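One way to picture an append-only graph with structural sharing is the toy below. It borrows the RvGraph name from the text, but everything inside is an assumption for illustration: the tuple encoding of nodes, the `draw`, `const`, and `op` methods, and the interning table are not taken from the real code.

```python
class RvGraph:
    """Toy append-only node list with structural sharing for deterministic nodes."""

    def __init__(self):
        self.nodes = []      # index = node id; entries are never mutated or removed
        self._interned = {}  # node key -> node id, deterministic nodes only

    def draw(self, dist, *params):
        # Every draw appends a new node, so two draws are never merged,
        # even when the distribution and parameters match.
        self.nodes.append(("draw", dist, params))
        return len(self.nodes) - 1

    def const(self, value):
        return self._intern(("const", value))

    def op(self, name, *args):
        return self._intern((name, args))

    def _intern(self, key):
        # Same operator on the same inputs is the same node.
        if key not in self._interned:
            self.nodes.append(key)
            self._interned[key] = len(self.nodes) - 1
        return self._interned[key]

g = RvGraph()
x = g.draw("unif_int", 1, 6)
y = g.draw("unif_int", 1, 6)
a = g.op("add", x, x)
b = g.op("add", x, x)  # same structure, so the same node id comes back
c = g.op("add", x, y)
print(x != y, a == b, a != c, len(g.nodes))  # True True True 4
```

Because node ids are just positions in the list, any backend can walk the same list in order and see every input before the node that uses it.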

Making the Monte-Carlo loop cheap

The whole performance story is a single loop, made cheap without giving up determinism. A handful of techniques carry most of it.

Kernel fusion keeps every intermediate in registers and writes only the final result, so an arithmetic-heavy expression stops touching memory. The PRNG (xoshiro256++) compiles straight into the kernel, and the transcendentals behind normal and friends (ln, sin, and cos) become inline polynomial approximations accurate to about 1e-9, which roughly doubled the throughput of a transcendental-bound kernel.
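For reference, xoshiro256++ is a small public algorithm (by Blackman and Vigna), and its step function is short enough to see why it inlines well. The Python transcription below is only a readable sketch: the class and method names are mine, and seeding through splitmix64 is a common convention that I am assuming here, not something confirmed about NoiseLang.

```python
MASK = (1 << 64) - 1  # emulate 64-bit wrapping arithmetic

def rotl(x, k):
    return ((x << k) | (x >> (64 - k))) & MASK

def splitmix64(seed):
    """Expands one seed into a sequence of well-mixed 64-bit words."""
    state = seed & MASK
    while True:
        state = (state + 0x9E3779B97F4A7C15) & MASK
        z = state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
        yield z ^ (z >> 31)

class Xoshiro256pp:
    def __init__(self, seed):
        sm = splitmix64(seed)
        self.s = [next(sm) for _ in range(4)]  # four 64-bit state words

    def next_u64(self):
        s = self.s
        result = (rotl((s[0] + s[3]) & MASK, 23) + s[0]) & MASK
        t = (s[1] << 17) & MASK
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

    def next_f64(self):
        # Keep the top 53 bits to get a uniform float in [0, 1).
        return (self.next_u64() >> 11) * (1.0 / (1 << 53))

rng = Xoshiro256pp(42)
xs = [rng.next_f64() for _ in range(10_000)]
print(sum(xs) / len(xs))  # should sit near 0.5
```

Each call reads and rewrites all four state words, so the next output cannot start until the previous step has finished.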

The RNG is a serial dependency chain, so the kernel runs four independent streams at once and lets the out-of-order core overlap them. That latency-hiding trick beat a hand-written SIMD kernel. Columnar batches, a vectorized power-sum reduction, and a fan-out across every core with a deterministic merge do the rest.
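The deterministic merge is the easiest of these to show in miniature. The sketch below assumes one particular scheme, which may not match the engine: work is split into fixed-size chunks, each chunk seeds its own stream from its index, and the power sums are folded in chunk order. Python's `random` stands in for the in-kernel generator, and the workers are simulated with a plain loop.

```python
import random

CHUNK = 2_000  # samples per chunk; fixed, independent of worker count

def chunk_sums(chunk_index, base_seed=2024):
    """Power sums (n, sum of x, sum of x^2) for one chunk of 2d6 rolls."""
    # The seed depends on the chunk index alone, never on which worker runs it.
    rng = random.Random(base_seed * 1_000_003 + chunk_index)
    s1 = 0.0
    s2 = 0.0
    for _ in range(CHUNK):
        x = rng.randint(1, 6) + rng.randint(1, 6)
        s1 += x
        s2 += x * x
    return CHUNK, s1, s2

def run(num_chunks, num_workers):
    # Simulated fan-out: worker w handles chunks w, w + W, w + 2W, ...
    results = {}
    for w in range(num_workers):
        for c in range(w, num_chunks, num_workers):
            results[c] = chunk_sums(c)
    # Deterministic merge: always fold in chunk order, whoever produced them.
    n = 0
    s1 = 0.0
    s2 = 0.0
    for c in range(num_chunks):
        cn, cs1, cs2 = results[c]
        n += cn
        s1 += cs1
        s2 += cs2
    mean = s1 / n
    var = s2 / n - mean * mean
    return mean, var

print(run(16, 1) == run(16, 4) == run(16, 7))  # True
```

Mean and variance fall out of the first two power sums, so each worker only has to hand back three numbers per chunk, and the fixed fold order keeps the floating-point result identical across core counts.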

The measured numbers, on a 14-core M4 Pro: a one-line P(...) sustains around 5.8 billion samples per second and scales about 9.6× from one core to all of them. Per core, the generated kernel runs within about 1.15× of hand-written, LLVM-compiled Rust. The same fused loop, emitted as WASM, runs at roughly half to three-quarters of native speed inside V8, client-side.

Figure 1. A field of Monte Carlo samples settling under a density curve, drawn live by an inline GLSL shader embedded in this article.

The point

What I care about is the gap between what you type and what you get. You write a one-line estimate that reads like the math from my old notebook, and the engine hands back a fused, multi-stream, multi-core kernel you never asked for by name.

NoiseLang is playable in the browser at noiselang.com. Open it, type X ~ unif(-1, 1); Y ~ unif(-1, 1); 4 * P(X^2 + Y^2 < 1), and watch a few million draws estimate pi from a sandboxed tab. If you have an old side project that always needed more hands than you had, this is a good year to test whether that’s still true.
