15-08-2025
Lean is a very good platform for implementing small embedded DSLs, which is part of why I think it is quite a valuable tool for thinking about probabilistic programming. To illustrate this I'll implement and grow a small probabilistic programming language (PPL) called TinyPPL. This is vaguely based on the language that I developed and presented as part of my Oregon Programming Languages Summerschool (OPLSS) lecture series. All the code can be found in this repository. In this article I'll be discussing TinyPPL.lean
. We will develope a simple monadic syntax for TinyPPL, and use Lean's built in parser/elaborator to embed TinyPPL into Lean's syntax.
Ultimately, we'll be able to write probabilistic programs in Lean like:
tinyppl {
x ← flip 0.5;
pure (if true then x else false)
}
This syntax is elaborated into a TinyPPL abstract syntax tree that we can interpret or inspect as we like.
Syntax and semantics
The syntax of TinyPPL is in 2 parts, pure and probabilistic expressions:
inductive PureE
| True
| False
| Ident (s : String)
| Ite (g: PureE) (thn: PureE) (els: PureE)
deriving BEq
inductive ProbE
| Flip (p : Rat)
| Bind (s : String) (e1 : ProbE) (e2 : ProbE)
| Pure (e : PureE)
deriving BEq
This syntax is divided into two pieces: the pure piece, called PureE
, is the probability-free fragment of the language. The effectful (or probabilistic) component manipulates probabilities, and is called ProbE
.
We can write a simple definitional interpreter for pure expressions. This interpreter is run under the IO
monad, since it is possible to have expressions that reference free variables. There are many ways to design your syntax to statically eliminate ill-formed programs that reference free variables -- future posts may talk about that. But for now, we'll keep our types simple, and pay for it with a partial interpeter:
def Env := HashMap String Bool
def interp_pure (env : Env) (e : PureE) : IO Bool :=
match e with
| .True => pure true
| .False => pure false
| .Ident s =>
match HashMap.get? env s with
| some v => pure v
| none => throw $ IO.userError s!"unbound identifier"
| .Ite g thn els => do
let guard_v ← interp_pure env g;
if guard_v then (interp_pure env thn) else (interp_pure env els)
notation "〚" e "〛ₚ" rho => interp_pure rho e
Next we can interpret the probabilistic fragment:
def interp_prob (env : Env) (e : ProbE) (v : Bool) : IO Rat :=
match e with
| .Flip p => pure $ if v then p else 1 - p
| .Bind x e1 e2 => do
let e1t ← interp_prob env e1 true;
let e2t ← interp_prob (HashMap.insert env x true) e2 v;
let e2f ← interp_prob (HashMap.insert env x false) e2 v;
pure $ e1t * e2t + (1-e1t) * e2f
| .Pure e => do
let ve ← 〚 e 〛ₚ env;
pure $ if ve = v then 1 else 0
This is a standard probability monad. The expression interp_prob env e v
computes the probability that e
run in environment env
evaluates to the value v
. The only interesting part is the Bind
rule, which runs e2
twice: once with x=true
and once with x=false
. The results of these two runs are combined and weighted by the probability that e1
evaluates to true and false.
Parser
One of the things I think is really cool about Lean is how easy it is to embed languages. There are some things about it that I think can be improved: in particular, the error messages are quite uninformative. But, the process is very ergonomic otherwise and I've found it much easier to make embedded DSLs in Lean than in other languages.
First we declare a new syntactic category for TinyPPL programs:
open Lean Elab Meta
declare_syntax_cat tiny_ppl
syntax scientific : tiny_ppl
syntax "flip" tiny_ppl : tiny_ppl
syntax tiny_ppl "←" tiny_ppl ";" tiny_ppl : tiny_ppl
syntax "pure" tiny_ppl : tiny_ppl
syntax ident : tiny_ppl
syntax "(" tiny_ppl ")" : tiny_ppl
syntax "true" : tiny_ppl
syntax "false" : tiny_ppl
syntax "if" tiny_ppl "then" tiny_ppl "else" tiny_ppl : tiny_ppl
These declarations combine both our lexer and parser. For instance, it says that "flip 0.1" is tiny_ppl
syntax because it matches the "flip" scientific
production rule. The scientific
syntactic category refers to decimal or scientific notation literals like 0.1
.
Finally we can write our elaborator. The job of the elaborator is to translate the tiny_ppl
syntax category into suitable Lean expressions. This is a function of type Syntax → MetaM Expr
, where the MetaM
monad keeps track of some useful state for dealing with metaprogramming such as generating fresh names and dealing with hygiene, and Expr
is the type of Lean expressions. Here is our elaborator:
partial def elabTinyPPL : Syntax → MetaM Expr
| `(tiny_ppl| ( $e:tiny_ppl )) => elabTinyPPL e
| `(tiny_ppl| true) => mkAppM ``PureE.True #[]
| `(tiny_ppl| false) => mkAppM ``PureE.False #[]
| `(tiny_ppl| flip $n:scientific) => do
let (n1, b, n2) := n.getScientific;
let e1 ← mkAppM ``Rat.ofScientific #[mkNatLit n1, if b then .const ``Bool.true [] else .const ``Bool.false [], mkNatLit n2];
mkAppM ``ProbE.Flip #[ e1 ]
| `(tiny_ppl| $id:ident ← $e1:tiny_ppl; $e2:tiny_ppl) => do
let elab_e1 ← elabTinyPPL e1
let elab_e2 ← elabTinyPPL e2
mkAppM ``ProbE.Bind #[mkStrLit id.getId.toString, elab_e1, elab_e2]
| `(tiny_ppl| pure $e:tiny_ppl) => do
let elab_e ← elabTinyPPL e
mkAppM ``ProbE.Pure #[elab_e]
| `(tiny_ppl| if $g:tiny_ppl then $thn:tiny_ppl else $els:tiny_ppl) => do
let elab_g ← elabTinyPPL g
let elab_thn ← elabTinyPPL thn
let elab_els ← elabTinyPPL els
mkAppM ``PureE.Ite #[elab_g, elab_thn, elab_els]
| `(tiny_ppl| $id:ident) =>
mkAppM ``PureE.Ident #[mkStrLit id.getId.toString]
| _ => throwUnsupportedSyntax
elab "tinyppl" "{" p:tiny_ppl "}" : term => elabTinyPPL p
A few notes about this:
- For the most part this example closely followed the lean metaprogramming book.
- By far the trickiest part was getting floating point numbers to work. This took quite a bit of digging around in the Lean documentation and source code.
- The error messages that you get from this aren't great. It would be interesting to improve those.
Now we can test it a bit:
> #eval (interp_prob (HashMap.empty)
(tinyppl {
x ← flip 0.5;
pure (if true then x else false)
})
True)
(1 : Rat)/2
That's all for now...