Type-checked non-empty strings

surprisetalk1 pts0 comments

Haskell koan: Type-checked non-empty strings

This post is a Haskell koan. We’ll get to the background and motivation, but the<br>goal here is to share a small and uncommon technique that we’ve employed, and<br>enjoyed; perfect blog fodder.

In short, we wrote a type-checked non-empty string constructor, replaced<br>thousands of equivalent TemplateHaskell calls, for a ~10% build-time<br>improvement in a large and data-heavy package that made many calls to it.

before, after, invalidBefore, invalidAfter :: NonEmptyText<br>before = $$(NonEmptyText.make "hello")<br>after = NonEmptyText.make "hello"<br>invalidBefore = $$(NonEmptyText.make "") -- ⇝ error during splice evaluation ...<br>invalidAfter = NonEmptyText.make "" -- ⇝ type error: expected a non-empty string

To make invalid states unrepresentable is a core design goal of the software<br>at Bellroy. With that in mind, a type we often employ for textual data – of<br>which we have a lot – is NonEmptyText. It is just what it sounds like: a<br>value of this type is a string with at least one character.

The technique is a result of a convergence of GHC features over the past 15<br>years or so. In particular, RequiredTypeArguments, introduced in GHC<br>9.10,<br>lets us pass a type-level string literal into a function as if it were a value.<br>We can throw a custom type error message like "Expected a non-empty string" if<br>we (at the type-level) see an empty string; like so:

type family IsNonEmptySymbol symbol :: Constraint where<br>IsNonEmptySymbol "" = Unsatisfiable (Text "Expected a non-empty string")<br>IsNonEmptySymbol _ = (()::Constraint) -- empty constraint is always satisfied

-- Note previous syntax without RequiredTypeArguments:<br>-- make :: forall symbol. IsNonEmptySymbol symbol => NonEmptyText<br>-- with usage like `make @"hello!"`

make :: forall symbol -> (IsNonEmptySymbol symbol) => NonEmptyText<br>make symbol = NonEmptyText (fromString (symbolVal (Proxy :: Proxy symbol)))

test :: NonEmptyText<br>test = make "hello!"

Which, with the right LANGUAGE incantations, does actually work. This requires<br>UndecidableInstances to work, which is not harmful in and of itself, but does<br>open up the possibilities of what can go wrong1.

Additionally, since IsNonEmptySymbol is a type family, it can’t directly be<br>used like an ordinary typeclass constraint – for instance, it can’t be packaged<br>into a Dict, or used with functions like Data.SOP.hcfoldMap; it’s not<br>something you can just “ask for an instance of” like an ordinary typeclass,<br>despite returning a Constraint like one.

The final step then to this trick is writing IsNonEmptySymbol as a typeclass:

class IsNonEmptySymbol symbol<br>instance {-# OVERLAPPING #-} Unsatisfiable (Text "Expected a non-empty string") => IsNonEmptySymbol ""<br>instance IsNonEmptySymbol a

-- make: same as above<br>make :: forall symbol -> (IsNonEmptySymbol symbol) => NonEmptyText<br>make symbol = NonEmptyText (fromString (symbolVal (Proxy :: Proxy symbol)))

When GHC resolves the IsNonEmptySymbol constraint given an empty string, it<br>finds both: _ => instance IsNonEmptySymbol "" and instance IsNonEmptySymbol a. If we omitted the OVERLAPPING pragma, that would be where GHC raises an<br>error; it wouldn’t know which instance to actually choose, and would complain<br>that there are overlapping possibilities – which is fine, since the only case<br>where there is an overlapping instance is in the case we want to forbid, when<br>the input is "".

So the effect of the OVERLAPPING pragma here is that GHC is does choose the<br>instance we “want”; the one with the custom type error. The instance then raises<br>our custom error message informing the user that a non-empty string was<br>expected.

Impact

Our internal bellroy-data package – containing data such as information about<br>known freight and shipping providers, accounting systems, product data, tax<br>codes, etc. – had thousands of TH splices like $$(NonEmptyText.makeTH _).<br>Moving to the RequiredTypeArgument approach shaved off around 10% of the<br>compilation time for that package.

Similar cases we can employ this trick

We can re-use (almost) the exact same code to validate that a given Natural is<br>positive to make a type-checked Positive constructor. In general we can use<br>this technique for any type we can define a type-level predicate for.

From here you might be able to imagine, for instance, type-level string parsing<br>to define for instance type-safe term syntax for constructing URIs: it will<br>work, but you’ll run into GHC’s default reduction limit of 20 quite quickly; an<br>O(n) type-level validator over a n-length string has a hard-cap of 20<br>“reduction” (i.e., parsing) steps and thus parsing cannot proceed beyond 20<br>characters – that is, also assuming your parsing steps themselves aren’t<br>counting towards that limit.

In general non-trivial algorithms are also quite awkward to express in type<br>families; you have no way to write let-bindings for instance, or do pattern<br>matching with a case-like syntax: if you need either, they must be expressed<br>with some combination of extra type...

type make isnonemptysymbol nonemptytext instance string

Related Articles