June 8, 2020

Writing functional DSLs for business domains

In functional programming, a domain-specific language (DSL) is a set of functions that can be composed to solve a specific problem. DSLs are often found in libraries, but you can also write your own DSL for your business domain. This can be beneficial for several reasons:

Testable — Each independent component is small and isolated;
Understandable — Composed solutions are easy to read;
Expressive — Solve an entire class of problems with a small set of primitives.

In this post we’ll build a DSL for filtering emails in Scala. When we’re done, we can compose any email filter using simple, orthogonal building blocks.

Anatomy of a functional DSL

A functional DSL consists of three components:

Types that describe solutions;
Constructors for those types, which build the simplest possible solutions;
Operators that compose or transform solutions.

With these components, we can construct a solution for a business domain. The solutions are encoded as pure data, which can then be evaluated to get a result.

Types

There should be a single type that describes a solution to a domain problem. In our case, that's the EmailFilter trait, which describes solutions to the domain problem of filtering emails. Use sealed traits, case classes and case objects to define it.
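Here is a minimal sketch of what such a type could look like. It assumes a simple email model. The Address and Email classes and the exact set of cases are hypothetical and are not taken from the post's full implementation.

```scala
// Hypothetical domain types, assumed for this sketch.
final case class Address(value: String)
final case class Email(sender: Address, recipients: Set[Address], subject: String, body: String)

// The single type that describes every email filter.
// Each case is plain, immutable data: nothing is evaluated when a filter is built.
sealed trait EmailFilter

object EmailFilter {
  case object Always extends EmailFilter                                          // matches every email
  final case class Not(filter: EmailFilter) extends EmailFilter                   // inverts a filter
  final case class And(left: EmailFilter, right: EmailFilter) extends EmailFilter // both must match
  final case class SenderIs(address: Address) extends EmailFilter
  final case class RecipientIs(address: Address) extends EmailFilter
  final case class BodyContains(phrase: String) extends EmailFilter
}
```

Because every case is a case class or case object, two filters built from the same parts are structurally equal. This makes them easy to test and to inspect later.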

Constructors

The constructors for this type create the simplest possible solutions, for example bodyContains or recipientIn. A constructor that builds a single case class is called a primitive constructor. Derived constructors, like senderIsNot, are built from a combination of other constructors and operators.
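Here is a sketch of how these constructors might look, using a pared-down EmailFilter. In this sketch a Not case stands behind the negate operator, and recipientIn is modelled as its own case class. The case-class names SenderIs, RecipientIn and BodyContains are hypothetical.

```scala
final case class Address(value: String)

sealed trait EmailFilter {
  // Primitive operator, included so the derived constructor below can use it.
  def negate: EmailFilter = EmailFilter.Not(this)
}

object EmailFilter {
  final case class Not(filter: EmailFilter) extends EmailFilter
  final case class SenderIs(address: Address) extends EmailFilter
  final case class RecipientIn(addresses: Set[Address]) extends EmailFilter
  final case class BodyContains(phrase: String) extends EmailFilter

  // Primitive constructors: each builds exactly one case class.
  def senderIs(address: Address): EmailFilter = SenderIs(address)
  def recipientIn(addresses: Set[Address]): EmailFilter = RecipientIn(addresses)
  def bodyContains(phrase: String): EmailFilter = BodyContains(phrase)

  // Derived constructor: a primitive constructor combined with an operator.
  def senderIsNot(address: Address): EmailFilter = senderIs(address).negate
}
```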

Operators

Operators combine and transform these data structures, so we can snap together more complex solutions like Lego bricks. Like constructors, operators can be either primitive, like negate, or derived, like ||.
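Here is one way negate, && and || could fit together. The case-class names are hypothetical, and treating && as a primitive is an assumption of this sketch. negate and && each build one case class. || is derived from them using De Morgan's law.

```scala
sealed trait EmailFilter { self =>
  // Primitive operators: each wraps its operands in a single case class.
  def negate: EmailFilter = EmailFilter.Not(self)
  def &&(that: EmailFilter): EmailFilter = EmailFilter.And(self, that)

  // Derived operator, via De Morgan's law: a || b == !(!a && !b).
  def ||(that: EmailFilter): EmailFilter = (self.negate && that.negate).negate
}

object EmailFilter {
  final case class Not(filter: EmailFilter) extends EmailFilter
  final case class And(left: EmailFilter, right: EmailFilter) extends EmailFilter
  final case class BodyContains(phrase: String) extends EmailFilter
}
```

Because || is expressed in terms of the other two operators, an interpreter only ever needs cases for Not and And.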

Design principles

There are many ways to factor a DSL, but some are better than others. These guiding principles help us arrive at a good design. Our components should be:

Composable — to build complex solutions using simple components;
Orthogonal — such that there’s no overlap in capabilities between primitives (i.e. MECE or the single-responsibility principle);
Minimal — in terms of the number of primitives.

As always, it takes iteration and refinement to converge to a clean DSL. For example, consider List's flatMap, flatten, and map functions. We could implement flatMap and derive the other two.
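Here is a sketch of the first option. To keep it short, it uses standalone functions over Scala's built-in List rather than methods (an assumption of this sketch). flatMap is the only primitive, implemented with foldRight. map and flatten are derived from it.

```scala
// flatMap as the only primitive, written with foldRight to preserve order.
def flatMap[A, B](as: List[A])(f: A => List[B]): List[B] =
  as.foldRight(List.empty[B])((a, acc) => f(a) ++ acc)

// map, derived: wrap each result in a singleton list, then flatMap.
def map[A, B](as: List[A])(f: A => B): List[B] =
  flatMap(as)(a => List(f(a)))

// flatten, derived: flatMap with the identity function.
def flatten[A](ass: List[List[A]]): List[A] =
  flatMap(ass)(inner => inner)
```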

Or we could implement flatten and map, and derive flatMap.
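Here is the second option, under the same assumptions (standalone functions over the built-in List). map and flatten are the primitives, and flatMap is simply map followed by flatten.

```scala
// map as a primitive: apply f to every element.
def map[A, B](as: List[A])(f: A => B): List[B] =
  as.foldRight(List.empty[B])((a, acc) => f(a) :: acc)

// flatten as a primitive: concatenate the inner lists in order.
def flatten[A](ass: List[List[A]]): List[A] =
  ass.foldRight(List.empty[A])((inner, acc) => inner ++ acc)

// flatMap, derived: map first, then flatten the nested result.
def flatMap[A, B](as: List[A])(f: A => List[B]): List[B] =
  flatten(map(as)(f))
```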

So which one is better? Both are equally composable, because they expose the same operations. The first approach is more minimal, because it has only one primitive. The second is more orthogonal: flatMap can be split into the smaller operations map and flatten, but neither of those can be split any further. When deciding between minimalism and orthogonality, go with orthogonality, because it gives the simpler design.

Evaluating the DSL

So far we’ve only defined a way to build a data structure. There are two approaches for evaluating it: final and initial encoding.

Final encoding

This approach embeds the evaluation code in the data structure itself as it’s constructed. You can think of final encoding as describing a process of steps that should be executed. The resulting data structure will be executable.
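Here is a minimal final-encoding sketch, assuming hypothetical Email and Address types. The filter is a thin wrapper around an Email => Boolean function, so every constructor and operator builds evaluation code directly.

```scala
final case class Address(value: String)
final case class Email(sender: Address, recipients: Set[Address], body: String)

// Final encoding: a filter *is* its evaluation function.
final case class EmailFilter(run: Email => Boolean) { self =>
  def negate: EmailFilter = EmailFilter(email => !self.run(email))
  def &&(that: EmailFilter): EmailFilter =
    EmailFilter(email => self.run(email) && that.run(email))
  // Still derived from negate and &&, exactly as in the data-based version.
  def ||(that: EmailFilter): EmailFilter = (self.negate && that.negate).negate
}

object EmailFilter {
  def senderIs(address: Address): EmailFilter = EmailFilter(_.sender == address)
  def senderIsNot(address: Address): EmailFilter = senderIs(address).negate
  def recipientIn(addresses: Set[Address]): EmailFilter =
    EmailFilter(email => email.recipients.exists(addresses.contains))
  def bodyContains(phrase: String): EmailFilter = EmailFilter(_.body.contains(phrase))
}
```

Once built, such a filter is just a closure. Nothing about it can be inspected; the only thing you can do with it is run it.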

Final encoding can be more straightforward to implement, and it makes it easy to wrap existing code, such as libraries that you might not be able to change.
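To illustrate wrapping existing code, the sketch below delegates to the JDK's java.util.regex.Pattern, a library we can't modify. The bodyMatches and subjectMatches constructors and the Email shape are hypothetical; only the Pattern calls are real JDK API.

```scala
import java.util.regex.Pattern

final case class Email(subject: String, body: String)

final case class EmailFilter(run: Email => Boolean) { self =>
  def &&(that: EmailFilter): EmailFilter = EmailFilter(e => self.run(e) && that.run(e))
}

object EmailFilter {
  // The constructor simply delegates to the existing regex engine.
  def bodyMatches(regex: String): EmailFilter = {
    val pattern = Pattern.compile(regex) // compiled once, reused for every email
    EmailFilter(email => pattern.matcher(email.body).find())
  }

  // Same idea, passing a library flag through unchanged.
  def subjectMatches(regex: String): EmailFilter = {
    val pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
    EmailFilter(email => pattern.matcher(email.subject).find())
  }
}
```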

Initial encoding

Initial encoding completely separates evaluation from the data. One or more interpreters traverse the data structure. A run function, for example, evaluates whether a given email matches the filter. We could also define interpreters that generate a human-readable string for the filter, persist it in a database, or simplify and optimize it. None of this can be done with final encoding, because functions are opaque: they can't be inspected.

Simplified for readability. There’s a link to the full implementation below.
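The post's own run function isn't reproduced here. The following is a minimal sketch of the idea, assuming hypothetical case classes and helper names. It defines three interpreters over the same data:

- run evaluates a filter against an email.
- describe renders the filter as text.
- simplify rewrites the filter by removing double negations.

```scala
final case class Address(value: String)
final case class Email(sender: Address, body: String)

// Initial encoding: filters are pure data with no behaviour attached.
sealed trait EmailFilter { self =>
  def negate: EmailFilter = EmailFilter.Not(self)
  def &&(that: EmailFilter): EmailFilter = EmailFilter.And(self, that)
  def ||(that: EmailFilter): EmailFilter = (self.negate && that.negate).negate
}

object EmailFilter {
  final case class Not(filter: EmailFilter) extends EmailFilter
  final case class And(left: EmailFilter, right: EmailFilter) extends EmailFilter
  final case class SenderIs(address: Address) extends EmailFilter
  final case class BodyContains(phrase: String) extends EmailFilter

  def senderIs(address: Address): EmailFilter = SenderIs(address)
  def bodyContains(phrase: String): EmailFilter = BodyContains(phrase)

  // Interpreter 1: does the email match the filter?
  def run(filter: EmailFilter, email: Email): Boolean = filter match {
    case Not(f)            => !run(f, email)
    case And(l, r)         => run(l, email) && run(r, email)
    case SenderIs(address) => email.sender == address
    case BodyContains(p)   => email.body.contains(p)
  }

  // Interpreter 2: render the same data as a human-readable string.
  def describe(filter: EmailFilter): String = filter match {
    case Not(f)            => s"not (${describe(f)})"
    case And(l, r)         => s"(${describe(l)}) and (${describe(r)})"
    case SenderIs(address) => s"sender is ${address.value}"
    case BodyContains(p)   => s"body contains '$p'"
  }

  // Interpreter 3: a rewrite that removes double negations.
  def simplify(filter: EmailFilter): EmailFilter = filter match {
    case Not(Not(f)) => simplify(f)
    case Not(f)      => Not(simplify(f))
    case And(l, r)   => And(simplify(l), simplify(r))
    case other       => other
  }
}
```

Adding another interpreter, say one that serializes filters for a database, means writing one more function. The data types and the existing interpreters stay untouched.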

Because evaluation is separated from the data, initial encoding is simpler to reason about and more flexible than final encoding. So even though it involves more boilerplate code, initial encoding is usually preferred in greenfield scenarios.

Putting it all together

And that’s all we need in order to write a DSL for a specific business domain: constructors for types that describe solutions, operators to compose solutions, and a way to evaluate the solutions.

You can read through the full email filtering example for both initial encoding and final encoding.

Thanks to John de Goes whose functional design workshop was the inspiration for this post.