Java Pattern: Algebraic Data Types
Posted on 07 May 2020Contents
- Prerequisites
- Introduction: what are ADTs?
- Introduction: a use case
- Introduction: why, though?
- Breakdown: the syntax
- Intermezzo: other programming languages
- Variant: ADTs with many cases
- Variant: ADTs with common fields
- Variant: “better” enums
- Variant: handling only specific cases
- Implementation: our own Optional type
- Case study: recursion
- Esoteric: safe state machine
- Esoteric: data types without data classes
- Coda: ADTs are here to stay
- Resources
Prerequisites
-
Familiarize yourself with the Lombok library. In particular, its Value feature to build immutable data types.
-
Assume that
null
is not admissible for any field, parameter, etc. unless it is marked with@Nullable
.
Introduction: what are ADTs?
Algebraic Data Types (ADTs) are product types and sum types, usually in combination.
Product types
“Product type” is just a fancy name for struct or record.
Here’s a product type in Java:
@Value
public class TextStyle {
boolean bold;
Font font;
}
// where
public enum Font { SERIF, SANS_SERIF, MONOSPACE }
This is called a product type because the set of all possible values is the Carthesian product of the possible values of its components. For example:
false | true | |
---|---|---|
Font.SERIF | new TextStyle(false, Font.SERIF) | new TextStyle(true, Font.SERIF) |
Font.SANS_SERIF | new TextStyle(false, Font.SANS_SERIF) | new TextStyle(true, Font.SANS_SERIF) |
Font.MONOSPACE | new TextStyle(false, Font.MONOSPACE) | new TextStyle(true, Font.MONOSPACE) |
Barring null
, those are all the possible ways to construct a TextStyle
object.
In abstract syntax: TextStyle = Boolean ⨯ Font
.
Sum types
“Sum type” is just a fancy name for… 🤷🏻♂️. Different languages call (and implement) this construct differently, so let’s leave it at that.
Here’s a sum type in Java:
public abstract class Background {
private Background() {}
@Value
public static class Transparent extends Background {}
@Value
public static class SolidColor extends Background {
Color color;
}
@Value
public static class Image extends Background {
ImageResource image;
Alignment alignment;
}
}
// where
public enum Color { BLACK, WHITE }
public enum ImageResource { DOG, CAT }
public enum Alignment { TOP, BOTTOM }
Notice:
- The
Background
class isabstract
. This means we can’t donew Background()
. - The constructor for
Background
isprivate
. This means we are in complete control of the classes that may extendBackground
: only classes defined withinBackground
may extend it.
☝🏻 Terminology: each subtype is called a case.
The set of all possibly values for a sum type is the disjoint union of its cases. For example, these are the possible ways to construct a Background
object:
// only 1 way to construct Transparent, because it has no fields
new Background.Transparent();
// there are 2 ways to construct a SolidColor, because it has a Color field and it has 2 possible values, BLACK and WHITE
new Background.SolidColor(Color.BLACK);
new Background.SolidColor(Color.WHITE);
// there are 4 ways to construct an Image, because it has an ImageResource field (2 values) and an Alignment field (also 2 values)
new Background.Image(ImageResource.DOG, Alignment.TOP);
new Background.Image(ImageResource.DOG, Alignment.BOTTOM);
new Background.Image(ImageResource.CAT, Alignment.TOP);
new Background.Image(ImageResource.CAT, Alignment.BOTTOM);
// so in total there are 1 + 2 + 4 = 7 ways to build a Background
In abstract syntax: Background = Transparent + SolidColor + Image
.
☝🏻 You can read the +
as “or”. That is: “a Background
is either Transparent
or a SolidColor
or an Image
.”
(Notice that the Image
case is itself a product type.)
Introduction: a use case
Let’s say we are writing a backend application that returns a paginated list of transactions:
@Value
class Page {
List<Transaction> transactions;
}
To implement pagination, we return an opaque cursor to the client:
@Value
class Page {
List<Transaction> transactions;
String cursor;
}
This cursor can be passed in by the client to fetch the next page:
// assume this is the RPC interface of our service
class TransactionFeedService {
Page getTransactions(String cursor);
}
Now, we already have two issues:
- How will clients fetch the first page of transactions, if the
String cursor
is a required input? - How can we signal the client that it has reached the end of the feed?
@Nullable
to the rescue!
@Value
class Page {
List<Transaction> transactions;
/**
* `null` indicates the end of the feed.
*/
@Nullable String cursor;
}
class TransactionFeedService {
/**
* When `cursor == null`, the first page of the feed is fetched.
*/
Page getTransactions(@Nullable String cursor);
}
Great (:
We have extended the range of admissible values of our input and output types to include null
. To clarify the semantics of null
in our domain, we made use of docstrings.
This presents the following issues:
- We must now check for
null
s. And if we forget to, we’ll encounterNullPointerException
s. (Granted: some IDEs or static analysis may be able to mitigate this issue.) - We rely on docstrings to give semantics to our domain.
Enter ADTs!
Let’s start off with a dedicated type for cursors:
public abstract class Cursor {
}
Now, let’s think about the types of cursor we need.
- A cursor that corresponds to the start of the feed.
- A cursor that contains an opaque value that can be used to fetch the next page in the feed.
- A cursor that corresponds to the end of the feed.
Let’s write that down:
public abstract class Cursor {
private Cursor() {}
@Value
public static class Start extends Cursor {}
@Value
public static class Next extends Cursor {
String value;
}
@Value
public static class End extends Cursor {}
}
Indeed, new Cursor.Start()
is-a Cursor
, as are new Cursor.Next("magic value in here")
and new Cursor.End()
.
Let’s update our service interface:
@Value
class Page {
List<Transaction> transactions;
Cursor cursor;
}
class TransactionFeedService {
Page getTransactions(Cursor cursor);
}
Perfect. We got rid of the null
state and the docstrings. Is that it?
Well, yes and no.
Within our service code, we will end up writing something like:
List<Transaction> transactions;
if (cursor instanceof Cursor.Next) {
transactions = fetchWithCursor(((Cursor.Next) cursor).getValue());
} else if (cursor instanceof Cursor.Start) {
transactions = fetchFirstPage();
} else if (cursor instanceof Cursor.End) {
throw new IllegalArgumentException("cannot request end of feed");
} else {
// but... we know there are no other cursor types
// serves as a fallback in case we add a new cursor type and forget to handle it
throw new IllegalStateException("unknown cursor type");
}
This is (arguably) a bit clunky. Also: imagine that we added a new cursor type. It would be easy to forget the if-check for this new cursor type. (Perhaps in this contrived example this seems unlikely, but consider a data type with more alternatives.)
Enter the visitor pattern!
public abstract class Cursor {
private Cursor() {}
public abstract void accept(Visitor visitor);
public interface Visitor {
void visit(Start cursor);
void visit(Next cursor);
void visit(End cursor);
}
@Value
public static class Start extends Cursor {
@Override
public void accept(Visitor visitor) {
visitor.visit(this);
}
}
@Value
public static class Next extends Cursor {
String value;
@Override
public void accept(Visitor visitor) {
visitor.visit(this);
}
}
@Value
public static class End extends Cursor {
@Override
public void accept(Visitor visitor) {
visitor.visit(this);
}
}
}
Now, we can write:
List<Transaction> transactions = ???;
cursor.accept(new Cursor.Visitor() {
@Override
public void visit(Cursor.Start cursor) {
fetchFirstPage();
}
@Override
public void visit(Cursor.Next cursor) {
fetchWithCursor(cursor.getValue()); // no cast !
}
@Override
public void visit(Cursor.End cursor) {
throw new IllegalArgumentException("cannot request end of feed");
}
});
Hmm. Hold on. How do we get the return values from fetchFirstPage
and fetchWithCursor
?
What we really need here is what I call the modified visitor pattern, which allows for a generic return value.
Take a look:
public abstract class Cursor {
private Cursor() {}
public abstract <R> R accept(Visitor<R> visitor);
public interface Visitor<R> {
R visit(Start cursor);
R visit(Next cursor);
R visit(End cursor);
}
@Value
public static class Start extends Cursor {
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Next extends Cursor {
String value;
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class End extends Cursor {
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
}
Now, we can write:
List<Transaction> transactions =
cursor.accept(new Cursor.Visitor<List<Transaction>>() {
@Override
public List<Transaction> visit(Cursor.Start cursor) {
return fetchFirstPage();
}
@Override
public List<Transaction> visit(Cursor.Next cursor) {
return fetchWithCursor(cursor.getValue()); // no cast !
}
@Override
public List<Transaction> visit(Cursor.End cursor) {
throw new IllegalArgumentException("cannot request end of feed");
}
});
Notice how there is no more IllegalStateException
, because the Visitor
knows exactly which cursor types exist.
Also, notice how it is impossible to add a new cursor type without updating the rest of the code. For example:
- You add a new inner
static class NewThing extends Cursor
. - You are forced to implement the abstract method
accept
. - To implement
accept
, you must add a newvisit(NewThing cursor)
method to theVisitor
interface. - You must now update all the code that calls
Cursor::accept
, because theirVisitor
implementations are now lacking this new method.
The above is an example of type-guided programming! The code will not compile unless the system remains coherent. I’d argue this is a good property to have.
Moving on.
We still need to handle that pesky IllegalArgumentException
, though. Why?
Because we conflated input and output cursors from the get-go:
- It does not make sense for an input cursor to be of type
End
. - It does not make sense for an output cursor to be of type
Start
.
Before we tackle this issue, let me introduce a bit of abstract syntax to describe ADTs concisely.
Our current Cursor
type can be described as:
Cursor = Start | Next(String value) | End
That reads: “a Cursor is either Start, or Next with a String, or End.”
Now, let’s split the cursor type:
InputCursor = Start | Next(String value)
OutputCursor = Next(String value) | End
Cool. Let’s implement that in Java:
public abstract class InputCursor {
private InputCursor() {}
public abstract <R> R accept(Visitor<R> visitor);
public interface Visitor<R> {
R visit(Start cursor);
R visit(Next cursor);
}
@Value
public static class Start extends InputCursor {
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Next extends InputCursor {
String value;
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
}
and
public abstract class OutputCursor {
private OutputCursor() {}
public abstract <R> R accept(Visitor<R> visitor);
public interface Visitor<R> {
R visit(Next cursor);
R visit(End cursor);
}
@Value
public static class Next extends OutputCursor {
String value;
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class End extends OutputCursor {
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
}
Note that although both are ‘cursors’, the types actually have no relationship between each other.
Now, our service interface can be updated:
@Value
class Page {
List<Transaction> transactions;
OutputCursor cursor;
}
class TransactionFeedService {
Page getTransactions(InputCursor cursor);
}
Along with the service code snippet:
List<Transaction> transactions =
cursor.accept(new InputCursor.Visitor<List<Transaction>>() {
@Override
public List<Transaction> visit(InputCursor.Start cursor) {
return fetchFirstPage();
}
@Override
public List<Transaction> visit(InputCursor.Next cursor) {
return fetchWithCursor(cursor.getValue()); // no cast !
}
});
Great! No more illegal or unexpected states.
This is what types are for. To make illegal states unrepresentable in our system.
Success!
Introduction: why, though?
Separation of concerns
ADTs are data types. They must not implement business logic. Instead, the way to attribute behavior (or semantics) to an ADT is by implementing a Visitor
for each use case.
For example, instead of bolting on JSON toJSON()
and XML toXML()
methods to an ADT, you will implement a class MyAdtJsonSerializer
that will use a Visitor
to inspect instances of your ADT and turn them into JSON.
That is: when implementing N features for a type, instead of writing N orthogonal (read: unrelated) methods in that class, you end up writing N feature classes separetely.
Decisions
One way of thinking about ADTs is that they represent a decision with a payload.
Similarly to how Booleans model a YES/NO decision, a given ADT with N cases can model a N-decision.
More than that, the decision may be accompanied with a payload. This means that observing a decision (via a Visitor
) is information discovery.
TODO: expand on this idea.
Breakdown: the syntax
Given an ADT like:
MyType = Case1(FieldType1_1 field1_1, ...)
| Case2(FieldType2_1 field2_1, ...)
| ...
Its direct translation to Java is:
public abstract class MyType {
private MyType() {}
public abstract <R> R accept(Visitor<R> visitor);
public interface Visitor<R> {
R visit(Case1 cursor);
R visit(Case2 cursor);
...
}
@Value
public static class Case1 extends MyType {
FieldType1_1 field1_1;
... (fields of Case1)
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Case2 extends MyType {
FieldType2_1 field2_1;
... (fields of Case2)
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
... (cases)
}
Intermezzo: other programming languages
Implementing ADTs in Java can be a bit verbose. Other languages, however, implement ADTs as a built-in feature of the language.
A key language feature often (and I’d say necessarily) paired with ADTs is pattern matching. Pattern matching achieves what we have with the Visitor
pattern, and much more.
For example:
Haskell
data InputCursor = Start | Next String
fetchTransactions cursor =
case cursor of
Start -> fetchFirstPage
Next cursor -> fetchWithCursor cursor
Swift
enum InputCursor {
case start
case next(String)
}
// then
switch inputCursor {
case let .start:
return fetchFirstPage()
case let .next(cursor):
return fetchWithCursor(cursor)
}
Scala
sealed abstract class InputCursor
object InputCursor {
final case class Start() extends InputCursor
final case class Next(cursor: String) extends InputCursor
}
// then
def fetchTransactions(cursor: InputCursor) = cursor match {
case InputCursor.Start() => fetchFirstPage()
case InputCursor.Next(cursor) => fetchWithCursor(cursor)
}
Future Java
In Pattern Matching for Java (September 2018) by Gavin Bierman and Brian Goetz, the following syntax is explored:
sealed interface Node { }
record IntNode(int value) implements Node;
record NegNode(Node node) implements Node;
record SumNode(Node left, Node right) implements Node;
record MulNode(Node left, Node right) implements Node;
record ParenNode(Node node) implements Node;
// and
int eval(Node n) {
return switch(n) {
case IntNode(var i) -> i;
case NegNode(var n) -> -eval(n);
case AddNode(var left, var right) -> eval(left) + eval(right);
case MulNode(IntNode(0), var right),
MulNode(var left, IntNode(0)) -> 0;
case MulNode(var left, var right) -> eval(left) * eval(right);
};
}
A few precursor features have already been implemented in Java 14:
- records (preview)
- instanceof pattern matching (preview)
- switch expressions
On top of that, the planned features for Java 15 include:
- sealed classes and interfaces (preview)
The future is bright!
Variant: ADTs with many cases
If your ADT has many cases and your base type is becoming too large in terms of LOC, there’s a solution.
First, declare you base type as an interface
:
public interface MyType {
<R> R accept(Visitor<R> visitor);
interface Visitor<R> {
... cases ...
}
... case classes ...
}
Notice that there is no more private constructor. This is something we must forego in this scenario.
Next, move out each of your case classes outside your base type definition. I.e., move them to the enclosing package.
You will end up with:
// MyType.java
public interface MyType {
<R> R accept(Visitor<R> visitor);
interface Visitor<R> {
... cases ...
}
}
// Case1.java
@Value
public class Case1 implements MyType {
...
}
// etc.
That’s all. Now, every type gets its own file and you can avoid single files with 100s of LOC.
Variant: ADTs with common fields
Consider an ADT that will be used to display a rich transaction history for an e-money solution:
Transaction
= TopUp(timestamp, amount, ...)
| Withdraw(timestamp, amount, ...)
| ServicePayment(timestamp, amount, ...)
| RevokedFunds(timestamp, amount, ...)
| ExpiredFunds(timestamp, amount, ...)
| Generic(timestamp, amount, ...)
Perhaps it would be useful to be able to access the common fields from the Transaction
type itself. For example, if we needed to sort a List<Transaction>
.
We can declare getters for those fields in the base type:
public interface Transaction {
Instant getTimestamp();
Money getAmount();
interface Visitor<R> { ... }
}
Thanks to lombok.Value
, every case type implements those fields automatically:
@Value
public class TopUpTransaction implements Transaction {
// generated getters just 'match' the required methods in `Transaction`
Instant timestamp;
Money amount;
String topUpMethod;
...
}
Now, we are able to sort our heterogenous List<Transaction>
:
List<Transaction> sortForFeed(List<Transaction> transactions) {
return transactions.stream()
.sorted(Comparator.comparing(Transaction::getTimestamp).reversed())
.collect(Collectors.toList());
}
Variant: “better” enums
Consider this enumeration type:
enum Color {
RED, GREEN, BLUE
}
Now, consider this eerily-similar ADT without any fields:
Color = Red | Green | Blue
And its Java representation:
public abstract class Color {
public abstract <R> R accept(Visitor<R> visitor);
public interface Visitor<R> {
R visit(Red color);
R visit(Green color);
R visit(Blue color);
}
@Value
public static class Red extends Color {
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Green extends Color {
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Blue extends Color {
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
}
To make it more like an enum
, let’s:
- Define constant singletons for each case type
- Make each case type’s constructor private
public abstract class Color {
public static final Red RED = new Red();
public static final Green GREEN = new Green();
public static final Blue BLUE = new Blue();
public abstract <R> R accept(Visitor<R> visitor);
public interface Visitor<R> {
R visit(Red color);
R visit(Green color);
R visit(Blue color);
}
@Value
public static class Red extends Color {
private Red() {}
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Green extends Color {
private Green() {}
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Blue extends Color {
private Blue() {}
@Override
public <R> R accept(Visitor<R> visitor) {
return visitor.visit(this);
}
}
}
This may look like an overkill to you. And it may very well be one in many cases.
This is (almost) like an enum
. Except:
- You must handle all cases. No exceptions.
- When you do add a new case, all client code must be updated at once.
Variant: handling only specific cases
Sometimes your visitor treats a subset of the ADT cases in a specific manner, while the rest are handled in the same way.
For example, consider an ADT that represents week days:
WeekDay
= Monday
| Tuesday
| Wednesday
| Thursday
| Friday
| Saturday
| Sunday
And you want to write a function boolean isWeekendDay(WeekDay day)
.
Monday through Friday are not weekend days. Saturday and Sunday are. That means that 5 out of the 7 cases are handled in the same way.
So you would write this:
boolean isWeekendDay(WeekDay day) {
return day.accept(new WeekDay.Visitor<Boolean>() {
@Override
Boolean visit(WeekDay.Monday day) { return false; }
@Override
Boolean visit(WeekDay.Tuesday day) { return false; }
@Override
Boolean visit(WeekDay.Wednesday day) { return false; }
@Override
Boolean visit(WeekDay.Thursday day) { return false; }
@Override
Boolean visit(WeekDay.Friday day) { return false; }
@Override
Boolean visit(WeekDay.Saturday day) { return true; }
@Override
Boolean visit(WeekDay.Sunday day) { return true; }
});
I would argue that writing this instead is clearer:
boolean isWeekendDay(WeekDay day) {
return day.accept(new WeekDay.PartialVisitor<Boolean>() {
@Override
Boolean visit(WeekDay.Saturday day) { return true; }
@Override
Boolean visit(WeekDay.Sunday day) { return true; }
@Override
Boolean otherwise(WeekDay day) { return false; }
});
The code above reads: “Saturday and Sunday are weekend days; the rest are not.”
How do we get there?
We must introduce a derived visitor interface called PartialVisitor
:
public abstract class WeekDay {
...
interface Visitor<R> { ... }
interface PartialVisitor<R> extends Visitor<R> {
@Override
default R visit(Monday day) { return otherwise(color); }
@Override
default R visit(Tuesday day) { return otherwise(color); }
@Override
default R visit(Wednesday day) { return otherwise(color); }
@Override
default R visit(Thursday day) { return otherwise(color); }
@Override
default R visit(Friday day) { return otherwise(color); }
@Override
default R visit(Saturday day) { return otherwise(color); }
@Override
default R visit(Sunday day) { return otherwise(color); }
R otherwise(Color color);
}
}
PartialVisitor<R>
extends Visitor<R>
and provides a default implementation for each case that falls back to a new method otherwise
that the implementor of PartialVisitor<R>
is forced to provide.
Implementation: our own Optional type
We will no write our own implementation of java.util.Optional
.
Let’s give this type a different name, to avoid clashes or confusion.
How about:
Maybe<T> = Nothing | Just(T value)
Notice that this will be our first generic ADT. No biggie.
Here’s the Java code:
public abstract class Maybe<T> {
public abstract <R> R accept(Visitor<T, R> visitor);
public interface Visitor<T, R> {
R visit(Nothing<T> maybe);
R visit(Just<T> maybe);
}
@Value
public static class Nothing<T> extends Maybe<T> {
@Override
public <R> R accept(Visitor<T, R> visitor) {
return visitor.visit(this);
}
}
@Value
public static class Just<T> extends Maybe<T> {
T value;
@Override
public <R> R accept(Visitor<T, R> visitor) {
return visitor.visit(this);
}
}
}
Notice that the Visitor
type must hold on to the type variable T
, because the T
in Maybe<T>
is not accessible (and is unrelated). This is because Visitor
is an inner interface
, which in terms of scoping is just like an inner static class
.
Cool. So far, so good.
We can use our type like so:
Maybe<String> getHeader(String name) { ... }
String caller = getHeader("x-grpc-caller-service").accept(new Maybe.Visitor<String, String>() {
@Override
public String visit(Maybe.Nothing<String> maybe) {
return "(unknown)";
}
@Override
public String visit(Maybe.Just<String> maybe) {
return maybe.getValue();
}
});
Alright. That works, but it’s quite verbose.
We can do something about that!
public abstract class Maybe<T> {
... (existing code) ...
public static final <T> Maybe<T> of(T value) {
return new Just<>(value);
}
public static final <T> Maybe<T> empty() {
return new Nothing<>();
}
public final <U> Maybe<U> flatMap(Function<T, Maybe<U>> callback) {
return accept(new Visitor<T, Maybe<U>>() {
@Override
public Maybe<U> visit(Nothing<T> maybe) {
return new Nothing<>();
}
@Override
public Maybe<U> visit(Just<T> maybe) {
return callback.apply(maybe.getValue());
}
});
}
public final <U> Maybe<U> map(Function<T, U> f) {
return flatMap(f.andThen(Maybe::of));
}
public final T orElse(T fallback) {
return accept(new Visitor<T, T>() {
@Override
public T visit(Nothing<T> maybe) {
return fallback;
}
@Override
public T visit(Just<T> maybe) {
return maybe.getValue();
}
});
}
}
We added methods directly to the base type. Note how they’re final
, that’s because
a. they rely completely on accept
, and
b. the case types have no specific behavior to add here.
Now we can use Maybe
just like Optional
, pretty much!
Case study: recursion
public abstract class Expr {
// skipping visitor code
@Value
public static class Const extends Expr {
int value;
}
@Value
public static class BinOp extends Expr {
BinaryOp op;
Expr lhs; // short for: left-hand side
Expr rhs; // short for: right-hand side
}
}
public abstract class BinaryOp {
// skipping visitor code
@Value
public static class Add extends BinaryOp {}
@Value
public static class Mul extends BinaryOp {}
}
To evaluate the expression:
int eval(Expr expr) {
return expr.accept(new Expr.Visitor<Integer>() {
@Override
public Integer visit(Expr.Const e) {
return e.getValue();
}
@Override
public Integer visit(Expr.BinOp e) {
return eval(e.getOp()).apply(eval(e.getLhs()), eval(e.getRhs()));
}
});
}
BinaryOperator<Integer> eval(BinaryOp op) {
return op.accept(new BinaryOp.Visitor<BinaryOperator<Integer>>() {
@Override
public BinaryOperator<Integer> visit(BinaryOp.Add op) {
return (x, y) -> x + y;
}
@Override
public BinaryOperator<Integer> visit(BinaryOp.Mul op) {
return (x, y) -> x * y;
}
});
To print the expression:
String print(Expr expr) {
return expr.accept(new Expr.Visitor<Integer>() {
@Override
public String visit(Expr.Const e) {
return Integer.toString(e.getValue());
}
@Override
public Integer visit(Expr.BinOp e) {
return print(e.getLhs()) + " " + print(e.getOp()) + " " + print(e.getRhs());
}
});
}
String print(BinaryOp op) {
return op.accept(new BinaryOp.Visitor<String>() {
@Override
public String visit(BinaryOp.Add op) {
return "+";
}
@Override
public String visit(BinaryOp.Mul op) {
return "*";
}
});
For example, you could also implement symbolic differentation if you added a Var(String name)
case.
Esoteric: safe state machine
(This example was borrowed from Type-Driven Development with Idris.)
Consider the state machine:
- A closed door can have its bell ringed.
- A closed door can be opened.
- An opened door can be closed.
We can model the operations on the door with this ADT:
DoorCmd = Open | Close | RingBell
However, we want to be able to chain these operations, so we introduce the Bind
operation:
DoorCmd<T>
= Open(T=Unit)
| Close(T=Unit)
| RingBell(T=Unit)
| Bind<U, V>(T=V, DoorCmd<U> operation, Function<U, DoorCmd<V>> callback)
DoorCmd
is now generic on T
, where T
represents the result type of the operation.
I write T=Unit
in each case to point out that T
has been fixed to specific type. In this case, Unit
which is an ADT with a single case with no arguments. (The Java code below will make this clearer.)
The Bind
operation is generic itself, because it must chain together the output of an operation (here U
) into the input of a callback (the Function<U, DoorCmd<V>>
field). Then, the result type of the Bind
operation is V
.
Here’s the Java code:
@Value
public class Unit {}
public abstract class DoorCmd<T> {
// skipping visitor code
@Value
public static class Open extends DoorCmd<Unit> {}
@Value
public static class Close extends DoorCmd<Unit> {}
@Value
public static class RingBell extends DoorCmd<Unit> {}
@Value
public static class Bind<T, U> extends DoorCmd<U> {
DoorCmd<T> command;
Function<T, DoorCmd<U>> callback;
}
// helper method to chain commands
public <U> DoorCmd<U> andThen(Function<T, DoorCmd<U>> callback) {
return new Bind<>(this, callback);
}
}
We are able to write:
DoorCmd<Unit> doorProgram() {
return new DoorCmd.RingBell()
.andThen($ -> new DoorCmd.Open())
.andThen($ -> new DoorCmd.Close());
}
(The $
argument name just means I want to ignore it. Yes, it’s a valid identifier in Java.)
However, we are able to write a door program that does not follow the state machine specification:
DoorCmd<Unit> badDoorProgram() {
return new DoorCmd.Open()
.andThen($ -> new DoorCmd.Open())
.andThen($ -> new DoorCmd.RingBell());
}
We opened the door twice and also rang the bell on an open door.
Let’s fix that.
// not an ADT, just a "type-level enum"
public abstract class DoorState {
private DoorState() {}
public static final class Closed extends DoorState {}
public static final class Open extends DoorState {}
}
// A: the state of the door before the command
// B: the state of the door after the command
public abstract class DoorCmd<T, A extends DoorState, B extends DoorState> {
// skipping visitor code
// opening the door requires it to be closed; then changes state to open
@Value
public static class Open extends DoorCmd<Unit, DoorState.Closed, DoorState.Open> {}
// closing the door requires it to be opened; then changes state to closed
@Value
public static class Close extends DoorCmd<Unit, DoorState.Open, DoorState.Closed> {}
// ringing the bell requires the door to be closed; the door remains closed
@Value
public static class RingBell extends DoorCmd<Unit, DoorState.Closed, DoorState.Closed> {}
@Value
public static class Bind<
T,
U,
A extends DoorState,
B extends DoorState,
C extends DoorState> extends DoorCmd<U, A, C> {
DoorCmd<T, A, B> command;
Function<T, DoorCmd<U, B, C>> callback;
}
public <U, C extends DoorState>
DoorCmd<U, A, C> andThen(Function<T, DoorCmd<U, B, C>> callback) {
return new Bind<>(this, callback);
}
}
We adjust our program’s type, stating that the program begins with a closed door and also ends with a closed door:
DoorCmd<Unit, DoorState.Closed, DoorState.Closed> doorProgram() {
return new DoorCmd.RingBell()
.andThen($ -> new DoorCmd.Open())
.andThen($ -> new DoorCmd.Close());
}
Now, it is not possible to write the following program, because the compiler complains:
DoorCmd<Unit, DoorState.Closed, DoorState.Closed> badDoorProgram() {
return new DoorCmd.Open()
.andThen($ -> new DoorCmd.Open()) // error
.andThen($ -> new DoorCmd.RingBell()); // error
}
It is also not possible to leave the door open:
DoorCmd<Unit, DoorState.Closed, DoorState.Closed> badDoorProgram2() {
return new DoorCmd.RingBell()
.andThen($ -> new DoorCmd.Open()); // error
}
Cool, eh?
Esoteric: data types without data classes
Consider a simple ADT like:
Credentials = Basic(String username, String password)
Let’s try to implement it in Java without defining explicit inner clases. In other words, using anonymous inner classes:
public abstract class Credentials {
public abstract <R> R accept(Visitor<R> visitor);
public interface Visitor<R> {
R basic(String username, String password);
}
public static Credentials basic(String username, String password) {
return new Credentials() {
@Override
public <R> accept(Visitor<R> visitor) {
return visitor.visit(username, password);
}
};
}
}
Notice that because there is no explicit case type, we are forced to explode the case fields into the Visitor
method. I also took the liberty of naming the method basic
instead of visit
, to express that this is the Basic
case of Credentials
.
Cool. Here’s how we use that type:
int totalLength(Credentials creds) {
return creds.accept(new Credentials.Visitor<Integer>() {
@Override
Integer basic(String username, String password) {
return username.size() + password.size();
}
});
}
totalLength(Credentials.basic("garciat", "hahaifonly"));
Works as expected. As if we had used regular inner classes.
We just implemented a data type (ADTs are data types) without explicitly declaring a class with fields.
How? Java supports closures in inner classes. That means that the parameters of Credentials::basic
are automatically and implicitly captured by the anonymous inner classes for its use.
Why? Well, for fun (:
Let’s streamline this further by avoiding the explicit Visitor
interface:
public abstract class Credentials {
public abstract <R> R accept(
BiFunction<String, String, R> basic);
public static Credentials basic(String username, String password) {
return new Credentials() {
@Override
public <R> accept(BiFunction<String, String, R> basic) {
return basic.apply(username, password);
}
};
}
}
Let’s add a Certificate
case to Credentials
to better exemplify what’s going on here:
public abstract class Credentials {
public abstract <R> R accept(
BiFunction<String, String, R> basic,
Function<X509Certificate, R> certificate);
public static Credentials basic(String username, String password) {
return new Credentials() {
@Override
public <R> accept(BiFunction<String, String, R> basic, Function<X509Certificate, R> certificate) {
return basic.apply(username, password);
}
};
}
public static Credentials certificate(X509Certificate cert) {
return new Credentials() {
@Override
public <R> accept(BiFunction<String, String, R> basic, Function<X509Certificate, R> certificate) {
return certificate.apply(cert);
}
};
}
}
That’s right. This doesn’t scale, but it’s still interesting.
How interesting? Enough to have its own Wikipedia article. This ‘pattern’ is called “Mogensen–Scott encoding”.
For example, I used this pattern to implement a functional applicative parser (with a working BigInteger
parser implementation).
Coda: ADTs are here to stay
As evidenced by the Wikipedia article on ADTs, several of the ‘latest and greatest’ programming languages support (some flavor of) ADTs. Including: Rust, C++, F#, Julia, Kotlin, Scala, Swift, TypeScript – and, soon, Java.
ADTs are now mainstream – and they’re here to stay!
I think this is a sign of not only the positive net value of ADTs (and pattern matching), but also the overall gradual acceptance of functional programming by the overall programming community.
Resources
- Refactoring Guru: Visitor – introduction to the Visitor pattern
- Wikipedia: Algebraic Data Types
- YouTube: The Algebra of Algebraic Data Types – if you would like to have your mind blown (:
- Book: Type-Driven Development with Idris – an excellent book that teaches you how to harness the power of types