Java Pattern: Algebraic Data Types

Contents

  1. Prerequisites
  2. Introduction: what are ADTs?
    1. Product types
    2. Sum types
  3. Introduction: a use case
  4. Introduction: why, though?
    1. Separation of concerns
    2. Decisions
  5. Breakdown: the syntax
  6. Intermezzo: other programming languages
    1. Haskell
    2. Swift
    3. Scala
    4. Future Java
  7. Variant: ADTs with many cases
  8. Variant: ADTs with common fields
  9. Variant: “better” enums
  10. Variant: handling only specific cases
  11. Implementation: our own Optional type
  12. Case study: recursion
  13. Esoteric: safe state machine
  14. Esoteric: data types without data classes
  15. Coda: ADTs are here to stay
  16. Resources

Prerequisites

Introduction: what are ADTs?

Algebraic Data Types (ADTs) are product types and sum types, usually in combination.

Product types

“Product type” is just a fancy name for struct or record.

Here’s a product type in Java:

@Value
public class TextStyle {
  boolean bold;
  Font font;
}

// where

public enum Font { SERIF, SANS_SERIF, MONOSPACE }

This is called a product type because the set of all possible values is the Carthesian product of the possible values of its components. For example:

  false true
Font.SERIF new TextStyle(false, Font.SERIF) new TextStyle(true, Font.SERIF)
Font.SANS_SERIF new TextStyle(false, Font.SANS_SERIF) new TextStyle(true, Font.SANS_SERIF)
Font.MONOSPACE new TextStyle(false, Font.MONOSPACE) new TextStyle(true, Font.MONOSPACE)

Barring null, those are all the possible ways to construct a TextStyle object.

In abstract syntax: TextStyle = Boolean ⨯ Font.

Sum types

“Sum type” is just a fancy name for… 🤷🏻‍♂️. Different languages call (and implement) this construct differently, so let’s leave it at that.

Here’s a sum type in Java:

public abstract class Background {

  private Background() {}

  @Value
  public static class Transparent extends Background {}

  @Value
  public static class SolidColor extends Background {
    Color color;
  }

  @Value
  public static class Image extends Background {
    ImageResource image;
    Alignment alignment;
  }
}

// where

public enum Color { BLACK, WHITE }

public enum ImageResource { DOG, CAT }

public enum Alignment { TOP, BOTTOM }

Notice:

☝🏻 Terminology: each subtype is called a case.

The set of all possibly values for a sum type is the disjoint union of its cases. For example, these are the possible ways to construct a Background object:

// only 1 way to construct Transparent, because it has no fields
new Background.Transparent();

// there are 2 ways to construct a SolidColor, because it has a Color field and it has 2 possible values, BLACK and WHITE
new Background.SolidColor(Color.BLACK);
new Background.SolidColor(Color.WHITE);

// there are 4 ways to construct an Image, because it has an ImageResource field (2 values) and an Alignment field (also 2 values)
new Background.Image(ImageResource.DOG, Alignment.TOP);
new Background.Image(ImageResource.DOG, Alignment.BOTTOM);
new Background.Image(ImageResource.CAT, Alignment.TOP);
new Background.Image(ImageResource.CAT, Alignment.BOTTOM);

// so in total there are 1 + 2 + 4 = 7 ways to build a Background

In abstract syntax: Background = Transparent + SolidColor + Image.

☝🏻 You can read the + as “or”. That is: “a Background is either Transparent or a SolidColor or an Image.”

(Notice that the Image case is itself a product type.)

Introduction: a use case

Let’s say we are writing a backend application that returns a paginated list of transactions:

@Value
class Page {
  List<Transaction> transactions;
}

To implement pagination, we return an opaque cursor to the client:

@Value
class Page {
  List<Transaction> transactions;
  String cursor;
}

This cursor can be passed in by the client to fetch the next page:

// assume this is the RPC interface of our service
class TransactionFeedService {
  Page getTransactions(String cursor);
}

Now, we already have two issues:

  1. How will clients fetch the first page of transactions, if the String cursor is a required input?
  2. How can we signal the client that it has reached the end of the feed?

@Nullable to the rescue!

@Value
class Page {
  List<Transaction> transactions;
  /**
   * `null` indicates the end of the feed.
   */
  @Nullable String cursor;
}

class TransactionFeedService {
  /**
   * When `cursor == null`, the first page of the feed is fetched.
   */
  Page getTransactions(@Nullable String cursor);
}

Great (:

We have extended the range of admissible values of our input and output types to include null. To clarify the semantics of null in our domain, we made use of docstrings.

This presents the following issues:

  1. We must now check for nulls. And if we forget to, we’ll encounter NullPointerExceptions. (Granted: some IDEs or static analysis may be able to mitigate this issue.)
  2. We rely on docstrings to give semantics to our domain.

Enter ADTs!

Let’s start off with a dedicated type for cursors:

public abstract class Cursor {

}

Now, let’s think about the types of cursor we need.

  1. A cursor that corresponds to the start of the feed.
  2. A cursor that contains an opaque value that can be used to fetch the next page in the feed.
  3. A cursor that corresponds to the end of the feed.

Let’s write that down:

public abstract class Cursor {

  private Cursor() {}

  @Value
  public static class Start extends Cursor {}

  @Value
  public static class Next extends Cursor {
    String value;
  }

  @Value
  public static class End extends Cursor {}
}

Indeed, new Cursor.Start() is-a Cursor, as are new Cursor.Next("magic value in here") and new Cursor.End().

Let’s update our service interface:

@Value
class Page {
  List<Transaction> transactions;
  Cursor cursor;
}

class TransactionFeedService {
  Page getTransactions(Cursor cursor);
}

Perfect. We got rid of the null state and the docstrings. Is that it?

Well, yes and no.

Within our service code, we will end up writing something like:

List<Transaction> transactions;

if (cursor instanceof Cursor.Next) {
  transactions = fetchWithCursor(((Cursor.Next) cursor).getValue());
} else if (cursor instanceof Cursor.Start) {
  transactions = fetchFirstPage();
} else if (cursor instanceof Cursor.End) {
  throw new IllegalArgumentException("cannot request end of feed");
} else {
  // but... we know there are no other cursor types
  // serves as a fallback in case we add a new cursor type and forget to handle it
  throw new IllegalStateException("unknown cursor type");
}

This is (arguably) a bit clunky. Also: imagine that we added a new cursor type. It would be easy to forget the if-check for this new cursor type. (Perhaps in this contrived example this seems unlikely, but consider a data type with more alternatives.)

Enter the visitor pattern!

public abstract class Cursor {

  private Cursor() {}

  public abstract void accept(Visitor visitor);

  public interface Visitor {
    void visit(Start cursor);
    void visit(Next cursor);
    void visit(End cursor);
  }

  @Value
  public static class Start extends Cursor {
    @Override
    public void accept(Visitor visitor) {
      visitor.visit(this);
    }
  }

  @Value
  public static class Next extends Cursor {
    String value;
    @Override
    public void accept(Visitor visitor) {
      visitor.visit(this);
    }
  }

  @Value
  public static class End extends Cursor {
    @Override
    public void accept(Visitor visitor) {
      visitor.visit(this);
    }
  }
}

Now, we can write:

List<Transaction> transactions = ???;

cursor.accept(new Cursor.Visitor() {
  @Override
  public void visit(Cursor.Start cursor) {
    fetchFirstPage();
  }
  @Override
  public void visit(Cursor.Next cursor) {
    fetchWithCursor(cursor.getValue()); // no cast !
  }
  @Override
  public void visit(Cursor.End cursor) {
    throw new IllegalArgumentException("cannot request end of feed");
  }
});

Hmm. Hold on. How do we get the return values from fetchFirstPage and fetchWithCursor?

What we really need here is what I call the modified visitor pattern, which allows for a generic return value.

Take a look:

public abstract class Cursor {

  private Cursor() {}

  public abstract <R> R accept(Visitor<R> visitor);

  public interface Visitor<R> {
    R visit(Start cursor);
    R visit(Next cursor);
    R visit(End cursor);
  }

  @Value
  public static class Start extends Cursor {
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Next extends Cursor {
    String value;
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class End extends Cursor {
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }
}

Now, we can write:

List<Transaction> transactions =
  cursor.accept(new Cursor.Visitor<List<Transaction>>() {
    @Override
    public List<Transaction> visit(Cursor.Start cursor) {
      return fetchFirstPage();
    }
    @Override
    public List<Transaction> visit(Cursor.Next cursor) {
      return fetchWithCursor(cursor.getValue()); // no cast !
    }
    @Override
    public List<Transaction> visit(Cursor.End cursor) {
      throw new IllegalArgumentException("cannot request end of feed");
    }
  });

Notice how there is no more IllegalStateException, because the Visitor knows exactly which cursor types exist.

Also, notice how it is impossible to add a new cursor type without updating the rest of the code. For example:

  1. You add a new inner static class NewThing extends Cursor.
  2. You are forced to implement the abstract method accept.
  3. To implement accept, you must add a new visit(NewThing cursor) method to the Visitor interface.
  4. You must now update all the code that calls Cursor::accept, because their Visitor implementations are now lacking this new method.

The above is an example of type-guided programming! The code will not compile unless the system remains coherent. I’d argue this is a good property to have.

Moving on.

We still need to handle that pesky IllegalArgumentException, though. Why?

Because we conflated input and output cursors from the get-go:

  1. It does not make sense for an input cursor to be of type End.
  2. It does not make sense for an output cursor to be of type Start.

Before we tackle this issue, let me introduce a bit of abstract syntax to describe ADTs concisely.

Our current Cursor type can be described as:

Cursor = Start | Next(String value) | End

That reads: “a Cursor is either Start, or Next with a String, or End.”

Now, let’s split the cursor type:

InputCursor = Start | Next(String value)
OutputCursor = Next(String value) | End

Cool. Let’s implement that in Java:

public abstract class InputCursor {

  private InputCursor() {}

  public abstract <R> R accept(Visitor<R> visitor);

  public interface Visitor<R> {
    R visit(Start cursor);
    R visit(Next cursor);
  }

  @Value
  public static class Start extends InputCursor {
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Next extends InputCursor {
    String value;
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }
}

and

public abstract class OutputCursor {

  private OutputCursor() {}

  public abstract <R> R accept(Visitor<R> visitor);

  public interface Visitor<R> {
    R visit(Next cursor);
    R visit(End cursor);
  }

  @Value
  public static class Next extends OutputCursor {
    String value;
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class End extends OutputCursor {
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }
}

Note that although both are ‘cursors’, the types actually have no relationship between each other.

Now, our service interface can be updated:

@Value
class Page {
  List<Transaction> transactions;
  OutputCursor cursor;
}

class TransactionFeedService {
  Page getTransactions(InputCursor cursor);
}

Along with the service code snippet:

List<Transaction> transactions =
  cursor.accept(new InputCursor.Visitor<List<Transaction>>() {
    @Override
    public List<Transaction> visit(InputCursor.Start cursor) {
      return fetchFirstPage();
    }
    @Override
    public List<Transaction> visit(InputCursor.Next cursor) {
      return fetchWithCursor(cursor.getValue()); // no cast !
    }
  });

Great! No more illegal or unexpected states.

This is what types are for. To make illegal states unrepresentable in our system.

Success!

Introduction: why, though?

Separation of concerns

ADTs are data types. They must not implement business logic. Instead, the way to attribute behavior (or semantics) to an ADT is by implementing a Visitor for each use case.

For example, instead of bolting on JSON toJSON() and XML toXML() methods to an ADT, you will implement a class MyAdtJsonSerializer that will use a Visitor to inspect instances of your ADT and turn them into JSON.

That is: when implementing N features for a type, instead of writing N orthogonal (read: unrelated) methods in that class, you end up writing N feature classes separetely.

Decisions

One way of thinking about ADTs is that they represent a decision with a payload.

Similarly to how Booleans model a YES/NO decision, a given ADT with N cases can model a N-decision.

More than that, the decision may be accompanied with a payload. This means that observing a decision (via a Visitor) is information discovery.

TODO: expand on this idea.

Breakdown: the syntax

Given an ADT like:

MyType = Case1(FieldType1_1 field1_1, ...)
       | Case2(FieldType2_1 field2_1, ...)
       | ...

Its direct translation to Java is:

public abstract class MyType {

  private MyType() {}

  public abstract <R> R accept(Visitor<R> visitor);

  public interface Visitor<R> {
    R visit(Case1 cursor);
    R visit(Case2 cursor);
    ...
  }

  @Value
  public static class Case1 extends MyType {

    FieldType1_1 field1_1;
    ... (fields of Case1)

    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Case2 extends MyType {

    FieldType2_1 field2_1;
    ... (fields of Case2)

    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  ... (cases)
}

Intermezzo: other programming languages

Implementing ADTs in Java can be a bit verbose. Other languages, however, implement ADTs as a built-in feature of the language.

A key language feature often (and I’d say necessarily) paired with ADTs is pattern matching. Pattern matching achieves what we have with the Visitor pattern, and much more.

For example:

Haskell

data InputCursor = Start | Next String

fetchTransactions cursor =
  case cursor of
    Start -> fetchFirstPage
    Next cursor -> fetchWithCursor cursor

Swift

enum InputCursor {
  case start
  case next(String)
}

// then

switch inputCursor {
  case let .start:
    return fetchFirstPage()
  case let .next(cursor):
    return fetchWithCursor(cursor)
}

Scala

sealed abstract class InputCursor

object InputCursor {
  final case class Start() extends InputCursor
  final case class Next(cursor: String) extends InputCursor
}

// then

def fetchTransactions(cursor: InputCursor) = cursor match {
  case InputCursor.Start()      => fetchFirstPage()
  case InputCursor.Next(cursor) => fetchWithCursor(cursor)
}

Future Java

In Pattern Matching for Java (September 2018) by Gavin Bierman and Brian Goetz, the following syntax is explored:

sealed interface Node { }

record IntNode(int value) implements Node;
record NegNode(Node node) implements Node;
record SumNode(Node left, Node right) implements Node;
record MulNode(Node left, Node right) implements Node;
record ParenNode(Node node) implements Node;

// and

int eval(Node n) {
    return switch(n) {
        case IntNode(var i) -> i;
        case NegNode(var n) -> -eval(n);
        case AddNode(var left, var right) -> eval(left) + eval(right);
        case MulNode(IntNode(0), var right),
             MulNode(var left, IntNode(0)) -> 0;
        case MulNode(var left, var right) -> eval(left) * eval(right);
    };
}

A few precursor features have already been implemented in Java 14:

On top of that, the planned features for Java 15 include:

The future is bright!

Variant: ADTs with many cases

If your ADT has many cases and your base type is becoming too large in terms of LOC, there’s a solution.

First, declare you base type as an interface:

public interface MyType {

  <R> R accept(Visitor<R> visitor);

  interface Visitor<R> {
    ... cases ...
  }

  ... case classes ...
}

Notice that there is no more private constructor. This is something we must forego in this scenario.

Next, move out each of your case classes outside your base type definition. I.e., move them to the enclosing package.

You will end up with:

// MyType.java
public interface MyType {

  <R> R accept(Visitor<R> visitor);

  interface Visitor<R> {
    ... cases ...
  }
}

// Case1.java
@Value
public class Case1 implements MyType {
  ...
}

// etc.

That’s all. Now, every type gets its own file and you can avoid single files with 100s of LOC.

Variant: ADTs with common fields

Consider an ADT that will be used to display a rich transaction history for an e-money solution:

Transaction
  = TopUp(timestamp, amount, ...)
  | Withdraw(timestamp, amount, ...)
  | ServicePayment(timestamp, amount, ...)
  | RevokedFunds(timestamp, amount, ...)
  | ExpiredFunds(timestamp, amount, ...)
  | Generic(timestamp, amount, ...)

Perhaps it would be useful to be able to access the common fields from the Transaction type itself. For example, if we needed to sort a List<Transaction>.

We can declare getters for those fields in the base type:

public interface Transaction {

  Instant getTimestamp();
  Money getAmount();

  interface Visitor<R> { ... }
}

Thanks to lombok.Value, every case type implements those fields automatically:

@Value
public class TopUpTransaction implements Transaction {
  // generated getters just 'match' the required methods in `Transaction`
  Instant timestamp;
  Money amount;
  String topUpMethod;
  ...
}

Now, we are able to sort our heterogenous List<Transaction>:

List<Transaction> sortForFeed(List<Transaction> transactions) {
  return transactions.stream()
    .sorted(Comparator.comparing(Transaction::getTimestamp).reversed())
    .collect(Collectors.toList());
}

Variant: “better” enums

Consider this enumeration type:

enum Color {
  RED, GREEN, BLUE
}

Now, consider this eerily-similar ADT without any fields:

Color = Red | Green | Blue

And its Java representation:

public abstract class Color {

  public abstract <R> R accept(Visitor<R> visitor);

  public interface Visitor<R> {
    R visit(Red color);
    R visit(Green color);
    R visit(Blue color);
  }

  @Value
  public static class Red extends Color {
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Green extends Color {
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Blue extends Color {
    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }
}

To make it more like an enum, let’s:

  1. Define constant singletons for each case type
  2. Make each case type’s constructor private
public abstract class Color {

  public static final Red RED = new Red();
  public static final Green GREEN = new Green();
  public static final Blue BLUE = new Blue();

  public abstract <R> R accept(Visitor<R> visitor);

  public interface Visitor<R> {
    R visit(Red color);
    R visit(Green color);
    R visit(Blue color);
  }

  @Value
  public static class Red extends Color {
    private Red() {}

    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Green extends Color {
    private Green() {}

    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Blue extends Color {
    private Blue() {}

    @Override
    public <R> R accept(Visitor<R> visitor) {
      return visitor.visit(this);
    }
  }
}

This may look like an overkill to you. And it may very well be one in many cases.

This is (almost) like an enum. Except:

  1. You must handle all cases. No exceptions.
  2. When you do add a new case, all client code must be updated at once.

Variant: handling only specific cases

Sometimes your visitor treats a subset of the ADT cases in a specific manner, while the rest are handled in the same way.

For example, consider an ADT that represents week days:

WeekDay
  = Monday
  | Tuesday
  | Wednesday
  | Thursday
  | Friday
  | Saturday
  | Sunday

And you want to write a function boolean isWeekendDay(WeekDay day).

Monday through Friday are not weekend days. Saturday and Sunday are. That means that 5 out of the 7 cases are handled in the same way.

So you would write this:

boolean isWeekendDay(WeekDay day) {
  return day.accept(new WeekDay.Visitor<Boolean>() {
    @Override
    Boolean visit(WeekDay.Monday day) { return false; }
    @Override
    Boolean visit(WeekDay.Tuesday day) { return false; }
    @Override
    Boolean visit(WeekDay.Wednesday day) { return false; }
    @Override
    Boolean visit(WeekDay.Thursday day) { return false; }
    @Override
    Boolean visit(WeekDay.Friday day) { return false; }
    @Override
    Boolean visit(WeekDay.Saturday day) { return true; }
    @Override
    Boolean visit(WeekDay.Sunday day) { return true; }
  });

I would argue that writing this instead is clearer:

boolean isWeekendDay(WeekDay day) {
  return day.accept(new WeekDay.PartialVisitor<Boolean>() {
    @Override
    Boolean visit(WeekDay.Saturday day) { return true; }
    @Override
    Boolean visit(WeekDay.Sunday day) { return true; }
    @Override
    Boolean otherwise(WeekDay day) { return false; }
  });

The code above reads: “Saturday and Sunday are weekend days; the rest are not.”

How do we get there?

We must introduce a derived visitor interface called PartialVisitor:

public abstract class WeekDay {

  ...

  interface Visitor<R> { ... }

  interface PartialVisitor<R> extends Visitor<R> {
    @Override
    default R visit(Monday day) { return otherwise(color); }
    @Override
    default R visit(Tuesday day) { return otherwise(color); }
    @Override
    default R visit(Wednesday day) { return otherwise(color); }
    @Override
    default R visit(Thursday day) { return otherwise(color); }
    @Override
    default R visit(Friday day) { return otherwise(color); }
    @Override
    default R visit(Saturday day) { return otherwise(color); }
    @Override
    default R visit(Sunday day) { return otherwise(color); }

    R otherwise(Color color);
  }
}

PartialVisitor<R> extends Visitor<R> and provides a default implementation for each case that falls back to a new method otherwise that the implementor of PartialVisitor<R> is forced to provide.

Implementation: our own Optional type

We will no write our own implementation of java.util.Optional.

Let’s give this type a different name, to avoid clashes or confusion.

How about:

Maybe<T> = Nothing | Just(T value)

Notice that this will be our first generic ADT. No biggie.

Here’s the Java code:

public abstract class Maybe<T> {

  public abstract <R> R accept(Visitor<T, R> visitor);

  public interface Visitor<T, R> {
    R visit(Nothing<T> maybe);
    R visit(Just<T> maybe);
  }

  @Value
  public static class Nothing<T> extends Maybe<T> {
    @Override
    public <R> R accept(Visitor<T, R> visitor) {
      return visitor.visit(this);
    }
  }

  @Value
  public static class Just<T> extends Maybe<T> {

    T value;

    @Override
    public <R> R accept(Visitor<T, R> visitor) {
      return visitor.visit(this);
    }
  }
}

Notice that the Visitor type must hold on to the type variable T, because the T in Maybe<T> is not accessible (and is unrelated). This is because Visitor is an inner interface, which in terms of scoping is just like an inner static class.

Cool. So far, so good.

We can use our type like so:

Maybe<String> getHeader(String name) { ... }

String caller = getHeader("x-grpc-caller-service").accept(new Maybe.Visitor<String, String>() {
  @Override
  public String visit(Maybe.Nothing<String> maybe) {
    return "(unknown)";
  }
  @Override
  public String visit(Maybe.Just<String> maybe) {
    return maybe.getValue();
  }
});

Alright. That works, but it’s quite verbose.

We can do something about that!

public abstract class Maybe<T> {

  ... (existing code) ...

  public static final <T> Maybe<T> of(T value) {
    return new Just<>(value);
  }

  public static final <T> Maybe<T> empty() {
    return new Nothing<>();
  }

  public final <U> Maybe<U> flatMap(Function<T, Maybe<U>> callback) {
    return accept(new Visitor<T, Maybe<U>>() {
      @Override
      public Maybe<U> visit(Nothing<T> maybe) {
        return new Nothing<>();
      }
      @Override
      public Maybe<U> visit(Just<T> maybe) {
        return callback.apply(maybe.getValue());
      }
    });
  }

  public final <U> Maybe<U> map(Function<T, U> f) {
    return flatMap(f.andThen(Maybe::of));
  }

  public final T orElse(T fallback) {
    return accept(new Visitor<T, T>() {
      @Override
      public T visit(Nothing<T> maybe) {
        return fallback;
      }
      @Override
      public T visit(Just<T> maybe) {
        return maybe.getValue();
      }
    });
  }
}

We added methods directly to the base type. Note how they’re final, that’s because a. they rely completely on accept, and b. the case types have no specific behavior to add here.

Now we can use Maybe just like Optional, pretty much!

Case study: recursion

public abstract class Expr {

  // skipping visitor code

  @Value
  public static class Const extends Expr {
    int value;
  }

  @Value
  public static class BinOp extends Expr {
    BinaryOp op;
    Expr lhs; // short for: left-hand side
    Expr rhs; // short for: right-hand side
  }
}

public abstract class BinaryOp {

  // skipping visitor code

  @Value
  public static class Add extends BinaryOp {}

  @Value
  public static class Mul extends BinaryOp {}
}

To evaluate the expression:

int eval(Expr expr) {
  return expr.accept(new Expr.Visitor<Integer>() {
    @Override
    public Integer visit(Expr.Const e) {
      return e.getValue();
    }
    @Override
    public Integer visit(Expr.BinOp e) {
      return eval(e.getOp()).apply(eval(e.getLhs()), eval(e.getRhs()));
    }
  });
}

BinaryOperator<Integer> eval(BinaryOp op) {
  return op.accept(new BinaryOp.Visitor<BinaryOperator<Integer>>() {
    @Override
    public BinaryOperator<Integer> visit(BinaryOp.Add op) {
      return (x, y) -> x + y;
    }
    @Override
    public BinaryOperator<Integer> visit(BinaryOp.Mul op) {
      return (x, y) -> x * y;
    }
  });

To print the expression:

String print(Expr expr) {
  return expr.accept(new Expr.Visitor<Integer>() {
    @Override
    public String visit(Expr.Const e) {
      return Integer.toString(e.getValue());
    }
    @Override
    public Integer visit(Expr.BinOp e) {
      return print(e.getLhs()) + " " + print(e.getOp()) + " " + print(e.getRhs());
    }
  });
}

String print(BinaryOp op) {
  return op.accept(new BinaryOp.Visitor<String>() {
    @Override
    public String visit(BinaryOp.Add op) {
      return "+";
    }
    @Override
    public String visit(BinaryOp.Mul op) {
      return "*";
    }
  });

For example, you could also implement symbolic differentation if you added a Var(String name) case.

Esoteric: safe state machine

(This example was borrowed from Type-Driven Development with Idris.)

Consider the state machine:

We can model the operations on the door with this ADT:

DoorCmd = Open | Close | RingBell

However, we want to be able to chain these operations, so we introduce the Bind operation:

DoorCmd<T>
  = Open(T=Unit)
  | Close(T=Unit)
  | RingBell(T=Unit)
  | Bind<U, V>(T=V, DoorCmd<U> operation, Function<U, DoorCmd<V>> callback)

DoorCmd is now generic on T, where T represents the result type of the operation.

I write T=Unit in each case to point out that T has been fixed to specific type. In this case, Unit which is an ADT with a single case with no arguments. (The Java code below will make this clearer.)

The Bind operation is generic itself, because it must chain together the output of an operation (here U) into the input of a callback (the Function<U, DoorCmd<V>> field). Then, the result type of the Bind operation is V.

Here’s the Java code:

@Value
public class Unit {}

public abstract class DoorCmd<T> {

  // skipping visitor code

  @Value
  public static class Open extends DoorCmd<Unit> {}

  @Value
  public static class Close extends DoorCmd<Unit> {}

  @Value
  public static class RingBell extends DoorCmd<Unit> {}

  @Value
  public static class Bind<T, U> extends DoorCmd<U> {
    DoorCmd<T> command;
    Function<T, DoorCmd<U>> callback;
  }

  // helper method to chain commands
  public <U> DoorCmd<U> andThen(Function<T, DoorCmd<U>> callback) {
    return new Bind<>(this, callback);
  }
}

We are able to write:

DoorCmd<Unit> doorProgram() {
  return new DoorCmd.RingBell()
          .andThen($ -> new DoorCmd.Open())
          .andThen($ -> new DoorCmd.Close());
}

(The $ argument name just means I want to ignore it. Yes, it’s a valid identifier in Java.)

However, we are able to write a door program that does not follow the state machine specification:

  DoorCmd<Unit> badDoorProgram() {
    return new DoorCmd.Open()
            .andThen($ -> new DoorCmd.Open())
            .andThen($ -> new DoorCmd.RingBell());
  }

We opened the door twice and also rang the bell on an open door.

Let’s fix that.

// not an ADT, just a "type-level enum"
public abstract class DoorState {
  private DoorState() {}
  public static final class Closed extends DoorState {}
  public static final class Open extends DoorState {}
}

// A: the state of the door before the command
// B: the state of the door after the command
public abstract class DoorCmd<T, A extends DoorState, B extends DoorState> {

  // skipping visitor code

  // opening the door requires it to be closed; then changes state to open
  @Value
  public static class Open extends DoorCmd<Unit, DoorState.Closed, DoorState.Open> {}

  // closing the door requires it to be opened; then changes state to closed
  @Value
  public static class Close extends DoorCmd<Unit, DoorState.Open, DoorState.Closed> {}

  // ringing the bell requires the door to be closed; the door remains closed
  @Value
  public static class RingBell extends DoorCmd<Unit, DoorState.Closed, DoorState.Closed> {}

  @Value
  public static class Bind<
          T,
          U,
          A extends DoorState,
          B extends DoorState,
          C extends DoorState> extends DoorCmd<U, A, C> {
    DoorCmd<T, A, B> command;
    Function<T, DoorCmd<U, B, C>> callback;
  }

  public <U, C extends DoorState>
  DoorCmd<U, A, C> andThen(Function<T, DoorCmd<U, B, C>> callback) {
    return new Bind<>(this, callback);
  }
}

We adjust our program’s type, stating that the program begins with a closed door and also ends with a closed door:

DoorCmd<Unit, DoorState.Closed, DoorState.Closed> doorProgram() {
  return new DoorCmd.RingBell()
          .andThen($ -> new DoorCmd.Open())
          .andThen($ -> new DoorCmd.Close());
}

Now, it is not possible to write the following program, because the compiler complains:

DoorCmd<Unit, DoorState.Closed, DoorState.Closed> badDoorProgram() {
  return new DoorCmd.Open()
          .andThen($ -> new DoorCmd.Open()) // error
          .andThen($ -> new DoorCmd.RingBell()); // error
}

It is also not possible to leave the door open:

DoorCmd<Unit, DoorState.Closed, DoorState.Closed> badDoorProgram2() {
  return new DoorCmd.RingBell()
          .andThen($ -> new DoorCmd.Open()); // error
}

Cool, eh?

Esoteric: data types without data classes

Consider a simple ADT like:

Credentials = Basic(String username, String password)

Let’s try to implement it in Java without defining explicit inner clases. In other words, using anonymous inner classes:

public abstract class Credentials {

  public abstract <R> R accept(Visitor<R> visitor);

  public interface Visitor<R> {
    R basic(String username, String password);
  }

  public static Credentials basic(String username, String password) {
    return new Credentials() {
      @Override
      public <R> accept(Visitor<R> visitor) {
        return visitor.visit(username, password);
      }
    };
  }
}

Notice that because there is no explicit case type, we are forced to explode the case fields into the Visitor method. I also took the liberty of naming the method basic instead of visit, to express that this is the Basic case of Credentials.

Cool. Here’s how we use that type:

int totalLength(Credentials creds) {
  return creds.accept(new Credentials.Visitor<Integer>() {
    @Override
    Integer basic(String username, String password) {
      return username.size() + password.size();
    }
  });
}

totalLength(Credentials.basic("garciat", "hahaifonly"));

Works as expected. As if we had used regular inner classes.

We just implemented a data type (ADTs are data types) without explicitly declaring a class with fields.

How? Java supports closures in inner classes. That means that the parameters of Credentials::basic are automatically and implicitly captured by the anonymous inner classes for its use.

Why? Well, for fun (:

Let’s streamline this further by avoiding the explicit Visitor interface:

public abstract class Credentials {

  public abstract <R> R accept(
    BiFunction<String, String, R> basic);

  public static Credentials basic(String username, String password) {
    return new Credentials() {
      @Override
      public <R> accept(BiFunction<String, String, R> basic) {
        return basic.apply(username, password);
      }
    };
  }
}

Let’s add a Certificate case to Credentials to better exemplify what’s going on here:

public abstract class Credentials {

  public abstract <R> R accept(
    BiFunction<String, String, R> basic,
    Function<X509Certificate, R> certificate);

  public static Credentials basic(String username, String password) {
    return new Credentials() {
      @Override
      public <R> accept(BiFunction<String, String, R> basic, Function<X509Certificate, R> certificate) {
        return basic.apply(username, password);
      }
    };
  }

  public static Credentials certificate(X509Certificate cert) {
    return new Credentials() {
      @Override
      public <R> accept(BiFunction<String, String, R> basic, Function<X509Certificate, R> certificate) {
        return certificate.apply(cert);
      }
    };
  }
}

That’s right. This doesn’t scale, but it’s still interesting.

How interesting? Enough to have its own Wikipedia article. This ‘pattern’ is called “Mogensen–Scott encoding”.

For example, I used this pattern to implement a functional applicative parser (with a working BigInteger parser implementation).

Coda: ADTs are here to stay

As evidenced by the Wikipedia article on ADTs, several of the ‘latest and greatest’ programming languages support (some flavor of) ADTs. Including: Rust, C++, F#, Julia, Kotlin, Scala, Swift, TypeScript – and, soon, Java.

ADTs are now mainstream – and they’re here to stay!

I think this is a sign of not only the positive net value of ADTs (and pattern matching), but also the overall gradual acceptance of functional programming by the overall programming community.

Resources