From Partitions to Wordle - Type Safety with Phantom Types

April 4, 2022

Suppose you had a list that you wanted to split according to some predicate, like splitting it up into the portion that’s greater than 7, and the portion that’s less than 7. Then we could take advantage of the partition function from Data.List:

-- A reminder, the type of partition is:
partition :: (a -> Bool) -> [a] -> ([a], [a])

-- | Moves the elements of a given list that are greater than 7 to the front.
greaterThan7sFirst :: [Int] -> [Int]
greaterThan7sFirst xs =
  let (lessThans, greaterThans) = partition (>7) xs
  in greaterThans ++ lessThans

Pretty clean, right? Well when we run it:

ghci> greaterThan7sFirst [4,9,4,8,2,7,9,2,6]

Oops! It looks like we put things in the wrong order. The fix is simple, of course, just swap the order in the tuple. But it’d be nice if we could prevent these sorts of mix-ups from happening in the first place. Also, it would be nice if we could reflect this in the type signature of partition, that way we don’t even need to read the documentation every time we want to remember which half of the tuple is which.

The “separate newtypes” Approach

So we want to reflect the different meanings of the lists in the type system. So we can create a couple of newtypes:

newtype TruePart a = Trues [a]
  deriving (Eq, Ord, Show, Read)

newtype FalsePart a = Falses [a]
  deriving (Eq, Ord, Show, Read)

Then our result tuple will simply contain a true part and a false part.

partitionTyped :: (a -> Bool) -> [a] -> (TruePart a, FalsePart a)
partitionTyped p xs = 
  let (trues, falses) = partition p xs
  in (Trues trues, Falses falses)

Already we have more clarity in the type signature. Now we can adjust our greaterThan7sFirst function to use partitionTyped':

-- | Moves the elements of a given list that are greater than 7 to the front.
greaterThan7sFirst :: [Int] -> [Int]
greaterThan7sFirst xs = 
  let (Falses lessThans, Trues greaterThans) = partitionTyped (>7) xs
  in greaterThans ++ lessThans

Immediately we get a complaint:

• Couldn't match type ‘TruePart Int’ with ‘FalsePart a’
  Expected type: (FalsePart a, TruePart a1)
    Actual type: (TruePart Int, FalsePart Int)

This is great! The type checker has successfully saved us from our own foolishness. Fixing our function makes it type check:

-- | Moves the elements of a given list that are greater than 7 to the front.
greaterThan7sFirst :: [Int] -> [Int]
greaterThan7sFirst xs = 
  let (Trues greaterThans, Falses lessThans) = partitionTyped (>7) xs
  in greaterThans ++ lessThans

And testing it out, we have our desired behaviour:

ghci> greaterThan7sFirst [4,9,4,8,2,7,9,2,6]

This approach does have the downside that we’ve had to introduce two new datatypes to accomplish the task. Any instances we write for one of them need to be duplicated for the other. What if we had a single type that did both jobs?

The Phantom Types Approach

Consider the following type instead:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

newtype Part (bool :: Bool) a = Part { unPart :: [a] } 

This type has a seemingly extraneous type parameter with what appears to be a type signature on it, (bool :: Bool). This field is indeed “extraneous” in a sense, but what it does is that when we construct a Part we can choose what we want bool to be. If we have two values where the value of bool is different, then we won’t be able to confuse one for the other, since their types will be different, even if the actual runtime values are the same. bool is called a phantom type parameter, since it doesn’t appear on the right-hand side of the =.

So what sorts of values can bool take on? Well, that syntax that looks like a type signature is actually a kind signature. So bool exists on the type level and thus has a kind that dictates what sort of values it can take on. In this case, its kind is Bool, so it can take on any of the values of Bool, meaning True or False. This can be a bit confusing since normally True and False are runtime values. The DataKinds extension essentially creates two new types called 'True and 'False with kind Bool. These are separate from the value level True and False. The compiler allows us to leave off the tick marks if it’s still clear what we’re referring to. 1

What does this look like in practice? Here’s an example:

truePart :: Part True Int
truePart = Part [1,2,3,4]

falsePart :: Part False Int
falsePart = Part [1,2,3,4]

-- The following gives a type error!
parts = [truePart, falsePart]
• Couldn't match type ‘'False’ with ‘'True’
  Expected type: Part 'True Int
    Actual type: Part 'False Int

So we can’t define parts since it would have two elements of different types, even though their contents are the same. This is a good thing, we can use this to implement a type-safe partition:

partitionTyped :: (a -> Bool) -> [a] -> (Part True a, Part False a)
partitionTyped p xs = 
  let (trues, falses) = partition p xs
  in (Part trues, Part falses)

There’s an issue when we go to implement greaterThan7sFirst, though. We don’t have two separate constructors to match on as we did with the newtypes. We can work around that, but it’s not very ergonomic:

greaterThan7sFirst :: [Int] -> [Int]
greaterThan7sFirst xs = 
  let greaterThans :: Part True Int
      lessThans :: Part False Int
      (greaterThans, lessThans) = partitionTyped (>7) xs
  in unPart greaterThans ++ unPart lessThans

It would be much nicer to have separate constructors that we could match on like with the two newtypes. Turns out pattern synonyms are just the thing we need!

{-# LANGUAGE PatternSynonyms #-}

pattern Trues :: [a] -> Part True a
pattern Trues xs = Part xs
{-# COMPLETE Trues #-}

pattern Falses :: [a] -> Part False a
pattern Falses xs = Part xs
{-# COMPLETE Falses #-}

So pattern synonyms allow us to define, well, synonyms for existing patterns. In this case, we’ve defined two patterns Trues and Falses that are the same as Part. The only difference is that we’ve restricted their types. If we ask GHCi the type of Part we get:

ghci> :t Part
Part :: [a] -> Part bool a

So of course, Part can return a Part with any value of bool that we choose. In the case of Trues we want to restrict its type to only match when the type of the receiving value has a bool parameter equal to True. By giving Trues this restricted type we accomplish two things:

  1. Any values created with Trues will have True as the bool type parameter.
  2. Any values matched on by Trues must have True as the bool type parameter.

In this case, we’re concerned with the second behaviour, but the first is nice to have as well. So if we go into GHCi we can witness these two facts in action:

ghci> -- Fact 1:
ghci> :t Trues "abcd"
Trues "abcd" :: Part 'True Char
ghci> -- Fact 2:
ghci> Trues t = truePart
ghci> Falses f = truePart

<interactive>:9:12: error:
Couldn't match type'True’ with ‘'False
      Expected type: Part 'False Int
        Actual type: Part 'True Int
In the expression: truePart
      In a pattern binding: Falses f = truePart

The COMPLETE pragmas above are there to tell GHC that when pattern matching, matching only on Trues (or only on Falses) constitutes a complete pattern match, this way GHC doesn’t give us a “missing patterns” warning if we don’t provide a fallback case.

Now we can implement greaterThan7sFirst nicely:

greaterThan7sFirst :: [Int] -> [Int]
greaterThan7sFirst xs = 
  let (Trues greaterThans, Falses lessThans) = partitionTyped (>7) xs
  in greaterThans ++ lessThans

And if we mess up then we get an error:

greaterThan7sFirst :: [Int] -> [Int]
greaterThan7sFirst xs = 
  let (Falses lessThans, Trues greaterThans) = partitionTyped (>7) xs
  in greaterThans ++ lessThans
• Couldn't match type ‘'True’ with ‘'False’
  Expected type: (Part 'False Int, Part 'True Int)
    Actual type: (Part 'True Int, Part 'False Int)

Nice. As a bonus, let’s implement Show instances that display the results using our pattern synonym syntax. We’ll need FlexibleInstances to convince GHC that it’s okay to write separate instances for True and False.

{-# LANGUAGE FlexibleInstances #-}

instance Show a => Show (Part False a) where
  showsPrec n (Falses xs) = showParen (n > 10) (showString "Falses " . shows xs) 

instance Show a => Show (Part True a) where
  showsPrec n (Trues xs) = showParen (n > 10) (showString "Trues " . shows xs)
ghci> partitionTyped (>7) [4,9,4,8,2,7,9,2,6]
(Trues [9,8,9],Falses [4,4,2,7,2,6])

Wordle: Guesses and Masters

This particular motivating example may feel a bit overcomplicated for what it’s trying to do, and indeed the solution with two newtypes was already quite satisfactory. The benefits to using phantom types with DataKinds really come up more in situations where you may want to work more with the type, unlike here where it exists purely to be pattern matched on immediately like in this example.

As an example, a while ago I wrote a Wordle solver that makes use of this technique to differentiate between words that represent a guess versus words that represent a possible solution. In that case, I had something like

-- "Wordlet" is a cutesy word that I use to refer to a Wordle word.
data WordletType = Guess | Master

newtype Wordlet (wlty :: WordletType) = Wordlet { ... }

newtype WordList (wlty :: WordletType) = WordList { getWordList :: Data.Vector.Vector (Wordlet wlty) }

Because words and word lists are tagged, it means that on the one hand, I have type safety, so if I have a function like

againstMaster :: Wordlet Guess -> Wordlet Master -> Colors

which checks a given guess word against a candidate master word, then I can use it to write the following function, which checks if a given guess word and feedback to that guess word is consistent with some possible master word:

consistentWith :: Colors -> Wordlet Guess -> Wordlet Master -> Bool
consistentWith colors guess master = guess `againstMaster` master == colors

This is just a small example, but the fact that words are tagged ensures that I can’t mess up the argument order when applying againstMaster. Another example would be the filterMasters function, which filters a word list full of master word candidates to only those that are consistent with a given guess word and response to that guess word

filterMasters :: Colors -> Wordlet Guess -> WordList Master -> WordList Master
filterMasters colors guess (WordList masters) =
  WordList $ Data.Vector.filter (consistentWith colors guess) masters

The types guide us towards a solution. We have a vector of WordList Masters and we want to filter it. So of course we need Data.Vector.filter. The filtering function needs to combine some Colors, a Wordlet Guess and a WordLet Master in some way to give us a Bool. This is exactly consistentWith. 2

When I first added these phantom types to my code it indeed found a couple of places where I was implicitly confusing a master word for a guess word.

As far as how this is beneficial over a couple of newtypes, it means that I don’t need separate functions for dealing with guess words and master words in cases where it doesn’t matter. For example:

wordletToString :: Wordlet wlty -> String
parseWordlet :: Text -> Either Text (Wordlet wlty)

In both of these cases it doesn’t matter whether we’re dealing with a guess word or a master word, and the phantom types approach makes it very easy to be polymorphic over the type of word, since the implementation is the same, and thus they only involve a single function body.

So we’ve learned today that phantom types give us the ability to both communicate more in our type signatures, and they can help us prevent mistakes and misuses by allowing us to track where a value came from.

Link to the full code for this post

  1. If we had another type in scope that happened to be called True then we’d need to use ticks in order for GHC to understand that we’re referring to the lifted version of the value True and not the type True.↩︎

  2. Presumably, I used againstMaster directly first and then abstracted it out into consistentWith, but the point still stands- againstMaster is the obvious choice for comparing some Colors combined with a guess word and a master word, just by looking at the types involved.↩︎