Algebraic Data Types

Representing complex data

We’ve seen:

  • base types: Bool, Int, Integer, Float

  • some ways to build up types: given types T1, T2

    • functions: T1 -> T2
    • tuples: (T1, T2)
    • lists: [T1]



Algebraic Data Types: a single, powerful technique for building up types to represent complex data

  • lets you define your own data types
  • subsumes tuples and lists!







Algebraic Data Types

  1. Type Synonyms: naming existing types

  2. Product Types: bundling things together

  3. Sum Types: types with multiple variants

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters










Type Synonyms

Synonyms are just names (“aliases”) for existing types

  • think typedef in C










A type to represent circles

A tuple (x, y, r) is a circle with center at (x, y) and radius r

type CircleT = (Double, Double, Double)











A type to represent cuboids

A tuple (length, depth, height) is a cuboid

type CuboidT = (Double, Double, Double)










Using Type Synonyms

We can now use synonyms by creating values of the given types

circ0 :: CircleT 
circ0 = (5, 1, 100)  -- circle at (5,1) with radius 100

cub0 :: CuboidT
cub0 = (10, 20, 30)  -- cuboid with l=10, d=20, h=30 

And we can write functions over synonyms too

area :: CircleT -> Double
area (x, y, r) = pi * r * r  

volume :: CuboidT -> Double
volume (l, d, h) = l * d * h 

We should get this behavior

>>> area circ0 
31415.926535897932

>>> volume cub0
6000










QUIZ

Suppose we have the definitions

type CircleT = (Double, Double, Double)
type CuboidT = (Double, Double, Double)

circ0 :: CircleT
circ0 = (5, 1, 100)  -- circle at (5,1) with radius 100

cub0 :: CuboidT
cub0 = (10, 20, 30)  -- cuboid with length=10, depth=20, height=30 

area :: CircleT -> Double
area (x, y, r) = pi * r * r  

volume :: CuboidT -> Double
volume (l, d, h) = l * d * h

What is the result of

>>> volume circ0

A. 500

B. Type error










Beware!

Type Synonyms

  • Do not create new types

  • Just name existing types

And hence, synonyms

  • Do not prevent confusing different values










Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together

  3. Sum Types: types with multiple variants

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters










Creating New Data Types

We can avoid mixing up values by creating new data types

-- | A new type `Circle` with constructor `MkCircle`,
--   which takes three arguments of type `Double`
data Circle = MkCircle Double Double Double

-- | A new type `Cuboid` with constructor `MkCuboid`
--   which takes three arguments of type `Double`
data Cuboid = MkCuboid Double Double Double

We use constructors to build values of the new type:

circ1 :: Circle 
circ1 = MkCircle 5 1 100  -- circle at (5,1) w/ radius 100

cub1 :: Cuboid
cub1 = MkCuboid 10 20 30  -- cuboid w/ len=10, dep=20, ht=30 










QUIZ

Suppose we create a new type with a data definition

-- | A new type `Circle` with constructor `MkCircle`
data Circle = MkCircle Double Double Double

What is the type of the MkCircle constructor?

A. MkCircle :: Circle

B. MkCircle :: Double -> Circle

C. MkCircle :: Double -> Double -> Circle

D. MkCircle :: Double -> Double -> Double -> Circle

E. MkCircle :: (Double, Double, Double) -> Circle









Aside: Record syntax

Haskell’s record syntax allows you to name the constructor parameters:

  • Instead of

    data Circle = MkCircle Double Double Double
  • you can write:

    data Circle = MkCircle { 
      center_x :: Double, 
      center_y :: Double,
      radius   :: Double  
    }
  • then you can do:

    circ1 = MkCircle { center_x = 5, center_y = 1, radius = 100 }
    -- same as: circ1 = MkCircle 5 1 100
    
    r = radius circ1 -- use field name as a function    










QUIZ

Suppose we have the definitions

type CuboidT = (Double, Double, Double)
data Cuboid  = MkCuboid Double Double Double     

volume :: CuboidT -> Double
volume (l, d, h) = l * d * h

What is the result of

>>> volume (MkCuboid 10 20 30)

A. 6000

B. Type error










Deconstructing Data

Constructors let us build values of new type … but how to use those values?

How can we implement a function

volume :: Cuboid -> Double
volume c = ???

such that

>>> volume (MkCuboid 10 20 30)
6000










Deconstructing Data by Pattern Matching

Haskell lets us deconstruct data via pattern-matching

volume :: Cuboid -> Double
volume (MkCuboid l d h) = l * d * h

area :: Circle -> Double
area (MkCircle x y r) = pi * r * r

same as for tuples, just using different constructors!










Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together [done]

  3. Sum Types: types with multiple variants

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters










Defining new types with data prevents us from mixing up values:

area :: Circle -> Double
area (MkCircle x y r) = pi * r * r

>>> area (MkCuboid 10 20 30) -- TYPE ERROR!

But … what if we need to mix up values?









A List of Shapes

Suppose I need to represent a list of shapes

  • Some Circles
  • Some Cuboids

What is the problem with shapes as defined below?

shapes = [circ1, cub1]

Where we have defined

circ1 :: Circle 
circ1 = MkCircle 5 1 100  -- circle at (5,1) with radius 100

cub1 :: Cuboid
cub1 = MkCuboid 10 20 30  -- cuboid with length=10, depth=20, height=30 










This does not type-check!

All list elements must have the same type!





Solution?










Sum types

Lets create a single type that can represent both kinds of shapes!

data Shape 
  = MkCircle Double Double Double   -- Circle at x, y with radius r 
  | MkCuboid Double Double Double   -- Cuboid with length, depth, height

A sum type is a data type with multiple constructors

  • aka enum in Rust
  • aka (tagged) union in C










QUIZ

With the definition:

data Shape 
  = MkCircle Double Double Double   -- Circle at x, y with radius r 
  | MkCuboid Double Double Double   -- Cuboid with length, depth, height

What is the type of MkCircle 5 1 100 ?

A. Shape

B. Circle

C. MkCircle

D. (Double, Double, Double)













List of Shapes: Take 2

Now we can define

circ2 :: Shape
circ2 = MkCircle 5 1 100  -- circle at (5,1) with radius 100

cub2 :: Shape 
cub2 = MkCuboid 10 20 30  -- cuboid with length=10, depth=20, height=30 

and then define collections of Shapes

shapes :: [Shape]
shapes = [circ2, cub2]













2D Shapes

Lets define a type for 2D shapes

data Shape2D 
  = MkRect Double Double -- rectangle with width and height
  | MkCirc Double        -- circle with radius
  | MkPoly [Vertex]      -- polygon with a list of vertices

type Vertex = (Double, Double)

Different constructors can have different types!











Tagged Boxes

A values of type Shape2D is either two doubles tagged with Rectangle

>>> :type (Rectangle 4.5 1.2)
(Rectangle 4.5 1.2) :: Shape

or a single double tagged with Circle

>>> :type (Circle 3.2)
(Circle 3.2) :: Shape

or a list of pairs of d tagged with Polygon

ghci> :type (Polygon [(1, 1), (2, 2), (3, 3)])
(Polygon [(1, 1), (2, 2), (3, 3)]) :: Shape




Data values are Tagged Boxes





How do we get the value out of the box?













EXERCISE

Write a function to compute the perimeter of a Shape2D

data Shape2D = MkRect Double Double | MkCirc Double | MkPoly [Vertex]
type Vertex = (Double, Double)

perimeter :: Shape2D -> Double
perimeter s = ???

HINT

You may want to use this helper that computes the distance between two points

distance :: Vertex -> Vertex -> Double
distance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ** 2 + (y2 - y1) ** 2)







Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together [done]

  3. Sum Types: types with multiple variants [done]

  4. Recursive Types: types that contain themselves

  5. Polymorphic Datatypes: datatypes with parameters











Recursive types

Have we seen an example already where

  • a value of type T
  • contains another value of the same type T inside?











Yes, lists!

Lists aren’t built-in! They are an algebraic data type like any other:

data IntList 
  = INil                -- empty list
  -- ^ base constructor
  | ICons Int IntList   -- list with Int head and IntList tail
  -- ^ inductive constructor (contains another IntList)











QUIZ

data IntList 
  = INil                -- empty list
  | ICons Int IntList   -- list with Int head and IntList tail

What is the type of ICons ?

A. IntList

B. Int -> IntList

C. (Int, IntList)

D. Int -> IntList -> IntList

E. IntList -> IntList











EXERCISE

With the definition:

data IntList = INil | ICons Int IntList

Write down an IntList representation of the list [1,2,3]

list_1_2_3 :: IntList
list_1_2_3 = ???











Recursion means boxes within boxes

Recursively Nested Boxes











Functions on recursive types

Recursive code mirrors recursive data

data IntList 
  = INil                -- base constructor
  | ICons Int IntList   -- inductive constructor

Step 1: add a pattern per constructor

length :: IntList -> Int
length INil         = ... -- base case
length (ICons x xs) = ... -- recursive case

Step 2: fill in base case:

length :: IntList -> Int
length INil         = 0
length (ICons x xs) = ...

Step 3: fill in inductive case using a recursive call:

length :: IntList -> Int
length INil         = 0
length (ICons x xs) = 1 + length xs
--                        ^ recursive call on the "nested box"










EXERCISE: Append

Write an append function that appends two IntLists:

data IntList = INil | ICons Int IntList

append :: IntList -> IntList -> IntList
append = ???

so that we can call:

-- [1,2] ++ [3,4] ==> [1,2,3,4]
>>> append (ICons 1 (ICons 2 INil))  (ICons 3 (ICons 4 INil))
(ICons 1 (ICons 2 (ICons 3 (ICons 4 INil))))













Lessons Learned

  • Recursive code mirrors recursive data
  • With multiple arguments of a recursive type, which one should I recurse on?
  • The name of the game is to pick the right inductive strategy!










Trees

Lists are unary trees:

Unary tree (aka list)




How do we represent binary trees?

Binary tree










QUIZ: Binary trees

How do you represent this binary tree as a recursive datatype?

Binary tree

(A) data ITree = ILeaf | INode Int ITree

(B) data ITree = ILeaf | INode ITree ITree

(C) data ITree = ILeaf | INode Int ITree ITree

(D) data ITree = ILeaf Int | INode ITree ITree

(E) data ITree = ILeaf Int | INode Int ITree ITree











Binary tree
-- | Binary tree of integers
data ITree = ILeaf | INode Int ITree ITree

t1234 = INode 1 
          (INode 2 (INode 3 ILeaf ILeaf) ILeaf) 
          (INode 4 ILeaf ILeaf)








Functions on trees

height :: Tree -> Int
height t = ??








Example: Calculator

I want to implement an arithmetic calculator to evaluate expressions like:

  • 4.0 + 2.9
  • 3.785.92
  • (4.0 + 2.9) * (3.78 - 5.92)

What is a Haskell datatype to represent these expressions?

data Expr = ???









data Expr = Num Float
          | Add Expr Expr
          | Sub Expr Expr
          | Mul Expr Expr









EXERCISE: Eval

With expressions defined as follows:

data Expr = Num Float
          | Add Expr Expr
          | Sub Expr Expr
          | Mul Expr Expr

write a function to evaluate an expression

eval :: Expr -> Float
eval e = ???








Algebraic Data Types

  1. Type Synonyms: naming existing types [done]

  2. Product Types: bundling things together [done]

  3. Sum Types: types with multiple variants [done]

  4. Recursive Types: types that contain themselves [done]

  5. Polymorphic Datatypes: datatypes with parameters








Parameterized Types

Our IntList datatype can only store Ints :-(

What if we want to store Chars or Doubles?

data CharList 
  = CNil
  | CCons Char CharList

data DoubleList 
   = DNil
   | DCons Char DoubleList











Don’t Repeat Yourself!

Don’t repeat definitions - Instead reuse the list structure across all types!

Find abstract data patterns by

  • identifying the different parts and
  • refactor those into parameters








A Refactored List

Here are the three types: What is common? What is different?

data IList = INil | ICons Int    IList

data CList = CNil | CCons Char   CList

data DList = DNil | DCons Double DList





Common: Nil/Cons structure

Different: type of each “head” element





Refactored using Type Parameter

-- | A list of elements of type `a`
data List a = Nil | Cons a  (List a)

We can recover original types as instances of List:

iList :: List Int      -- list where 'a' = 'Int' 
iList = Cons 1 (Cons 2 (Cons 3 Nil))

cList :: List Char     -- list where 'a' = 'Char' 
cList = Cons 'a' (Cons 'b' (Cons 'c' Nil))

dList :: List Double   -- list where 'a' = 'Double' 
dList = Cons 1.1 (Cons 2.2 (Cons 3.3 Nil))















QUIZ

data List a = Nil | Cons a  (List a)

What is the type of Cons ?

A. Int -> List -> List

B. Int -> List Int -> List Int

C. Char -> List Char -> List Char

D. a -> List -> List

E. a -> List a -> List a















Type Constructors

Note: List is not a type!

List a, List Int, … are types

a is a type parameter

Then what is List?

A type-constructor that

  • takes as input a type a
  • returns as output the type List a











Polymorphic Data has Polymorphic Constructors

Look at the types of the constructors

>>> :type Nil 
Nil :: List a

That is, Nil is a value of any kind of list, and

>>> :type Cons 
Cons :: a -> List a -> List a

Cons takes an a and a List a and returns a List a.















Polymorphic Functions

Let us refactor the append function to work on Lists:

data List a = Nil | Cons a  (List a)
append :: ??? -- What is the type of `append`?
append = ???















Polymorphic Functions over Polymorphic Data

The append function on Lists is polymorphic:

append :: List a -> List a -> List a
append Nil ys         = ys
append (Cons x xs) ys = Cons x (append xs ys)

append doesn’t care about the actual values in the list

  • only manipulates the structure of the list

Hence append :: List a -> List a -> List a

  • we can call append on lists of any kind
  • as long as both lists are of the same kind
>>> append (Cons 1 (Cons 2 Nil)) (Cons 3 Nil)       -- a = Int
Cons 1 (Cons 2 (Cons 3 Nil))

>>> append (Cons 'a' (Cons 'b' Nil)) (Cons 'c' Nil)  -- a = Char
Cons 'a' (Cons 'b' (Cons 'c' Nil))















QUIZ

With the type of append defined as

append :: List a -> List a -> List a

what would be result of

append (Cons 'a' (Cons 'b' Nil)) (Cons True Nil)

A. Cons 'a' (Cons 'b' (Cons True Nil))

B. Cons 'a' (Cons 'b' Nil)

C. Type error















Built-in Lists?

This is exactly how Haskell’s “built-in” lists are defined:

data [a]    = [] | (:) a [a]

data List a = Nil | Cons a (List a)
  • Nil is called []
  • Cons is called :

Many list manipulating functions e.g. in Data.List are polymorphic - Can be reused across all kinds of lists.

(++) :: [a] -> [a] -> [a]
head :: [a] -> a
tail :: [a] -> [a]









That’s all folks!