Technology Blog

Comonads for booking correlation

November 05, 2019 - 7 min read ⏱️

emanuele_colombo
emanuele colombo

How can category theory and typeclasses help us in our daily job? Let's see an example of real world usage for the Applicative and Comonad typeclasses.


In our company data is the most important resource. So, it is very important to analyse all the information we have in order to provide a better service to our customers and our helpdesk operators.

During one of our data analysis activities my team was trying to find out a way to correlate multiple bookings made by our customers in order to find out whether some of them were related to the same journey (e.g. a customer staying in different hotels during their holiday) or not. This information is very important both for a better understanding of our customer’s needs and it is also very helpful for our customer care operators that can provide seamless help to the customer when multiple bookings are involved.

We defined some basic rules that could help us find this correlation and we implemented them within an Apache Spark job in Scala, in order to validate them.

Basically you can see our rules as a function: given two bookings as input parameters the function will find out if they are related or not:

def areRelated(first: Booking)(second: Booking): Boolean = ???

Once the areRelated function was implemented we needed to invoke it with all the possible combinations of bookings and get the list of the related Booking’s identifiers as output. After some data cleanup, filtering and aggregation processes, we had a List[Booking] available. One simple solution would have been to match every booking in the list with all the others inside the same list (hopefully excluding itself), like:

// given: bookings = List[Booking]

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] = {
    var relations: List[(BookingId, BookingId)] = List()
    for (c <- candidates) {
        if ((b.id != c.id) && areRelated(b)(c)) {
            relations = (b.id, c.id) :: relations
        }
    }
    relations
}

var relations: List[(BookingId, BookingId)] = List()
for (b <- bookings) {
    relations = relations ++ findRelated(b, bookings)
}

Using a more functional approach we could replace this code with the following (using the Cats library for the Applicative typeclass instance):

val relations = Applicative[List]
    .product(bookings, bookings)
    .filter { case (a, b) => a.id != b.id }
    .filter { case (a, b) => areRelated(a)(b) }
    .map    { case (a, b) => (a.id, b.id) }

Here we are using the product function of the Applicative typeclass in order to create a list of tuples. The content of this list is made by all the combinations of elements of bookings (as we are using it for both the first and the second parameter). Given this list of tuples the subsequent steps are trivial: a filter in order to remove the tuples where a booking is bound with itself, another filter using our correlation rules in order to find out what tuples correspond to related bookings and finally a map in order to extract the booking identifiers.

But we can go a step further as we know that this approach is a bit too heavy. Given that our related rules are commutative we don’t need to test both the combinations of two bookings (A, B) and (B, A); we can test instead only one of the two. This would reduce the number of correlation checks by 50% and also would remove 50% of the (duplicated) entries in our results. A first possible implementation could be the following:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] = {
    var relations: List[(BookingId, BookingId)] = List()
    for (c <- candidates) {
      if ((b.id != c.id) && areRelated(b)(c)) {
        relations = (b.id, c.id) :: relations
      }
    }
    relations
}

val relations: List[(BookingId, BookingId)] = List()
for (i <- bookings.indices) {
    val sublist = bookings.drop(i)
    relations = relations ++ findRelated(sublist.head, sublist.tail)
}

But we can do something better, bringing back immutability and the functional approach, by using a recursive function instead of a for loop:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
      .filter(c => areRelated(b)(c))
      .map(c => (b.id, c.id))

@tailrec
def correlateAll(bookings: List[Booking],
                 relatedGroup: List[(BookingId, BookingId)] = List.empty) = bookings match {
    case head :: tail => correlateAll(tail, relatedGroup ++ findRelated(head, tail))
    case Nil => relatedGroup
}

correlateAll(bookings)

This is a good approach but actually, in functional programming, we have more tools on our belt. The FP paradigm provides some design patterns like the ones in the object oriented world, but with a big difference. While in the OOP world the patterns just tell you how to structure your code, in the FP world a pattern also provides a production-ready implementation so you don’t have to implement (and test and debug) it. I think this is a great plus of the functional programming approach.

So how we can satisfy our needs with an FP pattern like we’ve done before with the Applicative typeclass? The answer is one: the Comonad typeclass.

Disclaimer: we are not going to make a deep explanation of the monad and comonad concepts. If you are interested in it I suggest that you read the Category Theory for Programmers book (freely available online) by Bartosz Milewski where an entire section is dedicated to the comonad concept.

The Comonad typeclass provides some functions that could be very useful for us; in particular it provides the coflatten function that basically, given a F[A], provides an F[F[A]]; you can see it is as the dual of the flatten function of the Monad typeclass. The implementation of this typeclass could be trivial or not, depending on the data type we are working with, but in any case it must satisfy some specific laws as any other typeclass.

Since we are working with a list of bookings let’s see how the coflatten function works on a list. In order to work with the Comonad typeclass we have to work with a NonEmptyList data type (it is required in order to satisfy the Comonad definition and laws). As we know, a list can be defined as a chain of Cons instances terminated with a Nil value:

NonEmpytList(1,2,3,4) = Cons(1, Cons(2, Cons(3, Cons(4, Nil))))

Given this representation the coflatten method takes a list, like the one above, as input and returns a list as output where every element of the output is a Cons instance of the input. In this example, the Cons inside the list are the following:

Cons(1, Cons(2, Cons(3, Cons(4, Nil))))
Cons(2, Cons(3, Cons(4, Nil)))
Cons(3, Cons(4, Nil))
Cons(4, Nil)

and so, the coflatten will return:

coflatten(NonEmptyList(1, 2, 3, 4)) == NonEmptyList(
    NonEmptyList(1, 2, 3, 4),
    NonEmptyList(2, 3, 4),
    NonEmptyList(3, 4),
    NonEmptyList(4)
)

This is basically what we need in order to compare the bookings; we could rely on this method in order to find out the related bookings.

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

val bs = NonEmptyList.fromListUnsafe(bookings)
Comonad[NonEmptyList]
    .coflatten(bs)
    .map(l => findRelated(l.head, l.tail))
    .toList
    .flatten

Let’s see if we can simplify this code. First of all, we are executing the sequence of operations coflatten and map; they can be replaced with a coflatMap:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

val bs = NonEmptyList.fromListUnsafe(bookings)
Comonad[NonEmptyList]
    .coflatMap(bs)(l => findRelated(l.head, l.tail))
    .toList
    .flatten

Given this new implementation, we can lose some typeclass constraints; we are using just the coflatMap function on the comonad typeclass; Cats provides a less specific typeclass named CoflatMap, that basically provides just the coflatMap function:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

val bs = NonEmptyList.fromListUnsafe(bookings)
CoflatMap[NonEmptyList]
    .coflatMap(bs)(l => findRelated(l.head, l.tail))
    .toList
    .flatten

Since the laws that describe the CoflatMap typeclass are less strict that the Comonad ones Cats can provide an instance also for the List data type, so we can finally drop the NonEmptyList annoyance:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

CoflatMap[List]
    .coflatMap(bookings)(l => findRelated(l.head, l.tail))
    .flatten

Finally, but this is a matter of taste, we can inline the findRelated function:

CoflatMap[List]
    .coflatMap(bookings)(l =>
        l.tail.filter(areRelated(l.head))
              .map(c => (l.head.id, c.id)))
    .flatten

That is it. This is the full implementation for our requirements.


lastminute.com folks
Written by lastminute.com folks who are busy living. You should follow us on Twitter

The postings on this site are authors' opinions and experiences and do not necessarily represent the postings, strategies or opinions of lastminute.com group.