lastminute.com logo

Technology

Comonads for booking correlation

emanuele_colombo
emanuele colombo

How can category theory and typeclasses help us in our daily job? Let's see an example of real world usage for the Applicative and Comonad typeclasses.


In our company data is the most important resource. So, it is very important to analyse all the information we have in order to provide a better service to our customers and our helpdesk operators.

During one of our data analysis activities my team was trying to find out a way to correlate multiple bookings made by our customers in order to find out whether some of them were related to the same journey (e.g. a customer staying in different hotels during their holiday) or not. This information is very important both for a better understanding of our customer’s needs and it is also very helpful for our customer care operators that can provide seamless help to the customer when multiple bookings are involved.

We defined some basic rules that could help us find this correlation and we implemented them within an Apache Spark job in Scala, in order to validate them.

Basically you can see our rules as a function: given two bookings as input parameters the function will find out if they are related or not:

def areRelated(first: Booking)(second: Booking): Boolean = ???

Once the areRelated function was implemented we needed to invoke it with all the possible combinations of bookings and get the list of the related Booking’s identifiers as output. After some data cleanup, filtering and aggregation processes, we had a List[Booking] available. One simple solution would have been to match every booking in the list with all the others inside the same list (hopefully excluding itself), like:

// given: bookings = List[Booking]

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] = {
    var relations: List[(BookingId, BookingId)] = List()
    for (c <- candidates) {
        if ((b.id != c.id) && areRelated(b)(c)) {
            relations = (b.id, c.id) :: relations
        }
    }
    relations
}

var relations: List[(BookingId, BookingId)] = List()
for (b <- bookings) {
    relations = relations ++ findRelated(b, bookings)
}

Using a more functional approach we could replace this code with the following (using the Cats library for the Applicative typeclass instance):

val relations = Applicative[List]
    .product(bookings, bookings)
    .filter { case (a, b) => a.id != b.id }
    .filter { case (a, b) => areRelated(a)(b) }
    .map    { case (a, b) => (a.id, b.id) }

Here we are using the product function of the Applicative typeclass in order to create a list of tuples. The content of this list is made by all the combinations of elements of bookings (as we are using it for both the first and the second parameter). Given this list of tuples the subsequent steps are trivial: a filter in order to remove the tuples where a booking is bound with itself, another filter using our correlation rules in order to find out what tuples correspond to related bookings and finally a map in order to extract the booking identifiers.

But we can go a step further as we know that this approach is a bit too heavy. Given that our related rules are commutative we don’t need to test both the combinations of two bookings (A, B) and (B, A); we can test instead only one of the two. This would reduce the number of correlation checks by 50% and also would remove 50% of the (duplicated) entries in our results. A first possible implementation could be the following:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] = {
    var relations: List[(BookingId, BookingId)] = List()
    for (c <- candidates) {
      if ((b.id != c.id) && areRelated(b)(c)) {
        relations = (b.id, c.id) :: relations
      }
    }
    relations
}

val relations: List[(BookingId, BookingId)] = List()
for (i <- bookings.indices) {
    val sublist = bookings.drop(i)
    relations = relations ++ findRelated(sublist.head, sublist.tail)
}

But we can do something better, bringing back immutability and the functional approach, by using a recursive function instead of a for loop:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
      .filter(c => areRelated(b)(c))
      .map(c => (b.id, c.id))

@tailrec
def correlateAll(bookings: List[Booking],
                 relatedGroup: List[(BookingId, BookingId)] = List.empty) = bookings match {
    case head :: tail => correlateAll(tail, relatedGroup ++ findRelated(head, tail))
    case Nil => relatedGroup
}

correlateAll(bookings)

This is a good approach but actually, in functional programming, we have more tools on our belt. The FP paradigm provides some design patterns like the ones in the object oriented world, but with a big difference. While in the OOP world the patterns just tell you how to structure your code, in the FP world a pattern also provides a production-ready implementation so you don’t have to implement (and test and debug) it. I think this is a great plus of the functional programming approach.

So how we can satisfy our needs with an FP pattern like we’ve done before with the Applicative typeclass? The answer is one: the Comonad typeclass.

Disclaimer: we are not going to make a deep explanation of the monad and comonad concepts. If you are interested in it I suggest that you read the Category Theory for Programmers book (freely available online) by Bartosz Milewski where an entire section is dedicated to the comonad concept.

The Comonad typeclass provides some functions that could be very useful for us; in particular it provides the coflatten function that basically, given a F[A], provides an F[F[A]]; you can see it is as the dual of the flatten function of the Monad typeclass. The implementation of this typeclass could be trivial or not, depending on the data type we are working with, but in any case it must satisfy some specific laws as any other typeclass.

Since we are working with a list of bookings let’s see how the coflatten function works on a list. In order to work with the Comonad typeclass we have to work with a NonEmptyList data type (it is required in order to satisfy the Comonad definition and laws). As we know, a list can be defined as a chain of Cons instances terminated with a Nil value:

NonEmpytList(1,2,3,4) = Cons(1, Cons(2, Cons(3, Cons(4, Nil))))

Given this representation the coflatten method takes a list, like the one above, as input and returns a list as output where every element of the output is a Cons instance of the input. In this example, the Cons inside the list are the following:

Cons(1, Cons(2, Cons(3, Cons(4, Nil))))
Cons(2, Cons(3, Cons(4, Nil)))
Cons(3, Cons(4, Nil))
Cons(4, Nil)

and so, the coflatten will return:

coflatten(NonEmptyList(1, 2, 3, 4)) == NonEmptyList(
    NonEmptyList(1, 2, 3, 4),
    NonEmptyList(2, 3, 4),
    NonEmptyList(3, 4),
    NonEmptyList(4)
)

This is basically what we need in order to compare the bookings; we could rely on this method in order to find out the related bookings.

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

val bs = NonEmptyList.fromListUnsafe(bookings)
Comonad[NonEmptyList]
    .coflatten(bs)
    .map(l => findRelated(l.head, l.tail))
    .toList
    .flatten

Let’s see if we can simplify this code. First of all, we are executing the sequence of operations coflatten and map; they can be replaced with a coflatMap:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

val bs = NonEmptyList.fromListUnsafe(bookings)
Comonad[NonEmptyList]
    .coflatMap(bs)(l => findRelated(l.head, l.tail))
    .toList
    .flatten

Given this new implementation, we can lose some typeclass constraints; we are using just the coflatMap function on the comonad typeclass; Cats provides a less specific typeclass named CoflatMap, that basically provides just the coflatMap function:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

val bs = NonEmptyList.fromListUnsafe(bookings)
CoflatMap[NonEmptyList]
    .coflatMap(bs)(l => findRelated(l.head, l.tail))
    .toList
    .flatten

Since the laws that describe the CoflatMap typeclass are less strict that the Comonad ones Cats can provide an instance also for the List data type, so we can finally drop the NonEmptyList annoyance:

def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
    candidates
        .filter(areRelated(b))
        .map(c => (b.id, c.id))

CoflatMap[List]
    .coflatMap(bookings)(l => findRelated(l.head, l.tail))
    .flatten

Finally, but this is a matter of taste, we can inline the findRelated function:

CoflatMap[List]
    .coflatMap(bookings)(l =>
        l.tail.filter(areRelated(l.head))
              .map(c => (l.head.id, c.id)))
    .flatten

That is it. This is the full implementation for our requirements.


Read next

SwiftUI and the Text concatenations super powers

SwiftUI and the Text concatenations super powers

fabrizio_duroni
fabrizio duroni
marco_de_lucchi
marco de lucchi

Do you need a way to compose beautiful text with images and custom font like you are used with Attributed String. The Text component has everything we need to create some sort of 'attributed text' directly in SwiftUI. Let's go!!! [...]

A Monorepo Experiment: reuniting a JVM-based codebase

A Monorepo Experiment: reuniting a JVM-based codebase

luigi_noto
luigi noto

Continuing the Monorepo exploration series, we’ll see in action a real-life example of a monorepo for JVM-based languages, implemented with Maven, that runs in continuous integration. The experiment of reuniting a codebase of ~700K lines of code from many projects and shared libraries, into a single repository. [...]