In our company data is the most important resource. So, it is very important to analyse all the information we have in order to provide a better service to our customers and our helpdesk operators.

During one of our data analysis activities my team was trying to find out a way to correlate multiple bookings made by our customers in order to find out whether some of them were related to the same journey (e.g. a customer staying in different hotels during their holiday) or not. This information is very important both for a better understanding of our customer’s needs and it is also very helpful for our customer care operators that can provide seamless help to the customer when multiple bookings are involved.

We defined some basic rules that could help us find this correlation and we implemented them within an Apache Spark job in Scala, in order to validate them.

Basically you can see our rules as a function: given two bookings as input parameters the function will find out if they are related or not:

`def areRelated(first: Booking)(second: Booking): Boolean = ???`

Once the `areRelated`

function was implemented we needed to invoke it with all the possible combinations of bookings and get the list of the related Booking’s identifiers as output. After some data cleanup, filtering and aggregation processes, we had a `List[Booking]`

available. One simple solution would have been to match every booking in the list with all the others inside the same list (hopefully excluding itself), like:

```
// given: bookings = List[Booking]
def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] = {
var relations: List[(BookingId, BookingId)] = List()
for (c <- candidates) {
if ((b.id != c.id) && areRelated(b)(c)) {
relations = (b.id, c.id) :: relations
}
}
relations
}
var relations: List[(BookingId, BookingId)] = List()
for (b <- bookings) {
relations = relations ++ findRelated(b, bookings)
}
```

Using a more functional approach we could replace this code with the following (using the Cats library for the `Applicative`

typeclass instance):

```
val relations = Applicative[List]
.product(bookings, bookings)
.filter { case (a, b) => a.id != b.id }
.filter { case (a, b) => areRelated(a)(b) }
.map { case (a, b) => (a.id, b.id) }
```

Here we are using the `product`

function of the `Applicative`

typeclass in order to create a list of tuples. The content of this list is made by all the combinations of elements of *bookings* (as we are using it for both the first and the second parameter). Given this list of tuples the subsequent steps are trivial: a `filter`

in order to remove the tuples where a booking is bound with itself, another `filter`

using our correlation rules in order to find out what tuples correspond to related bookings and finally a `map`

in order to extract the booking identifiers.

But we can go a step further as we know that this approach is a bit too heavy. Given that our related rules are commutative we don’t need to test both the combinations of two bookings (A, B) and (B, A); we can test instead only one of the two. This would reduce the number of correlation checks by 50% and also would remove 50% of the (duplicated) entries in our results. A first possible implementation could be the following:

```
def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] = {
var relations: List[(BookingId, BookingId)] = List()
for (c <- candidates) {
if ((b.id != c.id) && areRelated(b)(c)) {
relations = (b.id, c.id) :: relations
}
}
relations
}
val relations: List[(BookingId, BookingId)] = List()
for (i <- bookings.indices) {
val sublist = bookings.drop(i)
relations = relations ++ findRelated(sublist.head, sublist.tail)
}
```

But we can do something better, bringing back immutability and the functional approach, by using a recursive function instead of a for loop:

```
def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
candidates
.filter(c => areRelated(b)(c))
.map(c => (b.id, c.id))
@tailrec
def correlateAll(bookings: List[Booking],
relatedGroup: List[(BookingId, BookingId)] = List.empty) = bookings match {
case head :: tail => correlateAll(tail, relatedGroup ++ findRelated(head, tail))
case Nil => relatedGroup
}
correlateAll(bookings)
```

This is a good approach but actually, in functional programming, we have more tools on our belt. The FP paradigm provides some design patterns like the ones in the object oriented world, but with a big difference. While in the OOP world the patterns just tell you how to structure your code, in the FP world a pattern also provides a production-ready implementation so you don’t have to implement (and test and debug) it. I think this is a great plus of the functional programming approach.

So how we can satisfy our needs with an FP pattern like we’ve done before with the `Applicative`

typeclass? The answer is one: the **Comonad** typeclass.

*Disclaimer: we are not going to make a deep explanation of the monad and comonad concepts. If you are interested in it I suggest that you read the Category Theory for Programmers book (freely available online) by Bartosz Milewski where an entire section is dedicated to the comonad concept.*

The `Comonad`

typeclass provides some functions that could be very useful for us; in particular it provides the `coflatten`

function that basically, given a `F[A]`

, provides an `F[F[A]]`

; you can see it is as the dual of the `flatten`

function of the `Monad`

typeclass. The implementation of this typeclass could be trivial or not, depending on the data type we are working with, but in any case it must satisfy some specific laws as any other typeclass.

Since we are working with a list of bookings let’s see how the `coflatten`

function works on a list. In order to work with the Comonad typeclass we have to work with a `NonEmptyList`

data type (it is required in order to satisfy the Comonad definition and laws). As we know, a list can be defined as a chain of `Cons`

instances terminated with a `Nil`

value:

`NonEmpytList(1,2,3,4) = Cons(1, Cons(2, Cons(3, Cons(4, Nil))))`

Given this representation the `coflatten`

method takes a list, like the one above, as input and returns a list as output where every element of the output is a `Cons`

instance of the input. In this example, the `Cons`

inside the list are the following:

```
Cons(1, Cons(2, Cons(3, Cons(4, Nil))))
Cons(2, Cons(3, Cons(4, Nil)))
Cons(3, Cons(4, Nil))
Cons(4, Nil)
```

and so, the `coflatten`

will return:

```
coflatten(NonEmptyList(1, 2, 3, 4)) == NonEmptyList(
NonEmptyList(1, 2, 3, 4),
NonEmptyList(2, 3, 4),
NonEmptyList(3, 4),
NonEmptyList(4)
)
```

This is basically what we need in order to compare the bookings; we could rely on this method in order to find out the related bookings.

```
def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
candidates
.filter(areRelated(b))
.map(c => (b.id, c.id))
val bs = NonEmptyList.fromListUnsafe(bookings)
Comonad[NonEmptyList]
.coflatten(bs)
.map(l => findRelated(l.head, l.tail))
.toList
.flatten
```

Let’s see if we can simplify this code. First of all, we are executing the sequence of operations `coflatten`

and `map`

; they can be replaced with a `coflatMap`

:

```
def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
candidates
.filter(areRelated(b))
.map(c => (b.id, c.id))
val bs = NonEmptyList.fromListUnsafe(bookings)
Comonad[NonEmptyList]
.coflatMap(bs)(l => findRelated(l.head, l.tail))
.toList
.flatten
```

Given this new implementation, we can lose some typeclass constraints; we are using just the `coflatMap`

function on the comonad typeclass; Cats provides a less specific typeclass named `CoflatMap`

, that basically provides just the `coflatMap`

function:

```
def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
candidates
.filter(areRelated(b))
.map(c => (b.id, c.id))
val bs = NonEmptyList.fromListUnsafe(bookings)
CoflatMap[NonEmptyList]
.coflatMap(bs)(l => findRelated(l.head, l.tail))
.toList
.flatten
```

Since the laws that describe the `CoflatMap`

typeclass are less strict that the `Comonad`

ones Cats can provide an instance also for the `List`

data type, so we can finally drop the `NonEmptyList`

annoyance:

```
def findRelated(b: Booking, candidates: List[Booking]): List[(BookingId, BookingId)] =
candidates
.filter(areRelated(b))
.map(c => (b.id, c.id))
CoflatMap[List]
.coflatMap(bookings)(l => findRelated(l.head, l.tail))
.flatten
```

Finally, but this is a matter of taste, we can inline the `findRelated`

function:

```
CoflatMap[List]
.coflatMap(bookings)(l =>
l.tail.filter(areRelated(l.head))
.map(c => (l.head.id, c.id)))
.flatten
```

That is it. This is the full implementation for our requirements.