In this fourth blog post of the series about our Tech ecosystem, we introduce the principles and practices we've put in place to implement a DevOps model and mindset, as well as how we're going to implement a comprehensive observability strategy.
Team autonomy is one of our pillars: we want to decentralise responsibility for the infrastructure and empower functional teams to use the platform on a self-service basis. There are several benefits to increasing a team's autonomy:
To foster the culture that supports this way of working, we recently introduced the DevOps role, which collaborates with Site Reliability Engineers and Developers to extend development teams' accountability to the underlying services hosted in cloud environments (shared responsibility model).
Below are some of the activities that developers will take care of, with the support of DevOps:
and much more.
To support our vision, our CI/CD has been designed to be very flexible: modular pipelines allow us to manage projects written in different languages, such as Python, Java, Golang and .NET.
As Kubernetes is our main technology, pipelines can build the different types of manifests that Kubernetes accepts, such as Deployments, CronJobs, ConfigMaps and Secrets.
GitLab CI runners are hosted on AWS EKS with Cluster Autoscaler and Horizontal Pod Autoscaling, so that the number of nodes and pod replicas is resized dynamically based on developer demand.
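As an illustration of what a manifest-building pipeline step could look like, here is a minimal sketch in Python. It is not our actual pipeline code: the function names (`build_deployment`, `build_cronjob`, `render`), the `BUILDERS` registry and the example image names are all hypothetical. Only the manifest fields themselves follow the standard Kubernetes `apps/v1` Deployment and `batch/v1` CronJob schemas.

```python
import json


def build_deployment(name, image, replicas=1):
    """Return a minimal apps/v1 Deployment manifest as a dict (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def build_cronjob(name, image, schedule):
    """Return a minimal batch/v1 CronJob manifest as a dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [{"name": name, "image": image}],
                            "restartPolicy": "OnFailure",
                        }
                    }
                }
            },
        },
    }


# Hypothetical registry: one builder per manifest type the pipeline supports.
BUILDERS = {"deployment": build_deployment, "cronjob": build_cronjob}


def render(kind, **params):
    """Render a manifest as JSON, which kubectl accepts alongside YAML."""
    return json.dumps(BUILDERS[kind](**params), indent=2)


if __name__ == "__main__":
    print(render("deployment", name="orders", image="registry.example.com/orders:1.0"))
```

Keeping one small builder per manifest type is one way to get the modularity described above: supporting a new type means adding a function and a registry entry, without touching the existing ones.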
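To give an idea of how Horizontal Pod Autoscaling decides on a replica count, the sketch below reproduces the scaling rule documented by Kubernetes: the desired count is the current count multiplied by the ratio between the observed and the target metric, rounded up. The function name, the replica bounds and the example numbers are assumptions made for illustration, and the real controller handles more cases (missing metrics, pods that are not ready, stabilisation windows) than this simplified version.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the documented HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric).
    """
    ratio = current_metric / target_metric
    # Within the tolerance band (0.1 by default in Kubernetes), do not scale.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (illustrative values here).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 runner pods at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

When the pods requested this way no longer fit on the existing nodes, Cluster Autoscaler is the component that adds nodes, which is why the two are used together.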
Spinnaker allows engineers to deploy microservices across different Kubernetes clusters (both AWS and on-premise).
In the future, the entire CI/CD workflow will be centralised on GitLab, including the Kubernetes deployments.
Logs and metrics are a valuable source of information, but they can easily become ‘noise’ if not well managed.
Governance becomes even more important once observability is enabled, as it is for us. Therefore:
We created an internal framework (contracts) that describes the structure of the data in the logs, the maximum size allowed, and the threshold rate. Instead of tracking every event, we raise alerts that highlight issues and let us react as quickly as possible:
We identified three macro clusters of metrics that provide insights into our products at different levels:
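A contract of this kind could be sketched as follows. This is only an illustration, not our internal framework: `LogContract`, `RateAlert`, their fields, and the assumption that logs are JSON lines are all hypothetical, and "threshold rate" is interpreted here as a number of events within a sliding time window.

```python
import json
from collections import deque
from dataclasses import dataclass


@dataclass
class LogContract:
    """Hypothetical contract: required fields (name -> type) and a size cap."""
    required_fields: dict
    max_size_bytes: int

    def violations(self, raw_line):
        """Return a list of human-readable contract violations for one log line."""
        problems = []
        if len(raw_line.encode("utf-8")) > self.max_size_bytes:
            problems.append("line exceeds max size")
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            return problems + ["not valid JSON"]
        if not isinstance(record, dict):
            return problems + ["not a JSON object"]
        for name, expected_type in self.required_fields.items():
            if name not in record:
                problems.append(f"missing field: {name}")
            elif not isinstance(record[name], expected_type):
                problems.append(f"wrong type: {name}")
        return problems


class RateAlert:
    """Fire once `threshold` events fall inside a sliding window of seconds."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self._times = deque()

    def record(self, timestamp):
        """Register one event; return True if the alert should fire."""
        self._times.append(timestamp)
        # Drop events that have fallen out of the window.
        while self._times and self._times[0] <= timestamp - self.window:
            self._times.popleft()
        return len(self._times) >= self.threshold


if __name__ == "__main__":
    contract = LogContract({"level": str, "message": str}, max_size_bytes=200)
    print(contract.violations('{"level": "error"}'))
```

The point of the split is that the contract rejects malformed or oversized lines up front, while the rate check turns a stream of individual events into a single alert when something is actually going wrong.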
As a result, our dashboards are now easier to read, smaller, and more useful to the business.
The next steps we have in mind are:
This is the fourth in a series of articles where we talk about our pink world. If you want to discover more, read: