Building event-driven systems at scale in Kubernetes with Dapr — Part I: Why Dapr?
First off, what is Dapr?
The documentation says distributed application runtime. If you’re like me, that didn’t really explain it any better. The TL;DR is that it makes building distributed systems a whole lot easier than it usually is. Dapr does this through a suite of configurable components that provide abstractions over infrastructure, so that you don’t have to code directly against that infrastructure. The implications become obvious the first time you try it: you have no infrastructure code in your repository. There’s no need to import infrastructure packages or write plumbing code to wire it all up. In the context of systems at scale, with varying database systems, Pub/Sub brokers, service discovery, programming languages, and more, this level of abstraction makes a big difference to the overall developer experience and ultimately time-to-market.
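To make "no infrastructure code" a little more concrete, here is a minimal sketch of what publishing an event can look like when the application only talks to the Dapr sidecar over its HTTP API. The component name `orders-pubsub`, the topic `orders`, and the event payload are hypothetical placeholders, not names from our system; the sketch also assumes the sidecar is listening on its default HTTP port of 3500 unless `DAPR_HTTP_PORT` says otherwise.

```python
import json
import os
import urllib.request

# The sidecar runs next to the app on localhost. 3500 is Dapr's default
# HTTP port; DAPR_HTTP_PORT overrides it when the runtime sets it.
DAPR_HTTP_PORT = os.environ.get("DAPR_HTTP_PORT", "3500")
BASE_URL = f"http://localhost:{DAPR_HTTP_PORT}/v1.0"


def build_publish_request(pubsub_name: str, topic: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a publish call against the Dapr sidecar.

    Note what is missing: no broker client library, no connection string,
    no broker-specific serialisation. The app only names a component.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/publish/{pubsub_name}/{topic}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def publish(pubsub_name: str, topic: str, event: dict) -> int:
    """Send the event through the sidecar and return the HTTP status code.

    Requires a running Dapr sidecar, so it is not called in this sketch.
    """
    with urllib.request.urlopen(build_publish_request(pubsub_name, topic, event)) as response:
        return response.status


# Hypothetical component and topic names, used purely for illustration.
request = build_publish_request("orders-pubsub", "orders", {"orderId": 42})
print(request.get_method(), request.full_url)
```

Which broker actually sits behind `orders-pubsub` is decided by Dapr component configuration rather than by this code, which is the whole point.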
Getting to the point, why Dapr?
In my situation, I was leading a team that was tasked with building an event-driven system that needed to be elastically scalable with a near-real-time SLA. Further to that, we were venturing where our company had not yet gone: Public Cloud. As we are a Microsoft house, Azure was the obvious choice. The next most obvious choice would be Azure Functions. They’re event-driven, elastically scalable and the barrier to entry for deploying is really low. So what gives? Why Dapr?
Ultimately, it came down to two major deciding factors / core tenets:
- (Legendary) Developer experience; and
- Vendor lock-in, or lack thereof.
1. Developer Experience
Let’s start with developer experience. Historically, being able to debug software locally, knowing that what you’re debugging is actually what is running live, has been rare; I’ve not experienced it in my career thus far. Development environments are flaky at the best of times, and complete blockers at the worst. So when we set out on this challenge, we made a conscious decision that developer experience in this new repository would be unlike anything we’d ever worked with before. At its core are Docker containers. Anyone looking to run a new project should not overlook what a productivity multiplier it is to give a developer the ability to spin up all the infrastructure and applications they need to run and debug the entire system locally, regardless of their preferred IDE, operating system or system setup. At a bare minimum, all anyone needs to run our system on their local machine is Docker Desktop and a CLI. No tooling, no SDKs, no infrastructure installations. It’s as simple as clone-and-run. There were other productivity gains we found as we developed this system. These included a simpler approach to automated testing, as we could use tools like WireMock, or even spin up a real database in our CI pipelines, to test integrations. Ultimately, what we’ve achieved in the space of a handful of months would typically take a year to develop, maybe more, in large part because we’ve had no days lost to environments not working.
2. Vendor Lock-in
Now you might say to me, you can run Azure Functions in containers, and I’d have to agree with you. However, going with Azure Functions means we become locked in to Azure. It also means that, from an infrastructure perspective, we are limited to the function bindings available to us at the time of development. Nor does it give us the flexibility to easily deploy APIs that give users an entry point into our system without deploying separate infrastructure, most of which is bespoke to Azure. Being tightly coupled to infrastructure goes against our first core tenet of having a legendary developer experience. So, when we set out to build this new system, we deliberately chose to be cloud-native and open source. This would give us the ability to take our software and host it anywhere if we needed to. By choosing Kubernetes (Azure Kubernetes Service), and ultimately running Dapr applications in Kubernetes, we are essentially free of any vendor decisions. We can spin up a K8s cluster in any cloud, or even on-premises if we had to. That is entirely up to us, and we are in complete control of our future.
With the above two tenets in mind, the answer became obvious: Dapr. Dapr has given us the ability to spin up an entire development environment locally within Docker, alongside lightweight infrastructure containers that allow us to test functionality easily. Practically speaking, we run RabbitMQ for local Pub/Sub and Redis for local state storage. Yet in production we target Confluent Kafka as our Pub/Sub broker and Azure Cosmos DB for all state management, all with zero code changes. This level of abstraction and flexibility has ultimately resulted in us developing and deploying a fully working system into production in under 6 months. This gives us an edge in terms of time-to-market, and the flexibility to pivot with very little effort if something new comes along.
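As a rough illustration of the "zero code changes" claim, the sketch below builds state-store calls against the Dapr sidecar's HTTP state API. The component name `statestore`, the keys, and the values are hypothetical; the assumption is that a component with that name is declared as a Redis state store in the local environment and as an Azure Cosmos DB state store in production, so the application code stays the same in both.

```python
import json
import urllib.parse
import urllib.request

# Assumes the sidecar's default HTTP port; adjust if yours differs.
SIDECAR = "http://localhost:3500/v1.0"

# Hypothetical component name. Only this name is shared between
# environments; the backing store is chosen in component configuration.
STATE_STORE = "statestore"


def build_save_state_request(key: str, value: dict) -> urllib.request.Request:
    """Build a request that saves one key/value pair via the sidecar."""
    body = [{"key": key, "value": value}]  # the state API accepts a list of items
    return urllib.request.Request(
        url=f"{SIDECAR}/state/{STATE_STORE}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_state_request(key: str) -> urllib.request.Request:
    """Build a request that reads a single key back via the sidecar."""
    safe_key = urllib.parse.quote(key, safe="")  # escape characters such as '/'
    return urllib.request.Request(
        url=f"{SIDECAR}/state/{STATE_STORE}/{safe_key}",
        method="GET",
    )


# Nothing is sent here; pass the requests to urllib.request.urlopen(...)
# once a sidecar is running alongside the application.
save = build_save_state_request("order-42", {"status": "paid"})
load = build_get_state_request("order-42")
print(save.get_method(), save.full_url)
print(load.get_method(), load.full_url)
```

Nothing in this code mentions Redis or Cosmos DB, so moving between the two is a configuration change rather than a code change.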
In my next blog post, I’ll dig into some more technical bits and bobs as to how Dapr works. See you there! 👋