Max, tell me briefly what your team is up to at Discovery?
I’m one member of a dev team which takes care of external APIs. We build services for streaming TV (OTT), for example user management, authentication, user entitlements and playback services.
We started this effort to enable a more flexible way to assign entitlements to users, for example through affiliates (like cable TV companies and mobile phone operators). Before, you had to be a subscriber at Discovery to be able to view any content. Being a direct subscriber is called the DTC model, “Direct To Consumer”. But a large part of the business in some markets still comes from the so-called “TV Everywhere” model, where the customers come in via affiliates.
Share a little about the existing tech stack, what are the main ingredients?
Everything runs on Kubernetes on AWS. There is no legacy running on dedicated instances or in a non-cloud datacenter. All databases use managed services on AWS, such as RDS and DynamoDB. We use Kinesis and Kafka for data streams and queues.
We deploy to multiple regions across the world to provide low latency, and of course across multiple availability zones to provide high resilience to failure.
All applications are written in Java or Kotlin. We use two frameworks, Vert.x and Spring Boot with WebFlux. For relational databases, we use PostgreSQL.
The old monolith, how does the pain materialize?
I wouldn’t really refer to it as the “old monolith”, but…
In this case, we noticed that we didn’t have a way to grant entitlements (the right to watch something) to users if they didn’t buy a subscription. And since affiliate users wouldn’t buy a subscription, we needed to separate the ability to grant entitlements from the act of buying a subscription.
In the near future, it will be possible to grant entitlements in many different ways.
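As a rough illustration of the idea above (not Discovery’s actual code), the sketch below models an entitlement as its own record that remembers where it came from, so that the check “may this user watch this?” no longer depends on a subscription. All names here (`GrantSource`, `EntitlementGrant`, `EntitlementRegistry`, the `"premium"` package) are hypothetical, and the in-memory map merely stands in for a real database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: how a grant was obtained. The interview names direct
// subscriptions and affiliates; further sources could be added later.
enum GrantSource { DIRECT_SUBSCRIPTION, AFFILIATE }

// A grant is the right to watch a package, independent of how it was obtained.
final class EntitlementGrant {
    final String userId;
    final String packageId;
    final GrantSource source;

    EntitlementGrant(String userId, String packageId, GrantSource source) {
        this.userId = userId;
        this.packageId = packageId;
        this.source = source;
    }
}

// In-memory stand-in (an assumption) for the entitlement service's storage.
final class EntitlementRegistry {
    private final Map<String, List<EntitlementGrant>> grantsByUser = new HashMap<>();

    void grant(String userId, String packageId, GrantSource source) {
        grantsByUser.computeIfAbsent(userId, id -> new ArrayList<>())
                .add(new EntitlementGrant(userId, packageId, source));
    }

    // The check looks only at grants; it never asks whether a subscription exists.
    boolean isEntitled(String userId, String packageId) {
        List<EntitlementGrant> grants =
                grantsByUser.getOrDefault(userId, Collections.emptyList());
        for (EntitlementGrant grant : grants) {
            if (grant.packageId.equals(packageId)) {
                return true;
            }
        }
        return false;
    }
}

class EntitlementRegistryDemo {
    public static void main(String[] args) {
        EntitlementRegistry registry = new EntitlementRegistry();
        registry.grant("alice", "premium", GrantSource.DIRECT_SUBSCRIPTION);
        registry.grant("bob", "premium", GrantSource.AFFILIATE);

        System.out.println(registry.isEntitled("alice", "premium")); // true
        System.out.println(registry.isEntitled("bob", "premium"));   // true
        System.out.println(registry.isEntitled("carol", "premium")); // false
    }
}
```

In this sketch, adding a new way to grant entitlements means adding a new `GrantSource` and a new caller of `grant`, while the entitlement check itself stays unchanged.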
Share a little bit about the upsides of the new architecture?
A few upsides that we have experienced so far:
- We have removed sources of incidents and limited the impact of the incidents that do occur.
- We can build new things without disrupting the old.
- We have also enabled our system to continue to serve real-time requests, even if the backend should experience a glitch. In other words, it’s more robust.
- We have made step-by-step improvements in the system’s scalability and availability.
- We can now build other imaginative features, such as granting users entitlements based on any business event.
What were the first steps?
Not sure I have a good answer to this question, only an honest one. The first step was to decide that we didn’t want the complexity of partner-granted entitlements to be added to a service which already had too many responsibilities, and was already in the process of being split up for that reason.
I suppose the next step was to decide the incision point. Where do we make the cut? What responsibilities should stay in the old service, and what goes into the new service?
Then we spent a lot of time trying to forecast all of the different future things this service might need to do. Probably too much time in my opinion. But you solve these things as a team, not by yourself. Everyone involved needs to be on board.
After exploring, we decided on the minimal set of features the service would need to have before we could take it live. We kept future expansions in mind, but avoided implementing everything. We built what we sometimes refer to as a “walking skeleton”.
We now have the new service in production, integrated with several other services. All of the downstream services are integrated via events, to which we attribute much of the robustness.
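One way event-based integration can contribute to robustness is sketched below. It rests on an assumption the interview does not confirm: that a downstream service keeps its own local view of entitlements, updated from events, and answers real-time requests from that view without calling the entitlement service synchronously. The names `EntitlementEvent` and `LocalEntitlementView` are hypothetical, and the Kafka or Kinesis consumer that would deliver the events is left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical event published by the entitlement service.
final class EntitlementEvent {
    enum Type { GRANTED, REVOKED }

    final Type type;
    final String userId;
    final String packageId;

    EntitlementEvent(Type type, String userId, String packageId) {
        this.type = type;
        this.userId = userId;
        this.packageId = packageId;
    }
}

// Hypothetical local view held by a downstream service (for example playback).
final class LocalEntitlementView {
    private final Map<String, Set<String>> packagesByUser = new ConcurrentHashMap<>();

    // Called once per event by the stream consumer (omitted in this sketch).
    void apply(EntitlementEvent event) {
        Set<String> packages = packagesByUser.computeIfAbsent(
                event.userId, id -> ConcurrentHashMap.newKeySet());
        if (event.type == EntitlementEvent.Type.GRANTED) {
            packages.add(event.packageId);
        } else {
            packages.remove(event.packageId);
        }
    }

    // Answered purely from local state: if the upstream service has a glitch,
    // this keeps working with the last known entitlements.
    boolean canPlay(String userId, String packageId) {
        return packagesByUser
                .getOrDefault(userId, Collections.emptySet())
                .contains(packageId);
    }
}

class LocalEntitlementViewDemo {
    public static void main(String[] args) {
        LocalEntitlementView view = new LocalEntitlementView();
        view.apply(new EntitlementEvent(EntitlementEvent.Type.GRANTED, "bob", "premium"));
        System.out.println(view.canPlay("bob", "premium")); // true

        view.apply(new EntitlementEvent(EntitlementEvent.Type.REVOKED, "bob", "premium"));
        System.out.println(view.canPlay("bob", "premium")); // false
    }
}
```

The trade-off in a design like this is that the local view can lag behind the source of truth for a short while, in exchange for not failing when the upstream service does.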
What has been tricky?
Defining and agreeing on the boundaries of the new service. What should be the responsibilities? What should not? But this we kind of expected.
What surprisingly turned out to be tricky, and in fact a non-technical area, was defining a shared vocabulary. What’s a “subscription”? What are “entitlements”? What’s a “product” and a “package”? When you move responsibilities between services, the meaning of some words might change. Before, all users with premium content were subscribers. Now they are not. Or are users coming from affiliates also subscribers? Well, yes, but they are not subscribers with Discovery, so not in our systems, no. Changing the meaning of a word has large implications. Will you go through the entire system and codebase and documentation and change it? Or should we make up a new word for this?
What would you recommend for someone who is keen on taking similar steps?
Oh, I don’t know. One thing that has been really appreciated is a presentation I did about why we’re building it, who’s building it, what it is and how it is built. I’ve run the presentation 5 or 6 times. I think it is easy to underestimate the importance of internal communication. Don’t be afraid to share what you’re up to. And you will need to repeat the message.
Communicating what you are up to and why also helps in discovering if you’re building the wrong thing! If that is the case, better to find out sooner rather than later!
Thanks, Max, for sharing what you’ve learned!