How Netflix Scales its API with GraphQL
Last Updated :
12 Jun, 2024
Netflix is a subscription-based streaming service that lets users watch TV shows and movies on any internet-connected device. It is popular for its exclusive content and 4K streaming. But how does it manage this behind the scenes, and how does it serve so many users seamlessly?
Many things make this possible, and GraphQL is one of them. GraphQL is an open-source query language and server-side runtime that specifies how clients interact with application programming interfaces (APIs). In this article, we will look at how Netflix scales its API with GraphQL in detail.
Introduction
- Netflix became popular by bringing its streaming service to a very large number of users, and it had to scale its services to accommodate that growth. Handling the increasing complexity of its data and the relationships within it was a challenge.
- According to a 2020 Netflix TechBlog post, the Netflix API team found the Apollo Federation specification to be an ideal way to scale their GraphQL architecture.
- In this model, individual GraphQL schemas become subgraphs, each describing one part of the data, and these are composed into a single unified supergraph. This let Netflix keep one integrated "Consumer Edge" API for its client applications while the teams behind it work independently. As a result, they could deliver changes faster without compromising the customer experience.
- Netflix does not publish official documentation for its API, but developers can access data such as movie reviews and ratings from the Netflix data catalog.
History
- Netflix launched its public API on October 1, 2008, supported by a blog and code samples from developers. Applications such as InstaWatcher and WhichFlicks were built on this API.
- The Netflix app previously used a different graph-API technology called Falcor. When Falcor was created around 2012, GraphQL had not yet been publicly released, so Netflix built its own solution. The two share similar concepts, but by 2020 GraphQL was far more widely adopted, so Netflix moved to it.
- Netflix uses federation in its API: a way of breaking the API into pieces, each typically covering a single domain, that can be developed independently. By July 2019, Netflix had started building a GraphQL gateway based on Apollo's reference implementation.
- They used Kotlin, a JVM language that interoperates with Java, to get access to their Java ecosystem for efficient fetching and other shared tooling. Use of the federated graph has grown rapidly over the years.
GraphQL and Federation
- GraphQL is an open-source query language and server-side runtime built around the idea of "get exactly what you asked for", with no under-fetching or over-fetching of data. Think of a GraphQL query as a grocery list for an API.
- The client specifies the data it needs and the server (the grocer) delivers just that. GraphQL is often described as an alternative to REST APIs. It makes it easier to gather data from multiple sources and uses a type system to describe the data, exposed through a single endpoint rather than many.
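The "grocery list" idea can be illustrated with a minimal Python sketch. It is not how a GraphQL server is implemented; the record, its values, and the `select_fields` helper are hypothetical and only show the effect of asking for specific fields.

```python
# Hypothetical record that a REST-style endpoint might return in full.
VIDEO = {
    "videoId": 1,
    "title": "Example Show",
    "description": "A placeholder description.",
    "boxartUrl": "https://example.com/boxart/1.jpg",
    "matchScore": 97,
}


def select_fields(record: dict, requested: list[str]) -> dict:
    """Return only the requested fields, in the order they were asked for."""
    missing = [name for name in requested if name not in record]
    if missing:
        # A GraphQL server would likewise reject fields that are not in the schema.
        raise KeyError(f"unknown field(s): {missing}")
    return {name: record[name] for name in requested}
```

Calling `select_fields(VIDEO, ["title", "matchScore"])` returns `{'title': 'Example Show', 'matchScore': 97}`: the client receives the two fields it asked for and nothing else.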
The principle behind Federation
- First, a large API is broken down into smaller, independent services (microservices), each focused on a specific data domain such as user data or movie data. Each microservice can then be developed, scaled, and updated independently, which increases flexibility.
- A central gateway then collects data from all of these microservices and combines their individual schemas into one unified GraphQL schema for the whole API.
Note: A federated graph service combines multiple GraphQL APIs into a single federated graph, so clients can interact with all of them through a single request.
How is Federation implemented and used at Netflix?
The first screen in the Netflix application is the LOLOMO, short for list-of-list-of-movies: the main page, filled with rows of recommendations and popular TV shows and movies. It is built by fetching data from many microservices, such as:
- A service that returns a list of the top 10 movies or series.
- An artwork service that provides personalised images (such as posters or thumbnails) for each movie.
- A movie metadata service that returns movie titles, actor details, and descriptions.
- A LOLOMO service that decides which lists make up the user's home page, based on the user's preferences and activity.
As the image below shows, each of these microservices is a DGS (Domain Graph Service). DGS is also the name of the in-house framework Netflix developed to build GraphQL services. When Netflix started using GraphQL and federation, no Java framework was mature enough to use at Netflix scale, so they took the low-level graphql-java library and extended it with features such as code generation for schema types and support for federation. At its core, a DGS is just a Java microservice with a GraphQL endpoint and a schema.
For example, the LOLOMO DGS might define a type Show with only a title. The images DGS can then extend that Show type by adding an artwork URL. The two DGSs operate independently and do not share information with each other. Each simply publishes its own schema to the federated gateway, which can communicate with any DGS through its GraphQL endpoint.
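The Show example can be sketched as a toy schema composition step in Python. This is only an illustration under simplifying assumptions: the dictionary schema format, the service names, and the `compose` function are hypothetical. In Apollo Federation, ownership and extension are declared with directives such as `@key`, not inferred from declaration order.

```python
# Each hypothetical DGS publishes only its own slice of the schema:
# a mapping of type name -> {field name: field type}.
LOLOMO_SCHEMA = {"Show": {"id": "ID", "title": "String"}}
IMAGES_SCHEMA = {"Show": {"id": "ID", "artworkUrl": "String"}}


def compose(subgraphs: dict[str, dict]) -> tuple[dict, dict]:
    """Merge subgraph schemas; return (unified schema, field ownership map)."""
    unified: dict[str, dict[str, str]] = {}
    owners: dict[tuple[str, str], str] = {}
    for service, schema in subgraphs.items():
        for type_name, fields in schema.items():
            merged = unified.setdefault(type_name, {})
            for field, field_type in fields.items():
                if field in merged and merged[field] != field_type:
                    raise ValueError(f"{type_name}.{field} has conflicting types")
                merged[field] = field_type
                # Simplification: the first service to declare a field is
                # recorded as its owner.
                owners.setdefault((type_name, field), service)
    return unified, owners


unified, owners = compose({"lolomo": LOLOMO_SCHEMA, "images": IMAGES_SCHEMA})
```

Here `unified["Show"]` contains `id`, `title`, and `artworkUrl`, while `owners` records that `title` comes from the LOLOMO service and `artworkUrl` from the images service, even though neither service knows about the other.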

How Netflix Scales its API with GraphQL?
- It starts by breaking the API apart into chunks that can be developed independently, usually one per domain, each implemented by the experts in that domain. A graph-aware gateway, the central junction in this architecture, then ties them together into a single API.
- The gateway itself contains no business logic. It follows a declarative configuration that tells it which data comes from which service. This is the federation used for scaling.
- A federated architecture usually has three components: graph services, a schema registry, and a graph gateway. The graph services are GraphQL servers. Each one exposes only its portion of the overall schema and publishes it to the schema registry.
- The registry holds the schemas for all of the services. The gateway takes a single query from the client and breaks it into sub-queries that are then executed against the graph services. It processes each request in two phases: query planning and query plan execution.
- Query planning examines the client request and groups the requested fields by the service that provides them. Query plan execution traverses the query plan from the root node, running fetches in parallel or in sequence as required, and merges the results into a single response.
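The planning step described above can be sketched in Python as grouping requested fields by the service that owns them. This is a simplified illustration, not the algorithm used by Apollo's or Netflix's gateway; the ownership map, service names, and `plan_query` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical ownership map the gateway might derive from the schema registry.
FIELD_OWNERS = {
    "recommendedVideos": "recommendations",
    "title": "video-metadata",
    "description": "video-metadata",
    "boxartUrl": "images",
}


def plan_query(root_field: str, child_fields: list[str]) -> list[dict]:
    """Build a two-stage plan: the root fetch, then one fetch per owning service."""
    by_service: dict[str, list[str]] = defaultdict(list)
    for field in child_fields:
        by_service[FIELD_OWNERS[field]].append(field)
    # The root fetch must run first because later fetches need its results.
    root = {"service": FIELD_OWNERS[root_field], "fields": [root_field], "parallel": False}
    # Follow-up fetches go to different services and do not depend on each other.
    followups = [
        {"service": service, "fields": fields, "parallel": True}
        for service, fields in by_service.items()
    ]
    return [root, *followups]


plan = plan_query("recommendedVideos", ["title", "boxartUrl", "description"])
```

The resulting plan has three nodes: a root fetch against the recommendations service, then one fetch for `title` and `description` from the metadata service and one for `boxartUrl` from the images service, both marked as safe to run in parallel.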

Example

Sample GraphQL code corresponding to the above image can be written as:
type Query {
  recommendedVideos(first: Int): [Video]
}

type Video {
  videoId: Int
  title: String
  description: String
  boxartUrl: String
  rating: Rating
  matchScore: Int
  trailer: Video
}

# The enum values are omitted here; a valid GraphQL schema must define at least one.
enum Rating {}
Using GraphQL for the consumer Netflix App
- Consider a really simple graph API. Starting at the root of the graph, which GraphQL calls the Query type, we can fetch the recommended videos for a user and then step through each of those videos.
- The Video type has more fields that we can fetch, such as title and rating. The key takeaway is that the client chooses the properties it wants, and can then follow relationships and recursively select properties from related objects.
- The actual Netflix graph is far more complex than this, with many more fields and relationships. To keep it manageable, it can be broken up into smaller parts.
- With GraphQL Federation, each distinct domain or logically meaningful portion of the graph is served by a different service.
- The API aggregation layer then composes these into a single unified graph. That is the whole idea: a big picture broken into smaller pieces that fit together like a puzzle.
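Following relationships and selecting fields recursively can be sketched in Python as follows. The in-memory data, the rating value, and the `resolve` function are hypothetical; a real GraphQL server resolves fields through its schema and resolvers.

```python
# Hypothetical in-memory graph: a video whose trailer is itself a Video.
VIDEOS = {
    1: {"videoId": 1, "title": "Example Show", "rating": "PG", "trailer": 2},
    2: {"videoId": 2, "title": "Example Show: Trailer", "rating": "PG", "trailer": None},
}

# Fields whose value refers to another Video rather than holding a scalar.
RELATIONS = {"trailer"}


def resolve(video_id, selection: dict):
    """Resolve a selection such as {"title": True, "trailer": {"title": True}}."""
    if video_id is None:
        return None
    video = VIDEOS[video_id]
    result = {}
    for field, sub_selection in selection.items():
        if field in RELATIONS:
            # Follow the edge and apply the nested selection to the target video.
            result[field] = resolve(video[field], sub_selection)
        else:
            result[field] = video[field]
    return result


response = resolve(1, {"title": True, "trailer": {"title": True, "rating": True}})
```

The client decides how deep to go: here it asks for the video's title and, through the `trailer` relationship, the trailer's title and rating, and no other fields are returned.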
Query Plan And Query Plan Execution
- Look again at our initial schema. Suppose we want the top 10 videos for a user and, for each one, the title and box art image to display. We have to fetch the top recommended videos first, because we need their video IDs to know which titles and image URLs to fetch. The query plan is built around that dependency.
- The recommended videos are fetched first. Then, at the same time, titles are fetched from the video service and box art URLs are fetched from the images service. Traversing and executing these fetch nodes is commonly referred to as query plan execution.
- These two fetches run in parallel. In a nutshell, the server parses and validates the query and creates a data retrieval plan based on dependencies: the query plan. Retrieving the data according to that plan, concurrently where possible, and gathering it into the response is query plan execution.
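The sequence-then-parallel execution described above can be sketched with Python's `asyncio`. The three fetch functions are hypothetical stand-ins for calls to separate services and return placeholder data; only the ordering of the steps is the point.

```python
import asyncio


async def fetch_recommended(first: int) -> list[int]:
    """Stand-in for the recommendations service; returns video IDs."""
    await asyncio.sleep(0)  # placeholder for a network call
    return list(range(1, first + 1))


async def fetch_titles(ids: list[int]) -> dict[int, str]:
    """Stand-in for the video service."""
    await asyncio.sleep(0)
    return {i: f"Title {i}" for i in ids}


async def fetch_boxart(ids: list[int]) -> dict[int, str]:
    """Stand-in for the images service."""
    await asyncio.sleep(0)
    return {i: f"https://example.com/boxart/{i}.jpg" for i in ids}


async def execute_plan(first: int) -> list[dict]:
    # Step 1 runs alone: the later fetches depend on the IDs it returns.
    ids = await fetch_recommended(first)
    # Step 2: the two dependent fetches are independent of each other,
    # so they can run concurrently.
    titles, boxart = await asyncio.gather(fetch_titles(ids), fetch_boxart(ids))
    # Step 3: merge the partial results into one response.
    return [{"videoId": i, "title": titles[i], "boxartUrl": boxart[i]} for i in ids]


if __name__ == "__main__":
    print(asyncio.run(execute_plan(2)))
```

`asyncio.gather` is what expresses the parallel branch of the plan. The `await` on `fetch_recommended` before it expresses the dependency on the video IDs.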
Limitations of Federation
- At the time of writing, Apollo Gateway is the only ready-to-use, self-hosted federation gateway implementation; the others are still under development and not fully functional.
- Support for custom directives (instructions within GraphQL that add functionality) is limited:
a. There is no built-in mechanism for federated directives.
b. Per-service directives, if any, are removed by the gateway.
c. Workarounds (temporary solutions) exist but are unsupported.
- Service startup: the "Hello World" scenario assumes that services are already running when the gateway starts, which is not ideal for disaster recovery.
- There are type naming conflicts: a name such as "Service", which is used by the tooling, cannot be used for anything else.
- It does not currently support subscriptions.
Conclusion
In conclusion, Netflix's adoption of GraphQL and the Apollo Federation specification for its API architecture has been instrumental in scaling its services to reach a large user base. By breaking down the API into smaller, independent services and using a central gateway to compose them into a unified graph, Netflix has achieved faster delivery without compromising user experience. Although federation has some limitations, GraphQL and federation have enabled Netflix to handle the growing complexity of its data and relationships, supporting the continued success of its streaming services.