API design for cross-team collaboration

At BenevolentAI each team focuses on a different aspect of the drug discovery process. As an ML-driven company we aim to optimise this process in new and innovative ways using digital technology.

We work across the entire drug discovery and development process, which means the software from the different teams needs to be able to ‘talk’ to each other. We make this happen through well-engineered and user-friendly APIs. If you’re reading this then you probably already know what an API is. But if not, API is an abbreviation of Application Programming Interface, which is defined as:

→ A set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service.

An API is essentially the layer of software that other applications interact with. If you want to pull data from Twitter for your data science project, you will most likely use the Twitter API. Or if you want to build a bot that automatically posts updates to Facebook, then you will need its API as well. 

At Benevolent, most teams provide their software services to other teams by deploying the software in our shared cluster and giving access to it via an API. Good API design is paramount because it prescribes how users interact with your application. The adoption of your software will strongly depend on how easily other developers can understand its API and develop against it.

In this blog we will talk about the different types of APIs we use, the conventions we aim to follow and some other tips that can aid you in building your own API. We have learned a lot about this topic since joining BenevolentAI, and will share our most important lessons with you.

Types

There are various technologies you can adopt to build an API. The most important aspects are the application-layer protocol it travels over - at Benevolent, it’s usually HTTPS - and the format of the messages it sends across - we mostly use JSON, but also plain text for simple APIs, or special formats for large datasets.

The most common API categories we have at Benevolent are described below.


REST API

REST is the most popular architectural style for web services because it relies on the standard HTTP concepts, and thus requires little boilerplate code on the client-side to communicate with them. If you want to quickly build a simple API that you don’t expect to change much in the future, then this is a good option. The conventions for this type of API are well understood by most developers. 

For querying large and highly linked datasets, however, REST is not ideal. More often than not you will receive more data than you need, though API developers can build in additional filters or follow a standard like JSON:API to give the user more control over the output. But still, if your client needs to join multiple datasets together, then a REST API can be fairly inefficient. This is because you likely need to make multiple requests and join the results on the client-side. For these use cases, it is better to provide a GraphQL API.
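To make the cost concrete, here is a minimal sketch of a client-side join. The two functions stand in for hypothetical REST endpoints (something like GET /drugs and GET /targets/{id}); the data, field names and endpoints are invented for illustration and are not one of our actual APIs.

```python
# Hypothetical in-memory stand-ins for two REST endpoints.
DRUGS = [
    {"id": "d1", "name": "drug-a", "target_id": "t1", "notes": "not needed here"},
    {"id": "d2", "name": "drug-b", "target_id": "t2", "notes": "not needed here"},
]
TARGETS = {
    "t1": {"id": "t1", "symbol": "TGT1", "description": "not needed here"},
    "t2": {"id": "t2", "symbol": "TGT2", "description": "not needed here"},
}

request_count = 0  # counts simulated HTTP requests


def get_drugs():
    """Stand-in for GET /drugs: returns full records, wanted or not."""
    global request_count
    request_count += 1
    return [dict(drug) for drug in DRUGS]


def get_target(target_id):
    """Stand-in for GET /targets/{id}: returns one full target record."""
    global request_count
    request_count += 1
    return dict(TARGETS[target_id])


def drug_names_with_target_symbols():
    """Join drugs to targets on the client: one request plus one per drug."""
    rows = []
    for drug in get_drugs():
        target = get_target(drug["target_id"])
        rows.append((drug["name"], target["symbol"]))
    return rows


rows = drug_names_with_target_symbols()
print(rows, "requests made:", request_count)
```

The client only wants two fields per row, yet it downloads every field of every record and makes one extra request per drug.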

GraphQL

With GraphQL you can make your API more efficient than REST for these kinds of queries. The reason is that a GraphQL client can convey in its request exactly which data it is interested in. The API returns only what the client asks for; nothing more and nothing less. Also, because GraphQL allows nested queries, a client often needs only a single request to get the data it needs. Most of our data-providing APIs use GraphQL because of these benefits.
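As a rough sketch of what such a request looks like, the snippet below builds the JSON body that is commonly sent to a GraphQL endpoint via HTTP POST. The schema (a `drug` type with a nested `target` field) and the helper name `build_graphql_request` are assumptions made up for this example.

```python
import json

# Hypothetical schema: a `drug` type with a nested `target` field.
# The query names exactly the fields the client wants, in one request.
QUERY = """
query DrugWithTarget($id: ID!) {
  drug(id: $id) {
    name
    target {
      symbol
    }
  }
}
"""


def build_graphql_request(query, variables):
    """Serialise a query and its variables into a JSON request body."""
    return json.dumps({"query": query, "variables": variables})


body = build_graphql_request(QUERY, {"id": "d1"})
print(body)
```

The nested `target { symbol }` selection replaces the second round trip that a client-side join would need.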

gRPC

We have only recently started to experiment with gRPC-based APIs at Benevolent. A big selling point of gRPC is that you can have a single schema, defined in protobuf, which is shared between the server and all the clients (independent of the programming language). In this sense, it is similar to GraphQL, but it differs in the communication format it uses: gRPC uses protobuf, a binary format that allows much more efficient data serialization and deserialization than the JSON that GraphQL typically uses. For this reason, it is popular in the low-power and resource-constrained IoT domain. Besides that, gRPC allows automatic generation of server and client code (called stubs), so you don’t have to worry about that tedious chore.

gRPC also brings other handy features out of the box, such as pluggable authentication and load balancing. If you are new to gRPC and interested in implementing an API using it, have a look at this technology here.

When making a choice between these web-targeted technologies, one should also think about how flexible their API needs to be. A major trade-off with GraphQL and gRPC versus REST is that even though they are more flexible, they require more design time as a result. REST on the other hand comes with strong conventions that simplify the API design process.

Software libraries

Shared libraries that developers import into their software are a type of API too and can be a viable alternative to web APIs. This type of API is usually the easiest to integrate with because it can make use of all the language concepts that its clients use, such as classes, inheritance and methods. However, a major drawback is that you need to provide libraries in the programming language that your users will be using. In contrast, technologies like REST, GraphQL and gRPC are language-agnostic because server and client implementations exist in most popular programming languages. gRPC is especially good in this respect because of its automatic generation of server and client stubs in different languages.

We encourage you to stick to these API types unless you have a good argument not to. The reason is that other developers will likely have some experience with these types already, and it’s therefore much easier for them to start using your API.

Guidelines

Irrespective of the combination of technologies you decide to use, there are several guidelines considered best practice at Benevolent. Here we have collected some of the guidelines we consider to be most important. They have been categorised into two groups depending on whether they concern the interface design or the supporting implementation.

Interface design

Start small, collect feedback, iterate

APIs are hard to change later because the clients’ code depends on them. For this reason, start with a small and simple specification, such that it is easy to extend later on. You can always add, but not remove without breaking clients. After you have your first users, ask for their feedback, then modify and repeat iteratively.

Design with change in mind

You will not get the interface design perfect the first time. By designing your API in a modular way, you can allow it to be used in different ways in the future.

Keep it simple

When in doubt, leave it out. This cannot be stressed enough. Adding extra functionality that will not be used only steepens the learning curve and can confuse the user. And remember, if you allow it, there will likely be at least one user who will abuse it.

Think in the form of use cases 

Ask what kind of queries your customers will most likely make - get them right! When considering which queries to allow, your starting point should be the goals the user is expected to have. Will your API primarily apply specific calculations on the input, provide access to big data sets, or provide operations for controlling a tool?

Naming is important

Better naming makes the API easier to learn and use. Use the same names as the domain experts use, or what the users of the API will likely use. Use a consistent naming style. For example, it is common to use nouns for resources and verbs for operations, with singular nouns for single resources and plural nouns for collections of resources.

Follow the conventions of your chosen API technology 

Do not violate the principle of least astonishment. Every technology has standard ways of using it. Find out about those and apply them.

Prefer stateless over stateful

Keeping communication with your API stateless makes it easier to reason about. The reason is that as a client you don’t need to keep track of the state of the server. Also, server implementation becomes simpler because it can handle each request in isolation. However, exceptions can be made when statelessness leads to complicated requests or excessive overhead from having to send the full context with every request.
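One place where this shows up is pagination. The sketch below is a stateless take on it, under the assumption that the client passes an explicit offset with every request; the data and the `list_items` handler are hypothetical.

```python
ITEMS = [f"compound-{i}" for i in range(1, 8)]  # hypothetical data set


def list_items(offset=0, limit=3):
    """Stateless handler: the response depends only on the arguments."""
    page = ITEMS[offset:offset + limit]
    # Tell the client where the next page starts, or None at the end.
    has_more = offset + limit < len(ITEMS)
    next_offset = offset + limit if has_more else None
    return {"items": page, "next_offset": next_offset}


first = list_items(offset=0)
second = list_items(offset=first["next_offset"])
print(first, second)
```

Because the server remembers nothing between calls, repeating or reordering requests always gives the same answers.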

Document, document, document

Document every class, resource, function, and endpoint that the users have to interact with. For operations, it must be apparent what the preconditions and postconditions are, and whether they have any side effects.

Preconditions are the input constraints of a function, and postconditions are the output expectations. For RESTful APIs the HTTP methods are an example of expected side effects: GET must not change the state, POST usually creates new resources, PUT is for updating and must be idempotent.
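A small sketch of how these expectations could be written down next to the code. The `CompoundStore` class is a hypothetical in-memory stand-in for a REST resource collection; its methods mirror the HTTP methods rather than any real framework.

```python
import itertools


class CompoundStore:
    """In-memory stand-in for a REST resource collection (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def get(self, item_id):
        """GET: read-only.

        Precondition: item_id exists (otherwise KeyError is raised).
        Postcondition: returns a copy of the item; the store is unchanged.
        """
        return dict(self._items[item_id])

    def post(self, data):
        """POST: creates a new resource on every call (not idempotent).

        Postcondition: returns the id of the newly created item.
        """
        item_id = next(self._ids)
        self._items[item_id] = dict(data)
        return item_id

    def put(self, item_id, data):
        """PUT: replaces the item stored under item_id (idempotent).

        Postcondition: the store holds exactly `data` under item_id,
        however many times the same call is repeated.
        """
        self._items[item_id] = dict(data)

    def count(self):
        """Number of stored items."""
        return len(self._items)


store = CompoundStore()
first_id = store.post({"name": "x"})
second_id = store.post({"name": "x"})  # same payload, second resource
store.put(first_id, {"name": "y"})
store.put(first_id, {"name": "y"})  # repeating PUT changes nothing further
print(store.count(), store.get(first_id))
```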


Tech

Hide the implementation

How the API is implemented should not concern the users. All they need to think about is the contract with your API; what kind of input it accepts and the promises it makes.
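For a software library, one way to sketch this idea is an abstract interface with interchangeable implementations. All names below (`NameLookup`, `DictLookup`, `ListLookup`, `describe`) are hypothetical and only meant to illustrate the principle.

```python
from abc import ABC, abstractmethod


class NameLookup(ABC):
    """The contract: map an identifier to a name, or raise KeyError."""

    @abstractmethod
    def name_of(self, identifier):
        """Return the name registered for identifier."""


class DictLookup(NameLookup):
    """Implementation backed by a dictionary."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def name_of(self, identifier):
        return self._mapping[identifier]


class ListLookup(NameLookup):
    """Implementation backed by a list of (identifier, name) pairs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def name_of(self, identifier):
        for key, name in self._pairs:
            if key == identifier:
                return name
        raise KeyError(identifier)


def describe(lookup, identifier):
    """Client code: relies only on the NameLookup contract."""
    return f"{identifier}: {lookup.name_of(identifier)}"


print(describe(DictLookup({"d1": "drug-a"}), "d1"))
print(describe(ListLookup([("d1", "drug-a")]), "d1"))
```

The client function never learns which implementation it is talking to, so either one can be swapped out without breaking it.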

Generate documentation automatically

By making code documentation generation part of your CI (Continuous Integration) pipeline, you guarantee that the documentation is always up-to-date and aligned with the code. Depending on the chosen technology there are different ways to document. For example, for REST APIs Swagger is the standard. For GraphQL APIs you can use GraphDoc.

Fail gracefully, provide useful errors

Errors are practically unavoidable, be it on the client side, the server side or somewhere in between. For that reason, your error handling needs to minimise the negative impact of anything that can happen unexpectedly. There is much to be said about proper handling of errors, but generally, it’s a risk assessment that you, as a developer, need to make for each type of error to decide the best way to handle it.

However, no matter how you choose to treat errors, communicate clearly to your user what went wrong and what they can do to solve it (if anything). Preferably use error codes or unique error names so that clients can distinguish between different errors and handle them accordingly.
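A minimal sketch of what this could look like for a JSON-based web API. The error class names, the error codes and the shape of the response body are assumptions chosen for illustration, not a standard.

```python
import json


class ApiError(Exception):
    """Base class for errors reported to clients (hypothetical)."""

    code = "INTERNAL_ERROR"
    http_status = 500

    def __init__(self, message, hint=None):
        super().__init__(message)
        self.message = message
        self.hint = hint  # optional advice on how to fix the problem


class CompoundNotFound(ApiError):
    """A distinct error type with its own stable code."""

    code = "COMPOUND_NOT_FOUND"
    http_status = 404


def to_response(error):
    """Turn an ApiError into an (HTTP status, JSON body) pair."""
    body = {"error": {"code": error.code, "message": error.message}}
    if error.hint:
        body["error"]["hint"] = error.hint
    return error.http_status, json.dumps(body)


status, body = to_response(
    CompoundNotFound(
        "No compound with id 'c42'.",
        hint="Check the id, or list the available compounds first.",
    )
)
print(status, body)
```

Clients can branch on the stable `code` value, while the message and hint are there for the human reading the logs.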

Provide a staging environment

In Benevolent, the convention is to provide a development (dev), staging (stg) and production (prd) instance of APIs. Development is meant just for the service API team to test. Staging allows other developers to test their clients against a future version of the API before it’s pushed to production. Production should be stable at all costs because that is what other production-level products will depend on.

Use semantic versioning for the API

Semantic versioning (semver) mandates that minor version updates must not break clients that depend on the software. Between major versions, on the other hand, compatibility does not have to be guaranteed. Applied to API versioning, this rule means that clients keep working across minor updates.
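The rule can be sketched as a small compatibility check. This is a simplification that assumes plain MAJOR.MINOR.PATCH strings and ignores pre-release tags and the special treatment of 0.x versions; the function names are our own.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(client_built_against, server_version):
    """Should a client built against one version work with the server?

    Under semver that holds when the major versions match and the
    server is not older than the version the client was built against.
    """
    client = parse_version(client_built_against)
    server = parse_version(server_version)
    return server[0] == client[0] and server >= client


print(is_compatible("2.3.0", "2.5.1"))  # minor update on the server
print(is_compatible("2.3.0", "3.0.0"))  # major update on the server
```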

Also, don’t change the major version invisibly. For example, if your API is hosted at a certain URL, do not deploy a new major version to that URL which can break the clients (unless it’s dev or stg).

If you get into the situation where you need to support multiple major API versions, it is better to have them deployed simultaneously and require the user to explicitly choose one. Of course, aim to minimize the number of API versions you support. If you expect to be in this situation in the future, it is good practice to start versioning with v1 from the beginning.

For web APIs a good URL structure is:

http://api.{product_name}.{domain}/v{major_version_number}/

E.g. http://api.myproduct.mydomain.com/v3/

This way you can support multiple versions at the same time.
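On the server side, serving several major versions side by side could look roughly like the sketch below. The handlers, their response shapes and the `route` function are hypothetical; a real service would use the routing of its web framework.

```python
def handle_v1(resource):
    """Hypothetical v1 handler with the original response shape."""
    return {"version": 1, "resource": resource}


def handle_v2(resource):
    """Hypothetical v2 handler with a breaking change in the shape."""
    return {"api_version": 2, "data": {"resource": resource}}


# Both major versions stay deployed; the client picks one in the URL.
HANDLERS = {"v1": handle_v1, "v2": handle_v2}


def route(path):
    """Dispatch '/v{major}/{resource}' to that major version's handler."""
    version, _, resource = path.lstrip("/").partition("/")
    handler = HANDLERS.get(version)
    if handler is None:
        raise LookupError(f"Unsupported API version: {version!r}")
    return handler(resource)


print(route("/v1/compounds"))
print(route("/v2/compounds"))
```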

Version the software

Besides versioning the API, consider versioning the software that drives it as well. Sometimes the behaviour of the API can change due to software changes, while this is not captured in the API version. By keeping track of CI pipeline ID or the git commit SHA, and exposing that to the user, it is possible to trace back to the specific software version. For software libraries, the API version and the software version are usually the same.
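One way to expose this is a small version payload that the API can return on request. In this sketch the environment variable names (`GIT_COMMIT_SHA`, `CI_PIPELINE_ID`), the version number and the idea of a dedicated version endpoint are all assumptions; use whatever your CI system provides.

```python
import os

API_VERSION = "3.1.0"  # hypothetical API version


def version_info(environ=os.environ):
    """Build the payload for a hypothetical version endpoint.

    The variables are assumed to be set by the CI pipeline at build or
    deploy time; 'unknown' is reported when they are missing.
    """
    return {
        "api_version": API_VERSION,
        "git_commit": environ.get("GIT_COMMIT_SHA", "unknown"),
        "pipeline_id": environ.get("CI_PIPELINE_ID", "unknown"),
    }


info = version_info({"GIT_COMMIT_SHA": "0a1b2c3", "CI_PIPELINE_ID": "1234"})
print(info)
```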


Final words

What is so interesting about APIs is that they have to be understandable by both humans and machines at the same time. This means that they should be intuitive and user-friendly, but also formal and strict. This presents the developer with the challenging task of making their API both a joy to use and well-engineered, as this will determine its popularity with other developers. Hopefully, this blog has given you some insight into how we do APIs at Benevolent. Now go build awesome APIs yourself!

About Martin // I’m a software engineer at BenevolentAI, where my work mostly revolves around building data infrastructure for the knowledge graph and chemistry optimisation. linkedin.com/in/mthaak

About Nik // I'm a full-stack software engineer at BenevolentAI working on tools for drug discovery that integrate with lots of internal APIs. linkedin.com/in/nikhaldi

Further reading

How To Design A Good API and Why it Matters - this is an old but still very relevant presentation 

https://www.youtube.com/watch?v=aAb7hSCtvGw 

Google’s REST and gRPC guidelines

https://cloud.google.com/apis/design