Series of articles: defence in depth part 4 of 7

Secure APIs

The first three articles were about designing and obtaining an access token. We have also established the pattern for how we move from identities and scopes to entitlements on which we base all further access control.

In this article, we discuss what you need to do in your API implementation to protect your functions and data.

A strong defence in depth according to the principle of "Least privilege" means that we need to limit the rights of each call to an absolute minimum. We implement a strong and fine-grained six-step permission control:

1. Validate that the call is correct
2. Validate that the access token is correct
3. Transform access token to rights model
4. Validate that the data in the call is correct
5. Validate authority to perform the operation
6. Validate access to the data requested or affected

We go from a correct, verified access token (e.g. a JWT) to an object that represents our rights in the system. Using this rights model, we implement a strong fine-grained access control in steps 4, 5 and 6.

Note that in this model we have omitted what should be in front of your API in terms of infrastructure protection, e.g. a firewall, WAF, API gateway or similar. For example, a WAF might do an initial input validation for known injection attacks and we might do a basic permission check based on IP or trusted device.

The two do not exclude each other; they complement each other. It is important that our API security does not rely solely on the protective layers in front of it. We build defence in depth where our API can withstand public exposure, following the Zero Trust principle.

We can make similar arguments for steps 4 and 5, where the order between them may vary depending on the framework etc. For example, it is common to do a basic input validation early and then a deeper validation in your domain logic. The important thing is that this is done and that incorrect calls are aborted early, without consuming unnecessary system resources.

What we want to emphasize with the model is that these six steps are needed for a strong and fine-grained permission control, and that they are implemented with an enforced pattern so that the core, your domain logic, is never exposed without permission control.

Kasper Karlsson

When I perform penetration testing, failures in these steps are among the first things I test for, and it very often pays off. It's also a vector where I can extract data by going directly to the service instead of, say, going through another user's account.

Validate that calls are correct (step 1)

Validating the HTTP call itself may not be something you think about as a developer, but it is an important aspect of a secure API. If we choose a good web server, with secure basic settings, we get this for free. Examples of verification are that the call is in the correct format and is of a reasonable size. Some products may do a deeper analysis and reject a call that contains data that could be considered malicious, for example.

Validate that the access token is correct (step 2)

Verification of the access token included in the call should be done using the framework on which we build our API. You should only configure here, not implement this check yourself. Using JWT as an example, the framework needs to verify:

Correct cryptographic signature
Signed by the correct IdP (usually configured as a URL to our IdP)
Issued with an audience that applies to our API
Correct type, i.e. an access token and not something else
Valid, e.g. with regard to time
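
To make the list concrete, here is a minimal sketch of the non-cryptographic checks, assuming the framework has already verified the signature. It is an illustration only; in a real API these checks are configured in the framework, not hand-written. The issuer URL, the audience name and the "at+jwt" type value are assumptions for the example.

```python
import time

# Hypothetical configuration values; in a real API these come from settings.
EXPECTED_ISSUER = "https://idp.example.com"
EXPECTED_AUDIENCE = "products-api"
EXPECTED_TYPE = "at+jwt"  # "typ" value for JWT access tokens in RFC 9068 (assumed here)


def claim_errors(header, claims, now=None):
    """Return a list of reasons to reject the token; an empty list means accept.

    Assumes the cryptographic signature has ALREADY been verified.
    """
    now = time.time() if now is None else now
    errors = []
    if claims.get("iss") != EXPECTED_ISSUER:
        errors.append("wrong issuer")
    aud = claims.get("aud")
    # "aud" may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else aud if isinstance(aud, list) else []
    if EXPECTED_AUDIENCE not in audiences:
        errors.append("wrong audience")
    if str(header.get("typ", "")).lower() != EXPECTED_TYPE:
        errors.append("wrong token type")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        errors.append("expired")
    nbf = claims.get("nbf")
    if isinstance(nbf, (int, float)) and now < nbf:
        errors.append("not yet valid")
    return errors
```

Any non-empty result would lead to a 401 Unauthorized.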

Best practices for validating a JWT are described in RFC 8725 (https://tools.ietf.org/html/rfc8725). There are also other types of tokens than JWT. For example, if it is a reference token, the reference first needs to be resolved to the token's content by looking it up against the IdP (called "token introspection").

For higher security systems, access tokens bound to the client are often used, commonly through mTLS. If your access token is certificate bound, the binding to the client's certificate must be validated.

In order to verify the signature, we need key material from our IdP. How we get this may differ. It is common to look it up from the IdP as a "JSON Web Key Set" (JWKS). This is a standard format in which the IdP publishes the public part of its signing keys. You can also choose to install the key material on the same machine that our API is running on.

An important aspect is that the IdP can rotate its key material. The vast majority of IdP products support JWKS, and in practice we find that a good solution is for our API to restart daily and make a lookup at startup.
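
As a variation on the daily restart (not the approach described above), a key cache can also refetch when it meets an unknown key id. The sketch below assumes a hypothetical fetch_jwks callable that returns the IdP's key set as a dictionary; the class name and the throttle interval are likewise assumptions.

```python
import time


class SigningKeyCache:
    """Caches the IdP's public keys by key id ("kid")."""

    def __init__(self, fetch_jwks, min_refresh_seconds=60, clock=time.monotonic):
        self._fetch_jwks = fetch_jwks  # hypothetical callable returning a JWKS dict
        self._min_refresh = min_refresh_seconds
        self._clock = clock
        self._keys = {}
        self._last_refresh = None

    def _refresh(self):
        jwks = self._fetch_jwks()
        self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}
        self._last_refresh = self._clock()

    def get(self, kid):
        """Return the key for kid, refetching at most once per interval."""
        if kid not in self._keys:
            refreshed_recently = (
                self._last_refresh is not None
                and self._clock() - self._last_refresh < self._min_refresh
            )
            # The throttle stops tokens with made-up kids from flooding the IdP.
            if not refreshed_recently:
                self._refresh()
        return self._keys.get(kid)  # None means: reject the token
```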

Normally, your framework should return a 401 Unauthorized if the call is made with an invalid access token.

Transform access token to rights model (step 3)

After we have validated our access token, we can transform the information it contains into an object of our rights model. The object contains all the information (rights) that we need in our domain logic to find out if the user has the right to the requested function and data. From this point on, we only work with our rights object, not with information from our access token.

See article 2, Claims-based access control for details on how to implement a transformation.
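
A minimal sketch of such a transformation could look like the following. The claim names "sub", "org" and "scope", the scope values and the Permissions fields are all assumptions for illustration; use whatever your IdP issues and your domain needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permissions:
    """Rights object used by the domain logic instead of raw token claims."""
    user_id: str
    organization_id: str
    can_read_products: bool
    can_write_products: bool


def permissions_from_claims(claims):
    """Map already validated token claims to a Permissions object."""
    scopes = set(str(claims.get("scope", "")).split())
    return Permissions(
        user_id=str(claims["sub"]),
        organization_id=str(claims["org"]),
        can_read_products="products.read" in scopes,
        can_write_products="products.write" in scopes,
    )
```

The object is immutable, so code further down the call chain cannot grant itself additional rights.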

Validate that the data in the call is correct (step 4)

In this step, we validate the data sent by the user in the call against our domain model. For example, we could check that a phone number has the correct format. If this validation fails, we always return a 400 Bad Request.

This early validation gives us a valuable opportunity to reject and log incorrect calls. A secure system does not normally generate any violations of this type of input validation. An automated alert based on logging of erroneous calls can catch many attack attempts and alert you that someone is trying to break into your system.

If you are implementing your API in a strongly typed language, it is a powerful pattern to choose a type that minimizes the problem of injection. Instead of declaring parameters in your API as strings, you might want to use integers, booleans, etc. This reduces the possibility of an attacker passing in values in order to break out of the intended function and change the meaning of the call.

Note that validation of input data is not a complete protection against injection attacks. There are many examples of situations where we need to allow the user to attach data without restrictions. Ultimately, data needs proper output encoding for the situation in which it will be used.

For example, consider a call to /api/products/1 where the attacker, instead of the value 1, sends a string containing an injection attack. If the product id parameter is declared as an integer in our API, the framework handles the validation for us. If the product id is declared as a string instead, we need to verify ourselves that the value contains, for example, only digits.
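
Where the framework does not do this for us, the check can be sketched as below. The function name and the limit of nine digits are assumptions for the example.

```python
import re

# ASCII digits only; the length limit keeps the value in a reasonable range.
_PRODUCT_ID = re.compile(r"[0-9]{1,9}")


def parse_product_id(raw):
    """Return the product id as an int, or raise ValueError (400 Bad Request)."""
    if not isinstance(raw, str) or _PRODUCT_ID.fullmatch(raw) is None:
        raise ValueError("product id must be 1 to 9 digits")
    return int(raw)
```

From this point on, the rest of the code handles an integer and never sees the attacker's string.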

Input validation is a layer of our defence in depth. A strong pattern for reducing attack vectors is to move to domain primitives as early as possible, so that your domain logic only works with domain-specific types.

Good examples of this deep input validation are customer ID and quantity. A customer ID is not an unbounded string that can contain any characters. In your domain, a customer ID might consist of three uppercase letters and three digits. Similarly, a quantity is not an unbounded integer, but a number from 1 to 100. This makes it impossible to place an order for, say, minus one or several thousand items.
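
The two rules above can be sketched as domain primitives that cannot be constructed with invalid values. The class names CustomerId and Quantity are hypothetical, and this is one possible shape, not the book's own code.

```python
import re
from dataclasses import dataclass

_CUSTOMER_ID = re.compile(r"[A-Z]{3}[0-9]{3}")


@dataclass(frozen=True)
class CustomerId:
    value: str

    def __post_init__(self):
        # Three uppercase letters followed by three digits, nothing else.
        if not isinstance(self.value, str) or not _CUSTOMER_ID.fullmatch(self.value):
            raise ValueError("invalid customer id")


@dataclass(frozen=True)
class Quantity:
    value: int

    def __post_init__(self):
        # bool is a subclass of int in Python, so require exactly int.
        if type(self.value) is not int or not 1 <= self.value <= 100:
            raise ValueError("quantity must be an integer from 1 to 100")
```

Domain functions that accept CustomerId and Quantity instead of str and int can then rely on the values being valid.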

This pattern is taken from the book Secure By Design and limits the risk of injection attacks but also of denial of service or failure due to corrupt data. Security is also quality and availability. https://www.manning.com/books/secure-by-design

Validate authorisation to perform the operation (step 5)

In this step, we validate whether the user is allowed to perform the operation, without fetching data or using other resources that create load on the system. If these rights requirements are not met, we reject the call immediately with a 403 Forbidden.

For example, consider a call to /api/products/1 that requires the right to read products and that the user is in the "users" role. We can return a 403 Forbidden early on if either of these is not true, without having to do any lookup in the products database.
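
A minimal sketch of this ordering, where the rights object is reduced to the two fields needed here. The names Permissions, Forbidden, get_product and the load_product callable are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permissions:
    # Hypothetical rights object produced in step 3.
    can_read_products: bool
    roles: frozenset


class Forbidden(Exception):
    """Mapped to 403 Forbidden by the API layer."""


def authorize_read_product(permissions):
    """Step 5: check the right to the operation before touching any data."""
    if not permissions.can_read_products or "users" not in permissions.roles:
        raise Forbidden("not allowed to read products")


def get_product(permissions, product_id, load_product):
    authorize_read_product(permissions)  # may raise Forbidden
    return load_product(product_id)      # only reached when authorized
```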

Validate access to data (step 6)

In the last step, we have normally done a lookup for the data to which the call applies, either for it to be returned, or for it to be modified or deleted. For example, at this stage we can verify that the data to be returned belongs to the user's organization or role.

Sometimes it is easy to determine whether the user is entitled to the data requested. In other cases, complex domain logic is required. An example is search functions where we may need to do the permission check late, after the data has been read.

If this verification fails, we usually have two return codes to choose from. 404 Not Found is appropriate if we do not want to reveal that the requested data exists, for example when it belongs to another organization. 403 Forbidden may be appropriate if we want to tell the client that the data exists but is not available with the user's rights.

The actual access to data, for example in a database, should be done with a database account that has the most limited access possible. If an API only needs to read parts of the data contained in a database, it should use an account that is restricted to that purpose. "Least privilege" applies to both user accounts and system accounts.

For example, consider a call to /api/products/1 where the user does not have rights to that particular product. We can choose to return a 404 Not Found in that case.
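
A minimal sketch of this check, assuming products carry an organization id and that load_product is a hypothetical lookup returning a dictionary or None:

```python
class NotFound(Exception):
    """Mapped to 404 Not Found by the API layer."""


def get_product_for(organization_id, product_id, load_product):
    """Step 6: return the product only if it belongs to the caller's organization."""
    product = load_product(product_id)
    if product is None or product["organization_id"] != organization_id:
        # Same error for "missing" and "not yours", so existence is not revealed.
        raise NotFound(f"product {product_id}")
    return product
```

Raising the same error in both cases means the client cannot tell a missing product from one belonging to someone else.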

Note also that access control for data applies even when the call is a command to be executed and no data is returned. A simple example is HTTP DELETE to /api/products/1 where we also need to verify that the user has permission to that particular product.

In our experience, it is a very common error not to verify that the user is entitled to the data returned or affected by a command.

Martin Altenstedt

I often see GUIDs used as IDs and it is argued that this is sufficient protection because it is difficult to guess a GUID. That may be true, but a GUID is rarely cryptographically secure and often the value is a direct reference to the object. If that value is distributed, it cannot be revoked, and access to the object is no longer controlled.

Logging and error handling

Attackers often take advantage of unexpected behaviour in the system. Typically, these vulnerabilities result from flaws in error handling that can put the system into an undefined state. For a secure API it is important to have full control over how the application behaves. Part of this is a clear and consistent pattern of error handling and logging.
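
One way to get a consistent pattern is a single place that maps every outcome to a status code. The sketch below is framework-independent; the exception classes and the handle function are hypothetical names, and most frameworks offer their own hook for the same purpose.

```python
import logging
import uuid

log = logging.getLogger("api")


class BadRequest(Exception):
    status = 400


class Forbidden(Exception):
    status = 403


class NotFound(Exception):
    status = 404


def handle(operation):
    """Run operation() and return (status, body) without leaking internals."""
    try:
        return 200, operation()
    except (BadRequest, Forbidden, NotFound) as err:
        log.warning("rejected call: %s", type(err).__name__)
        return err.status, {"error": type(err).__name__}
    except Exception:
        error_id = str(uuid.uuid4())
        # The stack trace goes to the log only; the client gets a reference id.
        log.exception("unhandled error %s", error_id)
        return 500, {"error": "internal error", "error_id": error_id}
```

The client never sees exception messages or stack traces, while the error id lets operations find the matching log entry.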

Central logging of errors in the system, combined with the use of the correct return status, is a prerequisite for detecting and responding to intrusion attempts. A system that in normal operation does not generate errors and returns 200-series HTTP status codes allows us to configure automated alarms for unexpected events.

For a system consisting of several APIs, it is easier if the logging is also centralised and correlated. This means that there is a central place where we can see the logs for the whole system, and also follow the thread of calls when one API makes calls to another, in a chain.
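
Correlation is often done by passing an id along the chain of calls and writing it to every log line. A minimal sketch with the standard library follows; the header name X-Correlation-ID and the allowed id format are assumptions, not a standard.

```python
import contextvars
import logging
import re
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default="-")
_SAFE_ID = re.compile(r"[A-Za-z0-9-]{1,64}")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation id to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def begin_request(headers):
    """Reuse the caller's id if it is well-formed, otherwise start a new chain."""
    incoming = headers.get("X-Correlation-ID", "")
    # The header is untrusted input; never write it to the log unchecked.
    cid = incoming if _SAFE_ID.fullmatch(incoming) else str(uuid.uuid4())
    correlation_id.set(cid)
    return {"X-Correlation-ID": cid}  # headers to pass on to downstream APIs


handler = logging.StreamHandler()
handler.addFilter(CorrelationFilter())
handler.setFormatter(
    logging.Formatter("%(asctime)s %(correlation_id)s %(levelname)s %(message)s")
)
```

Adding the handler to the API's logger makes every line searchable by the same id in the central log.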

Today there are many good products on the market for centralised logging. An important feature that some of them offer is anomaly detection, i.e. automated alerting if we have deviations from the normal pattern of how our system works. This can be difficult to build on your own, and it provides a good means of detecting intrusion attempts.

Tobias Ahnoff

Although we should always avoid logging sensitive data, such as personal identification numbers or access tokens, logs are often sensitive data sources and therefore need to be protected in the same way as business data. There is a very high risk that personal data, for example, could end up in logs in one way or another.

If you perform an independent penetration test on your system, take the opportunity to verify that your logging solution detects the intrusion attempts at the same time!

Kasper Karlsson

In penetration testing, I commonly find applications that leak sensitive data when they fail, such as stack traces or even connection strings.

Martin Altenstedt

Remember to log for the right purpose and at the right level. Otherwise we risk missing important logs. What you need as a developer during development may not be the same as what you need for operational monitoring.

Summary

The article series so far has discussed how to design your system to implement strong access control. We have also highlighted the importance of input validation and centralised logging.

In the next article, we will look at infrastructure, such as transport layer protection and data storage.

An in-depth look at this article, with an example implementation, can be found in the companion article "Secure APIs in ASP.NET".