Azure API Management performance testing with Locust

Azure API Management (APIM) is one of the main integration components in today's API-driven world. It's a platform for abstracting API details from client applications, making them more resilient to change.

In its most basic form, APIM just passes the request from client to API, but in many cases something needs to be done at the APIM level to validate, handle, adjust or fix something in the flow. This can be done using a policy. In APIM, policies can be as simple as validating the subscription key, or as complex as completely reconstructing the request or response body.
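
As a minimal illustration (a sketch, not part of the setup tested later in this post; the header name is hypothetical), an inbound policy could add a header to every request before it's forwarded to the backend:

<policies>
    <inbound>
        <base />
        <!-- Hypothetical header, just to illustrate a simple inbound policy -->
        <set-header name="X-Request-Source" exists-action="override">
            <value>apim-gateway</value>
        </set-header>
    </inbound>
</policies>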

It is tempting to put all kinds of logic in policies, but we have to be careful with this. For one, we don't want to turn APIM into an ESB, centralizing all integration logic in the API management layer. Another concern is the impact on APIM's performance when doing too much processing in policies.

This blog post takes a look at the impact of policies on the capacity of APIM.

APIM performance aspects

APIM is a SaaS solution, which means you ‘rent a tier’ and pay for the capacity. As the pricing differs quite a bit between tiers, it's important to select the right one for your needs.

  • Consumption -> capacity n/a
  • Developer -> 500 req/sec (€40.51 per month)
  • Basic -> 1000 req/sec (€124.11 per month)
  • Standard -> 2500 req/sec (€579.11 per month)
  • Premium -> 4000 req/sec (€2357.17 per month)
  • Isolated -> 4000 req/sec (in preview, not yet known)

Of course, capacity is not the only criterion for choosing a tier; VNET support might be more important, for example. However, if you're not aware of the performance implications of the policies you plan to use, you might select the wrong tier.

If you're lucky, you find out during performance tests before going live; otherwise you need to fix it on the spot while already in production. Fixing performance issues in APIM means either scaling up to a higher tier or scaling out by increasing the number of units. Both solutions have a pricing impact, although moving a tier up is more expensive than adding a unit. Besides pricing, moving to a higher tier also changes the static IP address APIM uses, which you might depend on in your firewall rules. That makes it a more complex and impactful change once you're in production.

Another thing that impacts APIM performance is enabling diagnostics: logging all requests and responses to Application Insights will hurt performance. For this blog post, however, the focus is on policies.

Baseline setup

To be able to see the impact of policy changes, we need to start with a baseline test. We use the APIM Basic tier, the lowest tier with an SLA, which is expected to handle 1000 requests per second.

Locust

For load testing we use Locust, an awesome tool I stumbled upon while looking for an API load test tool. The beauty of this tool is its simplicity, and the fact that you can run it as a set of containers to generate the (massive) load you need but can't generate locally. Heyko Oelrichs (Customer Architecture & Engineering at Microsoft) is also enthusiastic about it and wrote this blog post about it. He created a Terraform script which creates a master and a configurable number of worker containers in Azure.

We need the capacity to generate at least 1000 requests per second, so we set up Locust with:

  • 1 master container, the orchestrator
  • 6 worker containers, as one worker can fire around 300-350 requests/sec

After terraform apply, you'll find the resources in the Azure resource group you specified on the command line. The naming of the resources looks a bit odd, but Terraform can generate unique names based on animal names, an alternative to GUID-like names, which are far less readable.

Azure resource group

As specified in Heyko's blog post, you can find a generated password in Key Vault, which you need to access the portal on the master (username ‘locust’). In my case the portal can be found here: http://powerfulcrawdad-locust-master.westeurope.azurecontainer.io:8089/. In the portal it's just a matter of specifying the number of users, the ramp-up rate and the target host before you start swarming.

Locust master configuration

When you click start, a script is run against the host. This script is defined in a Python file, where you can do whatever you need. You can let it run a single call, or compose a full set of steps to, for instance, simulate what a user would do.

In our case, we keep it very simple: a script that does an HTTP GET with a variable delay between calls:

from locust import HttpUser, task, between

class QuickstartUser(HttpUser):
    # Each simulated user waits between 0.5 and 2.5 seconds between tasks
    wait_time = between(0.5, 2.5)

    @task(1)
    def get_test(self):
        # Plain GET against the APIM operation, reported as "GET test" in the Locust stats
        self.client.get("/test-api/api", headers={"Ocp-Apim-Subscription-Key": "986a7087f277449ea95b77176b02b173"}, name="GET test")

With Locust ready, we need to set up APIM with a basic GET operation. In APIM we do as little as possible, policy-wise, to see how close we get to the 1000 req/sec:

  • plain GET operation
  • API operation level policy to immediately return HTTP 200 to the caller

APIM GET scenario

<policies>
    <inbound>
        <return-response>
            <set-status code="200" />
            <set-body>Hello from return-response policy</set-body>
        </return-response>
        <base />
    </inbound>
    ......
</policies>
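
A quick way to verify this policy behaves as expected, before firing the full load, is a single request from Python (a sketch; the gateway URL and subscription key are placeholders to replace with your own):

import requests

# One-off sanity check against the APIM gateway (fill in your own values).
# The return-response policy should answer immediately with HTTP 200, no backend involved.
resp = requests.get(
    "https://<your-apim-instance>.azure-api.net/test-api/api",
    headers={"Ocp-Apim-Subscription-Key": "<your-subscription-key>"},
)
print(resp.status_code, resp.text)  # expected: 200 Hello from return-response policy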

Baseline capacity

To see the number of requests APIM can handle, we look at the Application Insights Live Metrics, which give a live overview of the platform's performance. In addition, we use the Locust charts to see how things behave from the client's point of view.

We're going to run the test with 2500 users, leading to the following Locust charts for number of requests/sec, response times and number of concurrent users.

Locust chart for 2500 users

As you can see in the top chart, the number of requests per second ramps up and APIM is capable of handling this load. It shows around 1400 requests per second, but the red line in the top chart shows connection errors occur every now and then (‘Connection aborted.’ / ‘Connection reset by peer’). The number of failures is low, but it suggests the load fired at APIM is about what it can handle.

Below are the live metrics graphs from APIM. In the bottom section you can see we have two servers in our Basic tier; this is needed to meet the SLA (on the Developer tier you only get one).

APIM live metrics for 2500 users

The CPU utilization of both servers reaches 100%, also indicating APIM is close to its limits. The request duration of around 1 ms is very fast, but that's easy to explain with a policy that returns the response directly. Although the incoming requests graph fluctuates, the number of requests per second exceeds the promised 1000 req/sec for the Basic tier.

So far so good: we confirmed the Basic tier indeed delivers what it promises. Next up is finding out the impact of policy changes.

End-to-end capacity

In the baseline results we saw APIM reach around 1400 req/sec with a policy where the request wasn't sent to a backend API. What if we do link a backend API to APIM; what would that mean for the capacity? In our case we add an Azure Function as backend, which just returns a small response when receiving an HTTP GET.
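
To give an impression of such a backend, here is a minimal sketch using the Azure Functions Python v2 programming model (the Function used in this test may differ in detail):

import azure.functions as func

app = func.FunctionApp()

# Minimal HTTP-triggered Function that just returns a small static response
@app.route(route="api", auth_level=func.AuthLevel.ANONYMOUS, methods=["GET", "POST"])
def test_api(req: func.HttpRequest) -> func.HttpResponse:
    return func.HttpResponse(
        '{"message": "hello from the backend"}',
        mimetype="application/json",
        status_code=200,
    )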

APIM GET with backend API scenario

In the Locust chart for this configuration, the response times show a spike at the beginning. This is due to the cold start of the Azure Function, which runs on a Consumption plan.

Locust chart GET with backend API scenario

More interesting is the APIM side of things, where the live metrics show a noticeable drop in requests per second. The screenshot shows around 700 requests per second, while the CPU utilization is almost at 100%. This indicates APIM is at the maximum of what it can handle in this scenario.

APIM GET with backend API scenario

It's also interesting to compare the request duration between this test and the previous one. Having a real API as backend obviously increases the response times, but the difference is a factor of 1000. This is partially due to the Consumption plan the Function runs on, but as that scales with demand, it's clear we no longer achieve the 1000 req/sec on the Basic tier.

Request/Response validation

As a gatekeeper, APIM is important for making sure invalid requests don't reach the backend API. Invalid requests can be generated accidentally, but also on purpose, in a deliberate attempt to damage or hack your backend.

Besides validation as a security measure, when backend APIs only receive valid requests, the capacity reserved for the API can be optimized, which means lower costs.

Recently, Microsoft added policies to validate requests and responses. This includes content, parameter, header and response code validation. But what is the impact of request validation on APIM? If it requires more capacity (= more costs) on the APIM side, is it worth using these policies?

To see the impact of request validation, we configured this piece of validation policy:

<validate-content unspecified-content-type-action="prevent" max-size="200" size-exceeded-action="prevent" errors-variable-name="requestBodyValidation">
    <content type="application/json" validate-as="json" action="prevent" />
</validate-content>

This policy returns HTTP 400 Bad Request (as a result of action="prevent") when:

  • the request body is over 200 bytes (max-size attribute)
  • the content-type is not application/json
  • request body doesn't pass validation based on the request definition (OpenAPI spec)
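
To get a feeling for this behavior, you can trigger the policy by hand with a couple of invalid requests. Below is a sketch with a placeholder URL and key; whether the text/plain call is blocked depends on the content types specified in your API definition.

import requests

url = "https://<your-apim-instance>.azure-api.net/test-api/api"
headers = {"Ocp-Apim-Subscription-Key": "<your-subscription-key>"}

# Body larger than 200 bytes: rejected by max-size with HTTP 400
resp = requests.post(url, json={"name": "x" * 300}, headers=headers)
print(resp.status_code)  # expected: 400

# Content type not specified in the API definition: rejected by
# unspecified-content-type-action with HTTP 400
resp = requests.post(url, data="name=test", headers={**headers, "Content-Type": "text/plain"})
print(resp.status_code)  # expected: 400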

To see the impact of this validation on the capacity, we add the following task to the Locust script, firing POST requests from the workers:

    @task(1)
    def post_test(self):
        # Valid JSON body, well under the 200-byte limit, so it passes the validate-content policy
        self.client.post("/test-api/api", headers={"Ocp-Apim-Subscription-Key": "986a7087f277449ea95b77176b02b173"}, json={ "name": "test-apim-policy-performance", "type": "performance", "status": "scheduled", "nbr": 3 }, name="POST test")

This request body is valid and will pass the policy in APIM, after which the request is forwarded to the backend Function. It's interesting to see what content validation of this rather small request body does to the capacity for handling client requests.

Locust chart request body size validation test

The live metrics from APIM show why Microsoft warns about the performance implications of the content validation policy. The number of requests APIM is able to handle drops from around 700 to less than 500 per second.

APIM request body size validation test

Content validation is an important aspect of an integration platform like APIM, but the impact of having it in a policy seems rather big. You can imagine that having this in each of your APIs will degrade the capacity severely. So you have to find a balance between APIM capacity and your validation requirements. Alternatively, you might want to move validation to the API level, as that might be cheaper.

Final thoughts

What is the performance impact of having logic in an APIM policy?

It's a best practice not to put too much logic in APIM, because it's hard to (unit) test, debug and maintain. Most of the time it's better to put the logic you need in an Azure Function, where testing, debugging and maintenance are much easier.

However, content validation and authorization token validation are typically things you would put in an APIM policy, to centralize security or to offload the backend API. And if you do these things in an Azure Function instead, you'd need a policy anyway to call the Function, using a ‘send-request’ policy.
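
Such a send-request policy could look roughly like this (a sketch; the validation Function URL is hypothetical):

<send-request mode="new" response-variable-name="validationResponse" timeout="10" ignore-error="false">
    <!-- Hypothetical validation Function; replace with your own endpoint -->
    <set-url>https://my-validation-func.azurewebsites.net/api/validate</set-url>
    <set-method>POST</set-method>
    <set-body>@(context.Request.Body.As<string>(preserveContent: true))</set-body>
</send-request>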

The examples in this blog post make clear that even small pieces of policy impact the capacity APIM has left for you after processing them. It's not without reason that Microsoft explicitly mentions the performance implications in their documentation.

I encourage you to do performance tests on APIM, when using policies or in general, so you know the impact of a decision before going live. Locust is a really good and easy-to-set-up tool for executing these tests. Besides the Locust portal, there is also a headless mode, which allows you to spin up a Locust environment unattended, as part of your CI/CD pipeline for example, and tear it down afterwards.
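
As an example, a headless run with roughly the same parameters as the portal test could be started like this (assuming a recent Locust version; adjust the host and numbers to your setup):

locust -f locustfile.py --headless -u 2500 -r 100 --run-time 5m --host https://<your-apim-instance>.azure-api.net --csv results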

If you have any comments or remarks, you can reach me on Twitter @jeanpaulsmit.