Resiliency Policies in Azure Container Apps

In a distributed system, components may fail, networks may experience delays, and nodes may become unreachable. By designing a system to be resilient, you ensure that it can handle failures gracefully and continue providing services. Azure Container Apps (ACA) recently released resiliency policies, a feature that helps a container app recover from failed requests to its outbound dependencies. Let's try it out.

In November 2023, two kinds of resiliency policies were released: “Service Discovery Resiliency” and “Dapr Component Resiliency”. This article focuses exclusively on Service Discovery Resiliency.

Key points about Service Discovery Resiliency:

  • Resiliency policies are configured on the container apps being called.
  • Dapr operates behind the scenes, but there is no sidecar to configure: it is completely transparent.
  • To use this resiliency feature, the caller must reach the callee through ACA service discovery. This means the target container app must be called by name, using one of the following formats: http://destination-app or http://destination-app.{random}.{location}.azurecontainerapps.io.
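As a small illustration of the last point, here is a sketch of a helper that builds both URL formats. The function name, the "/orders" path, and the "abc123.westeurope" suffix are hypothetical and used only for the example; the real suffix comes from your own Container Apps environment.

```python
from typing import Optional
from urllib.parse import urlunsplit


def service_discovery_url(app_name: str,
                          env_suffix: Optional[str] = None,
                          path: str = "/") -> str:
    """Build a URL that addresses a container app by name.

    env_suffix stands for the "{random}.{location}" part of the
    environment domain. When it is omitted, the short form
    http://<app-name> is returned.
    """
    if env_suffix is None:
        host = app_name
    else:
        host = f"{app_name}.{env_suffix}.azurecontainerapps.io"
    return urlunsplit(("http", host, path, "", ""))


# Hypothetical values, for illustration only.
short_url = service_discovery_url("aca-processor", path="/orders")
full_url = service_discovery_url("aca-processor", "abc123.westeurope", "/orders")
```

Either form addresses the callee by name, which is the condition stated above for the resiliency policy to apply.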

Four types of resiliency policies are supported: Timeouts, Retries, Circuit breakers, and Connection pools. You can apply any one of them on its own or combine several.

Let’s implement simple service-to-service communication and apply a retry policy and a circuit breaker. You can find the source code for this lab (including containers and infrastructure) on this GitHub repo.

Retries and Circuit breaker

To explore this scenario, I have created two extremely simple containers: ‘aca-generator’ generates N orders and calls ‘aca-processor’ sequentially, and ‘aca-processor’ processes each order and writes the processing status to the console logs.

The aca-processor implements the following dummy logic:

  • if the order amount is less than 60, it returns HTTP 200.
  • if the amount is between 60 and 80, it returns HTTP 429 (and randomly 200 when retried).
  • if the amount is greater than 80, it returns HTTP 503.
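The rules above can be sketched in Python as follows. This is not the code from the repo, only a minimal model of the described behaviour. The function name, the is_retry flag, the 50% chance of success on a retry, and the choice to treat exactly 60 and exactly 80 as part of the 429 band are all assumptions.

```python
import random
from typing import Optional


def processor_status(amount: float,
                     is_retry: bool = False,
                     rng: Optional[random.Random] = None) -> int:
    """Return the HTTP status the dummy processor would answer with."""
    rng = rng or random.Random()
    if amount < 60:
        return 200  # small orders always succeed
    if amount <= 80:
        # Throttled band: the first attempt is rejected, while a retry
        # succeeds at random (assumed probability: 0.5).
        if is_retry and rng.random() < 0.5:
            return 200
        return 429
    return 503  # large orders always fail


print(processor_status(10), processor_status(70), processor_status(95))
```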

The resiliency policy is applied by deploying this Bicep file.

  • The retry policy allows a maximum of 5 retries only when the failed response contains a specific header (x-retriable-status-code=true).
  • The circuit breaker policy kicks in after 3 consecutive errors and waits 10 seconds before re-evaluating the circuit.
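To make the interaction between the two policies easier to follow, here is a simplified Python model of them. It is a sketch under assumptions, not how the platform implements the feature: the class and function names are hypothetical, retries happen immediately with no back-off, any status of 400 or above counts as an error, and "re-evaluating the circuit" is modelled as simply closing it again after the interval.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, response headers)
RETRY_HEADER = "x-retriable-status-code"


@dataclass
class CircuitBreaker:
    consecutive_errors: int = 3     # errors in a row before opening
    interval_seconds: float = 10.0  # wait before re-evaluating
    failures: int = 0
    opened_at: Optional[float] = None

    def allows(self, now: float) -> bool:
        """Tell whether a call may go through at time `now`."""
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.interval_seconds:
            # Interval elapsed: close the circuit and start counting afresh.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool, now: float) -> None:
        """Update the consecutive-error count after a call."""
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.consecutive_errors:
            self.opened_at = now


def call_with_policy(send: Callable[[int], Response],
                     breaker: CircuitBreaker,
                     now: Callable[[], float],
                     max_retries: int = 5) -> List[int]:
    """Send one logical request and return every status code observed."""
    statuses: List[int] = []
    for attempt in range(1 + max_retries):
        if not breaker.allows(now()):
            break  # circuit is open: no call is made
        status, headers = send(attempt)
        statuses.append(status)
        ok = status < 400
        breaker.record(ok, now())
        # Retry only failed responses that carry the retry header.
        if ok or headers.get(RETRY_HEADER) != "true":
            break
    return statuses


# A callee that always fails with a retriable 503 trips the breaker
# after three attempts, well before the five retries are used up.
clock = [0.0]
demo_breaker = CircuitBreaker()
print(call_with_policy(lambda attempt: (503, {RETRY_HEADER: "true"}),
                       demo_breaker, lambda: clock[0]))
```

In this model the circuit breaker takes precedence over the retry budget: once three consecutive errors are recorded, the remaining retries are not attempted until the 10-second interval has passed.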

To test the resiliency configuration, we can invoke the aca-generator service using the Swagger UI and examine the result in the aca-processor log stream.

Result

In the following picture (Azure Container Apps Log Stream) we can see the resiliency policy in action.

The case of Kelly Moore:

  • 1 – aca-processor returns 503.
  • 2 – the resiliency policy retries the call 2 more times, and then the circuit breaker kicks in.
  • 3 – no further calls are made for 10 seconds, after which the circuit is re-evaluated.

The case of Alford Mosciski:

  • a – aca-processor returns 429.
  • b – the call is retried, but it fails a second time.
  • c – on the third attempt the call succeeds and an HTTP 200 is returned to the caller.

If we look at the replica details of our aca-processor app, we can see that there are no sidecars, even though Dapr is used under the hood.

Conclusion

In conclusion, resilience is vital in distributed systems, and Azure Container Apps’ recent Service Discovery Resiliency feature lets users handle outbound dependency failures through configuration alone. This transparent and flexible solution improves application robustness with no changes to the application code. Great feature, Container Apps team!
