How to prove your Ratpack app is non-blocking.

Using the Ratpack framework can be tricky. It's easy to get something working but it's harder to ensure that your code is non-blocking.  This post will detail how to prove your code is non-blocking with automated tests.

Background - What's the problem?

In order to realise the performance benefits of the Ratpack framework you have to ensure that your code is non-blocking.  This means ensuring that your Java Threads are always busy and not waiting for I/O (i.e. non-blocking).  As soon as your thread kicks off an I/O task (e.g. sending a http request or making a DB call), it should be able to switch to another task.  Once the I/O eventually completes an event is fired which informs that the next step in the code can be performed, typically doing something with the result of the I/O task.  This means that a huge number of tasks which are I/O bound (i.e. spend a significant amount of time waiting for I/O) can be performed in parallel with only a small number of threads.  This approach can have significant CPU and memory benefits.  However, it comes at the price of increased code complexity.


In my opinion, the biggest problem is where you accidentally make your code blocking i.e. the Thread is blocked waiting for the I/O to finish and doesn't do anything useful in the meantime.  Just because you are using asynchronous libraries, there's nothing stopping you calling a blocking method by mistake.  In this situation you go back to the traditional synchronous model whereby you can only hope to achieve more parallel tasks being executed by adding more threads to your application.  What makes this situation worse with a Ratpack application is that you have invariably assumed non-bocking code and have only allocated a small number of threads to your application (the default is two threads per CPU core).  This can result in a drastic decrease in throughput.


These bugs can easily sneak through into production for several reasons.  Firstly, you can make your code inadvertently block a thread for a variety of very subtle reasons.  You don't even need to call any blocking methods yourself, to introduce blocking code.  As an example, see this code which calls Cassandra - it makes any thread which subscribes to the returned Observable block on I/O.  Secondly, when the offending blocking code is invoked from tests, unless under load, it will not only work fine but respond with the same latency as if it was non-blocking.  In this scenario, only performance tests have the potential to stop your bug going live.


How to prove your code is non-blocking with tests


In short: demonstrate that your code can handle more I/O bound transactions concurrently than it has threads. Let's assume a web application that uses standard blocking I/O, has four threads, and  perform the following:
  1. Takes a http request from a client.
  2. Makes a call to a downstream service over http which takes three seconds to respond.
  3. Once the response has been received from the downstream service, returns a response to the client.
Assuming a blocking model, the absolute best throughput that could ever be achieved is as follows:

(Time Period / (I/O Delay)) x (Number Of Threads) = Maximum throughput

Or in the case above.

((60 seconds) / (3 seconds for I/O to complete)) x (2 Threads) = 40 Transactions per minute (TPM)

In practice, the throughput would not be as high as this since other time would be required for dealing with the client's request and response.

When we consider a non-blocking I/O model, the above formula should no longer hold true since we are no longer limited by the number of threads.  

It's up to our tests to prove that the above formula does not apply to our code.  This can be done by sending N number of requests at the code simultaneously and ensuring that the throughput exceeds the maximum throughput if it were blocking.  If it doesn't exceed the throughput - fail the test!

Writing the test  


I'm a fan of writing end to end tests that prove your code can make actual requests over the network to stubbed dependencies.  Wiremock is a great tool for stubbing and mocking http services and is ideal for this test.  

We need several things in our test as detailed below:-
  1. I/O delay - To simulate a delay in our downstream http service, we can use wiremock's fixed delay feature for simulating slow responses.
  2. Constrained number of threads - We will set ratpack to have one thread only.
  3. Send simultaneous requests to our web application.
  4. Timeout - We will set the timeout to the I/O delay multiplied by the number of simultaneous requests.  If this timeout is exceeded, we haven't proved our code is non-blocking.  This can be specified quite neatly in junit tests with an annotation.
The code:
@Test(timeout = SLOW_ENDPOINT_DELAY * NUMBER_OF_CALLS)
public void handlerIsNotBlocking() throws Exception {

    URI uri = new URI(getAddress().toString() + "happy");

    List<Response> responses = new ConcurrentExecutor(
            () -> jerseyClient().target(uri).request().get(), NUMBER_OF_CALLS).executeRequestsInParallel();

    assertThat(responses).hasSize(NUMBER_OF_CALLS);
    responses.forEach(this::verifyResponseHasCorrectContent);

}
 
See github for code.


Results


When this test is run, the following happens:
  1. Eight requests are sent simultaneously to the ratpack application via a "ConcurrentExecutor" a class I created which wraps an ExecutorService. 
  2. Eight requests are sent to wiremock. 
  3. After a three second delay, all eight responses are received from wiremock. 
  4. Ratpack then returns all eight responses within around ten milliseconds of each other. 
  5. The junit test verifies that all eight responses were received ok. 
  6. The test completes quicker than the timeout, proving that the code is non-blocking. 
This same technique can be applied to other services such as Cassandra using Stubbed Cassandra. 

Comments

  1. Simply love it and will use it to debunk my Ratpack application if required. Thanks Phill

    ReplyDelete

Post a Comment

Popular posts from this blog

Lessons learned from a connection leak in production

How to test for connection leaks

How to connect your docker container to a service on the parent host