Efficiency of Polling vs WebSockets
A web application I maintain uses polling from the front end to check whether a long-running task is complete. A colleague suggested that WebSockets would be a far better alternative in terms of both performance and user experience. Having never used WebSockets before, and keen to see just how much better they could be, I decided to compare the two approaches side by side on a contrived but similar problem.
All code on github here.
The problem
My hypothetical problem involves jobs. Each job consists of:
- a unique id
- a boolean named complete
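Since everything here is Kotlin, a data class is the natural way to model a job. Purely as an illustration (the real class in the repo may differ, and the defaults are my own assumption):

import java.util.UUID

// Minimal sketch of the Job model described above; the id and complete fields come from
// the bullet points, the defaults are assumptions for illustration only.
data class Job(val id: String = UUID.randomUUID().toString(),
               val complete: Boolean = false)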
For both solutions I decided to use Kotlin and the Dropwizard framework, both of which I'm familiar with and enjoy using, but have never used together.
Implementing Solution 1 - Polling via HTTP REST
I'm familiar with creating RESTful web services, so I decided to implement this solution first. Starting from the acceptance tests here, I created the following endpoints:
- POST /job - creates a job which will be incomplete at first.
- GET /job/{id} - returns a job
Implementing this from the tests drove the design and resulted in the following service functions:
fun storeJob(job: Job)             // Take a job and store it
fun getJob(jobId: String): Job?    // Return a job - null if not found
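To make the shape of the solution concrete, here is a rough, hypothetical sketch of the kind of JAX-RS resource that could sit in front of those two functions. The class names, wiring and status codes are my assumptions, not the code from the repo:

import java.util.UUID
import javax.ws.rs.GET
import javax.ws.rs.POST
import javax.ws.rs.Path
import javax.ws.rs.PathParam
import javax.ws.rs.Produces
import javax.ws.rs.core.MediaType
import javax.ws.rs.core.Response

// Shape taken from the two service functions above.
interface JobService {
    fun storeJob(job: Job)
    fun getJob(jobId: String): Job?
}

@Path("/job")
@Produces(MediaType.APPLICATION_JSON)
class JobResource(private val jobService: JobService) {

    // POST /job - creates a job which is incomplete at first
    @POST
    fun createJob(): Response {
        val job = Job(id = UUID.randomUUID().toString(), complete = false)
        jobService.storeJob(job)
        return Response.status(Response.Status.CREATED).entity(job).build()
    }

    // GET /job/{id} - returns the job, or 404 if it doesn't exist
    @GET
    @Path("/{id}")
    fun getJob(@PathParam("id") id: String): Response =
        jobService.getJob(id)
            ?.let { Response.ok(it).build() }
            ?: Response.status(Response.Status.NOT_FOUND).build()
}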
Kotlin issue with fasterxml JSON
My first acceptance test hit an issue. It seemed that fasterxml couldn't work with my Kotlin data class:
Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `com.github.phillbarber.Job` (no Creators, like default construct, exist): cannot deserialize from Object value (no delegate- or property-based Creator) at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 2]
Thankfully, a quick google led me to FasterXML/jackson-module-kotlin, which solved the problem and let me serialize/deserialize Kotlin data classes.
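For anyone hitting the same exception, registering the module is a one-liner. A hedged sketch (in Dropwizard you would typically do this on the Bootstrap's ObjectMapper during initialize(), but the exact wiring in the repo may differ):

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.kotlin.registerKotlinModule

// With the Kotlin module registered, Jackson can use data class constructors
// instead of looking for a default no-args constructor.
val mapper: ObjectMapper = ObjectMapper().registerKotlinModule()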
Kotlin issue with main method
The next issue I hit was that my Kotlin class with a main method would not work. I added the following code at the package level in my Application:
fun main(args: Array<String>) {
    if (args.size == 0) {
        PollingVsSocketsApplication().run("server");
    } else {
        PollingVsSocketsApplication().run(*args);
    }
}
(see code on github here)
I then listed this class as the Main-Class in the manifest by configuring the maven-shade-plugin. When I ran the jar from the command line as follows, I hit this error:
java -jar polling-vs-sockets-1.0-SNAPSHOT.jar
Error: Main method not found in class com.github.phillbarber.job.PollingVsSocketsApplication, please define the main method as: public static void main(String[] args)
or a JavaFX application class must extend javafx.application.Application
It seemed that, after a maven install, the static main method was not added to the compiled class for some reason. I never could work out what was going on, so I created an object instead, as follows:
object Launcher {
    @JvmStatic
    fun main(args: Array<String>) {
        if (args.size == 0) {
            PollingVsSocketsApplication().run("server");
        } else {
            PollingVsSocketsApplication().run(*args);
        }
    }
}
(see code on github here)
If anyone reading this works out what I'm doing wrong - please let me know! Or, alternatively, perhaps this is a bug with Kotlin 1.2.3?
Implementing Solution 2 - WebSockets
For this solution I had to do a bit of googling, as it was completely new ground for me. My first goal was simply to get a basic WebSocket example working locally. I eventually came across the Jetty WebSocket Client, which made sense to use given that Dropwizard already uses Jetty. This page on the Jetty docs site shows how to get an "Echo" WebSocket client up and running; however, it didn't work for me. The only way I could get a WebSocket working (at least with Jetty 9.4) was to implement the WebSocketConnectionListener interface. I did that by making my SimpleMessageSocket extend the WebSocketAdapter class, and then it magically started working. My first WebSocket!
Whilst playing around I noticed something interesting which now seems obvious. With HTTP, the client is very different from the server: there is a fundamental difference between sending and receiving an HTTP request, which is why we have so many dedicated HTTP client libraries. With WebSockets, the only difference between client and server is who establishes the connection; both parties can equally send and receive messages. This means (at least with the Jetty library) you can deploy the same WebSocket code to both the client and the server. This is good fun when you create a socket that just sends back the message it received (i.e. an infinite loop of WebSocket traffic)!
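As a small illustration of that symmetry, an echo socket needs only a few lines with Jetty's WebSocketAdapter. This is a hedged sketch in the spirit of the SimpleMessageSocket mentioned above, not the actual class from the repo:

import org.eclipse.jetty.websocket.api.WebSocketAdapter

// Sends every text message straight back to whoever sent it. Deploy it at both ends
// of a connection and you get the infinite message loop described above.
class EchoSocket : WebSocketAdapter() {
    override fun onWebSocketText(message: String?) {
        if (message != null) {
            remote?.sendString(message)
        }
    }
}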
Make the code Asynchronous - RxJava
In order to create a WebSocket that lets the server create jobs and send job-completion events, I had to expose some asynchronous behaviour. For this I decided to use RxJava. The tests ended up giving me the following design in the service:
fun storeJob(job: Job): rx.Single<Job>
fun getJob(jobId: String): Job?
As you can see, the storeJob function now returns a Single of Job. This Single will only ever emit a Job once it has been completed. This made the client code in the JobSocket quite simple:
override fun onWebSocketConnect(sess: Session?) {
    super.onWebSocketConnect(sess)
    var storeJob = jobService.storeJob(Job())
    storeJob.subscribe(Action1 {
        remote!!.sendString(objectMapper.writeValueAsString(it))
        session.close(200, "Done")
    })
}
It was great to be working with RxJava again - lots of fun!
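The post doesn't show how that Single gets produced. Purely as a hypothetical sketch, using the RxJava 1.x types the signature above implies and reusing the Job data class sketched earlier, the service could complete the job after a random delay like this:

import rx.Observable
import rx.Single
import java.util.concurrent.ThreadLocalRandom
import java.util.concurrent.TimeUnit

// Emit the completed job after a random 0-10 second delay, matching the job duration
// used in the performance test scenario later on (actual storage is omitted for brevity).
fun storeJob(job: Job): Single<Job> {
    val delayMs = ThreadLocalRandom.current().nextLong(0, 10_000)
    return Observable.timer(delayMs, TimeUnit.MILLISECONDS)
            .map { job.copy(complete = true) }
            .toSingle()
}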
Creating the Performance Tests
Code for both tests (plus some bash scripts I found useful) can be found here.
Creating the tests gave me some interesting thoughts. The http-polling tests were not simple to create since they had to:
- Call POST /job to create the job
- Call GET /job/{id} repeatedly until complete=true or the maximum timeout had been exceeded.
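That poll-until-complete-or-timeout pattern is the heart of every polling client. As a rough, hypothetical Kotlin sketch (the HTTP call itself is passed in as a function, since the real tests do this in JMeter; the 500 ms and 11 second values mirror the scenario used below):

// Keep fetching the job until it reports complete, or give up once the timeout is hit.
fun waitForJob(id: String,
               fetchJob: (String) -> Job?,   // whatever performs GET /job/{id}
               pollIntervalMs: Long = 500,
               timeoutMs: Long = 11_000): Job? {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (System.currentTimeMillis() < deadline) {
        val job = fetchJob(id)
        if (job?.complete == true) return job
        Thread.sleep(pollIntervalMs)
    }
    return null  // the "max attempts exceeded" branch every polling client needs
}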
Creating the tests for the WebSocket implementation had its own challenges, because JMeter does not have a native WebSocket sampler. After a bit of googling I found Maciej Zaleski's WebSocket sampler. The wiki explained how to install it, and after a bit of trial and error I had the test working. Sadly I wasn't able to set up proper error handling, as there didn't seem to be a way to validate successful responses in the sampler, or to expose the responses to BeanShell processors.
These issues (whilst significant) are only due to the fact that WebSocket solutions aren't as common as HTTP ones. Assuming more mature and feature-rich clients (in this case, JMeter plugins) are created, this issue goes away. The only reason my acceptance tests for the polling solution were simple was that I cheated with some Thread.sleeps. It's highly likely that this would not be a viable option for clients integrating in a real-world scenario.
Test Setup - Docker
Docker is great. Running the server in a Docker container allowed me to constrain its resources and also to separate the performance test execution from the server under test.
I decided to constrain the resources to mimic something close to Amazon's EC2 t2.micro spec. This seemed a more production-realistic simulation than allowing the server free rein of my entire laptop. I limited the CPU resources to one eighth (since my laptop has eight CPU cores) and the memory to 1GB as follows:
docker run --kernel-memory=1024m --cpus=0.125 --name polling-vs-sockets -d -p 8080:8080 -p 8081:8081 $FULL_IMAGE_NAME /startServerInDocker.sh
I have to confess that the first time I ran these tests I incorrectly interpreted the docker runtime arg --cpus as "number of cpu cores" and set it to 1. This had the effect of allowing full CPU to the docker container. Once I corrected my mistake and set it to 1/8, throughput was hugely reduced. This highlights the fact that my hypothetical problem is mostly CPU bound.
Running in Docker also gives you a number of very interesting stats, like overall CPU, memory and network IO, via the useful docker stats command. This was also how I discovered that I had failed to limit the memory of my container to 1024m as I had hoped!
Sadly specifying these limits meant that I was unable to replace my bash script with a docker-compose.yml file since version 2 of docker-compose does not support limiting resources.
After spending a lot of time trying to get pretty graphs from the docker-monitoring project, I decided it would be simpler to just use the Docker Stats API. Once I found out how to enable the Docker API on my laptop, I got my bash scripts to curl the following URL before and after each test run, so as to snapshot all the data it provides for later analysis:
http://localhost:4243/containers/polling-vs-sockets/stats?stream=false
Where polling-vs-sockets was my container running the application.
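If you would rather not shell out to curl, the same snapshot can be taken in a couple of lines of Kotlin. A hedged sketch, using the port and container name from the curl example above:

import java.net.URL

// Grab one non-streaming stats snapshot for the container as raw JSON.
fun snapshotStats(container: String = "polling-vs-sockets"): String =
    URL("http://localhost:4243/containers/$container/stats?stream=false").readText()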
Understanding docker stats - What is cpu_stats.cpu_usage.total_usage?
Within the big JSON response from docker stats are the CPU stats. The field cpu_stats.cpu_usage.total_usage looked the most relevant, however I didn't know what the measurement actually represented. I couldn't find a description anywhere in the docker docs but eventually found someone with the same issue on github here. One of the responses provided a link to some code for a project named Moby here which (within the comments) explains that it is a measure of total CPU time in nanoseconds. I hadn't even heard of Moby before, but essentially it's a framework which can be used to assemble its own libraries into a standalone container platform, and it's what Docker uses. I thought Docker was a single big monolith - but it's not! This article has a good explanation.
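Given that total_usage is a cumulative count of nanoseconds of CPU time, the CPU cost of a test run is just the difference between two snapshots. A hypothetical sketch of that calculation (the file names are my own):

import com.fasterxml.jackson.databind.ObjectMapper
import java.io.File

// CPU seconds consumed between two saved docker stats snapshots,
// e.g. before.json and after.json captured either side of a test run.
fun cpuSecondsBetween(before: File, after: File): Double {
    val mapper = ObjectMapper()
    fun totalUsage(f: File): Long =
        mapper.readTree(f).path("cpu_stats").path("cpu_usage").path("total_usage").asLong()
    return (totalUsage(after) - totalUsage(before)) / 1_000_000_000.0  // nanoseconds -> seconds
}

Dividing by a billion is how the nanosecond figures quoted in the results below map onto CPU seconds (e.g. 3,287,000,000 ns is the 3.29 seconds reported for the polling run).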
Performance Test Results - First impressions from the JMeter UI
Whilst creating the performance tests I was immediately struck by how much cleaner the WebSocket implementation is than polling. Take a look at the following screenshots.
The image above shows my http-polling test creating a single job on a single thread and waiting for it to complete. Ignoring the "Start Time" entry (which was necessary to implement my overall timeout), we can see five requests. Only two of these are requests we actually care about: the POST that created the job and the final GET that returned it complete. The other GET requests are a distraction and a waste of everyone's time and effort.
This screenshot shows the WebSocket solution. As before, just one thread creating one job and waiting for it to complete. This was all achieved with just a single entry, which is incredibly clean and has no items/events that we don't care about. Sadly it didn't seem possible to display the individual messages received, but that didn't prevent me from getting a working test up and running.
Results - What is more efficient Polling or WebSockets?
Scenario:
- Job duration 0-10 seconds
- 20 Threads/Users - Instant ramp up
- Each Thread creating 10 jobs
- Polling interval of 500ms
- Maximum job creation time (from client perspective) - 11 seconds
Polling Results:
- Total Bytes sent/received 1.62 KB
- Bytes received: 951 bytes
- Bytes sent: 703 bytes
- CPU 3.29 seconds (3,287,000,000 nanoseconds)
WebSocket Results:
- Total Bytes sent/received 0.28 KB
- Bytes received: 162 bytes
- Bytes sent: 126 bytes
- CPU 0.68 seconds (681,000,000 nanoseconds)
Conclusion
In this scenario, the polling solution uses roughly five times the CPU and nearly six times the network traffic of the WebSocket solution.
What if we double the polling frequency?
Our product owner has now decided that the UI isn't responsive enough to the job completion event. Our only solution is to decrease the polling interval and suffer the consequences!
Scenario - Same as above except:
- Polling interval of 250ms
Polling Results:
- Total Bytes sent/received 2.6 KB
- Bytes received: 1,471 bytes
- Bytes sent: 1,195 bytes
- CPU 13.8 seconds (13,810,000,000 nanoseconds)
Conclusion
Here we see a huge difference over the socket implementation: nine times the data and twenty times the CPU are used for polling compared to the socket implementation. The socket implementation is also still quicker for the user, since the polling option makes the user wait, on average, half the polling interval (125 ms) after the job actually completes.
Summary
WebSockets still seem a slightly niche technology in comparison to good old HTTP/REST. Depending on the language you are using, there perhaps aren't as many libraries available should you go the WebSocket route. However, when dealing with events, HTTP is fundamentally flawed in that it can only truly support events generated by the client and not the server. Polling is a hack which generates waste, distracting noise and complicated client code (i.e. keep trying until a condition is met OR the maximum number of attempts is exceeded). The network and CPU differences shown above could be even more significant if the GET request were serviced by a complicated query (e.g. a DB query).
To summarise:
- Polling code is simple on the server but:
  - The client code making polling requests is annoyingly complicated.
  - Load can increase when clients decide they want to know sooner.
- WebSockets (like anything) take a little learning and you might not quite have the library support but:
  - The client code will be simpler on a conceptual level as it will deal with true events and not loops with conditions.
  - You'll benefit from the reduced CPU and memory consumption in the long run.
Other things to investigate
If I get the time, I'd love to look into this topic again. Specifically:
- Stress test - Determine which solution can handle the most traffic before breaking.
- Examine an HTTP/2 implementation. Just before I published this, a colleague told me that HTTP/2 supports full-duplex communication and could end up replacing WebSockets!