Which begs the question, is one tool better than the other?
Ordinarily, we’re not in the business of playing one tool off against the other, as we think different tools meet different requirements of our testers. We tend to agree with the Guerrilla Manifesto that “all competitive benchmarking is institutionalized cheating.”
Since we offer both tools on our platform, it was in our interest to offer an objective comparison, both for our testers and for the hard-working developers who give up their own time making fantastic software like Gatling and JMeter. To that end, we’ll continue to publish these benchmarks at the following URLs on a regular basis:
- JMeter, current release: https://flood.io/benchmarks/jmeter
- JMeter, latest release: https://flood.io/benchmarks/jmeter?tag=benchmark-latest
- Gatling, current release: https://flood.io/benchmarks/gatling
- Gatling, latest release: https://flood.io/benchmarks/gatling?tag=benchmark-latest
The Target Site
We needed a target site that could comfortably handle the types of concurrency and volume that we’d be throwing at it. We chose nginx for this task, an extremely fast HTTP server with low resource overheads.
We also needed the target site to behave like an application server; that is, respond to normal HTTP GETs but also respond to HTTP POSTs whilst serving up static and dynamic content. The site had to generate artificial latency in response time, much like a normal web tier would behave. To that end, we were able to mock this mix of transactions with our custom nginx configuration.
We tuned the OS kernel / TCP network settings and allocated 4 virtual CPUs and 15 GB RAM to make sure there were no bottlenecks on the target site.
The Load Generator
Flood.io is a distributed load testing platform that lets you scale out on your own dedicated Grid of flood nodes within minutes. Whilst customers normally launch multiple nodes per Grid in regions across the globe, for the sake of benchmarking we chose to test with just one node, our lowest common denominator. A flood node is equivalent to an m1.xlarge instance, which sports a 64-bit processor, 4 virtual CPUs and 15 GB RAM.
We run the Java HotSpot JVM with JRE version 1.7.0_13 on Ubuntu 12.04 LTS. Each node allocates a maximum JVM heap size of 4 GB to whichever test tool is running, be it JMeter or Gatling, with the following JVM options:
-Xms4096m -Xmx4096m -XX:NewSize=1024m -XX:MaxNewSize=1024m -XX:MaxTenuringThreshold=2 -XX:MaxPermSize=128m -XX:PermSize=64m -Xmn100M -Xss2M -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+AggressiveOpts -XX:+OptimizeStringConcat -XX:+UseFastAccessorMethods -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled -XX:SurvivorRatio=8 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000 -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/var/log/flood/verbosegc.log -XX:-UseGCLogFileRotation
The remaining resources are used by our test runner and distributed Elasticsearch engine. We also tune the OS kernel / TCP network settings in a similar fashion to the target site.
The Target Scenario
Our load scenario consists of the following user transactions:
- 20% of transactions fetching a slow resource in approx. 3.5s
- 40% of transactions making conditional requests to a cache-able resource in < 10ms
- 30% of transactions fetching a non cache-able resource in approx. 2s
- 10% of transactions posting to a slow resource in approx. 4s
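As a quick sanity check, the mix above implies an expected mean response time that can be compared against what the tools report. The following is a minimal Python sketch, not part of our test plans. The labels and the function name are illustrative, and the "< 10ms" cached resource is assumed to take its upper bound of 10 ms.

```python
# Transaction mix from the scenario above:
# label -> (share of traffic, approximate response time in seconds).
# The cached resource is listed as "< 10ms"; 0.010 s is an assumed upper bound.
MIX = {
    "slow GET": (0.20, 3.5),
    "cached conditional GET": (0.40, 0.010),
    "non-cacheable GET": (0.30, 2.0),
    "slow POST": (0.10, 4.0),
}


def expected_mean_response_time(mix):
    """Return the traffic-weighted mean response time in seconds."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1.0, got {total_share}")
    return sum(share * seconds for share, seconds in mix.values())


if __name__ == "__main__":
    mean_ms = expected_mean_response_time(MIX) * 1000
    print(f"expected mean response time: {mean_ms:.0f} ms")
```

With these figures the weighted mean works out to roughly 1.7 s, which gives a baseline for judging whether a load generator is adding latency of its own.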
The Test Plans
The Target Benchmarks
Ordinarily we recommend a planning figure of 1,000 users per flood node as a “finger in the air” guesstimate. It’s hard to recommend a planning figure without first knowing your test plan complexity, target volumetrics and target site behavior under load. To establish a target for these benchmarks we went the traditional exploratory route, and arrived at the following target, which works well for this particular scenario:
|10,000 users||30,000 requests per minute||20 minute duration with 10 minute rampup|
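To put these targets in perspective, Little's Law (N = X × R) relates concurrency, throughput and response time. The sketch below is a rough Python illustration only. It assumes a closed-loop workload at steady state (ramp-up ignored) and a mean response time of about 1.7 s, in line with the results reported in this post. The function name and the derived per-user pacing are hypothetical and are not taken from the actual test plans.

```python
def closed_loop_estimates(users, requests_per_minute, mean_rt_seconds):
    """Apply Little's Law to a closed-loop workload at steady state."""
    throughput_rps = requests_per_minute / 60.0       # requests per second
    cycle_time_s = users / throughput_rps             # time between requests, per user
    in_flight = throughput_rps * mean_rt_seconds      # average outstanding requests
    pacing_s = cycle_time_s - mean_rt_seconds         # implied idle time per user cycle
    return {
        "throughput_rps": throughput_rps,
        "cycle_time_s": cycle_time_s,
        "in_flight": in_flight,
        "pacing_s": pacing_s,
    }


if __name__ == "__main__":
    # 1.7 s is an assumed mean response time, not a configured value.
    for name, value in closed_loop_estimates(10_000, 30_000, 1.7).items():
        print(f"{name}: {value:.1f}")
```

Under those assumptions, 30,000 requests per minute is 500 requests per second, so each of the 10,000 users issues a request roughly every 20 seconds, with around 850 requests in flight at any moment.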
Pleasingly, we found that at these volumes, there was not much variance in results between the tools. But compare if you must!
|Tool||Benchmark||Date||Mean RT +/- SDev|
|Gatling-1.5.3||10,000 Users||2013-09-30 09:52:32||1788 +/- 362 ms|
|JMeter-2.9||10,000 Users||2013-09-30 10:13:15||1625 +/- 322 ms|
|JMeter-2.10||10,000 Users||2013-09-30 10:33:59||1698 +/- 31 ms|
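If you do want to compare, one simple way to quantify the difference is the gap between the fastest and slowest mean relative to the average of the three. This is a minimal Python sketch using only the mean response times from the table above. The function name is illustrative, and this is not a statistical significance test.

```python
from statistics import mean

# Mean response times in milliseconds, copied from the table above.
RESULTS_MS = {
    "Gatling-1.5.3": 1788,
    "JMeter-2.9": 1625,
    "JMeter-2.10": 1698,
}


def relative_spread(results):
    """Return (max - min) / mean of the reported mean response times."""
    values = list(results.values())
    return (max(values) - min(values)) / mean(values)


if __name__ == "__main__":
    values = list(RESULTS_MS.values())
    print(f"gap between fastest and slowest mean: {max(values) - min(values)} ms")
    print(f"relative spread: {relative_spread(RESULTS_MS):.1%}")
```

The gap is 163 ms, a little under 10% of the average, and smaller than the standard deviations reported for Gatling 1.5.3 and JMeter 2.9.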
- Gatling does not record response size in bytes, so flood.io uses an estimate based on Content-Length headers where they exist. This estimate is optimistic and does not accurately reflect true network throughput. If you are using Gatling, use request rate per minute as the measure of throughput instead, or use external network monitors during your test. The following graph demonstrates network utilization parity between the tools.
- JMeter is more resource heavy on the JVM compared to Gatling. At Flood we use Concurrent Mark Sweep (CMS) for garbage collection in an effort to lower the latency of GC pauses.
- JMeter is more resource heavy on system CPU and memory, as the following graphs of CPU and JVM heap utilization demonstrate. This may affect you more if the complexity of your test plans increases, or if perceived concurrency on the JVM increases with a slower performing target site.
- Both JMeter and Gatling demonstrated the desired characteristics of relatively flat response times for measured transactions during ramp-up and under load, with little variance. Beyond that observation, mean response time shouldn’t be used as a measure of a tool’s performance.
- Both JMeter and Gatling were able to sustain an average throughput in the region of 30,000 requests per minute with no deviation.
- Both JMeter and Gatling were able to ramp up to 10,000 concurrent users within 10 minutes, which is ordinarily considered an aggressive target from a single load generator.
- Both JMeter and Gatling demonstrated correct caching behavior, particularly when making conditional requests for static resources that respond with an HTTP 304. The Gatling team promptly provided us with a patch to ensure this.
- Both JMeter and Gatling test plans included extraction of content via regular expressions from the response body, as well as assertions for contained text and HTTP response codes without detriment to performance.
In terms of concurrency and throughput achievable from a single load generator, there is little to differentiate between Gatling and JMeter. Gatling has some limitations in its ability to accurately record response payload in bytes, which can be compensated for with external monitors. JMeter generally demonstrates higher resource usage in terms of CPU, memory and JVM performance, but can otherwise manage the load when run with an appropriate memory allocation.
We don’t anticipate that users will ordinarily run JVMs at their peak as we did in this benchmark, and Flood IO automatically warns the user if any of the Grid nodes are exhausting available resources in such a case.
For the sake of these benchmarks, we chose a simplistic scenario to reduce the number of variables that can affect a side-by-side comparison. As such, results should be analyzed in the context of the test boundaries described above. It is possible that performance will differ in more realistic scenarios. The best way to explore is to try for yourself. We host a free node on Flood IO which lets you run JMeter or Gatling tests, and registration is free.
At the end of the day, the choice between JMeter and Gatling is purely subjective, and is better made on some of the other features that each tool independently provides.
We hope this brings some clarity to the relative performance of these great tools.
A special thank you to Philippe Mouawad and Stéphane Landelle, core contributors to the Apache JMeter and Gatling-Tool projects. They both helped improve the quality of these benchmarks as well as provide advice / code / patches where appropriate. Thanks!