This post is a transcript of the video at floodio.tv
Let’s take a look at how to use regular expressions in load testing.
You might be using regular expressions to extract information from a response to use in future requests. Or you may be using them to make some form of assertion. I’ll show you how to do these things with JMeter.
You don’t need to be a master of regular expressions in order to be effective. I’ll show you just one regular expression pattern that you need to remember. I’ll also show you some variations on that pattern which will get you up and running in no time, without needing to be an expert.
What are regular expressions?
In computing terms, a regular expression is just a bunch of characters that express a pattern that can be searched for within a longer piece of text.
The concept of regular expressions was introduced in the 1950s by Stephen Kleene, who is best known as a founder of recursion theory, a branch of mathematical logic.
Who would have thought some 60 odd years later, we’d still be searching for a sequence of symbols and characters using an asterisk!
Pattern-matching with regular expressions
Searching for a sequence of symbols and characters using an asterisk is the guts of pattern matching with regular expressions. You get lots of ways to write complex patterns with quantifiers and tokens and booleans and groupings …
The pattern /sequence of .*/ simply says: match the literal text “sequence of” followed by a space, then any character (the full stop), repeated zero or more times (the asterisk, or star). Because the star takes as much as it can, the match runs from that point to the end of the line.
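As a minimal sketch of that pattern in action, here it is in Python's re module. The sample sentence is borrowed from the paragraph above; the surrounding slashes are just delimiters and are not part of the pattern in Python.

```python
import re

text = "still be searching for a sequence of symbols and characters using an asterisk!"

# "sequence of " is matched literally; "." matches any character except a
# newline, and "*" repeats it zero or more times, as many times as possible.
match = re.search(r"sequence of .*", text)

# group(0) is the whole matched text, or None if nothing matched.
matched_text = match.group(0) if match else None
print(matched_text)
```

Everything before “sequence of” is ignored, and everything after it up to the end of the line is included in the match.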
These patterns are immensely useful for searching for text amongst text and you only need to remember one pattern to be proficient at regular expressions.
This is it! ;P
Just kidding! The one pattern to rule them all is really
One pattern to rule them all
What does this pattern mean? regexper.com has a great way of visualizing regular expressions.
What this pattern really does is capture the text enclosed in quotes as part of the value attribute.
Pay attention to the use of brackets (parentheses). They are used for grouping and create a numbered capture group in the match results. That means we can extract just part of what the regular expression matched. In this case, we have one group, which holds the contents of the value attribute.
Also take note of the full stop, which means match any character, along with the plus sign, which means match one or more times.
The question mark means don’t be greedy: stop as soon as the part of the pattern that follows the question mark can match, which here is the closing quote. The closing bracket marks the end of that particular group.
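To make the role of the question mark concrete, here is a small sketch comparing the lazy and greedy forms. The pattern itself is not reproduced in this transcript, so the form value="(.+?)" used below is an assumption based on the description above, and the HTML fragment is made up for illustration.

```python
import re

# Hypothetical markup; attribute names and values are invented for this example.
html = '<input name="btn" value="waka waka waka" type="submit">'

# Lazy: ".+?" stops at the first closing quote after value="
lazy = re.search(r'value="(.+?)"', html)

# Greedy: ".+" runs on to the LAST quote on the line
greedy = re.search(r'value="(.+)"', html)

lazy_value = lazy.group(1)      # group 1 is the bracketed part of the pattern
greedy_value = greedy.group(1)

print(lazy_value)
print(greedy_value)
```

The lazy version returns only the attribute value, while the greedy version swallows the following attribute as well, which is why the question mark matters so much when scraping markup.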
Regular expressions explained by …Pacman
Let’s look at an example.
Here’s the response of a particular HTML page. Assume we really need to know what the value attribute of the search button is.
We might need it because it’s required in some field of the next request, or we may need to assert that it exists on the page, so we know that we have our expected response, in this case, the “waka waka waka” of Pacman.
These are the two most common reasons for needing regular expressions in load testing: Variables and Assertions
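Both uses can be sketched in a few lines of Python. The response body, the field names (q, btnG) and the variable names are all hypothetical; the real page from the video is not reproduced in this transcript.

```python
import re

# Hypothetical response body standing in for the page shown in the video.
response_body = """
<form action="/search">
  <input type="text" name="q" value="pacman">
  <input type="submit" name="btnG" value="waka waka waka">
</form>
"""

# Use 1, variables: capture the value of the search button for a later request.
# Anchoring on name="btnG" skips the earlier value="pacman" attribute.
match = re.search(r'name="btnG" value="(.+?)"', response_body)
button_value = match.group(1) if match else None

# Use 2, assertions: check that the expected text is present at all.
response_is_expected = re.search(r"waka waka waka", response_body) is not None

print(button_value, response_is_expected)
```

The same pattern serves both purposes: the captured group feeds the next request, and the presence or absence of a match tells us whether we got the response we expected.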
Using regular expressions with JMeter
Let’s take a look at this in the real world, where we’re searching for the text “Pacman waka waka waka” with the intent to click on the first video link for this unique sound.
So we can look at this in the browser, using inspector or debug tools to get an idea of what we’re going to look for at the protocol level.
What we need to do first is simulate that request to Google (using JMeter, in this example). From that, we can use a listener to debug what is returned as part of the response body.
Let’s put our magic pattern to work. Since I’m interested in the href link for the video, I can search the page for links and I can also use the dot plus question mark pattern to continue the search for specific text. This helps narrow down the search because the page will have lots of links we’re not interested in. However, the following pattern is still too greedy and returns too many matches on the page.
I can demonstrate this in JMeter using a post processor called the Regular Expression Extractor. To keep things tidy, I like to name the element the same as the reference name, which is the variable that is populated with the result. I’ll add the overzealous pattern from before, specify the match template (in this case, I’m just after the first pattern group that is captured), and I’ll tell JMeter to take the first match on the page.
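Conceptually, the extractor settings described here boil down to: find all matches, pick the nth one, pull out a group, and store it under a name. The function below is a simplified Python sketch of that idea, not JMeter's implementation; regex_extract, its parameters and the NOT_FOUND default are hypothetical names, and the real element has more options than shown.

```python
import re

def regex_extract(body, pattern, group=1, match_no=1, default="NOT_FOUND"):
    """Return capture group `group` from the match_no-th match (1-based),
    or `default` when there is no such match."""
    matches = list(re.finditer(pattern, body))
    if match_no < 1 or match_no > len(matches):
        return default
    return matches[match_no - 1].group(group)

# Hypothetical response body with two links.
body = '<a href="/first">one</a> <a href="/second">two</a>'

# Store results under a reference name, much like a test-plan variable.
variables = {}
variables["link"] = regex_extract(body, r'href="(.+?)"')                  # first match
variables["link_2"] = regex_extract(body, r'href="(.+?)"', match_no=2)   # second match
variables["link_9"] = regex_extract(body, r'href="(.+?)"', match_no=9)   # no such match

print(variables)
```

Supplying a default for the no-match case makes a failed extraction easy to spot in later requests, rather than silently sending an empty value.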
As we saw in the text editor, the pattern is simply too greedy and isn’t going to capture the information we want.
In our text editor, let’s expand on the pattern a little more, still using our .+? pattern but break it up into the components that we need. We know the info we’re after inside the href link is prefixed by url and we can skip over the youtube prefix as it’s really only the query string parameter that we’re after. This comes right after the text watch and finishes around the & text. If this all looks pretty crazy to you, it’s because we’re looking at the URL-encoded view of response data (that is another skill you’ll need to understand when scripting at the protocol level). We’ll finish off the pattern with the Pacman text so we can really cut down the number of links this finds.
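The exact pattern and markup from the video are not reproduced in this transcript, so the following is only a sketch of how those components might fit together. The sample links, the video id abcDEF12345 and the pattern itself are assumptions made for illustration.

```python
import re

# Hypothetical search-result markup with URL-encoded link targets.
body = (
    '<a href="/url?q=http://www.youtube.com/watch%3Fv%3DabcDEF12345&amp;sa=U">'
    "Pacman waka waka waka</a>"
    '<a href="/url?q=http://example.com/other&amp;sa=U">Something else</a>'
)

# href="/url   the link prefix we know about
# .+?watch     skip lazily over the youtube part, up to "watch"
# (.+?)&       capture the encoded query string, up to the next "&"
# .+?Pacman    require the Pacman text so unrelated links do not match
pattern = r'href="/url.+?watch(.+?)&.+?Pacman'

match = re.search(pattern, body)
link_waka = match.group(1) if match else None
print(link_waka)
```

Each .+? acts as a "skip ahead to the next landmark" step, which is what lets one simple idiom be chained into a much more selective pattern.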
Running this in JMeter, we can see that the new regular expression is getting the link information that we want. The link_waka variable now contains the text that comes after the watch prefix in YouTube.
Now we can use that variable in the following request, which is how we’re going to visit the YouTube link. We use variable interpolation, which is the
When we run the test, we can see in the results tree that we hit an HTTP 503 (Service Unavailable), which means the server was unable or unwilling to handle the request we sent.
We can find out what request we made and optionally run that in a browser to see what is wrong with it or just use the results tree for the raw HTML returned.
The problem, in this case, is that we sent the URL-encoded version of the variable with the request, and YouTube thinks the request is unusual and has returned an HTTP 503. So to fix this, all we need to do is send the request with the URL-decoded version of the variable and everything should work. You can use built-in JMeter functions to achieve this easily, with no coding required.
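To see what the decoding step does to the value, here is a small Python illustration. In JMeter the built-in __urldecode function plays this role; the code below only shows the transformation, and the extracted value and video id are hypothetical.

```python
from urllib.parse import unquote

# Hypothetical value as captured from the URL-encoded response.
link_waka = "%3Fv%3DabcDEF12345"

# %3F decodes to "?" and %3D decodes to "=".
decoded = unquote(link_waka)

# Append the decoded query string to the watch path for the next request.
url = "https://www.youtube.com/watch" + decoded

print(decoded)
print(url)
```

Sent undecoded, the same value would produce a path ending in watch%3Fv%3D..., which the server does not treat as a query string at all.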
At this stage, we’ve extracted a variable using regular expressions from the response and used it in the subsequent request. There’s one final use of regular expressions to look at: assertions. We could assume that the page we just requested on YouTube is correct, but the best practice would be to assert that some value exists on the page rather than just assume the web site did what we wanted. This puts some rigour into what the application under test is responding with, and also makes sure your scripts are doing what you intend. Don’t just rely on HTTP response codes, as errors are easily masked and you will make mistakes in your scripting as a result.
What I’ve just demonstrated on screen is how to apply a response assertion in JMeter. We’re just using the substring pattern to test for basic text on the page. If you can use substrings, this is the best way to do it in JMeter as the performance overhead is smaller than testing for regular expressions.
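The difference between the two kinds of check can be sketched in Python as follows. The page content and the helper names are hypothetical, and this is not how JMeter implements its assertions; it only contrasts a plain substring test with a regular expression test.

```python
import re

# Hypothetical response body for the video page.
page = "<title>Pacman waka waka waka - YouTube</title>"

def assert_substring(body, expected):
    """Plain containment check: no pattern compilation or backtracking."""
    return expected in body

def assert_regex(body, pattern):
    """Regular expression check: more flexible, but more work per response."""
    return re.search(pattern, body) is not None

substring_ok = assert_substring(page, "waka waka waka")
regex_ok = assert_regex(page, r"<title>.+?waka.+?</title>")
missing_ok = assert_substring(page, "Ms. Pacman")

print(substring_ok, regex_ok, missing_ok)
```

When the text you are checking for is fixed, the substring form says the same thing with less work, which adds up when the check runs on every response in a load test.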
There are many more regular expression patterns and techniques that we haven’t covered, but those are topics for another time. I hope I have sparked your interest in how to effectively use regular expressions to support your load testing, especially with protocol level tools like JMeter and Gatling as well as cloud load testing services like Flood IO.