Ruby Regex Cheat Sheet
Now that you're bobbing along atop the waves, it's time to relax and explore your surroundings. Get your swim fins on, and head on out into deeper waters.
Thus far, our explorations have given us a good handle on the different types of patterns that can appear in a regex. You know how to match specific characters and classes of characters, how to anchor your matches, and even how to match strings of different sizes and content. However, you've seen but a handful of examples that show what this looks like in real code. We're going to rectify that a bit in this section and introduce a handful of Ruby and JavaScript methods that use regex. This discussion won't be comprehensive, but it does provide the tools you'll need in the future. Most developers won't ever need anything more.
Oddly, the Regexp (Ruby) and RegExp (JavaScript) classes don't provide the regex methods you'll use most often. Instead, the String class does.
Matching Strings
We've already seen match in some of our examples. This method returns a value that indicates whether a match occurred, and what substrings matched. When a match occurs, this return value is 'truthy', so you can test it in a conditional expression in either Ruby or JavaScript to determine whether a given string matched a regex. At its most basic, we use it like this:
Ruby
JavaScript
Here we call fetch_url(text) when match returns a value that indicates a match: that is, when text contains something that looks like a URL.
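The conditional described above might look something like the following in Ruby. This is only a sketch: the URL_PATTERN regex, the fetch_if_url wrapper, and the body of fetch_url are all assumptions made for illustration, since the text names fetch_url but does not define it.

```ruby
# Hypothetical pattern: treats "http://" or "https://" followed by
# non-whitespace characters as something that looks like a URL.
URL_PATTERN = %r{https?://\S+}

# Hypothetical stand-in for the fetch_url method named in the text.
# A real version would presumably make a network request.
def fetch_url(text)
  "fetching #{text[URL_PATTERN]}"
end

# Hypothetical wrapper that shows the conditional use of match.
def fetch_if_url(text)
  # String#match returns a MatchData object (truthy) or nil (falsy).
  fetch_url(text) if text.match(URL_PATTERN)
end

puts fetch_if_url('see https://example.com/docs for details')
# prints: fetching https://example.com/docs
puts fetch_if_url('no link here').inspect
# prints: nil
```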
We won't discuss the return value of match in detail; see the documentation instead. For now, it's enough to know that match returns an Array that contains the matched portion of the string, along with the substrings captured by any capture groups defined in the regex. If we name this Array capture, then capture[0] represents the entire matched portion of text, while capture[1], capture[2], etc. correspond to the capture groups. (We discuss capture groups below.) If the regex doesn't match text, then Ruby returns nil, while JavaScript returns null.
In Ruby, the return value of match isn't an Array, but a MatchData object that responds to [0], [1], [2], and so on. You cannot apply most Array methods to this object directly.
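As a rough illustration of the MatchData behavior described above, the sketch below indexes into the match result and converts it to a real Array with MatchData#to_a when Array methods are needed. The describe_match helper, the pattern, and the sample text are hypothetical.

```ruby
# Hypothetical helper: matches two numbers separated by a hyphen.
def describe_match(text)
  capture = text.match(/(\d+)-(\d+)/)
  return nil unless capture # nil means the regex didn't match

  {
    whole: capture[0],       # the entire matched portion
    first: capture[1],       # first capture group
    second: capture[2],      # second capture group
    as_array: capture.to_a   # a real Array, usable with Array methods
  }
end

result = describe_match('pages 10-25 only')
puts result[:whole]          # prints: 10-25
puts result[:first]          # prints: 10
puts result[:second]         # prints: 25
puts result[:as_array].size  # prints: 3
```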
In Ruby, you sometimes see something like this:
=~ is similar to match, except that it returns the index within the string at which the regex matched, or nil if there was no match. =~ is measurably faster than match, so some Rubyists prefer to use it when they can. Others dislike it because it is unfamiliar, or solely because =~ reminds them of the Perl language where it saw widespread use.
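A small sketch of =~ in Ruby follows. The vowel_index helper and the sample strings are made up for this example.

```ruby
# Hypothetical helper: returns the index of the first lowercase vowel,
# or nil when the string contains none.
def vowel_index(text)
  text =~ /[aeiou]/
end

puts vowel_index('launch')          # prints: 1
puts vowel_index('rhythm').inspect  # prints: nil

# Since an index is truthy and nil is falsy, =~ also works in conditionals.
puts 'has a vowel' if 'launch' =~ /[aeiou]/
```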
Rubyists should also investigate the String#scan method; it is a global form of match that returns an Array of all matching substrings.
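For instance, String#scan could be used along these lines to collect every run of digits in a string. The all_numbers helper and its input are hypothetical.

```ruby
# Hypothetical helper: collects every run of digits in the string.
def all_numbers(text)
  text.scan(/\d+/)
end

p all_numbers('3 apples, 12 pears, 7 plums')
# prints: ["3", "12", "7"]
p all_numbers('no digits here')
# prints: []
```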
Splitting Strings
Applications that process text often must analyze data composed of records and fields separated by special delimiter characters. A typical format has records separated by newlines, and fields delimited by tabs. Such data often needs parsing before you can use it in your program; the split method is an often-useful parsing tool.
split is frequently used with a simple string as a delimiter:
Ruby
JavaScript
As you can see, split returns an Array that contains the values from each of the split fields.
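A minimal Ruby sketch of splitting on a plain string delimiter is shown below. The tab_fields helper and the sample record are assumptions for illustration.

```ruby
# Hypothetical helper: splits one tab-delimited record into its fields.
def tab_fields(record)
  record.split("\t")
end

p tab_fields("xyzzy\t3456\t334\tabc")
# prints: ["xyzzy", "3456", "334", "abc"]
```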
Not all delimiters are as simple as that, though. Sometimes, formatting is much more relaxed. For example, you may encounter data where arbitrary whitespace characters separate fields, and there may be more than one whitespace character between each pair of items. The regex form of split comes in handy in such cases:
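One way to handle such loosely formatted data in Ruby is to split on one or more whitespace characters, as sketched here. The whitespace_fields helper and the sample record are hypothetical.

```ruby
# Hypothetical helper: splits on runs of one or more whitespace characters.
def whitespace_fields(record)
  record.split(/\s+/)
end

p whitespace_fields("xyzzy  3456  \t  334\t\t\tabc")
# prints: ["xyzzy", "3456", "334", "abc"]
```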
Ruby
JavaScript
Beware of regex like /:*/ and /\t?/ when using split. Recall that the * quantifier matches zero or more occurrences of the pattern it is modifying, and ? matches zero or one. In the case of split, the result may be totally unexpected. For instance, splitting the string abc:xyz with /:*/ returns a six-element array (one element per letter) instead of the two-element array you may have expected. This result occurs because the regex also matches the gaps between letters: zero occurrences of : occur between each pair of adjacent characters.
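The surprise described above can be reproduced in Ruby with a short sketch like this one. The helper names are hypothetical, and the + variant is shown only as a contrast.

```ruby
# Hypothetical helper: splits on ZERO or more colons (the problematic form).
def split_on_optional_colons(text)
  text.split(/:*/)
end

# Hypothetical helper: splits on ONE or more colons.
def split_on_colons(text)
  text.split(/:+/)
end

p split_on_optional_colons('abc:xyz')
# prints: ["a", "b", "c", "x", "y", "z"]
p split_on_colons('abc:xyz')
# prints: ["abc", "xyz"]
```

Requiring at least one delimiter character with + avoids the empty matches between letters.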
Capture Groups: A Diversion
Before moving on to the final methods in our whirlwind tour, we need to first talk about capture groups. (Regex also have non-capture groups, but we won't cover them here.) You've already encountered capture groups, though we called them something different at the time: grouping parentheses. We didn't mention it then, but these meta-characters have another function: they capture the text that they match.
Capture groups capture the matching characters that correspond to part of a regex. You can reuse these matches later in the same regex, and when constructing new values based on the matched string.
We'll start with a simple example. Suppose you need to match quoted strings inside some text, where either single or double quotes delimit the strings. How would you do that using the regex patterns you know? You might consider:
as your first attempt to match quotes, but you'll soon find that it also matches strings that mix single and double quotes. This may not be what you want. Instead, you need a way to capture the opening quote and reuse that character for the closing quote. It's time to call on capture groups:
Here the group captures the part of the string that matches the pattern between parentheses; in this case, either a single or double quote. We then match one or more of any other character and end with a \1: we call this sequence a backreference because it references the first capture group in the regex. If the first group matches a double quote, then \1 matches a double quote, but not a single quote.
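A regex of the kind described above might be written as follows in Ruby. The QUOTED constant, the quoted? helper, and the sample strings are assumptions for this sketch. It uses a lazy quantifier so that the match stops at the first closing quote.

```ruby
# A capture group for the opening quote, one or more characters (lazy),
# then a backreference that requires the same quote character to close.
QUOTED = /(['"]).+?\1/

# Hypothetical helper: does the text contain a consistently quoted string?
def quoted?(text)
  text.match?(QUOTED)
end

puts quoted?(%q(say "hi" now))   # prints: true
puts quoted?(%q(say 'hi' now))   # prints: true
puts quoted?(%q(say "hi' now))   # prints: false (mixed quotes)

# The apostrophe inside the double quotes doesn't end the match.
puts %q(He said "it's fine" today)[QUOTED]
# prints: "it's fine"
```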
It may be more reasonable to use two regex to solve this problem:
It's easier to read and maintain when written like this. However, you will almost certainly encounter problems where a single regex with a backreference is the preferred solution.
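The two-regex approach could be sketched in Ruby roughly like this. The quoted_either? helper is hypothetical, and each regex handles exactly one kind of quote.

```ruby
# Hypothetical helper: checks for a double-quoted or a single-quoted
# string by using one simple regex per quote style.
def quoted_either?(text)
  text.match?(/".+?"/) || text.match?(/'.+?'/)
end

puts quoted_either?(%q(say "hi" now))   # prints: true
puts quoted_either?(%q(say 'hi' now))   # prints: true
puts quoted_either?(%q(say "hi' now))   # prints: false
```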
A regex may contain multiple capture groups, numbered from left to right as groups 1, 2, 3, and so on, up to 9. As you might expect, the backreferences are \1, \2, \3, ..., and \9.
Note that there are patterns in Ruby that allow for named groups and named backreferences, but this is beyond the scope of this book. If you find yourself needing multiple groups in Ruby regex, you may want to investigate these named groups and backreferences.
While you can use capture groups in any regex, they are most useful in conjunction with methods that use regex to transform strings. We'll see this in the next two sections.
By the way: did you notice that lazy quantifier in our regex? Why do you think we used that here?
Transformations in Ruby
While regex-based transformations in Ruby and JavaScript are conceptually similar, the implementations are different. We'll cover these transformations in separate sections.
Transforming a string with regex involves matching that string against the regex, and using the results to construct a new value. In Ruby, we typically use String#sub and String#gsub. #sub transforms the first part of a string that matches a regex, while #gsub transforms every part of a string that matches.
Here's a simple example:
Here we replace every vowel in text with an *.
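A vowel replacement like the one described above might look like this in Ruby, with sub shown alongside gsub for contrast. The helper names and the sample text are hypothetical.

```ruby
# Hypothetical helper: replaces every lowercase vowel with '*'.
def mask_vowels(text)
  text.gsub(/[aeiou]/, '*')
end

# Hypothetical helper: replaces only the first lowercase vowel.
def mask_first_vowel(text)
  text.sub(/[aeiou]/, '*')
end

text = 'Four score and seven'
puts mask_vowels(text)       # prints: F**r sc*r* *nd s*v*n
puts mask_first_vowel(text)  # prints: F*ur score and seven
puts text                    # prints: Four score and seven (unchanged)
```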
We can use backreferences in the replacement string (the second argument):
One thing to note here is that if you double quote the replacement string, you need to double up on the backslashes:
When possible, try to use single quotes to avoid leaning toothpick syndrome.
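To make the quoting difference concrete, here is a small sketch that swaps two words with backreferences in the replacement string. The helper names and the input are made up for illustration, and both helpers produce the same result.

```ruby
# Single-quoted replacement: one backslash per backreference.
def swap_single_quoted(name)
  name.sub(/(\w+) (\w+)/, '\2, \1')
end

# Double-quoted replacement: each backslash must be doubled.
def swap_double_quoted(name)
  name.sub(/(\w+) (\w+)/, "\\2, \\1")
end

puts swap_single_quoted('John Smith')  # prints: Smith, John
puts swap_double_quoted('John Smith')  # prints: Smith, John
```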
Transformations in JavaScript
While regex-based transformations in Ruby and JavaScript are conceptually similar, the implementations are different. We'll cover these transformations in separate sections.
Transforming a string with regex involves matching that string against the regex, and using the results of the match to construct a new value. In JavaScript, we can use the replace method, which transforms the matched part of a string. If the regex includes a g option, the transformation applies to every match in the string.
Here's a simple example:
Here we replace every vowel in text with an *. We applied the transformation globally since we used the g option on the regex.
We can use backreferences in the replacement string (the second argument):
One thing to note here is that the backreferences in the replacement string use $1, $2, etc. instead of \1, \2, etc.
Summary
We now conclude our little dive into the regex ocean. We hope you've learned a lot and enjoyed the experience. We have one more section: it includes a regex cheat sheet and a few other useful tidbits.
But, before you proceed, take a little while to work the exercises below. In these exercises, write your code using your language of choice. Rubyists may want to use IRB to test their methods, while JavaScripters can check their answers in node or their browser's JavaScript console.
Exercises
Write a method that returns true if its argument looks like a URL, false if it does not.
Examples:
Ruby
JavaScript
Solution
Ruby
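One possible Ruby solution is sketched below in two variants. The method names url? and url_alt? are hypothetical, as is the working definition of "looks like a URL" used here: the whole string must be http:// or https:// followed by non-whitespace characters.

```ruby
# Variant 1: String#match plus !! to coerce the result to true or false.
def url?(text)
  !!text.match(/\Ahttps?:\/\/\S+\z/)
end

# Variant 2: String#match? (Ruby 2.4 and later) already returns a boolean.
def url_alt?(text)
  text.match?(/\Ahttps?:\/\/\S+\z/)
end

puts url?('http://example.com')           # prints: true
puts url?('https://example.com')          # prints: true
puts url?('https://example.com hello')    # prints: false
puts url_alt?('   https://example.com')   # prints: false
```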
or
JavaScript
Note that we use !! to coerce the result of our match call to a boolean value. Ruby 2.4 and later add the String#match? method, which we demonstrate in our second Ruby solution.
Write a method that returns all of the fields in a haphazardly formatted string. A variety of spaces, tabs, and commas separate the fields, with possibly multiple occurrences of each delimiter.
Examples:
Ruby
JavaScript
Solution
Ruby
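A Ruby solution might look like the following sketch. The method name fields and the sample inputs are assumptions. The character class lists only space, tab, and comma rather than using \s.

```ruby
# Hypothetical solution: split on runs of spaces, tabs, and commas.
def fields(text)
  text.split(/[ \t,]+/)
end

p fields('Pete,201,Student')
# prints: ["Pete", "201", "Student"]
p fields("Pete \t 201    ,  TA")
# prints: ["Pete", "201", "TA"]
p fields("Pete \t 201")
# prints: ["Pete", "201"]
```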
JavaScript
Note that we don't use \s here since we want to split at spaces and tabs, not other whitespace characters.
Write a method that changes the first arithmetic operator (+, -, *, /) in a string to a '?' and returns the resulting string. Don't modify the original string.
Examples:
Ruby
JavaScript
Solution
Ruby
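One way to write this in Ruby is sketched here. The method name mystery_math and the sample equations are hypothetical. String#sub returns a new string, so the original is left untouched.

```ruby
# Hypothetical solution: replace only the first arithmetic operator.
# The - and / characters are escaped inside the character class.
def mystery_math(equation)
  equation.sub(/[+\-*\/]/, '?')
end

puts mystery_math('4 + 3 - 5 = 2')
# prints: 4 ? 3 - 5 = 2
puts mystery_math('(4 * 3 + 2) / 7 - 1 = 1')
# prints: (4 ? 3 + 2) / 7 - 1 = 1
```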
JavaScript
Note that we need to escape the - character in our character class so that it is interpreted as a literal hyphen, not a range specification. We also must escape the / character in the Ruby code; in the JavaScript code, we don't need to escape the / character but do so here for consistency.
Write a method that changes every arithmetic operator (+, -, *, /) to a '?' and returns the resulting string. Don't modify the original string.
Examples:
Ruby
JavaScript
Solution
Ruby
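The global version could be sketched in Ruby as follows, swapping sub for gsub. The method name mysterious_math and the sample equations are hypothetical.

```ruby
# Hypothetical solution: replace every arithmetic operator.
def mysterious_math(equation)
  equation.gsub(/[+\-*\/]/, '?')
end

puts mysterious_math('4 + 3 - 5 = 2')
# prints: 4 ? 3 ? 5 = 2
puts mysterious_math('(4 * 3 + 2) / 7 - 1 = 1')
# prints: (4 ? 3 ? 2) ? 7 ? 1 = 1
```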
JavaScript
Note that we now use the gsub method in Ruby, and apply the g option to the regex in JavaScript.
Write a method that changes the first occurrence of the word apple, blueberry, or cherry in a string to danish.
Examples:
Ruby
JavaScript
Solution
Ruby
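A possible Ruby solution is sketched below. The method name danish and the sample sentences are assumptions. The \b word-boundary anchors keep the regex from matching apple inside pineapple.

```ruby
# Hypothetical solution: replace the first whole-word fruit name.
def danish(text)
  text.sub(/\b(apple|blueberry|cherry)\b/, 'danish')
end

puts danish('An apple a day')             # prints: An danish a day
puts danish('The cherry of my eye')       # prints: The danish of my eye
puts danish('apple. cherry. blueberry.')  # prints: danish. cherry. blueberry.
puts danish('I love pineapple')           # prints: I love pineapple
```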
JavaScript
Note that pineapple is not changed in the last example for each language.
Challenge: write a method that changes dates in the format 2016-06-17 to the format 17.06.2016. You must use a regular expression and should use methods described in this section.
Example:
Ruby
JavaScript
Solution
Ruby
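One way to approach the challenge in Ruby is sketched here. The method name format_date is hypothetical, and anchoring the regex to the whole string with \A and \z is an added assumption.

```ruby
# Hypothetical solution: capture year, month, and day, then emit them
# in reverse order, separated by periods.
def format_date(date)
  date.sub(/\A(\d\d\d\d)-(\d\d)-(\d\d)\z/, '\3.\2.\1')
end

puts format_date('2016-06-17')  # prints: 17.06.2016
puts format_date('2016/06/17')  # prints: 2016/06/17 (no match, unchanged)
```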
JavaScript
We use three capture groups here to capture the year, month, and day, then use them in the replacement string in reverse order, this time separated by periods instead of hyphens.
Challenge: write a method that changes dates in the format 2016-06-17 or 2016/06/17 to the format 17.06.2016. You must use a regular expression and should use methods described in this section.
Example:
Ruby
JavaScript
Solution
Ruby
Alternate solution
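Both approaches might be sketched in Ruby as follows. The method names format_date and format_date_alt, the whole-string anchors, and the sample dates are assumptions. The alternate version stores its regex in a variable and uses a backreference so that both separators must be the same character.

```ruby
# Primary approach: one simple regex per date format.
def format_date(date)
  date.sub(/\A(\d\d\d\d)-(\d\d)-(\d\d)\z/, '\3.\2.\1')
      .sub(/\A(\d\d\d\d)\/(\d\d)\/(\d\d)\z/, '\3.\2.\1')
end

# Alternate approach: a single regex. Group 2 captures the first
# separator, and \2 requires the second separator to match it.
# The month is now group 3 and the day is group 4.
def format_date_alt(date)
  date_regex = /\A(\d\d\d\d)([\-\/])(\d\d)\2(\d\d)\z/
  date.sub(date_regex, '\4.\3.\1')
end

puts format_date('2016-06-17')      # prints: 17.06.2016
puts format_date('2017/05/03')      # prints: 03.05.2017
puts format_date('2015/01-31')      # prints: 2015/01-31 (mixed, unchanged)
puts format_date_alt('2016-06-17')  # prints: 17.06.2016
puts format_date_alt('2015/01-31')  # prints: 2015/01-31 (mixed, unchanged)
```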
JavaScript
Alternate solution
The easiest way to approach this problem is to split it into smaller sub-problems, one that handles dates in 2016-06-17 format, and one that handles 2016/06/17 format, which is what both of our primary solutions do. One possible gotcha here is that you must remember to escape the / characters in the regex.
You can solve this problem with one regex, as in our alternate solutions, but at the expense of a more complex regex and lowered readability. The regex adds one additional capture group to capture the first - or /, and uses a \2 backreference to refer back to that capture in the regex. However, this additional capture group shifts the group numbers for the month and day components of the date, which become groups 3 and 4, so the replacement string now refers to the day and month as \4 and \3 in Ruby, $4 and $3 in JavaScript. In Ruby, this might be a good time to look up how to use named capture groups.
Note that our alternate solutions use variables to store the regex. We do this both for readability, and to show that regex are no different than any other object; you can manipulate and pass them around as needed.