Regular Expressions are used to match string patterns.
They have proved being really useful to me, mostly when I'm going through competitive algorithm challenges, but I always end up forgetting them. So I made a quick reference for it.
[/pattern/flags].test([string])
In JavaScript, we use the /pattern/
syntax. We can use the .test() method, that returns true
or false
, to
check for those patterns inside a string.
const str = 'Hello World';
const regex = /Hello/;
regex.test(str); // true
We use the |
or OR
operators to search for multiple patterns. We can even concatenate them like /Human|Cat|Dog/
.
i
flag: Ignore caseTo ignore the difference between uppercase and lowercase letters, we can use the flag i
, as/RegExp/i
.
[string].match(/pattern/)
To extract matches from the RegExp, will use .match()
on a string passing the RegExp as a parameter.
const str = 'Abstract Chill';
str.match(/chill/i);
// Returns ['Chill']
g
flag: Find multiple matchesTo search or extract a pattern more than just the first occurrence, we will add the g
flag, like /RegExp/g
.
,[]
and -
.
A period inside the RegExp will match any character /hu./
.[123]
The use of square brackets let us match any letter inside them as one option for a position or slot in the
pattern.-
The hyphen is used to define a range of characters to match, like [a-e]
or [0-5]
.const pattern1 = /b[aeiou]g/;
// Will match 'bag', 'beg', 'big', 'bog', 'bug'
const pattern1 = /copy[1-9]/;
// Will match 'copy1', 'copy2', ..., 'copy9'
[^...]
Negate characters (invert rule)When we want to match characters that are NOT the ones specified in the pattern, we will use the caret symbol inside brackets followed by the group of characters.
const patterm = /[^aeiou]/;
// Will match all characters that are NOT vowels
+
Matches a character that occurs one or more timesIn this case, we will use the [+] sign to match characters that appear one or more times. If the occurrence happens together, it will count as one match.
const pattern = /a+/g;
const str1 = 'abc';
str1.match(pattern); // Returns ['a']
const str2 = 'aabc';
str2.match(pattern); // Returns ['aa']
// Where both returns are a single match
const pattern = /a+/g;
const str1 = 'abab';
str1.match(pattern); // Returns ['a', 'a'] (two matches)
*
Match characters that occur zero or more times with an asteriskSame as the previous one, but with 0 or more matches.
const pattern = /go*/;
const soccerWord = 'gooooooooal!';
const sentence1 = 'gut feeling';
const sentence2 = 'over the moon';
soccerWord.match(pattern); // ['goooooooo']
sentence1.match(pattern); // ['g']
sentence2.match(pattern); //
?
Lazy match: Finds the smallest possible part of the string.RegEx are greedy by default.
If we wanted to do a lazy matching, we will use the interrogation [?] mark.
const testString = 'titanic';
const greedyPattern = /t[a-z]*i/;
testString.match(greedyPattern); // ['titani']
const lazyPattern = /t[a-z]*?i/;
testString.match(lazyPattern); // ['ti']
^...
Match beginning string patternsIf we put the caret symbol outside a character set (outside the square brackets []), we will search for patterns at the beginning of a string.
const testString = 'Ricky Martin';
const firstName = /^Ricky/; // As 'it begins with Ricky'
testString.test(firstName); // true
...$
Match ending string patterns with a dollar sign $Opposite to the caret, we can use the dollar sign similarly to match the end of a string.
const testString = 'Viva la vida loca';
const endingInLoca = /loca$/; // As 'it finished with 'loca''
testString.test(endingInLoca) // true
\w
Match all letters and numbersThere's a character class that can match all letters of the alphabet and numbers: \w
being equivalent to [A-Za-z0-9]
, plus the underscore _
.
const tediousWay = /[A-Za-z0-9_]+/;
const easierWay = /\w+/;
If we want to find the opposite of alphanumerics, we use \W
(note the capital W is used as the opposite).
\W
is the same as [^A-Za-z0-9_]
.
\d
Match all numbersThe shortcut to look for DIGIT CHARACTERS is \d
, being equal to [0-9].
\D
Match all non-numbersThe shortcut this time, is obviously \D
, being equivalent to [^0-9]
.
\s
Match whitespaceWe can search for whitespace using \s
. It will also match return, tab, form feed, and new line characters.
\S
Match non-whitespace charactersIt will match everything that is not whitespace.
{n,n}
Specify upper and lower times that a match can occurWe know we can use the plus sign +
to look for one or more characters.
We also know that the asterisk *
is used to look for zero or more characters.
When we want to specify the quantity we will use curly brackets {
and }
. defining the minimum and maximum times we
want it to appear in the pattern.
For example, to match only the letter a
appearing between 3 and 5 times in a string, the regex will look
like /a{3,5}h/
.
{n,}
Specify only the lower number of matchesIf we want to specify just the lower number of matches, we will just omit the second parameter like /ha{3,}h/
, that
will match the string with the letter a appearing at least 3 times.
{n}
Specify the exact number of matchesWe can specify the number of matches we want if we just put one number inside the curly brackets, like /ha{3}h/
.
..?
Check for zero or one (optional)We can check the possible existence of an element with a question mark ?
. This checks for a zero or one of the
preceding element.
const american = 'color';
const british = 'colour';
const pattern = /colou?r/;
pattern.test(american); // true
pattern.test(british); // true
Lookaheads are patterns that look ahead in your string to check for patterns further along. This can be useful when you want to search for multiple patterns over the same string.
There are two types:
(?=...)
where the ...
is the required part that is not matched.(?!...)
where ...
is the pattern that you do not want to be there. The rest of the pattern is returned if the
negative part is not present.// Positive lookahead
const quit = 'qu';
const quRegex = /q(?=u)/;
quit.match(quRegex); // ['q']
// Negative lookahead
const noquit = 'qt';
const qRegex = /q(?!u)/;
noquit.match(qRegex); // ['q']
A more practical use of the lookaheads is to check two or more patterns in one string, like /(?=GROUP1)(?=GROUP2)/
// This will look for between 3 and 6 characters and at least one number
let password = 'abc123';
let checkPass = /(?=\w{3,6})(?=\D*\d)/;
checkPass.test(password); // returns true
(...)
Reuse patterns using capture groupsThere's going to be patterns that occur multiple times in a string. Instead of manually repeat a regex, we can search for repeated substrings using capture groups.
Parentheses (
and )
are used to find repeated substrings, putting the regex of the pattern that is going to repeat
in between those.
To specify where that repeat string will appear, we will use backslash \
and a number
. This number starts at 1 and
increases with each additional capture group you use.
// Match any word that occurs twice separated by a space
let repeatStr = 'regex regex';
let repeatRegex = /(\w+)\s\1/; // Here (\w+) is the first group,
// then we match whitespace with \s
// and reuse the first group with \1
repeatRegex.test(repeatStr); // true
repeatStr.match(repeatRegex); // ['regex regex', 'regex']
As we can see in the last line above, using .match() will return an array with the string it matches, along with its capture group.
We can use [string1].replace([pattern1], [string2])
to look for a pattern and replace it.
pattern1 is the regex you want to search for, and string2 the string to replace or a function.
You can also access capture groups in the replacement string with dollar signs $n
, like:
const cities = 'Bristol Liverpool';
const matchTwoWordsWithAWhiteSpace = /(\w+)\s(\w+)/;
const newStringWithWordsInverted = '$2 $1';
cities.replace(matchTwoWordsWithAWhiteSpace, newStringWithWordsInverted);
// 'Liverpool Bristol'