Dirk Harriman Banner Image

 

Notes Javascript - Regular Expressions


 

 

Regular Expressions

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the following:

Regular Expression Methods
String Methods
Special Characters In Regular Expressions.
Item Characters/Constructs
Character Classes [xyz], [^xyz], ., \d, \D, \w, \W, \s, \S, \t, \r, \n, \v, \f, [\b], \0, \cX, \xhh, \uhhhh, \u{hhhh}, x|y
Assertions ^, $, \b, \B, x(?=y), x(?!y), (?<=y)x, (?<!y)x
Groups & Backreferences (x), (?<Name>x), (?:x), \n, \k<Name>
Quantifiers x*, x+, x?, x{n}, x{n,}, x{n,m}
Unicode Property Escapes \p{UnicodeProperty}, \P{UnicodeProperty}

 

Regular Expression Methods

exec()

The exec() method executes a search for a match in a specified string and returns a result array, or null.

const regex1 = RegExp('foo*', 'g');
const str1 = 'table football, foosball';
let array1;

while ((array1 = regex1.exec(str1)) !== null) {
  console.log(`Found ${array1[0]}. Next starts at ${regex1.lastIndex}.`);
  // Expected output: "Found foo. Next starts at 9."
  // Expected output: "Found foo. Next starts at 19."
}

test()

The test() method executes a search for a match between a regular expression and a specified string. Returns true if there is a match; false otherwise.

JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g., /foo/g or /foo/y). They store a lastIndex from the previous match. Because test() starts searching at lastIndex and advances it after each successful match, repeated calls step through the matches in a string of text. Note that test() itself returns only true or false; use exec() when the captured groups are needed.

const str = 'table football';
const regex = new RegExp('foo*');
const globalRegex = new RegExp('foo*', 'g');

console.log(regex.test(str));
// Expected output: true

console.log(globalRegex.lastIndex);
// Expected output: 0

console.log(globalRegex.test(str));
// Expected output: true

console.log(globalRegex.lastIndex);
// Expected output: 9

console.log(globalRegex.test(str));
// Expected output: false


 

String Methods

match()

The match() method retrieves the result of matching a string against a regular expression.

const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.';
const regex = /[A-Z]/g;
const found = paragraph.match(regex);

console.log(found);
// Expected output: Array ["T", "I"]

matchAll()

The matchAll() method returns an iterator of all results matching a string against a regular expression, including capturing groups.

const regexp = /t(e)(st(\d?))/g;
const str = 'test1test2';
const array = [...str.matchAll(regexp)];

console.log(array[0]);
// Expected output: Array ["test1", "e", "st1", "1"]

console.log(array[1]);
// Expected output: Array ["test2", "e", "st2", "2"]

replace()

The replace() method returns a new string with one, some, or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function called for each match. If pattern is a string, only the first occurrence will be replaced. The original string is left unchanged.

Syntax

replace( pattern, replacement )

Function Parameters
Parameter Description
pattern Can be a string or an object with a Symbol.replace method, the typical example being a regular expression. Any value that doesn't have the Symbol.replace method will be coerced to a string.
replacement Can be a string or a function.
 
String
If it's a string, it will replace the substring matched by pattern. A number of special replacement patterns are supported; see the Specifying a string as the replacement section below.
 
Function
If it's a function, it will be invoked for every match and its return value is used as the replacement text. The arguments supplied to this function are described in the Specifying a function as the replacement section below.
Specifying A String As The Replacement

The replacement string can include the following special replacement patterns:

String Replacement Patterns
Parameter Description
$$ Inserts a $.
$& Inserts the matched substring.
$` Inserts the portion of the string that precedes the matched substring.
$' Inserts the portion of the string that follows the matched substring.
$n Inserts the nth (1-indexed) capturing group where n is a positive integer less than 100.
$<Name> Inserts the named capturing group where Name is the group name.

$n and $<Name> are only available if the pattern argument is a RegExp object. If the pattern is a string, or if the corresponding capturing group isn't present in the regex, then the pattern will be replaced as a literal. If the group is present but isn't matched (because it's part of a disjunction), it will be replaced with an empty string.
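A minimal sketch of the replacement patterns listed above. The sample strings and variable names are invented for illustration and are not part of the original notes.

```javascript
// Sample input invented for illustration.
const fullName = "John Smith";

// $1 and $2 refer to the first and second capturing groups.
const swapped = fullName.replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped); // "Smith, John"

// $<first> and $<last> refer to named capturing groups.
const swappedNamed = fullName.replace(
  /(?<first>\w+)\s(?<last>\w+)/,
  "$<last>, $<first>"
);
console.log(swappedNamed); // "Smith, John"

// $$ inserts a literal dollar sign and $& inserts the whole match.
const priced = "Total: 25".replace(/\d+/, "$$$&");
console.log(priced); // "Total: $25"

// With a string pattern there are no groups, so $1 is inserted literally.
const literal = "abc".replace("b", "[$1]");
console.log(literal); // "a[$1]c"
```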

const p = 'The quick brown fox jumps over the lazy dog. If the dog reacted, was it really lazy?';

console.log(p.replace('dog', 'monkey'));
// Expected output: "The quick brown fox jumps over the lazy monkey. If the dog reacted, was it really lazy?"

const regex = /Dog/i;
console.log(p.replace(regex, 'ferret'));
// Expected output: "The quick brown fox jumps over the lazy ferret. If the dog reacted, was it really lazy?"

let text = "Mr Blue has a blue house and a blue car";
let result = text.replace(/blue|house|car/gi, function (x) {
  return x.toUpperCase();
});
console.log(result);
// Expected output: "Mr BLUE has a BLUE HOUSE and a BLUE CAR"

Replacer Function As Parameter

In the last example above, the second argument to replace() is a function rather than a string. The function is invoked after each match has been performed, and its return value is used as the replacement string.
The following are the specifications for the replacer function.

function replacer(match, p1, p2, /* ..., */ pN, offset, string, groups) {
  return replacement;
}

Parameter Description
match The matched substring. (Corresponds to $& above.)
p1-pN The nth string found by a capture group (including named capturing groups), provided the first argument to replace() is a RegExp object. (Corresponds to $1, $2, etc. above.) For example, if the pattern is /(a+)(b+)/, then p1 is the match for (a+), and p2 is the match for (b+).
If the group is part of a disjunction (e.g. "abc".replace(/(a)|(b)/, replacer)), the unmatched alternative will be undefined.
offset The offset of the matched substring within the whole string being examined. For example, if the whole string was 'abcd', and the matched substring was 'bc', then this argument will be 1.
string The whole string being examined.
groups An object whose keys are the used group names, and whose values are the matched portions (undefined if not matched). Only present if the pattern contains at least one named capturing group.

The exact number of arguments depends on whether the first argument is a RegExp object, and, if so, how many capture groups it has.
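The following sketch shows how the argument list changes with the number of capture groups. The function names and input strings are hypothetical, chosen only to make the argument order visible.

```javascript
// No capturing groups: the arguments are (match, offset, string).
function toKebab(match, offset) {
  return (offset > 0 ? "-" : "") + match.toLowerCase();
}
const kebab = "borderTopWidth".replace(/[A-Z]/g, toKebab);
console.log(kebab); // "border-top-width"

// Two capturing groups: the arguments are (match, p1, p2, offset, string).
function describe(match, p1, p2, offset, string) {
  return `[${p1}|${p2}@${offset}]`;
}
const described = "abc12345#$*%".replace(/([a-z]+)(\d+)/, describe);
console.log(described); // "[abc|12345@0]#$*%"

// Named groups: the groups object arrives as the last argument.
const reordered = "2024-03-15".replace(
  /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/,
  (...args) => {
    const groups = args[args.length - 1];
    return `${groups.d}/${groups.m}/${groups.y}`;
  }
);
console.log(reordered); // "15/03/2024"
```

A string returned from a replacer function is inserted as-is; the special $ patterns are not interpreted in it.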

replaceAll()

The replaceAll() method returns a new string with all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match. The original string is left unchanged.

const p = 'The quick brown fox jumps over the lazy dog. If the dog reacted, was it really lazy?';

console.log(p.replaceAll('dog', 'monkey'));
// Expected output: "The quick brown fox jumps over the lazy monkey. If the monkey reacted, was it really lazy?"

// Global flag required when calling replaceAll with regex
const regex = /Dog/ig;
console.log(p.replaceAll(regex, 'ferret'));
// Expected output: "The quick brown fox jumps over the lazy ferret. If the ferret reacted, was it really lazy?"

search()

The search() method executes a search for a match between a regular expression and this String object.

const paragraph = 'The quick brown fox jumps over the lazy dog. If the dog barked, was it really lazy?';

// ANY CHARACTER THAT IS NOT A WORD CHARACTER OR WHITESPACE
const regex = /[^\w\s]/g;

console.log(paragraph.search(regex));
// EXPECTED OUTPUT: 43

console.log(paragraph[paragraph.search(regex)]);
// EXPECTED OUTPUT: "."

split()

The split() method takes a pattern and divides a String into an ordered list of substrings by searching for the pattern, puts these substrings into an array, and returns the array.

const str = 'The quick brown fox jumps over the lazy dog.';

const words = str.split(' ');
console.log(words[3]);
// Expected output: "fox"

const chars = str.split('');
console.log(chars[8]);
// Expected output: "k"

const strCopy = str.split();
console.log(strCopy);
// Expected output: Array ["The quick brown fox jumps over the lazy dog."]
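Since these notes are about regular expressions, here is a short sketch of split() with a RegExp as the pattern. The input strings are invented for illustration.

```javascript
// Split on a comma or semicolon, ignoring whitespace around the separator.
const rawList = "apple ,  banana;cherry";
const items = rawList.split(/\s*[,;]\s*/);
console.log(items); // [ 'apple', 'banana', 'cherry' ]

// If the pattern contains a capturing group, the captured separators
// are included in the resulting array.
const tokens = "1+2-3".split(/([+-])/);
console.log(tokens); // [ '1', '+', '2', '-', '3' ]
```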

Creating Regular Expressions
A Regular Expression Literal

A regular expression literal consists of a pattern enclosed between slashes, as follows:

const re = /ab+c/;
// IN THE FOLLOWING i IS A FLAG
const reIgnoreCase = /ab+c/i;

Regular expression literals provide compilation of the regular expression when the script is loaded. If the regular expression remains constant, using this can improve performance.

Calling A Regular Expression Constructor Function

Alternatively, call the constructor function of the RegExp object, as follows:

const re1 = new RegExp("ab+c");
// IN THE FOLLOWING i IS A FLAG
const re2 = new RegExp("ab+c", "i");
const re3 = new RegExp(/ab+c/, "i");

Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
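When the pattern comes from user input, characters such as + or . keep their special meaning unless they are escaped first. The sketch below assumes a small helper, escapeRegExp, which is hypothetical and written here for illustration; it is not a built-in used by the original notes.

```javascript
// Hypothetical helper: put a backslash in front of every character
// that has a special meaning in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1"; // pretend this came from a form field

const unescaped = new RegExp(userInput);             // "+" acts as a quantifier
const escaped = new RegExp(escapeRegExp(userInput)); // "+" is matched literally

console.log(unescaped.test("1+1=2")); // false: the pattern looks for "11"
console.log(escaped.test("1+1=2"));   // true
console.log(escaped.source);          // "1\+1"
```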


 

Writing A Regular Expression Pattern

A regular expression pattern is composed of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/.

Quick Definition Table
*   Zero or more
+   One or more
?   Zero or one
[ ]  Defines a set or class of characters. Inside the brackets most special characters, such as * and $, lose their special meaning, while a ^ placed first means NOT (it negates the set).
( )  Used to designate groups of regular expressions. Can be used to return the results of the match within the parentheses. Useful for find and replace or programming.
{ }  Used to designate iterations i.e. a{4,10} means between 4 and 10 character a's
\   The backslash is used to escape special characters. i.e. \* means a literal asterisk "*".
Also used to designate special characters i.e. \t = tab, \n = newline, \  = space
^   The hat symbol designates start
$   The dollar symbol designates end
.   The period matches any character like a wild card
|   The pipe symbol is used to designate an OR
$&   Used in a replacement string (not in the pattern) to insert the matched text. Use $1 to $99 to insert the numbered capturing groups.
Flags Definition Table
Flag Description Corresponding Property
d   Generate indices for substring matches. hasIndices
g   Global search. global
i   Case-insensitive search. ignoreCase
m   Allows ^ and $ to match next to newline characters, that is, at the start and end of each line. multiline
s   Allows . to match newline characters. dotAll
u   "Unicode"; treat a pattern as a sequence of Unicode code points. unicode
y   Perform a "sticky" search that matches starting at the current position in the target string. sticky
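The effect of most of these flags can be seen in a few lines. This is a sketch using an invented two-line string; the d flag requires a reasonably recent JavaScript engine.

```javascript
const text = "first line\nSecond line";

// i: ignore case.
console.log(/second/.test(text));  // false
console.log(/second/i.test(text)); // true

// m: ^ and $ also match at line breaks.
console.log(/^Second/.test(text));  // false
console.log(/^Second/m.test(text)); // true

// s: the dot also matches line terminators.
console.log(/line.Second/.test(text));  // false
console.log(/line.Second/s.test(text)); // true

// g: find every match instead of only the first.
const allLines = text.match(/line/g);
console.log(allLines.length); // 2

// y: match only at lastIndex, without scanning ahead.
const sticky = /line/y;
console.log(sticky.test(text)); // false: "line" does not start at index 0
sticky.lastIndex = 6;
console.log(sticky.test(text)); // true: "line" starts at index 6

// d: record the start and end indices of the match.
const indexed = /line/d.exec(text);
console.log(indexed.indices[0]); // [ 6, 10 ]
```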

 

Assertions

Assertions include boundaries, which indicate the beginnings and endings of lines and words, and other patterns indicating in some way that a match is possible (including look-ahead, look-behind, and conditional expressions).

Assertions Definition Table
^   The hat symbol designates the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character.
For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A".

This character has a different meaning when it appears at the start of a character class.

$   The dollar symbol matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character.
For example, /t$/ does not match the "t" in "eater", but does match it in "eat".
\b   Matches a word boundary. This is the position where a word character is not followed or preceded by another word character, such as between a letter and a space.
 
Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero.
 
Examples:
/\bm/   Matches the "m" in "moon".
/oo\b/   Does not match the "oo" in "moon", because "oo" is followed by "n" which is a word character.
/oon\b/   Matches the "oon" in "moon", because "oon" is the end of the string, thus not followed by a word character.
/\w\b\w/   Will never match anything, because a word character can never be followed by both a non-word and a word character.
To match a backspace character ([\b]), see Character Classes.
\B   Matches a non-word boundary. This is a position where the previous and next character are of the same type: Either both must be words, or both must be non-words, for example between two letters or between two spaces. The beginning and end of a string are considered non-words. Same as the matched word boundary, the matched non-word boundary is also not included in the match.
 
For example, /\Bon/ matches "on" in "at noon", and /ye\B/ matches "ye" in "possibly yesterday".
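A few of the boundary examples from the table can be checked directly. The last example, a whole-word replacement, uses an input string invented for illustration.

```javascript
const startOfWord = "moon".match(/\bm/);    // boundary before "m"
const midWord = "moon".match(/oo\b/);       // "oo" is followed by "n": no boundary
const insideWord = "at noon".match(/\Bon/); // \B requires a position inside a word

console.log(startOfWord[0]);   // "m"
console.log(midWord);          // null
console.log(insideWord.index); // 5

// Whole-word replacement: the "cat" inside "concat" is left alone.
const replaced = "cat concat cat".replace(/\bcat\b/g, "dog");
console.log(replaced); // "dog concat dog"
```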

 

Other Assertions

The ? character may also be used as a quantifier.

Assertions Definition Table
RE Type Description
x(?=y)   Lookahead Assertion Matches "x" only if "x" is followed by "y".
 
For Example:
/Jack(?=Sprat)/ Matches "Jack" only if it is followed by "Sprat".
/Jack(?=Sprat|Frost)/ Matches "Jack" only if it is followed by "Sprat" or "Frost".
However: neither "Sprat" nor "Frost" is part of the match results.
x(?!y)   Negative Lookahead Assertion Matches "x" only if "x" is not followed by "y".
 
/\d+(?!\.)/ Matches a number only if it is not followed by a decimal point.
/\d+(?!\.)/.exec('3.141') Matches "141" but not "3".
(?<=y)x   Lookbehind Assertion Matches "x" only if "x" is preceded by "y".
 
/(?<=Jack)Sprat/ Matches "Sprat" only if it is preceded by "Jack".
/(?<=Jack|Tom)Sprat/ Matches "Sprat" only if it is preceded by "Jack" or "Tom".
However: neither "Jack" nor "Tom" is part of the match results.
(?<!y)x   Negative Lookbehind Assertion Matches "x" only if "x" is not preceded by "y".
 
/(?<!-)\d+/ Matches a number only if it is not preceded by a minus sign.
/(?<!-)\d+/.exec('3') Matches "3".
/(?<!-)\d+/.exec('-3') Match is not found because the number is preceded by the minus sign.

function runScript1() {
  let divResult = document.getElementById("results1");
  let taSentence = document.getElementById("textArea1");
  let taMultilineText = taSentence.value;

  // 1 - USE ^ TO FIX THE MATCHING AT THE BEGINNING OF THE STRING, AND RIGHT AFTER NEWLINE.
  // FIX 'tey' => 'hey' AND 'tangs' => 'hangs' BUT DO NOT TOUCH 'traa'.
  taMultilineText = taMultilineText.replace(/^t/gim, "h");

  // 2 - USE $ TO FIX MATCHING AT THE END OF THE TEXT.
  // FIX 'traa' => 'tree.'.
  taMultilineText = taMultilineText.replace(/aa$/gim, "ee.");

  // 3 - USE \b TO MATCH CHARACTERS RIGHT ON BORDER BETWEEN A WORD AND A SPACE.
  // FIX 'ihe' => 'the' BUT DO NOT TOUCH 'light'.
  taMultilineText = taMultilineText.replace(/\bi/gim, "t");

  // 4 - USE \B TO MATCH CHARACTERS INSIDE BORDERS OF AN ENTITY.
  // FIX 'greon' => 'green' BUT DO NOT TOUCH 'on'.
  taMultilineText = taMultilineText.replace(/\Bo/gim, "e");

  // 5 - REPLACE THE NEWLINE IN THE TEXT WITH AN HTML BREAK.
  taMultilineText = taMultilineText.replace(/(\r\n|\r|\n)/g, "<br/>");

  divResult.innerHTML = taMultilineText;
}

The text in the following text area is full of spelling errors:



 
Matching The Beginning Of Input Using A ^ Control Character

Use ^ for matching at the beginning of input.
In this example, we can get the fruits that start with 'A' by a /^A/ regex. For selecting appropriate fruits we can use the filter method with an arrow function.

const fruits = ["Apple", "Watermelon", "Orange", "Avocado", "Strawberry"];

// SELECT FRUITS STARTED WITH 'A' BY /^A/ REGEX.
// HERE '^' CONTROL SYMBOL USED ONLY IN ONE ROLE: MATCHING BEGINNING OF AN INPUT.
const fruitsStartsWithA = fruits.filter((fruit) => /^A/.test(fruit));

console.log(fruitsStartsWithA);
// [ 'Apple', 'Avocado' ]

function runScript2() {
  let divResult = document.getElementById("results2");
  const fruits = ["Apple", "Watermelon", "Orange", "Avocado", "Strawberry"];
  const fruitsStartsWithA = fruits.filter((fruit) => /^A/.test(fruit));
  iterateArr(fruitsStartsWithA, "A - Fruits", divResult);
}

function iterateArr(anArr, divMsg, divItem) {
  if (divMsg != "") {
    divItem.innerHTML += "<b class='title'>" + divMsg + "</b>";
  }
  // DECLARE i WITH let SO IT DOES NOT LEAK INTO THE GLOBAL SCOPE.
  for (let i = 0; i < anArr.length; i++) {
    divItem.innerHTML += "Index: " + i + " Array Item: " + anArr[i] + "<br/>";
  }
}



 

In the second example ^ is used both for matching at the beginning of input and for creating negated or complemented character class when used within character classes.

function runScript3() {
  let divResult = document.getElementById("results3");
  const fruits = ["Apple", "Watermelon", "Orange", "Avocado", "Strawberry"];

  // SELECTING FRUITS THAT DO NOT START BY 'A' WITH A /^[^A]/ REGEX.
  // IN THIS EXAMPLE, TWO MEANINGS OF '^' CONTROL SYMBOL ARE REPRESENTED:
  // 1 - MATCHING BEGINNING OF THE INPUT
  // 2 - A NEGATED OR COMPLEMENTED CHARACTER CLASS: [^A]
  //     THAT IS, IT MATCHES ANYTHING THAT IS NOT ENCLOSED IN THE BRACKETS.
  const fruitsStartsWithNotA = fruits.filter((fruit) => /^[^A]/.test(fruit));

  iterateArr(fruitsStartsWithNotA, "Non A - Fruits", divResult);
}



 

function runScript4() {
  let divResult = document.getElementById("results4");
  const fruitsWithDescription = ["Red apple", "Orange orange", "Green Avocado"];

  // SELECT DESCRIPTIONS THAT CONTAINS 'en' OR 'ed' WORDS ENDINGS:
  const enEdSelection = fruitsWithDescription.filter((descr) => /(en|ed)\b/.test(descr));

  iterateArr(enEdSelection, "Descriptive Fruit", divResult);
}



 
Lookahead Assertion

function runScript5() {
  let divResult = document.getElementById("results5");
  let strResult = "";
  const regex = /First(?= test)/g;
  const texts = [
    "First test",
    "First peach",
    "This is the First test in a year.",
    "This is the First peach in a month.",
  ];

  // iterateArrStr() IS A PAGE HELPER THAT IS NOT SHOWN IN THESE NOTES.
  texts.forEach((strText, i) => {
    const regexResult = strText.match(regex);
    strResult += (i > 0 ? "<br/>" : "") + "String " + (i + 1) + ": ";
    strResult += regexResult != null ? iterateArrStr(regexResult) : "<br/>NULL";
  });

  divResult.innerHTML = strResult;
}



 
Basic Negative Lookahead Assertion

For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point.
/\d+(?!\.)/.exec('3.141') matches "141" but not "3".

console.log(/\d+(?!\.)/g.exec("3.141")); // [ '141', index: 2, input: '3.141' ]

Meaning of '?!' Combination

The ?! combination has different meanings in assertions like /x(?!y)/ and character classes like [^?!].

const orangeNotLemon = "Do you want to have an orange? Yes, I do not want to have a lemon!";

// DIFFERENT MEANING OF '?!' COMBINATION USAGE IN ASSERTIONS /x(?!y)/ AND RANGES /[^?!]/
const selectNotLemonRegex = /[^?!]+have(?! a lemon)[^?!]+[?!]/gi;
console.log(orangeNotLemon.match(selectNotLemonRegex));
// [ 'Do you want to have an orange?' ]

const selectNotOrangeRegex = /[^?!]+have(?! an orange)[^?!]+[?!]/gi;
console.log(orangeNotLemon.match(selectNotOrangeRegex));
// [ ' Yes, I do not want to have a lemon!' ]

Lookbehind Assertion

const oranges = ["ripe orange A", "green orange B", "ripe orange C"];

const ripeOranges = oranges.filter((fruit) => /(?<=ripe )orange/.test(fruit));

console.log(ripeOranges);
// [ 'ripe orange A', 'ripe orange C' ]
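The negative lookbehind form, (?<!y)x, has no example in this section, so here is a small sketch alongside the positive form. The input string and variable names are invented for illustration.

```javascript
const amounts = "cost: $30, qty: 12, refund: $5";

// Lookbehind: digits that directly follow a dollar sign.
const dollars = amounts.match(/(?<=\$)\d+/g);
console.log(dollars); // [ '30', '5' ]

// Negative lookbehind: digits preceded by neither "$" nor another digit.
const plain = amounts.match(/(?<![$\d])\d+/g);
console.log(plain); // [ '12' ]
```

The \d inside the negative lookbehind matters: with /(?<!\$)\d+/ alone, the "0" of "$30" would still match, because it is preceded by "3" rather than by "$".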


 

Character Classes

Character classes distinguish kinds of characters such as, for example, distinguishing between letters and digits.

Character Classes Definition Table
Set Meaning
[xyz]
[a-c]
A character class. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character.
 
For example, [abcd] is the same as [a-d].
They match the "b" in "brisket", and the "c" in "chop".
 
For example, [abcd-] and [-abcd] match the "b" in "brisket", the "c" in "chop", and the "-" (hyphen) in "non-profit".
 
For example, [\w-] is the same as [A-Za-z0-9_-].
They both match the "b" in "brisket", the "c" in "chop", and the "n" in "non-profit".
[^xyz]
[^a-c]
A negated or complemented character class. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first character after the ^ or the last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character. For example, [^abc] is the same as [^a-c]. They initially match "o" in "bacon" and "h" in "chop".
 

Note: The ^ character may also indicate the beginning of input.

. Has one of the following meanings:
  • Matches any single character except line terminators: \n, \r, \u2028 or \u2029.
    For example, /.y/ matches "my" and "ay", but not "yes", in "yes make my day", as there is no character before "y" in "yes".
  • Inside a character class, the dot loses its special meaning and matches a literal dot.
Note that the m multiline flag doesn't change the dot behavior. To match a pattern across multiple lines, use the character class [^], which matches any character including newlines.
The s "dotAll" flag allows the dot to also match line terminators.
\d Matches any digit (Arabic numeral). Equivalent to [0-9].
For example, /\d/ or /[0-9]/ matches "2" in "B2 is the suite number".
\D Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ matches "B" in "B2 is the suite number".
\w Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_].
For example, /\w/ matches "a" in "apple", "5" in "$5.28", "3" in "3D" and "m" in "Émanuel".
\W Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_].
For example, /\W/ or /[^A-Za-z0-9_]/ matches "%" in "50%" and "É" in "Émanuel".
\s Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces.
Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
For example, /\s\w*/ matches " bar" in "foo bar".
\S Matches a single character other than white space.
Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
For example, /\S\w*/ matches "foo" in "foo bar".
\t Matches a horizontal tab.
\r Matches a carriage return.
\n Matches a linefeed.
\v Matches a vertical tab.
\f Matches a form-feed.
[\b] Matches a backspace. If you're looking for the word-boundary assertion (\b), see Assertions.
\0 Matches a NUL character. Do not follow this with another digit.
\cX Matches a control character using caret notation, where "X" is a letter from A-Z (corresponding to codepoints U+0001-U+001A). For example, /\cM\cJ/ matches "\r\n".
\xhh Matches the character with the code hh (two hexadecimal digits).
\uhhhh Matches a UTF-16 code-unit with the value hhhh (four hexadecimal digits).
\u{hhhh}
or
\u{hhhhh}
(Only when the u flag is set.) Matches the character with the Unicode value U+hhhh or U+hhhhh (hexadecimal digits).
\p{UnicodeProperty},
\P{UnicodeProperty}
Matches a character based on its Unicode character properties (to match just, for example, emoji characters, or Japanese katakana characters, or Chinese/Japanese Han/Kanji characters, etc.).
\ Indicates that the following character should be treated specially, or "escaped". It behaves one of two ways.
  • For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, /b/ matches the character "b". By placing a backslash in front of "b", that is by using /\b/, the character becomes special to mean match a word boundary.
  • For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, "*" is a special character that means 0 or more occurrences of the preceding character should be matched; for example, /a*/ means match 0 or more "a"s. To match * literally, precede it with a backslash; for example, /a\*/ matches "a*".

Note: To match this character literally, escape it with itself. In other words to search for \ use /\\/.

x|y Disjunction: Matches either "x" or "y". Each component, separated by a pipe (|), is called an alternative. For example, /green|red/ matches "green" in "green apple" and "red" in "red apple".
 

Note: A disjunction is another way to specify "a set of choices", but it's not a character class. Disjunctions are not atoms, so you need to use a group to make one part of a bigger pattern. [abc] is functionally equivalent to (?:a|b|c).
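Two points from the table above are easy to get wrong: a disjunction needs a group before it can be combined with other pattern parts, and the dot does not match line breaks by default. A brief sketch, with sample strings invented for illustration:

```javascript
// Without a group, ^ binds only to "green" and $ only to "red".
const loose = /^green|red$/;
const grouped = /^(?:green|red)$/;
console.log(loose.test("greenish"));   // true: "green" at the start is enough
console.log(grouped.test("greenish")); // false: the whole input must be "green" or "red"
console.log(grouped.test("red"));      // true

// [abc] and (?:a|b|c) find the same matches.
console.log("cabbage".match(/[abc]/g).join(""));     // "cabba"
console.log("cabbage".match(/(?:a|b|c)/g).join("")); // "cabba"

// The dot skips line terminators, the m flag does not change that,
// and [^] matches any character at all.
const twoLines = "one\ntwo";
console.log(/one.two/.test(twoLines));   // false
console.log(/one.two/m.test(twoLines));  // false
console.log(/one[^]two/.test(twoLines)); // true
```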

Random Character Class Examples
Set Meaning
[a-z] All lower case letters.
[^a-z] All characters that are not lower case letters.
[0-9] All digits
[aeiouAEIOU] All vowels, upper case and lower case.
a(b|c|d)e Valid input is: abe, ace, or ade.

function runScript6() {
  let divResult = document.getElementById("results6");
  let strResult = "";
  const randomData = "015 354 8787 687351 3512 8735";
  const regexpFourDigits = /\b\d{4}\b/g;
  /* RE BREAKDOWN
     \b     INDICATES A BOUNDARY (i.e. DO NOT START MATCHING IN THE MIDDLE OF A WORD)
     \d{4}  INDICATES A DIGIT, FOUR TIMES
     \b     INDICATES ANOTHER BOUNDARY (i.e. DO NOT END MATCHING IN THE MIDDLE OF A WORD)
  */
  let arrResult = randomData.match(regexpFourDigits);
  strResult = iterateArrStr(arrResult);
  divResult.innerHTML = strResult;
}



 

function runScript7() {
  let divResult = document.getElementById("results7");
  let taSentence = document.getElementById("textArea7");
  let strResult = "";
  const regexpWordStartingWithA = /\b[aA]\w+/g;
  /* RE BREAKDOWN
     \b    INDICATES A BOUNDARY (i.e. DO NOT START MATCHING IN THE MIDDLE OF A WORD)
     [aA]  INDICATES THE LETTER a OR A
     \w+   INDICATES ANY CHARACTER *FROM THE LATIN ALPHABET*, MULTIPLE TIMES
  */
  let arrResult = taSentence.value.match(regexpWordStartingWithA);
  strResult = iterateArrStr(arrResult);
  divResult.innerHTML = strResult;
}



 
Counting Vowels

function runScript8() {
  let divResult = document.getElementById("results8");
  let taSentence = document.getElementById("textArea8");
  const regexpVowels = /[AEIOUYaeiouy]/g;
  // match() RETURNS null WHEN NOTHING MATCHES, SO FALL BACK TO AN EMPTY ARRAY.
  let strResult = (taSentence.value.match(regexpVowels) || []).length;
  divResult.innerHTML = "Number of vowels: " + strResult;
}



 

 

Groups & Backreferences

Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string.
Backreferences refer to a previously captured group in the same regular expression.

Groups

function runScript9() {
  let divResult = document.getElementById("results9");
  let taSentence = document.getElementById("textArea9");
  const regexpWithoutE = /\b[a-df-z]+\b/ig;
  let strResult = iterateArrStr(taSentence.value.match(regexpWithoutE));
  divResult.innerHTML = "Matches: " + strResult;
}



 
Backreferences

Note the difference in this code: parentheses designate the items to capture.

function runScript10() {
  let divResult = document.getElementById("results10");
  let taSentence = document.getElementById("textArea10");
  const regexpSize = /([0-9]+)x([0-9]+)/;
  let arrSize = taSentence.value.match(regexpSize);
  let strResult = `Width: ${arrSize[1]} / Height: ${arrSize[2]}.`;
  divResult.innerHTML = "Dimensions: " + strResult;
}



 
Capture Group Types
Characters Meaning
(x) Capturing group: Matches x and remembers the match.
For example, /(foo)/ matches and remembers "foo" in "foo bar".
 
A regular expression may have multiple capturing groups. In results, matches to capturing groups are typically in an array whose members are in the same order as the left parentheses in the capturing group. This is usually just the order of the capturing groups themselves. This becomes important when capturing groups are nested. Matches are accessed using the index of the result's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9).
 
Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below).
 
String.prototype.match() won't return groups if the /.../g flag is set. However, you can still use String.prototype.matchAll() to get all matches.
(?<name>x) Named capturing group: Matches "x" and stores it on the groups property of the returned matches under the name specified by <name>. The angle brackets (< and >) are required for group name.
 
For example, to extract the United States area code from a phone number, we could use /\((?<area>\d\d\d)\)/. The resulting number would appear under matches.groups.area.
(?:x) Non-capturing group: Matches "x" but does not remember the match. The matched substring cannot be recalled from the resulting array's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9).
\n Where "n" is a positive integer. A back reference to the last substring matching the n parenthetical in the regular expression (counting left parentheses). For example, /apple(,)\sorange\1/ matches "apple, orange," in "apple, orange, cherry, peach".
\k<name> A back reference to the last substring matching the named capture group specified by <name>.
 
For example, /(?<title>\w+), yes \k<title>/ matches "Sir, yes Sir" in "Do you copy? Sir, yes Sir!".
 

Note: \k is used literally here to indicate the beginning of a back reference to a Named capture group.
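The sketch below puts the group types from the table to work: a numbered backreference, a named backreference, and the groups property. The sample sentences, the phone number, and the variable names are invented for illustration.

```javascript
// \1 must repeat exactly what group 1 captured, so this finds doubled words.
const doubledWord = /\b(\w+)\s+\1\b/g;
const cleaned = "Paris in the the spring is is nice".replace(doubledWord, "$1");
console.log(cleaned); // "Paris in the spring is nice"

// \k<quote> must repeat the same quote character that opened the string.
const quotedText = /(?<quote>['"]).*?\k<quote>/;
const quoteMatch = `say "it's fine" now`.match(quotedText);
console.log(quoteMatch[0]); // the text "it's fine" including its double quotes

// Named groups are available on the groups property of the match.
const phoneMatch = "(415) 555-0100".match(/\((?<area>\d\d\d)\)/);
console.log(phoneMatch.groups.area); // "415"
```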


 

Quantifiers

Quantifiers indicate numbers of characters or expressions to match.

Quantifier Types
Characters Meaning
x* Matches the preceding item "x" 0 or more times.
For example, /bo*/ matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted".
x+ Matches the preceding item "x" 1 or more times. Equivalent to {1,}.
For example, /a+/ matches the "a" in "candy" and all the "a"'s in "caaaaaaandy".
x? Matches the preceding item "x" 0 or 1 times.
For example, /e?le?/ matches the "el" in "angel" and the "le" in "angle."
 
If used immediately after any of the quantifiers *, +, ?, or {}, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times).
x{n} Where "n" is a positive integer, matches exactly "n" occurrences of the preceding item "x".
For example, /a{2}/ does not match the "a" in "candy", but it matches all of the "a"'s in "caandy", and the first two "a"'s in "caaandy".
x{n,} Where "n" is a positive integer, matches at least "n" occurrences of the preceding item "x".
For example, /a{2,}/ doesn't match the "a" in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy".
x{n,m} Where "n" is 0 or a positive integer, "m" is a positive integer, and m >= n, matches at least "n" and at most "m" occurrences of the preceding item "x".
For example, /a{1,3}/ matches nothing in "cndy", the "a" in "candy", the two "a"'s in "caandy", and the first three "a"'s in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more "a"s in it.
x*?
x+?
x??
x{n}?
x{n,}?
x{n,m}?
By default quantifiers like * and + are "greedy", meaning that they try to match as much of the string as possible. The ? character after the quantifier makes the quantifier "non-greedy": meaning that it will stop as soon as it finds a match. For example, given a string like:
 
"some <foo> <bar> new </bar> </foo> thing"
 
/<.*>/ will match: "<foo> <bar> new </bar> </foo>"
 
/<.*?>/ will match: "<foo>"  
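The greedy and non-greedy behavior described in the table can be checked with a short script. This is a sketch using the same sample strings as the table; the variable names are illustrative only.

```javascript
const text = "some <foo> <bar> new </bar> </foo> thing";

// Greedy: .* takes as much as it can, then backs off to the last ">".
const greedy = text.match(/<.*>/)[0];
console.log(greedy); // "<foo> <bar> new </bar> </foo>"

// Non-greedy: .*? stops at the first ">" that completes the match.
const lazy = text.match(/<.*?>/)[0];
console.log(lazy); // "<foo>"

// With the g flag, the non-greedy pattern finds each tag separately.
const allTags = text.match(/<.*?>/g);
console.log(allTags); // ["<foo>", "<bar>", "</bar>", "</foo>"]

// Counted quantifiers follow the same rule.
const upToThree = "caaaaaaandy".match(/a{1,3}/)[0];    // greedy: "aaa"
const asFewAsOne = "caaaaaaandy".match(/a{1,3}?/)[0];  // non-greedy: "a"
const noDouble = "candy".match(/a{2}/);                // null, only one "a"
console.log(upToThree, asFewAsOne, noDouble);
```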

 

Unicode Property Escapes


 



 

Escaping


 
Examples Of Text Replacement

The following text area contains a partial directory listing output from MS-DOS. We want to extract only the directory names. A regular expression can be used to accomplish this task.

Source Text
Building The Regular Expression
RE Code Description Capture Code
([0-9]{2}) Two digits (The Month) $1
\/ A slash N/C
([0-9]{2}) Two digits (The Day) $2
\/ A slash N/C
([0-9]{4}) Four digits (The Year) $3
[ ]+ A space or a series of spaces N/C
([0-9]{2}) Two digits (The Hour) $4
: A colon N/C
([0-9]{2}) Two digits (The Minutes) $5
[ ]+ A space or a series of spaces N/C
([A-Z]{2}) Two alpha characters (AM or PM) $6
[ ]+ A space or a series of spaces N/C
<DIR> The literal <DIR> N/C
[ ]+ A space or a series of spaces N/C
([A-Z0-9 ]+) A series of uppercase letters, digits, and spaces. (The Folder Name) $7

([0-9]{2})\/([0-9]{2})\/([0-9]{4})[ ]+([0-9]{2}):([0-9]{2})[ ]+([A-Z]{2})[ ]+<DIR>[ ]+([A-Z0-9 ]+)

The previous regular expression is set up to capture more than what we need, which is just the folder name. This is done to illustrate how you can capture every field, keep only the relevant information, and discard or ignore text that is merely formatting.

Replacement String

Since we only want the folder name, the replacement string is $7, which is the capture code for the folder name.

function runScript100() {
  let divResult = document.getElementById("results100");
  let taSource = document.getElementById("textArea100");
  let strResult = taSource.value;
  // SAME PATTERN AS ABOVE, INCLUDING THE LITERAL <DIR>; GROUP 7 IS THE FOLDER NAME
  let regexp = /([0-9]{2})\/([0-9]{2})\/([0-9]{4})[ ]+([0-9]{2}):([0-9]{2})[ ]+([A-Z]{2})[ ]+<DIR>[ ]+([A-Z0-9 ]+)/g;
  // REPLACE EACH MATCHING DIRECTORY LINE WITH ITS FOLDER NAME ONLY
  strResult = strResult.replaceAll(regexp, '$7');
  divResult.innerHTML = "<br/>" + strResult;
}
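The same extraction can be sketched without the page's text area. In the example below, the listing in sampleListing is invented for illustration, and extractFolderNames is a hypothetical helper, not part of the page's code. It uses matchAll() instead of replaceAll(), a deliberate variation: replaceAll() leaves non-matching lines (such as file entries) in the output, whereas matchAll() returns only the matches.

```javascript
// Hypothetical sample data: two directory entries and one file entry.
const sampleListing = [
  "01/15/2023  09:30 AM    <DIR>          PROJECTS",
  "01/15/2023  09:31 AM             1,024 README.TXT",
  "02/03/2023  11:05 PM    <DIR>          MY PHOTOS",
].join("\n");

// Returns the folder name (capture group 7) of every <DIR> line.
function extractFolderNames(listing) {
  const regexp = /([0-9]{2})\/([0-9]{2})\/([0-9]{4})[ ]+([0-9]{2}):([0-9]{2})[ ]+([A-Z]{2})[ ]+<DIR>[ ]+([A-Z0-9 ]+)/g;
  // matchAll() requires the g flag and yields one match array per directory line.
  return Array.from(listing.matchAll(regexp), (m) => m[7].trim());
}

const folders = extractFolderNames(sampleListing);
console.log(folders); // ["PROJECTS", "MY PHOTOS"]
```

The file entry is skipped because it has no literal <DIR>, and the character class [A-Z0-9 ]+ stops at the end of each line because it does not include the newline character.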

 

 
The-Bands Problem

This next problem involves band names that start with the word "The", as in The Beatles or The Clash. When a database of bands is searched alphabetically for names that start with a given letter, a "The" band is not found under the letter of its actual name (The Beatles does not appear under "B"). It appears only under "T", which is probably incorrect alphabetically. One solution is to remove the "The" and the space after it from the names of all "The" bands. The problem then is that the listing for the band name would be incorrect. The solution might be to include a flag that indicates that a band is a "The" band whose name in the database does not include the preceding "The".

To change only the items that match the regular expression, they must first be isolated from the rest. In the following code, the regular expression's test() method finds the matching lines, and replaceAll() then rewrites only those lines.

Music Database

Let's say there is a database table called Music with a field called SongArtist. For every artist whose name has a preceding "The", we need to remove the "The" from the name and set a metadata field called SongMeta to 1, indicating that the artist is a "The" band.
The following SQL statement is what this application seeks to yield from the given data.

UPDATE Music SET SongArtist='Beatles', SongMeta=1 WHERE SongArtist='The Beatles'

Source Text
Building The Regular Expression
RE Code Description Capture Code
(The) The literal word The $1
" " A literal space. N/C
([A-Za-z ]+) A series of alpha characters with possible spaces. (The band name minus "The") $2

(The) ([A-Za-z ]+)

Replacement String

The replacement string builds SQL statements from the source text:
 
UPDATE Music SET SongArtist='$2', SongMeta=1 WHERE SongArtist='The $2'

function runScript101() {
  let divResult = document.getElementById("results101");
  let taSource = document.getElementById("textArea101");
  let strResult = "";
  let arrSource = taSource.value.split('\n'); // CREATES AN ARRAY FROM LINES OF TEXT
  let regexp = /(The) ([a-zA-Z0-9 ]+)/g; // LOOKS FOR THE-BANDS (ALSO ALLOWS DIGITS IN THE NAME)
  let replaceStr = "UPDATE Music SET SongArtist='$2', SongMeta=1 WHERE SongArtist='The $2'";
  for (let i = 0; i < arrSource.length; i++) {
    // TEST FOR THE-BAND MATCH
    if (regexp.test(arrSource[i])) {
      // IF MATCH, INCLUDE IN LIST. IGNORE ALL OTHERS
      // replaceAll() LEAVES lastIndex AT 0, SO THE NEXT test() STARTS AT THE BEGINNING OF ITS LINE
      strResult += "<br/>" + arrSource[i].replaceAll(regexp, replaceStr);
    }
  }
  divResult.innerHTML = strResult;
}
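The test-then-replace approach can also be sketched without the page elements. In the example below, buildTheBandUpdates and the sample band list are hypothetical, and two choices are assumptions rather than the page's method: the pattern is anchored with ^ and $ so that only names beginning with "The" match, and the g flag is dropped so that test() keeps no state between lines.

```javascript
// Hypothetical helper: returns one UPDATE statement per "The" band.
function buildTheBandUpdates(sourceText) {
  // No g flag: test() on a non-global regex never advances lastIndex.
  const theBand = /^(The) ([A-Za-z0-9 ]+)$/;
  const replaceStr = "UPDATE Music SET SongArtist='$2', SongMeta=1 WHERE SongArtist='The $2'";
  const statements = [];
  for (const line of sourceText.split("\n")) {
    const name = line.trim();
    if (theBand.test(name)) {
      statements.push(name.replace(theBand, replaceStr));
    }
  }
  return statements;
}

// Invented sample data for illustration.
const updates = buildTheBandUpdates("The Beatles\nPink Floyd\nThe Clash\nTalking Heads");
console.log(updates.join("\n"));

// Why the g flag matters with test(): a global regex remembers where it stopped.
const globalProbe = /The /g;
const firstTest = globalProbe.test("The Clash");  // true, lastIndex is now 4
const secondTest = globalProbe.test("The Clash"); // false, the search resumed at index 4
console.log(firstTest, secondTest);
```

Band names that contain an apostrophe or other punctuation are not matched by this character class, and a real application would pass values as query parameters rather than build SQL text by hand.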

 

 
The Lottery Data Problem

In this problem, we have the following text that represents lottery draws. It includes the date that the drawing occurred, picks one through five, a Powerball pick, a Megaplier pick, and the amount of the jackpot at the time of the drawing.

With this information, we need to build SQL to enter the data into a database.

LottoDraws
DrawID int (Auto)
DrawDate datetime
Pick1 int
Pick2 int
Pick3 int
Pick4 int
Pick5 int
Powerball int
Megaplier int
Jackpot int

The Text To Manipulate
Building The Regular Expression
RE Code Description Capture Code
[a-zA-Z]{3} Three alpha characters (The Day of the Week) N/C
", " A literal comma followed by a space N/C
([a-zA-Z]+) The text for the month of the draw $1
", " A literal comma followed by a space N/C
([0-9]+) The integer for the day of the month $2
", " A literal comma followed by a space N/C
([0-9]{4}) The four digit integer text for the year $3
[ ]+ A space or a series of spaces N/C
([0-9]+) Pick 1 $4
[ ]+ A space or a series of spaces N/C
([0-9]+) Pick 2 $5
[ ]+ A space or a series of spaces N/C
([0-9]+) Pick 3 $6
[ ]+ A space or a series of spaces N/C
([0-9]+) Pick 4 $7
[ ]+ A space or a series of spaces N/C
([0-9]+) Pick 5 $8
[ ]+ A space or a series of spaces N/C
([0-9]+) Powerball $9
[ ]+ A space or a series of spaces N/C
([0-9]+)x Megaplier $10
[ ]+ A space or a series of spaces N/C
\$ The dollar sign literal (escaped with a backslash) N/C
([0-9]+) The jackpot integer $11
[ ]+ A space or a series of spaces N/C
Million The literal "Million" N/C

Note: On the page, the expression below and the same expression in the code that follows are wrapped across lines to fit on the screen; the actual expression is a single line with no breaks.

/[a-zA-Z]{3}, ([a-zA-Z]+), ([0-9]+), ([0-9]{4})[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)x[ ]+\$([0-9]+)[ ]+Million/g

Replacement

In this case the replacement is not a string but a function called replacer. It receives the whole match and each captured group as arguments and returns the text that replaces the match. It is shown with the code below.

function runScript102() {
  let divResult = document.getElementById("results102");
  let taSource = document.getElementById("textArea102");
  let strResult = taSource.value;
  let regexp = /[a-zA-Z]{3}, ([a-zA-Z]+), ([0-9]+), ([0-9]{4})[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)x[ ]+\$([0-9]+)[ ]+Million/g;
  // replacer IS CALLED ONCE PER MATCH AND RETURNS THE TEXT THAT REPLACES IT
  strResult = strResult.replace(regexp, replacer);
  divResult.innerHTML = strResult;
}

// p1 THROUGH p11 ARE THE ELEVEN CAPTURE GROUPS, IN ORDER
function replacer(match, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, offset, string) {
  let returnStr = "";
  let monthNum = getMonthFromString(p1);
  returnStr = "INSERT INTO LottoDraws (DrawDate, Pick1, Pick2, Pick3, Pick4, Pick5, ";
  returnStr += "Powerball, Megaplier, Jackpot)<br/>";
  returnStr += "VALUES ('" + monthNum + "-" + p2 + "-" + p3 + "',";
  returnStr += p4 + "," + p5 + "," + p6 + "," + p7 + "," + p8 + "," + p9 + "," + p10 + "," + p11 + ")";
  return returnStr;
}

// CONVERTS A MONTH NAME TO ITS NUMBER (1-12); RETURNS -1 IF IT CANNOT BE PARSED
function getMonthFromString(mon) {
  var d = Date.parse(mon + " 1, 2012");
  if (!isNaN(d)) {
    return new Date(d).getMonth() + 1;
  }
  return -1;
}
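A replacement function can be tried outside the page with a single line of input. The sketch below makes several assumptions: the sample line is invented, with a layout inferred from the regular expression; monthNumber uses a lookup table in place of Date.parse(), whose handling of non-ISO strings varies between engines; and the names drawPattern, monthNumber, and toInsertStatement are hypothetical.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Maps a month name or abbreviation to 1-12, or -1 if it is not recognized.
function monthNumber(name) {
  const index = MONTHS.indexOf(name.slice(0, 3).toLowerCase());
  return index === -1 ? -1 : index + 1;
}

const drawPattern = /[a-zA-Z]{3}, ([a-zA-Z]+), ([0-9]+), ([0-9]{4})[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)[ ]+([0-9]+)x[ ]+\$([0-9]+)[ ]+Million/g;

// replace() passes the whole match, then each capture group, then the offset
// and the full input string. The rest parameter collects everything after group 3.
function toInsertStatement(match, month, day, year, ...rest) {
  const values = rest.slice(0, 8); // picks 1-5, Powerball, Megaplier, Jackpot
  return "INSERT INTO LottoDraws (DrawDate, Pick1, Pick2, Pick3, Pick4, Pick5, " +
    "Powerball, Megaplier, Jackpot) " +
    `VALUES ('${monthNumber(month)}-${day}-${year}',${values.join(",")})`;
}

// Invented sample line for illustration.
const sampleDraw = "Fri, Jul, 28, 2023  5  10  28  52  63  18  5x  $910 Million";
const sql = sampleDraw.replace(drawPattern, toInsertStatement);
console.log(sql);
```

Collecting the trailing arguments with a rest parameter avoids listing eleven positional parameters, at the cost of less descriptive names for the individual picks.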
