Regular Expressions
Operators
Operator | Description |
---|---|
* | Zero or more |
+ | One or more |
? | Zero or one |
{} | Used to designate iterations, e.g. a{4,10} means between 4 and 10 repetitions of "a". |
\ | Used to designate literals, e.g. \* means a literal "*". Also used to designate special characters, e.g. \t = tab, \n = newline, \ = space |
^ | Designates start |
$ | Designates end |
. | Matches any character, like a wildcard |
[] | Defines a set or class of characters. Inside a set, characters such as * and $ lose their special meaning, and a leading ^ negates the set (NOT or !). |
| | Used to designate an OR |
() | Used to designate groups of regular expressions. Can be used to return the results of the match within the parentheses. Useful for find and replace or programming. |
& | Used in replacement text to insert the entire match. Equivalent to \0. Use \1 to \9 to insert the corresponding capture groups. |
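As a rough illustration of the operators above, the following sketch checks a few of them in JavaScript (the language used later in this document). The sample strings and variable names are made up for the example.

```javascript
// Quantifiers: * (zero or more), + (one or more), ? (zero or one), {m,n} (range).
const zeroOrMore = /^ab*c$/.test("ac");      // true: "b" may be absent
const oneOrMore = /^ab+c$/.test("ac");       // false: at least one "b" is required
const optional = /^colou?r$/.test("color");  // true: "u" is optional
const bounded = /^a{4,10}$/.test("aaa");     // false: fewer than four "a"s

// A backslash makes the next character literal: \* matches an asterisk.
const literalStar = /2\*3/.test("2*3");      // true

// | means OR; inside [] a leading ^ negates the set.
const either = /^(cat|dog)$/.test("dog");    // true
const notDigit = /^[^0-9]$/.test("7");       // false

console.log({ zeroOrMore, oneOrMore, optional, bounded, literalStar, either, notDigit });
```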
Character Escape
The backslash character (\) in a regular expression indicates that the character that follows it either is a special character (as shown in the following table), or should be interpreted literally.
Escape Char | Description |
---|---|
\t | Matches a tab. |
\r | Matches a carriage return. |
\n | Matches a new line or line-feed. |
\w | Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. |
\W | Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. |
\s | Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. |
\S | Matches a single character other than white space. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. |
\d | Matches any digit (Arabic numeral). Equivalent to [0-9]. |
\D | Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. |
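A small sketch of the character escapes in JavaScript may help here. The sample string is an assumption chosen to contain word characters, digits, a tab, and a newline.

```javascript
const sample = "id_42\tok\n";

const words = sample.match(/\w+/g);    // runs of [A-Za-z0-9_]
const digits = sample.match(/\d/g);    // single digits
const spaces = sample.match(/\s/g);    // the tab and the newline
const nonDigits = sample.match(/\D+/g); // runs of anything that is not a digit

console.log(words);     // [ 'id_42', 'ok' ]
console.log(digits);    // [ '4', '2' ]
console.log(spaces.length);
console.log(nonDigits.length);
```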
Anchors
Anchor | Description |
---|---|
^ | By default, the match must occur at the beginning of the string; in multiline mode, it must occur at the beginning of the line. |
$ | By default, the match must occur at the end of the string or before \n at the end of the string; in multiline mode, it must occur at the end of the line or before \n at the end of the line. |
\A | The match must occur at the beginning of the string only (no multiline support). |
\Z | The match must occur at the end of the string, or before \n at the end of the string. |
\z | The match must occur at the end of the string only. |
\G | The match must start at the position where the previous match ended, or if there was no previous match, at the position in the string where matching started. |
\b | The match must occur on a word boundary. |
\B | The match must not occur on a word boundary. |
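The anchors can be tried out in JavaScript as sketched below. JavaScript does not implement \A, \Z, \z, or \G, so only ^, $, \b, and \B are shown; the sample text is an assumption.

```javascript
const lines = "car park\nused car\ncar";

// Without the m flag, ^ and $ only anchor to the whole string.
const atStringStart = lines.match(/^car/g);  // one match, at index 0
const atStringEnd = lines.match(/car$/g);    // one match, the final "car"

// With the m flag, ^ and $ also anchor to each line.
const atLineStarts = lines.match(/^car/gm);  // lines 1 and 3
const atLineEnds = lines.match(/car$/gm);    // lines 2 and 3

// \b requires a word boundary, \B forbids one.
const boundary = /\bin/.test("pinto");       // false: "in" is mid-word
const noBoundary = /\Bin/.test("pinto");     // true

console.log(atStringStart.length, atStringEnd.length, atLineStarts.length, atLineEnds.length);
console.log(boundary, noBoundary);
```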
Groups & Backreferences
Characters | Description |
---|---|
(x) | Capturing Group: Matches x and remembers the match. A regular expression can have multiple capturing groups. Matches are accessed using the index of the result's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9). The capturing groups, unless named, start with 1. The zero ($0) group is the entire match. |
(?<Name>x) | Named Capturing Group: Matches "x" and stores it on the groups property of the returned match under the name specified by <Name>. The angle brackets (< and >) are required for the group name. |
(?:x) | Non-Capturing Group: Matches "x" but does not remember the match. The matched substring can not be recalled from the resulting array's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9). |
\n | Back Reference: Where "n" is a positive integer. A back reference to the last substring matching the n parenthetical in the regular expression (counting left parentheses). For example, /apple(,)\sorange\1/ matches "apple, orange," in "apple, orange, cherry, peach". |
\k<Name> | Named Back Reference: A back reference to the last substring matching the Named capture group specified by <Name>. For example, /(?<title>\w+), yes \k<title>/ matches "Sir, yes Sir" in "Do you copy? Sir, yes Sir!". Note: The \k is used literally here to indicate the beginning of a back reference to a Named capture group. |
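The group types in the table can be sketched in JavaScript as follows. The date string and the group names (year, month, day) are assumptions for illustration; the "Sir, yes Sir" pattern is the one from the table.

```javascript
// Named capturing groups are available both by index and on .groups.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const found = datePattern.exec("Released on 2014-05-16.");
console.log(found[0]);            // "2014-05-16"
console.log(found[1]);            // "2014"
console.log(found.groups.month);  // "05"

// A non-capturing group (?:...) groups without adding a result element.
const titled = /(?:Mr|Ms)\. (\w+)/.exec("Ms. Lovelace");
console.log(titled.length);       // 2: the full match plus one captured group
console.log(titled[1]);           // "Lovelace"

// A named back reference must repeat whatever the named group captured.
const echoed = /(?<title>\w+), yes \k<title>/.exec("Do you copy? Sir, yes Sir!");
console.log(echoed[0]);           // "Sir, yes Sir"
```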
Back References Explained
In JavaScript regular expressions, backreferences allow you to refer back to previously matched groups within the same regular expression pattern. They are denoted by the backslash followed by a digit (e.g., \1, \2, \3, and so on).
Let's consider an example to understand how backreferences work. Suppose we have a string that contains repeated words separated by a space, and we want to find and remove those repetitions. We can use a regular expression with a backreference to accomplish this.
const str = "This is is a test test string string.";
const regex = /(\b\w+\b)\s+\1/g;
const result = str.replace(regex, "$1");
console.log(result);
In this example, we start by defining the regular expression pattern: `(\b\w+\b)\s+\1`.
Let's break it down:
- (\b\w+\b) - This is a capturing group denoted by the parentheses. It matches a word boundary (\b), followed by one or more word characters (\w+), and ends with another word boundary (\b). The capturing group ( ) allows us to refer back to this matched group later using the backreference.
- \s+ - This matches one or more whitespace characters (spaces, tabs, etc.) that separate the repeated words.
- \1 - This is the backreference. It refers back to the first captured group ((\b\w+\b)). So, it will match the same word that was previously captured.
The g flag at the end of the regular expression enables global matching, meaning it will find all occurrences of repeated words, not just the first one.
In the str.replace(regex, "$1") statement, we use the replace method to replace all occurrences of repeated words with the value of the first captured group ($1). This effectively removes the duplicate words from the string.
When we run this code, the output will be: "This is a test string."
As you can see, the repeated words "is is", "test test", and "string string" have each been collapsed to a single occurrence, leaving only the unique words in the resulting string.
In summary, the backreference \1 allows us to refer back to a previously captured group in a regular expression pattern. It is useful for identifying and manipulating repeated patterns within a string.
Example 2
const text = "Hello hello";
const regex = /(\w+)\s+\1/i;
const match = regex.exec(text);
console.log(match[0]); // "Hello hello"
console.log(match[1]); // "Hello"
In the example above, we have a regular expression pattern /(\w+)\s+\1/i. Let's break it down:
- (\w+) - This is a capturing group that matches one or more word characters (letters, digits, or underscores).
- \s+ - This matches one or more whitespace characters.
- \1 - This is a back reference to the first capturing group. It matches the same text that was previously matched by the first capturing group.
- i - This flag makes the match case-insensitive, so the back reference \1 also accepts "hello" after the group has captured "Hello".
In the given text "Hello hello", the pattern matches "Hello hello" because the first capturing group captures "Hello" and, with the i flag set, the back reference \1 matches "hello" regardless of case. Without the i flag there would be no match and regex.exec(text) would return null. The \s+ matches the space between the two words.
When we call regex.exec(text), it returns an array containing the matched text and the captured groups. In this case, match[0] contains the entire matched text "Hello hello", and match[1] contains the text captured by the first capturing group, which is "Hello".
Back references are useful when you want to match repeated patterns or ensure that a specific pattern occurs multiple times in a row. They allow you to reference previously matched text within the same regular expression and can be handy for tasks like finding repeated words or validating input formats.
Groupings & REs
Group | Description |
---|---|
[a-z] | All lowercase letters. |
[A-Za-z] | All letters lowercase and uppercase. |
[0-9] | All digits. |
[aeiouAEIOU] | All vowels. |
[^a-z] | Any character that is not a lowercase letter. |
a(b|c|d)e | abe, ace or ade |
a.o | Matches "aro" in "around" and "abo" in "about" but not "acro" in "across" |
a*r | Matches "r" in "rack", "ar" in "ark", and "aar" in "aardvark" |
c.*e | Matches "cke" in "racket", "comme" in "comment", and "code" in "code" |
e+d | Matches "eed" in "feeder" and "ed" in "faded" |
e.+e | Matches "eede" in "feeder" but finds no matches in "feed" |
\w*?d | Matches "fad" and "ed" in "faded" but not the entire word "faded" due to the lazy match |
e\w+? | Matches "ee" in "asleep" and "ed" in "faded" but finds no matches in "fade" |
^car | Matches the word "car" only when it appears at the beginning of a line |
car\r?$ | Matches "car" only when it appears at the end of a line |
b[abc] | Matches "ba", "bb", and "bc" |
be[n-t] | Matches "bet" in "between", "ben" in "beneath", and "bes" in "beside", but finds no matches in "below" |
([a-z])X\1 | Matches "aXa" and "bXb", but not "aXb". "\1" refers to the first expression group "[a-z]". |
real(?!ity) | Matches "real" in "realty" and "really" but not in "reality." It also finds the second "real" (but not the first "real") in "realityreal". |
be[^n-t] | Matches "bef" in "before", "beh" in "behind", and "bel" in "below", but finds no matches in "beneath" |
(sponge|mud) bath | Matches "sponge bath" and "mud bath" |
\^ | Matches the character ^ |
x(ab){2}x | Matches "xababx" |
x(ab){2,3}x | Matches "xababx" and "xabababx" but not "xababababx" |
\bin | Uses a word boundary. Matches "in" in "inside" but finds no matches in "pinto" |
End\r?\nBegin | Matches "End" and "Begin" only when "End" is the last string in a line and "Begin" is the first string in the next line |
a\wd | Matches "add" and "a1d" but not "a d" |
Public\sInterface | Matches the phrase "Public Interface" |
\d | Matches "4" and "0" in "wd40" |
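A few rows of the table can be checked directly in JavaScript. This is only a sketch; the variable names are made up, and the inputs are the words quoted in the table.

```javascript
// real(?!ity): "real" not followed by "ity".
const lookahead = "realityreal".search(/real(?!ity)/);  // index of the second "real"

// \w*?d: the lazy quantifier stops at the first "d" it can reach.
const lazy = "faded".match(/\w*?d/g);

// x(ab){2,3}x: two or three repetitions of "ab" between two "x" characters.
const twice = /x(ab){2,3}x/.test("xababx");
const fourTimes = /x(ab){2,3}x/.test("xababababx");

// ([a-z])X\1: the same letter on both sides of "X".
const sameLetter = /([a-z])X\1/.test("aXa");
const differentLetter = /([a-z])X\1/.test("aXb");

console.log(lookahead, lazy, twice, fourTimes, sameLetter, differentLetter);
```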
Using REs to Replace Text In An Editor
A capture group delineates a subexpression of a regular expression and captures a substring of an input string. You can use captured groups within the regular expression itself (for example, to look for a repeated word), or in a replacement pattern.
To create a numbered capture group, surround the subexpression with parentheses in the regular expression pattern. Captures are numbered automatically from left to right based on the position of the opening parenthesis in the regular expression. To access the captured group:
Within The Regular Expression:
Use \number. For example, \1 in the regular expression (\w+)\s\1 references the first capture group (\w+).
In A Replacement Pattern:
Use $number. For example, the grouped regular expression (\d)([a-z]) defines two groups: the first group contains a single decimal digit, and the second group contains a single character between a and z. The expression finds four matches in the following string: 1a 2b 3c 4d. The replacement string z$1 references the first group only ($1), which is the digits, and converts the string to z1 z2 z3 z4.
Input Text: | 1a 2b 3c 4d |
---|---|
Regular Expression: | (\d)([a-z]) |
Replace Expression: | z$2$1 - $0 |
Results: | za1 - 1a zb2 - 2b zc3 - 3c zd4 - 4d |
As a second example, take the regular expression (\w+)\s\1 and the replacement string $1. Both the regular expression and the replacement pattern reference the first capture group, which is automatically numbered 1. When you choose Replace all in the Quick Replace dialog box in Visual Studio, repeated words are removed from the text.
- (\w+) - The \w matches any word character. The plus means one or more times.
- \s - Matches any white-space character
- \1 - A back reference to the previous capture group (\w+)
Text | Reg Exp. | Replacement | Result |
---|---|---|---|
They said that that was the the correct answer. | (\w+)\s\1 | $1 | They said that was the correct answer. |
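The same two replacements can be reproduced with String.prototype.replace() in JavaScript. One difference to be aware of: JavaScript replacement strings have no $0, so the whole match is written as $& here. The variable names are illustrative.

```javascript
// Swap the digit and the letter, then append the whole match ($& instead of $0).
const swapped = "1a 2b 3c 4d".replace(/(\d)([a-z])/g, "z$2$1 - $&");
console.log(swapped);  // "za1 - 1a zb2 - 2b zc3 - 3c zd4 - 4d"

// Collapse immediately repeated words down to one occurrence.
const deduped = "They said that that was the the correct answer."
  .replace(/(\w+)\s\1/g, "$1");
console.log(deduped);  // "They said that was the correct answer."
```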
Real World RE Replace Examples
I have most often used RE replacement to wrap plain text in HTML.
glyphicon glyphicon-asterisk glyphicon glyphicon-plus glyphicon glyphicon-minus glyphicon glyphicon-euro glyphicon glyphicon-cloud glyphicon glyphicon-envelope glyphicon glyphicon-scale glyphicon glyphicon-ice-lolly glyphicon glyphicon-ice-lolly-tasted
(glyphicon glyphicon-)([A-Za-z-]+)
<span class="$1$2"></span>
<span class="glyphicon glyphicon-asterisk"></span> <span class="glyphicon glyphicon-plus"></span> <span class="glyphicon glyphicon-minus"></span> <span class="glyphicon glyphicon-euro"></span> <span class="glyphicon glyphicon-cloud"></span> <span class="glyphicon glyphicon-envelope"></span> <span class="glyphicon glyphicon-scale"></span> <span class="glyphicon glyphicon-ice-lolly"></span> <span class="glyphicon glyphicon-ice-lolly-tasted"></span>
Example - Wrapping Text With HTML
8592 2190 Leftwards Arrow 01W56 8593 2191 Upwards Arrow 23X77 8594 2192 Rightwards Arrow XA007 8595 2193 Downwards Arrow HD876 8596 2194 LeftRight Arrow MD877
([0-9]{4})([ ]+)([0-9]{4})([ ]+)([A-Za-z]+)([ ]+)([A-Za-z]+)([ ]+)([0-9A-Za-z]{5})
<tr><th>$1</th><td>$3</td><td>$5</td><td>$7</td><td>$9</td></tr>
<tr><th>8592</th><td>2190</td><td>Leftwards</td><td>Arrow</td><td>01W56</td></tr> <tr><th>8593</th><td>2191</td><td>Upwards</td><td>Arrow</td><td>23X77</td></tr> <tr><th>8594</th><td>2192</td><td>Rightwards</td><td>Arrow</td><td>XA007</td></tr> <tr><th>8595</th><td>2193</td><td>Downwards</td><td>Arrow</td><td>HD876</td></tr> <tr><th>8596</th><td>2194</td><td>LeftRight</td><td>Arrow</td><td>MD877</td></tr>
Example - Converting Text In Several Steps
This is to be done in several steps. The date needs to be replaced with a date in a different format.
Fri, May 16, 2014 13 14 16 50 56 11 Tue, May 13, 2014 37 46 48 70 74 1 Fri, May 9, 2014 10 28 39 51 59 14
Step 1
First remove the text for the day of the week, the following comma and space.
([A-Za-z]{3}, )
(Empty)
May 16, 2014 13 14 16 50 56 11 May 13, 2014 37 46 48 70 74 1 May 9, 2014 10 28 39 51 59 14
Step 2
Replace the month abbreviation with a number.
May
05
05 16, 2014 13 14 16 50 56 11 05 13, 2014 37 46 48 70 74 1 05 9, 2014 10 28 39 51 59 14
Step 3
Breakdown Of Regular Expression:
- ([0-9]{2}) - $1 - Month Number
- [ ]+ - Space
- ([0-9]{1,2}) - $2 - Day Of Month
- , - Comma
- [ ]+ - Space
- ([0-9]{4}) - $3 - Year
- [ ]+ - Space
- ([0-9]{1,2}) - $4 - Pick 1
- [ ]+ - Space
- ([0-9]{1,2}) - $5 - Pick 2
- [ ]+ - Space
- ([0-9]{1,2}) - $6 - Pick 3
- [ ]+ - Space
- ([0-9]{1,2}) - $7 - Pick 4
- [ ]+ - Space
- ([0-9]{1,2}) - $8 - Pick 5
- [ ]+ - Space
- ([0-9]{1,2}) - $9 - Powerball
([0-9]{2})[ ]+([0-9]{1,2}),[ ]+([0-9]{4})[ ]+([0-9]{1,2})[ ]+([0-9]{1,2})[ ]+([0-9]{1,2})[ ]+([0-9]{1,2})[ ]+([0-9]{1,2})[ ]+([0-9]{1,2})
piLottoDraw '$3-$1-$2 23:00:00.000',105,$4,$5,$6,$7,$8,$9\nGO
piLottoDraw '2014-05-16 23:00:00.000',105,13,14,16,50,56,11 GO piLottoDraw '2014-05-13 23:00:00.000',105,37,46,48,70,74,1 GO piLottoDraw '2014-05-9 23:00:00.000',105,10,28,39,51,59,14 GO
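In JavaScript, the three editor steps could be folded into one pass by using a replacer function. The sketch below is one possible approach, not the method used above: the MONTHS lookup table and the single group that captures all six numbers are assumptions introduced for the example, while piLottoDraw and the constant 105 come from the result shown above.

```javascript
// Hypothetical lookup table (an assumption, not part of the original steps).
const MONTHS = {
  Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
  Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12",
};

const draws = [
  "Fri, May 16, 2014 13 14 16 50 56 11",
  "Fri, May 9, 2014 10 28 39 51 59 14",
].join("\n");

// The day of the week is matched but not captured.
// Groups: 1 = month name, 2 = day, 3 = year, 4 = the six numbers with their spaces.
const drawPattern =
  /[A-Za-z]{3}, ([A-Za-z]{3}) ([0-9]{1,2}), ([0-9]{4})((?: [0-9]{1,2}){6})/g;

const statements = draws.replace(drawPattern, (match, mon, day, year, picks) => {
  const numbers = picks.trim().split(" ").join(",");
  return `piLottoDraw '${year}-${MONTHS[mon]}-${day} 23:00:00.000',105,${numbers}\nGO`;
});

console.log(statements);
```

Like the editor version, this leaves a single-digit day unpadded (2014-05-9); calling day.padStart(2, "0") would be one way to pad it if the target format requires two digits.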
Reorder Text
We need to move the asterisk to the back of the text instead of in front of it.
* State * ZipCode * HomePhone * DayPhone * Email * ContactMethod
([*]) ([A-Za-z]+)
$2 $1
State * ZipCode * HomePhone * DayPhone * Email * ContactMethod *
The Animals The Archies The Association The Band The Bee Gees The Box Tops The Byrds The Carpenters The Cascades The Cavaliers The Clash The Cure The Doobie Brothers The Doors The Drifters The Eagles The Everly Brothers The Gin Blossoms The Go Go's The Greatful Dead The Guess Who
(The) ([A-Za-z ]+)
$2 $0
Animals The Animals Archies The Archies Association The Association Band The Band Bee Gees The Bee Gees Box Tops The Box Tops Byrds The Byrds Carpenters The Carpenters Cascades The Cascades Cavaliers The Cavaliers Clash The Clash Cure The Cure Doobie Brothers The Doobie Brothers Doors The Doors Drifters The Drifters Eagles The Eagles Everly Brothers The Everly Brothers Gin Blossoms The Gin Blossoms Go Go The Go Go's Greatful Dead The Greatful Dead Guess Who The Guess Who
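Both reorderings can be sketched in JavaScript as well. The inputs are shortened versions of the lists above, assumed to be one entry per line, and $& stands in for $0 because JavaScript has no $0 replacement pattern.

```javascript
// Move the asterisk from the front of each field name to the back.
const fields = "* State\n* ZipCode\n* HomePhone";
const starLast = fields.replace(/([*]) ([A-Za-z]+)/g, "$2 $1");
console.log(starLast);  // "State *\nZipCode *\nHomePhone *"

// Prefix each band with its name minus "The" (useful as a sort key).
const bands = "The Animals\nThe Go Go's";
const withSortKey = bands.replace(/(The) ([A-Za-z ]+)/g, "$2 $&");
console.log(withSortKey);  // "Animals The Animals\nGo Go The Go Go's"
```

The class [A-Za-z ]+ stops at the apostrophe, which is why the sort key for "The Go Go's" comes out as "Go Go".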
Regular Expressions & Javascript
Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects.
RE & JS Flags
Before defining the methods, note that there are flags that modify the way many of the methods work.
Flag | Description | Corresponding Property |
---|---|---|
d | Generate indices for substring matches. | RegExp.prototype.hasIndices |
g | Global search. | RegExp.prototype.global |
i | Case-insensitive search. | RegExp.prototype.ignoreCase |
m | Multi-line search. | RegExp.prototype.multiline |
s | Allows . to match newline characters. | RegExp.prototype.dotAll |
u | "Unicode"; treat a pattern as a sequence of Unicode code points. | RegExp.prototype.unicode |
y | Perform a "sticky" search that matches starting at the current position in the target string. | RegExp.prototype.sticky |
const re = /pattern/flags;
Or
const re = new RegExp('pattern', 'flags');
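A short sketch of how flags are set and read back, assuming a modern JavaScript engine. The patterns and variable names are illustrative only.

```javascript
// The same flags can be given in a literal or to the constructor.
const fromLiteral = /hello/gi;
const fromConstructor = new RegExp("hello", "gi");

console.log(fromLiteral.flags);           // "gi"
console.log(fromConstructor.global);      // true
console.log(fromConstructor.ignoreCase);  // true
console.log(fromConstructor.multiline);   // false

// The s flag lets . match a newline.
const withoutDotAll = /a.b/.test("a\nb");   // false
const withDotAll = /a.b/s.test("a\nb");     // true
console.log(withoutDotAll, withDotAll);
```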
Regular Expression Methods
RegEx - exec()
Executes a search for a match in a string. It returns an array of information or null on a mismatch.
JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g. /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, exec() can be used to iterate over multiple matches in a string of text (with capture groups), as opposed to getting just the matching strings with String.prototype.match().
When using exec(), the global flag has no effect when the sticky flag is set - the match is always sticky.
exec() is the primitive method of regexps. Many other regexp methods call exec() internally - including those called by string methods, like @@replace. While exec() itself is powerful (and is the most efficient), it often does not convey the intent most clearly.
- If you only care whether the regex matches a string, but not what is actually being matched, use RegExp.prototype.test() instead.
- If you are finding all occurrences of a global regex and you don't care about information like capturing groups, use String.prototype.match() instead. In addition, String.prototype.matchAll() helps to simplify matching multiple parts of a string (with capture groups) by allowing you to iterate over the matches.
- If you are executing a match to find its index position in the string, use the String.prototype.search() method instead.
exec(str)
str | The string against which to match the regular expression. |
---|
Returns
If the match fails, the exec() method returns null, and sets the regex's lastIndex to 0.
If the match succeeds, the exec() method returns an array and updates the lastIndex property of the regular expression object. The returned array has the matched text as the first item, and then one item for each capturing group of the matched text. The array also has the following additional properties:
index | The 0-based index of the match in the string. |
---|---|
input | The original string that was matched against. |
groups | An object of named capturing groups whose keys are the names and values are the capturing groups or undefined if no named capturing groups were defined. See capturing groups for more information. |
indices | (Optional) - This property is only present when the d flag is set. It is an array where each entry represents the bounds of a substring match. Each substring match itself is an array where the first entry represents its start index and the second entry its end index. It additionally has a groups property which holds an object of all named capturing groups. The keys are the names of the capturing groups and each value is an array with the first item being the start entry and the second entry being the end index of the capturing group. If the regular expression doesn't contain any named capturing groups, groups is undefined. |
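The stateful behaviour of exec() with the g flag can be sketched as a loop that walks through every match. The pattern, input, and variable names below are assumptions for illustration.

```javascript
const wordPattern = /(\w)(\w*)/g;
const input = "red green";
const seen = [];

let hit;
// Each call resumes at wordPattern.lastIndex; null ends the loop.
while ((hit = wordPattern.exec(input)) !== null) {
  seen.push({
    word: hit[0],          // the full match
    firstLetter: hit[1],   // capture group 1
    index: hit.index,      // where the match starts
    next: wordPattern.lastIndex,  // where the next search will start
  });
}

console.log(seen);
console.log(wordPattern.lastIndex);  // 0: reset after the failed final call
```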
RegEx - test()
Tests for a match in a string. It returns true or false.
- Use test() whenever you want to know whether a pattern is found in a string. test() returns a boolean, unlike the String.prototype.search() method (which returns the index of a match, or -1 if not found).
- To get more information (but with slower execution), use the exec() method. (This is similar to the String.prototype.match() method.)
- As with exec() (or in combination with it), test() called multiple times on the same global regular expression instance will advance past the previous match.
Syntax
test(str)
Parameters
str | The string against which to match the regular expression. |
---|
Returns
true if there is a match between the regular expression and the string str. Otherwise, false.
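The point about a global regular expression advancing past the previous match is easy to trip over, so a small sketch may help. The names are illustrative.

```javascript
const globalRe = /foo/g;

const firstCall = globalRe.test("foo");   // true; lastIndex is now 3
const secondCall = globalRe.test("foo");  // false; the search resumed at index 3
const thirdCall = globalRe.test("foo");   // true; lastIndex was reset to 0 by the failure

// Without the g (or y) flag, test() always starts from the beginning.
const plainRe = /foo/;
const stable = plainRe.test("foo") && plainRe.test("foo");

console.log(firstCall, secondCall, thirdCall, stable);
```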
String - match()
Returns an array containing all of the matches, including capturing groups, or null if no match is found.
The implementation of String.prototype.match itself is very simple - it simply calls the Symbol.match method of the argument with the string as the first parameter. The actual implementation comes from RegExp.prototype[@@match]().
- If you need to know if a string matches a regular expression RegExp, use RegExp.prototype.test().
- If you only want the first match found, you might want to use RegExp.prototype.exec() instead.
- If you want to obtain capture groups and the global flag is set, you need to use RegExp.prototype.exec() or String.prototype.matchAll() instead.
Syntax
match(regular_expression)
Parameters
regular_expression | A regular expression object, or any object that has a Symbol.match method. If regular_expression is not a RegExp object and does not have a Symbol.match method, it is implicitly converted to a RegExp by using new RegExp(regular_expression). If you don't give any parameter and use the match() method directly, you will get an Array with an empty string: [""], because this is equivalent to match(/(?:)/). |
---|
Returns
An Array whose contents depend on the presence or absence of the global (g) flag, or null if no matches are found.
- If the g flag is used, all results matching the complete regular expression will be returned, but capturing groups are not included.
- If the g flag is not used, only the first complete match and its related capturing groups are returned. In this case, match() will return the same result as RegExp.prototype.exec() (an array with some extra properties).
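The difference the g flag makes to the return value of match() can be sketched as follows. The sentence and pattern are made up for the example.

```javascript
const sentence = "The rain in Spain";

// With g: every full match, but no capture groups.
const allMatches = sentence.match(/(\w+)ain/g);
console.log(allMatches);  // [ 'rain', 'Spain' ]

// Without g: only the first match, plus its groups and extra properties.
const firstMatch = sentence.match(/(\w+)ain/);
console.log(firstMatch[0]);     // "rain"
console.log(firstMatch[1]);     // "r"
console.log(firstMatch.index);  // 4

// No match at all gives null rather than an empty array.
const noMatch = sentence.match(/xyz/);
console.log(noMatch);  // null
```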
String - matchAll()
Returns an iterator containing all of the matches, including capturing groups.
The implementation of String.prototype.matchAll itself is very simple - it simply calls the Symbol.matchAll method of the argument with the string as the first parameter (apart from the extra input validation that the regex is global). The actual implementation comes from RegExp.prototype[@@matchAll]().
Syntax
matchAll(regular_expression)
Parameters
regular_expression | A regular expression object, or any object that has a Symbol.matchAll method. If regular_expression is not a RegExp object and does not have a Symbol.matchAll method, it is implicitly converted to a RegExp by using new RegExp(regular_expression, 'g'). If regular_expression is a RegExp object (via the IsRegExp check), then it must have the global (g) flag set, or a TypeError is thrown. |
---|
Returns
An iterable iterator (which is not restartable) of matches. Each match is an array with the same shape as the return value of RegExp.prototype.exec().
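A minimal sketch of iterating over matchAll() results, including the TypeError for a non-global regex. The key=value input and the group names are assumptions.

```javascript
const pairs = "a=1, b=22";
const entries = [];

// Each iteration yields an exec()-style array, so groups and index are available.
for (const hit of pairs.matchAll(/(?<key>\w+)=(?<value>\d+)/g)) {
  entries.push([hit.groups.key, hit.groups.value, hit.index]);
}
console.log(entries);  // [ [ 'a', '1', 0 ], [ 'b', '22', 5 ] ]

// A RegExp without the g flag is rejected.
let threwTypeError = false;
try {
  pairs.matchAll(/\w+/);
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
console.log(threwTypeError);  // true
```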
String - search()
Tests for a match in a string. It returns the index of the match, or -1 if the search fails.
The implementation of String.prototype.search() itself is very simple - it simply calls the Symbol.search method of the argument with the string as the first parameter. The actual implementation comes from RegExp.prototype[@@search]().
The g flag of regexp has no effect on the search() result, and the search always happens as if the regex's lastIndex is 0. For more information on the behavior of search(), see RegExp.prototype[@@search]().
When you want to know whether a pattern is found, and also know its index within a string, use search().
- If you only want to know if it exists, use the RegExp.prototype.test() method, which returns a boolean.
- If you need the content of the matched text, use match() or RegExp.prototype.exec().
Syntax
search(regular_expression)
Parameters
regular_expression | A regular expression object, or any object that has a Symbol.search method. If regular_expression is not a RegExp object and does not have a Symbol.search method, it is implicitly converted to a RegExp by using new RegExp(regular_expression). |
---|
Returns
The index of the first match between the regular expression and the given string, or -1 if no match was found.
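A brief sketch of search(), including the point that the g flag and lastIndex do not influence the result. The sample string is an assumption.

```javascript
const haystack = "hey JudE";

const firstUpper = haystack.search(/[A-Z]/);  // index of "J"
const noDot = haystack.search(/[.]/);         // -1: there is no "." in the string

// Even with g set and lastIndex moved forward, search() starts from index 0.
const globalUpper = /[A-Z]/g;
globalUpper.lastIndex = 5;
const stillFirst = haystack.search(globalUpper);

console.log(firstUpper, noDot, stillFirst);
```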
String - replace()
Executes a search for a match in a string, and replaces the matched substring with a replacement substring. The replace() method returns a new string with one, some, or all matches of a pattern replaced by a replacement; it does not mutate the string it is called on. The pattern can be a string or a RegExp, and the replacement can be a string or a function called for each match. If pattern is a string, only the first occurrence is replaced. To perform a global search and replace, use a regular expression with the g flag, or use replaceAll() instead.
If pattern is an object with a Symbol.replace method (including RegExp objects), that method is called with the target string and replacement as arguments. Its return value becomes the return value of replace(). In this case the behavior of replace() is entirely encoded by the @@replace method - for example, any mention of "capturing groups" in the description below is actually functionality provided by RegExp.prototype[@@replace].
If the pattern is an empty string, the replacement is prepended to the start of the string.
"xxx".replace("", "_"); // "_xxx"
A regexp with the g flag is the only case where replace() replaces more than once.
Specifying a string as the replacement
The replacement string can include the following special replacement patterns:
Pattern | Inserts |
---|---|
$$ | Inserts a "$". |
$& | Inserts the matched substring. |
$` | Inserts the portion of the string that precedes the matched substring. |
$' | Inserts the portion of the string that follows the matched substring. |
$n | Inserts the nth (1-indexed) capturing group where n is a positive integer less than 100. |
$<Name> | Inserts the named capturing group where Name is the group name. |
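The special replacement patterns in the table can be tried out as sketched below. The input strings and group names (first, last) are assumptions for the example.

```javascript
// $<Name>: reorder two named groups.
const reordered = "John Smith".replace(/(?<first>\w+) (?<last>\w+)/, "$<last>, $<first>");
console.log(reordered);  // "Smith, John"

// $$ inserts a literal "$", and $& inserts the matched substring.
const priced = "price: 5".replace(/\d+/, "$$$&");
console.log(priced);     // "price: $5"

// $` and $' insert the text before and after the match.
const context = "abc".replace("b", "[$`|$']");
console.log(context);    // "a[a|c]c"
```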
$n and $<Name> are only available if the pattern argument is a RegExp object. If the pattern is a string, or if the corresponding capturing group isn't present in the regex, then the pattern will be replaced as a literal. If the group is present but isn't matched (because it's part of a disjunction), it will be replaced with an empty string.
"foo".replace(/(f)/, "$2"); // "$2oo"; the regex doesn't have the second group
"foo".replace("f", "$1"); // "$1oo"
"foo".replace(/(f)|(g)/, "$2"); // "oo"; the second group exists but isn't matched
Specifying A Function As The Replacement
You can specify a function as the second parameter. In this case, the function will be invoked after the match has been performed. The function's result (return value) will be used as the replacement string.
The function has the following signature:
function replacer(match, p1, p2, /* ..., */ pN, offset, string, groups) { return replacement; }
The arguments to the function are as follows:
Argument | Description |
---|---|
match | The matched substring. (Corresponds to $& above.) |
p1, p2, ..., pN | The nth string found by a capture group (including named capturing groups), provided the first argument to replace() is a RegExp object. (Corresponds to $1, $2, etc. above.) For example, if the pattern is /(a+)(b+)/, then p1 is the match for a+, and p2 is the match for b+. If the group is part of a disjunction (e.g. "abc".replace(/(a)|(b)/, replacer)), the unmatched alternative will be undefined. |
offset | The offset of the matched substring within the whole string being examined. For example, if the whole string was 'abcd', and the matched substring was 'bc', then this argument will be 1. |
string | The whole string being examined. |
groups | An object whose keys are the used group names, and whose values are the matched portions (undefined if not matched). Only present if the pattern contains at least one named capturing group. |
The exact number of arguments depends on whether the first argument is a RegExp object - and, if so, how many capture groups it has.
The following example will set newString to 'abc - 12345 - #$*%':
function replacer(match, p1, p2, p3, offset, string) {
  // p1 is non-digits, p2 digits, and p3 non-alphanumerics
  return [p1, p2, p3].join(" - ");
}
const newString = "abc12345#$*%".replace(/([^\d]*)(\d*)([^\w]*)/, replacer);
console.log(newString); // abc - 12345 - #$*%
If the regular expression in the first parameter is global, the function will be invoked multiple times: once for each full match to be replaced.
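To illustrate a replacer being called once per match with a global regex, here is a small sketch that converts Celsius readings to Fahrenheit. The function name toFahrenheit, the offsets array, and the input string are all made up for the example.

```javascript
const offsets = [];

// With one capture group, the arguments are (match, p1, offset, string).
function toFahrenheit(match, celsius, offset) {
  offsets.push(offset);
  return `${(Number(celsius) * 9) / 5 + 32}F`;
}

const converted = "Low 0C, high 100C".replace(/(-?\d+)C\b/g, toFahrenheit);

console.log(converted);  // "Low 32F, high 212F"
console.log(offsets);    // [ 4, 13 ]: one call per match
```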
replace(pattern, replacement)
pattern | Can be a string or an object with a Symbol.replace method - the typical example being a regular expression. Any value that doesn't have the Symbol.replace method will be coerced to a string. |
---|---|
replacement | Can be a string or a function. If it's a string, it will replace the substring matched by pattern. A number of special replacement patterns are supported. If it's a function, it will be invoked for every match and its return value is used as the replacement text. |
Returns
A new string, with one, some, or all matches of the pattern replaced by the specified replacement.
String - replaceAll()
Executes a search for all matches in a string, and replaces the matched substrings with a replacement substring. The replaceAll() method returns a new string with all matches of a pattern replaced by a replacement; it does not mutate the string it is called on. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match. Unlike replace(), this method replaces all occurrences of a string pattern, not just the first one. This is especially useful if the string is not statically known, as calling the RegExp() constructor without escaping special characters may unintentionally change its semantics.
If pattern is an object with a Symbol.replace method (including RegExp objects), that method is called with the target string and replacement as arguments. Its return value becomes the return value of replaceAll(). In this case the behavior of replaceAll() is entirely encoded by the @@replace method, and therefore will have the same result as replace() (apart from the extra input validation that the regex is global).
If the pattern is an empty string, the replacement will be inserted in between every UTF-16 code unit, similar to split() behavior.
"xxx".replaceAll("", "_"); // "_x_x_x_"
Syntax
replaceAll(pattern, replacement)
Parameters
pattern | Can be a string or an object with a Symbol.replace method - the typical example being a regular expression. Any value that doesn't have the Symbol.replace method will be coerced to a string. If pattern is a RegExp object (via the IsRegExp check), then it must have the global (g) flag set, or a TypeError is thrown. |
---|---|
replacement | Can be a string or a function. The replacement has the same semantics as that of String.prototype.replace(). |
Returns
A new string, with all matches of a pattern replaced by a replacement.
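The difference between replace() and replaceAll() with a string pattern, and the risk of an unescaped special character in a regex, can be sketched like this. The dotted path string is an assumption.

```javascript
const dotted = "a.b.c";

const everyDot = dotted.replaceAll(".", "/");      // string pattern: "." is literal
const firstDotOnly = dotted.replace(".", "/");     // replace() stops after one
const unescaped = dotted.replace(/./g, "/");       // regex "." matches every character
const escaped = dotted.replaceAll(/\./g, "/");     // escaped dot, g flag required

console.log(everyDot, firstDotOnly, unescaped, escaped);

// replaceAll() rejects a RegExp that lacks the g flag.
let rejected = false;
try {
  dotted.replaceAll(/\./, "/");
} catch (err) {
  rejected = err instanceof TypeError;
}
console.log(rejected);  // true
```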
String - split()
Uses a regular expression or a fixed string to break a string into an array of substrings.
The split() method takes a pattern and divides a String into an ordered list of
substrings by searching for the pattern, puts these substrings into an array,
and returns the array.
If separator is a non-empty string, the target string is split by all matches of the separator
without including separator in the results. For example, a string containing tab separated
values (TSV) could be parsed by passing a tab character as the separator,
like myString.split("\t").
If separator contains multiple characters, that entire character sequence must be found in order
to split.
If separator appears at the beginning (or end) of the string, it still has the effect of splitting,
resulting in an empty (i.e. zero length) string appearing at the first (or last) position of the
returned array.
If separator does not occur in str, the returned array contains one element consisting of the entire string.
If separator is an empty string (""), str is converted to an array of each of its UTF-16 "characters", without empty strings at either end of the resulting array.
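The separator rules above can be checked with a few short calls. This is only an illustrative sketch; the sample strings are made up:

```javascript
// Tab-separated values split on a tab character.
const fields = "name\tage\tcity".split("\t");

// A multi-character separator must match as a whole sequence.
const multi = "a--b-c".split("--");

// A separator at the start or end produces empty strings at the edges.
const edges = ",a,b,".split(",");

// A separator that never occurs yields a one-element array.
const noMatch = "abc".split(";");

// An empty separator splits into UTF-16 code units.
const units = "abc".split("");

console.log(fields);  // ["name", "age", "city"]
console.log(multi);   // ["a", "b-c"]
console.log(edges);   // ["", "a", "b", ""]
console.log(noMatch); // ["abc"]
console.log(units);   // ["a", "b", "c"]
```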
Syntax
split()
split(separator)
split(separator, limit)
Parameters
Parameter | Description |
---|---|
separator | The pattern describing where each split should occur. Can be a string or an object with a Symbol.split method - the typical example being a regular expression. If undefined, the original target string is returned wrapped in an array. |
limit | A non-negative integer specifying a limit on the number of substrings to be included in the array. If provided, splits the string at each occurrence of the specified separator, but stops when limit entries have been placed in the array. Any leftover text is not included in the array at all. The array may contain fewer entries than limit if the end of the string is reached before the limit is reached. If limit is 0, [] is returned. |
Returns
An Array of strings, split at each point where the separator occurs in the given string.
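A short sketch of how the limit parameter and an omitted separator behave, using a made-up comma-separated string:

```javascript
const csv = "a,b,c,d"; // hypothetical input

// Stops after two entries; the leftover text "c,d" is discarded.
const firstTwo = csv.split(",", 2);

// A limit larger than the number of pieces has no effect.
const generous = csv.split(",", 10);

// A limit of 0 always returns an empty array.
const none = csv.split(",", 0);

// With no separator, the whole string is wrapped in an array.
const whole = csv.split();

console.log(firstTwo); // ["a", "b"]
console.log(generous); // ["a", "b", "c", "d"]
console.log(none);     // []
console.log(whole);    // ["a,b,c,d"]
```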
Constructing A Regular Expression In Javascript
You construct a regular expression in one of two ways:
- Using a regular expression literal, which consists of a pattern enclosed between slashes, as follows:
  const re = /ab+c/;
- Or calling the constructor function of the RegExp object, as follows:
  const re = new RegExp('ab+c');
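One practical difference between the two forms is escaping: inside a string passed to the constructor, each backslash has to be doubled. The sketch below assumes a hypothetical unit suffix to show when the constructor form is handy:

```javascript
// The same pattern written both ways; note the doubled backslashes in the string.
const literal = /\d+\.\d+/;
const constructed = new RegExp("\\d+\\.\\d+");
const sameSource = literal.source === constructed.source;

// The constructor is useful when part of the pattern is only known at run time.
const unit = "px"; // hypothetical run-time value
const dynamic = new RegExp("^\\d+" + unit + "$");

console.log(sameSource);           // true
console.log(dynamic.test("12px")); // true
console.log(dynamic.test("12em")); // false
```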
Code Samples
RegEx - Test()
The pattern to test against.
/^\(?([0-9]){3}\)?(-|\s)?([0-9]){3}(-|\s)?([0-9]){4}(\s)?([a-zA-Z0-9])*$/;
What Is The Pattern Looking For Here
- /^ Start of text
- \(?([0-9]){3}\)? An optional opening parenthesis, 3 digits, and an optional closing parenthesis
- (-|\s)? Zero or one dash or whitespace character
- ([0-9]){3} 3 digits
- (-|\s)? Zero or one dash or whitespace character
- ([0-9]){4} 4 digits
- (\s)? Zero or one whitespace character
- ([a-zA-Z0-9])* Zero or more letters (uppercase or lowercase) or digits
- $ End of text
function validate1() {
  var testVal1 = document.getElementById("testVal1").value;
  var flag = false;
  var testRE = /^\(?([0-9]){3}\)?(-|\s)?([0-9]){3}(-|\s)?([0-9]){4}(\s)?([a-zA-Z0-9])*$/;

  flag = testRE.test(testVal1);
  if (flag) {
    alert("Valid Number");
  } else {
    alert("Invalid Number");
  }
}
The JavaScript code described:
- Line 2 - Gets the value of the textbox with an id of "testVal1"
- Line 3 - Declares a boolean variable called "flag"
- Line 4 - Declares a regular expression called "testRE"
- Line 6 - Here is where all of the action occurs. The regular expression "testRE" calls its method "test()" with the textbox value held in the variable "testVal1" as the parameter, and the result is assigned to "flag". The test method checks whether the text matches the regular expression pattern. If it does, "flag" is set to true. If it does not, "flag" is set to false.
- Line 7 - Checks the value of the boolean "flag"
- Line 8 - Calls an alert to show the results of a valid number.
- Line 10 - Calls an alert to show the results of an invalid number.
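To try the same pattern without a web page, a DOM-free sketch such as the one below may help. The helper name isValidNumber and the sample inputs are assumptions made for illustration. It also shows one limitation: each parenthesis is optional on its own, so an unbalanced parenthesis is still accepted.

```javascript
// Same pattern as above; no g flag, so test() keeps no state between calls.
const phonePattern = /^\(?([0-9]){3}\)?(-|\s)?([0-9]){3}(-|\s)?([0-9]){4}(\s)?([a-zA-Z0-9])*$/;

// Hypothetical helper wrapping RegExp.prototype.test().
function isValidNumber(value) {
  return phonePattern.test(value);
}

console.log(isValidNumber("(555) 123-4567")); // true
console.log(isValidNumber("555-123-4567"));   // true
console.log(isValidNumber("5551234567 x12")); // true (trailing letters and digits allowed)
console.log(isValidNumber("555-1234"));       // false (too few digits)
console.log(isValidNumber("(555 1234567"));   // true (unbalanced parenthesis is not rejected)
```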
Javascript RegEx Exec()
The exec() method executes a search for a match in a specified string and returns a result array, or null.
In the following example the regular expression is looking for a pattern with a literal "d" followed by one or more of the letter "b" followed by another literal "d".
The "g" flag indicates that it is a global search.
The string being searched is: cdbbdbsbz
The pattern first matches at the second letter of the string, so the match index is 1 (indices are zero-based).
const myRe = /d(b+)d/g;
const myArray = myRe.exec('cdbbdbsbz');
Object | Property or Index | Description | In This Example |
---|---|---|---|
myArray | | The matched string and all remembered substrings. | [ 'dbbd', 'bb', index: 1, input: 'cdbbdbsbz' ] |
myArray | index | The 0-based index of the match in the input string. | 1 |
myArray | input | The original string. | 'cdbbdbsbz' |
myArray | [0] | The last matched characters. | 'dbbd' |
myRe | lastIndex | The index at which to start the next match. (This property is set only if the regular expression uses the g option) | 5 |
myRe | source | The text of the pattern. Updated at the time that the regular expression is created, not executed. | 'd(b+)d' |
function test_exec1() {
  var testVal2 = document.getElementById("testVal2").value;
  let resultDiv = document.getElementById("results2");
  let rsltStr = "";
  const myRe = /d(b+)d/g;
  const myArray = myRe.exec(testVal2);
  // exec() returns null when nothing matches, so guard before reading the array
  if (myArray === null) {
    resultDiv.innerHTML = "No match found";
    return;
  }
  for (var index = 0; index < myArray.length; index++) {
    rsltStr += "myArray[" + index + "] = " + myArray[index] + "<br/>";
  }
  rsltStr += "myArray.index = " + myArray.index + "<br/>";
  rsltStr += "myArray.input = " + myArray.input + "<br/>";
  rsltStr += "myRe.lastIndex = " + myRe.lastIndex + "<br/>";
  rsltStr += "myRe.source = " + myRe.source + "<br/>";
  resultDiv.innerHTML = rsltStr;
}
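Because the g flag makes exec() remember lastIndex, calling it repeatedly walks through every match. The following is a minimal sketch of that loop; the input string is a made-up extension of the one used above:

```javascript
const re = /d(b+)d/g;
const input = "cdbbdbsbzdbd"; // hypothetical input with two matches

const found = [];
let m;
// Each call resumes at re.lastIndex; exec() returns null once no match is left.
while ((m = re.exec(input)) !== null) {
  found.push({ match: m[0], group: m[1], index: m.index, next: re.lastIndex });
}

console.log(found);
// [ { match: "dbbd", group: "bb", index: 1, next: 5 },
//   { match: "dbd",  group: "b",  index: 9, next: 12 } ]
console.log(re.lastIndex); // 0 (reset after the failed final call)
```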
Javascript String Match()
The match() method retrieves the result of matching a string against a regular expression.
Return Value
An Array whose contents depend on the presence or absence of the global (g) flag, or null if no matches are found.
- If the g flag is used, all results matching the complete regular expression will be returned, but capturing groups are not included.
- If the g flag is not used, only the first complete match and its related capturing groups are returned. In this case, match() will return the same result as RegExp.prototype.exec() (an array with some extra properties).
const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.';
const regex = /[A-Z]/g;
const found = paragraph.match(regex);
console.log(found);
// expected output: Array ["T", "I"]
The following will search for upper case letters in a string and return an array of what it found.
function test_match1() {
  var testVal3 = document.getElementById("testVal3").value;
  var rsltStr = "";
  // REGULAR EXPRESSION FOR ALL OF THE CAPITAL LETTERS FROM A TO Z
  const regex = /[A-Z]/g;
  const found = testVal3.match(regex);
  if (found !== null) {
    for (var index = 0; index < found.length; index++) {
      rsltStr += "found[" + index + "] = " + found[index] + '\n';
    }
  }
  alert(rsltStr);
}
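The effect of the g flag on the return value is easiest to see side by side. This sketch uses a made-up sentence and a pattern with two capturing groups:

```javascript
const sentence = "Order 12 ships in 3 days"; // hypothetical input

// With g: every complete match, but no capturing groups.
const all = sentence.match(/(\d+) (\w+)/g);

// Without g: only the first match, followed by its capturing groups.
const first = sentence.match(/(\d+) (\w+)/);

// No match at all returns null, not an empty array.
const nothing = "abc".match(/\d/);

console.log(all);         // ["12 ships", "3 days"]
console.log(first[0]);    // "12 ships"
console.log(first[1]);    // "12"
console.log(first[2]);    // "ships"
console.log(first.index); // 6
console.log(nothing);     // null
```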
Javascript String MatchAll()
The matchAll() method returns an iterator of all results matching a string against a regular expression, including capturing groups.
Return Value
An iterable iterator (which is not restartable) of matches. Each match is an array with the same shape as the return value of RegExp.prototype.exec().
function test_matchall1() {
  var testVal4 = document.getElementById("testVal4").value;
  let resultDiv = document.getElementById("results3");
  var rsltStr = "";
  const regex = /t(e)(st(\d?))/g;
  // Spreading the iterator always yields an array (empty when nothing matches)
  const array = [...testVal4.matchAll(regex)];
  for (var index = 0; index < array.length; index++) {
    var array2 = array[index];
    for (var index2 = 0; index2 < array2.length; index2++) {
      rsltStr += "array[" + index + "][" + index2 + "] = " + array2[index2] + "<br/>";
    }
  }
  resultDiv.innerHTML = rsltStr;
}
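A DOM-free sketch of the same pattern, using a made-up input string, shows the capturing groups in each match and what "not restartable" means in practice:

```javascript
const regex = /t(e)(st(\d?))/g;
const input = "test1test2"; // hypothetical input

// Collect the full match, its capturing groups and its index for each result.
const summary = [...input.matchAll(regex)].map((m) => ({
  full: m[0],
  groups: m.slice(1),
  index: m.index,
}));

// The iterator can be consumed only once; a second pass yields nothing.
const iterator = input.matchAll(regex);
const firstPass = [...iterator].length;
const secondPass = [...iterator].length;

console.log(summary);
// [ { full: "test1", groups: ["e", "st1", "1"], index: 0 },
//   { full: "test2", groups: ["e", "st2", "2"], index: 5 } ]
console.log(firstPass, secondPass); // 2 0
```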
Javascript String Search()
The search() method executes a search for a match between a regular expression and this String object.
Return Value
The index of the first match between the regular expression and the given string, or -1 if no match was found.
function test_search1() {
  var testVal5 = document.getElementById("testVal5").value;
  var rsltStr = "";
  // ANY CHARACTER THAT IS NOT A WORD CHARACTER OR WHITESPACE
  const regex = /[^\w\s]/g;
  rsltStr = testVal5.search(regex);
  alert(rsltStr);
}
The escape characters in the regular expression:
\w - matches any word character, and
\s - matches any whitespace character (space, new line, carriage return, tab)
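As a quick illustration with made-up strings, search() reports where the first character that is neither a word character nor whitespace appears, or -1 when there is none:

```javascript
// Any character that is not a word character or whitespace.
const punctuation = /[^\w\s]/;

const withMark = "Hello world!".search(punctuation);
const withoutMark = "Hello world".search(punctuation);

console.log(withMark);    // 11 (the position of "!")
console.log(withoutMark); // -1
```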
Javascript String Replace()
The replace() method returns a new string with one, some, or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function called for each match. If pattern is a string, only the first occurrence will be replaced. The original string is left unchanged.
replace(pattern, replacement)
Parameters
- pattern - Can be a string or an object with a Symbol.replace method, the typical example being a regular expression. Any value that doesn't have the Symbol.replace method will be coerced to a string.
- replacement - Can be a string or a function.
  - If it's a string, it will replace the substring matched by pattern. A number of special replacement patterns are supported.
  - If it's a function, it will be invoked for every match and its return value is used as the replacement text. The arguments supplied to this function are described in the Specifying a function as the replacement section below.
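A few of the special replacement patterns can be sketched as follows. The inputs are made up; $1 to $3 refer to capturing groups, $& to the whole match, and $$ inserts a literal dollar sign:

```javascript
// $1, $2, $3 insert the corresponding capturing groups.
const reordered = "2024-01-15".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");

// $& inserts the whole matched substring.
const wrapped = "price: 5".replace(/\d+/, "[$&]");

// $$ inserts a literal "$", here followed by the match itself.
const dollars = "cost 5".replace(/\d+/, "$$$&");

console.log(reordered); // "15/01/2024"
console.log(wrapped);   // "price: [5]"
console.log(dollars);   // "cost $5"
```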
Returns
A new string, with one, some, or all matches of the pattern replaced by the specified replacement.
"foo".replace(/(f)/, "$2"); // "$2oo"; the regex doesn't have the second group
"foo".replace("f", "$1"); // "$1oo"
"foo".replace(/(f)|(g)/, "$2"); // "oo"; the second group exists but isn't matched
Specifying a function as the replacement
You can specify a function as the second parameter. In this case, the function will be invoked after the match has been performed. The function's result (return value) will be used as the replacement string.
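A small sketch of a replacement function follows. The function name doubleNumber and the input are hypothetical. The function receives the whole match, then each capturing group, then the offset of the match in the string:

```javascript
const offsets = [];

// Hypothetical replacer: doubles each number and records where it was found.
function doubleNumber(match, digits, offset) {
  offsets.push(offset);
  return String(Number(digits) * 2);
}

const doubled = "3 apples and 10 pears".replace(/(\d+)/g, doubleNumber);

console.log(doubled); // "6 apples and 20 pears"
console.log(offsets); // [0, 13]
```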
Replace Example 1
Using a function as replacement example
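// NOTE: the original sample calls replacer() without showing its definition.
// This version is an assumption: it joins the three captured groups with " - ".
function replacer(match, p1, p2, p3, offset, string) {
  // p1 = non-digits, p2 = digits, p3 = non-word characters
  return [p1, p2, p3].join(" - ");
}

function test_replace1() {
  var testVal6 = document.getElementById("testVal6").value;
  var rsltStr = "";
  const regex = /([^\d]*)(\d*)([^\w]*)/;
  rsltStr = testVal6.replace(regex, replacer);
  alert(rsltStr);
}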
Replace Example 2
function test_replace2() {
  var testVal7 = document.getElementById("testVal7").value;
  var rsltStr = "";
  // The i flag makes the match case-insensitive
  const regex = /xmas/i;
  rsltStr = testVal7.replace(regex, 'Christmas...');
  alert(rsltStr);
}
Replace Example 3
function test_replace3() {
  var testVal8 = document.getElementById("testVal8").value;
  var rsltStr = "";
  // The g flag replaces every occurrence; the i flag ignores case
  const regex = /apples/gi;
  rsltStr = testVal8.replace(regex, 'oranges');
  alert(rsltStr);
}
Replace Example 4
This swaps the first two words of the input by capturing each word in a group and referring to the groups as $1 and $2 in the replacement string.
function test_replace4() {
  var testVal9 = document.getElementById("testVal9").value;
  var rsltStr = "";
  const regex = /(\w+)\s(\w+)/;
  rsltStr = testVal9.replace(regex, '$2, $1');
  alert(rsltStr);
}
Replace Example 5
This converts a camel case property into a hyphenated property.
function styleHyphenFormat(propertyName) {
  // Prefix each uppercase letter with a hyphen (except at offset 0) and lowercase it
  function upperToHyphenLower(match, offset, string) {
    return (offset > 0 ? '-' : '') + match.toLowerCase();
  }
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}

function test_replace5() {
  var testValA = document.getElementById("testValA").value;
  var rsltStr = styleHyphenFormat(testValA);
  alert(rsltStr);
}
Javascript String ReplaceAll() - 1
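function test_replaceAll1() {
  var testVal21 = document.getElementById("testVal21").value;
  // String pattern: case-sensitive, replaces every "dog"
  var rsltStr = testVal21.replaceAll('dog', 'monkey') + '\n';
  // Regex pattern: must be global; the i flag also matches "dog", "DOG", etc.
  const regex = /Dog/ig;
  rsltStr += testVal21.replaceAll(regex, 'turtle');
  alert(rsltStr);
}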
Javascript String ReplaceAll() - 2
// Treats name as a regular expression, so special characters change its meaning
function unsafeRedactName(text, name) {
  return text.replace(new RegExp(name, 'g'), '[REDACTED]');
}

// Treats name as a literal string
function safeRedactName(text, name) {
  return text.replaceAll(name, '[REDACTED]');
}

function test_replaceAll2() {
  var testVal22 = document.getElementById("testVal22").value;
  var hackerName = "ha.*er";
  var rsltStr = unsafeRedactName(testVal22, hackerName) + '\n';
  rsltStr += safeRedactName(testVal22, hackerName);
  alert(rsltStr);
}
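The difference between the two approaches can be sketched without a web page. The sample text and the helper escapeRegExp are assumptions added for illustration; the helper escapes regex metacharacters so that a dynamically built RegExp matches the name literally:

```javascript
const text = "ha.*er met hacker and hatter"; // hypothetical input
const name = "ha.*er";

// Unescaped: ".*" is greedy and swallows everything up to the last "er".
const unsafe = text.replace(new RegExp(name, "g"), "[REDACTED]");

// String pattern: only the literal text "ha.*er" is replaced.
const safe = text.replaceAll(name, "[REDACTED]");

// Hypothetical helper: backslash-escape every regex metacharacter.
function escapeRegExp(source) {
  return source.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const escaped = text.replace(new RegExp(escapeRegExp(name), "g"), "[REDACTED]");

console.log(unsafe);  // "[REDACTED]"
console.log(safe);    // "[REDACTED] met hacker and hatter"
console.log(escaped); // "[REDACTED] met hacker and hatter"
```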
Javascript String Split()
// Split on spaces to get the words
function test_split1() {
  var testVal31 = document.getElementById("testVal31").value;
  const words = testVal31.split(' ');
  var rsltStr = "";
  for (var i = 0; i < words.length; i++) {
    rsltStr += words[i] + '\n';
  }
  alert(rsltStr);
}

// Split on the empty string to get the individual UTF-16 code units
function test_split2() {
  var testVal31 = document.getElementById("testVal31").value;
  const chars = testVal31.split('');
  var rsltStr = "";
  for (var i = 0; i < chars.length; i++) {
    rsltStr += chars[i] + "-";
  }
  alert(rsltStr);
}

// Split with no separator: the whole string is returned in a one-element array
function test_split3() {
  var testVal31 = document.getElementById("testVal31").value;
  const strCopy = testVal31.split();
  var rsltStr = "";
  for (var i = 0; i < strCopy.length; i++) {
    rsltStr += strCopy[i] + '\n';
  }
  alert(rsltStr);
}
Javascript String Split() - 2
// Split on a semicolon (or the end of the string) with optional surrounding whitespace
function test_split4() {
  var testVal32 = document.getElementById("testVal32").value;
  const re = /\s*(?:;|$)\s*/;
  const arRslt = testVal32.split(re);
  var rsltStr = "";
  for (var i = 0; i < arRslt.length; i++) {
    rsltStr += arRslt[i] + '\n';
  }
  alert(rsltStr);
}

// Same split, limited to the first 3 entries
function test_split5() {
  var testVal32 = document.getElementById("testVal32").value;
  const re = /\s*(?:;|$)\s*/;
  const arRslt = testVal32.split(re, 3);
  var rsltStr = "";
  for (var i = 0; i < arRslt.length; i++) {
    rsltStr += arRslt[i] + '\n';
  }
  alert(rsltStr);
}
Javascript String Split() - 3
function test_split6() {
  var testVal33 = document.getElementById("testVal33").value;
  // The capturing group keeps each matched digit in the resulting array
  const re = /(\d)/;
  const arRslt = testVal33.split(re);
  var rsltStr = "";
  for (var i = 0; i < arRslt.length; i++) {
    rsltStr += arRslt[i] + '\n';
  }
  alert(rsltStr);
}
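When the separator is a regular expression with a capturing group, the captured text is spliced into the result. A minimal comparison with a made-up input:

```javascript
const input = "a1b2c"; // hypothetical input

// Without a capturing group the digits are dropped.
const withoutGroup = input.split(/\d/);

// With a capturing group the digits are kept as their own entries.
const withGroup = input.split(/(\d)/);

console.log(withoutGroup); // ["a", "b", "c"]
console.log(withGroup);    // ["a", "1", "b", "2", "c"]
```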
Javascript String Split() - 4
function test_split7() {
  var testVal34 = document.getElementById("testVal34").value;
  var rsltStr = "";
  // Custom splitter: splits on 1, then 2, then 3, and so on
  const splitByNumber = {
    [Symbol.split](str) {
      let num = 1;
      let pos = 0;
      const result = [];
      while (pos < str.length) {
        const matchPos = str.indexOf(num, pos);
        if (matchPos === -1) {
          result.push(str.substring(pos));
          break;
        }
        result.push(str.substring(pos, matchPos));
        pos = matchPos + String(num).length;
        num++;
      }
      return result;
    }
  };
  rsltStr = testVal34.split(splitByNumber);
  alert(rsltStr);
}
Javascript String Split() - 5
// Split the source text on simple opening or closing tags such as <p> or </p>
function test_split8() {
  var taSource = document.getElementById("taSource1").value;
  var taResult = document.getElementById("taResult1");
  const re = /<[\/a-zA-Z]+>/;
  const arRslt = taSource.split(re);
  taResult.value = "Array Length = " + arRslt.length + '\n';
  for (var i = 0; i < arRslt.length; i++) {
    taResult.value += "arRslt[" + i + "] = " + arRslt[i] + '\n';
  }
}

// Same split, but with line breaks removed first and blank entries skipped
function test_split9() {
  var taSource = document.getElementById("taSource1").value;
  var taResult = document.getElementById("taResult1");
  const re = /<[\/a-zA-Z]+>/;
  const oneline = taSource.replaceAll('\n', '');
  const arRslt = oneline.split(re);
  taResult.value = "Array Length = " + arRslt.length + '\n';
  for (var i = 0; i < arRslt.length; i++) {
    if (arRslt[i].trim() != '') {
      taResult.value += "arRslt[" + i + "] = " + arRslt[i] + '\n';
    }
  }
}
Javascript String Split() - 6
function test_splitA() {
  var taSource = document.getElementById("taSource2").value;
  var taResult = document.getElementById("taResult2");
  var taResultA = document.getElementById("taResult2A");
  var taResultB = document.getElementById("taResult2B");
  var taResultC = document.getElementById("taResult2C");
  var taResultD = document.getElementById("taResult2D");
  const re = /<[//a-zA-Z]+>/g;                    // simple opening or closing tags
  const re2 = /([ ]+)([a-zA-Z]+[a-zA-Z ]*)/g;     // spaces in front of a run of text
  const re3 = /([ ]+\n|[ ]+$|$)/g;                // spaces before a newline (with the newline) or at the end
  // Without the m flag, ^$\n can never match in JavaScript; m lets it match empty lines
  const re4 = /^$\n/gm;
  var mod_text = taSource.replaceAll(re, ''); // REMOVE TAGS
  var mod_text2 = "";
  var mod_text3 = "";
  var mod_text4 = "";
  taResultA.value = mod_text;
  mod_text2 = mod_text.replace(re2, '$2');
  taResultB.value = mod_text2;
  mod_text3 = mod_text2.replace(re3, '');
  taResultC.value = mod_text3;
  mod_text4 = mod_text3.replace(re4, '');
  taResultD.value = mod_text4;
  const arRslt = mod_text4.split('\n');
  taResult.value = "Array Length" + arRslt.length + '\n';
  for (var i = 0; i < arRslt.length; i++) {
    if (arRslt[i].trim() != '') {
      taResult.value += "arRslt[" + i + "] = " + arRslt[i].trim() + '\n';
    }
  }
}