Last modified: 26. July 2025 14:20 (Git #fee2a364)
Written by alex_s168
If you are any kind of programmer, you’ve probably heard of RegEx.
RegEx (regular expressions) is kind of like a small programming language used to define patterns for searching and replacing text.
RegEx might seem overwhelming at first, but you can learn the most important features of RegEx very quickly.
It is important to mention that there is no single standard for RegEx syntax. Instead, each “implementation” has its own quirks and additional features. However, the most common features behave identically in most RegEx “engines”/implementations.
The behavior of a RegEx pattern also depends on the match options passed to the RegEx engine.
Common match options:
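The available options and their names differ between engines. In Python's `re` module, for instance, options are passed as flags. The following is a minimal sketch, assuming Python 3, showing case-insensitive matching and how two options can be combined:

```python
import re

text = "Hello\nhello"

# Default options: matching is case-sensitive.
case_sensitive = re.findall(r"hello", text)
print(case_sensitive)  # ['hello']

# re.IGNORECASE (short form re.I) ignores letter case.
case_insensitive = re.findall(r"hello", text, re.IGNORECASE)
print(case_insensitive)  # ['Hello', 'hello']

# Options are combined with |, here case-insensitive plus multi-line mode.
combined = re.compile(r"^hello$", re.IGNORECASE | re.MULTILINE)
print(combined.findall(text))  # ['Hello', 'hello']
```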
In this article, we will refer to single expression parts as “atoms”.
Just use the character that you want to match. For example, use `a` to match an `a`. This, however, does not work for all characters, because many of them are part of the special RegEx syntax.
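A minimal sketch in Python (assuming Python 3 and its built-in `re` module, which is just one of many engines) of matching plain characters, and of how a special character like `+` breaks a naive literal pattern:

```python
import re

# A single plain character matches itself; re.search returns the first match.
first_a = re.search(r"a", "banana")
print(first_a.start())  # 1

# Plain characters next to each other match that exact text.
word = re.search(r"cat", "concatenate")
print(word.start())  # 3

# "+" is special syntax, so this pattern does NOT look for the literal text "1+1".
literal_plus = re.search(r"1+1", "1+1=2")
print(literal_plus)  # None
```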
The previously mentioned special characters, like `[`, can be matched by putting a backslash in front of them: `\[`.
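Continuing with Python as an example engine, a backslash makes special characters match literally. Python's `re.escape` helper adds the backslashes automatically when the literal text is not known in advance:

```python
import re

# "\+" matches a literal plus sign instead of meaning "one or more".
escaped_plus = re.search(r"1\+1", "1+1=2")
print(escaped_plus.group())  # 1+1

# "\[" and "\]" match literal square brackets.
brackets = re.search(r"\[x\]", "a[x]b")
print(brackets.group())  # [x]

# re.escape inserts the backslashes for arbitrary text.
pattern = re.escape("1+1")
print(pattern)  # 1\+1
print(re.search(pattern, "1+1=2").group())  # 1+1
```

The `r"..."` prefix marks Python raw strings, so each backslash reaches the RegEx engine unchanged instead of being interpreted by Python first.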
RegEx engines already define some groups of characters that can make writing RegEx expressions quicker.
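Which groups exist, and exactly which characters they contain, depends on the engine. As an illustration, these are some common ones in Python's `re` module (Python 3 assumed):

```python
import re

text = "id_42 is\tready"

# \d matches a digit.
digits = re.findall(r"\d", text)
print(digits)  # ['4', '2']

# \w matches a "word" character: letter, digit or underscore.
words = re.findall(r"\w+", text)
print(words)  # ['id_42', 'is', 'ready']

# \s matches whitespace such as spaces, tabs and newlines.
spaces = re.findall(r"\s", text)
print(spaces)  # [' ', '\t']

# Uppercase versions match the opposite: \D is any non-digit.
non_digits = re.findall(r"\D+", text)
print(non_digits)  # ['id_', ' is\tready']
```

For `str` patterns, Python 3 applies these groups to Unicode by default, so `\d` also matches digits from other scripts; the `re.ASCII` flag restricts them to ASCII.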
`^` is used to assert the beginning of a line in multi-line mode, or the beginning of the string in whole-string mode.
`$` is used to assert the end of a line in multi-line mode, or the end of the string in whole-string mode.
The behaviours of these depend on the match options.
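A small Python sketch of both modes. Python's `re` uses whole-string mode by default and switches to multi-line mode with the `re.MULTILINE` flag; other engines may use different defaults:

```python
import re

text = "first line\nsecond line"

# Whole-string mode: ^ only matches at the very start of the string.
starts = re.findall(r"^\w+", text)
print(starts)  # ['first']

# Multi-line mode: ^ also matches right after each newline.
starts_ml = re.findall(r"^\w+", text, re.MULTILINE)
print(starts_ml)  # ['first', 'second']

# $ works the same way for line ends.
ends = re.findall(r"\w+$", text)
print(ends)  # ['line']
ends_ml = re.findall(r"\w+$", text, re.MULTILINE)
print(ends_ml)  # ['line', 'line']
```

One Python-specific quirk: even without `re.MULTILINE`, `$` also matches just before a newline at the very end of the string.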
Some combinators can match either “lazily” or “greedily”.
Lazy matching means that the engine matches only as many characters as required to get to the next step. This should almost always be used.
Greedy matching means that the engine tries to match as many characters as possible. The problem with this is that it can cause “backtracking”: when a greedy step has consumed too much and the rest of the pattern then fails, the engine has to go back and retry with fewer characters, possibly many times over. This can cause big performance issues.
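A Python sketch of the difference, using the `+` (greedy) and `+?` (lazy) repeat combinators described further down. It assumes Python's backtracking `re` engine:

```python
import re

text = "x123"

# Greedy: \d+ consumes as many digits as it can.
greedy = re.match(r"x\d+", text).group()
print(greedy)  # x123

# Lazy: \d+? stops after one digit, because nothing after it is required.
lazy = re.match(r"x\d+?", text).group()
print(lazy)  # x1

# Greedy with backtracking: \d+ first takes "123", the final "3" then fails,
# so the engine gives one digit back and retries.
greedy_backtrack = re.match(r"x\d+3", text).group()
print(greedy_backtrack)  # x123

# Lazy extends only as far as needed for the rest of the pattern to match.
lazy_extended = re.match(r"x\d+?3", text).group()
print(lazy_extended)  # x123
```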
Multiple atoms can be combined together to form more complex patterns.
When two expressions are next to each other, they are chained together, which means that the engine matches them one after the other, in order.
Example:
`x\d` matches an `x` followed by a digit, for example `x9`.
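The same example as a Python sketch (Python 3 `re` assumed); `re.fullmatch` is used so that the whole string has to match the chained pattern:

```python
import re

pattern = r"x\d"  # an "x" followed by a digit

# "9x" fails because order matters; "xy" fails because the second atom needs a digit.
results = {s: bool(re.fullmatch(pattern, s)) for s in ("x9", "9x", "xy")}
print(results)  # {'x9': True, '9x': False, 'xy': False}
```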
Two expressions separated by a `|` cause the RegEx engine to first try to match the left side and, only if that fails, to try the right side instead.
Note that “or” has a long left and right scope: it extends as far as possible in both directions (up to the enclosing group, if any), which means that `ab|cd` will match either `ab` or `cd`.
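A short Python check of both points (Python 3 `re` assumed). The second case shows that the left side really is tried first: Python's engine uses the first alternative that succeeds, not the longest one, whereas POSIX-style engines prefer the longest match:

```python
import re

# "|" splits the whole pattern: either "ab" or "cd".
alternatives = {s: bool(re.fullmatch(r"ab|cd", s)) for s in ("ab", "cd", "abd", "acd")}
print(alternatives)  # {'ab': True, 'cd': True, 'abd': False, 'acd': False}

# The left side is tried first, so "a" wins even though "ab" would also fit.
first_choice = re.match(r"a|ab", "ab").group()
print(first_choice)  # a
```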
A `?` after an expression (“or-not”) tries to match that expression, but won’t fail if it doesn’t succeed.
Note that “or-not” has a short left scope, which means that `ab?` will always match `a`, and then try to match `b`.
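A quick Python sketch (Python 3 `re` assumed) of that short scope:

```python
import re

# "?" only applies to the "b": the "a" is always required.
optional = {s: bool(re.fullmatch(r"ab?", s)) for s in ("a", "ab", "b", "abb")}
print(optional)  # {'a': True, 'ab': True, 'b': False, 'abb': False}
```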
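An expression followed by either a `*` for greedy repeat, or a `*?` for lazy repeat.
This matches the expression any number of times, including zero times: as many times as possible when greedy, and only as many as needed when lazy.
Note that this has a short left scope, so `ab*` repeats only the `b`.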
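A Python sketch (Python 3 `re` assumed) showing the scope and both variants:

```python
import re

# "*" only repeats the "b": one "a", then zero or more "b"s.
star = {s: bool(re.fullmatch(r"ab*", s)) for s in ("a", "abbb", "abab")}
print(star)  # {'a': True, 'abbb': True, 'abab': False}

# Greedy takes every "b"; lazy takes none, since nothing follows in the pattern.
greedy = re.match(r"ab*", "abbb").group()
lazy = re.match(r"ab*?", "abbb").group()
print(greedy, lazy)  # abbb a
```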
An expression followed by either a `+` for greedy repeat, or a `+?` for lazy repeat.
This matches the expression at least once: as many times as possible when greedy, and only as many as needed when lazy.
Note that this has a short left scope.
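The same kind of Python sketch (Python 3 `re` assumed), where the lazy form now has to keep at least one `b`:

```python
import re

# "+" needs at least one "b" after the "a".
plus = {s: bool(re.fullmatch(r"ab+", s)) for s in ("a", "ab", "abbb")}
print(plus)  # {'a': False, 'ab': True, 'abbb': True}

# Greedy takes every "b"; lazy stops after the required first one.
greedy = re.match(r"ab+", "abbb").group()
lazy = re.match(r"ab+?", "abbb").group()
print(greedy, lazy)  # abbb ab
```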
Non-capture groups, written as `(?:...)`, group multiple expressions together for scoping.
Example:
`(?:abc)` will just match `abc`.
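Where the scoping actually matters is in combination with the short-scoped and long-scoped combinators above. A Python sketch (Python 3 `re` assumed):

```python
import re

# Without a group, "+" only repeats the "b".
ungrouped = bool(re.fullmatch(r"ab+", "abab"))
# With a non-capture group, "+" repeats the whole "ab".
grouped = bool(re.fullmatch(r"(?:ab)+", "abab"))
print(ungrouped, grouped)  # False True

# A group also limits the scope of "|": this matches "abd" or "acd".
scoped = {s: bool(re.fullmatch(r"a(?:b|c)d", s)) for s in ("abd", "acd", "ab")}
print(scoped)  # {'abd': True, 'acd': True, 'ab': False}

# Non-capture groups store nothing.
stored = re.fullmatch(r"(?:ab)+", "abab").groups()
print(stored)  # ()
```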
Capture groups, written as `(...)`, are similar to non-capture groups, except that they capture the matched text. This allows the matched text of the inner expression to be extracted later.
Capture groups are numbered from left to right, in the order of their opening parentheses, starting with 1.
Example:
`(abc)de` will match `abcde`, and store `abc` in group 1.
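A Python sketch of extracting the captured text (Python 3 `re` assumed). Using group 0 for the whole match is Python's convention, shared by many other engines:

```python
import re

m = re.fullmatch(r"(abc)de", "abcde")
print(m.group(0))  # abcde  (group 0 is the whole match)
print(m.group(1))  # abc

# Nested groups are numbered by the position of their opening parenthesis.
nested = re.fullmatch(r"((a)b)(c)", "abc")
print(nested.groups())  # ('ab', 'a', 'c')
```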
Surrounding multiple characters with square brackets makes the engine match any one of them. Most special characters and expressions are not parsed inside the brackets, which means that this can also be used to escape characters.
For example, `[abc]` will match either `a`, `b` or `c`, and `[ab(?:c)]` will match either `a`, `b`, `(`, `?`, `:`, `c`, or `)`.
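A Python sketch of both examples (Python 3 `re` assumed). Note that a set always matches exactly one character:

```python
import re

# A set matches exactly one character out of the listed ones.
from_set = re.findall(r"[abc]", "a-b-c-d")
print(from_set)  # ['a', 'b', 'c']

# "ab" is two characters, so a single set cannot match all of it.
no_multi = re.fullmatch(r"[abc]", "ab")
print(no_multi)  # None

# Inside the brackets, "(", "?", ":" and ")" are plain characters.
literal = re.findall(r"[ab(?:c)]", "(x?:)")
print(literal)  # ['(', '?', ':', ')']
```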
Character groups and escaped characters still work inside character sets.
Character sets can also contain ranges. For example, `[0-9a-z]` will match either any digit or any lowercase letter.
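A Python sketch (Python 3 `re` assumed) combining a range set with a repeat, plus a set that contains a character group and an escaped character:

```python
import re

# [0-9a-z] matches one digit or lowercase ASCII letter; "+" repeats the set.
tokens = re.findall(r"[0-9a-z]+", "Item42 costs 7EUR")
print(tokens)  # ['tem42', 'costs', '7']

# Groups like \d and escapes like \- also work inside a set.
phone = re.findall(r"[\d\-]+", "tel: 555-0100")
print(phone)  # ['555-0100']
```

The uppercase `I` and `EUR` are skipped because the range `a-z` covers only lowercase letters.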
RegEx is perfect for when you just want to match some patterns, but the syntax can make patterns very hard to read or modify.
In the next article, we will start to dive into implementing RegEx.
Stay tuned!