Cheatsheet and introduction to "globbing", a programming concept that involves the use of wildcards and special characters for matching and filtering.
- Contributing
- What is "globbing"?
- Wildcards
- Extended globbing
- extglobs
- Common options
- Caveats
- Resources
- TODO
(TOC generated by verb using markdown-toc)
We love contributors!
Pull requests to add documentation, links to tools, corrections or anything else are warmly accepted and gratefully appreciated!
Too busy to contribute directly, but still want to show your support? Please consider starring this project or tweeting about it!
The term "globbing", also referred to as "glob matching" or "filepath expansion", is a programming concept that describes the process of using wildcards, referred to as "glob patterns" or "globs", for matching file paths or other similar sets of strings.
Similar to regular expressions, but much simpler and limited in scope, glob patterns are defined using a combination of special characters, or wildcards, alongside literal (non-matching) characters. For example, the glob pattern *.txt
would match all files in a directory with a .txt
file extension.
Globbing syntax
TODO: describe wildcards vs. extended globbing, and segue to following sections
- wildcards
- extended globbing
The commonly supported characters supported across globbing implementations for basic wildcard matching are *
, ?
and a simplified version of regex brackets for matching any of a given set of characters.
Although many different globbing implementations exist across a number of different languages and environments, the following table summarizes the most commonly supported basic globbing features.
Character | Description |
---|---|
* |
Matches any character zero or more times, except for / . |
** |
Matches any character zero or more times, including / . |
? |
Matches any character one time |
[abc] |
Matches any of the specified characters (in this case, a , b or c ) |
Special exceptions:
*
typically does not match dotfiles (file names starting with a.
) unless explicitly enabled by the user via options?
also typically does not match the leading dot- More than two stars in a glob path segment are typically interpreted as a single star (e.g.
/***/
is the same as/*/
)
Implementations
Globbing is typically enabled using third-party libraries. A notable exception, bash, provides built-in support for basic globbing.
TODO: add list of implementations
Additional reading
In addition to wildcard matching, extended globbing describes the addition of more advanced matching features, such as:
- brace expansion
- extglobs
- POSIX character classes
- regular expressions
TBC...
In simple cases, brace expansion appears to work the same way as the logical OR
operator. For example, (a|b)
will achieve the same result as {a,b}
.
Here are some powerful features unique to brace expansion (versus character classes):
- range expansion:
a{1..3}b/*.js
expands to:['a1b/*.js', 'a2b/*.js', 'a3b/*.js']
- nesting:
a{c,{d,e}}b/*.js
expands to:['acb/*.js', 'adb/*.js', 'aeb/*.js']
TBC...
Visit the braces library for more examples and information about brace expansion.
TBC...
As described by the bash man page:
pattern | regex equivalent | matches |
---|---|---|
?(foo) |
(foo)? |
zero or one occurrence of the given patterns |
*(foo) |
(foo)* |
zero or more occurrences of the given patterns |
(foo) |
(foo) |
one or more occurrences of the given patterns |
@(foo) |
(foo) * |
one of the given patterns |
!(foo) |
^(?:(?!(foo)$).*?)$ |
anything except one of the given patterns |
* Note that @
isn't a RegEx character.
Example implementations
- extglob - extended glob parser and matcher for node.js
POSIX character classes, or "bracket expressions", provide a way of defining regular expressions using something closer to plain english.
For example, the pattern [[:alpha:][:digit:]]
would match a1
, but not aa
.
TBC...
Example implementations
- expand-brackets, node.js API for parsing and matching POSIX bracket expressions
This section describes matching using regular expressions as it relates to globbing.
When regex character classes are used in glob patterns, with the exception of brace expansion ({a,b}
, {1..5}
, etc), most of the special characters convert directly to regex, so you can expect them to follow the same rules and produce the same results as regex.
For example, given the list: ['a.js', 'b.js', 'c.js', 'd.js', 'E.js']
:
[ac].js
: matches botha
andc
, returning['a.js', 'c.js']
[b-d].js
: matches fromb
tod
, returning['b.js', 'c.js', 'd.js']
[b-d].js
: matches fromb
tod
, returning['b.js', 'c.js', 'd.js']
a/[A-Z].js
: matches and uppercase letter, returning['a/E.js']
However, there is
Learn about [regex character classes][character-classes].
Given ['a.js', 'b.js', 'c.js', 'd.js', 'E.js']
:
(a|c).js
: would match eithera
orc
, returning['a.js', 'c.js']
(b|d).js
: would match eitherb
ord
, returning['b.js', 'd.js']
(b|[A-Z]).js
: would match eitherb
or an uppercase letter, returning['b.js', 'E.js']
As with regex, parentheses can be nested, so patterns like ((a|b)|c)/b
will work. But it might be easier to achieve your goal using brace expansion.
The following options are commonly available on various globbing implementations.
Option name | Description |
---|---|
extglob |
Enable extended globs. In addition to the traditional globs (using wildcards: * , * , ? and [...] ), extended globs add (almost) the expressive power of regular expressions, allowing the use of patterns like `foo/!(a |
dotglob |
Allows files beginning with . to be included in matches. This option is automatically enabled if the glob pattern begins with a dot. Aliases: dot (supported by: minimatch, micromatch) |
failglob |
report an error when no matches are found |
globignore |
allows you to specify patterns a glob should not match Aliases: ignore (supported by: minimatch, micromatch) |
globstar |
recursively match directory paths (enabled by default in minimatch and micromatch, but not in bash) |
nocaseglob |
perform case-insensitive pathname expansion |
nocasematch |
perform case-insensitive matching. Aliases: nocase (supported by: minimatch, micromatch) |
nullglob |
when enabled, the pattern itself will be returned when no matches are found. Aliases: nonull (supported by: minimatch, micromatch) |
WIP
Like regular expressions, glob patterns are a type of regular language that must first be interpreted by a computer program before any actual matching takes place. This introduces two areas of risk:
- interpreting patterns:
- performing matches:
Interpreting patterns
TODO (The process of parsing and compiling a glob pattern into a regular expression...)
Performing matches
TODO (The process of matching the compiled regular expression against a set of strings)
Risks
- Intentional denial-of-service (DoS) attacks from hackers
- Unintentional denial-of-service (DoS) attacks from agressively greedy wildcard patterns
Learn more about globbing.
- GNU/Linux Command-Line Tools Summary: Wildcards
- Bash: globbing
- Wikipedia: glob (programming)
- Linux Programmer's Manual: GLOB(7)
Name | Programming language |
---|---|
bash-glob | JavaScript/node.js |
brace-expansion | JavaScript/node.js |
braces | JavaScript/node.js |
expand-brackets | JavaScript/node.js |
glob | JavaScript/node.js |
micromatch | JavaScript/node.js |
minimatch | JavaScript/node.js |
nanomatch | JavaScript/node.js |
The following concepts are similar to or include the concept of globbing:
- cheatsheet
- what is globbing
- wildcards
- extglobs
- posix brackets
- braces
- options