When dealing with real-world input, such as log files and even user input, it's difficult not to encounter whitespace. We use it to format pieces of information so that they are easier to read and scan visually, but a single space can throw a wrench into even the simplest regular expression.
The most common forms of whitespace you will use with regular expressions are the space (␣), the tab (\t), the newline (\n) and the carriage return (\r) (useful in Windows environments, where lines end in \r\n), and each of these special characters matches its respective whitespace character. In addition, the whitespace special character \s will match any of the specific whitespace characters above and is extremely useful when dealing with raw input text.
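As a rough illustration of the difference between a specific whitespace character and \s, here is a minimal sketch using Python's built-in re module. The sample strings and variable names are made up for this example and are not part of the lesson.

```python
import re

# Hypothetical samples: each puts one kind of whitespace between "a" and "b".
samples = {
    "space": "a b",
    "tab": "a\tb",
    "newline": "a\nb",
    "carriage return": "a\rb",
}

# \s matches any single whitespace character, so every sample should match.
any_whitespace = re.compile(r"a\sb")
matches_any = {name: bool(any_whitespace.fullmatch(text)) for name, text in samples.items()}

# \t matches only a tab, so only the tab sample should match.
tab_only = re.compile(r"a\tb")
matches_tab = {name: bool(tab_only.fullmatch(text)) for name, text in samples.items()}

print(matches_any)
print(matches_tab)
```

Note that in Python, \s also covers a few less common whitespace characters (such as form feed and vertical tab), so it is slightly broader than the four characters listed above.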
In the strings below, you'll find that the content of each line is indented by some whitespace from the index of the line (the number is a part of the text to match). Try writing a pattern that can match each line containing whitespace characters between the number and the content. Notice that the whitespace characters are just like any other character and the special metacharacters like the star and the plus can be used as well.
Task | Text
match | 1. abc
match | 2. abc
match | 3. abc
skip | 4.abc
Solution | We have to match only the lines that have whitespace between the list number and 'abc'. We can do that by using the expression \d\.\s+abc to match the number, the literal period (which must be escaped), one or more whitespace characters, and then the text. If we had used the Kleene star instead of the plus, we would also match the fourth line, which we actually want to skip.
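To see why the plus matters here, the following sketch compares the plus and the star quantifiers in Python. The exact whitespace in the first three test lines is an assumption (several spaces, a tab, and a longer run of spaces), since the lesson only requires that some whitespace separates the number from the text.

```python
import re

# Assumed test lines: the kind and amount of whitespace are illustrative only.
lines = ["1.   abc", "2.\tabc", "3.           abc", "4.abc"]

# \s+ requires at least one whitespace character after the period.
plus_pattern = re.compile(r"\d\.\s+abc")
# \s* allows zero whitespace characters, so "4.abc" slips through.
star_pattern = re.compile(r"\d\.\s*abc")

matched_plus = [line for line in lines if plus_pattern.fullmatch(line)]
matched_star = [line for line in lines if star_pattern.fullmatch(line)]

print(matched_plus)  # the first three lines only
print(matched_star)  # all four lines, including the one we want to skip
```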