Regular Expressions

Posted on March 2, 2012 by Chris Roualin

Use regular expressions for more accurate pattern recognition if you require it. Regular expressions offer many more wildcard characters; for this reason, they can describe patterns in much greater detail. For the very same reason, however, regular expressions are also much more complicated.

Describing Patterns

Using the regular expression elements listed in Table 13.11, you can describe patterns with much greater precision. These elements are grouped into three categories:

Char: The Char represents a single character and a collection of Char objects represents a string.
Quantifier: Allows you to determine how often a character or a string occurs in a pattern.
Anchor: Allows you to determine whether a pattern is a separate word or must be at the beginning or end of a string.

The pattern represented by a regular expression may consist of four different character types:

Literal characters like “abc”, which match exactly the “abc” string.
Masked or “escaped” characters: characters with special meanings in regular expressions are understood as literal characters when preceded by “\”. For example, “\[test\]” looks for the “[test]” string. The following characters have special meanings and for this reason must be masked if used literally: “. ^ $ * + ? { [ ] | ( ) \”.
Predefined wildcard characters that represent a particular character category and work like placeholders. For example, “\d” represents any digit from 0 to 9.
Custom wildcard characters: They consist of square brackets, within which the characters are specified that the wildcard represents. If you want to use any character except for the specified characters, use “^” as the first character in the square brackets. For example, the placeholder “[^f-h]” stands for all characters except for “f”, “g”, and “h”.
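
The four character types can be tried out directly with -match. The following lines are only a minimal sketch: the sample strings are made up for illustration, and [regex]::Escape() is the .NET helper that masks special characters for you.

```powershell
# Literal characters match themselves
'abc' -match 'abc'              # True

# Escaped characters: the backslash turns [ and ] into literals
'[test]' -match '\[test\]'      # True

# [regex]::Escape() adds the necessary backslashes for you
[regex]::Escape('[test]')       # \[test]

# Predefined wildcard: \d stands for one digit
'Room 7' -match '\d'            # True

# Custom wildcard: any character except f, g and h
'g' -match '[^f-h]'             # False
```

Note that [regex]::Escape() leaves the closing bracket alone, because “]” only has a special meaning after an unescaped “[”.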

Element	Description
.	Exactly one character of any kind except for a line break (equivalent to [^\n])
[^abc]	All characters except for those specified in brackets
[^a-z]	All characters except for those in the range specified in the brackets
[abc]	One of the characters specified in brackets
[a-z]	Any character in the range indicated in brackets
\a	Bell alarm (ASCII 7)
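\c	Any character allowed in an XML name (XML Schema syntax; the .NET engine used by PowerShell only knows \c as the prefix of the control characters in the next row)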
\cA-\cZ	Control+A to Control+Z, equivalent to ASCII 1 to ASCII 26
\d	A digit (equivalent to [0-9])
\D	Any character except for digits
\e	Escape (ASCII 27)
\f	Form feed (ASCII 12)
\n	New line
\r	Carriage return
\s	Any whitespace character like a blank character, tab, or line break
\S	Any character except for a blank character, tab, or line break
\t	Tab character
\uFFFF	Unicode character with the hexadecimal code FFFF. For example, the Euro symbol has the code 20AC
\v	Vertical tab (ASCII 11)
\w	Letter, digit, or underscore
\W	Any character except for letters, digits, and underscores
\xnn	Particular character, where nn specifies the hexadecimal ASCII code
.*	Any number of any character (including no characters at all)

Table 13.8: Placeholders for characters

Quantifiers

Every wildcard listed in Table 13.8 represents exactly one character. Using quantifiers, you can determine more precisely how many characters are represented. For example, “\d{1,3}” stands for a digit occurring one to three times, that is, a number of one to three digits.

Element	Description
*	Preceding expression is not matched or matched once or several times (matches as much as possible)
*?	Preceding expression is not matched or matched once or several times (matches as little as possible)
.*	Any number of any character (including no characters at all)
?	Preceding expression is not matched or matched once (matches as much as possible)
??	Preceding expression is not matched or matched once (matches as little as possible)
{n,}	n or more matches
{n,m}	Inclusive matches between n and m
{n}	Exactly n matches
+	Preceding expression is matched once or several times (matches as much as possible)

Table 13.9: Quantifiers for patterns
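
To see the quantifiers from Table 13.9 at work, you could run a few comparisons like the following. This is just an illustrative sketch with made-up sample strings; the “^” and “$” anchors (explained in the next section) are added so that the whole string has to fit the pattern.

```powershell
# {1,3}: between one and three digits
'7'    -match '^\d{1,3}$'    # True
'1234' -match '^\d{1,3}$'    # False: four digits are one too many

# *: the preceding character may be missing or repeated
'a'    -match '^ab*$'        # True
'abbb' -match '^ab*$'        # True

# +: the preceding character must occur at least once
'a'    -match '^ab+$'        # False
'abbb' -match '^ab+$'        # True
```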

Anchors

Anchors determine whether a pattern has to be at the beginning or end of a string or on a word boundary. For example, the regular expression “\b\d{1,3}\b” finds numbers of up to three digits only if they turn up separately in a string. The number “123” in the string “Bart123” would not be found.

Elements	Description
$	Matches at end of a string (\Z is less ambiguous for multi-line texts)
\A	Matches at beginning of a string, including multi-line texts
\b	Matches on word boundary (first or last characters in words)
\B	Must not match on word boundary
\Z	Must match at end of string, including multi-line texts
^	Must match at beginning of a string (\A is less ambiguous for multi-line texts)

Table 13.10: Anchor boundaries
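
The effect of the anchors in Table 13.10 is easiest to see side by side. The following comparisons are a small sketch with made-up sample strings:

```powershell
# Without anchors, the digits are found anywhere in the string
'Bart123'  -match '\d{1,3}'        # True

# With word boundaries, the number must stand on its own
'Bart123'  -match '\b\d{1,3}\b'    # False
'Bart 123' -match '\b\d{1,3}\b'    # True

# ^ ties the pattern to the beginning of the string
'123 Bart' -match '^\d+'           # True
'Bart 123' -match '^\d+'           # False

# $ ties the pattern to the end of the string
'Bart 123' -match '\d+$'           # True
```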

Recognizing IP Addresses

Patterns such as an IP address can be described much more precisely by regular expressions than by simple wildcard characters. Usually, you would use a combination of characters and quantifiers to specify which characters may occur in a string and how often:

$ip = "10.10.10.10"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

True
$ip = "a.10.10.10"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

False
$ip = "1000.10.10.10"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

False

The pattern is described here as four numbers (char: \d) of one to three digits each (using the quantifier {1,3}), separated by literal dots (\.) and anchored on word boundaries (using the anchor \b), meaning that no further letters, digits, or underscores may directly precede or follow it. Checking is far from perfect since it is not verified whether the numbers really do lie in the permitted number range from 0 to 255.

# There still are entries incorrectly identified as valid IP addresses:
$ip = "300.400.500.999"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

True
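
One way to close this gap without a more complicated pattern is to let the regular expression capture the four numbers and check their range in ordinary code. The following is only a sketch under assumptions: the function name Test-IPv4Text is hypothetical, and it uses parentheses to capture the numbers in $matches, a technique explained later in this article.

```powershell
function Test-IPv4Text {
    param([string]$Text)

    # [0-9] instead of \d so that only ASCII digits reach the [int] cast
    if ($Text -match '^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$') {
        # $matches[1] to $matches[4] hold the four numbers as strings
        foreach ($i in 1..4) {
            if ([int]$matches[$i] -gt 255) { return $false }
        }
        return $true
    }
    return $false
}

Test-IPv4Text '10.10.10.10'        # True
Test-IPv4Text '300.400.500.999'    # False
Test-IPv4Text 'a.10.10.10'         # False
```

Here the pattern is anchored with “^” and “$”, so the entire string has to be an address; keep “\b” instead if you want to find addresses inside longer text.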

Validating E-Mail Addresses

If you’d like to verify whether a user has given a valid e-mail address, use the following regular expression:

$email = "test@somewhere.com"
$email -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

True
$email = ".@."
$email -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

False

Whenever you look for an expression that occurs as a single “word” in text, delimit your regular expression by word boundaries (anchor: \b). The regular expression will then know you’re interested only in those passages that are demarcated from the rest of the text by characters that cannot be part of a word, such as blank characters, tabs, line breaks, or punctuation.

The regular expression subsequently specifies which characters may be included in an e-mail address. Permissible characters are in square brackets and consist of “ranges” (for example, “A-Z0-9”) and single characters (such as “._%+-”). The “+” behind the square brackets is a quantifier and means that at least one of the given characters must be present, but any number of further permitted characters may follow.

Following this is “@” and, after it, text consisting of similar characters to those in front of “@”. A dot (.) in the e-mail address follows. This dot is introduced with a “\” character because the dot actually has a different meaning in regular expressions if it isn’t within square brackets. The backslash ensures that the regular expression understands the dot behind it literally.

After the dot is the domain identifier, which may consist solely of letters ([A-Z]). A quantifier ({2,4}) again follows the square brackets. It specifies that the domain identifier may consist of at least two and at most four of the given characters.

However, this regular expression still has one flaw. While it does verify whether a valid e-mail address is in the text somewhere, there could be another text before or after it:

$email = "Email please to test@somewhere.com and reply!"
$email -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

True

Because of “\b”, your regular expression searches for the pattern anywhere in the text and only takes word boundaries into account. If you prefer to check whether the entire text corresponds to an authentic e-mail address, use the anchors for the text beginning (“^”) and ending (“$”) instead of word boundaries:

$email -match "^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$"

Simultaneous Searches for Different Terms

Sometimes, search terms are ambiguous because there may be several ways to write them. You can use the “?” quantifier to mark parts of the search term as optional. In simple cases, put a “?” after an optional character. Then the character in front of “?” may, but doesn’t have to, turn up in the search term:

"color" -match "colou?r"
True
"colour" -match "colou?r"
True

The “?” character here doesn’t represent any character at all, as you might expect after using simple wildcards. For regular expressions, “?” is a quantifier and always specifies how often a character or expression in front of it may occur. In the example, therefore, “u?” ensures that the letter “u” may, but not necessarily, be in the specified location in the pattern. Other quantifiers are “*” (may also match more than one character) and “+” (must match characters at least once).

If you prefer to mark more than one character as optional, put the characters in a sub-expression, which is placed in parentheses. The following example recognizes both the month designator “Nov” and “November”:

"Nov" -match "\bNov(ember)?\b"

True

"November" -match "\bNov(ember)?\b"

True

If you’d rather use several alternative search terms, use the OR character “|”:

"Bob and Ted" -match "Alice|Bob"

True

And if you want to mix alternative search terms with fixed text, use sub-expressions again:

# finds "and Bob":
"Peter and Bob" -match "and (Bob|Willy)"

True

# does not find "and Bob":
"Bob and Peter" -match "and (Bob|Willy)"

False

Case Sensitivity

In keeping with customary PowerShell practice, the -match operator is case insensitive. Use the -cmatch operator as an alternative if you’d prefer case sensitivity:

# -match is case insensitive:
"hello" -match "heLLO"

True
# -cmatch is case sensitive:
"hello" -cmatch "heLLO"

False

If you want case sensitivity in only some pattern segments, use -match and specify in your regular expression which text segments are case sensitive and which are insensitive. Anything following the “(?i)” construct is case insensitive. Conversely, anything following “(?-i)” is case sensitive. This explains why the word “test” in the example below is recognized only if its last two characters are lowercase, while case has no importance for the first two characters:

"TEst" -match "(?i)te(?-i)st"

True
"TEST" -match "(?i)te(?-i)st"

False

If you use a .NET Framework RegEx object instead of -match, the RegEx object will distinguish between uppercase and lowercase by default, behaving like -cmatch. If you prefer case insensitivity, either use the above construct to specify an option in your regular expression or use the “IgnoreCase” option to tell the RegEx object your preference:

[regex]::Matches("test", "TEST", "IgnoreCase")
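
The different ways of setting case sensitivity for the RegEx object can be compared with the static IsMatch() method, which simply returns True or False. This is a short sketch; the variable names are chosen freely for illustration.

```powershell
# The RegEx object is case sensitive by default
[regex]::IsMatch('test', 'TEST')                  # False

# Option passed as a third argument
[regex]::IsMatch('test', 'TEST', 'IgnoreCase')    # True

# Option embedded in the pattern itself
[regex]::IsMatch('test', '(?i)TEST')              # True

# Option stored in a reusable RegEx object
$options = [Text.RegularExpressions.RegexOptions]::IgnoreCase
$regex = New-Object regex 'TEST', $options
$regex.IsMatch('test')                            # True
```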

Element	Description	Category
(xyz)	Sub-expression
|	Alternation construct	Selection
\	When followed by a character, the character is not recognized as a formatting character but as a literal character	Escape
x?	Changes the x quantifier into a “lazy” quantifier	Option
(?xyz)	Activates or deactivates special modes, among others, case sensitivity	Option
x+	Turns the x quantifier into a “possessive” quantifier in some regular expression flavors; the .NET engine does not support this form, and its quantifiers are greedy by default	Option
(?:xyz)	Sub-expression whose result is not stored (non-capturing)	Reference
(?<name>xyz)	Specifies name for back references	Reference

Table 13.11: Regular expression elements

Of course, a regular expression can perform any number of detailed checks, such as verifying whether numbers in an IP address lie within the permissible range from 0 to 255. The problem is that this makes regular expressions long and hard to understand. Fortunately, you generally won’t need to invest much time in learning complex regular expressions like the ones coming up. It’s enough to know which regular expression to use for a particular pattern. Regular expressions for nearly all standard patterns can be downloaded from the Internet. In the following example, we’ll look more closely at a complex regular expression that evidently is entirely made up of the conventional elements listed in Table 13.11:

$ip = "300.400.500.999"
$ip -match "\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.)" + `
"{3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"

False

The expression matches only text that is delimited by word boundaries (the anchor is \b). The following sub-expression defines every single number:

(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

The construct ?: is optional and enhances speed because the result of the sub-expression is not stored. After it come three alternatively permitted number formats separated by the alternation construct “|”. 25[0-5] is a number from 250 through 255. 2[0-4][0-9] is a number from 200 through 249. Finally, [01]?[0-9][0-9]? is a number from 0-9 or 00-99 or 100-199. The quantifier “?” marks the preceding character class as optional: it may, but doesn’t have to, be present. The result is that the sub-expression describes numbers from 0 through 255. An IP address consists of four such numbers. A dot always follows the first three numbers. For this reason, the following expression combines the definition of the number with a dot:

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}

A dot (\.) is appended to the number. This construct is supposed to be present three times ({3}). When the fourth number is also appended, the regular expression is complete. You have learned to create sub-expressions (by using parentheses) and how to iterate sub-expressions (by indicating the number of iterations in braces after the sub-expression), so you should now be able to shorten the first used IP address regular expression:

$ip = "10.10.10.10"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

True

$ip -match "\b(?:\d{1,3}\.){3}\d{1,3}\b"

True
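
Long expressions become easier to read if you assemble them from named pieces. The following sketch stores the sub-expression for a single number in a variable and lets PowerShell insert it into the complete pattern; the variable names $octet and $ipPattern are assumptions chosen for illustration.

```powershell
# Sub-expression for one number from 0 through 255
$octet = '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Double quotation marks let PowerShell expand $octet inside the pattern:
# three times "number plus dot", then the fourth number
$ipPattern = "\b(?:$octet\.){3}$octet\b"

'192.168.1.255'   -match $ipPattern    # True
'300.400.500.999' -match $ipPattern    # False
```

The piece itself is written in single quotation marks so that PowerShell leaves it untouched, while the combined pattern needs double quotation marks for the variable to be expanded.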

Finding Information in Text

Regular expressions can recognize patterns. They can also filter out data corresponding to certain patterns from text. As such, regular expressions are excellent tools for parsing raw data. For example, use the same regular expression as the one above to identify e-mail addresses if you want to extract an e-mail address from a letter. Afterwards, look in the $matches variable to see which results were returned. The $matches variable is created automatically when you use the -match operator (or one of its siblings, like -cmatch).

$matches is a hash table (Chapter 4), so you can either output the entire hash table or access single elements in it by using their names, which you must specify in square brackets:

$rawtext = "If it interests you, my e-mail address is tobias@powershell.com."

# Simple pattern recognition:
$rawtext -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

True
# Reading data matching the pattern from raw text:
$matches

Name                           Value
----                           -----
0                              tobias@powershell.com
$matches[0]

tobias@powershell.com

Does that also work for more than one e-mail address in text? Unfortunately, it doesn’t do so right away. The -match operator looks only for the first matching expression. So, if you want to find more than one occurrence of a pattern in raw text, you have to switch over to the RegEx object underlying the -match operator and use it directly.

In one essential respect, the RegEx object behaves unlike the -match operator. Case sensitivity is the default for the RegEx object, but not for -match. For this reason, you must put the “(?i)” option in front of the regular expression to eliminate confusion, making sure the expression is evaluated without taking case sensitivity into account.

# A raw text contains several e-mail addresses. -match finds the first one only:
$rawtext = "test@test.com sent an e-mail that was forwarded to spam@muell.de."
$rawtext -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

True
$matches

Name                           Value
----                           -----
0                              test@test.com
# A RegEx object can find every occurrence of a pattern but is case sensitive by default:
$regex = [regex]"(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
$regex.Matches($rawtext)

Groups   : {test@test.com}
Success  : True
Captures : {test@test.com}
Index    : 0
Length   : 13
Value    : test@test.com

Groups   : {spam@muell.de}
Success  : True
Captures : {spam@muell.de}
Index    : 51
Length   : 13
Value    : spam@muell.de

# Limit result to e-mail addresses:
$regex.Matches($rawtext) | Select-Object -Property Value

Value
-----
test@test.com
spam@muell.de
# Continue processing e-mail addresses:
$regex.Matches($rawtext) | ForEach-Object { "found: $($_.Value)" }

found: test@test.com
found: spam@muell.de
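
If you would rather stay with cmdlets, Select-String with its -AllMatches switch is an alternative worth knowing. This is only a sketch of that approach, not the method used above; the variable names are chosen for illustration.

```powershell
$rawtext = 'test@test.com sent an e-mail that was forwarded to spam@muell.de.'
$pattern = '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'

# Select-String is case insensitive by default, so (?i) is not needed here
$found = $rawtext |
    Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

$found
# test@test.com
# spam@muell.de
```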

Searching for Several Keywords

You can use the alternation construct “|” to search for a group of keywords, and then find out which keyword was actually found in the string:

"Set a=1" -match "Get|GetValue|Set|SetValue"

True

$matches

Name                           Value
----                           -----
0                              Set

$matches tells you which keyword actually occurs in the string. But note the order of keywords in your regular expression—it’s crucial because the first matching keyword is the one selected. In this example, the result would be incorrect:

"SetValue a=1" -match "Get|GetValue|Set|SetValue"

True

$matches[0]

Set

Either change the order of keywords so that longer keywords are checked before shorter ones …:

"SetValue a=1" -match "GetValue|Get|SetValue|Set"

True

$matches[0]

SetValue

… or make sure that your regular expression is precisely formulated, and remember that you’re actually searching for single words. Insert word boundaries into your regular expression so that sequential order no longer plays a role:

"SetValue a=1" -match "\b(Get|GetValue|Set|SetValue)\b"

True

$matches[0]

SetValue

It’s true here, too, that -match finds only the first match. If your raw text has several occurrences of the keyword, use a RegEx object again:

$regex = [regex]"\b(Get|GetValue|Set|SetValue)\b"
$regex.Matches("Set a=1; GetValue a; SetValue b=12")

Groups   : {Set, Set}
Success : True
Captures : {Set}
Index    : 0
Length   : 3
Value    : Set

Groups   : {GetValue, GetValue}
Success : True
Captures : {GetValue}
Index    : 9
Length   : 8
Value    : GetValue

Groups   : {SetValue, SetValue}
Success : True
Captures : {SetValue}
Index    : 21
Length   : 8
Value    : SetValue

Forming Groups

A raw text line is often a heaping trove of useful data. You can use parentheses to collect this data in sub-expressions so that it can be evaluated separately later. The basic principle is that all the data that you want to find in a pattern should be wrapped in parentheses because $matches will return the results of these sub-expressions as independent elements. For example, if a text line contains a date first, then text, and if both are separated by tabs, you could describe the pattern like this:

# Defining pattern: two strings separated by a tab
$pattern = "(.*)\t(.*)"

# Generate example line with tab character
$line = "12/01/2009`tDescription"

# Use regular expression to parse line:
$line -match $pattern

True
# Show result:
$matches

Name                           Value
----                           -----
2                              Description
1                              12/01/2009
0                              12/01/2009    Description
$matches[1]

12/01/2009
$matches[2]

Description

When you use sub-expressions, $matches will contain the entire searched pattern in the first element named “0”. Sub-expressions defined in parentheses follow in additional elements. To make them easier to read and understand, you can assign sub-expressions their own names and later use the names to call results. To assign a name to a sub-expression, type “?<Name>” as the first statement inside the parentheses:

# Assign subexpressions their own names:
$pattern = "(?<Date>.*)\t(?<Text>.*)"

# Generate example line with tab character:
$line = "12/01/2009`tDescription"

# Use a regular expression to parse line:
$line -match $pattern

True
# Show result:
$matches

Name                    Value
----                    -----
Text                    Description
Date                    12/01/2009
0                       12/01/2009    Description

$matches.Date

12/01/2009
$matches.Text

Description
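
Named sub-expressions are especially handy when you parse many lines and want to turn each one into an object. The following sketch assumes two made-up sample lines; the variable names $lines and $records are chosen freely.

```powershell
$pattern = '(?<Date>.*)\t(?<Text>.*)'

# Two sample lines, each with a tab between date and text
$lines = "12/01/2009`tDescription", "12/02/2009`tAnother entry"

$records = foreach ($line in $lines) {
    if ($line -match $pattern) {
        # Copy the named results into an object of their own,
        # because the next -match overwrites $matches
        New-Object PSObject -Property @{
            Date = $matches.Date
            Text = $matches.Text
        }
    }
}

$records[1].Text    # Another entry
```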

Each result retrieved by $matches for each sub-expression naturally requires storage space. If you don’t need the results, discard them to increase the speed of your regular expression. To do so, type “?:” as the first statement in your sub-expression:

# Don’t return a result for the second subexpression:
$pattern = "(?<Date>.*)\t(?:.*)"

# Generate example line with tab character:
$line = "12/01/2009`tDescription"

# Use a regular expression to parse line:
$line -match $pattern

True
# No more results will be returned for the second subexpression:
$matches

Name                   Value
----                   -----
Date                   12/01/2009
0                      12/01/2009    Description

Further Use of Sub-Expressions

With the help of results from each sub-expression, you can create surprisingly flexible regular expressions. For example, how could you define a Web site HTML tag as a pattern? A tag always has the same structure: <tagname>…</tagname>. This means that a pattern for one particular strictly predefined HTML tag can be found quickly:

"<body>Contents</body>" -match "<body[^>]*>(.*?)</body>"

True

$matches[1]

Contents

The pattern begins with the fixed text “<body”. Any further characters up to the closing “>” of the opening tag may follow ([^>]*>). Next come the contents of the body tag, which may consist of any number of any characters (.*?). The expression, enclosed in parentheses, is a sub-expression and will be returned later as a result in $matches so that you’ll know what is inside the body tag. The concluding part of the tag follows in the form of fixed text (“</body>”).

This regular expression works fine for body tags, but not for other tags. Does this mean that a regular expression has to be defined for every HTML tag? Naturally not. There’s a simpler solution. The problem is that the name of the tag in the regular expression occurs twice, once initially (“<body”) and once terminally (“</body>”). If the regular expression is supposed to be able to process any tags, then it would have to be able to find out the name of the tag automatically and use it in both locations. How to accomplish that? Like this:

"<body>Contents</body>" -match "<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>"

True

$matches

Name                           Value
----                           -----
2                              Contents
1                              body
0                              <body>Contents</body>

This regular expression no longer contains a strictly predefined tag name and works for any tags matching the pattern. How does that work? The initial tag in parentheses is defined as a sub-expression, more specifically as a word that begins with a letter and that can consist of any additional alphanumeric characters.

([A-Z][A-Z0-9]*)

The name of the tag found here must subsequently be repeated in the terminal part. Here you’ll find “</\1>”. “\1” refers to the result of the first sub-expression. The first sub-expression evaluated the tag name and so this name is used automatically for the terminal part.

The following RegEx object could directly return the contents of any HTML tag:

$regexTag = [regex]"(?i)<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>"
$result = $regexTag.Matches("<button>Press here</button>")
$result[0].Groups[2].Value + " is in tag " + $result[0].Groups[1].Value

Press here is in tag button

Greedy or Lazy? Detailed or Concise Results…

Readers who have paid careful attention may wonder why the contents of the HTML tag were defined by “.*?” and not simply by “.*”. After all, “.*” should suffice so that an arbitrary character (char: “.”) can turn up any number of times (quantifier: “*”). At first glance, the difference between “.*” and “.*?” is not easy to recognize, but a short example should make it clear.

Assume that you would like to evaluate month specifications in a logging file, but the months are not all specified in the same way. Sometimes you use the short form, other times the long form of the month name is used. As you’ve seen, that’s no problem for regular expressions, because sub-expressions allow parts of a keyword to be declared optional:

"Feb" -match "Feb(ruary)?"

True
$matches[0]

Feb
"February" -match "Feb(ruary)?"

True
$matches[0]

February

In both cases, the regular expression recognizes the month, but returns different results in $matches. By default, the regular expression is “greedy” and wants to achieve a match in as much detail as possible. If the text is “February,” then the expression will search for a match starting with “Feb” and then continue searching “greedily” to check whether even more characters match the pattern. If they do, the entire (detailed) text is reported back.

However, if your main concern is just standardizing the names of months, you would probably prefer getting back the shortest common text. That’s exactly what the “??” quantifier does, which, in contrast to the default behavior, is “lazy.” As soon as it recognizes a pattern, it returns it without checking whether additional characters might match the pattern optionally.

"Feb" -match "Feb(ruary)??"

True
$matches[0]

Feb
"February" -match "Feb(ruary)??"

True
$matches[0]

Feb

Just what is the connection between the “??” quantifier of this example and the “*?” of the preceding example? In reality, “*?” is not a self-contained quantifier. The appended “?” just turns a normally “greedy” quantifier into a “lazy” one. This means you could use “?” to force the quantifier “*” to be “lazy” and to return the shortest possible result. That’s exactly what happened with our regular expressions for HTML tags. You can see how important this is when you use the greedy quantifier “*” instead of “*?”: it will attempt to retrieve a result in as much detail as possible. That can go wrong:

# The greedy quantifier * returns results in as much detail as possible:
"<body>Contents</body></body>" -match "<body[^>]*>(.*)</body>"

True
$matches[1]

Contents</body>
# The right quantifier is *?, the lazy one, which returns results that
# are as short as possible
"<body>Contents</body></body>" -match "<body[^>]*>(.*?)</body>"

True
$matches[1]

Contents

According to the definition of the regular expression, any characters are allowed inside the tag. Moreover, the entire expression must end with “</body>”. If “</body>” is also inside the tag, the following will happen: the greedy quantifier (“*”), coming across the first “</body>”, will at first assume that the pattern is already completely matched. But because it is greedy, it will continue to look and will discover the second “</body>” that also fits the pattern. The result is that it will take both “</body>” specifications into account, allocate one to the contents of the tag, and use the other as the conclusion of the tag.

In this example, it would be better to use the lazy quantifier (“*?”) that notices when it encounters the first “</body>” that the pattern is already correctly matched and consequently doesn’t go to the trouble of continuing to search. It will ignore the second “</body>” and use the first to conclude the tag.
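
The same difference shows up whenever a text contains more than one closing tag. The following sketch uses a made-up line with two bold tags and the static [regex]::Match() method, which returns the first match:

```powershell
$html = '<b>one</b> and <b>two</b>'

# Greedy: .* runs to the last closing tag in the text
[regex]::Match($html, '<b>(.*)</b>').Groups[1].Value     # one</b> and <b>two

# Lazy: .*? stops at the first closing tag
[regex]::Match($html, '<b>(.*?)</b>').Groups[1].Value    # one
```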

Finding String Segments

Entire books have been written about the uses of regular expressions. That’s why it would go beyond the scope of this book to discuss more details. However, our last example, which locates text segments, shows how you can use the elements listed in Table 13.11 to easily harvest surprising search results. If you type two words, the regular expression will retrieve the text segment between the two words if at least one word is, and not more than six other words are, between the two words:

"Find word segments from start to end" -match "\bstart\W+(?:\w+\W+){1,6}?end\b"
True
$matches[0]

start to end

Replacing a String

You already know how to replace a string because you were already introduced to the -replace operator. Simply tell the operator what term you want to replace in a string and the task is done:

"Hello, Ralph" -replace "Ralph", "Martina"

Hello, Martina

But simple replacement isn’t always sufficient, so you need to use regular expressions for replacements. Some of the following interesting examples show how that could be useful.

Perhaps you’d like to replace several different terms in a string with one other term. Without regular expressions, you’d have to replace each term separately. With regular expressions, you can use the alternation operator “|” instead:

"Mr. Miller and Mrs. Meyer" -replace "(Mr\.|Mrs\.)", "Our client"

Our client Miller and Our client Meyer

You can type any term in parentheses and use the “|” symbol to separate them. All the terms will be replaced with the replacement string you specify.

Using Back References

This last example replaces specified keywords anywhere in a string. Often, that’s sufficient, but sometimes you don’t want to replace a keyword everywhere it occurs but only when it occurs in a certain context. In such cases, the context must be defined in some way in the pattern. How could you change the regular expression so that it replaces only the names Miller and Meyer? Like this:

"Mr. Miller, Mrs. Meyer and Mr. Werner" `
-replace "(Mr\.|Mrs\.)\s*(Miller|Meyer)", "Our client"

Our client, Our client and Mr. Werner

The result looks a little peculiar, but the pattern you’re looking for was correctly identified. The only replacements were Mr. or Mrs. Miller and Mr. or Mrs. Meyer. The term “Mr. Werner” wasn’t replaced. Unfortunately, the result also shows that it doesn’t make any sense here to replace the entire pattern. At least the name of the person should be retained. Is that possible?

This is where the back referencing you’ve already seen comes into play. Whenever you use parentheses in your regular expression, the result inside the parentheses is evaluated separately, and you can use these separate results in your replacement string. The first sub-expression always reports whether a “Mr.” or a “Mrs.” was found in the string. The second sub-expression returns the name of the person. The terms “$1” and “$2” provide you the sub-expressions in the replacement string (the number is consequently a sequential number; you could also use “$3” and so on for additional sub-expressions).

"Mr. Miller, Mrs. Meyer and Mr. Werner" `
-replace "(Mr\.|Mrs\.)\s*(Miller|Meyer)", "Our client $2"

Our client , Our client and Mr. Werner

Strangely enough, at first the back references don’t seem to work. The cause can be found quickly: “$1” and “$2” look like PowerShell variables, but in reality they are placeholders evaluated by the -replace operator. As a result, if you put the replacement string inside double quotation marks, PowerShell will replace “$2” with the PowerShell variable $2, which is normally empty. For back references to work, you must either put the replacement string inside single quotation marks or put a backtick in front of the “$” special character so that PowerShell won’t treat it as its own variable and replace it:

# Replacement text must be inside single quotation marks
# so that PowerShell does not expand $2 as a variable:
"Mr. Miller, Mrs. Meyer and Mr. Werner" -replace `
"(Mr\.|Mrs\.)\s*(Miller|Meyer)", 'Our client $2'

Our client Miller, Our client Meyer and Mr. Werner
# Alternatively, $ can also be masked by `$:
"Mr. Miller, Mrs. Meyer and Mr. Werner" -replace `
"(Mr\.|Mrs\.)\s*(Miller|Meyer)", "Our client `$2"

Our client Miller, Our client Meyer and Mr. Werner
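
Back references also work with named sub-expressions, which you then address as “${Name}” in the replacement string. The following is a small sketch; the group names Title and Name are assumptions chosen for this example.

```powershell
$text = 'Mr. Miller, Mrs. Meyer and Mr. Werner'
$pattern = '(?<Title>Mr\.|Mrs\.)\s*(?<Name>Miller|Meyer)'

# Single quotation marks keep PowerShell from expanding ${Name} itself
$text -replace $pattern, 'Our client ${Name}'
# Our client Miller, Our client Meyer and Mr. Werner

# Both results can be rearranged freely in the replacement string
$text -replace $pattern, '${Name} (${Title})'
# Miller (Mr.), Meyer (Mrs.) and Mr. Werner
```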

Putting Characters First at Line Beginnings

Replacements can also be made in multiple places in a text of several lines. For example, when you respond to an e-mail, usually the text of the old e-mail is quoted in your new e-mail and marked with “>” at the beginning of each line. Regular expressions can do the marking.

However, to accomplish this, you need to know a little more about “multi-line” mode. Normally, this mode is turned off, and the “^” anchor represents the text beginning and the “$” the text ending. So that these two anchors refer respectively to the line beginning and line ending of a text of several lines, the multi-line mode must be turned on with the “(?m)” statement. Only then will -replace substitute the pattern in every single line. Once the multi-line mode is turned on, the anchors “^” and “\A”, as well as “$” and “\Z”, will suddenly behave differently. “\A” will continue to indicate the text beginning, while “^” will mark the line beginning; “\Z” will indicate the text ending, while “$” will mark the line ending.

# Using Here-String to create a text of several lines:
$text = @"
Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.
"@
$text

Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.

# Normally, -replace doesn't work in multiline mode.
# For this reason, only the first line is replaced:
$text -replace "^", "> "

> Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.

# If you turn on multiline mode, replacement will work in every line:
$text -replace "(?m)^", "> "

> Here is a little text.
> I want to attach this text to an e-mail as a quote.
> That's why I would put a ">" before every line.

# The same can also be accomplished by using the [regex] type,
# where the multiline option must be specified:
[regex]::Replace($text, "^", "> ", `
[Text.RegularExpressions.RegexOptions]::Multiline)

> Here is a little text.
> I want to attach this text to an e-mail as a quote.
> That's why I would put a ">" before every line.

# In multiline mode, \A stands for the text beginning
# and ^ for the line beginning:
[regex]::Replace($text, "\A", "> ", `
[Text.RegularExpressions.RegexOptions]::Multiline)

> Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.
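
The quoting step could be wrapped in a small helper so that it can be reused for any reply text. This is only a sketch under stated assumptions: the function name Add-QuotePrefix and its parameters are hypothetical and not part of the original chapter.

```powershell
# Hypothetical helper: puts a prefix in front of every line of a text.
function Add-QuotePrefix {
    param(
        [string]$Text,
        [string]$Prefix = '> '
    )
    # (?m) turns on multi-line mode so that ^ matches at every line beginning.
    # "$" has a special meaning in replacement strings, so it is doubled
    # to make sure a prefix containing "$" is inserted literally.
    $Text -replace '(?m)^', $Prefix.Replace('$', '$$')
}

$mail = "First line`nSecond line`nThird line"
$quoted = Add-QuotePrefix -Text $mail
$quoted
```

Calling the helper on an already quoted text adds another level, so a reply to a reply would start each line with “> > ”.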

Removing Superfluous White Space

Regular expressions can also perform routine tasks, such as removing superfluous white space. The pattern describes a white-space character (char: “\s”) that occurs at least twice (quantifier: “{2,}”). Each such run is replaced with a single blank character.

"Too    many   blank   characters" -replace "\s{2,}", " "

Too many blank characters
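
One thing to keep in mind is that “\s” also matches line breaks, so the pattern above can merge lines when a line ends or begins with blanks. The following sketch, using a made-up sample text, shows one possible way to collapse only spaces and tabs and then trim the line ends, so that line breaks survive.

```powershell
# Sample text (an assumption for illustration): blanks surround the line break.
$raw = "Too    many   blank  `n  characters   here"

# \s{2,} treats the run "  <newline>  " as white space and merges both lines.
$merged = $raw -replace '\s{2,}', ' '

# [ \t]{2,} only collapses runs of spaces and tabs; the second -replace
# then trims blanks at the beginning and end of every line (multi-line mode).
$kept = $raw -replace '[ \t]{2,}', ' ' -replace '(?m)^[ \t]+|[ \t]+$', ''

$merged   # Too many blank characters here
$kept     # two lines: "Too many blank" and "characters here"
```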

Finding and Removing Doubled Words

How is it possible to find and remove doubled words in text? Here, you can use back referencing again. The pattern could be described as follows:

"\b(\w+)(\s+\1){1,}\b"

The pattern starts at a word boundary (anchor: “\b”). It then captures one word (the character “\w” and quantifier “+”). White space follows (the character “\s” and quantifier “+”), and after it the captured word again (back reference “\1”). This group, the white space and the repeated word, must occur at least once (quantifier “{1,}”, that is, one or more repetitions of the word). The entire match is then replaced with the first back reference, that is, the first located word.

# Find and remove doubled words in a text:
"This this this is a test" -replace "\b(\w+)(\s+\1){1,}\b", '$1'

This is a test
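
Before removing anything, it can be helpful to list which words are doubled. The sketch below uses the static [regex]::Matches method for that; unlike the -replace operator, the [regex] methods are case-sensitive by default, so the IgnoreCase option is passed explicitly. The sample sentence is made up for illustration.

```powershell
$text = 'This this this is a test test'
$pattern = '\b(\w+)(\s+\1)+\b'

# -replace ignores case by default, [regex] methods do not.
$options = [Text.RegularExpressions.RegexOptions]::IgnoreCase

# Report the first occurrence of every doubled word.
$found = [regex]::Matches($text, $pattern, $options)
$report = foreach ($match in $found) { $match.Groups[1].Value }
$report    # This, test

# Then remove the repetitions, keeping the first occurrence.
$cleaned = [regex]::Replace($text, $pattern, '$1', $options)
$cleaned   # This is a test
```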

(source : http://powershell.com/cs/blogs/ebook/archive/2009/03/30/chapter-13-text-and-regular-expressions.aspx#regular-expressions)

Exchange 2007 console tips and tricks

Posted on February 28, 2012 by Chris Roualin

Exchange Server 2007 introduced a new GUI management console (Exchange Management Console) to replace the Exchange System Manager (ESM) of previous versions. The earlier blog post “The new Exchange 2007 Management Console overview” gives an overview of the console. In this blog post I’ll show some tips and tricks for the console.

Tip#1: Specify the domain controller for your configuration data

Under the Organization Configuration or Server Configuration node, there is a “Modify Configuration Domain Controller” context menu item that launches the Configuration Domain Controller dialog. This allows you to specify the domain controller to be used for AD reads and writes of organization or server configuration.

This is possible in Exchange 2003 ESM by selecting the DC to be used when manually adding the Exchange System Manager snap-in to an MMC session, but it’s a lot easier here in Exchange 2007!

Tip#2: Modify the scope and the maximum number of recipients to display

By default, the result pane of the Recipient Configuration node displays recipients in the current domain, up to a maximum of 1000. Loading can take quite a long time if there are a lot of recipients to display. Administrators can set the scope of recipients shown to the whole forest, a whole domain, or an OU within a domain by using the “Modify Recipient Scope” context menu item of the Recipient Configuration node. There is also a “Modify the Maximum Number of Recipients to Display” context menu item to control the maximum number of recipients displayed in the GUI.

Setting your scope controls which recipient objects will be displayed in the GUI result panes, and also controls which recipient objects will be found by the GUI pickers in many cases. For instance, if you configure your scope to be a particular OU, then you will only be able to specify this OU or one of its children as the target of a new mailbox creation and you will only be able to select a user from this OU or one of its children while enabling a mailbox. This can help to reduce the size of the result-set you have to filter through while doing administrative tasks if your tasks are easily scoped to a particular part of the directory!

In the Active Directory Users and Computers (ADUC) tool you see objects only under an OU scope, while Exchange 2007 recipient management allows you to define your scope to be an OU, a domain, or even the whole forest, which increases administrative flexibility!

Tip#3: Equivalent Shell One-liner provided after completion of each wizard

Since the Exchange Management Console is built on top of the Exchange Management Shell (EMS), every wizard runs one or more “one-liners” (single-line EMS commands) to do the work required of the wizard. The one-liner for each wizard is displayed on the Completion page. It can be copied (Ctrl+C), and the command portion can then be pasted into the EMS command line and run.

The one-liner can be used as an example of the required syntax and altered to meet specific needs while learning PowerShell scripting.

Tip#4: Filtering

Filters are used in the result pane for Server Configuration and Recipient Configuration nodes and within picker dialogs.

Administrators can save a filter as the default filter so that it is applied the next time the result pane is visited. Just like specifying an OU as the recipient scope, filtering by certain attributes also improves UI performance if there are a large number of recipients to display.

One great example of how this can be used is to simulate an Administrative Group view using filtering. Simply create a filter in the Server Workcenter that specifies the criteria used to distinguish your servers into your desired “Administrative Group” set – based on some aspect of server name or based on AD site membership, for instance!

Look for a separate blog post on Filtering soon!

Tip#5: Set visual effects of the Console

Visual Effects is another option that can be configured for the console. It can be set to always on, never on, or automatic.

The visual effects setting controls how Exchange wizards are displayed by enabling or disabling the cool Exchange 2007 “skin” applied to the wizards by default.

You might want to turn this off if you prefer the classic look within the console or if you experience performance issues with the visual effects.

Tip#6: Show/Hide the action pane in the console window

The action pane is shown on the right side of the Exchange Management Console by default. It lists the actions currently available for the selected node in the navigation tree and the selected objects in the result pane. These actions are also available as context menu items for the selected node or objects.

The action pane can be hidden to provide more space in the console window by unchecking the Action pane checkbox in the Customize View dialog. The Customize View dialog can be launched via the View->Customize menu.

Tip#7: Modify the number of items to display on each page of Queue Viewer

To change the number of queues or messages shown on the Queues page or Messages page, set the “Number of items to display on each page” field in the Queue Viewer Options dialog. The value 1000 is used by default.

You can use this setting to control the number of items included in each page shown in the Exchange 2007 queue viewer. No matter what value you use here, the full set of items remains available. For instance, if you have 10 pages with 1000 items per page, you can modify this value to end up with 1 page of 10000 items or 100 pages of 100 items each.

Tip#8: Create a custom console snap-in

It’s possible to create a custom console snap-in for any of the 4 nodes under the root Microsoft Exchange node in the EMC navigation tree. This “reduced” custom console can be used to hide the additional management capabilities of the Exchange 2007 management console. Below are the steps for creating a Recipient Management only console, which exposes just the recipient management capabilities to administrators or helpdesk staff:

Step 1: Open MMC.exe directly (no snap-ins added)

Step 2: Add the Exchange Snap-in to this empty MMC console

Step 3: Select the Recipient Configuration node. Right click and choose “New Window from Here”

Step 4: Under File->Options, configure the Console mode as desired.

Step 5: Save this custom MMC to an MSC file. This MSC file can be used to launch your new Recipient Management only console!

Below is a snapshot of a Recipient Management only console:

The steps for creating a reduced console for Organization Configuration, Server Configuration and Toolbox are the same, except that in Step 3 the appropriate node should be selected.

Tip#9: Use the error provider information

In most cases, when you have something entered incorrectly within Exchange 2007 console property pages or wizards, the console will let you know something isn’t entered correctly by producing an error provider icon (!) next to the incorrect entry. Hover over this error provider to get more information on the cause of the failure.

(from : http://blogs.technet.com/b/exchange/archive/2006/10/20/3395119.aspx)

Validate an email address

Posted on February 28, 2012 by Chris Roualin

Validating an e-mail address can be really useful. It is a good thing to do before sending that address to a creation/modification script:

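Function ValidMail ([string]$Mail) {
    # Local part, "@", then either a bracketed IP address or a domain name.
    # Single quotes keep PowerShell from touching "$" and the backslashes.
    $regexp = '^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}' +
              '\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+' +
              '\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'

    # -match already returns $True or $False
    Return ($Mail -match $regexp)
}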

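$regexp contains a regular expression that validates most common mail addresses, including addresses with several sub-domains, and also accepts an IP address in case of… Note that this pattern rejects top-level domains longer than four letters.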
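
The same pattern can also be used to filter a whole list of addresses before handing them to another script. The following is a minimal sketch; the sample addresses are made up for illustration, and the pattern is repeated so that the snippet runs on its own.

```powershell
# Same pattern as in ValidMail, written on one line.
$pattern = '^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'

# Illustrative sample data, not taken from the original post.
$candidates = @(
    'john.doe@mail.example.com',   # several sub-domains
    'user@[192.168.0.1]',          # bracketed IP address
    'no-at-sign.example.com',      # "@" is missing
    'user@example.museum'          # top-level domain longer than 4 letters
)

# Keep only the addresses that match the pattern.
$valid = @($candidates | Where-Object { $_ -match $pattern })
$valid
```

With this sample data, only the first two addresses pass; the last one shows the four-letter limit on the top-level domain.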

Sending mail using Powershell

Posted on February 26, 2012 by Chris Roualin

Sending an e-mail in PowerShell can be done, once more, using the .NET Framework 🙂

Function SendMail([string]$file, [string]$sender, [string[]]$recipients, [string]$server) {
    $message = New-Object System.Net.Mail.MailMessage
    $message.From = $sender
    $message.Subject = "My first mail in PowerShell"
    $message.Body = "Hello," + "`r`n" + "This mail was sent automatically by the script." + "`r`n" + "You will find attached all the information you asked for." + "`r`n" + "Regards"

    # $recipients is a string array so that every address is added separately
    foreach ($rec in $recipients) { $message.To.Add($rec) }

    # Attach the file, send the message, then release the file handle
    $attachment = New-Object System.Net.Mail.Attachment $file
    $message.Attachments.Add($attachment)
    $client = New-Object System.Net.Mail.SmtpClient $server
    $client.Send($message)
    $attachment.Dispose()
}

Four parameters are mandatory to get all the power of that function: $file, $sender, $recipients, $server.

$file : Contains the exact path of the file to include as attachment. ("C:\includes\file.doc")

$sender : Simply the email address of the sender (“sender@mydomain.com”)

$recipients : Contains a list of recipients, one address per entry, in this format:

recipient1@domain.com

recipient2@domain2.com

$server : It is the full name of the server that is allowed to send mails.
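
If the recipients are kept as plain text with one address per line, as in the format above, they first have to be turned into a list. The sketch below shows one possible way to do that and to build the message object; it leaves out the attachment and does not actually send anything. The server name in the commented-out lines is a hypothetical placeholder.

```powershell
# Recipient list in the one-address-per-line format described above.
$list = @"
recipient1@domain.com

recipient2@domain2.com
"@

# Split on line breaks, drop empty lines, and trim stray blanks.
$recipients = @($list -split '\r?\n' |
    Where-Object { $_.Trim() } |
    ForEach-Object { $_.Trim() })

$message = New-Object System.Net.Mail.MailMessage
$message.From = 'sender@mydomain.com'
$message.Subject = 'My first mail in PowerShell'
$message.Body = "Hello,`r`nThis mail was sent automatically by the script."
foreach ($rec in $recipients) { $message.To.Add($rec) }

$recipientCount = $message.To.Count
$recipientCount   # 2

# Sending is left out on purpose; 'smtp.mydomain.com' is a hypothetical server name:
# $client = New-Object System.Net.Mail.SmtpClient 'smtp.mydomain.com'
# $client.Send($message)
$message.Dispose()
```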
