
Working With PHP Regular Expression

PHP Regular Expressions

What is a regex?

At its most basic level, a regex is a method of matching patterns within a string. In PHP the most commonly used flavour is PCRE, or “Perl Compatible Regular Expressions”. Here we will try to decipher what can look like meaningless hieroglyphics and set you on your way to using a powerful tool in your applications.

Where do I begin?

At the beginning let’s create a string.

<?php
 // create a string
 $string = 'abcdefghijklmnopqrstuvwxyz0123456789';

 // echo our string
 echo $string;

 ?>

If we simply wanted to see if the pattern ‘abc’ was within our larger string we could easily do something like this:

<?php
 // create a string
 $string = 'abcdefghijklmnopqrstuvwxyz0123456789';

 echo preg_match("/abc/", $string);
 ?>

The code above will echo ‘1’, because preg_match() found a match for the pattern ‘abc’ in the larger string. preg_match() returns 0 if there is no match and 1 if there is one; it stops searching after the first match, so it never returns more than 1. Of course, you would not do this in a real world situation, as PHP has functions such as strpos() and strstr() which do this much faster.

Match beginning of a string

Now we wish to see if the string begins with abc. The regex character for beginning is the caret ^. To see if our string begins with abc, we use it like this:

<?php
 // create a string
 $string = 'abcdefghijklmnopqrstuvwxyz0123456789';

 // try to match the beginning of the string
 if(preg_match("/^abc/", $string))
     {
     // if it matches we echo this line
     echo 'The string begins with abc';
     }
 else
     {
     // if no match is found echo this line
     echo 'No match found';
     }
 ?>

From the code above we see that it echoes the line
The string begins with abc
The forward slashes are delimiters that hold our regex pattern, and the quotation marks wrap the whole thing up as a PHP string. The caret(^) does not match a character itself; it anchors the pattern to the beginning of the string, so ‘abc’ is only matched if it appears right at the start.

What if I want case insensitive?

If you used the above code to find the pattern ABC like this:

if(preg_match("/^ABC/", $string))
 

the script would have returned the message:
No match found


This is because the search is case sensitive. The pattern ‘abc’ is not the same as ‘ABC’. To match both ‘abc’ and ‘ABC’ we need to use a modifier to make the search case insensitive. PHP regex, like most regex flavours, uses ‘i’ for this. So now our script might look like this:

<?php
 // create a string
 $string = 'abcdefghijklmnopqrstuvwxyz0123456789';

 // try to match our pattern
 if(preg_match("/^ABC/i", $string))
         {
     // echo this if it matches
         echo 'The string begins with abc';
         }
 else
         {
     // if no match is found echo this line
         echo 'No match found';
         }
 ?>

Now the script will find the pattern abc. It will also match any other combination of upper and lower case, such as ABC, Abc, aBc, and so on.
More on modifiers later.

How do I find a pattern at the end of a string?

This is done in much the same way as finding a pattern at the beginning of a string. A common mistake is to use the $ character to match the very end of a string. This is not quite right, and \z should be used instead. Consider this:
preg_match("/^foo$/", "foo\n")
This returns 1, because $ behaves like \Z, which in turn behaves like (?=\z|\n\z): it also matches just before a newline at the end of the string. So when a trailing newline is not wanted, $ should not be used. Also, with the m modifier $ matches at the end of every line, whereas \z matches only at the very end of the string. Let’s make a small change to the code from above by removing the caret(^) at the beginning of the pattern and putting \z at the end of the pattern. We will keep the case insensitive modifier in to match any case.

<?php
 // create a string
 $string = 'abcdefghijklmnopqrstuvwxyz0123456789';

 // try to match our pattern at the end of the string
 if(preg_match("/89\z/i", $string))
         {
     // if our pattern matches we echo this
         echo 'The string ends with 89';
         }
 else
         {
     // if no match is found we echo this line
         echo 'No match found';
         }
 ?>

The script now will show the line
The string ends with 89
because we have matched the end of the string with the pattern 89. Pretty easy stuff so far.
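
The difference between $ and \z is easiest to see side by side. The following is a minimal sketch; the variable names are chosen only for illustration.

```php
<?php
// a subject that ends with a trailing newline
$subject = "foo\n";

// $ also matches just before a final newline
$dollar = preg_match('/^foo$/', $subject);

// \z only matches at the very end of the subject
$strict = preg_match('/^foo\z/', $subject);

echo $dollar, ' ', $strict; // prints "1 0"
```

If the input comes from a form or a file and a stray trailing newline must be rejected, \z is the safer anchor.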

Metacharacters

During our first look at regex we did some simple pattern matching. We also introduced the caret(^) and the dollar($). These characters have special meaning. As we saw, the caret(^) matched the beginning of a string and the dollar matched the end of a string. These characters, along with others, are called Metacharacters. Here is a list of the Metacharacters used for regex:

  • . (Full stop)
  • ^ (Caret)
  • $ (Dollar)
  • * (Asterisk)
  • + (Plus)
  • ? (Question Mark)
  • { (Opening curly brace)
  • [ (Opening square bracket)
  • ] (Closing square bracket)
  • \ (Backslash)
  • | (Pipe)
  • ( (Opening parens)
  • ) (Closing parens)
  • } (Closing curly brace)

We will look at each of these during this tutorial, but it is important that you know what they are. If you wish to search a string that contains one of these characters, eg: “1+1”, then you need to escape the meta character with a backslash like this:

<?php
 // create a string
 $string = '1+1=2';

 // try to match our pattern
 if(preg_match("/^1\+1/i", $string))
         {
     // if the pattern matches we echo this
         echo 'The string begins with 1+1';
         }
 else
         {
     // if no match is found we echo this line
         echo 'No match found';
         }
 ?>

From the code above you will see the script print:
The string begins with 1+1
because the backslash escaped the special meaning of the + symbol, so the pattern looked for a literal 1+1. If you were to not escape the meta character and use the regex
preg_match("/^1+1/i", $string)
you would not find a match, because 1+ means ‘one or more 1 characters’ and the string has a + sign where the pattern expects another 1.
If you are looking for a backslash, you need to escape that also. The regex for a literal backslash is \\, but the backslash is also the escape character of PHP strings, so each of those two backslashes must itself be escaped when the pattern is written in PHP code. Hence we need to escape twice, like this:
\\\\
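
As a small sketch of the double escaping, the following counts the backslashes in a Windows style path. The path and variable names are only examples.

```php
<?php
// single quotes: \d and \f are not PHP escapes, so both backslashes stay in the string
$path = 'c:\dir\file.php';

// four backslashes in the PHP source become the regex \\, which matches one literal backslash
$count = preg_match_all('/\\\\/', $path);

echo $count; // prints 2
```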

What do the other Metacharacters do?

We have already seen the caret ^ and the dollar $ in action, so now let’s look at the others, beginning with the square brackets [ ]. These Metacharacters are used for specifying a character class.
A what?
A Character Class. This is just a set of characters you wish to match. They can be listed individually like:
[abcdef]
or as a range separated by a hyphen (-) like:
[a-f]

<?php
 // create a string
 $string = 'big';

 // Search for a match
 echo preg_match("/b[aoiu]g/", $string, $matches);

 ?>

The above code will return 1. This is because preg_match() found a match. This pattern would also match the strings ‘bag’, ‘bog’ and ‘bug’, but not the string ‘beg’. The character class range [a-f] is the same as [abcdef]. Think of it as [from a to f]. Once again, these are case sensitive, so [A-F] is not the same as [a-f].
Most meta characters lose their special meaning inside a class, so you do not need to escape them within the [ and the ]. You could have the class:
[abcdef$]
This would match the characters a b c d e f $. The dollar($) sign within the class is just a simple dollar sign and has no special meaning.
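
To see a dollar sign treated as an ordinary character inside a class, here is a short sketch. The sample string is made up for the example, and the class uses the range form [a-f] together with the $.

```php
<?php
// single quotes keep PHP from treating $5 as a variable
$string = 'cost: $5';

// the class matches any of a to f, or a literal dollar sign
preg_match_all('/[a-f$]/', $string, $matches);

echo implode(',', $matches[0]); // prints "c,$"
```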

Now that we have established that most meta characters have no special meaning inside a class, we will see how to use one that does. Yes I know I said they have no special meaning inside a class, and this is true most of the time, but there are exceptions.
The powerful nature of regex also allows us to match characters NOT within a class. To do this we use the caret(^) as the first character of the class. If the caret(^) appears anywhere else in the class, it is simply regarded as a caret(^) with no special meaning. Here we will see how to match any character except b.

<?php
 // create a string
 $string = 'abcefghijklmnopqrstuvwxyz0123456789';

 // match the first character that is not b
 preg_match("/[^b]/", $string, $matches);

 // loop through the matches with foreach
 foreach($matches as $key=>$value)
         {
         echo $key.' -> '.$value;
         }
 ?>

From the code above, we get the result
0 -> a
What has happened is that the preg_match() function has found the first character that matches the pattern /[^b]/, that is, the first character that is not b. Let’s make a small change to our script and this time use preg_match_all() to find every character in the string that matches the pattern /[^b]/

<?php
 // create a string
 $string = 'abcefghijklmnopqrstuvwxyz0123456789';

 // try to match all characters that are not b
 preg_match_all("/[^b]/", $string, $matches);

 // loop through the matches with foreach
 foreach($matches[0] as $value)
         {
         echo $value;
         }
 ?>

As you can see from the results of the script above, it prints out all the characters of the string that are NOT ‘b’:
acefghijklmnopqrstuvwxyz0123456789
If we were to take this one step further, we could use it to filter out all the numbers from a string.

<?php
 // create a string
 $string = 'abcefghijklmnopqrstuvwxyz0123456789';

 // match any character that is not a number between 0 and 9
 preg_match_all("/[^0-9]/", $string, $matches);

 // loop through the matches with foreach
 foreach($matches[0] as $value)
         {
         echo $value;
         }
 ?>

The above script will return the string:
abcefghijklmnopqrstuvwxyz
Let’s see what has happened in the above script. We used preg_match_all() to match our pattern. The pattern used the caret(^) symbol inside the class [] to match all characters that are NOT in the range 0-9.
So, you can simply remember that the ^(caret) means NOT when it is the first character inside a character class.

Still with me?

Ok, moving on, we come to the most used meta character, the backslash(\).
The backslash(\) can be used in several ways with regex. The first we will deal with is escaping. The backslash(\) can be used to escape all meta characters, including itself, so you can match them in patterns.

If our string looked like
“This is a [templateVar]”
and we wanted to search for the square brackets [ and ] within the string, we could use the following code:

<?php
 // create a string
 $string = 'This is a [templateVar]';

 // try to match our pattern
 preg_match_all("/[\[\]]/", $string, $matches);

 // loop through the matches with foreach
 foreach($matches[0] as $value)
         {
         echo $value;
         }
 ?> 

From the above snippet, we get the result
[]
This is because the pattern is a character class containing the two characters [ and ], so every [ or ] in the string is matched. Without the backslashes the pattern would look like “/[[]]/”, so we had to escape the [ and the ] characters we wanted to search for. Similarly, you must escape a backslash. If you have a string that looks like
c:\dir\file.php
you would need to use \\ in your pattern.
The backslash is also used to signal various special sequences.
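
If the goal is to pull out the whole [templateVar] placeholder rather than just the brackets, the escaped brackets can be placed outside a class. This is only a sketch: it assumes the variable name consists of letters, and it uses the + quantifier and parentheses, which are covered further on.

```php
<?php
$string = 'This is a [templateVar]';

// \[ and \] are literal brackets; ([a-zA-Z]+) captures the name between them
if (preg_match('/\[([a-zA-Z]+)\]/', $string, $matches)) {
    echo $matches[0], "\n"; // the whole placeholder, brackets included
    echo $matches[1], "\n"; // only the name inside the brackets
}
```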

The next meta character is the . dot, or full stop.
The dot matches any character except a new line (\n). So, we can match any single character, except for a line break. To make . match any character, including \n, you need to use the s modifier. First we will see how to use the . without the s modifier to match a single character.

<?php
 // create a string
 $string = 'sox at noon taxes';

 // look for a match
 preg_match_all("/s.x/", $string, $matches);

foreach($matches[0] as $value)
         {
         echo $value;
         }

 ?> 

Output: sox

This is because preg_match_all() has found sox. This example would also match sax, six, sox, sux, and s x, but would not match stix.

Now let’s see if we can match a new line character. The dot will not do it, but the \s sequence, which matches any whitespace character, will. For our example we will use \n.

<?php
 // create a string
 $string = 'sox'."\n".'at'."\n".'noon'."\n".'taxes'."\n";

 // echo the string
 echo nl2br($string);

 // look for a match
 echo preg_match_all("/\s/", $string, $matches);

 ?>

The code above will return this:
sox
at
noon
taxes
4

First we echo out the string to see the new lines, then we see a 4 at the bottom. This is because preg_match_all() found four matches for the new line character \n when we used the \s sequence, which matches any whitespace character. More on this later in the section on Special Sequences.

Next in the meta characters is the asterisk * character. This meta character matches zero or more occurrences of the character immediately before it. This means that the character may or may not be there. So the code .* would match any number of any characters. eg:

<?php
 // create a string

// $string = 'phphphphphp';
 $string = 'php';

 // look for a match
 echo preg_match_all("/ph*p/", $string, $matches);

 ?>

Again we see the result is 1 as we have found 1 match. In the above example its match was with one h character. This same regex would match pp (zero h characters), and phhhp (3 h characters).


This of course brings us to the + meta character. The + behaves in a similar manner to the asterisk *, with the exception that the + matches one or more times where the asterisk * matches zero or more times. As we saw in the previous example, the asterisk * would match the string ‘pp’. However, the + meta character would not. Consider this code:

<?php
 // create a string
 $string = 'pp';

 // look for a match
 echo preg_match("/ph+p/", $string, $matches);

 ?>

The above script will echo 0. This is because the h character did not appear one or more times in the string.
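
To compare the two quantifiers directly, the sketch below runs both patterns over the three strings mentioned above. The array and key names are just for illustration.

```php
<?php
$subjects = array('pp', 'php', 'phhhp');
$results = array();

foreach ($subjects as $subject) {
    $results[$subject] = array(
        'star' => preg_match('/ph*p/', $subject), // zero or more h
        'plus' => preg_match('/ph+p/', $subject), // one or more h
    );
    echo $subject, ': * gives ', $results[$subject]['star'],
         ', + gives ', $results[$subject]['plus'], "\n";
}
```

Only ‘pp’ separates the two: the asterisk accepts it and the plus does not.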

Our next meta character is the ?. This character acts like the preceding meta characters, except the ? will match 0 or 1 occurrence of the character or group immediately preceding it. In the following code, we will see how this can be helpful if we wish to make something optional. For example, a phone number in Australia is formatted 1234-5678, but the hyphen may be left out.

<?php

 // create a string
 $string = '12345678';

 // look for a match
 echo preg_match("/1234-?5678/", $string, $matches);

 ?>

The above code will return 1. This is because the ? character has matched the hyphen (-) zero times. Changing the string to 1234-5678 would yield the same result, as seen below.

<?php

 // create a string
 $string = '1234-5678';

 // look for a match
 echo preg_match("/1234-?5678/", $string, $matches);

 ?>

Next we have the curly braces or the { } meta characters. These simply match a specific number of instances of the preceding character or range of characters. Here we will match the letters PHP which must be followed by exactly 3 digits.

<?php

 // create a string
 $string = 'PHP123';

 // look for a match
 echo preg_match("/PHP[0-9]{3}/", $string, $matches);

 ?> 

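As we can see, our pattern PHP followed by [0-9] (any digit from 0 to 9) {3} (three times) has matched, because this is the format of our string. Note that the pattern is not anchored, so it would also find a match within a longer string such as PHP1234.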
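
The curly braces also accept a range in the form {min,max}. The sketch below combines them with the ^ and $ anchors seen earlier so that extra digits are rejected; the pattern variables are named only for this example.

```php
<?php
// anchored: PHP followed by exactly three digits and nothing else
$exact = '/^PHP[0-9]{3}$/';

// anchored: PHP followed by between two and four digits
$range = '/^PHP[0-9]{2,4}$/';

$exactThree = preg_match($exact, 'PHP123');  // three digits
$exactFour  = preg_match($exact, 'PHP1234'); // one digit too many
$rangeFour  = preg_match($range, 'PHP1234'); // four digits is within 2 to 4
$rangeOne   = preg_match($range, 'PHP1');    // one digit is too few

echo $exactThree, $exactFour, $rangeFour, $rangeOne; // prints "1010"
```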

Special Sequences

The backslash(\) is also used for sequences.

What are sequences?
Sequences are predefined sets of characters you can use in a pattern, on their own or inside a class.

  • \d – Matches any numeric character – same as [0-9]
  • \D – Matches any non-numeric character – same as [^0-9]
  • \s – Matches any whitespace character – same as [ \t\n\r\f\v]
  • \S – Matches any non-whitespace character – same as [^ \t\n\r\f\v]
  • \w – Matches any word character (letter, digit or underscore) – same as [a-zA-Z0-9_]
  • \W – Matches any non-word character – same as [^a-zA-Z0-9_]

So, with this in mind, we could use these as short-cuts in our patterns to reduce the length of our code. See if you can tell what the following script does:

<?php
 // create a string
 $string = 'ab-ce*fg@ hi & jkl(mnopqr)stu+vw?x yz0>1234<567890';

 // match our pattern containing a special sequence
 preg_match_all("/[\w]/", $string, $matches);

 // loop through the matches with foreach
 foreach($matches[0] as $value)
         {
         echo $value;
         }
 ?>

Well, let’s see what we have done. We have matched all (preg_match_all) characters within the class ( [] ) that are word characters (\w). So the resultant output here is:
abcefghijklmnopqrstuvwxyz01234567890
This can be useful for stripping nasty spaces or nasty characters from usernames etc.
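
For the username case, one possible approach is to remove everything that \W matches. This sketch uses preg_replace(), which is introduced later in this tutorial, and a made up username.

```php
<?php
// a username with a few unwanted characters in it
$username = 'j@ne doe!_99';

// replace every non-word character with an empty string
$clean = preg_replace('/\W/', '', $username);

echo $clean; // prints "jnedoe_99"
```

Note that the underscore survives, because \w (and therefore the complement of \W) includes it.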

Using this same method, we could check whether a string begins with a number:

<?php
 // create a string
 $string = '2 bad for perl';

 // check whether the string begins with a digit
 if(preg_match("/^\d/", $string))
     {
     echo 'String begins with a number';
     }
 else
     {
     echo 'String does not begin with a number';
     }
 ?>

The next meta character is the full stop(.), which we met earlier. A full stop(.) matches any single character. This can be good if you wish to check that a string contains any character at all.

<?php
 // create a string
 $string = 'abcdefghijklmnopqrstuvwxyz0123456789';

 // try to match any character
 if(preg_match("/./", $string))
         {
         echo 'The string contains at least one character';
         }
 else
         {
         echo 'String does not contain anything';
         }
 ?>

Of course, this string contains at least one character. Other uses for the fullstop(.) could be to check if any character is before a number. Think up some of your own uses and try them.

Earlier we had a look at meta characters and we had a problem matching a new line character, because the . meta character does not match a new line such as \n. Here we can use the \s sequence to match any whitespace character. For our example we will use \n.

<?php
 // create a string
 $string = 'sox'."\n".'at'."\n".'noon'."\n".'taxes'."\n";

 // echo the string
 echo nl2br($string);

 // look for a match
 echo preg_match_all("/\s/", $string, $matches);

 ?> 

The code above will return this:
sox
at
noon
taxes
4

First we echo out the string to see the new lines, then we see a 4 at the bottom. This is because preg_match_all() found four matches for the new line character \n when we used the \s sequence. More on modifiers later.

Putting it together

If you have come this far you now have the building blocks to match complex patterns. As we progress here, you will see how we use these building blocks in combination with each other. To move on, let’s begin with the OR operator. The OR operator, listed earlier among the meta characters, is the pipe character |. Let’s use it in a simple “Hello World” script.

<?php

 // a simple string
 $string = "This is a Hello World script";

 // try to match the patterns This OR That OR There
 echo preg_match("/^(This|That|There)/", $string);
 ?> 

The script echoes 1, because the string begins with This. Now we can start to see some of the flexibility PHP’s regular expressions allow. Let’s now try to match Hello OR Jello in our string.

<?php
 // a simple string
 $string = "This is a Hello World script";

 // try to match the patterns Jello or Hello
 if(!preg_match("/(Je|He)llo/", $string))
         {
         echo 'Pattern not found';
         }
 else
         {
         echo 'pattern found';
         }
 ?>

This works well and we can see that the pattern matched. But it does not show which of the alternatives matched. To enable us to see this, we can use preg_match()’s ability to hold results. A third argument, if supplied, is filled with an array of matches. Consider this small addition to the code above.

<?php
 // a simple string
 $string = "This is a Hello World script";

 // try to match the patterns Jello or Hello
 // put the matches in a variable called matches
 preg_match("/(Je|He)llo/", $string, $matches);

 // loop through the array of matches and print them
 foreach($matches as $key=>$value)
     {
     echo $key.'->'.$value.'<br />';
     }
 ?>

The above code gives us a result that looks like this.

0->Hello

1->He

$matches[0] contains the text that matched the full pattern, ie. Hello.
$matches[1] has the text that matched the first captured subpattern.

Modifiers and Assertions

As we saw earlier in this tutorial, we were able to create a regular expression that was case insensitive by the use of /i. This is a modifier, one of several that change the behaviour of the pattern matching. Assertions are different: they are written inside the pattern with a backslash and match a position rather than a character. Here is a list of regex modifiers and assertions used in PHP.

Modifiers

  • i – Ignore case, case insensitive
  • U – Make search ungreedy
  • s – Dot (.) also matches a new line
  • m – Multiple lines: ^ and $ match at the start and end of each line
  • x – Extended, allows comments and whitespace in the pattern
  • e – Enables evaluation of replacement as PHP code (preg_replace only; deprecated in PHP 5.5 and removed in PHP 7.0)
  • S – Extra analysis of pattern

Assertions

  • \b – Word boundary
  • \B – Not a word boundary
  • \A – Start of subject
  • \Z – End of subject or newline at end
  • \z – End of subject
  • \G – First matching position in subject

As you can see from the above lists, there are many ways to modify your regular expressions. Let’s look at them one by one, beginning with the earlier example of using i.

<?php
 // create a string
 $string = 'abcdefghijklmnopqrstuvwxyz0123456789';

 // try to match our pattern
 if(preg_match("/^ABC/i", $string))
 {
 // echo this if it matches
 echo 'The string begins with abc';
 }
 else
 {
 // if no match is found echo this line
 echo 'No match found';
 }
 ?>

If you have read the earlier part of this tutorial it will be no surprise that “ABC” matched abc, because the case insensitive modifier lets the pattern match either abc or ABC. Moving on from the i modifier we have the s modifier. The s modifier lets the dot (.) match new line characters as well. The dot was introduced earlier in this tutorial, and the same string is used here once again. First we match without the s modifier.

<?php
     /*** create a string with new line characters ***/
     $string = 'sox'."\n".'at'."\n".'noon'."\n".'taxes'."\n";

     /*** look for a match without the s modifier ***/
     echo preg_match("/sox.at.noon/", $string, $matches);
 ?>

The above code will echo 0, because the dot (.) does not match a new line. With the addition of the s modifier the dot can match the new lines.

<?php
     /*** create a string with new line characters ***/
     $string = 'sox'."\n".'at'."\n".'noon'."\n".'taxes'."\n";

     /*** look for a match using s modifier ***/
     echo preg_match("/sox.at.noon/s", $string, $matches);
 ?>

The above code will echo 1, because with the s modifier each dot matches a newline character and preg_match() finds its one match.
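
The two runs can be put next to each other in one short script. This is a sketch with illustrative variable names, using the same string as above written as a single double quoted literal.

```php
<?php
$string = "sox\nat\nnoon\ntaxes\n";

// without s the dot refuses to match \n
$without = preg_match('/sox.at.noon/', $string);

// with s the dot matches \n as well
$with = preg_match('/sox.at.noon/s', $string);

echo $without, ' ', $with; // prints "0 1"
```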

Having extended our string variable to include new lines brings us to our next modifier, the m modifier. By default, PCRE treats the subject as a single line: the caret (^) matches only at the very start of the string and the dollar ($) only at the very end (or just before a final newline), even if there are several new lines in the string. With the m modifier, ^ and $ also match immediately after and immediately before every newline character inside the string, so each line can be anchored separately. If there are no newline characters in the string, or no ^ or $ in the pattern, this modifier does nothing.

<?php
 // create a string
 $string = 'sox'."\n".'at'."\n".'noon'."\n".'taxes'."\n";

 // look for a match
 if(preg_match("/^noon/im", $string))
 {
 echo 'Pattern Found';
 }
 else
 {
 echo 'Pattern not found';
 }
 ?>

Of course, with the m modifier the regex will match. Try without the m modifier and the pattern will not be found, because ^ then matches only at the very beginning of the string, where the text is sox and not noon.

In the above example we have used im for the modifiers. This is to show we can use multiple modifiers for whatever purpose we require. The code above would also have matched NOON or Noon, as we used the i modifier, which we have seen makes the regex case insensitive.
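
A quick way to see what the m modifier changes is to anchor a pattern at both ends and count the matches with and without it. The sketch below uses the \w sequence from earlier; the variable names are only examples.

```php
<?php
$string = "sox\nat\nnoon\ntaxes\n";

// without m, ^ and $ refer to the whole string, which is not a single word
$single = preg_match_all('/^\w+$/', $string, $singleMatches);

// with m, ^ and $ apply to every line, so each word is found
$multi = preg_match_all('/^\w+$/m', $string, $multiMatches);

echo $single, ' ', $multi, "\n";            // prints "0 4"
echo implode(',', $multiMatches[0]), "\n";  // prints "sox,at,noon,taxes"
```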

The x modifier allows us to put our regex on several lines, making long and complex regular expressions easier to read and debug, and allows for comments within the regex itself. Let’s consider our previous example above, add some comments, and put the regex on multiple lines.

<?php
 // create a string
 $string = 'sox'."\n".'at'."\n".'noon'."\n".'taxes'."\n";

 // create our regex using comments and store the regex
 // in a variable to be used with preg_match
 $regex ='
 /     # opening delimiter
 ^     # caret means beginning of the string
 noon  # the pattern to match
 /imx
 ';

 // look for a match
 if(preg_match($regex, $string))
         {
         echo 'Pattern Found';
         }
 else
         {
         echo 'Pattern not found';
         }
 ?>

Well, this particular regex did not need much explanation, but it demonstrates how to insert comments over multiple lines to document your code. Let’s move on to some of the other modifiers. The e modifier is a special modifier that evaluates the replacement as PHP code; we have dedicated a special section just for it. So let’s move on to the S modifier. This modifier asks for some extra analysis of the pattern before matching, which can help with patterns that are not anchored, that is, patterns that do not have a single fixed starting position, like this:

<?php
 /*** no fixed starting position ***/
 preg_match("/abc(.*?)hij/", $string);
 ?>

If a pattern is to be used more than once, this analysis may help speed up the repeated matching. (As of PHP 7.3 the S modifier is still accepted but no longer has any effect.) In previous examples we have used non-anchored patterns and one instance is repeated here with the addition of the S modifier.

<?php
 // create a string
 $string = 'ab-ce*fg@ hi & jkl(mnopqr)stu+vw?x yz0>1234<567890';

 // match our pattern containing a special sequence
 preg_match_all("/[\w]/S", $string, $matches);

 // loop through the matches with foreach
 foreach($matches[0] as $value)
         {
         echo $value;
         }
 ?>

Next we move on to word boundaries. The \b assertion matches a word boundary, the position between a word character (\w) and a non-word character (or the start or end of the string). Placing \b on both sides of a pattern works like a pair of “bookends”: the text between them must match as a whole word. The following two scripts demonstrate this. For example, a pattern for “cat” between two \b assertions will not match inside “catalog”, but let’s see it in practice.

<?php

 /*** a simple string ***/
 $string = 'eregi will not be available in PHP 6';

 /*** here we will try match the string "lab" ***/
 if(preg_match ("/\blab\b/i", $string))
         {
     /*** if we get a match ***/
         echo $string;
         }
 else
         {
     /*** if no match is found ***/
         echo 'No match found';
         }
 ?>

From the code above, we see we are trying to match the pattern “lab”, which does occur inside the string in the word “available”. But because we have used word boundaries, it does not match. Let’s try again but have the word “lab” on its own.

<?php

 /*** a simple string ***/
 $string = 'eregi will remain in the computer lab';

 /*** here we will try match the string "lab" ***/
 if(preg_match ("/\blab\b/i", $string))
         {
     /*** if we get a match ***/
         echo $string;
         }
 else
         {
     /*** if no match is found ***/
         echo 'No match found';
         }
 ?>

Now we see from the above code that we have a match, and that the code echoes the string. This is because the pattern is matched within the word boundary. That is, \blab\b.

If the \b occurs within a character class like [\b], then it matches a single backspace character and not a word boundary.
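
Counting matches with and without the “bookends” shows the effect clearly. The sample sentence in this sketch is made up for the purpose.

```php
<?php
$string = 'cat catalog concat cat.';

// only "cat" standing as a whole word
$whole = preg_match_all('/\bcat\b/', $string);

// every occurrence of the letters c-a-t
$any = preg_match_all('/cat/', $string);

echo $whole, ' ', $any; // prints "2 4"
```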

This brings us to the next assertion, \B. It is related to the previous one, but rather than stipulate a word boundary, it stipulates a non-word boundary. This can be useful if you wish to do something like match text that is contained within a word but not at the start or end of the word. Consider this code.

<?php

 /*** a little string ***/
 $string = 'This lathe turns wood.';

 /*** match word boundary and non-word boundary ***/
 if(preg_match("/\Bthe\b/", $string))
     {
     /*** if we match the pattern ***/
     echo 'Matched the pattern "the".';
     }
 else
     {
     /*** if no match is found ***/
     echo 'No match found';
     }
 ?>

When the code above is run, it will find a match for the pattern “the”. This is because \B requires that there be another word character immediately before the “t”, and \b requires that a word end immediately after the “e”, which is the case at the end of “lathe”. Let’s change the string and try again.

<?php

 /*** a little string ***/
 $string = 'The quick brown fox jumps over the lazy dog.';

 /*** match word boundary and non-word boundary ***/
 if(preg_match("/\Bthe\b/", $string))
     {
     /*** if we match the pattern ***/
     echo 'Matched the pattern "the".';
     }
 else
     {
     /*** if no match is found ***/
     echo 'No match found';
     }
 ?>

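This time we find that No match found is echoed. This is because the only lowercase “the” in the string is a word on its own, so the non-word boundary \B does not find another word character before the “t”. (The capitalised “The” would not match anyway, as the pattern is case sensitive.)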

The final modifier we will look at is the U modifier. By default, PCRE is greedy. Not that it will eat your last biscuit, but a quantifier such as * or + will try to match as many characters as it can, unless we tell it not to. This can cause a lot of backtracking, and the more backtracking, the slower the matching.

This means that .* will first match every character (except new line) all the way to the end of the string, and then work backward until the rest of the pattern matches. To make a quantifier miserly, or non-greedy, you put a ? after it. This tells PCRE to match as few as possible of the preceding symbol before continuing to the next part of the pattern. Alternatively, the U modifier makes every quantifier in the regex non-greedy.

Unless you have a strong understanding of regular expressions, it is not advisable to switch the default behavior, as this can often lead to confusion. In previous examples we have seen non-greedy regular expressions using (.*?); here we will use the U modifier.

<?php

 /*** a simple string ***/
 $string = 'foobar foo--bar fubar';

 /*** try to match the pattern ***/
 if(preg_match("/foo(.*)bar/U", $string))
     {
     echo 'Match found';
     }
 else
     {
     echo 'No match found';
     }
 ?>
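
The script above only reports that a match was found. To see what greedy and non-greedy matching actually capture, the following sketch stores the matches for three variants of the same pattern. The variable names are chosen for the example.

```php
<?php
$string = 'foobar foo--bar fubar';

// greedy: .* runs to the end and backtracks to the last "bar"
preg_match('/foo(.*)bar/', $string, $greedy);

// non-greedy with ?: .*? stops at the first "bar"
preg_match('/foo(.*?)bar/', $string, $lazy);

// the U modifier makes the plain .* behave the same way
preg_match('/foo(.*)bar/U', $string, $ungreedy);

echo $greedy[0], "\n";   // prints "foobar foo--bar fubar"
echo $lazy[0], "\n";     // prints "foobar"
echo $ungreedy[0], "\n"; // prints "foobar"
```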

Evaluation with preg_replace.

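If we have grasped the above we can move on to the e modifier. This modifier makes preg_replace() evaluate the replacement argument as PHP code. Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so it is only of use when reading or maintaining older code. We have not touched on preg_replace() yet, so a quick demonstration here will get us in the swing.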

<?php
 // create a string
 $string = 'We will replace the word foo';

 // substitute the word foo and put in bar
 $string = preg_replace("/foo/", 'bar', $string);

 // echo the new string
 echo $string;
 ?>

You can see this does a simple replacement, and in the real world str_replace() would be a lot faster, but this example leads us nicely into the next. One of the strengths of preg_replace() is that it will take an array in the same way as str_replace() does.

<?php

 // create a string with some template vars. the string and
 // the vars would easily have been called from a template.
 $string = 'This is the {_FOO_} bought to you by {_BAR_}';

 // create an array of template vars
 // of course, each variable could be an array itself
 $template_vars=array("FOO" => "The PHP Way", "BAR" => "PHP.net");

 // preg replace our variables and evaluate them (the e modifier only works before PHP 7.0)
 $string = preg_replace("/{_(.*?)_}/ime", "\$template_vars['$1']",$string);

 // echo the new string
 echo $string;

 ?>

Without the e modifier this code would echo
This is the $template_vars['FOO'] bought to you by $template_vars['BAR']
The e modifier has evaluated the PHP expression built from each match, so the template variables are replaced by their values. Totally cool and now you begin to see just how bloated most template systems are.
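
On PHP 7.0 and later the e modifier is no longer available, and preg_replace_callback() is the usual substitute. The sketch below is one possible rewrite of the template example, not the original author's code; it assumes that unknown placeholders should be left untouched.

```php
<?php
$string = 'This is the {_FOO_} bought to you by {_BAR_}';
$template_vars = array('FOO' => 'The PHP Way', 'BAR' => 'PHP.net');

// the callback receives the match array: $m[0] is {_FOO_}, $m[1] is FOO
$result = preg_replace_callback(
    '/\{_(.*?)_\}/',
    function ($m) use ($template_vars) {
        // look the name up; keep the placeholder as it is if the name is unknown
        return isset($template_vars[$m[1]]) ? $template_vars[$m[1]] : $m[0];
    },
    $string
);

echo $result; // prints "This is the The PHP Way bought to you by PHP.net"
```

Because no string is ever evaluated as code, this form also avoids the risk of running PHP that arrives inside the subject string.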

Look Aheads

A look ahead does exactly what the name suggests: it looks ahead for a pattern to match. Look aheads come in two flavours, negative and positive. Let’s first look at the negative look ahead, which is denoted with (?!...). This is useful for checking that a pattern does not come immediately after the match we wish. Let’s take a look at a simple example.

<?php

 /*** a simple string ***/
 $string = 'I live in the whitehouse';

 /*** try to match white not followed by house ***/
 if(preg_match("/white+(?!house)/i", $string))
     {
     /*** if we find the word white, not followed by house ***/
     echo 'Found a match';
     }
 else
     {
     /*** if no match is found ***/
     echo 'No match found';
     }
 ?>

As you can see, no match is found. This is because the word white is followed by house. Let's change the string a little and run through again.

<?php

 /*** a simple string ***/
 $string = 'I live in the white house';

 /*** try to match white not followed by house ***/
 if(preg_match("/white(?!house)/i", $string))
     {
     /*** if we find the word white, not followed by house ***/
     echo 'Found a match';
     }
 else
     {
     /*** if no match is found ***/
     echo 'No match found';
     }
 ?>

Now we see that we have a match, as the word white is not followed immediately by house as it is in whitehouse. Let's move on to a positive lookahead. This is denoted by ?= and looks ahead to see if a pattern is there. Let's see it in action.

<?php

 /*** a string ***/
 $string = 'This is an example eg: foo';

 /*** try to match eg followed by a colon ***/
 if(preg_match("/eg(?=:)/", $string, $match))
     {
     print_r($match);
     }
 else
     {
     echo 'No match found';
     }
 ?>

The above code is looking for the pattern eg followed by a colon, and so returns a match of eg. Note that the colon is not part of the match, because a lookahead checks the text without consuming it. But what if we wanted to find something before the colon in the above example, or before house in the earlier examples? For these, we need a lookbehind.

Look Behinds

Once again, the name says it all: a lookbehind looks behind to see if it can match a pattern. And like the lookaheads, there are positive lookbehinds and negative lookbehinds. A positive lookbehind is denoted by ?<=. Let's reuse the string from the Look Aheads section.

<?php

 /*** a simple string ***/
 $string = 'I live in the whitehouse';

 /*** try to match house preceded by white ***/
 if(preg_match("/(?<=white)house/i", $string))
     {
     /*** if we find the word house, preceded by white ***/
     echo 'Found a match';
     }
 else
     {
     /*** if no match is found ***/
     echo 'No match found';
     }
 ?>

Here we find a match, as we have found the pattern house that is immediately preceded by the pattern white. The regex engine has looked behind house and completed the match. But what if we wanted to be sure the pattern was NOT preceded by the word white? This is where we would use a negative lookbehind. A negative lookbehind is denoted by ?<!.

<?php

 /*** a simple string ***/
 $string = 'I live in the whitehouse';

 /*** try to match house NOT preceded by white ***/
 if(preg_match("/(?<!white)house/i", $string))
     {
     /*** if we find the word house, not preceded by white ***/
     echo 'Found a match';
     }
 else
     {
     /*** if no match is found ***/
     echo 'No match found';
     }
 ?>

As you can see from running this code, no match is found. This is because the only occurrence of the pattern “house” has the pattern “white” directly before it, so the negative lookbehind rejects it. Let's change the colour of the house; white seems much too virginal for this government office.

<?php

 /*** a simple string ***/
 $string = 'I live in the bluehouse';

 /*** try to match house NOT preceded by white ***/
 if(preg_match("/(?<!white)house/i", $string))
     {
     /*** if we find the word house, not preceded by white ***/
     echo 'Found a match';
     }
 else
     {
     /*** if no match is found ***/
     echo 'No match found';
     }
 ?>

A slight modification of the string by changing whitehouse to bluehouse sees that we have a match because now the pattern “house” is not preceded by the pattern white. The regex engine has looked behind the pattern “house” and does not find the pattern “white”, so all is well.
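Lookbehinds and lookaheads can also be combined in one pattern. Because both are zero-width, only the text between them ends up in the match. The sample string below is made up for illustration, and the sketch relies on the lookbehind having a fixed length, which PCRE generally requires.

```php
<?php
// Hypothetical sample text: we want the digits between "$" and " USD".
$string = 'Total: $42 USD';

// (?<=\$)  the match must be preceded by a dollar sign
// \d+      the digits we actually want
// (?= USD) the match must be followed by " USD"
if (preg_match('/(?<=\$)\d+(?= USD)/', $string, $match)) {
    echo $match[0]; // 42
} else {
    echo 'No match found';
}
```

Neither the dollar sign nor the currency code appears in $match[0], since the assertions only test the surrounding text.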

PHP regular expressions are greedy by default. This means quantifiers such as *, + and ? will consume as many characters as are available. Let's look at them again.

  • * – zero or more characters, same as {0,}
  • + – 1 or more characters, same as {1,}
  • ? – zero or one character, same as {0,1}

Consider the following regex:

<?php
 /*** 4 x and 4 z chars ***/
 $string = "xxxxzzzz";

 /*** greedy regex ***/
 preg_match("/^(.*)(z+)$/",$string,$matches);

 /*** results ***/
 echo $matches[1];
 echo "<br />";
 echo $matches[2];
 ?>

The first pattern (.*) has matched all four “x” characters and three of the four “z” characters, leaving a single “z” for (z+). It has greedily matched as many characters as it possibly can. It is a simple matter to make a quantifier ungreedy by placing a ? after it, as in the following code.

<?php

 /*** string of characters ***/
 $string = "xxxxzzzz";

 /*** a non greedy match ***/
 preg_match("/^(.*?)(z+)$/",$string,$matches);

 /*** show the matches ***/
 echo $matches[1];
 echo "<br />";
 echo $matches[2];

 ?>

Now $matches[1] contains four “x” characters and $matches[2] contains four “z” characters. This is because the ? quantifier has changed the behaviour from matching as MANY characters as possible, to as FEW characters as possible.

Of course, this behaviour reversal would be rather tedious if you had long and complex pattern matches. To counter this the U modifier can be used to make the regular expression ungreedy. The code below shows the way.

 <?php

 /*** string of characters ***/
 $string = "xxxxzzzz";

 /*** a non greedy match ***/
 preg_match("/^(.*)(z+)$/U",$string,$matches);

 /*** show the matches ***/
 echo $matches[1];
 echo "<br />";
 echo $matches[2];

 ?>

The pattern is the same as used in the first greedy example, and with the use of the U modifier the match has become ungreedy. The results are that $matches[1] contains four “x” characters and $matches[2] contains four “z” characters.

It is important to note that the U modifier does not merely make the search ungreedy, it reverses the greediness of every quantifier. If a ? were used in conjunction with the U modifier, the non-greedy status of the ? would be reversed, as in this code.

 <?php

 /*** string of characters ***/
 $string = "xxxxzzzz";

 /*** the ? is inverted by U, so (.*?) is greedy again ***/
 preg_match("/^(.*?)(z+)$/U",$string,$matches);

 /*** show the matches ***/
 echo $matches[1];
 echo "<br />";
 echo $matches[2];

 ?>

Now the non-greedy status of the pattern has been reversed with the use of the U modifier and the above code will produce
xxxxzzz
z
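The difference between greedy and ungreedy matching shows up most clearly when a delimiter occurs more than once in the subject. The following sketch uses a made-up HTML fragment to compare the two.

```php
<?php
// Hypothetical sample markup with two bold sections.
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs on to the LAST </b> in the string.
preg_match('/<b>(.*)<\/b>/', $html, $greedy);

// Ungreedy: .*? stops at the FIRST </b> it can reach.
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);

echo $greedy[1]; // one</b> and <b>two
echo "\n";
echo $lazy[1];   // one
```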

Delimiters

This tutorial has seen many regular expressions, and the forward slash / has been used to delimit them. Sometimes the pattern itself needs to match a forward slash. It can be escaped, but if there are many forward slashes, such as in a URL, this can become quite ugly and hard to read. Rather than polluting the pattern, another delimiter can be used to hold the pattern. The following example uses the hash # character as a delimiter.

 <?php
         /*** get the host name from a url ***/
         preg_match('#^(?:http://)?([^/]+)#i', "http://www.w3programmers.com/tutorials", $matches);

         /*** show the host name ***/
         echo $matches[1];
 ?>

Many other characters can be used as delimiters, some of these are shown here.

  • /
  • @
  • #
  • `
  • ~
  • %
  • &
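When part of a pattern comes from a variable, the chosen delimiter has to be escaped inside it as well. A minimal sketch using preg_quote(), whose optional second argument names the delimiter to escape; the variable names are only illustrative.

```php
<?php
// A literal piece of text that contains a regex metacharacter (.)
// and the delimiter we plan to use (#).
$needle = 'a.b#c';

// The second argument asks preg_quote() to escape the delimiter too.
$pattern = '#^' . preg_quote($needle, '#') . '$#';

echo preg_match($pattern, 'a.b#c'); // 1, the literal text matches
echo preg_match($pattern, 'axb#c'); // 0, the dot is no longer a wildcard
```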

Examples

We could not finish this without a little revision and examples to refresh you and help you on your way. Let's start small and work our way up.

Let's match the beginning and end of a string, or a pattern that occurs anywhere within a string.

 <?php

 // the string to match against
 $string = 'The cat sat on the mat';

 // match the beginning of the string
 echo preg_match("/^The/", $string);

 // match the end of the string
 echo preg_match("/mat\z/", $string); // returns 1

 // match anywhere in the string
 echo preg_match("/dog/", $string); // returns 0 as no match was found for dog.
 ?>

OK, that was easy. Let's move on to matching more than a single pattern.

<?php

 // the string to match against
 $string = 'The cat sat on the matthew';

 // matches the letter "a" followed by zero or more "t" characters
 echo preg_match("/at*/", $string);

 // matches the letter "a" followed by a "t" character that may or may not be present
 echo preg_match("/at?/", $string);

 // matches the letter "a" followed by one or more "t" characters
 echo preg_match("/at+/", $string);

 // matches a possible letter "e" followed by one or more "w" characters anchored to the end of the string
 echo preg_match("/e?w+\z/", $string);

 // matches the letter "a" followed by exactly two "t" characters
 echo preg_match("/at{2}/", $string);

 // matches a possible letter "e" followed by exactly two "t" characters
 echo preg_match("/e?t{2}/", $string);

 // matches the letter "a" followed by 2 to 6 "t" characters (att attt atttttt)
 echo preg_match("/at{2,6}/", $string);

 ?>
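Every call in the listing above echoes 1, which says little about what was actually matched. Passing a third argument to preg_match() makes the matched text visible, as in this short sketch (the variable names $m1 to $m3 are arbitrary).

```php
<?php
$string = 'The cat sat on the matthew';

// "a" followed by zero or more "t": the first hit is inside "cat"
preg_match('/at*/', $string, $m1);
echo $m1[0], "\n"; // at

// "a" followed by exactly two "t": only "matthew" qualifies
preg_match('/at{2}/', $string, $m2);
echo $m2[0], "\n"; // att

// optional "e", then one or more "w", at the very end of the string
preg_match('/e?w+\z/', $string, $m3);
echo $m3[0], "\n"; // ew
```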

Validating Numbers

<?php

// function to validate an integer
function validateInteger($str) {
    // test if input is an integer, optionally signed
    return preg_match("/^-?[0-9]+$/", $str);
}

// function to validate a float
function validateFloat($str) {
    // test if input is a floating-point number, optionally signed,
    // with either a period or a comma as the decimal separator
    return preg_match("/^-?[0-9]+([.,][0-9]*)?$/", $str);
}

// result: "Is an integer"
echo validateInteger("123456") ? "Is an integer" : "Is not an integer";

// result: "Is not an integer"
echo validateInteger("123456.506") ? "Is an integer" : "Is not an integer";

// result: "Is not an integer"
echo validateInteger("12a3456.506") ? "Is an integer" : "Is not an integer";

// result: "Is a float"
echo validateFloat("123456") ? "Is a float" : "Is not a float";

// result: "Is a float"
echo validateFloat("123456.506") ? "Is a float" : "Is not a float";
?>

Comments

While PHP does offer the built-in is_numeric(), is_float(), and is_int() functions to check numeric input, you might often prefer to implement custom number-checking routines for more stringent validation. This listing illustrates two such validation routines, validateInteger() and validateFloat(), useful for testing the format of integer and decimal input, respectively. In both cases, the regular expression pattern matches digits in the range 0-9, with an optional minus sign prefix; in the latter case, the pattern also supports a decimal separator and following digits.

An alternative is to use PHP’s ctype_digit() function, which returns true only if every character of the supplied string is a digit. As the next listing demonstrates, it thus returns false for every value except strings of digits, that is, unsigned integers:

<?php

// result: "Is a number"
echo ctype_digit("123456") ? "Is a number" : "Is not a number";

// result: "Is not a number"
echo ctype_digit("123456.506") ? "Is a number" : "Is not a number";

// result: "Is not a number"
echo ctype_digit("12a3456.506") ? "Is a number" : "Is not a number";
?>
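Another option that needs no hand-written pattern is PHP's built-in filter_var() with the FILTER_VALIDATE_INT and FILTER_VALIDATE_FLOAT filters. The sketch below wraps them in two helpers; the helper names isIntString() and isFloatString() are hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper names; filter_var() and the FILTER_VALIDATE_*
// constants are built in.
function isIntString($str) {
    // filter_var() returns the converted value, or false on failure,
    // so compare against false explicitly ("0" converts to 0, which is falsy).
    return filter_var($str, FILTER_VALIDATE_INT) !== false;
}

function isFloatString($str) {
    return filter_var($str, FILTER_VALIDATE_FLOAT) !== false;
}

// result: "Is an integer"
echo isIntString("123456") ? "Is an integer" : "Is not an integer";

// result: "Is not an integer"
echo isIntString("123456.506") ? "Is an integer" : "Is not an integer";

// result: "Is a float"
echo isFloatString("123456.506") ? "Is a float" : "Is not a float";

// result: "Is not a float"
echo isFloatString("12a3456.506") ? "Is a float" : "Is not a float";
```

The strict comparison with false matters here: a plain truthiness test would wrongly reject the valid input "0".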

Validating Alphabetic Strings

<?php

function validateAlpha($str) {
    // test if input contains only alphabetic characters
    return preg_match("/^[a-z]+$/i", $str);
}

// result: "Is alphabetic"
echo validateAlpha("abc") ? "Is alphabetic" : "Is not alphabetic";

// result: "Is not alphabetic"
echo validateAlpha("abc1") ? "Is alphabetic" : "Is not alphabetic";
?>

Comments

PHP lets you test strings with the is_string() function, but this function doesn’t distinguish between alphabetic, alphanumeric, and numeric strings. If you’d like more stringent validation, consider the validateAlpha() function in this listing, which only passes strings containing alphabetic characters (strings with numbers or symbols will be rejected). Notice the function’s use of the i modifier, which performs case-insensitive matching.

An alternative is to use PHP’s ctype_alpha() function, which returns true only if every character of the supplied input argument is an alphabetic character. Here’s an example:

<?php

// result: "Is alphabetic"
echo ctype_alpha("abc") ? "Is alphabetic" : "Is not alphabetic";

// result: "Is not alphabetic"
echo ctype_alpha("abc1") ? "Is alphabetic" : "Is not alphabetic";
?>

Validating Alphanumeric Strings

<?php

function validateAlphaNum($str) {
    // test if input contains only alphabetic and numeric characters
    // (note: the * quantifier also accepts an empty string)
    return preg_match("/^[a-z0-9]*$/i", $str);
}

// result: "Is an alphanumeric string"
echo validateAlphaNum("abc") ? "Is an alphanumeric string" : "Is not an alphanumeric string";

// result: "Is an alphanumeric string"
echo validateAlphaNum("abc1") ? "Is an alphanumeric string" : "Is not an alphanumeric string";

// result: "Is not an alphanumeric string"
echo validateAlphaNum("abc?") ? "Is an alphanumeric string" : "Is not an alphanumeric string";

?>

Comments

PHP lets you test strings with the is_string() function, but this function doesn’t distinguish between alphabetic, alphanumeric, and numeric strings. If you need more precise validation, consider the validateAlphaNum() function in this listing, which tests strings for alphabetic or numeric characters and rejects those containing symbols or special characters outside the alphabetic or numeric range. Notice also the function’s use of the i modifier, for case-insensitive matching.

You could also perform this check with PHP’s ctype_alnum() function, which returns true only if every character of the supplied input argument belongs to either the alphabetic or numeric set. Here’s an example:

<?php

// result: "Is an alphanumeric string"
echo ctype_alnum("abc") ? "Is an alphanumeric string" : "Is not an alphanumeric string";

// result: "Is an alphanumeric string"
echo ctype_alnum("abc1") ? "Is an alphanumeric string" : "Is not an alphanumeric string";

// result: "Is not an alphanumeric string"
echo ctype_alnum("abc?") ? "Is an alphanumeric string" : "Is not an alphanumeric string";

?>

Validating Credit Card Numbers

<?php

// function to validate a credit card number
function validateCCNum($str) {
    // test if input is of the form dddddddddddddddd
    return preg_match("/^\d{16}$/", $str);
}

// function to validate a credit card expiry date
function validateCCExpDate($str) {
    // test if input is of the form mm/yyyy
    // (anchored at both ends so extra leading characters are rejected)
    return preg_match("/^(0[1-9]|1[0-2])\/20[0-9]{2}$/", $str);
}

// result: "Is a valid 16-digit number"
echo validateCCNum("4476269198125132") ? "Is a valid 16-digit number" : "Is not a 16-digit number";

// result: "Is a valid date string"
echo validateCCExpDate("12/2013") ? "Is a valid date string" : "Is an invalid date string";

?>

Comments

Many credit card numbers are sixteen digits long (some brands use other lengths, such as fifteen digits for American Express), and their expiration dates are usually in the format mm/yyyy. In this listing, validateCCNum() and validateCCExpDate() both contain relatively trivial regular expressions to test input values and see if they conform to these patterns. Note that the regular expression used in the validateCCExpDate() function is somewhat stringent, to ensure that only month values in the range 01-12 are accepted.

For more stringent validation, you might want to consider using PEAR’s Payment_Process or Validate_Finance_CreditCard classes, available from http://pear.php.net/package/Payment_Process and http://pear.php.net/package/Validate_Finance_CreditCard, respectively. Both classes include intelligence to check the validity of a credit card number, using either the Luhn algorithm or specific knowledge of the valid number range for each card brand. Here’s an example:

<?php

// include Payment_Process class
include "Payment/Process.php";

// initialize object
$card = Payment_Process_Type::factory("CreditCard");

// set card data
$card->type = PAYMENT_PROCESS_CC_MASTERCARD;
$card->cardNumber = "5548111111111111";
$card->expDate = "12/2005";

// result: "Is a properly-formatted card number"
echo Payment_Process_Type::isValid($card) ? "Is a properly-formatted card number" : "Is an improperly-formatted card number";
?>

<?php

// include Validate class
include "Validate/Finance/CreditCard.php";

// test credit card number using Luhn algorithm

// result: "Is an improperly-formatted card number"
echo Validate_Finance_CreditCard::type("5548111111121111", "AmericanExpress") ? "Is a properly-formatted card number" : "Is an improperly-formatted card number";

// result: "Is a properly-formatted card number"
echo Validate_Finance_CreditCard::type("5548111111121111", "MasterCard") ? "Is a properly-formatted card number" : "Is an improperly-formatted card number";

?>

TIP

Read more about the Luhn algorithm at http://www.webopedia.com/TERM/L/Luhn_formula.html.
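For readers who want to see what the Luhn check involves without installing a PEAR package, here is a minimal sketch. The function name passesLuhn() is hypothetical and this is not the PEAR implementation; the card number used is a widely circulated test number, not a real account.

```php
<?php
// Hypothetical helper, not part of PEAR: a plain Luhn checksum test.
function passesLuhn($number) {
    // Only non-empty strings made up entirely of digits are considered.
    if (!ctype_digit($number)) {
        return false;
    }

    $sum = 0;
    $double = false; // the rightmost digit is not doubled

    // Walk the digits from right to left, doubling every second one.
    for ($i = strlen($number) - 1; $i >= 0; $i--) {
        $digit = (int) $number[$i];
        if ($double) {
            $digit *= 2;
            if ($digit > 9) {
                $digit -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $digit;
        $double = !$double;
    }

    // A valid number has a checksum that is a multiple of 10.
    return $sum % 10 === 0;
}

// result: "Passes the Luhn check"
echo passesLuhn("4111111111111111") ? "Passes the Luhn check" : "Fails the Luhn check";

// result: "Fails the Luhn check"
echo passesLuhn("4111111111111112") ? "Passes the Luhn check" : "Fails the Luhn check";
```

A passing checksum only shows that the digits are internally consistent; it says nothing about whether the card exists.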

Validating Telephone Numbers

<?php

// function to validate an international phone number
function validateIntlPhone($str) {
    // test if input is of the form +cc aa nnnn nnnn
    return preg_match("/^(\+|00)[1-9]{1,3}(\.|\s|-)?([0-9]{1,5}(\.|\s|-)?){1,3}$/", $str);
}

// result: "Is a properly-formatted phone number"
echo validateIntlPhone("+1 301 111 1111") ? "Is a properly-formatted phone number" : "Is an improperly-formatted phone number";

// result: "Is a properly-formatted phone number"
echo validateIntlPhone("0091-11-2123-7574") ? "Is a properly-formatted phone number" : "Is an improperly-formatted phone number";

// result: "Is a properly-formatted phone number"
echo validateIntlPhone("+612 9555-5555") ? "Is a properly-formatted phone number" : "Is an improperly-formatted phone number";

// result: "Is an improperly-formatted phone number"
echo validateIntlPhone("12346") ? "Is a properly-formatted phone number" : "Is an improperly-formatted phone number";
?>

Comments

There are numerous ways of writing an international telephone number, and the previous regular expression tries to match all of them. The expression used here expects a number with country and area code, and enables you to prefix the country code with a + symbol or a pair of zeroes; separate the country, area, and local codes with spaces, periods, or hyphens; and write the local number as a single set of numbers or split it into spaced blocks.

If you find this regular expression a little too generous, you can alter it to be more restrictive, or to only support local numbers. As an illustration, consider the following variants, which validate local U.S. and Indian telephone numbers only:

<?php

// function to validate a US phone number
function validateUSPhone($str) {
    // test if input is of the form aaa-nnn-nnnn
    return preg_match("/^[2-9]\d{2}-\d{3}-\d{4}$/", $str);
}

// function to validate an Indian phone number
function validateIndiaPhone($str) {
    // test if input is of the form (0aa) nnnn nnnn
    return preg_match("/^\(0\d{2}\)\s?\d{8}$/", $str);
}

// result: "Is a properly-formatted phone number"
echo validateUSPhone("301-111-1111") ? "Is a properly-formatted phone number" : "Is an improperly-formatted phone number";

// result: "Is a properly-formatted phone number"
echo validateIndiaPhone("(022) 22881111") ? "Is a properly-formatted phone number" : "Is an improperly-formatted phone number";
?>

Validating Social Security Numbers

<?php

// function to validate U.S. Social Security number
function validateSSN($str) {
    // test if input is of the form ddd-dd-dddd
    return preg_match("/^\d{3}\-\d{2}\-\d{4}$/", $str);
}

// result: "Is a properly-formatted SSN"
echo validateSSN("123-45-6789") ? "Is a properly-formatted SSN" : "Is an improperly-formatted SSN";

// result: "Is an improperly-formatted SSN"
echo validateSSN("123456789") ? "Is a properly-formatted SSN" : "Is an improperly-formatted SSN";

?>

Comments

Social Security numbers in the USA are usually nine digits long, with hyphens after the third and fifth digits. This listing simply encapsulates this pattern into a regular expression, and uses the preg_match() function to test input against this pattern.

Validating Postal Codes

<?php

// function to validate a zip code
function validateZip($str) {
    // test if input is of the form dddddd
    return preg_match("/^\d{6}$/", $str);
}

// result: "Is a properly-formatted zip code"
echo validateZip("123456") ? "Is a properly-formatted zip code" : "Is an improperly-formatted zip code";

// result: "Is an improperly-formatted zip code"
echo validateZip("56456") ? "Is a properly-formatted zip code" : "Is an improperly-formatted zip code";

?>

Comments

Postal codes differ from country to country, so there’s no one-size-fits-all solution to the problem. Usually, you will need to customize the regular expression to local conventions before you can use the validateZip() function. This listing assumes a postal code of six digits; however, it’s quite likely that you might have a five-digit code with an optional hyphen and four more digits (as in the USA) or a code of five to seven letters and digits (as in the UK). Here are some variants illustrating these local conventions:

<?php

// function to validate a US zip code
function validateUSZip($str) {
    // test if input is of the form ddddd or ddddd-dddd
    return preg_match("/^\d{5}(-\d{4})?$/", $str);
}

// function to validate a UK postcode
function validateUKZip($str) {
    // test if input is of the form ssdd dss
    // (preg_match() with the i modifier replaces eregi(), which was removed in PHP 7.0)
    return preg_match("/^[a-z]{1,2}[0-9]{1,2}[a-z]?\s?[0-9][a-z]{2}$/i", $str);
}

// result: "Is a properly-formatted US zip code"
echo validateUSZip("10113-1243") ? "Is a properly-formatted US zip code" : "Is an improperly-formatted US zip code";

// result: "Is a properly-formatted UK postcode"
echo validateUKZip("NW3 5ED") ? "Is a properly-formatted UK postcode" : "Is an improperly-formatted UK postcode";
?>
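When several local formats have to be supported at once, one way to organise them is a lookup table of patterns keyed by country. This is only a sketch: the array keys, the $postalPatterns variable, and the validatePostalCode() function are all hypothetical names chosen for illustration.

```php
<?php
// Hypothetical lookup table: country code => pattern.
$postalPatterns = array(
    'US' => '/^\d{5}(-\d{4})?$/',
    'UK' => '/^[a-z]{1,2}[0-9]{1,2}[a-z]?\s?[0-9][a-z]{2}$/i',
);

// Hypothetical helper: validate $str against the pattern for $country.
function validatePostalCode($str, $country, array $patterns) {
    // Unknown countries are rejected rather than guessed at.
    if (!isset($patterns[$country])) {
        return false;
    }
    return preg_match($patterns[$country], $str) === 1;
}

// result: "Is valid"
echo validatePostalCode("10113-1243", "US", $postalPatterns) ? "Is valid" : "Is invalid";

// result: "Is valid"
echo validatePostalCode("NW3 5ED", "UK", $postalPatterns) ? "Is valid" : "Is invalid";

// result: "Is invalid" (no pattern registered for this key)
echo validatePostalCode("123456", "IN", $postalPatterns) ? "Is valid" : "Is invalid";
```

Adding support for another country then means adding one array entry rather than another function.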

Validating E-mail Addresses

<?php

// function to validate
// an e-mail address
function validateEmailAddress($str) {
    // test if input matches e-mail pattern
    // (preg_match() with the i modifier replaces eregi(), which was removed in PHP 7.0)
    return preg_match("/^([a-z0-9_-])+([\.a-z0-9_-])*@([a-z0-9-])+(\.[a-z0-9-]+)*\.([a-z]{2,6})$/i", $str);
}

// result: "Is a properly-formatted e-mail address"
echo validateEmailAddress("[email protected]") ? "Is a properly-formatted e-mail address" : "Is an improperly-formatted e-mail address";

// result: "Is an improperly-formatted e-mail address"
echo validateEmailAddress("[email protected]") ? "Is a properly-formatted e-mail address" : "Is an improperly-formatted e-mail address";

?>

Comments

E-mail address validation is one of the most common types of input validation, and there’s no shortage of regular expressions available online to match e-mail address patterns. The previous listing uses one of the more stringent patterns, restricting the range of characters in both username and domain parts and requiring the length of the top-level domain to be between two and six characters.

An alternative to rolling your own regular expression is to use the one provided by the PEAR Validate class, available from http://pear.php.net/package/Validate. The following listing illustrates this:

<?php

// include Validate class
include "Validate.php";

// test e-mail address

// result: "Is a properly-formatted e-mail address"
echo Validate::email("[email protected]") ? "Is a properly-formatted e-mail address" : "Is an improperly-formatted e-mail address";

// result: "Is an improperly-formatted e-mail address"
echo Validate::email("#[email protected]") ? "Is a properly-formatted e-mail address" : "Is an improperly-formatted e-mail address";
?>

TIP

Looking for a more sophisticated regex? Try this one:

<?php

function validateEmailAddress($str) {
    // test if input matches e-mail pattern
    return preg_match('/[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)*\@[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)+/i', $str);
}

?>
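If no PEAR package is available, PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL is another option. The sketch below is a thin wrapper around it; the function name isEmailAddress() and the example.com addresses are assumptions made for illustration.

```php
<?php
// Hypothetical wrapper around the built-in e-mail filter.
function isEmailAddress($str) {
    // filter_var() returns the address on success and false on failure.
    return filter_var($str, FILTER_VALIDATE_EMAIL) !== false;
}

// result: "Is a properly-formatted e-mail address"
echo isEmailAddress("user@example.com") ? "Is a properly-formatted e-mail address" : "Is an improperly-formatted e-mail address";

// result: "Is an improperly-formatted e-mail address"
echo isEmailAddress("user@@example.com") ? "Is a properly-formatted e-mail address" : "Is an improperly-formatted e-mail address";

// result: "Is an improperly-formatted e-mail address"
echo isEmailAddress("not-an-address") ? "Is a properly-formatted e-mail address" : "Is an improperly-formatted e-mail address";
```

As with any format check, a passing result only means the address is well formed, not that a mailbox exists behind it.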

Validating URLs

<?php

// function to validate a URL
function validateUrl($str) {
    // test if input matches URL pattern
    // (the hyphen sits at the end of the last character class so it is
    // read literally, and the pattern is anchored at both ends)
    return preg_match("/^(http|https|ftp):\/\/([a-z0-9]([a-z0-9_-]*[a-z0-9])?\.)+[a-z]{2,6}\/?([a-z0-9\/?._~&#=+%-]*)?$/i", $str);
}

// result: "Is valid"
echo validateUrl("http://www.example.com/html/index.php") ? "Is valid" : "Is invalid";

// result: "Is valid"
echo validateUrl("http://www.some.site.info") ? "Is valid" : "Is invalid";

// result: "Is invalid"
echo validateUrl("http://some") ? "Is valid" : "Is invalid";
?>

Comments

URLs come in all shapes and colors and, as with e-mail addresses, you can be generous or strict in the regular expression you choose to validate them. The expression used here restricts the protocol to HTTP, HTTPS, or FTP, requires the top-level domain to be between two and six characters long, and supports trailing path/file names or anchors.

Remember, these will only return 0 or 1, as preg_match() will stop looking after the first match. To match all the matches in a string you would use preg_match_all().
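As a quick illustration of the difference, preg_match_all() returns the number of matches and fills its third argument with all of them. A minimal sketch using the sentence from the earlier examples:

```php
<?php
$string = 'The cat sat on the mat';

// Find every whole word that ends in "at".
$count = preg_match_all('/\b\w+at\b/', $string, $matches);

echo $count; // 3
echo "\n";

// $matches[0] holds the full text of each match: cat, sat, mat
echo implode(', ', $matches[0]);
```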

Cheat Sheet

Special Sequences

  • \w – Any “word” character (a-z A-Z 0-9 _)
  • \W – Any non “word” character
  • \s – Whitespace (space, tab, CR, LF)
  • \S – Any non whitespace character
  • \d – Digits (0-9)
  • \D – Any non digit character
  • . – (Period) – Any character except newline

MetaCharacters

  • ^ – Start of subject (or line in multiline mode)
  • $ – End of subject (or line in multiline mode)
  • [ – Start character class definition
  • ] – End character class definition
  • | – Alternates, eg (a|b) matches a or b
  • ( – Start subpattern
  • ) – End subpattern
  • \ – Escape character

Quantifiers

  • n* – Zero or more of n
  • n+ – One or more of n
  • n? – Zero or one occurrences of n
  • {n} – n occurrences exactly
  • {n,} – At least n occurrences
  • {n,m} – Between n and m occurrences (inclusive)

Pattern Modifiers

  • i – Case Insensitive
  • m – Multiline mode – ^ and $ match start and end of lines
  • s – Dotall – . class includes newline
  • x – Extended– comments and whitespace
  • e – preg_replace only – enables evaluation of replacement as PHP code (deprecated in PHP 5.5, removed in PHP 7.0)
  • S – Extra analysis of pattern
  • U – Pattern is ungreedy
  • u – Pattern is treated as UTF-8
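A few of these modifiers are easier to grasp when seen side by side. The following sketch demonstrates x, m and s; the sample date and the three-line string are made up for illustration.

```php
<?php
// x: whitespace in the pattern is ignored and # starts a comment,
// so a pattern can be laid out over several lines.
$datePattern = '/
    ^(\d{4})   # year
    -(\d{2})   # month
    -(\d{2})$  # day
/x';
preg_match($datePattern, '2017-10-30', $parts);
echo $parts[1], "\n"; // 2017

$lines = "alpha\nbeta\ngamma";

// m: ^ and $ match at the start and end of every line.
$perLine = preg_match_all('/^\w+$/m', $lines, $all);
echo $perLine, "\n"; // 3

// s: without it the dot stops at a newline, with it the dot matches it.
echo preg_match('/alpha.beta/', $lines), "\n";  // 0
echo preg_match('/alpha.beta/s', $lines), "\n"; // 1
```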

Point based assertions

  • \b – Word boundary
  • \B – Not a word boundary
  • \A – Start of subject
  • \Z – End of subject or newline at end
  • \z – End of subject
  • \G – First matching position in subject

Assertions

  • (?=) – Positive look ahead assertion foo(?=bar) matches foo when followed by bar
  • (?!) – Negative look ahead assertion foo(?!bar) matches foo when not followed by bar
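  • (?<=) – Positive look behind assertion (?<=foo)bar matches bar when preceded by foo
  • (?<!) – Negative look behind assertion (?<!foo)bar matches bar when not preceded by foo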
  • (?>) – Once-only subpatterns (?>\d+)bar Performance enhancing when bar not present
  • (?(x)) – Conditional subpatterns
  • (?(3)foo|fu)bar – Matches foo if 3rd subpattern has matched, fu if not
  • (?#) – Comment (?# Pattern does x y or z)
Hi, my name is Masud Alam. I love to work with Open Source Technologies and live in Dhaka, Bangladesh. I graduated in 2009 with a bachelor's degree in Engineering from State University Of Bangladesh, and I'm also a Certified Engineer on ZEND PHP 5.3. In my first five years I held a number of leadership positions at Winux Soft Ltd, SSL Wireless Ltd, CIDA and MAX Group, where I worked on ERP software and web development. I'm now a co-founder, Chief Executive Officer and Managing Director of TechBeeo Software Consultancy Services Ltd. I'm also a Course Instructor of the ZCPE PHP 7 Certification and professional web development course at w3programmers Training Institute, a leading Training Institute in the country.
5 comments on “Working With PHP Regular Expression
  1. This is a very useful tutorial for the beginners and advanced programmers both. I am very much grateful to the author of this page.
    I got one error in this case:
    $string = ‘ab-ce*[email protected] hi & jkl(mnopqr)stu+vw?x yz0>1234<567890';
    preg_match_all("/[\w]/", $string, $matches);

    In this tutorial, it is said that output will be abcefghijklmnopqrstuvwxyz01234567890, but actually the output will be: abcefghiampjklmnopqrstuvwxyz01234567890. Please check and correct this error.
