About Regular Expressions

Pattern Matching for Forms

It's normal to feel intimidated by regular expressions. What the
^\d{3}-\d{2}-\d{4}$ are they?

Explore Regular Expressions

When you request information from a user in a form, it is important that they provide it in the way you expect, so that your data is not contaminated or improperly formatted. Regular expressions are a mechanism that can help ensure the information a user provides matches a pattern you require.

A regular expression is a pattern used to match text within a string. Think of regular expressions as filters, or gatekeepers, that check information before a form accepts it.

Regular expressions also exist in other languages, such as PHP and Java, but this application tests them in JavaScript.

This application lets you write and field test regular expressions. By tweaking the expressions, you can:

  • learn the structural syntax
  • learn what is required and what is optional
  • check whether you are interpreting the expressions and their functions correctly
  • field test 'filters' before using them in real forms

Some basics to keep in mind:

  • A regular expression literal in JavaScript begins and ends with forward slashes (/), as in /^\d{5}$/. In this application the slashes are put in for us by the JavaScript, so we only need to supply the 'guts' in the form. The script creates the regular expression from that string, here called 'reg', with the RegExp constructor, which takes no slashes:
    • var regex = new RegExp(reg);
  • An (optional) caret (^) right after the opening forward slash (/) means the test string MUST begin with the pattern. If it is omitted, the pattern may match anywhere in the string being validated.
  • An (optional) dollar sign ($) right before the closing forward slash (/) means the test string MUST end with the pattern; nothing may come after the match.
  • JavaScript offers different functions for checking a string against an expression:
    • matched = testExp.match(regex); returns an array whose first element is the portion of testExp matched by the pattern (later elements hold any parenthesized groups), or null if there is no match
    • validExp = regex.test(testExp); returns true if the pattern is found in testExp, false otherwise
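The sketch below, meant for a browser console or Node.js, contrasts a regex literal with the RegExp constructor and compares test() with match(). The sample strings are illustrative. The backslashes in the string are doubled only because the pattern is written as a string literal in source code; text read from a form field is already a plain string, so a pattern typed there should not need doubled backslashes.

```javascript
// Illustrative pattern string, standing in for the text typed into the form.
// Backslashes are doubled only because this is a string literal in code.
const reg = "^\\d{5}$";
const regex = new RegExp(reg);   // built from a string: no slashes
const literal = /^\d{5}$/;       // literal form: the slashes are syntax

console.log(regex.source === literal.source); // true: same pattern

// test() returns a boolean.
console.log(regex.test("90210"));  // true
console.log(regex.test("902101")); // false: $ forbids a sixth digit

// match() returns an array (element 0 is the matched text) or null.
const anywhere = new RegExp("\\d{3}"); // no ^ or $, so it may match mid-string
const matched = "Room 404, east wing".match(anywhere);
console.log(matched[0]);    // "404"
console.log(matched.index); // 5
console.log("no digits here".match(anywhere)); // null
```

Use test() when you only need a yes/no answer for validation, and match() when you also want to see what was matched.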

Metacharacters let us describe patterns of text within a regular expression

Metacharacter Meaning
\d any digit, 0-9 (two digits: \d\d or \d{2})
\w any word character: a letter, a digit, or the underscore (_)
\s any whitespace character, e.g. space, tab, newline, CR
^ Looks for beginning of a string
. Matches any one character except a line break such as newline
$ Looks for end of a string
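A quick way to check these metacharacters is to run each one against sample strings. In this sketch the sample strings are made up and every pattern is anchored with ^ and $, so the whole string must fit. It also confirms that \w accepts the underscore but not a hyphen, and that . does not match a newline.

```javascript
// Sample strings are made up; every pattern is anchored with ^ and $.
const checks = [
  [/^\d\d$/, "42"],       // two digits
  [/^\w+$/, "user_01"],   // \w accepts the underscore
  [/^\w+$/, "user-01"],   // ...but not a hyphen
  [/^a\sb$/, "a\tb"],     // \s accepts a tab
  [/^a.c$/, "a7c"],       // . accepts any one character
  [/^a.c$/, "a\nc"],      // ...except a newline
];

const results = checks.map(([pattern, text]) => pattern.test(text));
console.log(results); // [ true, true, false, true, true, false ]
```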

Quantifiers specify how many times characters or metacharacters should appear in a pattern:

  • /^\d{10}$/ Here, the {10} says that exactly ten digits are required in the pattern. Curly braces {} are common quantifiers
  • {min,max}: The preceding character or metacharacter must be repeated at least min and at most max times (write it without a space, e.g. \d{2,4})
  • + : The preceding character or metacharacter must appear one or more times
  • ? : The preceding character or metacharacter must appear once or not at all
  • * : The preceding character or metacharacter may appear any number of times, including not at all
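The quantifiers can be compared side by side with anchored patterns, as in this sketch (the patterns and sample strings are illustrative). It also shows a pitfall: in JavaScript, braces that contain a space, such as {2, 4}, are not a quantifier, and without the u flag they are matched as literal text.

```javascript
const tenDigits = /^\d{10}$/;   // exactly ten digits
const twoToFour = /^\d{2,4}$/;  // between two and four digits
const oneOrMore = /^a+$/;       // "a", "aa", "aaa", ...
const optionalU = /^colou?r$/;  // u once or not at all
const zeroOrMore = /^ab*c$/;    // any number of b's, even none

console.log(tenDigits.test("5551234567"));                      // true
console.log(twoToFour.test("123"), twoToFour.test("12345"));     // true false
console.log(oneOrMore.test(""));                                 // false: needs at least one a
console.log(optionalU.test("color"), optionalU.test("colour"));  // true true
console.log(zeroOrMore.test("ac"), zeroOrMore.test("abbbc"));    // true true

// A space inside the braces breaks the quantifier: without the u flag,
// "{2, 4}" is treated as the literal characters { 2 , space 4 }.
const spaced = /^\d{2, 4}$/;
console.log(spaced.test("123"));     // false
console.log(spaced.test("1{2, 4}")); // true
```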

A character class is a set of rules for matching a single character

  • Character classes let you match characters from a specific set
  • The delimiter for character classes is the square bracket, [ ]
  • Within a character class, a caret (^) placed right after the opening bracket means "match any character except these"
  • Multiple ranges and single characters may be combined in one class
  • Examples:
    • [0-2] matches 0, 1, 2
    • [^b-e] matches any single character except b, c, d, e
    • [a-zA-Z] matches lower & upper case letters
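The three character-class examples can be checked directly. In this sketch the patterns are anchored so that exactly one character must fit, except for the letters pattern, which adds + to accept a run of letters; the sample strings are illustrative.

```javascript
const zeroToTwo = /^[0-2]$/;     // one character: 0, 1, or 2
const notBtoE = /^[^b-e]$/;      // one character that is not b, c, d, or e
const letters = /^[a-zA-Z]+$/;   // one or more upper- or lower-case letters

const digitResults = ["0", "2", "3"].map((s) => zeroToTwo.test(s));
const exceptResults = ["a", "c", "f"].map((s) => notBtoE.test(s));

console.log(digitResults);  // [ true, true, false ]
console.log(exceptResults); // [ true, false, true ]
console.log(letters.test("MacDonald"), letters.test("Mac Donald")); // true false
```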

More complicated regular expressions may be built using these ideas:

  • Expression fragments may be grouped with parentheses ()
  • If you want to require a character that has "code meaning" (e.g. /, (, ), {, }, [, ], etc.) you must "escape" it by preceding it with a backslash (\)
    • /^\(\d{3}\)$/ matches patterns of (###)
    • \/ matches a required /
  • Use the vertical 'pipe' symbol (|) as a logical 'or'
    • /(dog)|(dawg)|(doggie)/ matches any of the three words
    • The pipe has the lowest precedence, so wrap the alternatives in parentheses when anchoring them: /^(dog|dawg)$/
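Grouping and alternation interact with anchors in a way that is easy to miss: because | has the lowest precedence, in /^cat|dog$/ the caret applies only to cat and the dollar sign only to dog. This sketch (the cat/dog and two-part date patterns are illustrative) contrasts that with a grouped version and also exercises escaped parentheses and an escaped slash.

```javascript
// Escaped parentheses are required literally.
const areaCode = /^\(\d{3}\)$/;
console.log(areaCode.test("(555)"), areaCode.test("555")); // true false

// Without a group, ^ binds only to "cat" and $ only to "dog".
const loose = /^cat|dog$/;
// With a group, both anchors apply to every alternative.
const strict = /^(cat|dog)$/;

console.log(loose.test("catfish"), loose.test("hotdog"));  // true true
console.log(strict.test("hotdog"), strict.test("dog"));    // false true

// Escaped slash: \/ requires a literal "/".
const slashDate = /^\d{2}\/\d{2}$/;
console.log(slashDate.test("04/15"), slashDate.test("04-15")); // true false
```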
Regular Expression What it Does
^\d\d\d\d$ Exactly 4 digits
^\d{6}$ Exactly 6 digits
^[1-9]{1}[0-9]*$ Positive (>0) integers
(^[1-9]{1}[0-9]*$)|(^0$) Non-Negative integers
(^[1-9]{1}[0-9]*$)|(^0$)|(^[-]{1}[1-9]{1}[0-9]*$) All Integers, no multiple zeros or -0 values
^([1-9]{1}[0-9]*)?0?\.?[0-9]*[0-9]{1}$ Positive decimals or integers, but no lone decimal point like '.', trailing decimal point like '1.', or meaningless zeros like '001.2' (it does still accept '0' and leading-zero integers such as '0012')
^-?([1-9]{1}[0-9]*)?0?\.?[0-9]*[0-9]{1}$ Same as above with negatives allowed
^-?([1-9]{1}[0-9]{0,15})?0?\.?[0-9]{0,12}[0-9]{1}$ Same as above, but with limits on the number of digits before and after the decimal point, using {min,max} quantifiers
^[A-Z]{1}[a-z]{1}[a-z]*$ Sentence Case single word, at least 2 letters
^[A-Z]{2}$ 2 Letter Initials in Caps
(^[A-Z]{1}[a-z]{1}[a-z]*$)|(^[A-Z]{2}$) Sentence Case word or two letters in caps
^[A-Z]{1}[a-z]{1}[a-z]*([\s-]?[A-Z]{1}[a-z]{1}[a-z]*)?$ Single or hyphenated names (with - or space or merged) Ex: JimBob, Mary Jane, Joe-Bob, MacDonald
^[A-Z]{1}('[A-Z])?[a-z]{1}[a-z]*([\s-]?[A-Z]{1}[a-z]{1}[a-z]*)?$ Same as above, but also allows an apostrophe after the first capital letter. Ex: D'Nae
^\d{3}-\d{2}-\d{4}$ Social Security Number with dashes
^(\d{3}\s\d{2}\s\d{4}|\d{9}|\d{3}-\d{2}-\d{4})$ Social Security Number with dashes, spaces, or all together
^\(\d{3}\)\s?\d{3}-\d{4}(-\d{1,4})?$ Phone number: required area code in parentheses, optional space, 3 digits, dash, 4 digits, and an optional extension (a dash plus 1-4 digits)

^\([2-9]\d{2}\)\s?\d{3}-\d{4}(-\d{1,4})?$ Same as above, but US area codes cannot start with a one or zero, so [2-9] rules those out
^\d{5}$ 5 digit zip code
^[1-3]{1}\d{4}$ 5 digit zip that must begin with a [1-3]
^((0[1-9])|(1[0-2]))\/{1}$ Month: 01 to 12 with forward slash
^((0[1-9])|(1[0-9])|2[0-9]|3[0-1])\/{1}$ Day: 01-31 with forward slash
^(19|20)?\d\d$ Year: 1900-2099 or just two digits: 00-99
^((0[1-9])|(1[0-2]))\/{1}((0[1-9])|(1[0-9])|2[0-9]|3[0-1])\/{1}(19|20)?\d\d$ Month/Day/Year: two digits required for month and day, restricted to 01-12 and 01-31; the year must be 1900-2099 or any two digits. The day is not checked against the month, so 02/31/2020 still matches
^[a-zA-Z0-9][a-zA-Z0-9\._\-&!?=#]*@ Email: local name followed by @ symbol
^[a-zA-Z0-9][a-zA-Z0-9_\-&!?=#]* Email: Domain prefix (without period)
^(\.[a-z]+)+$ Email: Domain suffix (must begin with a period)
^[a-zA-Z0-9][a-zA-Z0-9\._\-&!?=#]*@[a-zA-Z0-9][a-zA-Z0-9_\-&!?=#]*(\.[a-z]+)+$ Total Email
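To tie the table back to form validation, here is a minimal sketch that checks several fields at once. The field names, the fieldPatterns object, and the invalidFields helper are hypothetical, not part of this application; the patterns come from the table, with the Social Security alternatives wrapped in one group so ^ and $ apply to all three forms. Values are trimmed before testing, a design choice that forgives stray spaces around the input.

```javascript
// Hypothetical field names mapped to patterns from the table.
// The SSN alternatives are grouped so ^ and $ apply to all three forms.
const fieldPatterns = {
  zip: /^\d{5}$/,
  ssn: /^(\d{3}\s\d{2}\s\d{4}|\d{9}|\d{3}-\d{2}-\d{4})$/,
  phone: /^\([2-9]\d{2}\)\s?\d{3}-\d{4}(-\d{1,4})?$/,
  date: /^((0[1-9])|(1[0-2]))\/{1}((0[1-9])|(1[0-9])|2[0-9]|3[0-1])\/{1}(19|20)?\d\d$/,
  email: /^[a-zA-Z0-9][a-zA-Z0-9\._\-&!?=#]*@[a-zA-Z0-9][a-zA-Z0-9_\-&!?=#]*(\.[a-z]+)+$/,
};

// Hypothetical helper: returns the names of fields whose (trimmed)
// values do not match their pattern. Missing fields count as "".
function invalidFields(form) {
  return Object.keys(fieldPatterns).filter(
    (name) => !fieldPatterns[name].test(String(form[name] ?? "").trim())
  );
}

// Made-up sample submission.
const submission = {
  zip: "30301",
  ssn: "123 45 6789",
  phone: "(155) 555-0100", // area code begins with 1, so it fails
  date: "02/31/2024",      // passes: the pattern does not check month length
  email: "jo.smith@example.com",
};

console.log(invalidFields(submission)); // [ 'phone' ]
```

Because these patterns only check the shape of the input, rules such as the number of days in a month need separate logic.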

Here is a list of helpful online resources about regular expressions, and of places where they are commonly used: