Emma Barnes

Senior Insights and Analytics Analyst

“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” – Jamie Zawinski, Netscape engineer.

Regular Expressions (RegEx) are a powerful way of filtering data against complex patterns. However, they can be a bit tricky to get working. This post goes through how you can use RegEx for error checking – for example, getting rid of rubbish inputs from telephone number or e-mail address fields.

How do I use RegEx for error checking?

Firstly, decide what you want to check for errors:

  • Should I only include certain kinds of character? (Are letters, numbers, spaces or a combination allowed?)
  • Is there a minimum or maximum length? (Phone numbers must not exceed 15 digits, for example)
  • Are there any terms I need to exclude? (Do you allow calls from 070 numbers?)
  • Does the field need to be in a specific format? (An email address must contain “@”, for example)

Including character types

To include only a certain type of character, first decide which kinds of character you want to allow within your set, then find the relevant RegEx:

  • Numbers: 0-9
  • Lowercase letters: a-z
  • Uppercase letters: A-Z
  • Whitespace (spaces, tabs and line breaks): \s
  • Word characters (letters, numbers and the underscore): \w
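To see what each building block actually picks up, here is a minimal sketch using Python's standard `re` module. Python is an assumption (the same patterns work in most RegEx tools, though some flavours treat \s and \w slightly differently), and the `BUILDING_BLOCKS` name and the sample string are made up for illustration.

```python
import re

# The building blocks from the list above. Ranges such as 0-9 only mean
# "any digit" inside square brackets, so they are wrapped in [] here.
BUILDING_BLOCKS = {
    "numbers": r"[0-9]",
    "lowercase": r"[a-z]",
    "uppercase": r"[A-Z]",
    "whitespace": r"\s",  # spaces, tabs and line breaks
    "word": r"\w",        # letters, digits and the underscore
}

sample = "Call 0123 456789_x\tnow"

for name, pattern in BUILDING_BLOCKS.items():
    # re.findall returns every non-overlapping match as a list of strings
    print(f"{name:>10}: {re.findall(pattern, sample)}")
```

Note that \w happily matches the underscore and \s matches the tab as well as the spaces, which is worth remembering before using either on user input.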

RegEx defines a set (a “character class”) using square brackets [], and to include more than one type of character within a set you simply write them next to each other. Don’t separate them with commas: inside the brackets a comma is just another character, so [0-9,\s] would also accept commas.

For example, I want to include telephone numbers but allow people to type spaces, so I’ll be using the set [0-9\s].

Because I want to match a whole string rather than just a single character, I need to write a rule that says “all characters within the string must be within this set”. That can be done using the following RegEx:

^[0-9\s]{1,}$

To break this down:

  • ^ means “must begin with” (it anchors the match to the start of the string)
  • [0-9\s] is the set we’re using to include just numbers and spaces
  • {1,} means “at least one character from the set, with no upper limit”; together with ^ and $, this forces every character in the string to come from that set
  • $ means “must end with” (it anchors the match to the end of the string)

So, now we have a Regular Expression that says “return strings with only numbers and spaces”. To modify it for a different set, change the part inside the square brackets; for example, ^[a-zA-Z\s]{1,}$ accepts only letters and spaces.
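Here is a minimal sketch of the numbers-and-spaces check in Python, using the set [0-9\s] (no comma). Python's `re` module and the `only_numbers_and_spaces` helper are assumptions for illustration, and the sample inputs are made up. `re.fullmatch` already insists on the whole string matching, so ^ and $ are technically redundant there; they are kept so the pattern reads the same as in the text and can be pasted straight into a tester such as RegExr.

```python
import re

# Numbers and whitespace only, at least one character, anchored at both ends.
NUMBERS_AND_SPACES = r"^[0-9\s]{1,}$"

def only_numbers_and_spaces(value: str) -> bool:
    """Return True if every character in value is a digit or whitespace."""
    return re.fullmatch(NUMBERS_AND_SPACES, value) is not None

for value in ["01632 960 000", "01632-960-000", "", "call me"]:
    print(repr(value), only_numbers_and_spaces(value))

# Without ^ and $, re.search is satisfied by a match *anywhere* in the text,
# so this finds something even though the input contains letters.
print(re.search(r"[0-9\s]{1,}", "call me on 01632") is not None)

# A comma inside the brackets is a literal comma, not a separator.
print(re.fullmatch(r"^[0-9,\s]{1,}$", "1,2,3") is not None)
```

The last two lines show why the anchors matter and why commas don't belong in the set: both would let rubbish through.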

Using character limits

We have almost done this already, and it all comes down to the curly brackets: {}. They let you set a minimum and maximum length for your input. For example, telephone numbers must be at least 10 digits long and contain at most 15 digits. However, to allow for people typing spaces, the maximum could be extended to 20 characters.

The Regular Expression is modified to this: ^[0-9\s]{10,20}$

The {10,20} says “the string must be between 10 and 20 characters long”. You can change these numbers to suit your own data: {10,} sets a minimum with no maximum, and {11} requires exactly 11 characters.
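A minimal Python sketch of the length check follows, again assuming the standard `re` module; `plausible_length` and `plausible_digits` are hypothetical helper names. One thing it makes visible: {10,20} counts every character in the set, spaces included, so a string with only five digits can still pass.

```python
import re

# 10 to 20 characters, each one a digit or whitespace.
PHONE_LENGTH = r"^[0-9\s]{10,20}$"

def plausible_length(value: str) -> bool:
    """Return True if value is 10-20 characters of digits/whitespace."""
    return re.fullmatch(PHONE_LENGTH, value) is not None

def plausible_digits(value: str) -> bool:
    """Alternative: remove whitespace first, then require 10-15 digits."""
    digits_only = re.sub(r"\s", "", value)
    return re.fullmatch(r"^[0-9]{10,15}$", digits_only) is not None

print(plausible_length("01632 960000"))  # 12 characters
print(plausible_length("960000"))        # too short
print(plausible_length("0" * 21))        # too long

# Spaces count towards the limit, so five digits plus five spaces pass...
print(plausible_length("0 1 2 3 4 "))
# ...whereas counting the digits on their own rejects it.
print(plausible_digits("0 1 2 3 4 "))
```

Whether the single-pattern version is good enough depends on how often people pad their numbers with spaces; stripping whitespace first is just one option.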

Excluding poor inputs

In this example, I do not want any leads from 070 numbers as, in the past, they have yielded poor results. I also always want to exclude 0123456789, as it is a rubbish number only used for testing purposes.

^(?!070)(?!0123456789)[0-9\s]{10,20}$

This makes use of the following RegEx constructs:

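  • () groups part of the pattern together
  • (?!...) turns a group into a “do not include” check, known as a “negative lookahead”: the match fails if the text at that point matches what is inside. Because ours sit straight after ^, they reject any string that starts with 070 or 0123456789
  • Placing lookaheads side by side means every one of them must pass, which acts as an “and” operation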
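To check the lookaheads behave as described, here is a minimal Python sketch (the `re` module, the `valid_phone` name and the sample numbers are assumptions for illustration). It also shows a side effect of the lookahead being a “starts with” check: anything beginning with 0123456789 is rejected, not just that exact number. Adding $ inside the lookahead, as in (?!0123456789$), is one way to reject only the exact test number.

```python
import re

# Reject anything starting with 070 or with the test number 0123456789,
# then require 10-20 digits/whitespace characters as before.
PHONE = r"^(?!070)(?!0123456789)[0-9\s]{10,20}$"

def valid_phone(value: str) -> bool:
    return re.fullmatch(PHONE, value) is not None

for value in ["01632 960000", "07012 345678", "0123456789", "01234567890"]:
    print(f"{value!r:>16} -> {valid_phone(value)}")

# Variant that only rejects the exact test number, not longer numbers
# that happen to start with it.
EXACT = r"^(?!070)(?!0123456789$)[0-9\s]{10,20}$"
print(re.fullmatch(EXACT, "01234567890") is not None)
```

Which variant is right depends on whether longer numbers starting with 0123456789 are also junk in your data.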

To set up your own “do not include” rules, change the text inside each (?!...) group, adding one group per term you want to exclude.
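If the list of excluded terms changes often, one option is to build the pattern from a list rather than editing it by hand. This is a sketch under assumptions: Python's `re` module, and a hypothetical `EXCLUDED_PREFIXES` list and `build_phone_pattern` helper that are not part of any library.

```python
import re

# Hypothetical list of prefixes to reject; edit to suit your own data.
EXCLUDED_PREFIXES = ["070", "0123456789"]

def build_phone_pattern(excluded, min_len=10, max_len=20):
    """Build one negative lookahead per excluded prefix, then the set and length check."""
    # re.escape stops characters such as + or ( in a prefix being read as RegEx syntax
    lookaheads = "".join(f"(?!{re.escape(prefix)})" for prefix in excluded)
    # {{ and }} are literal curly brackets inside an f-string
    return rf"^{lookaheads}[0-9\s]{{{min_len},{max_len}}}$"

pattern = build_phone_pattern(EXCLUDED_PREFIXES)
print(pattern)  # ^(?!070)(?!0123456789)[0-9\s]{10,20}$
print(re.fullmatch(pattern, "07012 345678") is not None)
```

The printed pattern can be pasted into RegExr or Regex101 to test it interactively.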

If you have any questions, feel free to get in touch.

Resources

RegExr – For testing RegEx (My personal favourite)

Regex101 – For testing RegEx (But you might prefer this)

Regular-Expressions.info – For reference purposes
