Lesson 1: Introducing Regular Expressions

Regular expressions are tools used to search text for patterns and replace or manipulate text matches. They can be used to solve problems like searching for text regardless of case, validating email addresses, and replacing substrings. Regular expressions use a specialized syntax and language to concisely describe text patterns. They are built into many programming languages and applications to perform powerful search and replace operations.

Uploaded by

Me Its

Lesson 1

Introducing Regular Expressions

In this lesson you’ll learn what regular expressions are and what they can do for you.

UNDERSTANDING THE NEED


Regular expressions (often shortened as RegEx or regex) are tools, and like all tools, regular expressions are designed
to solve a very specific problem. The best way to understand regular expressions and what they do is to understand
the problem they solve.

Consider the following scenarios:

You are searching for a file containing the text car (regardless of case) but do not want to also locate car in the
middle of a word (for example, scar, carry, and incarcerate).

You are generating a Web page and need to display text retrieved from a database. Text may contain URLs, and you
want those URLs to be clickable in the generated page (so that instead of generating just text, you generate a valid
HTML <a href></a>).

You create an app with a form that prompts for user information including e-mail address. You need to verify that
specified addresses are formatted correctly (that they are syntactically valid).

You are editing source code and need to replace all occurrences of size with iSize, but only size and not size as
part of another word.

You are displaying a list of all files in your computer file system and want to filter so that you locate only files
containing the text Application.

You are importing data into an application. The data is tab delimited and your application supports CSV format files
(one row per line, comma-delimited values, each possibly enclosed with quotes).

You need to search a file for some specific text, but only at a specific location (perhaps at the start of a line or at the
end of a sentence).

All these scenarios present unique programming challenges. And all of them can be solved in just about any language
that supports conditional processing and string manipulation. But how complex a task would the solution become?
You would need to loop through words or characters one at a time, perform all sorts of if statement tests, track lots of
flags so as to know what you had found and what you had not, check for whitespace and special characters, and more.
And you would need to do it all manually, over and over.
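
To make that cost concrete, here is a minimal sketch of the manual approach for just the first scenario (finding car as a whole word, regardless of case). The function name contains_whole_word and the sample sentences are assumptions made up for illustration; the code simply walks the text one character at a time and tracks the word it is currently inside.

```python
def contains_whole_word(text: str, word: str) -> bool:
    """Manually check whether `word` occurs in `text` as a whole word, ignoring case."""
    target = word.lower()
    current = []  # characters of the word currently being accumulated

    # A trailing space guarantees that the final word is compared too.
    for ch in text + " ":
        if ch.isalnum() or ch == "_":
            # Still inside a word: keep collecting characters.
            current.append(ch.lower())
        else:
            # A non-word character ends the current word: compare, then reset.
            if "".join(current) == target:
                return True
            current = []
    return False


print(contains_whole_word("My Car is red.", "car"))
print(contains_whole_word("A scar will carry over.", "car"))
```

Even this small helper needs a loop, a buffer, and an end-of-text special case, and it handles only one of the seven scenarios.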

Or you could use regular expressions. Each of the preceding challenges can be solved using well-crafted statements—
highly concise strings containing text and special instructions—statements that may look like this:

\b[Cc][Aa][Rr]\b

Note
Don’t worry if the previous line does not make sense yet; it will shortly.

HOW REGULAR EXPRESSIONS ARE USED


Look at the problem scenarios again and you will notice that they all fall into one of two types: either information is
being located (search) or information is being located and edited (replace). In fact, at its simplest, that is all that
regular expressions are ever used for: search and replace. Every regular expression either matches text (performing a
search) or matches and replaces text (performing a replace).
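
As a rough illustration of the two kinds of operation, the following sketch uses Python's re module (Python is an assumption here; the text does not prescribe a language). The sample source string and the variable names are made up. The search checks whether any line starts with the word size, and the replace renames size to iSize only where it stands alone as a word.

```python
import re

# Made-up sample text standing in for a source file.
source = "size = 10\nresize(size)\nfont_size = size * 2"

# Search: is there a line that starts with the whole word "size"?
# re.MULTILINE lets ^ match at the start of every line, not just the string.
starts_with_size = re.search(r"^size\b", source, flags=re.MULTILINE) is not None

# Replace: rename size to iSize, but leave resize and font_size alone.
# \b matches only at a boundary between a word and a non-word character.
renamed = re.sub(r"\bsize\b", "iSize", source)

print(starts_with_size)
print(renamed)
```
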
RegEx Searches
Regular expressions are used in searches when the text to be searched for is highly dynamic, as in searching
for car in the scenario described earlier. For starters, you need to locate car or CAR or Car or even CaR; that’s the
easy part (many search tools are capable of performing searches that are not case sensitive). The trickier part is
ensuring that scar, carry, and incarcerate are not matched. Some more sophisticated editors have Match
Only Whole Word options, but many don’t, and you may not be making this change in a document you are editing.
Using a regular expression for the search, instead of just the text car, solves the problem.
Tip
Want to know what the solution to this one is? You’ve actually seen it already—it is the sample statement shown
previously, \b[Cc][Aa][Rr]\b.
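
One way to try this pattern is with Python's re module (an assumption; any regex-capable tool would do). The sample strings below are made up, apart from the words quoted in the text.

```python
import re

# The pattern from the text: a word boundary, then c/a/r in either case,
# then another word boundary.
car_pattern = re.compile(r"\b[Cc][Aa][Rr]\b")

samples = ["My car is red", "CAR for sale", "a CaR", "scar", "carry", "incarcerate"]

# Map each sample to whether the pattern is found anywhere inside it.
results = {s: bool(car_pattern.search(s)) for s in samples}

for sample, found in results.items():
    print(f"{sample!r}: {found}")
```

The first three samples contain car as a whole word; the last three contain it only inside a longer word, so the word boundaries prevent a match.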

It is worth noting that testing for equality (for example, does this user-specified e-mail address match this regular
expression) is a search operation. The entire user-provided string is being searched for a match (in contrast to a
substring search, which is what searches usually are).
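
The difference between the two kinds of search can be sketched in Python, where re.search looks for the pattern anywhere in a string and re.fullmatch requires the entire string to match. The e-mail pattern below is a deliberately simplified assumption for illustration and is not a complete address validator; the addresses are made up.

```python
import re

# Simplified, illustrative e-mail pattern (an assumption, not a full validator):
# some name characters, an @, then at least two dot-separated domain parts.
EMAIL = r"[\w.+-]+@[\w-]+(\.[\w-]+)+"

sentence = "contact: user@example.com today"
address = "user@example.com"

# Substring search: the pattern may match anywhere inside the text.
found_in_sentence = re.search(EMAIL, sentence) is not None

# Equality-style test: the whole string must match the pattern.
sentence_is_address = re.fullmatch(EMAIL, sentence) is not None
address_is_address = re.fullmatch(EMAIL, address) is not None

print(found_in_sentence, sentence_is_address, address_is_address)
```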

RegEx Replaces
Regular expression searches are immensely powerful, very useful, and not that difficult to learn. As such, many of the
lessons and examples that you will run into are matches. However, the real power of regex is seen in replace
operations, such as in the earlier scenario in which you replace URLs with clickable URLs. For starters, this requires
that you be able to locate URLs within text (perhaps searching for strings that start
with http:// or https:// and end with a period or a comma or whitespace). Then it also requires that you
replace the found URL with two occurrences of the matched string with embedded HTML so that:
http://www.forta.com/

is replaced with

<a href="http://www.forta.com">http://www.forta.com/</a>

Or perhaps the text being located is just an address, and not a fully qualified URL, like this:

www.forta.com

which would also need to be turned into

<a href="http://www.forta.com">http://www.forta.com/</a>

The Search and Replace option in most applications could not handle this type of replace operation, but this task is
incredibly easy using a regular expression.
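
A minimal sketch of such a replace in Python follows. The names URL_PATTERN and linkify are hypothetical, and the pattern is a simplification built from the description above: it accepts text starting with http://, https://, or www. and stops before whitespace, a comma, or a trailing period. For brevity it uses the normalized URL as both the href and the link text, and it does not escape HTML.

```python
import re

# Hypothetical, simplified URL pattern: a scheme or "www." prefix, then any run
# of characters that are not whitespace or commas, not ending in a period.
URL_PATTERN = re.compile(r"\b(?:https?://|www\.)[^\s,]*[^\s,.]")


def linkify(text: str) -> str:
    """Wrap each URL-like substring of `text` in an HTML anchor tag."""

    def make_link(match):
        url = match.group(0)
        # Bare addresses such as www.forta.com get a scheme added.
        if not url.startswith(("http://", "https://")):
            url = "http://" + url
        # The matched text appears twice: once as the target, once as the label.
        return f'<a href="{url}">{url}</a>'

    return URL_PATTERN.sub(make_link, text)


print(linkify("Visit http://www.forta.com/ today."))
print(linkify("See www.forta.com."))
```

Passing a function to sub is one design choice among several; a replacement string with a backreference would also work for the simpler case where no scheme needs to be added.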

SO WHAT EXACTLY IS A REGULAR EXPRESSION?


Now that you know what regular expressions are used for, a definition is in order. Simply put, regular expressions are
strings that are used to match and manipulate text. Regular expressions are created using the regular expression
language, a specialized language designed to do everything that was just discussed and more. Like any language,
regular expressions have a specific syntax and instructions that you must learn, and that is what this book will teach
you.

The regular expression language is not a full programming language. It is usually not even an actual program or utility
that you can install and use. More often than not, regular expressions are mini-languages built into other languages
or products. The good news is that just about any decent language or tool these days supports regular expressions.
The bad news is that the regular expression language itself is not going to look anything like the language or tool you
are using it with. The regular expression language is a language unto itself—and not the most intuitive or obvious
language at that.
Note

Regular expressions originate from research in the 1950s in the field of mathematics. Years later, the principles and
ideas derived from this early work made their way into the Unix world, into the Perl language and utilities such
as grep. For many years, regular expressions (used in the scenarios previously described) were the exclusive domain
of the Unix community, but this has changed, and now regular expressions are supported in a variety of forms on just
about every computing platform.

To put all this into perspective, the following are all valid regular expressions (and all will make sense shortly):

Ben
.
www\.forta\.com
[a-zA-Z0-9_.]*
<[Hh]1>.*</[Hh]1>
\r\n\r\n
\d{3,3}-\d{3,3}-\d{4,4}
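
If you want to see these patterns do something before they are explained, the following Python sketch runs each one against a sample string and prints the first match. The patterns are the ones listed above; every sample string, and the helper name first_match, is a made-up assumption chosen only to give each pattern something to find.

```python
import re

# (pattern, sample text) pairs. The patterns come from the list above;
# the sample texts are invented for illustration.
examples = [
    (r"Ben", "Hello, my name is Ben."),
    (r".", "abc"),
    (r"www\.forta\.com", "Visit www.forta.com today"),
    (r"[a-zA-Z0-9_.]*", "ben.forta_01 rest"),
    (r"<[Hh]1>.*</[Hh]1>", "<h1>Welcome</H1>"),
    (r"\r\n\r\n", "line one\r\n\r\nline two"),
    (r"\d{3,3}-\d{3,3}-\d{4,4}", "Call 248-555-1234 now"),
]


def first_match(pattern, text):
    """Return the first substring of `text` matched by `pattern`, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


for pattern, text in examples:
    print(f"{pattern!r:32} {first_match(pattern, text)!r}")
```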

It is important to note that syntax is the easiest part of mastering regular expressions. The real challenge, however, is
learning how to apply that syntax, how to dissect problems into solvable regex solutions. That is something that
cannot be taught by simply reading a book, but like any language, mastery comes with practice.

USING REGULAR EXPRESSIONS


As previously explained, there is no regular expressions program; it is not an application you run nor software you
buy or download. Rather, the regular expressions language is implemented in lots of software products, languages,
utilities, and development environments.

How regular expressions are used and how regular expression functionality is exposed varies from one application to
the next. Some applications have menu options and dialog boxes used to access regular expressions, whereas
programming languages typically provide functions or classes or objects that expose regex functionality.
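
As one example of how a programming language exposes regex functionality, Python's standard re module offers both module-level functions and compiled pattern objects. This is a sketch of that one implementation only; other languages and tools expose the same ideas differently. The sample text and variable names are made up.

```python
import re

text = "Car, scar, CAR"

# Function style: pass the pattern string to a module-level function.
found = re.findall(r"\b[Cc][Aa][Rr]\b", text)

# Object style: compile the pattern once, then call methods on the result.
car = re.compile(r"\b[Cc][Aa][Rr]\b")
found_again = car.findall(text)
replaced = car.sub("truck", text)

print(found)
print(found_again)
print(replaced)
```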

Furthermore, not all regular expression implementations are the same. There are often subtle (and sometimes not so
subtle) differences between syntax and features.

Appendix A, “Regular Expressions in Popular Applications and Languages,” provides usage details and notes for
many of the applications and languages that support regular expressions. Before you proceed to the next lesson,
consult that appendix to learn the specifics pertaining to the application or language that you will be using.

To help you get started quickly, you’ll find links to online regular expression testing tools on this book’s Web page at

http://forta.com/books/0134757068/

These online tools are often the simplest way to experiment with regular expressions.

BEFORE YOU GET STARTED


Before you go any further, take note of a couple of important points:

When using regular expressions, you will discover that there are almost always multiple solutions to any problem.
Some may be simpler, some may be faster, some may be more portable, and some may be more capable. There is
rarely a right or wrong solution when writing regular expressions (as long as your solution works, of course).

As already stated, differences exist between regex implementations. As much as possible, the examples and lessons
used in this book apply to all major implementations, and differences or incompatibilities are noted as such.

As with any language, the key to learning regular expressions is practice, practice, practice.
Note
I strongly suggest that you try each and every example as you work through this book.
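
The point about multiple solutions can be seen with the car search from earlier in this lesson. The sketch below, in Python (an assumption), shows three patterns that agree on the sample words: the explicit character sets from the text, a case-insensitive flag, and an inline flag. The flag-based forms rely on features of Python's re module, so they may be less portable to other implementations.

```python
import re

# Three ways to match "car" as a whole word, regardless of case.
explicit = re.compile(r"\b[Cc][Aa][Rr]\b")         # case spelled out per letter
flagged = re.compile(r"\bcar\b", re.IGNORECASE)    # case handled by a flag
inline = re.compile(r"(?i)\bcar\b")                # flag embedded in the pattern

samples = ["car", "CAR", "Car", "CaR", "scar", "carry", "incarcerate"]

for s in samples:
    answers = [bool(p.search(s)) for p in (explicit, flagged, inline)]
    print(f"{s!r}: {answers}")
```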

SUMMARY
Regular expressions are one of the most powerful tools available for text manipulation. The regular expressions
language is used to construct regular expressions (the actual constructed string is called a regular expression), and
regular expressions are used to perform both search and replace operations.
