Regular Expressions

The document discusses regular expressions (regex) and their use in searching text. It provides examples of common regex patterns using special characters such as parentheses, brackets, periods, asterisks, plus signs, question marks, curly brackets, pipes, carets, and dollar signs. These special characters allow matching digits, non-digits, whitespace, alphanumeric characters, repetitions, character classes, single characters, the start and end of lines, alternations, and more. The document also provides example patterns for matching chromosome names, comma-separated integers, and comment lines.

Uploaded by

gotls

Syntax

The select tool searches the data for lines that contain, or do not contain, a match to the given pattern.
The pattern is written as a regular expression. A regular expression is a pattern that describes a
piece of text to be matched.
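
The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the select tool's own code: the function name select_lines, its keep_matching parameter, and the sample data are all hypothetical, and Python's re module is assumed as the regex engine.

```python
import re

def select_lines(lines, pattern, keep_matching=True):
    """Return the lines that do (or do not) contain a match to pattern."""
    regex = re.compile(pattern)
    # search() looks for a match anywhere in the line, not only at its start.
    return [line for line in lines if bool(regex.search(line)) == keep_matching]

# Hypothetical sample data: two BED-like records and one header line.
data = ["chr1\t100\t200", "# header", "chrX\t5\t50"]

matching = select_lines(data, r"^chr")
not_matching = select_lines(data, r"^chr", keep_matching=False)

print(matching)
print(not_matching)
```

Passing keep_matching=False inverts the selection, which corresponds to the "not containing" mode.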

· ( ) { } [ ] . * ? + ^ $ | are all special characters. \ can be used to "escape" a special character,
allowing that special character to be searched for.
· \A matches the beginning of a string (but not an internal line).
· \d matches a digit, same as [0-9].
· \D matches a non-digit.
· \s matches a whitespace character.
· \S matches anything BUT a whitespace.
· \t matches a tab.
· \w matches an alphanumeric character or an underscore.
· \W matches anything but an alphanumeric character or an underscore.
· ( .. ) groups a particular pattern.
· \Z matches the end of a string (but not an internal line).
· {n}, {n,} or {n,m} specifies an expected number of repetitions of the preceding pattern.
· {n} The preceding item is matched exactly n times.
· {n,} The preceding item is matched n or more times.
· {n,m} The preceding item is matched at least n times but not more than m times.
· [ ... ] creates a character class. Within the brackets, single characters can be placed. A dash (-)
may be used to indicate a range such as a-z.
· . Matches any single character except a newline.
· * The preceding item will be matched zero or more times.
· ? The preceding item is optional and matched at most once.
· + The preceding item will be matched one or more times.
· ^ has two meanings: it matches the beginning of a line or string, and it indicates negation in a
character class. For example, [^...] matches every character except the ones inside the brackets.
· $ matches the end of a line or string.
· | Separates alternate possibilities.
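
A few of the items above can be tried out with a short sketch. Python's re module is assumed here (the source does not name an engine), and the sample strings are made up. Note that in Python, ^ and $ match at internal lines only when the re.MULTILINE flag is set, whereas \A always anchors to the start of the whole string.

```python
import re

text = "gene_1\tchr2  pos=345"  # made-up sample line

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of letters, digits, or underscores
fields = re.split(r"\s+", text)     # split on any run of whitespace

# An escaped period matches a literal "." instead of any character.
escaped = re.search(r"1\.5", "105 1.5")

# ^ versus \A on a two-line string, with re.MULTILINE enabled.
two_lines = "first\nsecond"
line_starts = re.findall(r"^\w+", two_lines, flags=re.MULTILINE)
string_start = re.findall(r"\A\w+", two_lines, flags=re.MULTILINE)

print(digits, words, fields)
print(escaped.group(), escaped.start())
print(line_starts, string_start)
```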

Example

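· ^chr([0-9A-Za-z])+ would match lines that begin with a chromosome name ("chr" followed by
letters or digits), such as lines in a BED format file.
· (ACGT){1,5} would match between 1 and 5 consecutive copies of "ACGT".
· ([^,][0-9]{1,3})(,[0-9]{3})* would match a large integer that is properly separated with commas
such as 23,078,651. Note that the leading [^,] accepts any single non-comma character, so the
first group must be at least two characters long and a number such as 1,000 is not matched as a whole.
· (abc)|(def) would match either "abc" or "def".
· ^\W+# would match a comment line in which the # is preceded by one or more non-alphanumeric
characters, such as an indented comment. A line that starts directly with a single # is not
matched; ^\W*# covers that case as well.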
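
The example patterns can be checked with a small sketch, again assuming Python's re module as the engine. The input strings are made up for illustration, and the variable names are not from the source.

```python
import re

chrom = re.compile(r"^chr([0-9A-Za-z])+")
repeat = re.compile(r"(ACGT){1,5}")
number = re.compile(r"([^,][0-9]{1,3})(,[0-9]{3})*")
either = re.compile(r"(abc)|(def)")
comment = re.compile(r"^\W+#")

# A made-up BED-like line: the match stops at the first tab.
m_chrom = chrom.search("chr7\t127471196\t127472363").group()

# The quantifier is greedy, so all three consecutive copies are taken.
m_repeat = repeat.search("TTACGTACGTACGTGG").group()

# fullmatch() requires the whole string to fit the pattern.
whole_number = number.fullmatch("23,078,651") is not None
short_number = number.fullmatch("1,000") is not None

m_either = [m.group() for m in either.finditer("abcxdef")]

# \W+ needs at least one non-word character before the "#".
indented_comment = comment.search("  # note") is not None
plain_comment = comment.search("# note") is not None

print(m_chrom, m_repeat, whole_number, short_number)
print(m_either, indented_comment, plain_comment)
```

In this sketch the number pattern rejects "1,000" as a whole and the comment pattern rejects a line that starts directly with a single "#", which are worth keeping in mind when adapting these patterns.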

To learn the basics of regex, we need to start with some special
characters known as metacharacters. They help us create more
complex regex search terms. The basic metacharacters are listed
below:

. (dot)   will match any single character

[ ]       will match any one character from the set or range given in the brackets

[^ ]      will match any character except the ones given in the brackets

*         will match zero or more of the preceding item

+         will match one or more of the preceding item

?         will match zero or one of the preceding item

{n}       will match exactly n of the preceding item

{n,}      will match n or more of the preceding item

{n,m}     will match between n and m of the preceding item

{,m}      will match at most m of the preceding item (accepted by some regex flavors only)

\         is an escape character, used when we need to include one of the metacharacters in our search

We will now discuss all these metacharacters with examples.
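
As a first illustration, the quantifiers from the list can be compared side by side. This is a minimal sketch that assumes Python's re module, which does accept the {,m} form. The sample words and the helper name matching are made up for the example.

```python
import re

samples = ["ct", "cat", "caat", "caaat", "caaaat"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = matching(r"ca*t")             # zero or more "a"
plus = matching(r"ca+t")             # one or more "a"
optional = matching(r"ca?t")         # zero or one "a"
exactly_two = matching(r"ca{2}t")    # exactly two "a"
two_or_more = matching(r"ca{2,}t")   # two or more "a"
two_to_three = matching(r"ca{2,3}t") # two or three "a"
up_to_two = matching(r"ca{,2}t")     # at most two "a"

# A negated class: runs of characters that are neither vowels nor whitespace.
negated = re.findall(r"[^aeiou\s]+", "regular expression")

for name, result in [("*", star), ("+", plus), ("?", optional),
                     ("{2}", exactly_two), ("{2,}", two_or_more),
                     ("{2,3}", two_to_three), ("{,2}", up_to_two)]:
    print(name, result)
print(negated)
```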
