Regular Expression

Uploaded by

rathourabhishevk5

Regular Expression

Basic Regular Expression Patterns

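Square brackets [] specify a disjunction of characters: /[wW]oodchuck/ matches Woodchuck or woodchuck.

Square brackets plus the dash - specify a range: /[0-9]/ matches any single digit.

The caret ^ negates a bracket expression when it is the first symbol inside the brackets: /[^a-z]/ matches any character that is not a lowercase letter. Anywhere else inside the brackets, ^ just means a caret.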
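
The bracket operators can be tried out in Python's re module. This is a minimal sketch; the sample sentence is made up for illustration.

```python
import re

text = "Woodchucks live in burrows; a woodchuck weighs 2 to 6 kg."

# [wW] is a disjunction: either w or W
woodchucks = re.findall(r"[wW]oodchuck", text)   # ['Woodchuck', 'woodchuck']

# [0-9] is a range: any single digit
digits = re.findall(r"[0-9]", text)              # ['2', '6']

# A leading caret negates the set: first character that is not a lowercase letter
first_non_lower = re.search(r"[^a-z]", text).group()   # 'W'

print(woodchucks, digits, first_non_lower)
```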

The question mark ? marks optionality: zero or one instance of the previous character or expression.

* - zero or more occurrences of the immediately preceding character or regular expression (the Kleene star)

/a*/ means "any string of zero or more a’s". It matches a or aaaaaa, but it also matches the empty string at the start of Off Minor, since Off Minor begins with zero a’s.
/aa*/ means one a followed by zero or more a’s.
/[ab]*/ means "zero or more a’s or b’s", such as aaaa, ababab, or bbbb.

+ - one or more occurrences of the immediately preceding character or regular expression (the Kleene plus)

The period . is a wildcard that matches any single character: /beg.n/ matches begin, began, or begun.
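
The optionality, Kleene star, Kleene plus, and wildcard operators can be checked with a short Python sketch. The test strings other than "Off Minor" are made up for illustration.

```python
import re

# ? makes the preceding character optional
optional = re.findall(r"colou?r", "color colour colouur")   # ['color', 'colour']

# a* happily matches zero a's, so it matches the empty string at the start
empty_match = re.match(r"a*", "Off Minor").group()          # ''

# aa* and a+ both mean "one or more a's"
star_runs = re.findall(r"aa*", "a baa aaaaaa")              # ['a', 'aa', 'aaaaaa']
plus_runs = re.findall(r"a+", "a baa aaaaaa")

# [ab]* matches any mix of a's and b's
mixed = re.fullmatch(r"[ab]*", "ababab") is not None        # True

# . matches any single character (in Python, any character except a newline)
wildcard = re.findall(r"beg.n", "begin began begun begn")   # ['begin', 'began', 'begun']

print(optional, repr(empty_match), star_runs, mixed, wildcard)
```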

Anchors in regular expressions

The caret ^ has three uses:
• to match the start of a line
• to indicate a negation inside square brackets
• just to mean a caret

\b matches a word boundary, so /\b99\b/ matches 99 only when it stands alone as a word:

• Matches 99 in "There are 99 bottles of beer on the wall" (because 99 follows a space).
• Does not match 99 in "There are 299 bottles of beer on the wall" (since 99 follows a digit).
• Matches 99 in "$99" (since 99 follows a dollar sign ($), which is not a digit, underscore, or letter).
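
A small Python check of the word-boundary behaviour described above, using the same three example strings. The sentence used for the ^ anchor is made up for illustration.

```python
import re

pattern = re.compile(r"\b99\b")

sentences = [
    "There are 99 bottles of beer on the wall",   # 99 follows a space
    "There are 299 bottles of beer on the wall",  # 99 follows a digit
    "$99",                                        # 99 follows a non-word character
]

# True where the pattern finds a standalone 99
found = [bool(pattern.search(s)) for s in sentences]   # [True, False, True]

# ^ anchors the match to the start of the line
starts_with_the = bool(re.search(r"^The", "The dog barked"))       # True
the_not_at_start = bool(re.search(r"^The", "See The dog"))         # False

print(found, starts_with_the, the_not_at_start)
```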
Disjunction, Grouping, and Precedence
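To search for either the string cat or the string dog, use the disjunction operator (the pipe symbol): /cat|dog/
To match both guppy and guppies, apply the disjunction only to the suffixes by grouping them in parentheses: /gupp(y|ies)/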
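
The effect of grouping on disjunction can be seen in a short Python sketch. One assumption to note: the group is written as non-capturing, (?:...), only because re.findall would otherwise return the group contents instead of the whole match.

```python
import re

# Plain disjunction between two whole strings
pets = re.findall(r"cat|dog", "my cat chased a dog")           # ['cat', 'dog']

# Parentheses restrict the disjunction to the suffixes y / ies
fish = re.findall(r"gupp(?:y|ies)", "one guppy, two guppies")  # ['guppy', 'guppies']

# Without parentheses the pattern means "guppy" OR "ies"
ungrouped = re.findall(r"guppy|ies", "ponies")                 # ['ies']

print(pets, fish, ungrouped)
```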

Parentheses also make it possible to match repeated instances of a string, such as:

Column 1 Column 2 Column 3

To match the word Column, followed by a number and optional spaces, with the whole pattern repeated zero or more times: /(Column [0-9]+ *)*/
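
A quick Python check of the repeated-group idea, assuming the pattern /(Column [0-9]+ *)*/ for "the word Column, a number, optional spaces, repeated zero or more times":

```python
import re

header = "Column 1 Column 2 Column 3"

# The parenthesised group is repeated by the outer *
repeated = re.fullmatch(r"(Column [0-9]+ *)*", header) is not None   # True

# Without the group, the pattern describes only a single column label
single = re.fullmatch(r"Column [0-9]+ *", header) is not None        # False

print(repeated, single)
```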
Operator precedence, from highest to lowest:

1. Parentheses: ()
2. Counters: * + ? {}
3. Sequences and anchors: the ^my end$
4. Disjunction: |

Counters have a higher precedence than sequences, so /the*/ matches theeeee but not thethe.
Sequences have a higher precedence than disjunction, so /the|any/ matches the or any but not thany or theny.
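
The two precedence examples can be verified directly in Python. re.fullmatch is used so that a pattern counts as matching only if it covers the whole string.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# Counters bind tighter than sequences: * applies only to the final e
star_on_e = (matches(r"the*", "theeeee"), matches(r"the*", "thethe"))   # (True, False)

# Parentheses override this so that * applies to the whole sequence
star_on_group = matches(r"(the)*", "thethe")                            # True

# Sequences bind tighter than disjunction: the alternatives are "the" and "any"
alternation = [matches(r"the|any", w) for w in ("the", "any", "thany", "theny")]

print(star_on_e, star_on_group, alternation)
```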
More Operators
Aliases for common sets of characters:

Regular expression operators for counting

Some characters that need to be backslashed
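
A brief sketch of the three kinds of operators named above, using Python's re module: aliases such as \d and \w, counters such as {n}, and backslash-escaped special characters. The sample strings are made up for illustration.

```python
import re

text = "Room 42 costs $3.50 on 2024-01-15."

# \d is an alias for [0-9]; + means one or more
digit_runs = re.findall(r"\d+", text)    # ['42', '3', '50', '2024', '01', '15']

# {n} is a counter: exactly n occurrences of the previous expression
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group()   # '2024-01-15'

# $ and . are special characters, so they are backslashed to match literally
price = re.search(r"\$\d+\.\d{2}", text).group()       # '$3.50'

# \w is an alias for letters, digits, and underscore
words = re.findall(r"\w+", "a_b c-d")                  # ['a_b', 'c', 'd']

print(digit_runs, date, price, words)
```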
Reference

Code: https://github.com/codebasics/py/blob...
Exercise: https://github.com/codebasics/py/blob.
Substitution, Capture Groups, and ELIZA
The substitution operator s/regexp1/pattern/ replaces a string matched by a regular expression with another string:

s/colour/color/

To change "the 35 boxes" to "the <35> boxes":

- Put parentheses ( and ) around the first pattern and use the number operator \1 in the second pattern to refer back to it.
- s/([0-9]+)/<\1>/

The number operator can also be used inside the search pattern itself, to require that the same string appear twice, as in "the Xer they were, the Xer they will be":
/the (.*)er they were, the \1er they will be/
E.g., the bigger they were, the bigger they will be
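
In Python the substitution operator corresponds to re.sub, and \1 works both in the replacement and inside the pattern. A minimal sketch using the examples from this section; the non-matching sentence is made up for contrast.

```python
import re

# s/colour/color/
american = re.sub(r"colour", "color", "my favourite colour")   # 'my favourite color'

# s/([0-9]+)/<\1>/ : \1 in the replacement refers back to the parenthesised group
bracketed = re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes")      # 'the <35> boxes'

# \1 inside the pattern forces the same string to appear twice
pattern = r"the (.*)er they were, the \1er they will be"
same = re.search(pattern, "the bigger they were, the bigger they will be")
different = re.search(pattern, "the bigger they were, the faster they will be")

print(american, bracketed, same.group(1), different)
```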

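Capture group
Parentheses store the text they match in a numbered register: \1 refers to the first group, \2 to the second, and so on.
/the (.*)er they (.*), the \1er we \2/
E.g., the faster they ran, the faster we ran

Non-capturing group
Writing ?: after the opening parenthesis groups without capturing, so here \1 refers to (people|cats).
/(?:some|a few) (people|cats) like some \1/
E.g., matches "some cats like some cats" but not "some cats like some a few"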
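
Both kinds of group behave the same way in Python's re module. A short sketch with the example sentences from above; the second sentence for the capture pattern is made up for contrast.

```python
import re

# Two capture groups: \1 and \2 must repeat what groups 1 and 2 matched
two_groups = r"the (.*)er they (.*), the \1er we \2"
hit = re.search(two_groups, "the faster they ran, the faster we ran")
miss = re.search(two_groups, "the faster they ran, the faster we ate")

# (?:...) groups without capturing, so \1 is the (people|cats) group
non_capturing = r"(?:some|a few) (people|cats) like some \1"
cats = re.search(non_capturing, "some cats like some cats")
not_cats = re.search(non_capturing, "some cats like some a few")

print(hit.groups(), miss, cats.group(1), not_cats)
```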
ELIZA works through a cascade of regular expression substitutions, each of which matches and changes some part of the input lines. After the input is uppercased, substitutions change all instances of MY to YOUR, and I’M to YOU ARE, and so on.
Text normalization
1. Tokenizing (segmenting) words
2. Normalizing word formats
3. Segmenting sentences
Simple Unix Tools for Word Tokenization
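
Word tokenization with Unix tools is usually done by splitting text on non-letters, then sorting and counting the resulting words. The Python sketch below does the same job under that assumption; the function name word_frequencies is hypothetical.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Split on non-letters, lowercase, and count each word type."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return Counter(words).most_common()

# Most frequent word types first
print(word_frequencies("The cat and the hat"))
```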
Word Tokenization
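Top-down (rule-based) tokenization
• Define a standard and implement rules
Bottom-up tokenization
• Use simple statistics of letter sequences to decide how to break up words
Byte-Pair Encoding: A Bottom-up Tokenization Algorithm
• Deals with unknown words by splitting them into known subword units
• Training corpus: low, new, newer
• Test corpus: lower (never seen as a whole word, but it can be built from subwords such as low and er)
Byte-Pair Encoding Algorithm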
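
A minimal sketch of the byte-pair encoding learner: start from single characters, then repeatedly merge the most frequent adjacent pair of symbols. The toy word counts below are an assumption (they add lowest and wider to the words above so that er is frequent enough to be merged), "_" is used as an end-of-word marker, and ties are broken by first occurrence.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of pair with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(word_counts, num_merges):
    """Return the ordered list of merges learned from the word counts."""
    vocab = {tuple(word) + ("_",): n for word, n in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # first pair with the highest count
        merges.append(best)
        vocab = {tuple(merge_symbols(list(s), best)): n for s, n in vocab.items()}
    return merges

def segment(word, merges):
    """Tokenize a new word by replaying the learned merges in order."""
    symbols = list(word) + ["_"]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols

# Assumed toy corpus counts, for illustration only
word_counts = {"low": 5, "lowest": 2, "newer": 6, "wider": 3, "new": 2}
merges = learn_bpe(word_counts, 6)
print(merges)
print(segment("lower", merges))   # ['low', 'er_']
```

The unseen word lower is segmented into the learned subwords low and er_ instead of being treated as unknown.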
Word Normalization and Stemming

Normalization : To Standard Format

• Applications like IR: reduce all letters to lower case
• Since users tend to use lower case
• Possible exception: upper case in mid-sentence?
• e.g., General Motors
• Fed vs. fed
• SAIL vs. sail
• For sentiment analysis, MT, Information extraction
• Case is helpful (US versus us is important)
Lemmatization

• Reduce inflections or variant forms to base form

• am, are, is → be
• car, cars, car's, cars' → car
• the boy's cars are different colors → the boy car be different color
• Lemmatization: have to find correct dictionary headword form
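
A toy sketch of dictionary-based lemmatization in Python. The LEMMAS table is a hypothetical hand-built lookup covering only the examples above; a real lemmatizer needs a full dictionary and morphological analysis.

```python
# Hypothetical lookup table: inflected form -> dictionary headword
LEMMAS = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
    "boy's": "boy", "colors": "color",
}

def lemmatize(sentence):
    """Replace each whitespace-separated token by its headword, if known."""
    return " ".join(LEMMAS.get(token, token) for token in sentence.split())

print(lemmatize("the boy's cars are different colors"))
# the boy car be different color
```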
Morphology
• Morphemes:
• The small meaningful units that make up words
• Stems: The core meaning-bearing units
• Affixes: Bits and pieces that adhere to stems
• Often with grammatical functions
• E.g., fox consists of one morpheme; cats consists of two: the morpheme cat and the morpheme -s.
Stemming
• Reduce terms to their stems in information retrieval
• Stemming is crude chopping of affixes
• language dependent
• e.g., automate(s), automatic, automation all reduced to automat.

Before stemming: for example compressed and compression are both accepted as equivalent to compress.
After stemming: for exampl compress and compress ar both accept as equival to compress
Porter’s algorithm
The most common English stemmer
Step 1a
sses → ss       caresses → caress
ies → i         ponies → poni
ss → ss         caress → caress
s → ø           cats → cat

Step 1b
(*v*)ing → ø    walking → walk
                sing → sing
(*v*)ed → ø     plastered → plaster
…

Step 2 (for long stems)
ational → ate   relational → relate
izer → ize      digitizer → digitize
ator → ate      operator → operate
…

Step 3 (for longer stems)
al → ø          revival → reviv
able → ø        adjustable → adjust
ate → ø         activate → activ
…
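
A simplified Python sketch of Steps 1a and 1b only. It assumes that (*v*) means "the remaining stem contains a vowel" and treats just a, e, i, o, u as vowels; the full Porter stemmer has further conditions and clean-up rules that are omitted here.

```python
import re

def step_1a(word):
    """Apply the first matching Step 1a suffix rule (longest suffix first)."""
    if word.endswith("sses"):
        return word[:-2]      # sses -> ss
    if word.endswith("ies"):
        return word[:-2]      # ies -> i
    if word.endswith("ss"):
        return word           # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]      # s -> (nothing)
    return word

def step_1b(word):
    """Strip -ing or -ed only if the remaining stem contains a vowel."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return stem if re.search(r"[aeiou]", stem) else word
    return word

for w in ("caresses", "ponies", "caress", "cats"):
    print(w, "->", step_1a(w))
for w in ("walking", "sing", "plastered"):
    print(w, "->", step_1b(w))
```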
