Regular Expression
Aarti Dharmani
Estimate bigram probabilities
• <s> I am Sam </s>
• <s> Sam I am </s>
• <s> I do not like green eggs and ham </s>
P(I|<s>) =
P(Sam|<s>) =
P(am|I) =
P(</s>|Sam) =
P(Sam|am) =
P(do|I) =
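The exercise above can be checked with a short script. This is a minimal sketch, assuming the usual maximum-likelihood estimate P(w | prev) = C(prev w) / C(prev) and whitespace tokenization; the helper name `bigram_prob` is illustrative and not part of the slides.

```python
from collections import Counter
from fractions import Fraction

# The three training sentences from the slide, with boundary markers.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)                   # C(w)
    bigrams.update(zip(tokens, tokens[1:]))   # C(prev w)

def bigram_prob(word, prev):
    """MLE estimate P(word | prev) = C(prev word) / C(prev)."""
    return Fraction(bigrams[(prev, word)], unigrams[prev])

queries = [("I", "<s>"), ("Sam", "<s>"), ("am", "I"),
           ("</s>", "Sam"), ("Sam", "am"), ("do", "I")]
for word, prev in queries:
    print(f"P({word}|{prev}) = {bigram_prob(word, prev)}")
```

Using `Fraction` keeps the results exact, so each probability prints as a ratio of counts rather than a rounded decimal.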
Given: the bigram and unigram counts of a dataset
(rows: first word; columns: the word that follows it)
           i    want   to    eat   chinese   food   lunch   spend
i          5    827    0     9     0         0      0       2
want       2    0      608   1     6         6      5       1
to         2    0      4     686   2         0      6       211
eat        0    0      2     0     16        2      42      0
chinese    1    0      0     0     0         82     1       0
food       15   0      15    0     1         4      0       0
lunch      2    0      0     0     0         1      0       0
spend      1    0      1     0     0         0      0       0
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Elements of Regular Expressions
1. Repeaters ( *, +, and { } )
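These symbols act as repeaters: they tell the regex engine how many times the preceding
character (or group) may occur. * allows zero or more occurrences, + requires one or more,
and { } gives an explicit count or range, such as {3} or {2,5}.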
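A quick way to see the three repeaters in action is to test a few strings. This is a small sketch using Python's `re` module; the helper `matches` and the sample patterns are made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# * : zero or more of the preceding character
print(matches(r"ab*c", "ac"))          # True  (zero b's)
print(matches(r"ab*c", "abbbc"))       # True  (three b's)

# + : one or more of the preceding character
print(matches(r"ab+c", "ac"))          # False (needs at least one b)
print(matches(r"ab+c", "abc"))         # True

# {n} : exactly n times, {m,n} : between m and n times
print(matches(r"ab{2}c", "abbc"))      # True
print(matches(r"ab{2,3}c", "abbbbc"))  # False (four b's is too many)
```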
5. Wildcard ( . )
The dot matches any single character (by default, any character except a newline), which
is why it is called the wildcard character.
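The wildcard can be tried out as follows. This is a sketch with Python's `re` module; the helper `matches` and the pattern `c.t` are illustrative choices, not taken from the slides.

```python
import re

def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text, flags) is not None

print(matches(r"c.t", "cat"))    # True  (dot stands for 'a')
print(matches(r"c.t", "c9t"))    # True  (dot stands for '9')
print(matches(r"c.t", "ct"))     # False (dot must consume one character)
print(matches(r"c.t", "c\nt"))   # False (dot skips newline by default)
print(matches(r"c.t", "c\nt", re.DOTALL))  # True (DOTALL lets dot match newline)

# To match a literal dot, escape it.
print(matches(r"c\.t", "c.t"))   # True
print(matches(r"c\.t", "cat"))   # False
```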
6. Optional character ( ? )
This symbol tells the computer that the preceding character may or may not be present in the string to be
matched.
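The optional marker is easiest to see with a word that has two spellings. This is a sketch in Python; the pattern `colou?r` and the helper `matches` are illustrative examples, not from the slides.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# 'u?' means the letter u may appear once or not at all.
print(matches(r"colou?r", "color"))    # True  (u absent)
print(matches(r"colou?r", "colour"))   # True  (u present once)
print(matches(r"colou?r", "colouur"))  # False (u appears twice)
```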
• If you're looking for a regular expression for a mobile number that must start with 8 or 9 and have a total of 10
digits, you can use the following:
• ^[89]\d{9}$
• Explanation:
• ^[89]: The caret (^) asserts the start of the string. [89] means the first digit should be 8 or 9.
• \d{9}: \d represents any digit, and {9} specifies that there should be exactly 9 digits following the first one.
• $: The dollar sign asserts the end of the string.
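The mobile-number pattern can be wrapped in a small validator. This is a minimal sketch in Python; the function name `is_valid_mobile` and the sample numbers are hypothetical. It uses `re.fullmatch`, because in Python `$` also matches just before a trailing newline, and `re.ASCII`, so that `\d` is limited to 0-9.

```python
import re

# Pattern from the slide: first digit 8 or 9, then exactly nine more digits.
MOBILE_RE = re.compile(r"^[89]\d{9}$", re.ASCII)

def is_valid_mobile(number: str) -> bool:
    """Return True if number is 10 digits long and starts with 8 or 9."""
    return MOBILE_RE.fullmatch(number) is not None

print(is_valid_mobile("9876543210"))   # True
print(is_valid_mobile("8123456789"))   # True
print(is_valid_mobile("7123456789"))   # False (starts with 7)
print(is_valid_mobile("98765"))        # False (too short)
print(is_valid_mobile("98765432101"))  # False (11 digits)
```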
Email ID:
Should have the format "username@domain.tld"
• ^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$
• Explanation:
• ^[a-zA-Z0-9]+: Starts with one or more alphanumeric characters.
• @: Contains the "@" symbol.
• [a-zA-Z0-9]+: Followed by one or more alphanumeric characters for the
domain name.
• \.: Contains a dot before the top-level domain.
• [a-zA-Z]{2,}$: Ends with at least two alphabetic characters for the top-level
domain.
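The email pattern above can be exercised in the same way. This is a sketch in Python with a hypothetical helper `is_valid_email` and made-up addresses. It also shows what this simplified pattern rejects: dots in the user name and multi-level domains, both of which occur in real addresses.

```python
import re

# Simplified pattern from the slide: alphanumerics, "@", alphanumerics,
# a dot, then a top-level domain of at least two letters.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Return True if address fits the simplified pattern."""
    return EMAIL_RE.fullmatch(address) is not None

print(is_valid_email("sam123@example.com"))      # True
print(is_valid_email("sam@example"))             # False (no top-level domain)
print(is_valid_email("sam@example.c"))           # False (TLD shorter than 2)
print(is_valid_email("first.last@example.com"))  # False (dot in user name)
print(is_valid_email("sam@mail.example.com"))    # False (subdomain not allowed)
```

The longer pattern shown earlier in the deck widens the character classes to `[a-zA-Z0-9_\-\.]`, which accepts the last two addresses.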
First character uppercase, followed by lowercase
letters, with at most one digit allowed among them
• ^[A-Z][a-z]*\d?[a-z]*$
• Explanation:
• ^[A-Z]: The caret (^) asserts the start of the string. [A-Z] means the first
character should be an uppercase letter.
• [a-z]*: Matches zero or more lowercase letters.
• \d?: Optionally matches one digit.
• [a-z]*$: Matches zero or more lowercase letters until the end of the string.
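• This regular expression ensures that the first character is uppercase and
that the rest of the string consists of lowercase letters with at most one digit.
Because both [a-z]* parts may be empty, the digit can also come directly after
the first letter or at the very end.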
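The behaviour of this pattern, including the edge cases where the digit is not strictly between letters, can be checked with a few inputs. This is a sketch in Python; the helper name `is_valid_name` and the test strings are hypothetical.

```python
import re

# Pattern from the slide: uppercase first letter, lowercase letters,
# an optional single digit, then more lowercase letters.
NAME_RE = re.compile(r"^[A-Z][a-z]*\d?[a-z]*$", re.ASCII)

def is_valid_name(text: str) -> bool:
    """Return True if text fits the pattern above."""
    return NAME_RE.fullmatch(text) is not None

print(is_valid_name("Sam"))     # True  (no digit)
print(is_valid_name("Sa1m"))    # True  (one digit between letters)
print(is_valid_name("Sam1"))    # True  (digit at the end is also accepted)
print(is_valid_name("A"))       # True  (a single uppercase letter)
print(is_valid_name("S1a2m"))   # False (two digits)
print(is_valid_name("sam"))     # False (first letter not uppercase)
print(is_valid_name("SAM"))     # False (uppercase after the first letter)
```

If the digit must have lowercase letters on both sides, one option is to change both `[a-z]*` parts to `[a-z]+` whenever the digit is present, for example `^[A-Z][a-z]*(?:[a-z]\d[a-z])?[a-z]*$`.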