Python offers several ways to search strings, and many of them can be implemented by hand. Regular expressions, or RegEx, are supported through the built-in ‘re’ module, which provides the pattern syntax and the functions used to describe what you want to find in a string. RegEx covers almost every searching need, whether you are looking for one specific thing in a string or for something that occurs throughout it.
A RegEx can also be defined as a sequence of characters that specifies a search pattern. There are many kinds of patterns and many different applications of RegEx. Let’s begin with a simple implementation.
Let’s assume that we have a string:
Hey There!
Now suppose we want to check whether the above string starts with an A. For that purpose we will use the following pattern:
^A
The caret ‘^’ checks whether the string starts with A. We can use the re.match() function to compare a pattern against a string and see whether they match. Note that re.match() only ever looks for a match at the beginning of the string:
import re #This imports the RegEx library which contains all the commands and sequences
pat = '^A' #The caret '^' defines that our string must start with A
str1 = 'Hey There!'
result = re.match(pat, str1) #The .match() function takes the pattern and the string and returns a Match object on success, or None otherwise.
print(result) #Prints the result; 'Hey There!' does not start with 'A', so in this case it will just print 'None'.
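To make the two possible outcomes concrete, here is a minimal sketch that runs re.match() against the same string with two patterns, one that fits and one that does not. The variable names (text, hit, miss) are chosen only for illustration.

```python
import re

text = 'Hey There!'

# '^H' fits because the string starts with 'H', so a Match object is returned.
hit = re.match('^H', text)
# '^A' does not fit, so re.match() returns None.
miss = re.match('^A', text)

print(hit)   # a Match object describing where the match was found
print(miss)  # None

# A Match object is truthy and None is falsy, so the result can drive an if-statement.
if hit:
    print('Matched:', hit.group())
```

Testing the result with a plain if-statement is usually enough when you only need a yes or no answer.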
There are multiple symbols just like the caret that perform specific tasks:
. ^ $ {} [] () | \ ? * +
Period (.): A period matches any single character except a newline. For example, ‘H.y’ matches the ‘Hey’ in “Hey There!”. If nothing in the string fits the pattern, a simple ‘None’ is returned.
Caret (^): A caret marks the start of the string. If we use ‘^A’ as the pattern, the match succeeds only if the string starts with ‘A’. If it doesn’t, ‘None’ is returned.
Dollar Sign ($): A dollar sign marks the end of the string. It is written after the characters it anchors, so ‘e$’ checks whether the string ends with ‘e’. If it does, a function such as re.search() returns a Match object that shows the matched text and its span (the start and end positions of the match). Every successful match reports its position this way.
Curly Brackets {}: Curly braces take two values, the minimum and maximum number of repetitions of the pattern immediately to their left. ‘e{2,5}’ matches a run of at least 2 and at most 5 consecutive e’s.
OR indicator |: The OR operator ‘|’ expresses alternation between patterns. If we define ‘h|e’, it looks for ‘h’ OR ‘e’ in the string and matches if either one of them is there. Do not put spaces around the ‘|’ unless the spaces are meant to be part of the pattern.
Group (): Parentheses group part of a pattern so that it is treated as a single unit, for example to limit the alternatives separated by ‘|’, as in ‘(h|e)y’.
Star *: A star ‘*’ matches zero or more occurrences of the pattern to its left.
Question mark ?: A question mark ‘?’ matches zero or one occurrence of the pattern to its left.
Plus Sign +: A plus ‘+’ matches one or more occurrences of the pattern to its left.
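The symbols above are easier to compare side by side. The following is a small sketch, not a complete reference, that tries one pattern per symbol with re.search() and records the text each one matched. The sample patterns, the second test string 'Heeey' and the names examples and matched are assumptions made for this illustration.

```python
import re

text = 'Hey There!'

# Each entry pairs a pattern with the string it is tried against.
examples = [
    ('H.y', text),             # '.' stands for any one character
    ('^Hey', text),            # '^' anchors the match at the start
    ('!$', text),              # '$' anchors the match at the end
    ('e$', text),              # fails: the string ends with '!', not 'e'
    ('e{2,5}', 'Heeey'),       # two to five consecutive 'e' characters
    ('T|x', text),             # alternation: 'T' or 'x'
    ('(Hey|Hi) There', text),  # a group limits the scope of '|'
    ('Ther*e', text),          # '*': zero or more 'r'
    ('He?y', text),            # '?': zero or one 'e'
    ('e+', 'Heeey'),           # '+': one or more 'e'
]

matched = {}
for pattern, target in examples:
    m = re.search(pattern, target)
    # Store the matched text, or None when the pattern was not found.
    matched[pattern] = m.group() if m else None
    print(pattern, '->', matched[pattern])
```

re.search() is used here instead of re.match() so that patterns such as ‘!$’ can match away from the start of the string.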
Apart from these general symbols, there are special sequences, written as a backslash followed by a letter (such as \D or \d), that are reserved for specific purposes. These are:
\A: Matches only if the specified characters are at the beginning of the searched string.
\B: Matches at a position that is NOT a word boundary. With the pattern r'\Band', ‘and’ is found only when it does not start a word, as in ‘hand’. The ‘r’ prefix marks a raw string, so the backslash reaches the RegEx engine unchanged.
\b: The opposite of \B. It matches at a word boundary, that is, at the beginning or end of a word, so r'\band' finds ‘and’ only where it starts a word.
\D: Matches any single character that is NOT a digit.
\d: Matches any single digit, such as 0-9.
\s: Matches a single whitespace character (a space, a tab, a newline and so on).
\S: Matches any single character that is NOT whitespace.
\w: Matches any single word character (a-z, A-Z, 0-9 and _).
\W: Matches any single character that is NOT a word character.
\Z: Matches only if the specified characters are at the end of the searched string.
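As a rough illustration of the special sequences listed above, the sketch below applies each one to a short string. The sample strings and variable names are assumptions picked for the example. Raw strings (the r prefix) are used so that Python passes the backslashes to the RegEx engine untouched.

```python
import re

text = 'Hi There123!'

digits = re.findall(r'\d', text)       # every digit: ['1', '2', '3']
non_digits = re.findall(r'\D+', text)  # runs of non-digits: ['Hi There', '!']
spaces = re.findall(r'\s', text)       # whitespace characters: [' ']
words = re.findall(r'\w+', text)       # runs of word characters: ['Hi', 'There123']
non_words = re.findall(r'\W', text)    # non-word characters: [' ', '!']

# \A and \Z anchor the pattern to the start and end of the whole string.
starts_with_hi = re.search(r'\AHi', text) is not None
ends_with_bang = re.search(r'!\Z', text) is not None

# \b requires a word boundary before 'and'; \B requires that there is none.
sample = 'hand and'
at_boundary = re.search(r'\band', sample).start()   # 5, the separate word 'and'
inside_word = re.search(r'\Band', sample).start()   # 1, the 'and' inside 'hand'

print(digits, non_digits, spaces, words, non_words)
print(starts_with_hi, ends_with_bang, at_boundary, inside_word)
```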
Apart from re.match(), there are other functions in which these symbols can be used. Let’s take a look:
The following shows the re.findall() function, which returns a list of all the matches found in the string.
import re #This imports the RegEx library which contains all the commands and sequences
str1 = 'Hey There!'
result = re.findall('e', str1) #The findall function returns a list of all the matches.
print(result) #Prints ['e', 'e', 'e']; if nothing matches, an empty list [] is printed instead of 'None'.
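re.findall() becomes more useful with a pattern that is longer than a single character. The sketch below, with illustrative variable names, uses the square brackets from the symbol list, which define a set of allowed characters, together with ‘+’ to collect whole runs of letters. It also shows what comes back when nothing matches.

```python
import re

text = 'Hey There!'

# [A-Za-z] is a set matching any one ASCII letter; '+' extends it to a run of letters.
letter_runs = re.findall(r'[A-Za-z]+', text)
# No 'z' occurs in the string, so the result is an empty list rather than None.
no_hits = re.findall('z', text)

print(letter_runs)  # ['Hey', 'There']
print(no_hits)      # []
```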
The re.search() function scans the whole string and returns the first location where the pattern, including any special sequences, matches. Let’s see:
import re #This imports the RegEx library which contains all the commands and sequences
str1 = 'Hi There123!'
result = re.search(r'\d', str1) #The search returns a Match object for the first digit, including its location, due to \d. The r prefix makes the pattern a raw string.
print(result) #Prints the Match object, which shows span=(8, 9) and match='1'; if nothing matched, it would print 'None'.
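Printing the Match object is handy for a quick look, but in practice you usually read individual values from it. A minimal sketch, assuming we want the whole run of digits rather than only the first one:

```python
import re

text = 'Hi There123!'

# '\d+' matches one or more consecutive digits.
match = re.search(r'\d+', text)

# Always check for None before using the Match object.
if match:
    found = match.group()     # the matched text
    position = match.span()   # (start, end) indexes of the match
    print(found)     # 123
    print(position)  # (8, 11)
```

The end index in the span is exclusive, so text[8:11] gives back the matched text.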
The re.split() function splits the string wherever the pattern matches. Here is an example:
import re #This imports the RegEx library which contains all the commands and sequences
str1 = 'Hi There123!'
result = re.split(r'\s', str1) #The split function splits the string at every whitespace character
print(result) #Prints the list of pieces: ['Hi', 'There123!']
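One detail worth knowing is how re.split() behaves when several separators sit next to each other. The sketch below uses a made-up string with extra spaces to compare splitting on a single whitespace character, on a run of whitespace, and with the maxsplit argument.

```python
import re

text = 'Hi   There  123!'

# Splitting on one whitespace character leaves empty strings between neighbours.
single = re.split(r'\s', text)
# Splitting on a run of whitespace treats each gap as one separator.
runs = re.split(r'\s+', text)
# maxsplit limits the number of splits; the remainder is kept as one piece.
first_only = re.split(r'\s+', text, maxsplit=1)

print(single)      # ['Hi', '', '', 'There', '', '123!']
print(runs)        # ['Hi', 'There', '123!']
print(first_only)  # ['Hi', 'There  123!']
```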
The re.sub() function replaces every part of the string that matches the pattern with the replacement you provide. Let’s look at an example:
import re #This imports the RegEx library which contains all the commands and sequences
str1 = 'Hi There123!'
result = re.sub(r'\s', "sub", str1) #The sub function replaces each whitespace character with the string "sub"
print(result) #Prints the new string: HisubThere123!
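re.sub() can do more than swap in a fixed string. The following sketch, with illustrative variable names, shows three common variations: deleting matches by replacing them with an empty string, limiting the number of replacements with count, and reusing captured groups in the replacement through \1 and \2.

```python
import re

text = 'Hi There123!'

# Replace every digit with nothing, which removes the digits.
no_digits = re.sub(r'\d', '', text)
# count=1 replaces only the first match.
first_only = re.sub(r'\d', '#', text, count=1)
# \1 and \2 refer to the first and second parenthesised groups of the pattern.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', text)

print(no_digits)   # Hi There!
print(first_only)  # Hi There#23!
print(swapped)     # There123 Hi!
```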