Check for URL in a String - Python
Last Updated :
12 Apr, 2025
We are given a string that may contain one or more URLs and our task is to extract them efficiently. This is useful for web scraping, text processing, and data validation. For example:
Input:
s = "My Profile: https://www.geeksforgeeks.org/user/Prajjwal%20/contributions/ in the portal of https://www.geeksforgeeks.org/"
Output:
['https://www.geeksforgeeks.org/user/Prajjwal%20/contributions/', 'https://www.geeksforgeeks.org/']
Using re.findall()
Python's regular expression module, re, lets us extract patterns such as URLs from text. Its re.findall() function finds all non-overlapping occurrences of a pattern in a given string and returns them as a list.
Python
import re
s = 'My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/'
pattern = r'https?://\S+|www\.\S+'
print("URLs:", re.findall(pattern, s))
Output
URLs: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
Explanation:
- r'https?://\S+|www\.\S+' is a regex pattern to match URLs starting with http://, https://, or www.
- findall() extracts all matches in a list.
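Because \S+ matches everything up to the next whitespace, punctuation that directly follows a URL in a sentence ends up inside the match. The sketch below shows one possible way to trim it. The helper name extract_urls, the TRAILING character set, and the example.com sample text are assumptions made for illustration, not part of the original example.

```python
import re

# Same pattern as above: http://, https://, or www. followed by non-space characters.
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

# Assumed set of characters that usually belong to the sentence, not the URL.
TRAILING = '.,;:!?)]}\'"'

def extract_urls(text):
    """Return all matches with trailing sentence punctuation removed."""
    return [match.rstrip(TRAILING) for match in URL_PATTERN.findall(text)]

sample = "Docs are at https://example.com/docs, and the blog is at www.example.org."
print(extract_urls(sample))
# ['https://example.com/docs', 'www.example.org']
```

The trade-off is that a URL which legitimately ends in one of these characters, such as a closing parenthesis, would be shortened as well, so the character set may need adjusting for the text at hand.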
Using urlparse()
urlparse() function from Python's urllib.parse module helps break down a URL into its key parts, such as the scheme (http, https), domain name, path, query parameters, and fragments. This function is useful for validating and extracting URLs from text by checking if a word follows a proper URL structure.
Python
from urllib.parse import urlparse
s = 'My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/'
# Split the string into words
split_s = s.split()
# Empty list to collect URLs
urls = []
for word in split_s:
    parsed = urlparse(word)
    if parsed.scheme and parsed.netloc:
        urls.append(word)
print("URLs:", urls)
Output
URLs: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
Explanation:
- s.split() splits the string into whitespace-separated words.
- urlparse(word) parses each word, and the if statement keeps only the words that have both a scheme (such as http or https) and a network location (the domain).
- Matching words are added to the urls list using append().
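Note that the check above accepts any non-empty scheme, so a word like ftp://files.example.com would also be collected. If only web URLs are wanted, the scheme can be compared explicitly. The following is a minimal sketch under that assumption; the function name find_http_urls and the sample text are hypothetical.

```python
from urllib.parse import urlparse

def find_http_urls(text):
    """Collect whitespace-separated words that parse as http or https URLs."""
    urls = []
    for word in text.split():
        try:
            parsed = urlparse(word)
        except ValueError:
            # urlparse raises ValueError for some malformed inputs,
            # for example an unbalanced '[' in the host part.
            continue
        # Require an http/https scheme and a non-empty network location.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(word)
    return urls

sample = "Mail mailto:me@example.com or ftp://files.example.com or visit https://example.com/a?b=1"
print(find_http_urls(sample))
# ['https://example.com/a?b=1']
```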
Using urlextract
urlextract is a third-party library, so we first need to install it by running "pip install urlextract" in the terminal. Its URLExtract class offers a pre-built way to find URLs in text without writing custom patterns, which makes it a convenient choice when URLs are difficult to extract by hand.
Python
from urlextract import URLExtract
s = 'My Profile: https://www.geeksforgeeks.org/user/Prajjwal%20/contributions/ in the portal of https://www.geeksforgeeks.org/'
extractor = URLExtract()
urls = extractor.find_urls(s)
print("URLs:", urls)
Output
URLs: ['https://www.geeksforgeeks.org/user/Prajjwal%20/contributions/', 'https://www.geeksforgeeks.org/']
Explanation:
- URLExtract is imported from the urlextract library.
- URLExtract() creates an extractor object to scan the string.
- find_urls() detects all URLs in s and returns them as a list. No manual splitting or validation is needed.
Using startswith()
A simple approach is to split the string into words with .split() and check whether each word starts with "http://" or "https://" using the built-in .startswith() method. If it does, we add it to our list of extracted URLs.
Python
s = 'My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/'
x = s.split()
# Empty list to collect the URLs
res = []
for i in x:
    if i.startswith("https://") or i.startswith("http://"):
        res.append(i)
print("URLs:", res)
Output
URLs: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
Explanation:
- s.split() splits the string into words.
- The if statement checks whether each word starts with http:// or https://.
- If it does, the word is added to the list of URLs using the .append() method.
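str.startswith() also accepts a tuple of prefixes, so the two checks can be combined into one call. The sketch below shows that variant; the function name urls_by_prefix and the sample text are assumptions for illustration.

```python
def urls_by_prefix(text, prefixes=("http://", "https://")):
    """Return the words in text that begin with any of the given prefixes."""
    # startswith() returns True if the word begins with any prefix in the tuple.
    return [word for word in text.split() if word.startswith(prefixes)]

sample = "Visit https://example.com or http://example.org but not https:example"
print(urls_by_prefix(sample))
# ['https://example.com', 'http://example.org']
```

Including the "//" in each prefix rejects words such as https:example, which only look like URLs at first glance.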
Using find() method
find() is a built-in string method that returns the lowest index at which a substring occurs, or -1 if the substring is not found. A result of 0 means the word starts with that substring, so we can use it to identify and extract URLs from a string. Here's how:
Python
s = 'My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/'
split_s = s.split()
res = []
for i in split_s:
    # find() returns 0 only when the word begins with the prefix
    if i.find("https://") == 0 or i.find("http://") == 0:
        res.append(i)
print("URLs:", res)
Output
URLs: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
Explanation:
- s.split() splits the string into words.
- i.find() returns 0 when a word begins with the given prefix, which identifies it as a URL.
- Matching words are added to the list res using .append().
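Since find() reports where a substring occurs, it can also locate a URL inside the full string without splitting first. The following is a minimal sketch of that idea, assuming the URL ends at the next space; the function name first_url and the sample text are hypothetical.

```python
def first_url(text, prefix="https://"):
    """Return the first substring that starts with prefix, or None if absent."""
    start = text.find(prefix)      # index of the prefix, or -1 if not present
    if start == -1:
        return None
    end = text.find(" ", start)    # the URL is assumed to end at the next space
    return text[start:] if end == -1 else text[start:end]

sample = "See https://example.com for details"
print(sample.find("https://"))  # 4
print(first_url(sample))        # https://example.com
print(first_url("no url here")) # None
```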