
Scrape Content from Dynamic Websites

Last Updated : 18 Jul, 2025

Many websites load content using JavaScript after the page opens, so the data may not appear in the initial HTML. Since requests and BeautifulSoup only fetch the static HTML, they can't access this dynamic content. Selenium helps by loading the full page and running its JavaScript. After that, BeautifulSoup can extract the required data.
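To see why a plain HTTP request falls short, here is a minimal sketch (the URL and class name are placeholders, not taken from this project) that fetches only the initial HTML. Any elements injected later by JavaScript will simply be absent from the parsed result.

Python
import requests
from bs4 import BeautifulSoup

# Hypothetical page that fills its listings with JavaScript after load
resp = requests.get("https://example.com/listings")
soup = BeautifulSoup(resp.text, "html.parser")

# Only elements present in the initial HTML are found;
# JavaScript-rendered items will be missing here
items = soup.find_all("div", class_="listing")
print(len(items))  # often 0 for dynamically rendered pages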

This project demonstrates how to scrape dynamically loaded job profile links from the "Top Jobs by Designation" page on Naukri Gulf using Selenium and BeautifulSoup. It uses webdriver-manager to automatically manage ChromeDriver, avoiding manual installation.

Install Selenium

Before using Selenium, we need to install it in our Python environment. We'll also install webdriver-manager, which automatically manages the browser driver (such as ChromeDriver), so you don't need to download it manually or set its path.

Run the following commands in a notebook or terminal:

pip install selenium
pip install webdriver-manager

Now, let's break down the scraping process step by step.

Step 1: Importing Required Libraries

To start, all the necessary Python libraries must be imported. These include Selenium components for browser automation, BeautifulSoup for parsing HTML content and webdriver-manager to automatically handle the browser driver setup.

Python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

Explanation:

  • selenium: Automates browser actions.
  • Service, Options: Configure the Chrome browser and driver.
  • By, WebDriverWait, expected_conditions: Used to wait for elements to load dynamically.
  • ChromeDriverManager: Automatically downloads the correct driver.
  • BeautifulSoup: Parses HTML content.

Step 2: Set Up Chrome Options

In this step, Chrome browser options are configured. These settings allow the browser to run in headless mode (without a visible window), disable GPU usage and use a custom user-agent string to mimic a real browser environment.

Python
chrome_options = Options()
chrome_options.add_argument("--headless")  
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.6778.265 Safari/537.36"
)

Step 3: Initialize WebDriver with webdriver-manager

This step initializes the Chrome WebDriver using the webdriver-manager package. It automatically downloads and configures the correct version of ChromeDriver, avoiding manual setup. The driver is launched with the previously defined Chrome options.

Python
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=chrome_options)

Step 4: Open the Target Webpage

In this step, the browser is directed to the target URL. Selenium opens the webpage, allowing dynamic content to load for further processing.

Python
url = "https://www.naukrigulf.com/top-jobs-by-designation"
driver.get(url)

It navigates to the Naukri Gulf page.
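As an optional sanity check (not part of the original steps), you can print the page title after navigation to confirm the browser actually reached the site:

Python
print(driver.title)  # the page's <title> text, confirming navigation succeeded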

Step 5: Wait for Dynamic Content to Load

Before trying to extract data, the script should wait for the webpage to fully load the job links (which appear using JavaScript). This step uses a wait command to pause until those elements are ready.

Python
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CLASS_NAME, "soft-link"))
)
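If the job links never appear within the timeout, WebDriverWait raises a TimeoutException. A minimal sketch of handling this case (an optional addition, not part of the original script) could look like:

Python
from selenium.common.exceptions import TimeoutException

try:
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "soft-link"))
    )
except TimeoutException:
    # Clean up the browser before surfacing the error
    print("Timed out waiting for job links to load")
    driver.quit()
    raise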

Step 6: Get and Parse the Page Source

Once the page is fully loaded, the script grabs the complete HTML content. Then, BeautifulSoup is used to parse this HTML so we can easily search and extract specific elements.

Python
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")

Step 7: Extract Top 10 Job Profiles

Now that the HTML is parsed, the script searches for all job profile links using their class name. It then prints the top 10 job titles from the list.

Python
job_profiles_section = soup.find_all('a', class_='soft-link darker')

print("Top Job Profiles:")
for i, job in enumerate(job_profiles_section[:10], start=1):
    print(f"{i}. {job.text.strip()}")
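Since each result is an anchor tag, you can also pull the link target alongside the title. A small extension (assuming the href attributes may be relative, hence the urljoin call) might look like:

Python
from urllib.parse import urljoin

for i, job in enumerate(job_profiles_section[:10], start=1):
    title = job.text.strip()
    # Resolve a possibly relative href against the page URL
    link = urljoin(url, job.get("href", ""))
    print(f"{i}. {title} -> {link}")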

Step 8: Close the WebDriver

After the scraping is complete, it's important to close the browser properly. This step shuts down the WebDriver to free up system resources.

Python
driver.quit()
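To make sure the browser is closed even if an error occurs mid-scrape, the navigation and parsing steps can be wrapped in a try/finally block. This is a common safeguard rather than part of the original walkthrough; a sketch of the pattern:

Python
driver = webdriver.Chrome(service=service, options=chrome_options)
try:
    driver.get(url)
    WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "soft-link"))
    )
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()  # always release the browser, even on errors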

Output

The script prints "Top Job Profiles:" followed by the first 10 job profile names scraped from the page.
