11

COde

Uploaded by

Rithvik Kumar
Copyright
© © All Rights Reserved

Please explain the error:

SessionNotCreatedException: Message: session not created: probably user data
directory is already in use, please specify a unique value for --user-data-dir
argument, or don't use --user-data-dir
Stacktrace:
# 0 0x5c5ca68ead0a <unknown>
# 1 0x5c5ca639b5f0 <unknown>
# 2 0x5c5ca63d50db <unknown>
# 3 0x5c5ca63d0e2f <unknown>
# 4 0x5c5ca6420f9e <unknown>
# 5 0x5c5ca64204c6 <unknown>
# 6 0x5c5ca6412433 <unknown>
# 7 0x5c5ca63deea3 <unknown>
# 8 0x5c5ca63dfb01 <unknown>
# 9 0x5c5ca68afb5b <unknown>
# 10 0x5c5ca68b3a41 <unknown>
# 11 0x5c5ca6896c52 <unknown>
# 12 0x5c5ca68b45b4 <unknown>
# 13 0x5c5ca687af0f <unknown>
# 14 0x5c5ca68d8db8 <unknown>
# 15 0x5c5ca68d8f96 <unknown>
# 16 0x5c5ca68e9b56 <unknown>
# 17 0x7c40b978cac3 <unknown>
Hypothesis
The error message "session not created: probably user data directory is already in
use, please specify a unique value for --user-data-dir argument, or don't use
--user-data-dir" indicates that the specified user data directory, /tmp/chrome_profile,
is likely being used by another Chrome instance or by a previous run of your script
that didn't terminate properly.

This directory stores browsing data, including cookies, cache, and user profiles,
and Chrome doesn't allow multiple instances to use the same directory
simultaneously to avoid data corruption.
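
One way to test this hypothesis before changing the script is to look for the lock
artifacts Chrome leaves in a profile directory. The sketch below is a minimal check,
not part of the original answer: the helper name find_profile_locks is hypothetical,
and the file names (SingletonLock, SingletonSocket, SingletonCookie) are what Chrome
is assumed to create on Linux; other platforms may differ.

```python
import os

# Assumption: names of the lock artifacts Chrome creates in a profile directory on Linux.
LOCK_NAMES = ("SingletonLock", "SingletonSocket", "SingletonCookie")


def find_profile_locks(user_data_dir):
    """Return the names of lock artifacts present in user_data_dir (possibly empty)."""
    found = []
    for name in LOCK_NAMES:
        path = os.path.join(user_data_dir, name)
        # lexists() also reports dangling symlinks, which a stale lock may be.
        if os.path.lexists(path):
            found.append(name)
    return found


# Example call (hypothetical usage):
#   find_profile_locks("/tmp/chrome_profile")
```

A non-empty result suggests that a Chrome process is still running against the
directory or that an earlier run exited without cleaning up.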

Suggested Changes
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm
import random
import os

# Configure Chrome options
chrome_options = Options()
# Use a unique user data directory, named after the current process ID
user_data_dir = f"/tmp/chrome_profile_{os.getpid()}"
chrome_options.add_argument(f"--user-data-dir={user_data_dir}")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

# Initialize WebDriver with proper service
try:
    service = Service()
    driver = webdriver.Chrome(service=service, options=chrome_options)
except Exception as e:
    print(f"Error initializing WebDriver: {e}")
    raise

base_url = "https://incometaxindia.gov.in/Pages/utilities/exempted-institutions.aspx"
all_records = []
search_chars = [chr(i) for i in range(ord('a'), ord('z')+1)]

try:
    for char in tqdm(search_chars, desc="Processing characters"):
        driver.get(base_url)
        time.sleep(random.uniform(1, 3))  # Random delay

        try:
            name_field = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located(
                    (By.XPATH, "//input[@placeholder='Name']"))
            )
            name_field.send_keys(char)

            search_button = driver.find_element(
                By.XPATH, "//input[@value='Search']")
            search_button.click()
            time.sleep(3)

            while True:
                soup = BeautifulSoup(driver.page_source, 'html.parser')
                # Adjust class
                records = soup.find_all('div', class_='record-item')

                for record in records:
                    record_data = {}
                    # Extract data here (adjust selectors)
                    all_records.append(record_data)

                try:
                    next_button = driver.find_element(
                        By.XPATH, "//a[contains(text(), 'Next')]")
                    # get_attribute() returns None when the attribute is absent
                    if 'disabled' in (next_button.get_attribute('class') or ''):
                        break
                    next_button.click()
                    time.sleep(random.uniform(2, 4))
                except NoSuchElementException:
                    # No "Next" link: this was the last page of results
                    break

        except Exception as e:
            print(f"Error processing character {char}: {e}")
            continue

finally:
    # Ensure driver quits even if error occurs
    driver.quit()

# Save data
df = pd.DataFrame(all_records)
df.to_csv('exempted_institutions.csv', index=False)
print("Scraping completed. Data saved.")
Explanation of Changes:

Import os:
Added import os so the script can call os.getpid().

Unique User Data Directory:
Instead of the fixed path /tmp/chrome_profile, the code now builds the directory
name from the current process ID:
user_data_dir = f"/tmp/chrome_profile_{os.getpid()}"
Because no two running processes share a process ID, scripts running at the same
time each get their own user data directory and no longer conflict. Process IDs can
be reused after a process exits, so a directory left behind by an earlier run could
in principle be picked up again; deleting the directory when the script finishes
avoids this.

Updated Chrome Options:
The chrome_options.add_argument() call now passes the new user_data_dir variable
to --user-data-dir.
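
As an alternative to the process-ID approach, the standard library can create a
directory that is guaranteed not to exist yet and remove it afterwards. The sketch
below is an illustration rather than the method from the answer above; the helper
name temporary_profile_dir and the prefix are hypothetical.

```python
import contextlib
import shutil
import tempfile


@contextlib.contextmanager
def temporary_profile_dir(prefix="chrome_profile_"):
    """Yield a newly created, uniquely named directory and delete it on exit."""
    # mkdtemp() creates the directory itself, so the name cannot collide
    # with a leftover directory from an earlier run.
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # Remove the profile even if the body raised an exception.
        shutil.rmtree(path, ignore_errors=True)


# Hypothetical usage with the script above:
#   with temporary_profile_dir() as user_data_dir:
#       chrome_options.add_argument(f"--user-data-dir={user_data_dir}")
#       ... create the driver, scrape, and call driver.quit() inside this block ...
```

Calling driver.quit() before the with block exits matters here, since Chrome may
still be writing to the directory while it is running.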