Pickles Script Untested With Instructions

The document outlines a solution for scraping car listings from the Pickles Salvage website using a Python scraper that stores data in a MySQL database and downloads images. It includes a PHP interface for searching and sorting the listings, along with setup instructions for running the scraper manually or on a schedule. Additionally, it provides detailed steps for installing necessary software and configuring the database and scripts.

Uploaded by Kidus Michael

Below is a stand-alone solution consisting of:

1. A Python scraper that downloads and parses car listings from the Pickles Salvage website, stores the results in a MySQL database, and downloads listing photos into a local folder.
2. A PHP search interface that reads from the same MySQL database and lets you search and sort by field.
3. Instructions for setting up the system and running it daily at 1 am AEST (or any chosen time).

1. Database Setup
Create a new MySQL database (e.g., car_listings) and a table (listings) with the
following schema:

SQL
CREATE DATABASE car_listings;
USE car_listings;

CREATE TABLE listings (
    id INT AUTO_INCREMENT PRIMARY KEY,
    year INT,
    brand VARCHAR(255),
    model VARCHAR(255),
    vehicle_type VARCHAR(255),
    color VARCHAR(255),
    wovr_status VARCHAR(255),
    incident_type VARCHAR(255),
    compliance_date VARCHAR(255),
    transmission VARCHAR(255),
    drive_type VARCHAR(255),
    engine_capacity VARCHAR(255),
    fuel VARCHAR(255),
    cylinders VARCHAR(255),
    registration VARCHAR(255),
    doors VARCHAR(255),
    vin VARCHAR(255),
    damage_details TEXT,
    views INT,
    watchers INT,
    location VARCHAR(255),
    start_date DATETIME,
    stock VARCHAR(255) UNIQUE,
    price DECIMAL(10, 2),
    total_times_listed INT,
    added_date DATETIME,
    url TEXT
);

Note: If you want to call your table something else or add indexing, feel free to
modify the schema as needed.
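As one hedged example of the indexing mentioned above (assuming the PHP page mostly sorts on brand, model, year, and price; adjust to your actual query patterns), you could add:

```sql
-- Speed up common ORDER BY clauses; drop or extend to match real usage
CREATE INDEX idx_listings_brand ON listings (brand);
CREATE INDEX idx_listings_model ON listings (model);
CREATE INDEX idx_listings_year  ON listings (year);
CREATE INDEX idx_listings_price ON listings (price);
```

Note that the interface's `LIKE '%...%'` searches cannot use these indexes; they mainly help sorting and exact-match lookups.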

2. Python Scraper
2.1 Requirements

Python Dependencies:

● requests
● beautifulsoup4
● pymysql

Install them via pip:

Shell
pip install requests beautifulsoup4 pymysql

2.2 Python Script

Save this script as scrape_pickles.py (or any name you prefer). Update the database credentials (user, password, database) to match your MySQL setup.

Python
import os
import requests
from bs4 import BeautifulSoup
import pymysql
from datetime import datetime

# Update these to match your MySQL credentials
db = pymysql.connect(
    host="localhost",
    user="your_user",
    password="your_password",
    database="car_listings",
    charset="utf8mb4"
)

PHOTO_DIR = "car_photos"
os.makedirs(PHOTO_DIR, exist_ok=True)

def save_to_db(data):
    with db.cursor() as cursor:
        sql = """
            INSERT INTO listings (
                year, brand, model, vehicle_type, color, wovr_status,
                incident_type, compliance_date, transmission, drive_type,
                engine_capacity, fuel, cylinders, registration, doors, vin,
                damage_details, views, watchers, location, start_date,
                stock, price, total_times_listed, added_date, url
            ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s,
                      %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
            ON DUPLICATE KEY UPDATE added_date = VALUES(added_date)
        """
        cursor.execute(sql, (
            data["year"], data["brand"], data["model"],
            data["vehicle_type"], data["color"],
            data["wovr_status"], data["incident_type"],
            data["compliance_date"], data["transmission"],
            data["drive_type"], data["engine_capacity"],
            data["fuel"], data["cylinders"],
            data["registration"], data["doors"], data["vin"],
            data["damage_details"],
            data["views"], data["watchers"], data["location"],
            data["start_date"],
            data["stock"], data["price"],
            data["total_times_listed"], data["added_date"],
            data["url"]
        ))
    db.commit()

def download_photos(photos, stock):
    stock_dir = os.path.join(PHOTO_DIR, stock)
    os.makedirs(stock_dir, exist_ok=True)

    for idx, photo_url in enumerate(photos):
        response = requests.get(photo_url, timeout=30)
        if response.status_code == 200:
            photo_path = os.path.join(stock_dir, f"photo_{idx + 1}.jpg")
            with open(photo_path, "wb") as f:
                f.write(response.content)

# Labels on the listing page mapped to their database column names
DETAIL_FIELDS = {
    "Vehicle Type:": "vehicle_type",
    "Colour:": "color",
    "WOVR Status:": "wovr_status",
    "Incident Type:": "incident_type",
    "Compliance Date:": "compliance_date",
    "Transmission:": "transmission",
    "Drive Type:": "drive_type",
    "Engine Capacity:": "engine_capacity",
    "Fuel:": "fuel",
    "Cylinders:": "cylinders",
    "Registration:": "registration",
    "No. of Doors:": "doors",
    "VIN:": "vin",
    "Damage Details:": "damage_details",
    "Views:": "views",
    "Watchers:": "watchers",
    "Location:": "location",
    "Date:": "start_date",
    "STOCK:": "stock",
}

def scrape_details(listing_url):
    response = requests.get(listing_url, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")

    data = {}
    data["url"] = listing_url
    data["added_date"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    data["price"] = None
    data["total_times_listed"] = None

    details = soup.find("div", class_="vehicle-details")
    year_brand_model = details.find("h1").get_text(strip=True)
    year, brand, model = year_brand_model.split(" ", 2)

    data["year"] = year
    data["brand"] = brand
    data["model"] = model

    # Each label <li> is followed by a sibling <li> holding the value
    for label, field in DETAIL_FIELDS.items():
        data[field] = soup.find("li", string=label) \
            .find_next_sibling("li").get_text(strip=True)

    photos = [img["src"] for img in soup.find_all("img", class_="vehicle-image")]
    download_photos(photos, data["stock"])

    return data

def scrape_pickles():
    search_url = "https://www.pickles.com.au/used/search/lob/salvage?search=m2%2Cm3%2Cm4&page=1"
    response = requests.get(search_url, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")

    listings = soup.find_all("a", class_="card-link")
    for listing in listings:
        listing_url = "https://www.pickles.com.au" + listing["href"]
        data = scrape_details(listing_url)
        save_to_db(data)

if __name__ == "__main__":
    scrape_pickles()
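One fragile spot in the script is the `year, brand, model = heading.split(" ", 2)` step, which raises a ValueError whenever a listing heading has fewer than three words. A small helper (a sketch, not part of the original script; `split_year_brand_model` is a hypothetical name) can make that step fail soft:

```python
def split_year_brand_model(heading):
    """Split a heading like '2019 Toyota Corolla ZRE182R' into
    (year, brand, model), padding with None when parts are missing."""
    parts = heading.split(" ", 2)
    # Pad to exactly three elements so unpacking never raises
    parts += [None] * (3 - len(parts))
    return parts[0], parts[1], parts[2]

print(split_year_brand_model("2019 Toyota Corolla ZRE182R"))
# → ('2019', 'Toyota', 'Corolla ZRE182R')
print(split_year_brand_model("2019 Toyota"))
# → ('2019', 'Toyota', None)
```

Rows with a None brand or model can then be logged and skipped instead of crashing the whole run.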

2.3 How to Run the Scraper

1. Manual Execution

Shell
python scrape_pickles.py

This will connect to your database, scrape the listings, and populate the listings table.

2. Scheduled Execution (Daily at 1 AM AEST)

○ Convert 1 AM AEST to UTC (AEST is UTC+10; during daylight saving, AEDT is UTC+11).
○ Create a cron job (on Linux/Unix). Example for 1 AM AEST = 15:00 UTC:

Shell
crontab -e

○ Then add:

Shell
0 15 * * * /usr/bin/python /path/to/scrape_pickles.py

○ Replace /usr/bin/python with the path to your Python executable and /path/to/ with the location of the script.

3. On Windows, use Task Scheduler:

○ Action: Start a program
○ Program/Script: python.exe
○ Add arguments: C:\path\to\scrape_pickles.py
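The AEST-to-UTC conversion above can be sanity-checked in Python with a fixed UTC+10 offset (a quick sketch; it deliberately ignores daylight saving, where AEDT is UTC+11):

```python
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10))  # fixed UTC+10, no daylight-saving logic

# 1 AM AEST on an arbitrary date, converted to UTC
local_run = datetime(2025, 6, 1, 1, 0, tzinfo=AEST)
utc_run = local_run.astimezone(timezone.utc)

print(utc_run.strftime("%H:%M"))  # 15:00 -- i.e. the cron line `0 15 * * *`
```

Note the UTC time falls on the previous calendar day, which does not matter for a daily cron schedule.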

3. PHP Search/Sort Interface

This script provides a simple PHP page to display, search, and sort the data stored in the listings table. Save it as index.php (or similar) and place it in a directory accessible by your PHP-enabled web server. Update the database credentials accordingly. The search text is escaped with real_escape_string, but a column name used in ORDER BY cannot be made safe by escaping alone, so the order_by parameter is checked against a whitelist of known columns.

PHP
<?php
$host = "localhost";
$user = "your_user";
$password = "your_password";
$database = "car_listings";

$conn = new mysqli($host, $user, $password, $database);
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

$search = isset($_GET['search']) ? $conn->real_escape_string($_GET['search']) : '';

// Column names cannot be safely escaped, so validate order_by against a whitelist
$allowed_columns = ['year', 'brand', 'model', 'vehicle_type', 'color', 'price'];
$order_by = (isset($_GET['order_by']) && in_array($_GET['order_by'], $allowed_columns, true))
    ? $_GET['order_by']
    : 'brand';
$order_dir = (isset($_GET['order_dir']) && $_GET['order_dir'] === 'desc') ? 'DESC' : 'ASC';

$sql = "
    SELECT *
    FROM listings
    WHERE
        brand LIKE '%$search%' OR
        model LIKE '%$search%' OR
        vehicle_type LIKE '%$search%' OR
        color LIKE '%$search%' OR
        CAST(price AS CHAR) LIKE '%$search%' OR
        CAST(year AS CHAR) LIKE '%$search%'
    ORDER BY $order_by $order_dir
";
$result = $conn->query($sql);
?>

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Car Listings</title>
<style>
table {
border-collapse: collapse;
width: 90%;
margin: 20px auto;
}
th, td {
border: 1px solid #eaeaea;
padding: 10px;
text-align: left;
}
th {
background: #f2f2f2;
}
th a {
text-decoration: none;
color: #000;
}
.search-form {
text-align: center;
margin: 20px;
}
.search-form input[type="text"] {
padding: 5px;
}
.search-form button {
padding: 5px 10px;
}
h1 {
text-align: center;
}
</style>
</head>
<body>
<h1>Car Listings</h1>
<div class="search-form">
<form method="get">
<input type="text" name="search" placeholder="Search..." value="<?php echo htmlspecialchars($search); ?>">
<button type="submit">Search</button>
</form>
</div>
<table>
<thead>
<tr>
    <?php
    // Sortable column headers; clicking a header toggles asc/desc
    $columns = [
        'year' => 'Year',
        'brand' => 'Brand',
        'model' => 'Model',
        'vehicle_type' => 'Type',
        'color' => 'Color',
        'price' => 'Price',
    ];
    foreach ($columns as $col => $label):
        $dir = ($order_by === $col && $order_dir === 'ASC') ? 'desc' : 'asc';
    ?>
    <th>
        <a href="?order_by=<?php echo $col; ?>&order_dir=<?php echo $dir; ?>">
            <?php echo $label; ?>
        </a>
    </th>
    <?php endforeach; ?>
</tr>
</thead>
<tbody>
<?php if ($result && $result->num_rows > 0): ?>
<?php while ($row = $result->fetch_assoc()): ?>
<tr>
    <td><?php echo htmlspecialchars($row['year']); ?></td>
    <td><?php echo htmlspecialchars($row['brand']); ?></td>
    <td><?php echo htmlspecialchars($row['model']); ?></td>
    <td><?php echo htmlspecialchars($row['vehicle_type']); ?></td>
    <td><?php echo htmlspecialchars($row['color']); ?></td>
    <td><?php echo htmlspecialchars($row['price']); ?></td>
</tr>
<?php endwhile; ?>
<?php else: ?>
<tr><td colspan="6" style="text-align:center;">No results found</td></tr>
<?php endif; ?>
</tbody>
</table>
</body>
</html>

<?php
$conn->close();
?>

4. Installation & Usage Steps

1. Install MySQL (if not already installed) and create the car_listings database with the provided schema.

2. Install Python 3 (if not already installed).

○ Run pip install requests beautifulsoup4 pymysql to install the dependencies.

3. Clone or download the Python scraper script (scrape_pickles.py) into a suitable directory, for instance:

/home/user/scraper/scrape_pickles.py

4. Update scrape_pickles.py with:

○ Your MySQL credentials (user, password, database).

5. Test the scraper manually:

Shell
python scrape_pickles.py

6. Check the listings table in your MySQL database to confirm data has been inserted. Also verify that a car_photos/ folder was created and that listing images were saved inside subfolders named by their stock field.

7. Set up scheduling (optional, for daily scraping at 1 AM AEST):

○ Linux/Unix:

Shell
crontab -e

○ Add a line (15:00 UTC = 1 AM AEST for the +10 hours offset):

Shell
0 15 * * * /usr/bin/python /home/user/scraper/scrape_pickles.py

○ Windows: Use Task Scheduler to run the same command daily at your chosen time.

8. Set up the PHP interface (optional):

○ Copy the index.php file into a directory served by your web server (e.g., Apache, Nginx).
○ Update the $host, $user, $password, $database variables to match your MySQL setup.
○ Navigate to the PHP page (e.g., http://yourserver.com/index.php) to view, search, and sort the listings.
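To confirm the scrape worked (step 6), a couple of quick checks from the MySQL client (a sketch; table and column names as in the schema above):

```sql
-- How many listings were inserted?
SELECT COUNT(*) FROM listings;

-- Spot-check a few rows, newest first
SELECT stock, year, brand, model, location, added_date
FROM listings
ORDER BY added_date DESC
LIMIT 5;
```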

5. Summary
You now have:

● A Python script to scrape data, store it in a MySQL database, and download listing photos.
● A PHP interface to search and sort the scraped records.

Maintain or adjust the scheduling as needed, make sure your database credentials are correct, and your developer should be able to continue from this foundation.

Feel free to modify or extend:

● The scraper logic for additional fields.
● The PHP interface for filtering, pagination, or editing listings.
● Logging, error handling, and concurrency controls for production environments.
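For the logging and error-handling point, one minimal pattern is retrying failed requests with exponential backoff while logging each failure. A hedged sketch (the `fetch_with_retries` helper and its parameters are illustrative, not part of the original script):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")

def fetch_with_retries(fetch, url, retries=3, backoff=2.0):
    """Call fetch(url), retrying with exponential backoff on any exception."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            log.warning("attempt %d/%d for %s failed: %s", attempt, retries, url, exc)
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(backoff ** attempt)

# Demo with a fake fetcher that fails twice, then succeeds
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network error")
    return "ok"

print(fetch_with_retries(flaky, "https://example.com", backoff=0.01))  # ok
```

In the scraper, the per-listing `requests.get` calls would be routed through such a helper so one bad listing does not abort the nightly run.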

Good luck testing and running your new scraping + search system!
