Algorithm

Uploaded by

danishahmd212
Copyright
© All Rights Reserved
WEB SCRAPING

GROUP No: 23
DHANISH AHAMMED MV (MS21218)
DILSHA NAHAN (MS21053)

1. Name of the institute: Indian Institute of Management (IIM) Calcutta.

URL of the faculty page: https://www.iimcal.ac.in/faculty/faculty-directory

Algorithm of the code:


1. Import necessary libraries:
import requests
from bs4 import BeautifulSoup
import sqlite3
2. Connect to the SQLite database 'faculty_data_iimc.db' and create a cursor for executing SQL
commands.
3. Create a table 'faculty' in the database to store faculty information, including columns for 'id,'
'name,' 'academic_group,' 'email,' and 'phone_no.'
4. Make an HTTP GET request to the IIM Calcutta faculty directory URL and parse the HTML
content using BeautifulSoup.
5. Find all elements with the class "views-field-field-name" in the HTML content.
6. Iterate through the matched elements (one per faculty member) to extract and format the information:
Extract the professor's name.
Extract the professor's academic group (if available).
Extract the professor's email (if available).
Extract the professor's phone number (if available).
7. Check if the professor's details already exist in the 'faculty' table by executing a SQL query.
8. If the professor's details exist in the table, update the existing record with the new academic
group, email, and phone number.
9. If the professor's name doesn't exist in the table, insert a new record with the name, academic
group, email, and phone number.
10. Commit the changes to the database after each faculty member is processed to ensure data
integrity.
11. Retrieve all the data from the 'faculty' table using a SQL SELECT query and fetch all records.
12. Display the faculty data in tabular form by iterating through the fetched rows and
printing each one.
13. Close the connection to the SQLite database to ensure data is saved and resources are released.
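The database steps above (2, 3, and 7 to 13) can be sketched as follows. This is a minimal illustration rather than the group's actual code: it uses an in-memory database instead of 'faculty_data_iimc.db', the helper names `create_table` and `upsert_faculty` are hypothetical, and the sample record is a placeholder, not scraped data. Matching is done on the name column, as step 9 suggests.

```python
import sqlite3


def create_table(conn):
    # Step 3: create the 'faculty' table if it is not there yet.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS faculty (
               id INTEGER PRIMARY KEY,
               name TEXT,
               academic_group TEXT,
               email TEXT,
               phone_no TEXT
           )"""
    )
    conn.commit()


def upsert_faculty(conn, name, academic_group, email, phone_no):
    # Step 7: look for an existing record with the same name.
    cur = conn.cursor()
    cur.execute("SELECT id FROM faculty WHERE name = ?", (name,))
    row = cur.fetchone()
    if row:
        # Step 8: the record exists, so refresh its details.
        cur.execute(
            "UPDATE faculty SET academic_group = ?, email = ?, phone_no = ? "
            "WHERE id = ?",
            (academic_group, email, phone_no, row[0]),
        )
    else:
        # Step 9: no record with this name, so insert a new one.
        cur.execute(
            "INSERT INTO faculty (name, academic_group, email, phone_no) "
            "VALUES (?, ?, ?, ?)",
            (name, academic_group, email, phone_no),
        )
    # Step 10: commit after each faculty member.
    conn.commit()


# In-memory database for the demo (assumption); the report uses a file.
conn = sqlite3.connect(":memory:")
create_table(conn)

# Placeholder record processed twice: the second call updates the first.
upsert_faculty(conn, "Example Professor", "Economics", "old@example.org", None)
upsert_faculty(conn, "Example Professor", "Economics", "new@example.org", "000-0000")

# Steps 11 and 12: fetch every row and print it.
rows = conn.execute(
    "SELECT name, academic_group, email, phone_no FROM faculty"
).fetchall()
for row in rows:
    print(row)

# Step 13: release the connection.
conn.close()
```

Using `?` placeholders instead of string formatting keeps names containing quotes or other special characters from breaking the SQL statements.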
2. Name of the institute: Indian Institute of Technology Hyderabad.
URL of the faculty page: https://www.iith.ac.in/people/faculty/

Algorithm for the code:

1. Import the necessary libraries:
import requests
from bs4 import BeautifulSoup
import sqlite3
2. Initialize the SQLite database:
Create a connection to the 'faculty_info.db' database or create a new one if it doesn't
exist.
Create a cursor for executing SQL commands.
3. Create a 'faculty' table in the database:
Define the table structure with columns: 'id' (INTEGER PRIMARY KEY), 'name' (TEXT),
'designation' (TEXT), 'qualification' (TEXT), and 'research_interests' (TEXT).
4. Commit the table creation to the database.
5. Make an HTTP GET request to the IIT Hyderabad faculty directory URL.
6. Parse the HTML content of the webpage using BeautifulSoup.
7. Find all faculty cards with the class "facultycard" in the HTML.
8. Iterate through faculty cards to extract and format the information for each faculty member:
Extract the professor's name.
Extract designation.
Extract educational qualification if available.
Extract research interests.

9. Check if the faculty member's information already exists in the 'faculty' table:
Execute a SQL query to find a matching record.
10. Update or insert faculty information into the 'faculty' table:
If the record already exists, update it with the new designation, qualification, and research
interests.
If the record doesn't exist, insert a new record with the extracted information.
11. Commit the changes to the database after processing each faculty member.
12. Retrieve all faculty data from the 'faculty' table using a SQL SELECT query.
13. Display the faculty data in tabular form:
Iterate through the fetched rows and print each one.
14. Close the connection to the SQLite database to save the data and release resources.
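The parsing steps (6 to 8) might look roughly like the sketch below. Only the class name "facultycard" comes from the algorithm above; the inner tags and class names ("name", "designation", "qualification", "research"), the sample HTML, and the helper names are assumptions made for illustration, since the real page markup is not shown here. In practice the HTML string would come from the HTTP GET request in step 5.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the "facultycard" class is taken from the text.
SAMPLE_HTML = """
<div class="facultycard">
  <h5 class="name">Example Professor A</h5>
  <p class="designation">Associate Professor</p>
  <p class="qualification">PhD (Example University)</p>
  <p class="research">Graph algorithms, Optimization</p>
</div>
<div class="facultycard">
  <h5 class="name">Example Professor B</h5>
  <p class="designation">Assistant Professor</p>
  <p class="research">Fluid dynamics</p>
</div>
"""


def text_or_none(card, class_name):
    # Return the stripped text of the first matching child, or None
    # when the card has no such element (the "if available" case).
    node = card.find(class_=class_name)
    return node.get_text(strip=True) if node else None


def parse_faculty_cards(html):
    # Step 6: parse the HTML; step 7: find every faculty card.
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.find_all(class_="facultycard"):
        # Step 8: pull out one field at a time.
        records.append(
            {
                "name": text_or_none(card, "name"),
                "designation": text_or_none(card, "designation"),
                "qualification": text_or_none(card, "qualification"),
                "research_interests": text_or_none(card, "research"),
            }
        )
    return records


records = parse_faculty_cards(SAMPLE_HTML)
for record in records:
    print(record)
```

Returning None for a missing field, rather than raising an error, lets the loop continue when a card lacks a qualification, and None is stored as NULL if the record is later written to SQLite.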
3. Name of the institute: Indian Institute of Technology Palakkad.
URL: https://iitpkd.ac.in/faculty
Algorithm of the code:
1. Import Necessary Libraries:
import requests
from bs4 import BeautifulSoup
import sqlite3
2. Initialize SQLite Database:
Connect to an existing 'faculty_data_new.db' database or create a new one if it
doesn't exist.
Create a cursor for executing SQL commands.

3. Create 'faculty' Table:
Define the structure of the 'faculty' table in the database with the following columns:
'id' (INTEGER PRIMARY KEY), 'name' (TEXT), and 'department' (TEXT).
4. Commit Table Creation:
Ensure that the 'faculty' table is created and committed to the database.
5. HTTP Request and Parsing:
Make an HTTP GET request to the IIT Palakkad faculty directory URL.
Parse the HTML content of the webpage using BeautifulSoup.
6. Extract Faculty Name and Department:
Find all HTML elements with the class "views-field-field-full-name" in the HTML,
representing faculty members.

7. Iterate through the faculty names:
Extract the professor's name.
Extract the professor's department (if available).
8. Check for Existing Faculty Record:
Check if the faculty member's information already exists in the 'faculty' table by
executing a SQL query to find a matching record.
9. Update or Insert Faculty Information:
If the faculty member's record already exists in the table, update it with the new
department.
If the faculty member's record doesn't exist in the table, insert a new record with the
extracted information.
10. Commit Database Changes:
Ensure that the database is committed after processing each faculty member to save the
changes.
11. Retrieve Faculty Data:
Execute a SQL SELECT query to retrieve all faculty data from the 'faculty' table.
12. Display Faculty Data:
Iterate through the fetched rows and print each faculty member's information in tabular
form.
13. Close the Database Connection:
Close the connection to the SQLite database to save the data and release resources.
This algorithm outlines the key steps performed by the provided code, which involves
web scraping faculty information and storing it in an SQLite database.
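An end-to-end sketch of this flow, under stated assumptions, is given below. The class "views-field-field-full-name" is taken from step 6; the sibling class "views-field-field-department", the sample HTML, and the helper names `fetch_html` and `parse_faculty` are hypothetical. The demo parses an inline HTML string and writes to an in-memory database so that it runs offline; `fetch_html` shows how the live page could be requested but is not called here.

```python
import sqlite3

from bs4 import BeautifulSoup

FACULTY_URL = "https://iitpkd.ac.in/faculty"

# Hypothetical markup: the department class name is an assumption.
SAMPLE_HTML = """
<div class="views-row">
  <div class="views-field-field-full-name">Example Professor A</div>
  <div class="views-field-field-department">Computer Science</div>
</div>
<div class="views-row">
  <div class="views-field-field-full-name">Example Professor B</div>
</div>
"""


def fetch_html(url, timeout=10):
    # Step 5: download the page, failing loudly on HTTP errors.
    import requests  # imported here so the offline demo runs without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_faculty(html):
    # Steps 6 and 7: find each name element, then look for a department
    # element next to it; department is None when it is missing.
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for name_node in soup.find_all(class_="views-field-field-full-name"):
        name = name_node.get_text(strip=True)
        dept_node = name_node.find_next_sibling(
            class_="views-field-field-department"
        )
        department = dept_node.get_text(strip=True) if dept_node else None
        results.append((name, department))
    return results


conn = sqlite3.connect(":memory:")  # the report uses 'faculty_data_new.db'
conn.execute(
    "CREATE TABLE IF NOT EXISTS faculty "
    "(id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.commit()

# To scrape the live site, replace SAMPLE_HTML with fetch_html(FACULTY_URL).
for name, department in parse_faculty(SAMPLE_HTML):
    # Steps 8 and 9: update when the name is already stored, else insert.
    exists = conn.execute(
        "SELECT 1 FROM faculty WHERE name = ?", (name,)
    ).fetchone()
    if exists:
        conn.execute(
            "UPDATE faculty SET department = ? WHERE name = ?",
            (department, name),
        )
    else:
        conn.execute(
            "INSERT INTO faculty (name, department) VALUES (?, ?)",
            (name, department),
        )
    conn.commit()  # step 10: commit after each faculty member

stored = conn.execute(
    "SELECT name, department FROM faculty ORDER BY id"
).fetchall()
for row in stored:
    print(row)
conn.close()
```

Calling `raise_for_status()` before parsing avoids silently storing data scraped from an error page when the request fails.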

4. Name of the institute: Indian Institute of Science Education and Research Trivandrum.
URL: https://www.iisertvm.ac.in/faculty

Algorithm:
1. Import Necessary Libraries:
import requests
from bs4 import BeautifulSoup
import sqlite3
2. Initialize SQLite Database:
Connect to an existing 'faculty_data.db' database or create a new one if it doesn't exist.
Create a cursor for executing SQL commands.
3. Drop Existing 'faculty' Table:
Check if the 'faculty' table exists and drop it if it does.
4. Create a New 'faculty' Table:
Define the structure of the 'faculty' table in the database with the following columns: 'id'
(INTEGER PRIMARY KEY AUTOINCREMENT), 'name' (TEXT), 'designation' (TEXT), and 'research_interest'
(TEXT).
5. Commit Table Creation:
Ensure that the new 'faculty' table is created and committed to the database.
6. HTTP Request and Parsing:
Make an HTTP GET request to the IISER Thiruvananthapuram faculty directory URL.
Parse the HTML content of the webpage using BeautifulSoup.
7. Extract Faculty Information:
Find all HTML elements with the class "faculty_inner_wrapper" in the HTML, representing
faculty members.
8. Iterate through the faculty items and extract the following information for each faculty member:
Name
Designation
Research interests
9. Check for Existing Faculty Record:
Check if the faculty member's information already exists in the 'faculty' table by executing
a SQL query to find a matching record.
10. Update or Insert Faculty Information:
If the faculty member's record already exists in the table, update it with the new
designation and research interests.
If the faculty member's record doesn't exist in the table, insert a new record with the
extracted information.
11. Commit Database Changes:
Ensure that the database is committed after processing each faculty member to save the
changes.
12. Retrieve Faculty Data:
Execute a SQL SELECT query to retrieve all faculty data from the 'faculty' table.
13. Display Faculty Data:
Iterate through the fetched rows and print each faculty member's information in tabular
form.
14. Close the Database Connection:
Close the connection to the SQLite database to save the data and release resources.
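Steps 3 to 5 and 12 to 13 differ from the earlier algorithms in that the table is dropped and rebuilt on every run. A minimal sketch of that reset, plus one possible way to print the rows as an aligned text table, is shown below. The helper names `reset_faculty_table` and `format_rows`, the in-memory database, and the sample record are assumptions for illustration only.

```python
import sqlite3

INSERT_SQL = (
    "INSERT INTO faculty (name, designation, research_interest) "
    "VALUES (?, ?, ?)"
)


def reset_faculty_table(conn):
    # Steps 3 to 5: drop any old table, create a fresh one, commit.
    conn.execute("DROP TABLE IF EXISTS faculty")
    conn.execute(
        """CREATE TABLE faculty (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT,
               designation TEXT,
               research_interest TEXT
           )"""
    )
    conn.commit()


def format_rows(rows, headers):
    # Step 13: build an aligned text table from the fetched rows.
    widths = [max(len(str(v)) for v in col) for col in zip(headers, *rows)]
    lines = [" | ".join(str(h).ljust(w) for h, w in zip(headers, widths))]
    lines.append("-+-".join("-" * w for w in widths))
    for row in rows:
        lines.append(" | ".join(str(v).ljust(w) for v, w in zip(row, widths)))
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")  # the report uses 'faculty_data.db'

# First run: create the table and store one placeholder record.
reset_faculty_table(conn)
conn.execute(INSERT_SQL, ("Example Professor", "Professor", "Topology"))
conn.commit()

# Second run: the reset discards everything stored by the first run.
reset_faculty_table(conn)
count_after_reset = conn.execute("SELECT COUNT(*) FROM faculty").fetchone()[0]

conn.execute(INSERT_SQL, ("Example Professor", "Professor", "Topology"))
conn.commit()

# Step 12: fetch all rows, then display them.
rows = conn.execute(
    "SELECT id, name, designation, research_interest FROM faculty"
).fetchall()
table = format_rows(rows, ("id", "name", "designation", "research_interest"))
print(table)
conn.close()
```

Because the table starts empty on each run, the existence check in step 9 would only find a match when the same name appears more than once on the page within a single run.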
