
SIH2024_1676_Web-scrapping tool

The document outlines a proposal for a web-scraping tool aimed at identifying and reporting critical and high severity vulnerabilities in OEM equipment. The tool will utilize Python and open-source libraries to provide real-time alerts and automate reporting, enhancing cybersecurity for critical sector organizations. It addresses potential challenges such as website structure changes and legal concerns while emphasizing the importance of timely vulnerability information.

Uploaded by

sarkargayetri91
Copyright
© All Rights Reserved

SMART INDIA HACKATHON 2024

TITLE PAGE

• Problem Statement ID – 1676


• Problem Statement Title- Web-scraping tool to be developed to search
and report Critical and High Severity Vulnerabilities of OEM equipment
(IT and OT) published on the respective OEM websites and other relevant
web platforms.
• Theme- Blockchain & Cybersecurity
• PS Category- Software
• Team ID- 336
• Team Name (Registered on portal)
Blue-Hat
IDEA TITLE

Detailed Explanation of the Proposed Solution
• Tool Development:
  • Build a web-scraping script using open-source tools (e.g., BeautifulSoup, Scrapy).
• Data Sources:
  • Monitor OEM websites and relevant platforms for vulnerability information.
• Data Extraction:
  • Extract details on Critical and High Severity vulnerabilities.
• Reporting Mechanism:
  • Automate email alerts with the extracted vulnerability data.

How It Addresses the Problem
• Real-Time Alerts:
  • Sends notifications of critical vulnerabilities as soon as a scheduled scan detects them.
• Direct Monitoring:
  • Reduces time lag by fetching data directly from OEM sources.
• Timely Information:
  • Ensures critical sector organizations are updated promptly.

Innovation and Uniqueness of the Solution
• Real-Time Scraping:
  • First-hand data extraction from multiple sources.
• Adaptable Parsing:
  • Handles diverse data formats across OEM websites.
• Automated Reporting:
  • Streamlines the alert process with predefined email notifications.
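The Data Extraction step above could look roughly like the following minimal sketch. It assumes an OEM advisory page that lists advisories in an HTML table with three columns (advisory ID, product, CVSS v3 base score); that layout, `SAMPLE_HTML`, and every function and class name are hypothetical. The proposal names BeautifulSoup for parsing, but this sketch swaps in Python's standard-library `html.parser` so it runs without extra packages. The severity bands follow the CVSS v3.x qualitative rating scale (Critical 9.0-10.0, High 7.0-8.9).

```python
from __future__ import annotations

from html.parser import HTMLParser


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    if score > 0.0:
        return "Low"
    return "None"


class AdvisoryTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row (<tr>)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built only from <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def extract_critical_high(html: str) -> list[dict]:
    """Return only the advisories rated Critical or High."""
    parser = AdvisoryTableParser()
    parser.feed(html)
    results = []
    for row in parser.rows:
        if len(row) != 3:  # hypothetical layout: ID, product, CVSS score
            continue
        advisory_id, product, score_text = row
        score = float(score_text)
        severity = cvss_severity(score)
        if severity in ("Critical", "High"):
            results.append({"id": advisory_id, "product": product,
                            "score": score, "severity": severity})
    return results


# Hypothetical advisory table standing in for a downloaded OEM page.
SAMPLE_HTML = """
<table>
  <tr><th>Advisory</th><th>Product</th><th>CVSS</th></tr>
  <tr><td>ADV-001</td><td>PLC Firmware 2.1</td><td>9.8</td></tr>
  <tr><td>ADV-002</td><td>HMI Panel</td><td>5.3</td></tr>
  <tr><td>ADV-003</td><td>Edge Router</td><td>7.5</td></tr>
</table>
"""

if __name__ == "__main__":
    for item in extract_critical_high(SAMPLE_HTML):
        print(item["id"], item["severity"], item["score"])
```

Run as a script, it prints `ADV-001 Critical 9.8` and `ADV-003 High 7.5`; the Medium-rated `ADV-002` is filtered out. Each real OEM page would need its own row layout, which is where the Adaptable Parsing point comes in.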

@SIH Idea submission- Template 2


Blue-Hat TECHNICAL APPROACH

Technologies to Be Used
• Programming Languages:
  • Python (for scripting and web scraping)
• Frameworks and Libraries:
  • BeautifulSoup (for HTML parsing)
  • Scrapy (for web scraping)
  • Requests (for HTTP requests)
  • Pandas (for data manipulation, if needed)
• Hardware:
  • Standard server or cloud-based infrastructure for running the script

Methodology and Process for Implementation
• Requirements Gathering:
  • Identify target OEM websites.
  • Define data extraction needs.
• Tool Development:
  • Scraping Script: Use Python with BeautifulSoup and Scrapy.
  • Data Handling: Process and clean the data.
  • Reporting Module: Set up email notifications with extracted details.
• Testing and Validation:
  • Unit Testing: Test each component separately.
  • Integration Testing: Check the full process from scraping to email.
• Deployment:
  • Deploy on a server or cloud.
  • Schedule regular runs for continuous monitoring.
• Maintenance:
  • Monitor performance.
  • Update the script for website changes.
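The Data Handling and Reporting Module steps can be sketched with the standard-library `email` and `smtplib` modules. This is an illustration under assumptions, not the team's implementation: the record fields (`id`, `product`, `score`, `severity`), the function names, and the SMTP settings (port 587 with STARTTLS) are hypothetical. Because the tool is scheduled to run repeatedly, `new_advisories` drops advisories already alerted on so that recipients are not emailed the same item twice.

```python
from __future__ import annotations

import smtplib
from email.message import EmailMessage


def new_advisories(found: list[dict], already_reported: set[str]) -> list[dict]:
    """Keep advisories not alerted in an earlier run, dropping duplicates."""
    seen = set(already_reported)
    fresh = []
    for item in found:
        if item["id"] not in seen:
            fresh.append(item)
            seen.add(item["id"])
    return fresh


def build_alert(items: list[dict], sender: str, recipients: list[str]) -> EmailMessage:
    """Format the advisories as a plain-text alert email."""
    msg = EmailMessage()
    msg["Subject"] = f"[Vulnerability Alert] {len(items)} Critical/High advisories"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [f"{i['severity']:<8} {i['score']:>4}  {i['id']}  {i['product']}"
             for i in items]
    msg.set_content("\n".join(lines))
    return msg


def send_alert(msg: EmailMessage, host: str, port: int = 587,
               user: str | None = None, password: str | None = None) -> None:
    """Send the alert over SMTP with STARTTLS (server details are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        if user:
            smtp.login(user, password)
        smtp.send_message(msg)


if __name__ == "__main__":
    found = [
        {"id": "ADV-001", "product": "PLC Firmware 2.1", "score": 9.8, "severity": "Critical"},
        {"id": "ADV-003", "product": "Edge Router", "score": 7.5, "severity": "High"},
    ]
    fresh = new_advisories(found, already_reported={"ADV-003"})
    alert = build_alert(fresh, "scanner@example.org", ["soc@example.org"])
    print(alert)  # inspect the message instead of sending it
```

In a scheduled deployment the set of reported IDs would have to persist between runs (for example in a file or small database), and `send_alert` would only be called when `new_advisories` returns a non-empty list.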
Blue-Hat FEASIBILITY AND VIABILITY

Feasibility
• Technology Use: Python and open-source libraries are well-suited for web scraping and data handling.
• Data Sources: OEM websites and relevant platforms are accessible for scraping.
• Infrastructure: Easily deployable on standard servers or cloud services.
• Scalability: Designed to extend to multiple websites and large volumes of data.

Potential Challenges and Risks
• Website Changes:
  • Websites may update their structure, breaking the scraper.
• Data Format Variability:
  • Different OEMs use different data formats and syntax.
• Legal and Ethical Concerns:
  • Scraping may raise legal issues or violate terms of service.
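Two of the risks listed above can be partly mitigated in code, as in this minimal sketch. `allowed_by_robots` checks a URL against a site's robots.txt with the standard-library `urllib.robotparser` before fetching; this covers crawler etiquette only and does not by itself resolve legal or terms-of-service questions. `check_layout` guards against silent breakage after a website change by raising an error when a page yields no rows, or rows of the wrong width, instead of quietly reporting no vulnerabilities. The robots.txt content, URLs, user-agent string, and all names are hypothetical.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str) -> bool:
    """Check a URL against a site's robots.txt rules before fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class LayoutChangedError(RuntimeError):
    """Raised when a page no longer matches the structure the parser expects."""


def check_layout(rows: list[list[str]], expected_columns: int, source: str) -> None:
    """Fail loudly instead of silently reporting 'no vulnerabilities'."""
    if not rows:
        raise LayoutChangedError(f"{source}: no advisory rows found; layout may have changed")
    bad = [row for row in rows if len(row) != expected_columns]
    if bad:
        raise LayoutChangedError(
            f"{source}: {len(bad)} row(s) do not have {expected_columns} columns")


# Hypothetical robots.txt for an OEM site.
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /internal/
"""

if __name__ == "__main__":
    agent = "BlueHatScraper"
    for url in ("https://oem.example.com/security/advisories",
                "https://oem.example.com/internal/reports"):
        print(url, allowed_by_robots(SAMPLE_ROBOTS, url, agent))
```

With the sample rules, the advisories URL is allowed and the `/internal/` URL is not.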



Blue-Hat IMPACT AND BENEFITS

Potential Impact on the Target Audience
• Enhanced Security:
  • Provides timely alerts on critical vulnerabilities, improving security posture.
• Reduced Risk:
  • Minimizes potential damage from unpatched vulnerabilities.

Benefits of the Solution
• Social:
  • Protects critical sector organizations from cyber threats, safeguarding public services.
• Economic:
  • Reduces costs associated with data breaches and downtime.
• Environmental:
  • Indirect benefits through improved operational efficiency and reduced reliance on reactive measures.



Blue-Hat RESEARCH AND REFERENCES

Research and References

• Web Scraping Techniques:
  • BeautifulSoup Documentation: BeautifulSoup
  • Scrapy Documentation: Scrapy
• Vulnerability Reporting:
  • Common Vulnerabilities and Exposures (CVE): CVE Details
  • National Vulnerability Database (NVD): NVD
• Data Handling and Processing:
  • Pandas Documentation: Pandas
• Legal and Ethical Considerations:
  • Web Scraping Ethics and Legality: Legalities of Web Scraping
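Besides scraping HTML, the NVD listed above offers a JSON API that returns CVE records together with their CVSS scores, which avoids parsing page layouts for any OEM advisory that carries a CVE ID. This is a minimal sketch, assuming the NVD CVE API 2.0 endpoint (`https://services.nvd.nist.gov/rest/json/cves/2.0`), its `keywordSearch`, `cvssV3Severity` and `resultsPerPage` query parameters, the `apiKey` request header, and the response fields read in `parse_nvd_response`, as documented by NIST at the time of writing; check the current documentation and rate limits before relying on them. The function names and the `"PLC"` keyword are illustrative.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed NVD CVE API 2.0 endpoint; verify against current NIST documentation.
NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_nvd_query(keyword: str, severity: str = "CRITICAL",
                    results_per_page: int = 50) -> str:
    """Build a query URL for CVEs matching a keyword at one CVSS v3 severity."""
    params = {"keywordSearch": keyword, "cvssV3Severity": severity,
              "resultsPerPage": results_per_page}
    return f"{NVD_CVE_API}?{urlencode(params)}"


def parse_nvd_response(payload: dict) -> list[dict]:
    """Flatten the NVD JSON into id/score/severity/description records."""
    records = []
    for vuln in payload.get("vulnerabilities", []):
        cve = vuln.get("cve", {})
        v31 = cve.get("metrics", {}).get("cvssMetricV31", [])
        cvss = v31[0].get("cvssData", {}) if v31 else {}
        description = next((d.get("value", "") for d in cve.get("descriptions", [])
                            if d.get("lang") == "en"), "")
        records.append({"id": cve.get("id"), "score": cvss.get("baseScore"),
                        "severity": cvss.get("baseSeverity"),
                        "description": description})
    return records


def fetch_nvd(keyword: str, severity: str = "CRITICAL",
              api_key: str | None = None, timeout: float = 30) -> list[dict]:
    """Query the NVD API over the network (an API key allows a higher request rate)."""
    request = Request(build_nvd_query(keyword, severity))
    if api_key:
        request.add_header("apiKey", api_key)
    with urlopen(request, timeout=timeout) as response:
        return parse_nvd_response(json.load(response))


if __name__ == "__main__":
    for level in ("CRITICAL", "HIGH"):
        print(build_nvd_query("PLC", level))
```

Running the query once with `CRITICAL` and once with `HIGH` covers both severity levels named in the problem statement; `fetch_nvd` needs network access and is not called in the example.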



