
Overview of Scrapy Framework

Scrapy is a Python-based web crawling framework used for web scraping. It was originally developed by Mydeco and is now maintained by Zyte. Scrapy uses "spiders" that follow a set of instructions to crawl websites in a reusable way. It provides features like throttling and rotating proxies to scrape websites undetected. Major companies that use Scrapy include Lyst, Parse.ly, and Sciences Po Medialab.

Uploaded by

katherine976
Scrapy

Developer(s): Zyte (formerly Scrapinghub)
Initial release: 26 June 2008
Stable release: 2.9.0[1] / 8 May 2023
Repository: [Link]/scrapy/scrapy
Written in: Python

Scrapy (/ˈskreɪpaɪ/[2] SKRAY-peye) is a free and open-source web-crawling framework written in Python and developed in Cambuslang. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.[3] It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other don't repeat yourself frameworks, such as Django,[4] it makes it easier to build and scale large crawling projects by allowing developers to reuse their code.

The Scrapy framework provides you with powerful features such as auto-throttle, rotating proxies and user-agents, allowing you to scrape virtually undetected across the net. Scrapy also provides a web-crawling shell, which can be used by developers to test their assumptions on a site’s behavior.[5]
Some well-known companies and products using Scrapy are: Lyst,[6][7] [Link],[8] Sayone Technologies,[9] Sciences Po Medialab,[10] and [Link]’s World Government Data site.[11]

Operating system: Windows, macOS, Linux
Type: Web crawler
License: BSD License
Website: [Link]

History

Scrapy was born at London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, with a milestone 1.0 release happening in June 2015.[12] In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.[13][14]

References
1. "Release 2.9.0" ([Link]). 8 May 2023. Retrieved 31 May 2023.
2. Commit 975f150 ([Link]e04433d9811dd).
3. Scrapy at a glance ([Link]).
4. "Frequently Asked Questions" ([Link]m-django). Frequently Asked Questions, Scrapy 2.8.0 documentation. Retrieved 28 July 2015.
5. "Scrapy shell" ([Link]). Retrieved 28 July 2015.
6. Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning" ([Link][Link]/web/20160604082034/[Link]). Archived from the original ([Link]) on 4 June 2016. Retrieved 28 July 2015.
7. Scrapy | Companies using Scrapy ([Link]).
8. Montalenti, Andrew (October 27, 2012). "Web Crawling & Metadata Extraction in Python" (https://[Link]/amontalenti/web-crawling-and-metadata-extraction-in-python). Web Crawling & Metadata Extraction in Python - Speaker Deck. Retrieved May 11, 2015.
9. "Scrapy Companies" ([Link]). Scrapy | Companies using Scrapy.
10. Hyphe v0.0.0: the first release of our new webcrawler is out! ([Link][Link]/blog/hyphe-v0-0-0-the-first-release-of-our-new-webcrawler-is-out/).
11. Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords [Link]/5jU3La #opendata #datastore" (https://[Link]/bfirsh/status/8025368963) (Tweet) – via Twitter.
12. Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!" ([Link]rum/#!topic/scrapy-users/sMbBVIq0sko). scrapy-users (Mailing list).
13. Hoffman, Pablo (2013). List of the primary authors & contributors ([Link]crapy/blob/master/AUTHORS). Retrieved 18 November 2013.
14. Interview Scraping Hub ([Link]webcrawling/).

External links
Official website ([Link])
Scrapy Tutorial Series ([Link])
