Web Scraping - Amazon Customer Reviews
Last Updated : 08 Sep, 2021

In this article, we are going to see how to scrape Amazon customer reviews using Beautiful Soup in Python.

Modules needed

bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built-in with Python. To install it, type the below command in the terminal.

pip install bs4

requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built-in with Python. To install it, type the below command in the terminal.

pip install requests

To begin with web scraping, we first have to do some setup. Import all the required modules. Then create a headers dictionary for the request to Amazon; without a realistic User-Agent (and, in some cases, your request cookies) Amazon refuses the request or returns an error page. Your browser's developer tools show the exact user agent string it sends, which you can reuse here.

Pass the URL to the getdata() function (a user-defined function); it sends a request to the URL and returns the response. We are using the GET method to retrieve information from the given server for a given URL.

Syntax: requests.get(url, args)

Then parse the HTML content using bs4.

Syntax: soup = BeautifulSoup(r.content, 'html.parser')
Parameters:
r.content : the raw HTML content.
'html.parser' : the HTML parser we want to use.

Now filter the required data using the soup.find_all() function.

Program:

Python3

# import modules
import requests
from bs4 import BeautifulSoup

HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/90.0.4430.212 Safari/537.36'),
    'Accept-Language': 'en-US, en;q=0.5'
}


# user-defined function to scrape the data
def getdata(url):
    r = requests.get(url, headers=HEADERS)
    return r.text


def html_code(url):
    # pass the url into the getdata function
    htmldata = getdata(url)
    soup = BeautifulSoup(htmldata, 'html.parser')
    # return the parsed html
    return soup


url = ("https://www.amazon.in/Columbia-Mens-wind-resistant-Glove/dp/B0772WVHPS/"
       "?_encoding=UTF8&pd_rd_w=d9RS9&pf_rd_p=3d2ae0df-d986-4d1d-8c95-aa25d2ade606"
       "&pf_rd_r=7MP3ZDYBBV88PYJ7KEMJ&pd_rd_r=550bec4d-5268-41d5-87cb-8af40554a01e"
       "&pd_rd_wg=oy8v8&ref_=pd_gw_cr_cartx&th=1")

soup = html_code(url)
print(soup)

Output:

Note: This is only the raw HTML code of the page.

Now that the core setup is done, let us see how scraping for a specific requirement can be done.

Scrape customer names

Find the customer names in span tags where class_="a-profile-name". You can open the webpage in the browser and inspect the relevant element by right-clicking on it. You have to pass the tag name and the attribute with its corresponding value to the find_all() function.

Code:

Python

def cus_data(soup):
    # find all span tags holding reviewer names
    # and collect their text
    data_str = ""
    cus_list = []

    for item in soup.find_all("span", class_="a-profile-name"):
        data_str = data_str + item.get_text()
        cus_list.append(data_str)
        data_str = ""
    return cus_list


cus_res = cus_data(soup)
print(cus_res)

Output:

['Amaze', 'Robert', 'D. Kong', 'Alexey', 'Charl', 'RBostillo']
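Amazon often answers automated traffic with an error or robot-check page, in which case the functions above quietly return empty lists even though print(soup) still shows HTML. The sketch below is a hypothetical, stricter variant of html_code() (not part of the original article): it uses requests' raise_for_status() to surface HTTP errors, and it treats a page with no reviewer-name spans as a likely block, which is an assumption about Amazon's error pages rather than a guarantee.

import requests
from bs4 import BeautifulSoup


def get_soup_checked(url, headers):
    # hypothetical variant of html_code(): pass in the HEADERS
    # dictionary defined in the setup code above
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()  # raises an HTTPError for 4xx/5xx responses
    soup = BeautifulSoup(r.text, 'html.parser')
    # assumption: if no reviewer-name spans are found, the response is
    # most likely a CAPTCHA / robot-check page, not the product page
    if not soup.find_all("span", class_="a-profile-name"):
        raise RuntimeError("No review data found - the request may have been blocked")
    return soup


# usage: soup = get_soup_checked(url, HEADERS)

It can be swapped in wherever html_code(url) is called in the snippets that follow.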
Scrape user reviews

Now find the customer reviews using the same method as above. Find the unique class name with a specific tag; here we use the div tag.

Code:

Python3

def cus_rev(soup):
    # find all review-text divs
    # and collect their text into one string
    data_str = ""

    for item in soup.find_all(
            "div",
            class_="a-expander-content reviewText "
                   "review-text-content a-expander-partial-collapse-content"):
        data_str = data_str + item.get_text()

    result = data_str.split("\n")
    return result


rev_data = cus_rev(soup)
rev_result = []
for i in rev_data:
    if i == "":
        pass
    else:
        rev_result.append(i)
print(rev_result)

Output:

Scraping product information

Here we will scrape product information like the product name, ASIN number, weight and dimensions. For this we use the ul tag with a specific unique class name.

Code:

Python3

def product_info(soup):
    # find the detail-bullet list
    # and collect its text
    data_str = ""
    pro_info = []

    for item in soup.find_all(
            "ul",
            class_="a-unordered-list a-nostyle "
                   "a-vertical a-spacing-none detail-bullet-list"):
        data_str = data_str + item.get_text()
        pro_info.append(data_str.split("\n"))
        data_str = ""
    return pro_info


pro_result = product_info(soup)

# Filter the required data
for item in pro_result:
    for j in item:
        if j == "":
            pass
        else:
            print(j)

Output:

Scraping review images

Here we will extract the image links from the product reviews using the same method as above. The tag name and the attribute of the tag are passed to findAll() as above.

Code:

Python3

def rev_img(soup):
    # find all review image thumbnails
    # and collect their src links
    images = []

    for img in soup.findAll('img', class_="cr-lightbox-image-thumbnail"):
        images.append(img.get('src'))
    return images


img_result = rev_img(soup)
print(img_result)

Output:

Saving details into a CSV file

Here we will save the details into a CSV file. We convert the data into a DataFrame and then export it to CSV, using the to_csv() function to save the DataFrame as a CSV file.

Syntax : to_csv(parameters)
Parameters :
path_or_buf : File path or object; if None is provided the result is returned as a string.

Code:

Python3

import pandas as pd

# initialise data of lists
data = {'Name': cus_res,
        'review': rev_result}

# Create DataFrame
df = pd.DataFrame(data)

# Save the output
df.to_csv('amazon_review.csv')

Output:
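One practical caveat not covered above: pd.DataFrame raises a ValueError when the lists passed as columns have different lengths, and the number of scraped names and review lines will not always match. A minimal sketch of one way to guard against this, assuming the cus_res and rev_result lists built in the snippets above, is:

import pandas as pd

# cus_res and rev_result are the lists built in the snippets above;
# pad the shorter one with empty strings so both columns are equal length
n = max(len(cus_res), len(rev_result))
names = cus_res + [""] * (n - len(cus_res))
reviews = rev_result + [""] * (n - len(rev_result))

df = pd.DataFrame({'Name': names, 'review': reviews})
df.to_csv('amazon_review.csv', index=False)

index=False simply drops pandas' automatic row-index column from the saved file; omit it to keep the default behaviour used above.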