Downloading Files from Web using Perl
Last Updated: 02 Feb, 2022
Perl is a multi-purpose interpreted language. Perl programs are usually written as scripts, saved with the .pl extension, and run directly from the terminal or command prompt. It is a stable, open-source, cross-platform language under active development, designed with particularly strong capabilities for manipulating text and extracting information from web pages. It sees major use in web development, system administration, and even GUI development thanks to its ability to work with HTML, XML, and other markup languages. It is also prominent on the Web because it can handle encrypted web data, including e-commerce transactions.
In this article, we will look at different approaches to downloading web pages as well as images using Perl scripts.
Downloading Web Pages using Perl
Downloading a Web Page using the system command wget
In this approach, we write a subroutine that runs the command-line tool wget on a URL. Since Perl's system function only returns the command's exit status, we capture the command's output with backticks (qx) instead, which gives us the contents of the web page in raw HTML form. We then return these contents.
Perl
#!/usr/bin/perl
# using the strict pragma
use strict;
# using the warnings pragma
# to generate warnings in case of incorrect
# code
use warnings;
# specifying the Perl version
use 5.010;
# declaring the sub routine
sub getWebPage {
# variable to store the URL
my $url = 'http://www.google.com/';
# capturing the output of wget with
# backticks, since system would only
# return the command's exit status
my $webpage = qx(wget --output-document=- $url);
# returning the contents of the web page
return $webpage;
}
# printing a user friendly message
say "the contents of the downloaded web page : ";
# calling the sub routine and printing its result
say getWebPage();
Output:
the contents of the downloaded web page :
<raw HTML web page>
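The crucial detail in this approach is that Perl's system function returns the command's exit status, while backticks (qx) return the command's captured output. The difference can be demonstrated without any network access by using the running perl binary itself ($^X) as the external command; the 'hello' string is just an illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# $^X holds the path of the perl binary currently
# running, so this needs no network access at all
my $cmd = qq{$^X -e "print 'hello'"};

# backticks (qx) capture the command's standard output
my $output = qx($cmd);

# system returns the command's exit status instead
# (0 means the command succeeded)
my $status = system $cmd;

say "captured: $output";       # captured: hello
say "exit status: $status";    # exit status: 0
```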
Downloading a Web Page using the system command curl
This approach is exactly the same as above, the only difference being that the command used here is "curl" in place of "wget".
Perl
#!/usr/bin/perl
# using the strict pragma
use strict;
# using the warnings pragma to
# generate warnings in case of
# erroneous code
use warnings;
# specifying the Perl version
use 5.010;
# declaring the sub routine
sub getWebPage {
# variable to store the URL
my $url = 'http://www.google.com/';
# capturing the output of curl with
# backticks, since system would only
# return the command's exit status
my $downloadedPage = qx(curl $url);
# returning the contents using the variable
return $downloadedPage;
}
# displaying a user friendly message
say "the contents of the web page : ";
# calling the sub routine and printing its result
say getWebPage();
Output:
the contents of the web page :
<raw HTML web page>
Downloading a Web Page using the LWP::Simple Module
LWP::Simple is a Perl module that provides a get() function, which takes a URL as its parameter and returns the body of the document. It returns undef if the request fails.
Perl
#!/usr/bin/perl
# using the strict pragma
use strict;
# using the warnings pragma to
# generate warnings in case of
# erroneous codes
use warnings;
# specifying the Perl version
use 5.010;
# calling the LWP::Simple module
use LWP::Simple;
# declaring the sub routine
sub getWebPage {
# variable to store the URL
my $url = 'http://www.google.com';
# passing the URL to the get function
# of the LWP::Simple module
my $downloadedPage = get $url;
# get returns undef if the request fails
die "Could not download $url" unless defined $downloadedPage;
# printing the contents of the web page
say $downloadedPage;
}
# displaying a user friendly message
say 'the contents of the web page are : ';
# calling the sub routine
getWebPage();
Output:
the contents of the web page are :
<raw HTML web page>
Downloading a Web Page using HTTP::Tiny
HTTP::Tiny is a simple HTTP/1.1 client covering the basic HTTP actions (GET, PUT, DELETE, HEAD); it is meant for performing simple requests without the overhead of a large framework. First, an HTTP::Tiny object is created with the new method. Next, we make the request by passing the URL to the get method, which returns a response hash reference. On success, we print the length and the content of the web page at the specified URL. On failure, we display an appropriate message along with the status code and the reason for the failed connection.
Perl
#!/usr/bin/perl
# using the strict pragma
use strict;
# using the warnings pragma to
# generate warnings in case of
# erroneous code
use warnings;
# specifying the Perl version
use 5.010;
# calling the HTTP::Tiny module
use HTTP::Tiny;
# declaring the sub routine
sub getWebPage{
# variable to store the URL
my $url = 'http://www.google.com/';
# instantiating the HTTP variable
my $httpVariable = HTTP::Tiny->new;
# storing the response using the get
# method
my $response = $httpVariable->get($url);
# checking if the request
# was successful
if ($response->{success}) {
# specifying the length of the
# web page content using the
# length keyword
say 'the length of the web page : ';
my $length = length $response->{content};
say $length;
# displaying the contents of the webpage
say 'the contents of the web page are : ';
my $downloadedPage = $response->{content};
say $downloadedPage;
}
# logic for when the request is
# unsuccessful
else {
# displaying the reason for the
# failed request
say "Failed to establish connection : $response->{status} $response->{reason}";
}
}
# calling the sub routine
getWebPage();
Output:
the length of the web page :
15175
the contents of the web page are :
<html code of the web page>
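Because HTTP::Tiny returns a plain hash reference, the success/failure branching above can be factored out and exercised without a network connection. The sketch below builds hand-made hashes with the same keys HTTP::Tiny uses (success, status, reason, content); the values are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# summarise a response hash of the shape
# that HTTP::Tiny->get returns
sub describe_response {
    my $response = shift;
    if ($response->{success}) {
        return "OK, " . length($response->{content}) . " bytes";
    }
    return "Failed: $response->{status} $response->{reason}";
}

# a hand-made successful response
my $ok = {
    success => 1,
    status  => 200,
    reason  => 'OK',
    content => '<html>hello</html>',
};

# a hand-made failed response
my $bad = {
    success => 0,
    status  => 404,
    reason  => 'Not Found',
    content => '',
};

say describe_response($ok);    # OK, 18 bytes
say describe_response($bad);   # Failed: 404 Not Found
```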
Downloading multiple web pages using HTTP::Tiny
The approach for downloading multiple web pages using HTTP::Tiny is the same as above. The only modification is that the URLs of all the web pages are stored in an array, and we loop through the array displaying the contents of each web page.
Perl
#!/usr/bin/perl
# using the strict pragma
use strict;
# using the warnings pragma
# to generate warnings for
# erroneous code
use warnings;
# specifying the Perl version
use 5.010;
# calling the HTTP::Tiny module
use HTTP::Tiny;
# declaring the sub routine
sub getWebPages{
# instantiating the HTTP client
my $httpVariable = HTTP::Tiny->new;
# array of URLs
my @urls = ('http://www.google.com/',
'https://www.geeksforgeeks.org/'
);
# start of foreach loop to
# loop through the array of URLs
foreach my $singleURL (@urls){
# displaying user friendly message
say 'downloading web page...';
# variable to store the response
my $response = $httpVariable->get($singleURL);
# logic for successful connection
if ($response->{success}){
say $singleURL.
" downloaded successfully";
# displaying the length of
# the web page
# the contents can be displayed
# similarly
say "Length : " . length($response->{content});
}
# logic for unsuccessful connection
else{
say $singleURL.
" could not be downloaded";
# displaying the reason for
# unsuccessful connection
say "$response->{status} $response->{reason}";
}
}
}
# calling the sub routine
getWebPages();
Output:
downloading web page...
http://www.google.com/ downloaded successfully
Length : 15175
downloading web page...
https://www.geeksforgeeks.org/ downloaded successfully
Length : <length of the landing page of GFG>
Downloading Images using Perl
In this section, we will see two approaches to downloading images using Perl scripts. To get the URL of an image, right-click on it and choose Copy Image Address from the context menu; the copied address is then used as the URL of the image.
Downloading images using LWP::Simple
In this approach, we use the LWP::Simple module and its getstore function, which takes the URL of the image to be downloaded and the location where the downloaded image should be stored, and returns the HTTP status code. We then check whether the code indicates success and display the corresponding message to the user.
Perl
#!/usr/bin/perl
# using the strict pragma
use strict;
# using the warnings pragma
# to generate warnings for
# erroneous code
use warnings;
# specifying the Perl version
use 5.010;
# calling the module
use LWP::Simple;
# declaring the sub routine
sub getImage {
# displaying a user friendly message
say "Downloading ... ";
# variable to store the status code
# first parameter is the URL of the image
# second parameter is the location
# of the downloaded image
my $statusCode = getstore(
"https://www.geeksforgeeks.org/wp-content/uploads/gfg_200X200-1.png",
"downloaded_image.png");
# checking for successful
# connection
if ($statusCode == 200) {
say "Image successfully downloaded.";
}
else {
say "Image download failed.";
}
}
# calling the sub routine
getImage();
Output:
Downloading ...
Image successfully downloaded.
(The downloaded image will be saved at the specified location
with the given name. If no location is specified, the image
will be saved in the current working directory.)
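After getstore reports success, it can be worth confirming with Perl's file-test operators that the file really landed on disk. The sketch below writes a stand-in file locally (the file name and contents are made up) so the check itself can run without a download:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# returns true if the file exists (-e)
# and has a non-zero size (-s)
sub download_ok {
    my $file = shift;
    return (-e $file && -s $file) ? 1 : 0;
}

# stand-in for a downloaded image, so the
# check can run without any network access
my $file = 'downloaded_image.png';
open my $fh, '>', $file or die "$file: $!";
print $fh 'fake image bytes';
close $fh;

my $result = download_ok($file);
say $result ? 'looks good' : 'missing or empty';   # looks good

# clean up the stand-in file
unlink $file;
```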
Downloading Images using Image::Grab Module
Image::Grab is a simple module meant for downloading images specified by their URLs. It can also fetch images that are hidden by various methods. In this approach, we instantiate an Image::Grab object, pass it the URL, call the grab method to fetch the image, and save the downloaded image to disk.
Perl
#!/usr/bin/perl
# using the strict pragma
use strict;
# using the warnings pragma to
# generate warnings for erroneous
# code
use warnings;
# specifying the Perl version
use 5.010;
# calling the Image::Grab module
use Image::Grab;
# instantiating the module
# and storing it in a variable
my $instantiatedImage = Image::Grab->new;
# declaring the sub routine
sub getImage {
# specifying the URL
$instantiatedImage->url
('https://www.geeksforgeeks.org/wp-content/uploads/gfg_200X200-1.png');
# calling grab to grab the image
$instantiatedImage->grab;
# creating a file to store
# the downloaded image
open(DOWNLOADEDIMAGE, '>downloaded_image1.png') ||
die "downloaded_image1.png: $!";
# binary mode, needed on Windows so
# the image data is not mangled
binmode DOWNLOADEDIMAGE;
# saving the image in the created
# file
print DOWNLOADEDIMAGE $instantiatedImage->image;
# closing the file
close DOWNLOADEDIMAGE;
}
# calling the sub routine
getImage();
Output:
The image is stored with the specified file name.
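The binmode call above is what keeps binary image data intact on platforms such as Windows that translate line endings (on Unix-like systems it is a no-op). A small local round trip shows the idea; the byte string is made up to include the \r\n pair that would otherwise be at risk:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# made-up bytes resembling a PNG signature,
# deliberately containing a CR LF pair
my $bytes = "\x89PNG\r\n\x1a\n";
my $file  = 'binmode_check.bin';

# write in binary mode so no newline translation happens
open my $out, '>', $file or die "$file: $!";
binmode $out;
print {$out} $bytes;
close $out;

# read the bytes back, again in binary mode
open my $in, '<', $file or die "$file: $!";
binmode $in;
my $readback = do { local $/; <$in> };
close $in;

say length($readback) == length($bytes)
    ? 'round trip intact'
    : 'data was altered';

unlink $file;
```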