HDFS - Data Read Operation
Last Updated: 17 Nov, 2022
HDFS (Hadoop Distributed File System) is a distributed file system that stores data across a network of commodity machines. It follows a streaming data access pattern, which means it is designed for write-once, read-many workloads. Because so much of the work done on HDFS consists of reading data, it is important to know how a read is actually carried out. Let's understand how HDFS data read works.
Reading on HDFS seems simple, but it is not. When a client sends a request to read something from HDFS, it is not given direct access to the DataNodes where the data is actually stored, because the client does not know which DataNodes hold the data or where its replicas have been placed. Without this information, the client cannot read any data from HDFS.
That is why the client first sends its request to the NameNode, which holds all the metadata needed to perform a read. When the NameNode receives the request, it responds with the information the client needs: the blocks that make up the file and, for each block, the DataNodes that hold a replica of it. With this information, the client reads each block directly from one of the DataNodes that stores it, usually the one closest to the client. Because every block is replicated on several DataNodes, the client can switch to another replica if a DataNode fails. The blocks are read in order, so together they form the original file.
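To make the flow concrete, here is a toy simulation in bash; it is not Hadoop code. Ordinary directories stand in for DataNodes, a plain text file (`namenode.meta`, a hypothetical name) stands in for the NameNode's block map, and `split` cuts the file into 8-byte "blocks" (real HDFS blocks default to 128 MB). The client looks up the block map first, then fetches each block from the first DataNode listed for it and appends the blocks in order.

```shell
#!/usr/bin/env bash
# Toy simulation of the HDFS read path (NOT real HDFS code).
# Assumptions: DataNodes are local directories, the NameNode's metadata is a
# text file of "block datanode datanode" lines, and a block is only 8 bytes.
set -euo pipefail

cluster=$(mktemp -d)
trap 'rm -rf "$cluster"' EXIT
mkdir -p "$cluster"/dn1 "$cluster"/dn2 "$cluster"/dn3

# --- Setup (write side): split the file into blocks, store 2 replicas of each ---
printf 'Hello from HDFS, this file spans several blocks.\n' > "$cluster/original.txt"
split -b 8 -d "$cluster/original.txt" "$cluster/blk_"   # creates blk_00, blk_01, ...

nodes=(dn1 dn2 dn3)
i=0
for blk in "$cluster"/blk_*; do
  name=$(basename "$blk")
  first=${nodes[i % 3]}
  second=${nodes[(i + 1) % 3]}
  cp "$blk" "$cluster/$first/$name"
  cp "$blk" "$cluster/$second/$name"
  echo "$name $first $second" >> "$cluster/namenode.meta"   # the "NameNode" block map
  rm "$blk"
  i=$((i + 1))
done

# --- Read side: the client asks the "NameNode" first, then reads from DataNodes ---
read_file() {
  local out=$1 name locations dn
  : > "$out"
  while read -r name locations; do
    # Use the first DataNode listed (real HDFS sorts replicas by distance).
    dn=${locations%% *}
    cat "$cluster/$dn/$name" >> "$out"   # blocks are appended in file order
  done < "$cluster/namenode.meta"
}

read_file "$cluster/copy.txt"
cmp "$cluster/original.txt" "$cluster/copy.txt" && echo "read OK: blocks reassembled in order"
```

The client never searches the DataNodes itself; everything it needs to locate a block comes from the metadata lookup, which mirrors the role the NameNode plays in the description above.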
Let's understand data read on HDFS with a suitable diagram.
Components we need to know before learning the HDFS read operation:
NameNode: The primary purpose of the NameNode is to manage all the metadata. In a Hadoop cluster, data is stored in the form of blocks, and the metadata records which blocks make up each file and on which DataNodes each block is stored. The NameNode also keeps a transaction log (the edit log) of changes made to the file system namespace, such as creating, renaming or deleting a file.
DataNode: A DataNode is a program that runs on each slave (worker) machine. It stores data in the form of blocks and serves read and write requests from clients.
HDFS Client: The HDFS client is an intermediate component between HDFS and the user. It communicates with the NameNode and the DataNodes and fetches the output the user requests.

In the image above, we can see that we first send our request to the HDFS client, which is a set of programs. The HDFS client contacts the NameNode, because the NameNode holds all the metadata about the file we want to read. The NameNode responds by sending this metadata back to the HDFS client. Once the HDFS client knows where each data block is located, it uses an FSDataInputStream to read those blocks. The FSDataInputStream connects to the DataNodes holding the blocks, reads the blocks one after another, and makes the data available to the client.
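The stream behaviour in this walkthrough can be mimicked with another toy bash model, again not Hadoop code. The function `stream_read` is a hypothetical stand-in for the FSDataInputStream: it walks the block list returned by the "NameNode" in order and, for each block, tries the listed DataNodes until one answers. Directories play the role of DataNodes, and one of them is deleted to simulate a failed node.

```shell
#!/usr/bin/env bash
# Toy model of the client-side input stream (NOT real HDFS code).
# Assumptions: DataNodes are directories; the metadata from the NameNode is a
# list of "block datanode datanode" lines, replicas listed in preferred order.
set -euo pipefail

cluster=$(mktemp -d)
trap 'rm -rf "$cluster"' EXIT
mkdir -p "$cluster"/dn1 "$cluster"/dn2 "$cluster"/dn3

# Three blocks, each stored on two DataNodes.
printf 'Hello, ' | tee "$cluster/dn1/blk_1" > "$cluster/dn2/blk_1"
printf 'HDFS '   | tee "$cluster/dn2/blk_2" > "$cluster/dn3/blk_2"
printf 'reader!' | tee "$cluster/dn3/blk_3" > "$cluster/dn1/blk_3"

# Block locations as the "NameNode" hands them to the client.
metadata='blk_1 dn1 dn2
blk_2 dn2 dn3
blk_3 dn1 dn3'

# Read the blocks in order, trying each replica until one is readable.
stream_read() {
  local name rest dn found
  while read -r name rest; do
    found=no
    for dn in $rest; do
      if [ -r "$cluster/$dn/$name" ]; then
        cat "$cluster/$dn/$name"
        found=yes
        break
      fi
      echo "DataNode $dn unavailable for $name, trying next replica" >&2
    done
    if [ "$found" = no ]; then
      echo "no live replica for $name" >&2
      return 1
    fi
  done <<< "$metadata"
}

rm -rf "$cluster/dn1"   # simulate a failed DataNode
stream_read             # stdout: Hello, HDFS reader!
```

Even with `dn1` gone, every block still has a live replica, so the client only logs the failed attempts and returns the complete file. The read fails only when all replicas of some block are unavailable.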
Let's see how to read data from HDFS.
Using HDFS command:
With the help of the command below, we can read a file from HDFS and copy it to the local file system (NOTE: make sure all of your Hadoop daemons are running).
Commands to start the Hadoop daemons:
start-dfs.sh
start-yarn.sh
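Before running a read, it can help to confirm that the HDFS daemons are actually up. The JDK's `jps` tool lists running JVM processes as `<pid> <MainClass>`, so on a single-node setup the HDFS daemons appear as `NameNode` and `DataNode`. The helper below, `missing_daemons`, is a hypothetical name and not part of Hadoop; it reads `jps` output from stdin so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Report which HDFS daemons are missing from `jps` output read on stdin.
# missing_daemons is a hypothetical helper, not part of Hadoop.
# Assumes a single-node setup where NameNode and DataNode run on this machine.
missing_daemons() {
  local jps_output daemon
  local missing=()
  jps_output=$(cat)   # one JVM per line, e.g. "4211 NameNode"
  for daemon in NameNode DataNode; do
    # -w stops "NameNode" from matching inside "SecondaryNameNode"
    if ! grep -qw "$daemon" <<< "$jps_output"; then
      missing+=("$daemon")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "HDFS daemons running"
}
```

A typical use would be `jps | missing_daemons || start-dfs.sh`, which starts HDFS only when a daemon is not running.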
Syntax For Reading Data From HDFS:
hdfs dfs -get <source-path> <destination-path>   # source-path: path of the file on HDFS that we want to read
# destination-path: location on the local machine where the file will be stored
Command
In our case, we have a file named dikshant.txt, containing some data, in the HDFS root directory. We can use the command below to list the contents of the HDFS root directory.
hdfs dfs -ls /

The command below reads the file from the HDFS root directory and stores it in the /home/dikshant/Desktop directory on my local machine.
hdfs dfs -get /dikshant.txt /home/dikshant/Desktop

In the image below, we can see that the data was successfully read and stored in the /home/dikshant/Desktop directory, and we can now view its contents by opening the file.
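Besides opening the file, the local copy can be compared byte for byte with the file still on HDFS. `hdfs dfs -cat` streams an HDFS file to standard output, and `cmp` compares that stream with the local file. `verify_get` is a hypothetical helper name; only `hdfs dfs -cat` and `cmp` are real commands.

```shell
#!/usr/bin/env bash
# Compare the local copy made by "hdfs dfs -get" with the original on HDFS.
# verify_get is a hypothetical helper; "hdfs dfs -cat" and "cmp" are real commands.
verify_get() {
  local hdfs_path=$1 local_path=$2
  # Stream the HDFS file to stdout and compare it byte for byte ("-" = stdin).
  if hdfs dfs -cat "$hdfs_path" | cmp -s - "$local_path"; then
    echo "OK: $local_path matches $hdfs_path"
  else
    echo "MISMATCH: $local_path differs from $hdfs_path" >&2
    return 1
  fi
}
```

For the example in this article it would be run as `verify_get /dikshant.txt /home/dikshant/Desktop/dikshant.txt`, assuming `-get` kept the original file name.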
