Hadoop - File Permission and ACL (Access Control List)
Last Updated :
10 Jul, 2020
In general, a Hadoop cluster enforces security at many layers, and the level of protection depends on the organization's requirements. In this article, we will look at Hadoop's first level of security, which consists of two components. Both are part of the default installation.
1. File Permission
2. ACL (Access Control List)
1. File Permission
HDFS (Hadoop Distributed File System) implements a POSIX (Portable Operating System Interface)-like file permission model, similar to the one in Linux. In Linux, every file and directory carries permissions for three classes: Owner, Group, and Others.
Owner/user Group Others
rwx rwx rwx
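As an aside, the three rwx triplets map directly onto the octal notation used with chmod. Here is a small Python sketch of that mapping (illustrative only, not Hadoop code):

```python
def mode_to_string(mode: int) -> str:
    """Render a 9-bit octal mode (e.g. 0o754) as an rwx permission string."""
    out = []
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0b111
        out.append("".join(flag if bits & bit else "-"
                           for flag, bit in zip("rwx", (4, 2, 1))))
    return "".join(out)

print(mode_to_string(0o754))  # rwxr-xr--
print(mode_to_string(0o600))  # rw-------
```

A mode of 0o754, for instance, means the owner can read, write, and execute, the group can read and execute, and others can only read.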
Similarly, HDFS defines a set of permissions for the Owner, Group, and Others. In Linux, -rwx grants a specific user r (read), w (write or append), and x (execute) permission. For a file in HDFS we have r for reading and w for writing and appending, but the x (execute) permission is meaningless: all files in HDFS are supposed to be data files, and there is no concept of executing a file in HDFS. Since there are no executables in HDFS, there are also no setUID and setGID bits.

Directories in HDFS carry permissions as well: r lets you list the contents of a directory, w lets you create or delete entries inside it, and x lets you access a child of the directory. Here too there are no setUID and setGID bits.
How Can You Change an HDFS File's Permissions?
The -chmod (change mode) command changes the permissions of files in HDFS. First, list the directories in your HDFS and look at the permissions assigned to each of them. You can list your HDFS root directory with the command below.
hdfs dfs -ls /
Here, / represents the root directory of your HDFS.

First, let's list the files present in my Hadoop_File directory.
hdfs dfs -ls /Hadoop_File
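The exact size, timestamp, and replication count will differ on your cluster, but the listing for a file that only its owner can read and write looks roughly like this (the user name dikshant comes from this walkthrough, and supergroup is the HDFS default group; the size and date shown are placeholders):

```
Found 1 items
-rw-------   1 dikshant supergroup         20 2020-07-10 12:00 /Hadoop_File/file1.txt
```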

For file1.txt, only the owner has read and write permission, so let's add write permission for the group and others as well.
Pre-requisite: You should be familiar with the -chmod command in Linux, i.e. how its permission switches work for the different user classes. To add write permission for the group and others, use the command below.
hdfs dfs -chmod go+w /Hadoop_File/file1.txt
Here, go stands for group and others, w means write, and the + sign indicates that we are adding write permission for the group and others. List the file again to check whether it worked.
hdfs dfs -ls /Hadoop_File

And we are done. In the same way, you can change the permissions of any file or directory in HDFS (Hadoop Distributed File System) for any user class as required. You can also change the group or owner of a file or directory with -chgrp and -chown respectively.
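To make the effect of a symbolic mode like go+w concrete, here is a small Python sketch that applies such a spec to a 9-character permission string. It is a toy model of the chmod syntax, not Hadoop code:

```python
def apply_chmod(perms: str, spec: str) -> str:
    """Apply a simple symbolic spec (classes from 'ugoa', one of '+'/'-',
    flags from 'rwx') to a 9-character permission string like 'rw-------'."""
    op = "+" if "+" in spec else "-"
    who, flags = spec.split(op)
    classes = {"u": 0, "g": 1, "o": 2}
    targets = range(3) if "a" in who else [classes[c] for c in who]
    # Split into the owner, group, and others triplets
    triplets = [list(perms[i * 3:(i + 1) * 3]) for i in range(3)]
    for t in targets:
        for f in flags:
            triplets[t]["rwx".index(f)] = f if op == "+" else "-"
    return "".join("".join(t) for t in triplets)

# go+w turns rw------- into rw--w--w-, as in the chmod step above
print(apply_chmod("rw-------", "go+w"))  # rw--w--w-
```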
2. ACL (Access Control List)
An ACL provides a more flexible way to assign permissions in a file system: it is a list of access permissions attached to a file or directory. You need ACLs when you have created a separate user for your single-node Hadoop cluster setup, or you have a multi-node cluster and want to change permissions for other users. The reason is that -chmod cannot set permissions for an arbitrary user. For example, suppose the main user of your single-node Hadoop cluster is root and you have created a separate user, say Hadoop, for the Hadoop setup. If you now want to change the root user's permissions on files present in your HDFS, you cannot do it with the -chmod command. This is where the ACL (Access Control List) comes into the picture: with an ACL you can set permissions for a specific named user or named group.
To enable ACLs in HDFS, add the property below to the hdfs-site.xml file.
[code]
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
[/code]
Note: Don't forget to restart all the daemons, otherwise the changes made to hdfs-site.xml will not take effect.
You can check the entries in the access control list (ACL) of a directory with the -getfacl command, as shown below.
hdfs dfs -getfacl /Hadoop_File
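The exact owner and group depend on your setup, but for a directory with no extra entries the -getfacl output typically shows three entries, one each for the owner, group, and other classes (dikshant and supergroup below are the user and default HDFS group assumed in this walkthrough):

```
# file: /Hadoop_File
# owner: dikshant
# group: supergroup
user::rwx
group::r-x
other::r-x
```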

You can see that there are 3 different entries in our ACL. Suppose you want to change the permissions on an HDFS directory for your root user; you can do it with the command below.
Syntax:
hdfs dfs -setfacl -m user:user_name:r-x /Hadoop_File
You can grant any user access by adding an entry for that named user to the ACL of the directory. Below are some examples of changing the permissions of different named users on an HDFS file or directory.
hdfs dfs -setfacl -m user:root:r-x /Hadoop_File
Another example, for the raj user:
hdfs dfs -setfacl -m user:raj:r-x /Hadoop_File
Here, r-x grants only read and execute permission on the HDFS directory to the root and raj users.
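The entry passed to -setfacl -m always has the shape type:name:perms, as in the commands above. Here is a small Python sketch (illustrative only, not HDFS code) that splits such a plain entry and reports which permissions it grants; note that it does not handle the four-field default-ACL form:

```python
def parse_acl_spec(spec: str):
    """Split a -setfacl entry like 'user:root:r-x' into its three fields
    and report which of the rwx permissions it grants."""
    etype, name, perms = spec.split(":")
    granted = {flag: flag in perms for flag in "rwx"}
    return etype, name, granted

etype, name, granted = parse_acl_spec("user:root:r-x")
print(etype, name, granted)  # user root {'r': True, 'w': False, 'x': True}
```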
In my case, I don't have any other user, so I am changing the permissions for my only user, i.e. dikshant:
hdfs dfs -setfacl -m user:dikshant:rwx /Hadoop_File
Then list the ACL with the -getfacl command to see the changes.
hdfs dfs -getfacl /Hadoop_File

Here, you can see another entry, user:dikshant:rwx, in the ACL of this directory, reflecting the new permissions of the dikshant user. Similarly, if you have multiple users, you can change their permissions on any HDFS directory. As another example, this command changes the dikshant user's permissions to r-x mode:
hdfs dfs -setfacl -m user:dikshant:r-x /Hadoop_File

Here, you can see that the dikshant user's permissions have changed from rwx to r-x.