Data Lake Security
Effective Hadoop security depends on a holistic approach that revolves around five pillars of security:
administration, authentication and perimeter security, authorization, auditing, and data protection.
Azure
Enterprise customers demand a data analytics cloud platform that is secure and
easy to use. Azure Data Lake Store is designed to help address these requirements
through identity management and authentication via Azure Active Directory
integration, ACL-based authorization, network isolation, data encryption in transit
and at rest, and auditing.
It is vital for an enterprise to make sure that critical business data is stored securely, with the
correct level of access granted to individual users. Azure Data Lake Store is designed to help meet these
security requirements. This article describes the security capabilities of Data Lake Store, including:
• Authentication
• Authorization
• Network isolation
• Data protection
• Auditing
Authentication is the process by which a user's identity is verified when the user interacts with Data
Lake Store or with any service that connects to Data Lake Store. For identity management and
authentication, Data Lake Store uses Azure Active Directory, a comprehensive identity and access
management cloud solution that simplifies the management of users and groups.
Four basic roles are defined for Data Lake Store by default: Owner, Contributor,
Reader, and User Access Administrator. The roles permit different operations on a
Data Lake Store account via the Azure portal, PowerShell cmdlets, and REST APIs.
The Owner and Contributor roles can perform a variety of administration functions
on the account. You can assign the Reader role to users who only need to view
account management data; access to the data itself is governed by ACLs.
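The division of management rights among the roles can be sketched as a lookup table. This is a simplified, hypothetical model, not an Azure API: the role names are the four default roles (including the fourth, User Access Administrator), while the action names (`read_settings`, `update_settings`, `assign_roles`) and the `can_manage` helper are illustrative assumptions. The model covers only account management; data access is decided separately by ACLs.

```python
# Hypothetical, simplified model of role checks for account *management*
# operations only. Action names are illustrative, not Azure API names.
MANAGEMENT_ACTIONS = {"read_settings", "update_settings", "assign_roles"}

ROLE_PERMISSIONS = {
    "Owner": {"read_settings", "update_settings", "assign_roles"},
    "Contributor": {"read_settings", "update_settings"},  # cannot grant access
    "Reader": {"read_settings"},  # view-only access to management data
    "User Access Administrator": {"read_settings", "assign_roles"},
}


def can_manage(role: str, action: str) -> bool:
    """Return True if the role may perform the given management action."""
    if action not in MANAGEMENT_ACTIONS:
        raise ValueError(f"unknown management action: {action}")
    # Unknown roles get no management rights at all.
    return action in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(can_manage("Contributor", "update_settings"))  # True
    print(can_manage("Reader", "update_settings"))       # False
```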
Using ACLs for operations on file systems
Data Lake Store is a hierarchical file system like Hadoop Distributed File
System (HDFS), and it supports POSIX ACLs. It controls read (r), write (w),
and execute (x) permissions to resources for the owning user, for the owning
group, and for other users and groups. In Data Lake Store, ACLs can be
enabled on the root folder, on subfolders, and on individual files. For more
information on how ACLs work in the context of Data Lake Store, see Access
control in Data Lake Store.
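The order in which POSIX-style ACL entries are evaluated can be sketched in a few lines. This is a simplified model for illustration, not the Data Lake Store implementation or API; the `Acl` class, the `check_access` function, and the rwx-string encoding are hypothetical. It assumes the POSIX evaluation order: owning user first, then named users, then group entries, then everyone else, with a mask limiting what named and group entries can grant.

```python
from dataclasses import dataclass, field


def _has(perms: str, wanted: str) -> bool:
    """True if every requested letter (r, w, x) appears in the rwx string."""
    return all(ch in perms for ch in wanted)


def _masked(perms: str, mask: str) -> str:
    """Keep only the permission letters that the mask also allows."""
    return "".join(ch if ch != "-" and ch in mask else "-" for ch in perms)


@dataclass
class Acl:
    owner: str
    owning_group: str
    owner_perms: str = "rwx"
    group_perms: str = "r-x"
    other_perms: str = "---"
    named_users: dict[str, str] = field(default_factory=dict)
    named_groups: dict[str, str] = field(default_factory=dict)
    mask: str = "rwx"


def check_access(acl: Acl, user: str, groups: set[str], wanted: str) -> bool:
    # 1. The owning user is matched first and is not limited by the mask.
    if user == acl.owner:
        return _has(acl.owner_perms, wanted)
    # 2. A named-user entry comes next; the mask caps what it grants.
    if user in acl.named_users:
        return _has(_masked(acl.named_users[user], acl.mask), wanted)
    # 3. Group entries: the owning group plus any named groups the user is in.
    matching = []
    if acl.owning_group in groups:
        matching.append(acl.group_perms)
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        # Access is granted if any matching group entry allows the request.
        return any(_has(_masked(p, acl.mask), wanted) for p in matching)
    # 4. Everyone else falls through to the "other" entry.
    return _has(acl.other_perms, wanted)


if __name__ == "__main__":
    acl = Acl(owner="alice", owning_group="analysts", named_users={"bob": "rw-"})
    print(check_access(acl, "alice", set(), "w"))         # True: owner has rwx
    print(check_access(acl, "bob", set(), "w"))           # True: named user rw-
    print(check_access(acl, "carol", {"analysts"}, "r"))  # True: group r-x
    print(check_access(acl, "carol", {"analysts"}, "w"))  # False
    print(check_access(acl, "dave", set(), "r"))          # False: other is ---
```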
We recommend that you define ACLs for multiple users by using security
groups. Add users to a security group, and then assign the ACLs for a file or
folder to that security group.
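The benefit of this pattern is that onboarding or offboarding a user changes only group membership, never the ACLs themselves. A minimal sketch of that idea, using a hypothetical in-memory directory and ACL; the group name `finance-readers` and the `effective_perms` helper are illustrative, not Data Lake Store or Azure Active Directory APIs:

```python
def effective_perms(
    user: str, groups: dict[str, set[str]], acl: dict[str, str]
) -> set[str]:
    """Union of the permission letters a user gets through group entries.

    groups maps a group name to its members (a stand-in for the directory);
    acl maps a group name to an rwx string such as "r-x" (hypothetical model).
    """
    perms: set[str] = set()
    for group, rwx in acl.items():
        if user in groups.get(group, set()):
            perms |= {ch for ch in rwx if ch != "-"}
    return perms


if __name__ == "__main__":
    groups = {"finance-readers": {"alice"}}
    acl = {"finance-readers": "r-x"}  # the ACL names only the group
    print(sorted(effective_perms("bob", groups, acl)))  # []
    groups["finance-readers"].add("bob")  # onboarding: membership change only
    print(sorted(effective_perms("bob", groups, acl)))  # ['r', 'x']
```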
Network isolation
Use Data Lake Store to help control access to your data store at the network
level. You can establish firewalls and define an IP address range for your
trusted clients. With an IP address range, only clients that have an IP address
within the defined range can connect to Data Lake Store.
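The range check itself can be illustrated with Python's standard `ipaddress` module. This sketch shows only the membership logic, not how the service configures its firewall; the trusted ranges below are assumptions taken from the RFC 5737 documentation address blocks, and `is_trusted` is a hypothetical helper.

```python
import ipaddress

# Hypothetical trusted client ranges (RFC 5737 documentation blocks).
TRUSTED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),  # 198.51.100.16 to .31
]


def is_trusted(client_ip: str, ranges=TRUSTED_RANGES) -> bool:
    """Return True if client_ip falls inside any trusted range."""
    addr = ipaddress.ip_address(client_ip)
    # Mixed IPv4/IPv6 comparisons simply evaluate to False.
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    print(is_trusted("203.0.113.42"))  # True
    print(is_trusted("198.51.100.40"))  # False: outside the /28
```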
Data protection
Azure Data Lake Store protects your data throughout its life cycle. For data
in transit, Data Lake Store uses the industry-standard Transport Layer
Security (TLS) protocol to secure data over the network.
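On the client side, you can make sure your own code does not weaken this protection by verifying server certificates and hostnames. A minimal sketch with Python's standard `ssl` module; the TLS 1.2 floor is an assumption for illustration, not a documented Data Lake Store requirement, and `make_client_context` is a hypothetical helper.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client TLS context that verifies certificates and hostnames."""
    # create_default_context() loads the system CA store and enables
    # certificate verification and hostname checking.
    ctx = ssl.create_default_context()
    # Assumption: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = make_client_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

The context can then be passed to an HTTPS client, for example `urllib.request.urlopen(url, context=ctx)`.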
Data Lake Store also provides encryption for data that is stored in the account. You
can choose to have your data encrypted or opt for no encryption. If you opt in to
encryption, Data Lake Store automatically encrypts data before persisting it to
storage media and decrypts it on retrieval. The process is transparent to the
client accessing the data: no client-side code changes are required to encrypt or
decrypt data.
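This transparency amounts to a wrapper that encrypts on the write path and decrypts on the read path, so callers only ever handle plaintext. The sketch below illustrates that pattern only. To run without dependencies it uses a toy HMAC-SHA256 keystream from the standard library, which is not secure for real use (it has no integrity protection; a real system would use a vetted authenticated cipher such as AES-GCM with managed keys). `TransparentStore` is a hypothetical class, not the Data Lake Store implementation.

```python
import hashlib
import hmac
import os

NONCE_SIZE = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce || counter. NOT for production."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


class TransparentStore:
    """Encrypts on write and decrypts on read; callers see only plaintext."""

    def __init__(self, key: bytes):
        self._key = key
        self._media: dict[str, bytes] = {}  # stands in for persistent media

    def write(self, path: str, data: bytes) -> None:
        nonce = os.urandom(NONCE_SIZE)  # fresh nonce for every write
        ks = _keystream(self._key, nonce, len(data))
        # Only nonce + ciphertext ever reaches the "media".
        self._media[path] = nonce + bytes(a ^ b for a, b in zip(data, ks))

    def read(self, path: str) -> bytes:
        blob = self._media[path]
        nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
        ks = _keystream(self._key, nonce, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, ks))


if __name__ == "__main__":
    store = TransparentStore(os.urandom(32))
    store.write("/sales/2017.csv", b"region,total\n")
    print(store.read("/sales/2017.csv"))  # b'region,total\n'
```

The client code calls `write` and `read` exactly as it would against an unencrypted store, which is the property the paragraph above describes.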
Auditing and diagnostic logs
Which log to use depends on the activity you are investigating: audit logs
record management-related activities, and diagnostic logs record data-related
activities.
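Assuming management-related activities surface in audit logs and data-related activities in diagnostic logs, the choice can be expressed as a simple lookup. In this hypothetical sketch the data-plane names are WebHDFS operation names, the management-plane names are illustrative placeholders, and `log_source_for` does not query any Azure service.

```python
# Assumed mapping: management plane -> audit logs, data plane -> diagnostic logs.
MANAGEMENT_OPS = {"create_account", "update_firewall", "assign_role"}  # placeholders
DATA_OPS = {"OPEN", "CREATE", "APPEND", "MKDIRS", "DELETE", "LISTSTATUS"}  # WebHDFS


def log_source_for(operation: str) -> str:
    """Return which log would record the given operation."""
    if operation in MANAGEMENT_OPS:
        return "audit logs"
    if operation in DATA_OPS:
        return "diagnostic logs"
    raise ValueError(f"unclassified operation: {operation}")


if __name__ == "__main__":
    print(log_source_for("APPEND"))           # diagnostic logs
    print(log_source_for("update_firewall"))  # audit logs
```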