Hive Securing Hive
https://docs.cloudera.com/
Legal Notice
© Cloudera Inc. 2024. All rights reserved.
The documentation is and contains Cloudera proprietary information protected by copyright and other intellectual property
rights. No license under copyright or any other intellectual property right is granted herein.
Unless otherwise noted, scripts and sample code are licensed under the Apache License, Version 2.0.
Copyright information for Cloudera software may be found within the documentation accompanying each component in a
particular release.
Cloudera software includes software from various open source or other third party projects, and may be released under the
Apache Software License 2.0 (“ASLv2”), the Affero General Public License version 3 (AGPLv3), or other license terms.
Other software included may be released under the terms of alternative open source licenses. Please review the license and
notice files accompanying the software for additional licensing information.
Please visit the Cloudera software product page for more information on Cloudera software. For more information on
Cloudera support services, please visit either the Support or Sales page. Feel free to contact us directly to discuss your
specific needs.
Cloudera reserves the right to change any products at any time, and without notice. Cloudera assumes no responsibility nor
liability arising from the use of products, except as expressly agreed to in writing by Cloudera.
Cloudera, Cloudera Altus, HUE, Impala, Cloudera Impala, and other Cloudera marks are registered or unregistered
trademarks in the United States and other countries. All other trademarks are the property of their respective owners.
Disclaimer: EXCEPT AS EXPRESSLY PROVIDED IN A WRITTEN AGREEMENT WITH CLOUDERA,
CLOUDERA DOES NOT MAKE NOR GIVE ANY REPRESENTATION, WARRANTY, NOR COVENANT OF
ANY KIND, WHETHER EXPRESS OR IMPLIED, IN CONNECTION WITH CLOUDERA TECHNOLOGY OR
RELATED SUPPORT PROVIDED IN CONNECTION THEREWITH. CLOUDERA DOES NOT WARRANT THAT
CLOUDERA PRODUCTS NOR SOFTWARE WILL OPERATE UNINTERRUPTED NOR THAT IT WILL BE
FREE FROM DEFECTS NOR ERRORS, THAT IT WILL PROTECT YOUR DATA FROM LOSS, CORRUPTION
NOR UNAVAILABILITY, NOR THAT IT WILL MEET ALL OF CUSTOMER’S BUSINESS REQUIREMENTS.
WITHOUT LIMITING THE FOREGOING, AND TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE
LAW, CLOUDERA EXPRESSLY DISCLAIMS ANY AND ALL IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, QUALITY, NON-INFRINGEMENT, TITLE, AND
FITNESS FOR A PARTICULAR PURPOSE AND ANY REPRESENTATION, WARRANTY, OR COVENANT BASED
ON COURSE OF DEALING OR USAGE IN TRADE.
Contents
Hive authentication
  Securing HiveServer using LDAP
  Client connections to HiveServer
  Pluggable authentication modules in HiveServer
  JDBC connection string syntax
Communication encryption
  Enabling TLS/SSL for HiveServer
  Enabling SASL in HiveServer
Cloudera Runtime Transactional table access
When you run grant/revoke commands and Apache Ranger is enabled, a Ranger policy is created/removed.
Related Information
HDFS ACLS
Configure a Resource-based Policy: Hive
Row-level Filtering and Column Masking in Hive
Query Hive
Hive assigns a default permission of 777 to the hive user, sets a umask to restrict subdirectories, and provides a
default ACL to give Hive read and write access to all subdirectories. External tables must be secured using Ranger.
Cloudera Runtime Accessing Hive files in Ozone
In this task, you first enable Ozone in the Ranger service, and then set up the required policies.
Procedure
1. In Cloudera Manager, click Clusters > Ozone > Configuration to navigate to the configuration page for Ozone.
2. Search for ranger_service, and enable the property.
3. Click Clusters > Ranger > Ranger Admin Web UI, enter your user name and password, then click Sign In.
The Service Manager for Resource Based Policies page is displayed in the Ranger console.
7. Click the Service Manager link in the breadcrumb trail and then click the Hadoop SQL preloaded resource-based
service to update the Hadoop SQL URL policy.
Cloudera Runtime Configuring access to Hive on YARN
8. In the Hadoop SQL policies page, click the Policy ID or click Edit against the "all - url" policy to modify
the policy details.
By default, "hive", "hue", "impala", "admin" and a few other users are provided access to all the Ozone URLs.
You can select users and groups in addition to the default. To grant everyone access, add the "public" group to the
group list. Every user is then subject to your allow conditions.
What to do next
Create a Hive external table having source data in Ozone.
Also, it is recommended that you set certain Hive configurations before querying Hive tables in Ozone.
Related Information
Set up Ozone security
Cloudera's Ranger documentation
Creating an Ozone-based Hive external table
Procedure
1. In Cloudera Manager, click Clusters > Hive on Tez > Configuration and search for hive.server2.enable.doAs.
2. Set the value of doas to false. Uncheck Hive (Service-Wide) to disable impersonation.
For more information about configuring doas, see "Enabling or disabling impersonation".
Save changes.
3. Search for the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml setting.
4. In the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml setting, click +.
5. Add the properties and values to allow the Hive workload on YARN.
For more information about allowing the Hive workload on YARN, see "Configuring HiveServer for ETL using
YARN queues".
Save changes.
6. In Cloudera Manager, click Clusters > YARN > Configuration, and search for ResourceManager Advanced
Configuration Snippet (Safety Valve) for yarn-site.xml.
7. In the ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml setting, click +.
8. Add the properties and values to allow the end user to access YARN using placement rules.
For more information about allowing end user access to YARN, see "Configure queue mapping to use the user
name from the application tag using Cloudera Manager".
Save changes.
9. Restart the YARN ResourceManager service for the changes to apply.
End users you specified can now query Hive workloads in YARN queues.
Related Information
Disabling impersonation (doas)
Configuring HiveServer for ETL using YARN queues
Managing YARN queue users
Configuring queue mapping to use the user name from the application tag using Cloudera Manager
Procedure
1. In Cloudera Manager, click Clusters > Hive-on-Tez > Configuration.
2. Search for the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml setting.
3. In the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml setting, click +.
4. In Name enter the property hive.server2.tez.initialize.default.sessions and in value enter false.
5. In Name enter the property hive.server2.tez.queue.access.check and in value enter true.
6. In Name enter the property hive.server2.tez.sessions.custom.queue.allowed and in value enter true.
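The three properties entered in steps 4 to 6 can also be expressed as hive-site.xml property elements. The following is a minimal sketch, assuming you want the XML form of the same name/value pairs (for example, to paste into an XML view of the safety valve); the helper name render_properties is hypothetical and not part of Hive or Cloudera Manager.

```python
import xml.etree.ElementTree as ET

# Property names and values are taken from steps 4 to 6 above.
YARN_QUEUE_PROPERTIES = {
    "hive.server2.tez.initialize.default.sessions": "false",
    "hive.server2.tez.queue.access.check": "true",
    "hive.server2.tez.sessions.custom.queue.allowed": "true",
}


def render_properties(properties):
    """Render name/value pairs as hive-site.xml <property> elements.

    Hypothetical helper for illustration only.
    """
    blocks = []
    for name, value in properties.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


snippet = render_properties(YARN_QUEUE_PROPERTIES)
print(snippet)
```

Keeping the pairs in one dictionary makes it easy to compare the generated XML with what was typed into the Name and Value fields.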
Configuring queue mapping to use the user name from the application tag using Cloudera Manager
You learn how to add service users to the YARN queue by following a mapping procedure.
You can configure queue mapping to use the user name from the application tag instead of the proxy user who
submitted the job. You can add only service users, such as hive, to the
yarn.resourcemanager.application-tag-based-placement.username.whitelist property; you cannot add normal users.
When a user runs Hive queries, HiveServer2 submits the query to the queue mapped to the end user instead of the
queue mapped to the hive user. For example, when user alice submits a Hive query in doAs=false mode, the job runs
in YARN as the hive user. If application-tag based scheduling is enabled, the job is placed in a target queue based
on the queue mapping configuration of user alice.
For more information about queue mapping configurations, see Manage placement rules. For information about Hive
access, see Apache Hive documentation.
1. In Cloudera Manager, select the YARN service.
2. Click the Configuration tab.
3. Search for ResourceManager. In the Filters pane, under Scope, select ResourceManager.
4. In ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml add the following:
a. Enable the application-tag-based-placement property to enable application placement based on the user ID
passed using the application tags.
Name: yarn.resourcemanager.application-tag-based-placement.enable
Value: true
Description: Set to "true" to enable application placement based on the
user ID passed using the application tags. When it is enabled, it checks
for the userid=<userId> pattern and if found, the application will be
placed onto the found user's queue, if the original user has the required
rights on the passed user's queue.
b. Add the list of users in the allowlist who can use application tag based placement. When the submitting user
is included in the allowlist, the application is placed onto the queue defined in the
yarn.scheduler.capacity.queue-mappings property for the user from the application tag. If there is no user
defined, the submitting user is used.
Name: yarn.resourcemanager.application-tag-based-placement.username.whitelist
Value:
Description: Comma separated list of users who can use the application
tag based placement, if "yarn.resourcemanager.application-tag-based-placement.enable"
is enabled.
5. Restart the ResourceManager service for the changes to apply.
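The placement behavior described in this procedure can be modeled in a few lines. This is a toy sketch of the decision as the property descriptions state it, not YARN's implementation: the function name, the queue names, and the QUEUE_BY_USER dictionary are all hypothetical, and the queue rights check mentioned in the description is deliberately left out.

```python
def placement_user(submitting_user, application_tags, enabled, allowlist):
    """Pick the user whose queue mapping applies, following the text above.

    Toy model: the real ResourceManager also checks that the submitting
    user has rights on the target queue, which is not modelled here.
    """
    if not enabled or submitting_user not in allowlist:
        return submitting_user
    for tag in application_tags:
        if tag.startswith("userid=") and len(tag) > len("userid="):
            return tag[len("userid="):]
    # No userid tag found: fall back to the submitting user.
    return submitting_user


# Hypothetical mappings standing in for yarn.scheduler.capacity.queue-mappings.
QUEUE_BY_USER = {"alice": "root.analytics", "hive": "root.etl"}

# HiveServer2 (running as hive, doAs=false) submits a query on behalf of alice.
user = placement_user("hive", ["userid=alice"], enabled=True, allowlist={"hive"})
print(user, "->", QUEUE_BY_USER.get(user, "root.default"))
```

In this model, a submitter that is not in the allowlist is always placed by its own name, which mirrors the restriction that only service users such as hive are added to the allowlist.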
Cloudera Runtime Connecting to an Apache Hive endpoint through Apache Knox
Procedure
1. In Cloudera Manager, click Clusters > Hive on Tez > Configuration, and change the Hive on Tez service transport
mode to http.
Knox discovers the service automatically and builds a proxy URL for Hive on Tez only when the transport mode
is http.
2. Download the Knox Gateway TLS/SSL client trust store JKS file from Knox, and save it locally.
You can find the location of the JKS file from the value of the Knox property gateway.tls.keystore.path.
3. In the Hive connection string, include parameters as follows:
jdbc:hive2://<host>:8443/;ssl=true;transportMode=http; \
httpPath=gateway/cdp-proxy-api/hive; \
sslTrustStore=/<path to JKS>/bin/certs/gateway-client-trust.jks; \
trustStorePassword=<Java default password>
In this example, changeit is the Java default password for the trust store.
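The backslashes in the connection string above are display line continuations; the client receives one unbroken string. As a rough sketch, the string could be assembled as follows. The function name, the host knox.example.com, and the trust store path are hypothetical; the port and httpPath defaults are the values shown in step 3.

```python
def knox_hive_jdbc_url(host, trust_store_path, trust_store_password,
                       port=8443, http_path="gateway/cdp-proxy-api/hive"):
    """Assemble the Knox-proxied Hive JDBC URL from step 3 as a single line.

    Hypothetical helper; parameter order follows the example above.
    """
    params = [
        ("ssl", "true"),
        ("transportMode", "http"),
        ("httpPath", http_path),
        ("sslTrustStore", trust_store_path),
        ("trustStorePassword", trust_store_password),
    ]
    session = ";".join(f"{key}={value}" for key, value in params)
    return f"jdbc:hive2://{host}:{port}/;{session}"


# Host name and trust store location are placeholders for illustration.
knox_url = knox_hive_jdbc_url(
    "knox.example.com", "/tmp/gateway-client-trust.jks", "changeit"
)
print(knox_url)
```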
Hive authentication
HiveServer supports authentication of clients using Kerberos or user/password validation backed by LDAP.
If you configure HiveServer to use Kerberos authentication, HiveServer acquires a Kerberos ticket during startup.
HiveServer requires a principal and keytab file specified in the configuration. Client applications (for example, JDBC
or Beeline) must have a valid Kerberos ticket before initiating a connection to HiveServer2. JDBC-based clients must
include principal=<hive.server2.authentication.principal> in the JDBC connection string. For example:
where hive is the principal configured in hive-site.xml and HiveServerHost is the host where HiveServer is running.
To start Beeline and connect to a secure HiveServer, enter a command as shown in the following example:
beeline -u "jdbc:hive2://10.65.13.98:10000/default;principal=hive/_HOST@CLOUDERA.SITE"
Procedure
1. In Cloudera Manager, select Hive-on-Tez > Configuration.
2. Search for ldap.
3. Check Enable LDAP Authentication for HiveServer2 for Hive (Service Wide).
4. Enter your LDAP URL in the format ldap[s]://<host>:<port>.
LDAP_URL is the access URL for your LDAP server. For example, ldap://ldap_host_name.xyz.com:389
5. Enter the Active Directory Domain or LDAP Base DN for your environment.
• Active Directory (AD): Enter the domain name of the AD server. For example, corp.domain.com.
• LDAP_BaseDN: Enter the base LDAP distinguished name (DN) for your LDAP server. For example, ou=dev, dc=xyz.
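Before entering the LDAP URL in step 4, it can help to confirm that it matches the ldap[s]://<host>:<port> format. The following sketch uses only the Python standard library; check_ldap_url is a hypothetical helper, and it validates the shape of the URL only, not whether the server is reachable.

```python
from urllib.parse import urlsplit


def check_ldap_url(url):
    """Validate the ldap[s]://<host>:<port> shape and return its parts.

    Hypothetical helper; raises ValueError when the URL does not match.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("ldap", "ldaps"):
        raise ValueError(f"scheme must be ldap or ldaps, got {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("host is missing")
    if parts.port is None:
        raise ValueError("port is missing")
    return parts.scheme, parts.hostname, parts.port


# The example URL from step 4.
ldap_parts = check_ldap_url("ldap://ldap_host_name.xyz.com:389")
print(ldap_parts)
```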
The following example shows a secure connection string that uses encrypted passwords.
where the LDAP_Userid value is the user ID and LDAP_Password is the password of the client user.
Embedded   The Beeline client and the Hive installation reside on the same host machine or virtual machine.
           No TCP connectivity is required. Connection URL: jdbc:hive2://
Remote     Connection URL: jdbc:hive2://<host>:<port>/<db>.
Transport Modes
As administrator, you can start HiveServer in one of the following transport modes:
Transport Mode   Description
TCP              HiveServer uses TCP transport for sending and receiving Thrift RPC messages.
HTTP             HiveServer uses HTTP transport for sending and receiving Thrift RPC messages.
jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>
jdbc:hive2://<host>:<port>/<dbName>;transportMode=http;httpPath=<http_endpoint>; \
<otherSessionConfs>?<hiveConfs>#<hiveVars>
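The two syntax lines above share one structure: session settings follow the database name after a semicolon, Hive configuration settings follow a question mark, and Hive variables follow a hash sign. A minimal sketch of that structure, assuming each section is a semicolon-separated list of key=value pairs; build_hive_jdbc_url is a hypothetical helper, and the host, port, and run_date variable are placeholders.

```python
def build_hive_jdbc_url(host, port, db_name="", session_confs=None,
                        hive_confs=None, hive_vars=None):
    """Compose jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>.

    Hypothetical helper; each optional section is omitted when empty.
    """
    def join(pairs):
        return ";".join(f"{key}={value}" for key, value in pairs.items())

    url = f"jdbc:hive2://{host}:{port}/{db_name}"
    if session_confs:
        url += ";" + join(session_confs)
    if hive_confs:
        url += "?" + join(hive_confs)
    if hive_vars:
        url += "#" + join(hive_vars)
    return url


# HTTP transport mode with one Hive configuration setting and one variable.
http_url = build_hive_jdbc_url(
    "hs2.example.com", 10001, "default",
    session_confs={"transportMode": "http", "httpPath": "cliservice"},
    hive_confs={"hive.exec.parallel": "true"},
    hive_vars={"run_date": "2024-01-01"},
)
print(http_url)
```

Passing the sections as separate dictionaries keeps the three delimiters (;, ?, and #) in one place, which is where hand-written URLs most often go wrong.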
User Authentication
If configured in remote mode, HiveServer supports Kerberos, LDAP, Pluggable Authentication Modules (PAM), and
custom plugins for authenticating the JDBC user connecting to HiveServer. The format of the JDBC connection URL
for authentication with Kerberos differs from the format for other authentication models. The following table shows
the variables for Kerberos authentication.
User Authentication Variable   Description
saslQop                        Quality of protection for the SASL framework. The level of quality is
                               negotiated between the client and server during authentication. Used by
                               Kerberos authentication with TCP transport.
jdbc:hive2://<host>:<port>/<dbName>;principal=<HiveServer2_kerberos_principal>; \
<otherSessionConfs>?<hiveConfs>#<hiveVars>
jdbc:hive2://<host>:<port>/<dbName>; \
ssl=true;sslTrustStore=<ssl_truststore_path>;trustStorePassword=<truststore_password>; \
<otherSessionConfs>?<hiveConfs>#<hiveVars>
When using TCP for transport and Kerberos for security, HiveServer2 uses SASL QOP for encryption rather than SSL.
The JDBC connection string for SASL QOP uses these variables.
SASL QOP Variable Description
jdbc:hive2://fqdn.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM;saslQop=auth-conf
The _HOST is a wildcard placeholder that gets automatically replaced with the fully qualified domain name (FQDN)
of the server running the HiveServer daemon process.
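The _HOST substitution can be pictured with a short sketch. This is an illustration of the replacement described above, not the code HiveServer runs; expand_host_placeholder is a hypothetical name, and lowercasing the host name is an assumption made here for the example.

```python
def expand_host_placeholder(principal, server_fqdn):
    """Replace the _HOST instance in service/_HOST@REALM with the server FQDN.

    Hypothetical helper; lowercasing the FQDN is an assumption.
    """
    service, slash, rest = principal.partition("/")
    if not slash:
        return principal  # No instance component, nothing to replace.
    instance, at, realm = rest.partition("@")
    if instance != "_HOST":
        return principal  # A concrete host name was already given.
    return f"{service}/{server_fqdn.lower()}{at}{realm}"


expanded = expand_host_placeholder("hive/_HOST@EXAMPLE.COM", "fqdn.example.com")
print(expanded)
```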
Communication encryption
Encryption between HiveServer2 and its clients is independent from Kerberos authentication. HiveServer supports the
following types of encryption between the service and its clients (Beeline, JDBC/ODBC):
• SASL (Simple Authentication and Security Layer)
jdbc:hive2://fqdn.example.com:10000/default;ssl=true;\
sslTrustStore=$JAVA_HOME/jre/lib/security/jssecacerts;trustStorePassword=extraneous
• Set the path to the trust store one time in the Java system javax.net.ssl.trustStore property:
java -Djavax.net.ssl.trustStore=/usr/java/jdk1.8.0_141-cloudera/jre/lib/security/jssecacerts \
-Djavax.net.ssl.trustStorePassword=extraneous MyClass \
jdbc:hive2://fqdn.example.com:10000/default;ssl=true
Procedure
1. In Cloudera Manager, navigate to Clusters > Hive > Configuration.
2. In Filters, select HIVE for the scope.
3. Select Security for the category.
4. Accept the default Enable TLS/SSL for HiveServer2, which is checked for Hive (Service-Wide).
5. Enter the path to the Java keystore on the host system.
/opt/cloudera/security/pki/keystore_name.jks
6. Enter the password for the keystore you used on the Java keytool command-line when the key and keystore were
created.
The password for the keystore must match the password for the key.
7. Enter the path to the Java trust store on the host system.
8. Click Save Changes.
9. Restart the Hive service.
10. Construct a connection string for encrypting communications using TLS/SSL.
jdbc:hive2://<host>:<port>/<dbName>;ssl=true;sslTrustStore=<ssl_truststore_path>; \
trustStorePassword=<truststore_password>;<otherSessionConfs>?<hiveConfs>#<hiveVars>
auth-int    Authentication with integrity protection. Signed message digests (checksums) verify the integrity of
            messages sent between client and server.
auth-conf   Authentication with confidentiality (transport-layer encryption) and integrity. Applicable only if
            HiveServer is configured to use Kerberos authentication.
Procedure
1. In Cloudera Manager, navigate to Clusters > Hive > Configuration.
2. In HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site click + to add a property and value.
3. Specify the QOP auth-conf setting for the SASL QOP property.
For example,
Name: hive.server2.thrift.sasl.qop
Value: auth-conf
4. Click Save Changes.
Cloudera Runtime Securing an endpoint under AutoTLS
jdbc:hive2://fqdn.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM;saslQop=auth-conf
The _HOST is a wildcard placeholder that gets automatically replaced with the fully qualified domain name
(FQDN) of the server running the HiveServer daemon process.
Procedure
1. Configure the HS2 transport mode as http to support the Knox proxy interface.
jdbc:hive2://<host>:8443/;ssl=true;\
transportMode=http;httpPath=gateway/cdp-proxy-api/hive;\
...
Cloudera Runtime Token-based authentication for Cloudera Data Warehouse
integrations
principal=hive/_HOST@<realm>;user=<user name>;\
sslTrustStore=<path>/certs/home90_cacert.jks;\
trustStorePassword=changeit
The httpPath default is configured in Cloudera Manager. The sslTrustStore is required if you are using a
self-signed certificate.
Procedure
1. Add a firewall rule on the metastore service host to allow access to the metastore port only from the HiveServer2
host. You can do this using iptables.
2. Grant access to the metastore database only from the metastore service host.
For example, in MySQL: GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'metastorehost'; where
metastorehost is the host where the metastore service is running.
3. Make sure users who are not administrators cannot log into the HiveServer host.
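The firewall rule in step 1 can be sketched as iptables argument lists. This is one possible shape under stated assumptions, not a recommended rule set: the function name is hypothetical, 192.0.2.10 is a documentation address standing in for the HiveServer2 host, 9083 is assumed to be the metastore port, and the rules are appended to the INPUT chain. The code only builds and prints the commands; it does not run them.

```python
import shlex


def metastore_firewall_rules(hiveserver_ip, metastore_port=9083):
    """Build iptables commands that admit only HiveServer2 to the metastore port.

    Hypothetical helper. Port 9083 and the INPUT chain are assumptions;
    adjust both to match your environment before using the output.
    """
    port = str(metastore_port)
    accept = ["iptables", "-A", "INPUT", "-p", "tcp", "-s", hiveserver_ip,
              "--dport", port, "-j", "ACCEPT"]
    drop = ["iptables", "-A", "INPUT", "-p", "tcp",
            "--dport", port, "-j", "DROP"]
    # Order matters: the ACCEPT rule must be evaluated before the DROP rule.
    return [accept, drop]


rules = metastore_firewall_rules("192.0.2.10")
for rule in rules:
    print(shlex.join(rule))
```

Building the commands as lists rather than one shell string avoids quoting problems if the values are later passed to a process launcher.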
Procedure
1. In Cloudera Manager, go to Clusters > Hive-on-Tez > Configuration.
2. Search for HiveServer2 Advanced Configuration Snippet (Safety valve) for hive-site.xml.
3. Click + and add the following property and value: hive.server2.webui.spnego.keytab = hive.keytab
Cloudera Runtime Activating the Hive Web UI
4. Click + and add the following property and value: hive.server2.webui.spnego.principal = HTTP/_HOST@<REALM NAME>
5. Click + and add the following property and value: hive.server2.webui.use.spnego = true
6. Click + and add the following property and value: hive.users.in.admin.role = [***USERNAME1,USERNAME2,…***]
Note: [***USERNAME1,USERNAME2,…***] is the comma-separated list of users who need to access
historic query details from the web UI.