HIVE Built-In Functions
Hive provides many built-in functions that can be used in queries to extract or manipulate the data
stored in Hive tables.
Mathematical functions
Function Name Return Type Description
acos(DOUBLE x), acos(DECIMAL x) DOUBLE It will return the arc cosine of x if x is between -1 and 1, inclusive. Otherwise, it will return NULL.
asin(DOUBLE x), asin(DECIMAL x) DOUBLE It will return the arc sine of x if x is between -1 and 1, inclusive. Otherwise, it will return NULL.
atan(DOUBLE x), atan(DECIMAL x) DOUBLE It will return the arc tangent of x.
bin(BIGINT x) STRING It will return the binary value of the number "x" in string format.
ceil(DOUBLE x) BIGINT It will return the ceiling of x, that is, the smallest integer greater than or equal to x.
conv(BIGINT x, INT from_base, INT to_base) STRING It is used to convert the number x from one base to another base. The value returned will be in string format.
conv(STRING x, INT from_base, INT to_base) STRING It is used to convert the string x from one base to another base. The value returned will be in string format.
degrees(DOUBLE|DECIMAL x) DOUBLE It is used to convert x from radians to degrees, where the x parameter is of the double or decimal data type.
exp(DOUBLE|DECIMAL x) DOUBLE It will return e raised to the power x, where e is the base of the natural logarithm.
factorial(INT x) BIGINT It will return the factorial value of the number x.
floor(DOUBLE x) BIGINT It will return the floor of x, that is, the largest integer less than or equal to x.
greatest(T v1, T v2, ...) T It will return the maximum value from the list of values specified. The data type can be any but must be the same for all values passed to this function. If any argument is NULL then it will return NULL.
hex(BIGINT|STRING|BINARY x) STRING It is used to get the hexadecimal value of x, where x is of the bigint, string, or binary data type. In the case of a string, it will convert each character into its hexadecimal format and will return the resulting string.
least(T v1, T v2, ...) T It will return the lowest value from the list of values specified. The data type can be any but must be the same for all values passed to this function. If any argument is NULL then it will return NULL.
pmod(INT x, INT y) INT It will return the positive value of x modulus y, that is, x mod y.
pmod(DOUBLE x, DOUBLE y) DOUBLE It will return the positive value of x modulus y, that is, x mod y.
pow(DOUBLE x, DOUBLE n) DOUBLE It is used to raise the number x to the power n, that is, x^n.
power(DOUBLE x, DOUBLE n) DOUBLE It is the same as the pow function.
radians(DOUBLE|DECIMAL x) DOUBLE It is used to convert x from degrees to radians, where the x parameter is of the double or decimal data type.
rand(INT x) DOUBLE It will return a random number between 0 and 1, using x as the seed value.
round(DOUBLE x, INT n) DOUBLE It will round off the value of x to n decimal places.
sign(DOUBLE x) DOUBLE It will return the sign of the number. If x is positive then it will return '1.0', if x is negative then it will return '-1.0', otherwise it will return '0.0'.
sign(DECIMAL x) DECIMAL It is the same as just described but for decimal numbers.
sin(DOUBLE x) DOUBLE It is used to derive the sine value of x, where x is in radians of the double data type.
sin(DECIMAL x) DOUBLE It is used to derive the sine value of x, where x is in radians of the decimal data type.
sqrt(DOUBLE x) DOUBLE It will return the square root value of x, where x is of the double data type.
sqrt(DECIMAL x) DOUBLE It will return the square root value of x, where x is of the decimal data type.
tan(DOUBLE|DECIMAL x) DOUBLE It is used to derive the tangent value of x, where x is in radians of the double or decimal data type.
Collection functions
Hive also supports some functions that can be executed on Hive complex data types, such as array and
map:
array_contains(ARRAY<T>, value) BOOLEAN It is used to check whether a value exists in an array or not.
map_keys(Map<K,V>) ARRAY<K> It will return all the keys of a map in an unordered array.
map_values(Map<K,V>) ARRAY<V> It will return all the values of a map in an unordered array.
Type conversion functions
binary(string|binary) BINARY It is used to cast the field value into binary format.
Date functions
current_date() DATE It will return the current date. Only the date part is returned as a result.
date_add(string startDate, int n) STRING It is used to add n days to a specified date.
date_format(date/timestamp/string ts, string fmt) STRING It is used to format a date, timestamp, or string in the specified format.
date_sub(string startDate, int n) STRING It is used to subtract n days from a specified date.
day(string date) INT It is used to extract the day part from a date.
from_unixtime(bigint unixtime[, string format]) STRING It is used to convert UNIX epoch time in seconds to a timestamp string in the system time zone.
from_utc_timestamp(timestamp, string timezone) TIMESTAMP It is used to convert UTC time to a specified time zone.
hour(string date) INT It is used to extract the hour part from a timestamp.
last_day(string date) STRING It will return the last day of the month to which the specified date belongs.
minute(string date) INT It is used to extract the minute part from a timestamp.
month(string date) INT It is used to extract the month part from a timestamp.
months_between(date1, date2) DOUBLE It will return the number of months between date1 and date2.
next_day(string startDate, string dayOfWeek) STRING It will return the first date that is after the start date and matches the specified dayOfWeek. Three types of values are supported in the second argument dayOfWeek: (a) 2-letter day of week, for example MO, TU; (b) 3-letter day of week, for example MON, TUE; and (c) full name of the day of week, for example MONDAY, TUESDAY.
second(string date) INT It is used to extract the second part from a timestamp.
to_date(string timestamp) STRING It will return the date part of a specified timestamp value.
to_utc_timestamp(timestamp, string timezone) TIMESTAMP It is used to convert a timestamp in the specified time zone to UTC.
trunc(string date, string format) STRING It will truncate the date to the unit specified by the format. Formats supported are YEAR/YYYY/YY and MONTH/MON/MM.
unix_timestamp(string date) BIGINT It will convert the date to UNIX timestamp in seconds.
unix_timestamp(string date, string pattern) BIGINT It will convert the date of the specified pattern to UNIX timestamp in seconds.
weekofyear(string date) INT It will return the week number of the year for the specified date.
year(string date) INT It is used to extract the year part from a timestamp.
String functions
Hive supports the following built-in functions for operations on string objects:
ascii(STRING x) INT It will return the numeric (ASCII) value of the first character of a string.
base64(BINARY x) STRING It is used to convert the binary value to base-64 string format.
concat(STRING x, STRING y...) STRING It is used to concatenate two or more strings.
concat(BINARY x, BINARY y...) BINARY It is used to concatenate two or more binary values.
concat_ws(STRING sep, STRING x, STRING y...) STRING Similar to the concat function, it is used to concatenate two or more strings but with the custom separator 'sep'.
encode(STRING x, STRING charset) BINARY It is used to encode the string value into binary using the specified charset. Supported values for the charset are: UTF-8, UTF-16, UTF-16LE, UTF-16BE, US-ASCII, and ISO-8859-1.
format_number(NUMBER x, INT d) STRING It is used to format a number into the format '#,###,###.##' rounded to d decimal places.
initcap(STRING x) STRING It will return the string with the first letter of each word in uppercase and all other letters in lowercase.
instr(STRING x, STRING substr) INT It will return the index/position of the first occurrence of substr in string x. The index starts from 1, so a match at the first character will return 1.
locate(STRING substr, STRING x, INT n) INT It will return the position of the first occurrence of the substring substr in string x after position n.
lcase(string A) STRING It is the same as the lower function and is used to return the string in lowercase.
lpad(STRING str, INT n, STRING pad) STRING It is used to return the string str left-padded with the specified pad to length n.
ltrim(STRING x) STRING It trims the whitespace from the left side of the string and returns the resulting string.
ngrams(ARRAY<ARRAY<STRING>> x, INT n, INT k, INT pf) ARRAY<STRUCT<STRING,DOUBLE>> It will return the k most frequent n-grams from an array of tokenized sentences.
repeat(STRING x, int n) STRING It will repeat string x n times, and will return the resulting string.
rpad(STRING str, INT n, STRING pad) STRING It is used to return the string str right-padded with the specified pad to length n.
rtrim(string A) STRING It trims the whitespace from the right side of the string and returns the resulting string.
sentences(STRING x [, STRING lang, STRING locale]) ARRAY<ARRAY<STRING>> It is used to tokenize the string into different sentences, where each sentence is an array of words. This function takes lang and locale as optional arguments.
split(STRING x, STRING regex) ARRAY<STRING> It will split the string as per the specified regular expression.
str_to_map(STRING str, STRING delimiter1, STRING delimiter2) MAP<STRING,STRING> It will split the string into key-value pairs using the specified delimiters. delimiter1 splits the text into K-V pairs, and delimiter2 splits each K-V pair into key and value.
substr(STRING|BINARY x, INT start) STRING|BINARY It will return the substring of the string or binary value starting from the specified position.
substring(STRING|BINARY x, INT start) STRING|BINARY It is the same as the substr function.
trim(STRING x) STRING It trims the whitespace from both sides of the string and returns the resulting string.
ucase(STRING x) STRING It is the same as the upper function and is used to return the string in uppercase.
How it works…
Let's see how these functions can be used in practice.
Mathematical functions
The following are a few examples of different mathematical functions.
ABS: This function returns the absolute value of a number:
hive> SELECT abs(-20.0);
20.0
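Other mathematical functions from the table can be tried in the same way. The following is a minimal sketch rather than an exhaustive tour; the literal arguments are arbitrary values chosen for illustration, and the comments show the results expected from the descriptions above:

```sql
-- Illustrative queries only; the literal arguments are arbitrary.
SELECT round(3.14159, 2);    -- 3.14
SELECT floor(-2.5);          -- -3
SELECT ceil(2.1);            -- 3
SELECT pmod(-7, 3);          -- 2 (the plain remainder -7 % 3 would be -1)
SELECT conv('1010', 2, 10);  -- 10 (binary string converted to a decimal string)
SELECT greatest(1, 5, 3);    -- 5
```

The pmod example shows why the function exists: it is handy when a negative remainder would be awkward, for example when computing a bucket number.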
Collection functions
The following are a few examples of different collection functions.
For using array functions, let's create a table with the array data type:
CREATE TABLE table_with_array_datatype (city STRING, pins ARRAY<INT>) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';
Assume the table contains the following rows:
Noida [201301,201303,201307]
Delhi [110001,110002,110003]
ARRAY_CONTAINS: This function can be used to check whether a particular element exists in an array.
For example, to check which cities contain the pin 110001:
hive> SELECT city, array_contains(pins,110001) FROM table_with_array_datatype;
Noida false
Delhi true
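Because array_contains returns a BOOLEAN, it can also be used as a filter. A minimal sketch, assuming the same table and the two sample rows shown above:

```sql
-- Return only the cities whose pins array contains 110001.
SELECT city
FROM table_with_array_datatype
WHERE array_contains(pins, 110001);
-- With the two sample rows above, only Delhi qualifies.
```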
SIZE: It is used to check the number of elements in a collection, that is, an array or map. Run the
following command to get the count of pin codes in each city:
hive> SELECT city, size(pins) FROM table_with_array_datatype;
Noida 3
Delhi 3
Time taken: 0.108 seconds, Fetched: 2 row(s)
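The map functions can be tried without creating a table by building a map inline. The following sketch assumes the built-in map() constructor, which takes alternating keys and values; the keys and values used here are arbitrary:

```sql
-- map() builds a MAP value from alternating keys and values.
SELECT map_keys(map('a', 1, 'b', 2));    -- the keys a and b, in no guaranteed order
SELECT map_values(map('a', 1, 'b', 2));  -- the values 1 and 2, in no guaranteed order
SELECT size(map('a', 1, 'b', 2));        -- 2
```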
Type conversion functions
CAST: To cast a value from one data type to another, the data must be valid for the target data type. If the
data is invalid and cannot be cast to the specified data type, then cast will return NULL:
hive> SELECT cast('Hi John' as INT);
NULL
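For contrast, casts on valid data return the converted value. A short sketch with arbitrary literals:

```sql
-- Valid inputs are converted instead of producing NULL.
SELECT cast('123' AS INT) + 1;      -- 124 (the string is now usable as a number)
SELECT cast('2016-01-30' AS DATE);  -- 2016-01-30
SELECT cast(1 AS STRING);           -- 1
```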
Date functions
DATE_ADD: Run the following command to add five days to the date '2016-01-30':
hive> SELECT date_add('2016-01-30',5);
2016-02-04
DATE_FORMAT: Using this function, you can format a date, timestamp, or string according to a specified
pattern. Run the following command to convert the specified date into the format 'yyyy_MM_dd':
hive> SELECT date_format('2016-01-30','yyyy_MM_dd');
2016_01_30
DATE_SUB:
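This function mirrors DATE_ADD but subtracts days. A minimal sketch, with dates chosen only for illustration:

```sql
-- Subtract five days from a date.
SELECT date_sub('2016-01-30', 5);  -- 2016-01-25
-- Month boundaries are handled; 2016 is a leap year.
SELECT date_sub('2016-03-01', 1);  -- 2016-02-29
```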
DAY, MONTH, YEAR: These functions are used to extract different parts of the date:
Day:
hive> SELECT day('2016-01-30');
30
Month:
hive> SELECT month('2016-01-30');
1
Year:
hive> SELECT year('2016-01-30');
2016
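The remaining date functions from the table follow the same pattern. The queries below are a sketch with arbitrary input values; the expected results in the comments follow from the function descriptions above:

```sql
SELECT to_date('2016-01-30 10:15:42');              -- 2016-01-30
SELECT hour('2016-01-30 10:15:42');                 -- 10
SELECT last_day('2016-02-10');                      -- 2016-02-29 (2016 is a leap year)
SELECT next_day('2016-01-30', 'MON');               -- 2016-02-01 (first Monday after the start date)
SELECT trunc('2016-01-30', 'MM');                   -- 2016-01-01
SELECT months_between('2016-03-30', '2016-01-30');  -- 2.0
```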
String functions
Let's see how string functions work in Hive:
ASCII: This function returns the ASCII value of the first character of a string. The following example
will return the ASCII value of the character 'a':
hive> SELECT ascii('abcd');
97
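Several other functions from the string table can be tried in the same way. This is a minimal sketch; the input strings are arbitrary, and positions are 1-based as described in the table:

```sql
SELECT instr('hive query', 'query');         -- 6
SELECT locate('a', 'banana', 3);             -- 4
SELECT lpad('7', 3, '0');                    -- 007
SELECT rpad('ab', 4, 'x');                   -- abxx
SELECT repeat('ab', 3);                      -- ababab
SELECT substr('hadoop', 3);                  -- doop
SELECT trim('  hive  ');                     -- hive
SELECT split('a,b,c', ',');                  -- an array holding a, b, and c
SELECT str_to_map('k1:v1,k2:v2', ',', ':');  -- a map of k1 to v1 and k2 to v2
```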
CONCAT:
CONCAT_WS:
FIND_IN_SET:
LCASE:
UPPER:
UCASE:
INITCAP:
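A minimal sketch of the functions listed above, using arbitrary input strings chosen for illustration. find_in_set returns the 1-based position of a string within a comma-separated list:

```sql
SELECT concat('Hive', 'QL');                 -- HiveQL
SELECT concat_ws('-', '2016', '01', '30');   -- 2016-01-30
SELECT find_in_set('ab', 'abc,b,ab,c,def');  -- 3
SELECT lcase('HIVE');                        -- hive
SELECT upper('hive');                        -- HIVE
SELECT ucase('hive');                        -- HIVE
SELECT initcap('hello world');               -- Hello World
```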
Conditional functions
Function Name Return Type Description
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END T When a = b then it will return c. When a = d then it will return e. Otherwise, it will return f.
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END T When a is true then it will return b. When c is true then it will return d. Otherwise, it will return e.
COALESCE(T v1, T v2, ...) T It will return the first argument that is not NULL. If all arguments are NULL then it will return NULL.
if(BOOLEAN testCondition, T x, T y) T If testCondition is true then it will return x, otherwise it will return y.
isnotnull(a) BOOLEAN It will return TRUE if a is not NULL, otherwise it will return FALSE.
isnull(a) BOOLEAN It will return TRUE if a is NULL, otherwise it will return FALSE.
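The conditional functions can be sketched with literal values as follows. The constants and labels are arbitrary; in practice the arguments would usually be column references:

```sql
SELECT CASE 2 WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'other' END;  -- two
SELECT CASE WHEN 5 > 10 THEN 'big' ELSE 'small' END;                 -- small
SELECT coalesce(NULL, 'b', 'c');                                     -- b
SELECT if(1 < 2, 'yes', 'no');                                       -- yes
SELECT isnull(NULL), isnotnull(NULL);                                -- true and false
```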
Miscellaneous functions
There are some other functions that are used for different purposes, such as encryption, decryption,
hashing, and so on:
Function Name Return Type Description
current_user() STRING It will return the name of the current user who is connected to Hive in that session.
hash(a1[, a2...]) INT It will return the hash value of specified arguments.
java_method(class, method[, arg1[, arg2..]]) varies It is used to invoke static Java methods within Hive queries. The same functionality can be achieved using the reflect function.
reflect(class, method[, arg1[, arg2..]]) varies It is used to invoke static Java methods within Hive queries.
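A short sketch of these functions follows. The choice of java.lang.Math.max is only an illustration of a static Java method; the results of current_user and hash depend on the session and the input, so no specific values are shown for them:

```sql
SELECT current_user();                              -- name of the connected user
SELECT hash('hive');                                -- an INT hash value
SELECT reflect('java.lang.Math', 'max', 2, 3);      -- 3
SELECT java_method('java.lang.Math', 'max', 2, 3);  -- 3
```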