HIVE Built-In Functions


Using built-in functions

Hive provides many built-in functions that can be used in queries to extract or manipulate data in
Hive tables.

Built-in functions are divided into the following categories:


Mathematical functions
Collection functions
Type conversion functions
Date functions
String functions
Conditional functions
Miscellaneous functions
How to do it…
In the following sections, a few built-in functions are explained.
Mathematical functions
Hive supports a large number of mathematical functions that can be applied to field values in queries.
Most of them are the same as those supported in an RDBMS:

Function Name Return Type Description

abs(DOUBLE x) DOUBLE It will return the absolute value of x.

acos(DOUBLE x), acos(DECIMAL x) DOUBLE It will return the arc cosine of x if x is between -1 and 1 (inclusive). Otherwise, it will return NULL.

asin(DOUBLE x), asin(DECIMAL x) DOUBLE It will return the arc sine of x if x is between -1 and 1 (inclusive). Otherwise, it will return NULL.

atan(DOUBLE x), atan(DECIMAL x) DOUBLE It will return the arc tangent of x.

bin(BIGINT x) STRING It will return the binary value of the number "x" in string format.

cbrt(DOUBLE x) DOUBLE It will return the cube root value of x.

ceil(DOUBLE x) BIGINT It will return the ceiling of x, that is, the smallest integer greater than or equal to x.

ceiling(DOUBLE x) BIGINT It is the same as the ceil function.

conv(BIGINT x, INT from_base, INT to_base) STRING It is used to convert the number x from one base to another base. The value returned will be in string format.

conv(STRING x, INT from_base, INT to_base) STRING It is used to convert the string x from one base to another base. The value returned will be in string format.

cos(DOUBLE|DECIMAL x) DOUBLE It is used to derive the cosine of x, where x is in radians.

degrees(DOUBLE|DECIMAL x) DOUBLE It is used to convert x from radians to degrees, where x is of the double or decimal data type.

e() DOUBLE It will return the value of the mathematical constant e.

exp(DOUBLE|DECIMAL x) DOUBLE It will return e raised to the power x, where e is the base of the natural logarithm.

factorial(INT x) BIGINT It will return the factorial of the number x.

floor(DOUBLE x) BIGINT It will return the floor of x, that is, the largest integer less than or equal to x.

greatest(T v1, T v2, ..., T vn) T It will return the maximum value from the list of values specified. The data type can be any but must be the same for all values passed to this function. If any argument is NULL then it will return NULL.

hex(BIGINT|STRING|BINARY x) STRING It is used to get the hexadecimal representation of x, where x is of the bigint, string, or binary data type. In the case of a string, it will convert each character into its hexadecimal format and will return the resulting string.

least(T v1, T v2, ...) T It will return the lowest value from the list of values specified. The data type can be any but must be the same for all values passed to this function. If any argument is NULL then it will return NULL.

negative(INT x) INT It will return the value of x with its sign reversed, that is, -x.

negative(DOUBLE x) DOUBLE It will return the value of x with its sign reversed, that is, -x.

pi() DOUBLE It will return the value of pi.

pmod(INT x, INT y) INT It will return the positive value of x modulus y, that is, x mod y.

pmod(DOUBLE x, DOUBLE y) DOUBLE It will return the positive value of x modulus y, that is, x mod y.

positive(INT a) INT It will return the value of a unchanged.

positive(DOUBLE a) DOUBLE It will return the value of a unchanged.

pow(DOUBLE x, DOUBLE n) DOUBLE It is used to raise the number x to the power n, that is, x^n.

power(DOUBLE x, DOUBLE n) DOUBLE It is the same as the pow function.

radians(DOUBLE|DECIMAL x) DOUBLE It is used to convert x from degrees to radians, where x is of the double or decimal data type.

rand() DOUBLE It will return a random number that is uniformly distributed between 0 and 1.

rand(INT x) DOUBLE It will return a random number using x as the seed, which makes the generated sequence reproducible.

round(DOUBLE x) DOUBLE It will round x to the nearest whole number.

round(DOUBLE x, INT n) DOUBLE It will round x to n decimal places.

sign(DOUBLE x) DOUBLE It will return the sign of the number. If x is positive then it will return '1.0' and if x is negative then it will return '-1.0', otherwise it will return '0.0'.

sign(DECIMAL x) DECIMAL It is the same as just described but for decimal numbers.

sin(DOUBLE x) DOUBLE It is used to derive the sine of x, where x is in radians and of the double data type.

sin(DECIMAL x) DOUBLE It is used to derive the sine of x, where x is in radians and of the decimal data type.

sqrt(DOUBLE x) DOUBLE It will return the square root of x, where x is of the double data type.

sqrt(DECIMAL x) DOUBLE It will return the square root of x, where x is of the decimal data type.

tan(DOUBLE|DECIMAL x) DOUBLE It is used to derive the tangent of x, where x is in radians and of the double or decimal data type.

unhex(STRING x) BINARY It is the inverse of hex. It interprets each pair of characters in x as a hexadecimal number and converts it to the corresponding byte.

Collection functions
Hive also supports some functions that can be executed on Hive complex data types, such as array and
map:

Function Name Return Type Description

array_contains(ARRAY<T>, value) BOOLEAN It is used to check whether a value exists in an array or not.

map_keys(Map<K,V>) ARRAY<K> It will return all the keys of a map in an unordered array.

map_values(Map<K,V>) ARRAY<V> It will return all the values of a map in an unordered array.

size(Array<T>) INT It will return the number of elements in an array.

size(Map<K,V>) INT It will return the number of elements in a map.

sort_array(Array<T>) ARRAY<T> It will return the sorted array in ascending order.

Type conversion functions


Hive supports the following type conversion functions:

Function Name Return Type Description

binary(string|binary) BINARY It is used to cast the field value into binary format.

cast(expr as T) T It is used to cast the result of an expression to a specific data type.


Date functions
Hive supports the following built-in functions for date and time operations:

Function Name Return Type Description

add_months(string startDate, int n) STRING It is used to add n months to a specified date.

current_date() DATE It will return the current date. Only the date part is returned as a result.

current_timestamp() TIMESTAMP It will return the current timestamp.

date_add(string startDate, int n) STRING It is used to add n number of days to a specified date.

date_format(date/timestamp/string ts, string fmt) STRING It is used to format the date in the specified format.

date_sub(string startDate, int n) STRING It is used to subtract n days from a specified date.

datediff(string endDate, string startDate) INT It will return the number of days from startDate to endDate.

day(string date) INT It is used to extract the day part from a date.

dayofmonth(date) INT It is the same as the day function.

from_unixtime(bigint unixtime[, string format]) STRING It is used to convert a UNIX epoch time (in seconds) to a timestamp string in the system time zone.

from_utc_timestamp(timestamp, string timezone) TIMESTAMP It is used to convert a UTC timestamp to the specified time zone.

hour(string date) INT It is used to extract the hour part from a timestamp.

last_day(string date) STRING It will return the last day of the month to which the specified date belongs.

minute(string date) INT It is used to extract the minute part from a timestamp.

month(string date) INT It is used to extract the month part from a timestamp.

months_between(date1, date2) DOUBLE It will return the number of months between date1 and date2.

next_day(string startDate, string dayOfWeek) STRING It will return the first date that is after the start date and falls on the specified dayOfWeek. Three types of values are supported for the second argument dayOfWeek: (a) 2-letter day of week, for example MO, TU; (b) 3-letter day of week, for example MON, TUE; and (c) full day name, for example MONDAY, TUESDAY.

second(string date) INT It is used to extract the second part from a timestamp.

to_date(string timestamp) STRING It will return the date part of a specified timestamp value.

to_utc_timestamp(timestamp, string timezone) TIMESTAMP It is used to convert a timestamp in the specified time zone to UTC.

trunc(string date, string format) STRING It will truncate the date to the unit specified by the format. Formats supported are YEAR/YYYY/YY and MONTH/MON/MM.

unix_timestamp() BIGINT It will return the current UNIX timestamp in seconds.

unix_timestamp(string date) BIGINT It will convert the date to UNIX timestamp in seconds.

unix_timestamp(string date, string pattern) BIGINT It will convert the date of the specified pattern to UNIX timestamp in seconds.

weekofyear(string date) INT It will return the week number for a year of the specified date.

year(string date) INT It is used to extract the year part from a timestamp.

String functions
Hive supports the following built-in functions for operations on string objects:

Function Name Return Type Description

ascii(STRING x) INT It will return the numeric (ASCII) value of the first character of a string.

base64(BINARY x) STRING It is used to convert the binary value to base-64 string format.

concat(STRING x, STRING y...) STRING It is used to concatenate two or more strings.

concat(BINARY x, BINARY y...) BINARY It is used to concatenate two or more binary values.

concat_ws(STRING sep, STRING x, STRING y...) STRING Similar to the concat function, it is used to concatenate two or more strings but with the custom separator 'sep'.

concat_ws(STRING sep, ARRAY<STRING> arr) STRING It is the same as the preceding function. It takes an array of strings as an argument and is used to concatenate all strings of the array with the specified separator.

decode(BINARY x, STRING charset) STRING It is used to decode the binary value into a string using the specified charset. Supported values for the charset are: UTF-8, UTF-16, UTF-16LE, UTF-16BE, US-ASCII, and ISO-8859-1.

encode(STRING x, STRING charset) BINARY It is used to encode the string value into a binary using the specified charset. Supported values for the charset are: UTF-8, UTF-16, UTF-16LE, UTF-16BE, US-ASCII, and ISO-8859-1.

find_in_set(STRING element, STRING elementList) INT It is used to find an element in a comma-separated list of elements. This function returns the index/position of the string element in elementList, where elementList is a comma-separated string of different elements. If the first argument contains a comma then it will return 0.

format_number(NUMBER x, INT d) STRING It is used to format the number x into the format '#,###,###.##', rounded to d decimal places.

get_json_object(STRING json_string, STRING path) STRING It is used to extract a JSON object from json_string using the JSON path specified. In the JSON path, uppercase characters and special characters are not allowed. Also in JSON, keys should not start with any number.

in_file(STRING str, STRING filename) BOOLEAN It is used to check if a particular string exists as a line in a file.

initcap(STRING x) STRING It will return the string with the first letter of each word in uppercase and all other letters in lowercase.

instr(STRING x, STRING substr) INT It will return the index/position of the first occurrence of substr in string x. Index starts from 1 so the first character will return 1.

length(STRING x) INT It will return the length of string x.

levenshtein(STRING x, STRING y) INT It is used to calculate the Levenshtein distance between two string arguments. The Levenshtein distance between two words is the minimum number of single-character changes (insertions, deletions, or substitutions) that are required to convert one word to the other.

locate(STRING substr, STRING x, INT n) INT It will return the position of the first occurrence of the substring substr in string x after position n.

lower(string A) STRING It will return the string in lowercase.

lcase(string A) STRING It is the same as the lower function and is used to return the string in lowercase.

lpad(STRING str, INT n, STRING pad) STRING It is used to return the string str left-padded with the specified pad string to a length of n.

ltrim(STRING x) STRING It trims the whitespace from the left side of the string and returns the resulting string.

ngrams(ARRAY<ARRAY<STRING>> x, INT n, INT k, INT pf) ARRAY<STRUCT<STRING,DOUBLE>> It will return the k most frequent n-grams from an array of tokenized sentences.

repeat(STRING x, int n) STRING It will repeat string x n times, and will return the resulting string.

reverse(string x) STRING It will return the reverse of a string.

rpad(STRING str, INT n, STRING pad) STRING It is used to return the string str right-padded with the specified pad string to a length of n.

rtrim(string A) STRING It trims the whitespace from the right side of the string and returns the resulting string.

sentences(STRING x [, STRING lang, STRING locale]) ARRAY<ARRAY<STRING>> It is used to tokenize the string into different sentences, where each sentence is an array of words. This function takes lang and locale as optional arguments.

soundex(STRING x) STRING It will return the soundex code of string x.

space(INT n) STRING It will return a blank string with n whitespaces.

split(STRING x, STRING regex) ARRAY<STRING> It will split the string as per the specified regular expression.

str_to_map(STRING str) MAP<STRING,STRING> It will split the string into key-value pairs using the default delimiter "," between each key-value pair and the default delimiter ":" between key and value.

str_to_map(STRING str, STRING delimiter1, STRING delimiter2) MAP<STRING,STRING> It will split the string into key-value pairs using the specified delimiters. delimiter1 splits the text into K-V pairs, and delimiter2 splits each K-V pair into key and value.

substr(STRING|BINARY x, INT start) STRING|BINARY It will return the substring of the string or binary value starting from the specified position.

substring(STRING|BINARY x, INT start) STRING|BINARY It is the same as the substr function.

trim(STRING x) STRING It trims the whitespace from both sides of the string and returns the resulting string.

unbase64(STRING x) BINARY It will convert the string x from base64 to binary.

upper(STRING x) STRING It will return the string in uppercase.

ucase(STRING x) STRING It is the same as the upper function and is used to return the string in uppercase.
How it works…
Let's see how these functions can be used in practice.
Mathematical functions
The following are a few examples of different mathematical functions.
ABS: This function returns the absolute value of a number:
hive> SELECT abs(-20.0);
20.0

ACOS: This function returns the arc cosine value of a number:


hive> SELECT acos(0.5);
1.0471975511965979
hive> SELECT acos(1);
0.0

ASIN: This function returns the arc sine value of a number:


hive> SELECT asin(0.5);
0.5235987755982989
hive> SELECT asin(1);
1.5707963267948966

BIN: This function returns the binary value of a number:


hive> SELECT bin(14);
1110
hive> SELECT bin(15);
1111

CBRT: This function returns the cube-root value of a given number:


hive> SELECT cbrt(27.0);
3.0

RAND: This function generates a random number between 0 and 1, so each call returns a different value:


hive> SELECT rand();
0.5654304130197764
hive> SELECT rand();
0.3892359489373104
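A few more functions from the mathematical table can be tried in the same way. The following is a minimal sketch with literal arguments chosen only for illustration. It assumes a Hive version that supports SELECT without a FROM clause and the factorial function:

```sql
-- pmod always returns a non-negative remainder, unlike the % operator
hive> SELECT pmod(-7, 3);
2
-- ceil returns the smallest integer >= x, floor the largest integer <= x
hive> SELECT ceil(4.2);
5
hive> SELECT floor(4.8);
4
-- round to two decimal places
hive> SELECT round(3.14159, 2);
3.14
hive> SELECT factorial(5);
120
-- convert the hexadecimal string 'ff' to base 10; the result is a string
hive> SELECT conv('ff', 16, 10);
255
```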

Collection functions
The following are a few examples of different collection functions.

For using array functions, let's create a table with the array data type:
CREATE TABLE table_with_array_datatype (city STRING, pins ARRAY<INT>) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';

Now, load some sample data, as shown in the following table:


City Pins

Noida [201301,201303,201307]

Delhi [110001,110002,110003]
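One possible way to load this sample data is sketched below. The file path /tmp/cities.tsv is hypothetical and not part of the original recipe. The file is assumed to use a tab between the city and the pins, and commas between the pins, to match the delimiters declared in the CREATE TABLE statement:

```sql
-- Assumed contents of the hypothetical file /tmp/cities.tsv
-- (<TAB> stands for a single tab character; no square brackets in the file):
--   Noida<TAB>201301,201303,201307
--   Delhi<TAB>110001,110002,110003
hive> LOAD DATA LOCAL INPATH '/tmp/cities.tsv' INTO TABLE table_with_array_datatype;
-- Verify the load
hive> SELECT * FROM table_with_array_datatype;
```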

ARRAY_CONTAINS: This function can be used to check whether a particular element exists in an array.
For example, to check in which cities the pin 110001 lies:
hive> SELECT city, array_contains(pins,110001) FROM table_with_array_datatype;
Noida false
Delhi true

SIZE: It is used to check the number of elements in a collection, that is, an array or map. Run the
following command to get the count of pin codes in each city:
hive> SELECT city, size(pins) FROM table_with_array_datatype;
Noida 3
Delhi 3
Time taken: 0.108 seconds, Fetched: 2 row(s)
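The remaining collection functions can be explored without a table by building literal collections with the array() and map() constructors. This is only a sketch: the literal values are made up for illustration, and no output is shown for map_keys because the order of the returned keys is not guaranteed:

```sql
-- sort_array returns the elements in ascending order
hive> SELECT sort_array(array(3, 1, 2));
[1,2,3]
-- size also works on maps and counts the key-value pairs
hive> SELECT size(map('a', 1, 'b', 2));
2
-- map_keys and map_values return unordered arrays
hive> SELECT map_keys(map('a', 1, 'b', 2));
hive> SELECT map_values(map('a', 1, 'b', 2));
```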

Type conversion functions


The following are a few examples of type conversion functions:
CAST: The next example will cast a string with the value 1000 to an integer:
hive> SELECT cast('1000' as INT);
1000

To cast a value from one data type to another, the data must be valid for the target type. If the data
is invalid and cannot be cast to the specified data type, then this function will return NULL:
hive> SELECT cast('Hi John' as INT);
NULL
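Casts between other types follow the same pattern. The following sketch uses literal values chosen only for illustration:

```sql
-- casting a fractional number to INT drops the fractional part
hive> SELECT cast(3.7 as INT);
3
-- a number can be cast to its string representation
hive> SELECT cast(42 as STRING);
42
-- a string in yyyy-MM-dd format can be cast to a DATE
hive> SELECT cast('2016-01-30' as DATE);
2016-01-30
```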

[Image: examples of the cast function]


Date functions
The following are a few examples of date functions.
ADD_MONTHS: Run the following command to add three months to the date '2016-01-30':
hive> SELECT add_months('2016-01-30',3);
2016-04-30

CURRENT_DATE: This function returns the current date of the system:


hive> SELECT current_date();
2016-01-23

CURRENT_TIMESTAMP: This function returns the current timestamp of the system:


hive> SELECT current_timestamp();
2016-01-23 15:51:04.616

DATE_ADD: Run the following command to add five days to the date '2016-01-30':
hive> SELECT date_add('2016-01-30',5);
2016-02-04

DATE_FORMAT: Using this function, you can convert a date from one format to another. Run
the following command to convert the specified date into the format 'yyyy_MM_dd':
hive> SELECT date_format('2016-01-30','yyyy_MM_dd');
2016_01_30

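DATE_SUB: Run the following command to subtract three days from the date '2016-01-30':
hive> SELECT date_sub('2016-01-30',3);
2016-01-27

DATEDIFF: Run the following command to get the number of days from '2016-01-25' to '2016-01-30':
hive> SELECT datediff('2016-01-30', '2016-01-25');
5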

DAY, MONTH, YEAR: These functions are used to extract different parts of the date:
Day:
hive> SELECT day('2016-01-30');
30

Month:
hive> SELECT month('2016-01-30');
1

Year:
hive> SELECT year('2016-01-30');
2016

UNIX_TIMESTAMP: It will return the current UNIX timestamp in seconds:


hive> SELECT unix_timestamp();
1453548270
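Some of the other date functions from the table can be sketched in the same way. The dates are chosen only for illustration; 2016 is a leap year and 2016-01-30 is a Saturday:

```sql
-- last day of the month that contains the given date
hive> SELECT last_day('2016-02-10');
2016-02-29
-- first Monday after the given date
hive> SELECT next_day('2016-01-30', 'MON');
2016-02-01
-- truncate to the first day of the month
hive> SELECT trunc('2016-01-30', 'MM');
2016-01-01
-- week number of the year
hive> SELECT weekofyear('2016-01-30');
4
```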

String functions
Let's see how string functions work in Hive:
ASCII: This function returns the ASCII value of the first character of a string. The following example
will return the ASCII value of the character 'a':
hive> SELECT ascii('abcd');
97

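CONCAT: This function concatenates two or more strings:
hive> SELECT concat('value1','value2','value3');
value1value2value3

CONCAT_WS: This function concatenates two or more strings with a custom separator:
hive> SELECT concat_ws('_','value1','value2','value3');
value1_value2_value3

FIND_IN_SET: This function returns the position of an element in a comma-separated list:
hive> SELECT find_in_set('india', 'us,uk,india,pakistan');
3

LOWER, LCASE, UPPER, UCASE: These functions convert a string to lowercase or uppercase:

LOWER:
hive> SELECT lower('heLLo woRlD');
hello world

LCASE:
hive> SELECT lcase('heLLo woRlD');
hello world

UPPER:
hive> SELECT upper('heLLo woRlD');
HELLO WORLD

UCASE:
hive> SELECT ucase('heLLo woRlD');
HELLO WORLD

INITCAP: This function converts the first letter of each word to uppercase and all other letters to lowercase:
hive> SELECT initcap('heLLo woRlD');
Hello World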
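A few more string functions from the table can be tried with literal strings. This is a sketch with made-up input values; the levenshtein function is assumed to be available in the Hive version in use:

```sql
hive> SELECT length('Hello');
5
-- positions are 1-based, so position 7 is the 'W' of 'World'
hive> SELECT substr('Hello World', 7);
World
hive> SELECT instr('Hello World', 'World');
7
-- pad on the left with '0' up to a total length of 3
hive> SELECT lpad('7', 3, '0');
007
hive> SELECT repeat('ab', 3);
ababab
-- the second argument of split is a regular expression
hive> SELECT split('a,b,c', ',');
["a","b","c"]
-- kitten -> sitten -> sittin -> sitting takes three single-character edits
hive> SELECT levenshtein('kitten', 'sitting');
3
```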
There's more
Apart from the functions described previously (mathematical, collection, type conversion, date, and
string), Hive provides some more functions.
Conditional functions
These are the functions that are used for conditional statements.

Function Name Return Type Description

CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END T When a = b then it will return c. When a = d then it will return e. Otherwise, it will return f.

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END T When a = true then it will return b. When c = true then it will return d. Otherwise, it will return e.

COALESCE(T v1, T v2, ..., T vn) T It will return the first argument that is not NULL. If all arguments are NULL then it will return NULL.

if(BOOLEAN testCondition, T x, T y) T If testCondition is true then it will return x, otherwise it will return y.

isnotnull(a) BOOLEAN It will return TRUE if a is not NULL, otherwise it will return FALSE.

isnull(a) BOOLEAN It will return TRUE if a is NULL, otherwise it will return FALSE.

nvl(T x, T defaultValue) T It will return x if x is not NULL, otherwise it will return the specified defaultValue.
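
The conditional functions can be sketched as follows. The literal values and the labels 'capital' and 'other' are made up for illustration, and the CASE query assumes the table_with_array_datatype table created earlier with its two sample rows:

```sql
hive> SELECT if(1 < 2, 'yes', 'no');
yes
-- coalesce returns the first argument that is not NULL
hive> SELECT coalesce(NULL, NULL, 'fallback');
fallback
hive> SELECT nvl(NULL, 'default');
default
hive> SELECT isnull(NULL);
true
-- CASE with a hypothetical labelling of the sample cities
hive> SELECT city, CASE city WHEN 'Delhi' THEN 'capital' ELSE 'other' END FROM table_with_array_datatype;
Noida other
Delhi capital
```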

Miscellaneous functions
There are some other functions that are used for different purposes, such as encryption, decryption,
hashing, and so on:

Function Name Return Type Description

current_user() STRING It will return the name of the current user who is connected to Hive in that session.

hash(a1[, a2...]) INT It will return the hash value of specified arguments.

java_method(class, method[, arg1[, arg2..]]) varies It is used to invoke static Java methods within Hive queries. The same functionality can be achieved using the reflect function.

reflect(class, method[, arg1[, arg2..]]) varies It is used to invoke static Java methods within Hive queries.
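
As a sketch of how reflect and java_method can be called, the following invokes the static method max of the standard Java class java.lang.Math. The choice of class and arguments is only for illustration. No output is shown for current_user and hash because their results depend on the session and on the hashing implementation:

```sql
-- both calls invoke java.lang.Math.max(2, 3)
hive> SELECT reflect('java.lang.Math', 'max', 2, 3);
3
hive> SELECT java_method('java.lang.Math', 'max', 2, 3);
3
-- results of the following depend on the environment
hive> SELECT current_user();
hive> SELECT hash('hive');
```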
