Hadoop Hive Cheat Sheet - Developer Guide For SQL To HiveQL - Qubole
Hive Function Cheat Sheet
(http://quole2.wpengine.com/wp-content/uploads/2014/01/hive-function-cheatsheet.pdf)

Contents: Date Functions, Mathematical Functions, String Functions, Collection Functions, Conditional Functions, Functions for Text Analytics

Hive Function Meta commands
SHOW FUNCTIONS
    Lists Hive functions and operators.
DESCRIBE FUNCTION [function name]
    Displays a short description of the function.
DESCRIBE FUNCTION EXTENDED [function name]
    Accesses the extended description of the function.

Types of Hive Functions
UDF
    A function that takes one or more columns from a row as arguments and returns a single value or object. E.g.: concat(col1, col2)
UDAF
    A function that aggregates column values across multiple rows and returns a single value. E.g.: sum(col)
UDTF
    A function that takes zero or more inputs and produces multiple columns or rows of output. E.g.: explode()
Macros
    A function that uses other Hive functions.

How To Develop UDFs
    import java.util.Date;
    import java.text.SimpleDateFormat;
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;

    // The annotation text is what DESCRIBE FUNCTION [EXTENDED] prints.
    @Description(name = "YourUDFName",
        value = "_FUNC_(InputDataType) using the input datatype X argument, " +
                "returns YYY.",
        extended = "Example:\n"
                 + "  > SELECT _FUNC_(InputDataType) FROM tablename")
    public class YourUDFName extends UDF {
      ..
      public YourUDFName(InputDataType InputValue) {
        ..
      }
      // Hive calls evaluate() once per input row.
      public String evaluate(InputDataType InputValue) {
        ..
      }
    }

Generic variants:

    public class YourGenericUDFName extends GenericUDF {..}
    public class YourGenericUDAFName extends AbstractGenericUDAFResolver {..}
    public class YourGenericUDTFName extends GenericUDTF {..}
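The template above leaves the body of evaluate() elided. As a rough, framework-free sketch of that pattern, the class below shows one possible evaluate() method. The class name TitleCaseUdfSketch and its behavior are hypothetical, and it deliberately does not extend org.apache.hadoop.hive.ql.exec.UDF so that it compiles with only the JDK; in a real deployment it would extend UDF as in the template.

```java
// Hypothetical example: not part of Hive. In a real UDF this class would
// extend org.apache.hadoop.hive.ql.exec.UDF and be packaged into a jar.
class TitleCaseUdfSketch {

    // One call per row: upper-case the first character, lower-case the rest.
    // A NULL column value arrives as null, so null is passed straight through.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        if (input.isEmpty()) {
            return input;
        }
        return Character.toUpperCase(input.charAt(0))
                + input.substring(1).toLowerCase();
    }

    public static void main(String[] args) {
        TitleCaseUdfSketch udf = new TitleCaseUdfSketch();
        System.out.println(udf.evaluate("hIVE"));
    }
}
```

Returning null for null input mirrors the convention, noted in the function tables, that most built-in functions return NULL when their arguments are NULL.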
How To Deploy / Drop UDFs
At the start of each session:

    ADD JAR /full_path_to_jar/YourUDFName.jar;
    CREATE TEMPORARY FUNCTION YourUDFName AS 'org.apache.hadoop.hive.contrib.udf.example.YourUDFName';

At the end of each session:

    DROP TEMPORARY FUNCTION IF EXISTS YourUDFName;
Date Functions
The following built-in date functions are supported in Hive.

Return Type  Name (Signature)
    Example

string  from_unixtime(bigint unixtime[, string format])
    Converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone.
bigint  unix_timestamp()
    Gets the current timestamp using the default time zone.
bigint  unix_timestamp(string date)
    Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp; returns 0 if it fails: unix_timestamp('2009-03-20 11:30:01') = 1237573801
bigint  unix_timestamp(string date, string pattern)
    Converts a time string with the given pattern to a Unix timestamp: unix_timestamp('2009-03-20', 'yyyy-MM-dd') = 1237532400
string  to_date(string timestamp)
    Returns the date part of a timestamp string: to_date('1970-01-01 00:00:00') = '1970-01-01'
int  year(string date)
    Returns the year part of a date or a timestamp string: year('1970-01-01 00:00:00') = 1970, year('1970-01-01') = 1970
int  month(string date)
    Returns the month part of a date or a timestamp string: month('1970-11-01 00:00:00') = 11, month('1970-11-01') = 11
int  day(string date), dayofmonth(date)
    Returns the day part of a date or a timestamp string: day('1970-11-01 00:00:00') = 1, day('1970-11-01') = 1
int  hour(string date)
    Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12
int  minute(string date)
    Returns the minute of the timestamp.
int  second(string date)
    Returns the second of the timestamp.
int  weekofyear(string date)
    Returns the week number of a timestamp string: weekofyear('1970-11-01 00:00:00') = 44, weekofyear('1970-11-01') = 44
int  datediff(string enddate, string startdate)
    Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2
string  date_add(string startdate, int days)
    Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'
string  date_sub(string startdate, int days)
    Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'
timestamp  from_utc_timestamp(timestamp, string timezone)
    Assumes the given timestamp is UTC and converts it to the given time zone (as of Hive 0.8.0).
timestamp  to_utc_timestamp(timestamp, string timezone)
    Assumes the given timestamp is in the given time zone and converts it to UTC (as of Hive 0.8.0).
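The unix_timestamp and from_unixtime rows depend on the session time zone, which is easy to overlook. The sketch below reproduces the two example values from the table with plain JDK classes. It assumes the time zone is US Pacific (America/Los_Angeles), because that is the zone in which the table's numbers work out; the class and method names are hypothetical and this is not Hive's implementation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical JDK-only illustration of unix_timestamp / from_unixtime.
class UnixTimeSketch {
    // Assumption: the cheat sheet's examples were computed in US Pacific time.
    static final TimeZone ZONE = TimeZone.getTimeZone("America/Los_Angeles");

    // Parses text with the given pattern and returns seconds since the epoch,
    // or 0 when parsing fails (the table says "returns 0 if it fails").
    static long unixTimestamp(String text, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(ZONE);
        try {
            return fmt.parse(text).getTime() / 1000L;
        } catch (ParseException e) {
            return 0L;
        }
    }

    // Formats seconds since the epoch as a string in the assumed time zone.
    static String fromUnixtime(long seconds, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(ZONE);
        return fmt.format(new Date(seconds * 1000L));
    }

    public static void main(String[] args) {
        System.out.println(unixTimestamp("2009-03-20 11:30:01", "yyyy-MM-dd HH:mm:ss"));
        System.out.println(unixTimestamp("2009-03-20", "yyyy-MM-dd"));
        System.out.println(fromUnixtime(1237573801L, "yyyy-MM-dd HH:mm:ss"));
    }
}
```

Running the same inputs under a different time zone shifts the numeric results by the zone offset, which is why the same query can return different values on different clusters.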
Mathematical Functions
The following built-in mathematical functions are supported in Hive; most return NULL when the argument(s) are NULL.

Return Type  Name (Signature)
    Example

BIGINT  round(double a)
    Returns the rounded BIGINT value of the double.
DOUBLE  round(double a, int d)
    Returns the double rounded to d decimal places.
BIGINT  floor(double a)
    Returns the maximum BIGINT value that is equal to or less than the double.
BIGINT  ceil(double a), ceiling(double a)
    Returns the minimum BIGINT value that is equal to or greater than the double.
double  rand(), rand(int seed)
    Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed makes the generated random number sequence deterministic.
double  exp(double a)
    Returns e^a, where e is the base of the natural logarithm.
double  ln(double a)
    Returns the natural logarithm of the argument.
double  log10(double a)
    Returns the base-10 logarithm of the argument.
double  log2(double a)
    Returns the base-2 logarithm of the argument.
double  log(double base, double a)
    Returns the base-"base" logarithm of the argument.
double  pow(double a, double p), power(double a, double p)
    Returns a^p.
double  sqrt(double a)
    Returns the square root of a.
string  bin(BIGINT a)
    Returns the number in binary format.
string  hex(BIGINT a), hex(string a)
    If the argument is an int, hex returns the number as a string in hex format. Otherwise, if the argument is a string, it converts each character into its hex representation and returns the resulting string.
string  unhex(string a)
    Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the character represented by the number.
string  conv(BIGINT num, int from_base, int to_base), conv(STRING num, int from_base, int to_base)
    Converts a number from a given base to another.
double  abs(double a)
    Returns the absolute value.
int, double  pmod(int a, int b), pmod(double a, double b)
    Returns the positive value of a mod b.
double  sin(double a)
    Returns the sine of a (a is in radians).
double  asin(double a)
    Returns the arcsine of a if -1 <= a <= 1, or null otherwise.
double  cos(double a)
    Returns the cosine of a (a is in radians).
double  acos(double a)
    Returns the arccosine of a if -1 <= a <= 1, or null otherwise.
double  tan(double a)
    Returns the tangent of a (a is in radians).
double  atan(double a)
    Returns the arctangent of a.
double  degrees(double a)
    Converts the value of a from radians to degrees.
double  radians(double a)
    Converts the value of a from degrees to radians.
int, double  positive(int a), positive(double a)
    Returns a.
int, double  negative(int a), negative(double a)
    Returns -a.
float  sign(double a)
    Returns the sign of a as '1.0' or '-1.0'.
double  e()
    Returns the value of e.
double  pi()
    Returns the value of pi.
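Two of the rows above, pmod and conv, are terse enough that a worked example may help. The following sketch shows the arithmetic they describe using only the JDK. It assumes a positive divisor for pmod and non-negative inputs for conv; the class and method names are hypothetical, and Hive's own handling of negative or overflowing values is not modelled here.

```java
// Hypothetical JDK-only illustration of pmod and conv.
class HiveMathSketch {

    // Positive modulus: Java's % keeps the sign of the dividend, so the
    // remainder is shifted by b and reduced again (assumes b > 0).
    static long pmod(long a, long b) {
        return ((a % b) + b) % b;
    }

    // Base conversion for non-negative numbers: parse in fromBase,
    // then print in toBase (upper-case digits).
    static String conv(String num, int fromBase, int toBase) {
        long value = Long.parseLong(num, fromBase);
        return Long.toString(value, toBase).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(pmod(-7, 3));
        System.out.println(conv("ff", 16, 10));
        System.out.println(conv("255", 10, 2));
    }
}
```

The difference from the plain remainder operator shows up only for negative dividends: -7 % 3 is -1 in Java, while the positive modulus is 2.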
String Functions
The following built-in string functions are supported in Hive.

Return Type  Name (Signature)
    Example

int  ascii(string str)
    Returns the numeric value of the first character of str.
string  concat(string|binary A, string|binary B...)
    Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters, in order. E.g. concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
array<struct<string,double>>  context_ngrams(array<array<string>>, array<string>, int K, int pf)
    Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See StatisticsAndDataMining for more information.
string  concat_ws(string SEP, string A, string B...)
    Like concat() above, but with a custom separator SEP.
string  concat_ws(string SEP, array<string>)
    Like concat_ws() above, but taking an array of strings (as of Hive 0.9.0).
int  find_in_set(string str, string strList)
    Returns the first occurrence of str in strList, where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas. E.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3.
string  format_number(number x, int d)
    Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part (as of Hive 0.10.0).
string  get_json_object(string json_string, string path)
    Extracts a JSON object from a JSON string based on the JSON path specified, and returns the JSON string of the extracted object. Returns null if the input JSON string is invalid. NOTE: The JSON path can only contain the characters [0-9a-z_], i.e., no upper-case or special characters. Also, the keys cannot start with numbers. This is due to restrictions on Hive column names.
boolean  in_file(string str, string filename)
    Returns true if the string str appears as an entire line in filename.
int  instr(string str, string substr)
    Returns the position of the first occurrence of substr in str.
int  length(string A)
    Returns the length of the string.
int  locate(string substr, string str[, int pos])
    Returns the position of the first occurrence of substr in str after position pos.
string  lower(string A), lcase(string A)
    Returns the string resulting from converting all characters of A to lower case. E.g. lower('fOoBaR') results in 'foobar'.
string  lpad(string str, int len, string pad)
    Returns str, left-padded with pad to a length of len.
string  ltrim(string A)
    Returns the string resulting from trimming spaces from the beginning (left-hand side) of A. E.g. ltrim(' foobar ') results in 'foobar '.
array<struct<string,double>>  ngrams(array<array<string>>, int N, int K, int pf)
    Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See StatisticsAndDataMining for more information.
string  parse_url(string urlString, string partToExtract[, string keyToExtract])
    Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. E.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. The value of a particular key in QUERY can also be extracted by providing the key as the third argument, e.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.
string  printf(String format, Obj... args)
    Returns the input formatted according to printf-style format strings (as of Hive 0.9.0).
string  regexp_extract(string subject, string pattern, int index)
    Returns the string extracted using the pattern. E.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The index parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the index or the Java regex group() method.
string  regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
    Returns the string resulting from replacing all substrings in INITIAL_STRING that match the Java regular expression syntax defined in PATTERN with instances of REPLACEMENT. E.g. regexp_replace('foobar', 'oo|ar', '') returns 'fb'. Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.
string  repeat(string str, int n)
    Repeats str n times.
string  reverse(string A)
    Returns the reversed string.
string  rpad(string str, int len, string pad)
    Returns str, right-padded with pad to a length of len.
string  rtrim(string A)
    Returns the string resulting from trimming spaces from the end (right-hand side) of A. E.g. rtrim(' foobar ') results in ' foobar'.
array<array<string>>  sentences(string str, string lang, string locale)
    Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The lang and locale are optional arguments. E.g. sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", "are", "you") ).
string  space(int n)
    Returns a string of n spaces.
array  split(string str, string pat)
    Splits str around pat (pat is a regular expression).
map<string,string>  str_to_map(text[, delimiter1, delimiter2])
    Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and delimiter2 splits each K-V pair. Default delimiters are ',' for delimiter1 and '=' for delimiter2.
string  substr(string|binary A, int start), substring(string|binary A, int start)
    Returns the substring or slice of the byte array of A starting from the start position till the end of A. E.g. substr('foobar', 4) results in 'bar'.
string  substr(string|binary A, int start, int len), substring(string|binary A, int start, int len)
    Returns the substring or slice of the byte array of A starting from the start position with length len. E.g. substr('foobar', 4, 1) results in 'b'.
string  translate(string input, string from, string to)
    Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well (available as of Hive 0.10.0).
string  trim(string A)
    Returns the string resulting from trimming spaces from both ends of A. E.g. trim(' foobar ') results in 'foobar'.
string  upper(string A), ucase(string A)
    Returns the string resulting from converting all characters of A to upper case. E.g. upper('fOoBaR') results in 'FOOBAR'.
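The regexp_extract row says its index is the Java Matcher group() index, so the examples in the table can be checked directly against java.util.regex. The sketch below does that, and also mimics the two parse_url examples with java.net.URI. The class and method names are hypothetical, and returning null when nothing matches is a choice made for this sketch rather than a statement about Hive's behavior.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only illustration of regexp_extract, regexp_replace, parse_url.
class HiveStringSketch {

    // index 0 is the whole match; index n is the n-th capture group.
    static String regexpExtract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        return m.find() ? m.group(index) : null;
    }

    // Replaces every substring matching the Java regex.
    static String regexpReplace(String initial, String pattern, String replacement) {
        return initial.replaceAll(pattern, replacement);
    }

    // HOST part of a URL.
    static String urlHost(String url) {
        return URI.create(url).getHost();
    }

    // Value of one key in the QUERY part, or null if the key is absent.
    static String urlQueryValue(String url, String key) {
        String query = URI.create(url).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(key)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1";
        System.out.println(regexpExtract("foothebar", "foo(.*?)(bar)", 2));
        System.out.println(regexpReplace("foobar", "oo|ar", ""));
        // In Java source the regex \s must be written "\\s", the same
        // escaping concern the table raises for Hive string literals.
        System.out.println(regexpReplace("foo bar", "\\s", ""));
        System.out.println(urlHost(url));
        System.out.println(urlQueryValue(url, "k1"));
    }
}
```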
Collection Functions
The following built-in collection functions are supported in Hive.

Return Type  Name (Signature)
    Example

int  size(Map)
    Returns the number of elements in the map type.
int  size(Array)
    Returns the number of elements in the array type.
array  map_keys(Map)
    Returns an unordered array containing the keys of the input map.
array  map_values(Map)
    Returns an unordered array containing the values of the input map.
boolean  array_contains(Array, value)
    Returns TRUE if the array contains value.
array  sort_array(Array)
    Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0).
The following built-in aggregate functions are supported in Hive.

Return Type  Name (Signature)
    Example

bigint  count(*), count(expr), count(DISTINCT expr[, expr_.])
    count(*) returns the total number of retrieved rows, including rows containing NULL values; count(expr) returns the number of rows for which the supplied expression is non-NULL; count(DISTINCT expr[, expr]) returns the number of rows for which the supplied expression(s) are unique and non-NULL.
double  sum(col), sum(DISTINCT col)
    Returns the sum of the elements in the group or the sum of the distinct values of the column in the group.
double  avg(col), avg(DISTINCT col)
    Returns the average of the elements in the group or the average of the distinct values of the column in the group.
double  min(col)
    Returns the minimum of the column in the group.
double  max(col)
    Returns the maximum value of the column in the group.
double  variance(col), var_pop(col)
    Returns the variance of a numeric column in the group.
double  var_samp(col)
    Returns the unbiased sample variance of a numeric column in the group.
double  stddev_pop(col)
    Returns the standard deviation of a numeric column in the group.
double  stddev_samp(col)
    Returns the unbiased sample standard deviation of a numeric column in the group.
double  covar_pop(col1, col2)
    Returns the population covariance of a pair of numeric columns in the group.
double  covar_samp(col1, col2)
    Returns the sample covariance of a pair of numeric columns in the group.
double  corr(col1, col2)
    Returns the Pearson coefficient of correlation of a pair of numeric columns in the group.
double  percentile(BIGINT col, p)
    Returns the exact p-th percentile of a column in the group (does not work with floating point types). p must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.
array  percentile(BIGINT col, array(p1[, p2]...))
    Returns the exact percentiles p1, p2, ... of a column in the group (does not work with floating point types). Each pi must be between 0 and 1. NOTE: A true percentile can only be computed for integer values. Use PERCENTILE_APPROX if your input is non-integral.
double  percentile_approx(DOUBLE col, p[, B])
    Returns an approximate p-th percentile of a numeric column (including floating point types) in the group.
array  percentile_approx(DOUBLE col, array(p1[, p2]...)[, B])
    Same as above, but accepts and returns an array of percentile values instead of a single one.
array  histogram_numeric(col, b)
    Computes a histogram of a numeric column in the group.
array  collect_set(col)
    Returns a set of objects with duplicate elements eliminated.
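The table distinguishes population statistics (var_pop, stddev_pop) from unbiased sample statistics (var_samp, stddev_samp). The difference is only the divisor, n versus n - 1, which the following sketch makes explicit. The class name, method names and sample data are hypothetical, and this is a plain two-pass computation rather than whatever algorithm Hive uses internally.

```java
// Hypothetical JDK-only illustration of var_pop, var_samp and stddev_pop.
class VarianceSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sum of squared deviations from the mean, shared by both variances.
    static double sumSquaredDeviations(double[] xs) {
        double m = mean(xs);
        double total = 0.0;
        for (double x : xs) {
            total += (x - m) * (x - m);
        }
        return total;
    }

    // Population variance: divide by n.
    static double varPop(double[] xs) {
        return sumSquaredDeviations(xs) / xs.length;
    }

    // Unbiased sample variance: divide by n - 1 (needs at least two values).
    static double varSamp(double[] xs) {
        return sumSquaredDeviations(xs) / (xs.length - 1);
    }

    static double stddevPop(double[] xs) {
        return Math.sqrt(varPop(xs));
    }

    public static void main(String[] args) {
        double[] group = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(varPop(group));
        System.out.println(varSamp(group));
        System.out.println(stddevPop(group));
    }
}
```

For the eight sample values the squared deviations sum to 32, so the population variance is 32 / 8 and the sample variance is 32 / 7.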
Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row to multiple output rows.

Return Type  Name (Signature)
    Example

inline(ARRAY<STRUCT[,STRUCT]>)
    Explodes an array of structs into a table (as of Hive 0.10).
explode(array a)
    explode() takes in an array as an input and outputs the elements of the array as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
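Conditional Functions

Return Type  Name (Signature)
    Example

T  if(boolean testCondition, T valueTrue, T valueFalseOrNull)
    Returns valueTrue when testCondition is true; returns valueFalseOrNull otherwise.
T  COALESCE(T v1, T v2, ...)
    Returns the first v that is not NULL, or NULL if all v's are NULL.
T  CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
    When a = b, returns c; when a = d, returns e; else returns f.
T  CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
    When a = true, returns b; when c = true, returns d; else returns e.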
Functions for Text Analytics

Return Type  Name (Signature)
    Example

array<struct<string,double>>  context_ngrams(array<array<string>>, array<string>, int K, int pf)
    Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See StatisticsAndDataMining for more information. Contextual n-grams are similar to n-grams, but allow you to specify a "context" string around which n-grams are to be estimated. For example, you can specify that you're only interested in finding the most common two-word phrases in text that follow the context "I love". You could achieve the same result by manually stripping sentences of non-contextual content and then passing them to ngrams(), but context_ngrams() makes it much easier.
array<struct<string,double>>  ngrams(array<array<string>>, int N, int K, int pf)
    Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See StatisticsAndDataMining for more information. N-grams are subsequences of length N drawn from a longer sequence. The purpose of the ngrams() UDAF is to find the k most frequent n-grams from one or more sequences. It can be used in conjunction with the sentences() UDF to analyze unstructured natural language text, or the collect() function to analyze more general string data.
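To make the n-gram description concrete, the sketch below counts every run of N consecutive words in a set of tokenized sentences and keeps the K most frequent. It is an exact, in-memory count written for illustration only: the class and method names are hypothetical, and it ignores the pf (precision factor) argument, so it does not model any approximation Hive's ngrams() may apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical JDK-only illustration of "top-k N-grams".
class NgramSketch {

    // Returns up to k (n-gram, count) entries, most frequent first;
    // ties are broken alphabetically so the order is deterministic.
    static List<Map.Entry<String, Integer>> topNgrams(
            List<List<String>> sentences, int n, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> words : sentences) {
            // Slide a window of n words across the sentence.
            for (int i = 0; i + n <= words.size(); i++) {
                String gram = String.join(" ", words.subList(i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("i", "love", "hive"),
                Arrays.asList("i", "love", "pig"));
        for (Map.Entry<String, Integer> e : topNgrams(sentences, 2, 2)) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

A contextual variant would first filter the windows to those that follow a given context such as "i love", which is the manual step the table says context_ngrams() saves.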
2016 Qubole, Inc. All rights reserved.