The document defines key concepts in data warehousing including data warehouses, data marts, OLTP vs OLAP, fact tables, dimension tables, star schemas, and snowflake schemas. It also discusses why organizations implement data warehousing, providing a brief history and explaining that data warehousing separates analytical reporting and decision support functions from operational transaction systems.
Basic Definitions
BASIC DEFINITIONS
DWH: a repository of integrated information, specifically structured for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources.

Data Mart: a collection of subject areas organized for decision support based on the needs of a given department (e.g., sales or marketing). Data marts are designed to suit the needs of a department. A data mart is much less granular than the data warehouse.

Data Warehouse: used at the enterprise level, while data marts are used at the business division / department level. Data warehouses are arranged around the corporate subject areas found in the corporate data model. Data warehouses contain more detailed information, while most data marts contain more summarized or aggregated data.

OLTP: Online Transaction Processing. This is the standard, normalized database structure. OLTP is designed for transactions, which means that inserts, updates, and deletes must be fast.

OLAP: Online Analytical Processing. Read-only, historical, aggregated data.

Fact Table: contains the quantitative measures about the business.

Dimension Table: contains descriptive data about the facts (business).

Conformed Dimensions: dimension tables shared by fact tables. These tables connect separate star schemas into an enterprise star schema.

Star Schema: a set of tables comprising a single, central fact table surrounded by de-normalized dimensions. A star schema implements dimensional data structures with de-normalized dimensions.

Snowflake Schema: a set of tables comprising a single, central fact table surrounded by normalized dimension hierarchies. A snowflake schema implements dimensional data structures with fully normalized dimensions.

Staging Area: the workplace where raw data is brought in, cleaned, combined, archived, and exported to one or more data marts. The purpose of the data staging area is to get data ready for loading into a presentation layer.
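The fact table, dimension table, and star schema definitions above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are all hypothetical, chosen only for illustration and not taken from any particular warehouse.

```python
import sqlite3

# Hypothetical star schema: one central fact table, two de-normalized dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT      -- kept inside the dimension (de-normalized)
);
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT,
    month         TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,   -- quantitative measure
    amount      REAL       -- quantitative measure
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Laptop", "Hardware"), (2, "Mouse", "Hardware"), (3, "Antivirus", "Software")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01-01", "2024-01"), (20240201, "2024-02-01", "2024-02")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20240101, 2, 2000.0), (2, 20240101, 5, 100.0),
     (3, 20240201, 10, 300.0), (1, 20240201, 1, 1000.0)],
)


def sales_by_category(connection):
    """Aggregate the fact measures, grouped by a descriptive dimension attribute."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


print(sales_by_category(conn))
```

With this sample data the query returns 3100.0 for Hardware and 300.0 for Software. In a snowflake variant of the same sketch, category would move out of dim_product into its own normalized table, and the query would need one more join.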
Queries: The DWH contains 2 types of queries. There will be fixed queries that are clearly defined and well understood, such as regular reports, canned queries, and common aggregations. There will also be ad hoc queries that are unpredictable, both in quantity and frequency.

Ad Hoc Query: the starting point for any analysis into a database. The ability to run any query when desired and expect a reasonable response is what makes the data warehouse worthwhile and makes the design such a significant challenge. The end-user access tools are capable of automatically generating the database query that answers any question posed by the user.

Canned Queries: pre-defined queries. Canned queries contain prompts that allow you to customize the query for your specific needs.

Kimball (Bottom up) vs. Inmon (Top down) approaches:

According to Ralph Kimball, when you plan to design analytical solutions for an enterprise, start by building data marts. Once you have 3 or 4 such data marts, you will in effect have built up an enterprise-wide data warehouse (EDWH), without spending time and effort exclusively on building the EDWH, because building a data mart takes less time than building an EDWH.

Inmon: build an enterprise-wide data warehouse first; all the data marts will then be subsets of the DWH. According to him, independent data marts cannot make up an enterprise data warehouse under any circumstance; they will remain isolated pieces of information (stovepipes).

INTRODUCTION TO DATA WAREHOUSE

Introduction

Businesses of all sizes and in different industries, as well as government agencies, are finding that they can realize significant benefits by implementing a data warehouse. It is generally accepted that data warehousing provides an excellent approach for transforming the vast amounts of data that exist in these organizations into useful and reliable information for getting answers to their questions and to support the decision making process. A data warehouse provides the base for the powerful data analysis techniques that are available today, such as data mining and multidimensional analysis, as well as the more traditional query and reporting. Making use of these techniques along with data warehousing can result in easier access to the information you need for more informed decision making.

However, at the end of all the research, planning, and architecting, you will come to realize that it all starts with a firm foundation. Whether you are building a large centralized data warehouse, one or more smaller distributed data warehouses (sometimes called data marts), or some combination of the two, you will always come to the point where you must decide on how the data is to be structured. This is, after all, one of the key concepts in data warehousing and what differentiates it from the more typical operational database and decision support application building. That is, you structure the data and build applications around it rather than structuring applications and bringing data to them.

Data warehouse modeling is a process that produces abstract data models for one or more database components of the data warehouse. It is one part of the overall data warehouse development process, which comprises other major processes such as data warehouse architecture, design, and construction. We consider the data warehouse modeling process to consist of all tasks related to requirements gathering, analysis, validation, and modeling. Typically for data warehouse development, these tasks are difficult to separate. The separation between modeling and design is done for practical reasons: it is our intention to cover the modeling activities and techniques quite extensively.

Data Warehousing

Data warehousing is more than just a product, or set of
products; it is a solution. It is an information environment that is separate from the more typical transaction-oriented operational environment. Data warehousing is, in and of itself, an information environment that is evolving as a critical resource in today's organizations.

Often we think that a data warehouse is a product, or group of products, that we can buy to help get answers to our questions and improve our decision-making capability. But it is not so simple. A data warehouse can help us get answers for better decision making, but it is only one part of a more global set of processes. These are all questions that must be answered before a data warehouse can be built. We prefer to discuss the more global environment, and we refer to it as data warehousing.

Data warehousing is the design and implementation of processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable information for decision making. It includes all the activities that make it possible for an organization to create, manage, and maintain a data warehouse or data mart.

Why Data Warehousing?

The concept of data warehousing has evolved out of the need for easy access to a structured store of quality data that can be used for decision making. It is globally accepted that information is a very powerful asset that can provide significant benefits to any organization and a competitive advantage in the business world. Organizations have vast amounts of data but have found it increasingly difficult to access it and make use of it. This is because it is in many different formats, exists on many different platforms, and resides in many different file and database structures developed by different vendors. Thus organizations have had to write and maintain perhaps hundreds of programs that are used to extract, prepare, and consolidate data for use by many different applications for analysis and reporting. Also, decision makers often want to dig deeper into the data once initial findings are made. This would typically require modification of the extract programs or development of
new ones. This process is costly, inefficient, and very time consuming. Data warehousing offers a better approach.

Data warehousing implements the process to access heterogeneous data sources; clean, filter, and transform the data; and store the data in a structure that is easy to access, understand, and use. The data is then used for query, reporting, and data analysis. As such, the access, use, technology, and performance requirements are completely different from those in a transaction-oriented operational environment. The volume of data in data warehousing can be very high, particularly when considering the requirements for historical data analysis. Data analysis programs are often required to scan vast amounts of that data, which could result in a negative impact on operational applications, which are more performance sensitive. Therefore, there is a requirement to separate the two environments to minimize conflicts and degradation of performance in the operational environment.

Short History

The origin of the concept of data warehousing can be traced back to the early 1980s, when relational database management systems emerged as commercial products. The foundation of the relational model with its simplicity, together with the query capabilities provided by the SQL language, supported the growing interest in what then was called end-user computing or decision support. To support end-user computing environments, data was extracted from the organization's online databases and stored in newly created database systems dedicated to supporting ad hoc end-user queries and reporting functions of all kinds. One of the prime concerns underlying the creation of these systems was the performance impact of end-user computing on the operational data processing systems. This concern prompted the requirement to separate end-user computing systems from transactional processing systems.

In those early days of data warehousing, the extracts of operational data were usually snapshots or subsets of
the operational data. These snapshots were loaded in an end-user computing (or decision support) database system on a regular basis, perhaps once a week or once per month. Sometimes a limited number of versions of these snapshots were even accumulated in the system while access was provided to end users equipped with query and reporting tools. Data modeling for these decision support database systems was not much of a concern. Data models for these decision support systems typically matched the data models of the operational systems because, after all, they were extracted snapshots anyhow. One of the frequently occurring remodeling issues then was to "normalize" the data to eliminate the nasty effects of design techniques that had been applied on the operational systems to maximize their performance, to eliminate code tables that were difficult to understand, along with other local cleanup activities. But by and large, the decision support data models were technical in nature and primarily concerned with providing data available in the operational application systems to the decision support environment.

The role and purpose of data warehouses in the data processing industry have evolved considerably since those early days and are still evolving rapidly. Comparing today's data warehouses with the decision support databases of those early days should be done with great care. Data warehouses should no longer be identified with database systems that support end-user queries and reporting functions. They should no longer be conceived as snapshots of operational data. Data warehouse databases should be considered as new sources of information, conceived for use by the whole organization or for smaller communities of users and data analysts within the organization. Simply reengineering source data models in the traditional way will no longer satisfy the requirements for data warehousing. Developing data warehouses requires a much more thoughtfully applied set of modeling techniques and a much closer working relationship with the business side of
the organization.

Data warehouses should also be conceived of as sources of new information. This statement sounds controversial at first, because there is global agreement that data warehouses are read-only database systems. The point is that by accumulating and consolidating data from different sources, and by keeping this historical data in the warehouse, new information about the business, competitors, customers, suppliers, the behavior of the organization's business processes, and so forth, can be unveiled. The value of a data warehouse is no longer in being able to do ad hoc query and reporting. The real value is realized when someone gets to work with the data in the warehouse and discovers things that make a difference for the organization, whatever the objective of the analytical work may be. To achieve such interesting results, simply reengineering the source data models will not do.
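The process described earlier (access heterogeneous data sources; clean, filter, and transform the data; store it in a structure that is easy to query) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design. The two sources (a CSV export and a list of billing records), their field names, and the cleaning rules are all hypothetical.

```python
import csv
import io
import sqlite3

# Two hypothetical sources in different formats (assumptions for illustration).
CRM_CSV = "customer_id,name,country\n1, Alice ,us\n2,Bob,DE\n,Ghost,FR\n"
BILLING_ROWS = [
    {"cust": "1", "total": "120.50"},
    {"cust": "2", "total": "80"},
    {"cust": "2", "total": "not-a-number"},
]


def extract():
    """Read raw records from each source without changing them."""
    customers = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return customers, BILLING_ROWS


def transform(customers, billing):
    """Clean, filter, and convert both sources to consistent types."""
    clean_customers = []
    for row in customers:
        if not row["customer_id"].strip():
            continue  # filter out rows with no key
        clean_customers.append(
            (int(row["customer_id"]), row["name"].strip(), row["country"].strip().upper())
        )
    clean_billing = []
    for row in billing:
        try:
            clean_billing.append((int(row["cust"]), float(row["total"])))
        except ValueError:
            continue  # filter out rows whose values cannot be parsed
    return clean_customers, clean_billing


def load(conn, customers, billing):
    """Store the consolidated data in one queryable structure."""
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE billing (customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO billing VALUES (?, ?)", billing)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, *transform(*extract()))

# A report that combines data originating from both sources.
report = conn.execute("""
    SELECT c.name, c.country, SUM(b.total)
    FROM customer AS c
    JOIN billing AS b ON b.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, c.country
    ORDER BY c.customer_id
""").fetchall()
print(report)
```

Because the cleaning happens once, before loading, the final query does not need to know that the data came from two differently formatted sources. This is the separation from the operational systems that the text argues for.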