Data Rdbms
Name:
Contents
Section 01:
Database
Entity Relationship Diagram
Database Tables:
Customer
Role
Qualification
Employees
DML Statements
Data Analytics in BuyUs
HDFS (Hadoop Distributed File System)
Data warehouse
Data Mining
Section 2
Star Schema of Data Warehouse
Section 3
Data Analysis
Data Integrity
Voice Mail Messages Per State
Total Night Call in Minutes
Maximum Number of Voice Mail Messages for OH, OK and RI
Total Day Calls for OH & NY
Strength & Weakness of Data Analytics Tools
Bibliography
Figure 1: Conceptual Model
Figure 2: Logical Model of BuyUs
Figure 3: Customer Table Create Command
Figure 4: Employees Table
Figure 5: Product Table
Figure 6: Order Table
Figure 7: Orderline
Figure 8: Customer DML Statement
Figure 9: Employee Table DML Statement
Figure 10: Product Table DML Statement
Figure 11: Order Table DML
Figure 12: Order Line
Figure 13: Star Schema BuyUS
Figure 14: Data Integrity Of State Column
Figure 15: Total Day Calls of Data Integrity Check
Figure 16: International Calls Data Integrity
Figure 17: Total Night Call in Minutes
Figure 18: Maximum Number of Voice Message in OH, OK, RI
Figure 19: Day Calls for OH AND NY
Section 01:
Database
Entity Relationship Diagram
BuyUs is a retail company that runs its business online, and the company wants to develop a
database. Of the different data warehouse design approaches, the top-down approach is used for
this database design. The conceptual model is shown in Figure 1 and the logical model, in crow's
foot notation, in Figure 2.
The diagram describes the relationships between Customer, Employee, Order and Product. The
relationship between Customer and Order is one-to-many: one customer can place multiple
orders, and each order is placed by a single customer. The relationship between Order and
Product is many-to-many, because an order can contain many products and a product can appear
in many orders. The relationship between an employee and his job role is one-to-one. To avoid
duplication of data, an additional Order Line entity was added, which resolves the many-to-many
relationship between Order and Product into two one-to-many relationships.
The logical model includes the details of each entity and clearly identifies the primary keys
and foreign keys. In the relational schema below, foreign keys are marked with an asterisk (*).
Customer (Customer id, First Name, Last Name, Address, Salary, Job Title, Qualification)
Employee (Employee id, First Name, Last Name, Job Title, Salary)
Order (Order id, Order Date, Customer id*, Employee id*, Branch id*)
Order line (Order id, Product id, Quantity Ordered)
Product (Product id, Product Name, Product Price, Product Quantity)
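The relational schema above can be sketched as table definitions. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column types, the snake_case names, the table name orders (ORDER is a reserved word in SQL) and the composite primary key on the order line table are assumptions made for illustration; the report's own CREATE TABLE commands appear only as figures. Branch id is kept as a plain column because no Branch entity is listed in the schema.

```python
import sqlite3

# Table definitions for the BuyUs schema. Types and names are assumptions.
SCHEMA = """
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT NOT NULL,
    last_name     TEXT NOT NULL,
    address       TEXT,
    salary        REAL,
    job_title     TEXT,
    qualification TEXT
);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    job_title   TEXT,
    salary      REAL
);
CREATE TABLE product (
    product_id       INTEGER PRIMARY KEY,
    product_name     TEXT NOT NULL,
    product_price    REAL NOT NULL,
    product_quantity INTEGER NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    branch_id   INTEGER  -- no Branch entity is listed, so no foreign key here
);
CREATE TABLE order_line (
    order_id         INTEGER NOT NULL REFERENCES orders(order_id),
    product_id       INTEGER NOT NULL REFERENCES product(product_id),
    quantity_ordered INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
"""


def create_schema(conn):
    """Create all five tables on the given connection."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The composite key on order_line means a product can appear only once per order, with the quantity recorded on that single row.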
Database Tables:
Customer
Employees
Product
Figure 5: Product Table
Order
Order Line
Figure 7: Orderline
DML Statements
Insert Data into Customer Table
Figure 8: Customer DML Statement
DML Statement Employee Table
DML Of Product
Figure 10: Product Table DML Statement
Order Table
Figure 11: Order Table DML
Figure 12: Order Line
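The DML statements in this section are shown only as figures, so the following is a minimal sketch of the same kind of INSERT statements, again using Python's sqlite3 module. The reduced table definitions, the snake_case names and every row value are made up for illustration and are not the data used in the report. Parent rows (customer, product) are inserted before the rows that reference them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Reduced versions of four of the tables, just enough to show the inserts.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT, product_price REAL);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, order_date TEXT,
                       customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         quantity_ordered INTEGER,
                         PRIMARY KEY (order_id, product_id));
""")

# Placeholder rows. Using "?" parameters avoids building SQL strings by hand.
with conn:  # commits on success, rolls back if an insert fails
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Ann", "Lee"), (2, "Bob", "Ray")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(10, "Kettle", 20.0), (11, "Toaster", 35.0)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(100, "2021-05-01", 1), (101, "2021-05-02", 1)])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                     [(100, 10, 2), (100, 11, 1), (101, 10, 1)])

# Total value of each order, joined through the order line table.
order_totals = conn.execute("""
    SELECT o.order_id, SUM(ol.quantity_ordered * p.product_price) AS total
    FROM orders o
    JOIN order_line ol ON ol.order_id = o.order_id
    JOIN product p     ON p.product_id = ol.product_id
    GROUP BY o.order_id
    ORDER BY o.order_id
""").fetchall()
print(order_totals)
```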
Data Analytics in BuyUs
Data volumes are increasing every day, and a more powerful infrastructure is needed to analyze
large volumes of unstructured data and convert them into a structured form. Different tools are
used to normalize the data, so SQL databases, Hadoop, Apache Spark and other tools are required
to integrate data into different platforms for visualization and other purposes. NoSQL data
stores are among the open-source distributed systems that are cost effective (Beaulieu, 2020).
A NoSQL database makes it easy to query different types of data, and it is designed so that
data is not lost when a failure occurs (Aven, 2017). HDFS deployed in the cloud scales easily
with usage, because capacity can be added with very little effort. HDFS is fast enough to
support machine learning tools and other data extraction techniques (Harrington, 2016).
NoSQL databases have several distinguishing characteristics: they handle large data sets
through horizontal scaling and can also normalize data easily. HDFS integrates with relational
databases and with column-oriented databases. Using MapReduce, the data is processed and
converted into the required format. It can then be passed to powerful tools and visualized in
different forms (Seidman, 2015).
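The MapReduce idea mentioned above can be illustrated with a small single-process sketch in plain Python. This is not the Hadoop API: the function names map_phase, shuffle, reduce_phase and run_job are hypothetical, and the input records (customer id, order amount) are made up. It only shows the three steps that a MapReduce job performs: emitting key and value pairs, grouping them by key, and combining each group.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    customer_id, amount = record
    yield customer_id, amount


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


# Made-up (customer id, order amount) records.
records = [("C1", 20.0), ("C2", 15.0), ("C1", 5.5), ("C3", 9.0), ("C2", 1.0)]
totals = run_job(records)
print(totals)
```

In a real cluster the map and reduce functions run on many nodes in parallel and the shuffle moves data across the network; here everything runs in one process so the data flow is easy to follow.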
Many organizations use HDFS as a data warehouse to handle the company's large data sets. Its
functionality includes reliability, low latency and querying of data to produce results for
different reports. The data stored in HDFS can be structured, unstructured or in other forms
(Seidman, 2015). According to Oracle, HDFS does not replace the relational database but works
alongside it. Relational databases are easy to use, and many organizations prefer them because
normalizing data is straightforward with this technology (Smith, 2011).
HDFS maintains data availability: if one node stops working, a secondary node takes over. The
same information is available on different servers because the data is replicated. In version 3
of Hadoop, multiple nodes can work at the same time, which makes it easier to recover data
after a disaster (Singh and Kumar, 2019). To guard against hacking by third parties, companies
install firewalls and other software. Companies also use HDFS to store logs from networked
computers and to improve their systems and rules in response to new virus attacks
(Echeverria, 2015).
When a laptop or other device connects to HDFS, permission is requested from the administrator
automatically; without it, the connection is refused. Data monitoring and other measures are
taken to prevent security incidents. With HDFS, BuyUs can easily manage all data related to
customers and their feedback. With suitable tools, extracting, transforming and loading data
into the database is straightforward. BuyUs can access all customer reviews through HDFS and
analyze the data with different data analytics techniques.
Data warehouse
In the modern age, information is the most important factor. Companies collect customer data
to find out the demand for their products. With a data warehouse, data can be analyzed by city
and by product sales trend to increase the profit from different products. The data warehouse
is the backbone of business intelligence, where data transformation is easy (Jagadesh, 2013).
The data is stored in the database once, and reports are generated whenever a user wants to
see an analysis of the data. A report is generated with a single click, and all the historical
data is shown by date. Decisions are then made, and data is extracted according to those
decisions (Silvers, 2011).
Companies use the top-down data warehouse approach, which is considered here to be the best
approach, and the data is normalized so that it can be viewed from different perspectives.
Extraction, transformation and loading (ETL) of data is the main process, and normalization
consumes most of the effort. In this process, jobs are created in SQL that run at scheduled
times, and the data is loaded automatically once each job completes. Data engineers work
closely with the data throughout the process. There are different deployment options: some
companies use the cloud to perform all tasks, which is cost effective, while others use
on-premises hardware and face capacity and scalability problems (Tobin, 2020).
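The extraction, transformation and loading steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the process used by BuyUs: the CSV content, the sales table and the function names extract, transform and load are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Made-up source data with the usual problems: stray spaces, inconsistent
# capitalisation and one non-numeric amount.
RAW_CSV = """order_id,city,amount
1, leeds ,20.50
2,York,abc
3,york,10.00
"""


def extract(text):
    """Read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean each row; rows whose amount is not numeric are skipped."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        clean.append((int(row["order_id"]), row["city"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Insert the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
by_city = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
print(by_city)
```

A scheduled job would call the same three functions at a fixed interval; silently skipping bad rows is a simplification, and a production job would normally log or quarantine them.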
In BuyUs, the data warehouse is used to analyze customer data and to show which customer buys
a specific product most often and what the profit margin of each product is. Data collected by
city and by state is analyzed so that stock can be ordered according to demand. Hence we
conclude that the data warehouse approach is essential for making the business profitable.
Data Mining
Over the last few decades, data mining has become one of the most important techniques because
it makes data extraction and transformation easy. In every field, huge amounts of data are
collected in databases in a form that is not manageable; data mining is a tool for extracting
information from this data in depth (Han et al., 2011).
Data mining includes, for example, collecting data about all the people who watch a specific
channel and how long they stay on it. This powerful tool collects data with very little effort.
Monitoring data with this tool carries some risk, because data about every individual is
collected.
European states have collaborated and concluded that everyone has the right to see their own
data and that no one may access another person's personal data. Data security is therefore
most important in data mining, and everyone must be confident that their data is not misused.
Data sharing has been described well in different research papers. People's attention is now
focused on data privacy when this tool is used.
Breaking privacy laws is a major concern, and a company loses its reputation when data leaks.
Organizations pay huge amounts to prevent data breaches, but data is still leaked through
different cyber-attacks.
A significant question in using data mining techniques on people's data is whether this is an
ethical practice. Some authors believe that using racial and sexual data for medical purposes
is ethical but that, if the same data is used in mining for credit payment behaviour, this may
not be ethical. Certain ethnic groups may be living in specific ZIP codes or regions, which
puts them at a disadvantage even when sensitive data has been removed from the datasets.
Other authors consider that the key ethical issues arise from the fact that individuals may
not know that their data is being collected or for what purpose, and from the fact that they
may not have consented to the use of their data for the intended purpose. The suggested
solution is to conduct data mining on anonymized data (Kantardzic, 2011). Despite potential
issues, data mining can help companies gain new insights, make better decisions, and even give
them a competitive advantage.
BuyUs can use predictive and descriptive data mining techniques to identify patterns and use
them to increase sales or better address the needs of its customers. For example, the company
can use supervised learning algorithms such as multiple linear regression to check whether the
amount spent on orders (the dependent variable) has a direct or indirect relationship with
product prices, the average product review or perhaps the usability of the platform (which
could be assessed from the time spent on the page or the click stream). As a result, the
company can improve its website, lower the price of the products, source them from another
supplier or work on boosting customer satisfaction for better results. BuyUs could also use
unsupervised learning techniques to cluster its customer base and send personalized offers, or
to identify products that customers buy together and promote them as a bundle.
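The last idea, finding products that customers buy together, can be illustrated with a simple pair count over order baskets. This is a minimal sketch of co-occurrence counting rather than a full association rule algorithm such as Apriori; the function name co_purchase_counts and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("kettle", "mug") and ("mug", "kettle")
        # are counted as the same pair.
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    return pair_counts


# Made-up order baskets, one list of product names per order.
baskets = [
    ["kettle", "toaster", "mug"],
    ["kettle", "mug"],
    ["toaster", "mug"],
    ["kettle", "mug", "tea"],
]
pairs = co_purchase_counts(baskets)
top_pair, top_count = pairs.most_common(1)[0]
print(top_pair, top_count)
```

The most frequent pairs are candidates for a bundle offer; with real order data, the counts would normally be compared against how often each product sells on its own before drawing conclusions.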
Section 2
Star Schema of Data Warehouse
Retrieving data from the database is the most important process, and the fact and dimension
tables are used to organize the data for it. Descriptive data is placed in the dimension
tables, and measured values are placed in the fact table. The fact table also holds the IDs
that link it to each dimension. The BuyUs star schema was built in this way, with all the data
placed in fact and dimension tables.
The process starts by analyzing the data and deciding which facts and dimensions are available
in the dataset. The dimensions are placed in the dimension tables, and the fact table consists
of the IDs, which are then populated.
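The split between fact and dimension tables can be sketched as follows, using Python's sqlite3 module. The table names (dim_customer, dim_product, dim_date, fact_sales), the columns and the rows are assumptions made for illustration and do not reproduce the BuyUs star schema shown in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);

-- The fact table holds numeric measures plus one key per dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    sales_amount REAL
);

-- Made-up rows for illustration.
INSERT INTO dim_customer VALUES (1, 'Ann Lee', 'Leeds'), (2, 'Bob Ray', 'York');
INSERT INTO dim_product  VALUES (10, 'Kettle'), (11, 'Toaster');
INSERT INTO dim_date     VALUES (20210501, '2021-05-01', 2021);
INSERT INTO fact_sales   VALUES (1, 10, 20210501, 2, 40.0),
                                (2, 10, 20210501, 1, 20.0),
                                (2, 11, 20210501, 1, 35.0);
""")

# A typical star-schema query: aggregate a measure from the fact table,
# grouped by an attribute taken from one dimension.
sales_by_city = conn.execute("""
    SELECT c.city, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(sales_by_city)
```

Every report follows the same pattern of one join per dimension used, which is what makes the star layout convenient for city-wise or product-wise analysis.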
Figure 13: Star Schema BuyUS
Section 3
Data Analysis
Tableau is used to analyze and visualize the data.
Data Integrity
Data integrity is checked in Tableau by verifying that the data type of the selected column is
correct. The State column has the string data type, which is correct because it contains only
text values, so the column is valid, as shown in the screenshot.
Figure 14: Data Integrity Of State Column
The data type of the State column is correct and is in string format. As we have seen, the status is “Valid”.
Figure 15: Total Day Calls of Data Integrity Check
Another column, international plan, is also a string because it contains only “YES” and “NO”
values, so its data type is valid. Because the column contains only yes and no values, it is
relevant for the analysis.
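The same kind of check can be expressed in code. The following is a minimal sketch in plain Python, not a description of what Tableau does internally. The column names (state, international plan, total day calls), the rule that a state is a two-letter upper-case code and the sample rows are assumptions made for illustration.

```python
ALLOWED_PLAN_VALUES = {"yes", "no"}


def is_state_code(value):
    """True for a two-letter upper-case string such as 'OH'."""
    return (
        isinstance(value, str)
        and len(value) == 2
        and value.isalpha()
        and value.isupper()
    )


def is_plan_flag(value):
    """True for 'yes' or 'no' in any capitalisation."""
    return isinstance(value, str) and value.strip().lower() in ALLOWED_PLAN_VALUES


def is_call_count(value):
    """True for a non-negative integer (booleans are rejected)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# One validation rule per column; the column names are assumptions.
CHECKS = {
    "state": is_state_code,
    "international plan": is_plan_flag,
    "total day calls": is_call_count,
}


def find_invalid_cells(rows):
    """Return (row index, column, value) for every cell that fails its check."""
    problems = []
    for index, row in enumerate(rows):
        for column, check in CHECKS.items():
            value = row.get(column)
            if not check(value):
                problems.append((index, column, value))
    return problems


# Made-up rows: the first is valid, the other two contain errors.
rows = [
    {"state": "OH", "international plan": "no", "total day calls": 110},
    {"state": "Ohio", "international plan": "yes", "total day calls": 95},
    {"state": "NY", "international plan": "maybe", "total day calls": -3},
]
problems = find_invalid_cells(rows)
print(problems)
```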
Voice Mail Messages Per State
The chart shows the total number of voice mail messages by state. The summary shows that the
average is 529.29 messages, the minimum is 301, the maximum is 925 and the median is 522.
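For readers who want to reproduce this kind of summary outside Tableau, the following is a minimal sketch using Python's standard library. The input rows are made up and do not come from the report's dataset, so the numbers differ from those quoted above; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def messages_per_state(rows):
    """Sum the voice mail messages of all customers in each state."""
    totals = defaultdict(int)
    for state, messages in rows:
        totals[state] += messages
    return dict(totals)


def summarise(totals):
    """Average, minimum, maximum and median of the per-state totals."""
    values = list(totals.values())
    return {
        "average": mean(values),
        "minimum": min(values),
        "maximum": max(values),
        "median": median(values),
    }


# Made-up (state, voice mail messages) rows, one per customer.
rows = [("OH", 10), ("OH", 0), ("NY", 25), ("RI", 5), ("NY", 7), ("OK", 12)]
totals = messages_per_state(rows)
summary = summarise(totals)
print(totals)
print(summary)
```

The summary is computed over the per-state totals, not over individual customers, which matches a chart that has one bar per state.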
Figure 17: Total Night Call in Minutes
The following analysis shows the total night calls in minutes. To produce it, the night minutes field is dragged to
Rows, and the total is shown as a bar. The total is approximately 333 k night call minutes.
Figure 18: Maximum Number of Voice Message in OH, OK, RI
Figure 19: Day Calls for OH AND NY
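The two filtered views captioned above (maximum voice mail messages for OH, OK and RI, and day calls for OH and NY) amount to filtering by state and then aggregating. A minimal sketch in plain Python follows; the field names vmail_messages and day_calls, the helper names and all row values are assumptions made for illustration and are not the report's data.

```python
def max_by_state(rows, column, states):
    """Largest value of a column per state, restricted to the given states."""
    result = {}
    for row in rows:
        state = row["state"]
        if state in states:
            result[state] = max(result.get(state, row[column]), row[column])
    return result


def total_by_state(rows, column, states):
    """Sum of a column per state, restricted to the given states."""
    result = {}
    for row in rows:
        state = row["state"]
        if state in states:
            result[state] = result.get(state, 0) + row[column]
    return result


# Made-up customer rows.
rows = [
    {"state": "OH", "vmail_messages": 30, "day_calls": 100},
    {"state": "OH", "vmail_messages": 12, "day_calls": 80},
    {"state": "OK", "vmail_messages": 25, "day_calls": 90},
    {"state": "RI", "vmail_messages": 0, "day_calls": 70},
    {"state": "NY", "vmail_messages": 40, "day_calls": 120},
    {"state": "NY", "vmail_messages": 5, "day_calls": 60},
]
max_vmail = max_by_state(rows, "vmail_messages", {"OH", "OK", "RI"})
day_calls = total_by_state(rows, "day_calls", {"OH", "NY"})
print(max_vmail)
print(day_calls)
```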
Strength & Weakness of Data Analytics Tools
Strength
This module focused on Tableau and SQL. From a knowledge point of view, these are among the
most powerful tools: data is extracted, transformed and loaded with SQL and then linked to the
Tableau platform. Data visualization is very important in analytics because data is easier to
understand in visual form. SQL is difficult at the start, but after practice with queries,
creating tables, populating them and fetching data from the database becomes easy. SQL Server
is a high-demand tool in data analytics, and it integrates easily with different platforms to
visualize data. Different cloud platforms provide SQL services alongside other services, such
as Python, that link directly with this language (Nield, 2019).
In Tableau, working with a dataset only requires drag and drop. Different charts are available
in the tool; once data is dragged in, a specific chart is selected and the data is shown in
visual form. Introducing this new tool was a very important step taken by the teachers for
their students. The tool is widely used in professional jobs, and I now have a clear
understanding of it.
The other valuable topic is the Hadoop Distributed File System, which is used together with
NoSQL databases. This technology stores a huge amount of data and is much more powerful than
the devices used in daily life. Using these tools is helpful for me because the analytics field
will offer different jobs in future, and it is better to learn these tools during the degree
than after it. Many database developer jobs and internships are available, along with tutorials
to enhance skills. I learned a lot about databases from different websites, and the tools are
easy to use once the main concepts are understood.
HDFS helps companies make decisions based on a comprehensive analysis of multiple variables and
data sets, rather than on a small sample of data or anecdotal incidents. The ability to process
large sets of disparate data gives Hadoop users a more comprehensive view of their customers,
operations, opportunities, risks and so on. To develop a comparable perspective without big
data, organizations would have to conduct multiple, limited data analyses and then find a way
to synthesize the results, which would probably involve a lot of manual effort and subjective
analysis (Rowe, 2016).
Weakness
This module covered Tableau, SQL and the concepts of data warehousing, data mining and so on.
Besides its high performance and other advantages, Tableau is expensive, has inflexible pricing
and has security issues. Proper use of these tools requires IT assistance, and versioning is
poor. Training before using the tool is time consuming. Before starting with SQL, complete
knowledge of its interface and query-writing skills are needed. Every beginner faces many
problems while using this tool, because an expert is needed to understand its complexities.
HDFS is also one of the best tools, but students cannot access all of its features because of
its high cost and the lack of training from IT experts. For the data warehouse, the extra
reporting work needed, limited data flexibility and cost are the major disadvantages.
In conclusion, these tools are important for a student's career because the analytics field
offers a lot of scope in future. As a student, I am motivated to use these tools because they
are easy to use, and the difficulties that remain can be overcome with some training from
experts. The university also provides some tools with training material for students, which
helps me learn the latest tools and technologies.
Bibliography
Aven, J., 2017. Sams Teach Yourself Hadoop in 24 Hours. [Online]
Available at: https://learning.oreilly.com/library/view/sams-teach-yourself/9780134456737/ch15.html
[Accessed 3 May 2021].
Beaulieu, A., 2020. Learning SQL. 3rd ed. s.l.: O'Reilly Media, Inc.
Echeverria, J., 2015. Hadoop Security. [Online]
Available at: https://data-flair.training/blogs/hadoop-security/#:~:text=Hadoop%20HDFS%20will%20never%20store,media%20such%20as%20a%20disk.
[Accessed 3 May 2021].
Han, J., Pei, J. and Kamber, M., 2011. Data Mining: Concepts and Techniques. 3rd ed. s.l.: Morgan Kaufmann.
Harrington, J., 2016. Relational Database Design and Implementation. 4th ed. s.l.: Morgan Kaufmann.
Jagadesh, S., 2013. Big Data Imperatives: Enterprise Big Data Warehouse, BI Implementations and Analytics.
s.l.: Apress.
Marr, B., 2020. What Is Hadoop?. [Online]
Available at: https://www.bernardmarr.com/default.asp?contentID=1080
[Accessed 3 May 2021].
Nield, 2019. SQL for Data Analysis Design. s.l.: O'Reilly Media, Inc.
Rowe, W., 2016. Advantages of using Hadoop. [Online]
Available at: https://www.bmc.com/blogs/hadoop-benefits-business-case/
[Accessed 3 May 2021].
Seidman, J., 2015. Hadoop Application Architecture. s.l.: O'Reilly Media, Inc.
Silvers, 2011. Datawarehouse Design. s.l.: Auerbach Publications.
Singh, C. and Kumar, M., 2019. Mastering Hadoop 3. s.l.: Packt Publishing.
Smith, 2011. Hadoop and NoSQL Technologies and Oracle Database, s.l.: Oracle Corporation.
Tobin, B. D., 2020. ETL & Data Warehousing Explained: ETL Tool Basics. [Online]
Available at: https://www.xplenty.com/blog/etl-data-warehousing-explained-etl-tool-basics/#:~:text=ETL%20(or%20Extract%2C%20Transform%2C%20Load)%20is%20a%20process,that%20data%20into%20your%20warehouse.
[Accessed 3 May 2021].