Data exam 3
LECTURE NOTES
ON
DATA SCIENCE

CONTENTS
UNIT 1: INTRODUCTION TO DATA SCIENCE
UNIT 2: DATA MANAGEMENT PLAN USING IBM SPSS
UNIT 3: DATA ANALYSIS USING R PROGRAMMING LANGUAGE
UNIT 4: DATA VISUALISATION
UNIT-I
INTRODUCTION TO DATA SCIENCE
Q1. What is Data Science? Explain different terminologies used in data science.
Data Science is the area of study which involves extracting insights from vast amounts of data
by the use of various scientific methods, algorithms, and processes. It helps you to discover
hidden patterns from the raw data.
Algorithms
An algorithm is a set of instructions designed to perform a specific task. It is the recipe we
give a computer so that it can take values and manipulate them into a usable form.
Big Data
Big data is a term that refers to data sets, or combinations of data sets, whose size (volume),
complexity (variability), rate of growth (velocity), consistency (veracity) and value make
them difficult to capture, manage, process or analyze with conventional technologies and
tools.
Machine Learning
A process where a computer uses an algorithm to gain understanding about a set of data, then
makes predictions based on its understanding. There are many types of machine learning
techniques; most are classified as either supervised or unsupervised techniques.
Classification
Classification is a supervised machine learning problem. It deals with categorizing a data point
based on its similarity to other data points. You take a set of data where every item already has
a category and look at common traits between each item. You then use those common traits as
a guide for what category the new item might have.
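For illustration, here is a minimal sketch of this idea using scikit-learn's KNeighborsClassifier (the tiny data set and its feature values are invented for the example):
from sklearn.neighbors import KNeighborsClassifier

# Items whose category is already known: [height_cm, weight_kg]
X_train = [[150, 50], [160, 60], [180, 85], [190, 95]]
y_train = ["small", "small", "large", "large"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# The new item is assigned the category of the most similar known items.
print(model.predict([[175, 80]]))  # ['large']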
Database
As simply as possible, this is a storage space for data. We mostly use databases with a Database
Management System (DBMS), such as MySQL or PostgreSQL, which we query using SQL. These
are computer applications that allow us to interact with a database to collect and analyze the
information inside.
Data Warehouse
A data warehouse is a system used to do quick analysis of business trends using data from
many sources. They’re designed to make it easy for people to answer important statistical
questions without a Ph.D. in database architecture.
Data Wrangling
The process of converting data, often with the help of scripting languages, into a form that is
easier to work with is known as data wrangling or data munging.
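A minimal wrangling sketch in pandas (the messy records and column names are invented for illustration):
import pandas as pd

raw = pd.DataFrame({
    "name": [" alice ", "BOB", None],
    "age": ["34", "29", "41"],
})

clean = raw.dropna().copy()                            # drop incomplete rows
clean["name"] = clean["name"].str.strip().str.title()  # tidy the text
clean["age"] = clean["age"].astype(int)                # fix the types
print(clean)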
Web Analytics
Statistical or machine learning methods applied to web data such as page views, hits, clicks,
and conversions (sales), generally with a view to learning what web presentations are most
effective in achieving the organizational goal (usually sales). This goal might be to sell products
and services on a site, to serve and sell advertising space, to purchase advertising on other sites
or to collect contact information. Key challenges in web analytics are the volume and constant
flow of data, the navigational complexity, and the sometimes lengthy gaps that precede users'
relevant web decisions.
DATA SCIENCE TOOLS
1. R
R is a programming language used for data manipulation and graphics. Originating in 1995,
it is a popular tool among data scientists and analysts. It is the open-source version of the
S language, widely used for research in statistics. According to data scientists, R is one of
the easier languages to learn, as there are numerous packages and guides available for users.
2. Python
Python is another widely used language among data scientists, created by Dutch programmer
Guido van Rossum. It is a general-purpose programming language, focusing on readability and
simplicity. If you are not a programmer but are looking to learn, this is a great language to start
with. It’s easier than other general-purpose languages.
3. Keras
Keras is a deep learning library written in Python. It runs on top of TensorFlow, allowing for
fast experimentation. Keras was developed to make building deep learning models easier and
to help users handle their data intelligently and efficiently.
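A minimal sketch of a Keras model (the layer sizes and random training data are illustrative, not from these notes):
import numpy as np
from tensorflow import keras

# A small feed-forward network for binary classification.
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(32, 4)          # 32 samples, 4 features
y = np.random.randint(0, 2, 32)    # binary labels
model.fit(X, y, epochs=2, verbose=0)
print(model.predict(X[:1], verbose=0))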
4. BigML
BigML is another widely used Data Science tool. It provides a fully interactive, cloud-based
GUI environment for running Machine Learning algorithms. BigML provides standardized,
cloud-based software for industry requirements, through which companies can use Machine
Learning algorithms across various parts of the business.
5. D3
Javascript is mainly used as a client-side scripting language. D3.js, a Javascript library, allows
you to make interactive visualizations in your web browser. With the several APIs of D3.js, you
can use numerous functions to create dynamic visualization and analysis of data in your browser.
6. MATLAB
MATLAB is a multi-paradigm numerical computing environment for processing mathematical
information. It is a closed-source software that facilitates matrix functions, algorithmic
implementation and statistical modeling of data. MATLAB is widely used across several
scientific disciplines. In Data Science, MATLAB is used for simulating neural networks and
fuzzy logic. Using the MATLAB graphics library, you can create powerful visualizations.
MATLAB is also used in image and signal processing. This makes it a very versatile tool for
Data Scientists as they can tackle all the problems, from data cleaning and analysis to more
advanced Deep Learning algorithms. Furthermore, MATLAB's easy integration with enterprise
applications and embedded systems makes it an ideal Data Science tool. It also helps in
automating various tasks ranging from extraction of data to re-use of scripts for decision
making. However, it suffers from the limitation of being a closed-source proprietary software.
7. Jupyter
Project Jupyter is an open-source tool based on IPython that helps developers make
open-source software and experience interactive computing. Jupyter supports multiple
languages like Julia, Python, and R. It is a web-application tool used for writing live code,
visualizations, and presentations. Jupyter is a widely popular tool that is designed to address
the requirements of Data Science.
8. Matplotlib
Matplotlib is a plotting and visualization library developed for Python. It is the most popular
tool for generating graphs with the analyzed data. It is mainly used for plotting complex graphs
using simple lines of code. Using this, one can generate bar plots, histograms, scatterplots etc.
9. NLTK
Natural Language Processing has emerged as the most popular field in Data Science. It deals
with the development of statistical models that help computers understand human language.
These statistical models are part of Machine Learning and through several of its algorithms,
are able to assist computers in understanding natural language. NLTK is widely used for
various language processing techniques like tokenization, stemming, tagging, parsing and
machine learning.
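A minimal NLTK sketch of tokenization and stemming (it assumes the required NLTK data packages, whose names vary by NLTK version, can be downloaded):
import nltk
for pkg in ("punkt", "punkt_tab"):  # tokenizer models (name varies by version)
    nltk.download(pkg, quiet=True)
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

tokens = word_tokenize("Computers are learning human language.")
print(tokens)
print([PorterStemmer().stem(t) for t in tokens])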
10. Scikit-learn
Scikit-learn is a Python library used for implementing Machine Learning algorithms. It is a
simple, easy-to-use tool that is widely used for analysis and data science. It supports a variety
of features in Machine Learning such as data preprocessing, classification, regression,
clustering, dimensionality reduction, etc.
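As a sketch of two of the listed features, preprocessing and clustering, on invented toy points:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

points = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 9.1]])
scaled = StandardScaler().fit_transform(points)  # preprocessing step
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
print(kmeans.fit_predict(scaled))  # two clusters, e.g. [1 1 0 0]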
11. TensorFlow
TensorFlow has become a standard tool for Machine Learning. It is widely used for advanced
machine learning algorithms like Deep Learning. Developers named TensorFlow after Tensors
which are multidimensional arrays. It is an open-source and ever-evolving toolkit which is
known for its performance and high computational abilities. TensorFlow can run on both CPUs
and GPUs and has recently emerged on more powerful TPU platforms.
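A minimal sketch of the tensors the name refers to (values are illustrative):
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # a 2x2 tensor
b = tf.constant([[10.0], [20.0]])          # a 2x1 tensor
print(tf.matmul(a, b))                     # matrix product: [[50.], [110.]]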
APPLICATIONS OF DATA SCIENCE
Banking
Earlier, banks relied heavily on initial paperwork while sanctioning loans. They decided to
bring in data scientists in order to rescue them from losses. Over the years, banking companies
learned to divide and conquer data via customer profiling, past expenditures, and other
essential variables to analyze the probabilities of risk and default. Moreover, it also helped
them to push their banking products based on customers' purchasing power.
Healthcare
The healthcare sector, especially, receives great benefits from data science applications.
Internet Search
Now, this is probably the first thing that strikes your mind when you think of Data Science
applications. When we speak of search, we think 'Google', right? But there are many other
search engines like Yahoo, Bing, Ask, AOL, and so on. All these search engines (including
Google) make use of data science algorithms to deliver the best result for our search query
in a fraction of a second. To put this in perspective, Google processes more than 20 petabytes
of data every day.
Targeted Advertising
If you thought Search would have been the biggest of all data science applications, here is a
challenger – the entire digital marketing spectrum. Starting from the display banners on various
websites to the digital billboards at the airports – almost all of them are decided by using data
science algorithms. This is the reason why digital ads have been able to achieve a much higher
CTR (Click-Through Rate) than traditional advertisements. They can be targeted based on a
user's past behavior. This is the reason why you might see ads for Data Science training
programs while someone else sees an ad for apparel in the same place at the same time.
Website Recommendations
Aren't we all used to the suggestions about similar products on Amazon? They not only help
you find relevant products from the billions available but also add a lot to the user experience.
A lot of companies have fervidly used this engine to promote their products in accordance
with users' interests and the relevance of information. Internet giants like Amazon, Twitter,
Google Play, Netflix, LinkedIn, IMDb and many more use this system to improve the user
experience. The recommendations are made based on previous search results for a user.
Image Recognition
You upload an image with friends on Facebook and you start getting suggestions to tag your
friends. This automatic tag suggestion feature uses a face recognition algorithm.
In their latest update, Facebook has outlined the additional progress they’ve made in this area,
making specific note of their advances in image recognition accuracy and capacity.
In addition, Google provides you with the option to search for images by uploading them. It
uses image recognition and provides related search results.
Speech Recognition
Some of the best examples of speech recognition products are Google Voice, Siri, Cortana etc.
Using the speech-recognition feature, even if you aren't in a position to type a message, your
life wouldn't stop. Simply speak out the message and it will be converted to text. However, at
times, you will notice that speech recognition doesn't perform accurately.
Gaming
Games are now designed using machine learning algorithms that improve and upgrade
themselves as the player moves up to a higher level. In motion gaming, your opponent
(the computer) analyzes your previous moves and shapes its game accordingly. EA Sports,
Zynga, Sony, Nintendo and Activision-Blizzard have taken the gaming experience to the next
level using data science.
TYPES OF DATA
Big Data includes huge volume, high velocity, and an extensible variety of data. The data in
it will be of four types.
Unstructured Data
Any data with an unknown form or structure is classified as unstructured data. In addition to
its huge size, unstructured data poses multiple challenges in terms of processing it to derive
value. A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, audio and videos.
Semi-Structured data
Semi-structured data can contain both forms of data. It appears structured in form but is
actually not defined with, for example, a table definition as in a relational DBMS. An example
of semi-structured data is data represented in an XML file. Web pages written in HTML are
another example of semi-structured data.
Meta Data
Metadata is defined as the data providing information about one or more aspects of the data. It
is used to summarize basic information about data which can make tracking and working with
specific data easier.
• Structural metadata indicates how compound objects are put together e.g. how pages are
ordered to form chapters.
• Administrative metadata provides information to help manage a resource, such as when and
how it was created, file type and other technical information, and who can access it.
Structured Data
Any data that can be stored, accessed and processed in a fixed format is termed 'structured'
data. In other words, it is all data that can be stored in a SQL database in the form of tables
with rows and columns.
An employee table is an example of structured data:
Employee_Id | Employee_name | Gender | Dept | Salary
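A minimal sketch of such a table using Python's built-in sqlite3 module (the row values are invented for illustration):
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE employee (
    Employee_Id INTEGER, Employee_name TEXT,
    Gender TEXT, Dept TEXT, Salary REAL)""")
con.execute("INSERT INTO employee VALUES (1, 'Asha', 'F', 'Sales', 50000)")
for row in con.execute("SELECT * FROM employee"):
    print(row)   # (1, 'Asha', 'F', 'Sales', 50000.0)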
UNIT-II
A data management plan describes how research data are collected or created, how data are
used and stored during research, and how they are made accessible to others after the research
has been completed.
1. Describe briefly what kind of data will be collected and how they will be collected.
2. Outline the type(s) of data (e.g. survey, interview, observation, face-to-face focus group,
self-administered writings/diaries, photographs, news articles etc.) and estimate the foreseeable
amount/volume of each data type.
API (APPLICATION PROGRAMMING INTERFACE)
An API is a set of subroutine definitions, protocols, and tools for building application software.
In general terms, it is a set of clearly defined methods of communication between various
software components. A good API makes it easier to develop a computer program by providing
all the building blocks, which are then put together by the programmer. An API may be for a
web-based system, operating system, database system, computer hardware or software library.
An API specification can take many forms, but often includes specifications for routines, data
structures, object classes, variables or remote calls. POSIX, the Windows API and ASPI are
examples.
1. Modern APIs adhere to standards that are easily accessible and understood broadly.
2. They are treated more like products than code. They are designed for consumption by
specific audiences (e.g., mobile developers), they are documented, and they are versioned in a
way that lets users have certain expectations of their maintenance and lifecycle.
3. Because they are much more standardized, they have a much stronger discipline for
security and governance.
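A minimal sketch of consuming a web-based API from Python with the requests library (the URL is hypothetical; any JSON-returning REST endpoint behaves the same way):
import requests

response = requests.get("https://api.example.com/v1/users/42")  # hypothetical endpoint
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())       # the documented, versioned payload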
STORAGE MANAGEMENT
The term storage management encompasses the technologies and processes organizations use
to maximize or improve the performance of their data storage resources. It is a broad category
that includes virtualization, replication, mirroring, security, compression, traffic analysis,
automation, storage provisioning and related techniques. By some estimates, the amount of
digital information stored in the world's computer systems is doubling every year. As a result,
organizations feel constant pressure to expand their storage capacity. However, doubling a
company's storage capacity every year is an expensive proposition. In order to reduce some of
those costs and improve the capabilities and security of their solutions, organizations turn to
a variety of storage management solutions.
UNIT-III
DATA ANALYSIS vs DATA ANALYTICS
Data analysis:
1) Data analysis is a process involving the collection, manipulation, and examination of data
to gain deep insight.
2) Data analysis helps design a strong business plan, using historical data that tell what
worked, what did not, and what was expected from a product or service.
3) In data analysis, experts explore past data, break down the macro elements into micros
with the help of statistical analysis, and draft conclusions with deeper and more significant
insights.
Data analytics:
1) Data analytics takes the analyzed data and works on it in a meaningful and useful way to
make well-informed business decisions.
2) Data analytics helps businesses utilize the potential of past data and in turn identify new
opportunities that help them plan future strategies. It aids business growth by reducing risks
and costs and enabling the right decisions.
3) Data analytics utilizes different variables and creates predictive models to meet the
challenges of a competitive marketplace.
Categorical Data
Categorical data represents characteristics. Therefore it can represent things like a person’s
gender, language etc. Categorical data can also take on numerical values (Example: 1 for female
and 0(zero) for male). Note that those numbers don’t have mathematical meaning.
Nominal Data
Nominal values represent discrete units and are used to label variables that have no quantitative
value. Just think of them as "labels". Note that nominal data has no order; if you changed the
order of its values, the meaning would not change. A feature that describes whether a person
is married, for example, would be called "dichotomous", which is a nominal variable with only
two categories.
Ordinal Data
Ordinal values represent discrete and ordered units. Ordinal data is therefore nearly the same
as nominal data, except that its ordering matters. Education level (Elementary, High School,
College) is a typical example. Note that the difference between Elementary and High School
is not the same as the difference between High School and College. This is the main limitation
of ordinal data: the differences between the values are not really known. Because of that,
ordinal scales are usually used to measure non-numeric features like happiness or customer
satisfaction.
Numerical Data
These data have meaning as a measurement, such as a person ‘s height, weight, IQ, or blood
pressure; or they’re a count, such as the number of stock shares a person owns, how many teeth
a dog has, or how many pages you can read of your favorite book before you fall asleep.
Numerical data can be further broken into two types: discrete and continuous.
Discrete data represent items that can be counted; they take on possible values that can be
listed out. The list of possible values may be fixed (also called finite), or it may go from 0, 1, 2,
on to infinity (making it countably infinite). For example, the number of heads in 100 coin flips
takes on values from 0 through 100 (finite case), but the number of flips needed to get 100 heads
takes on values from 100 (the fastest scenario) on up to infinity (if you never reach the 100th
head).
Continuous data represent measurements; their possible values cannot be counted and can only
be described using intervals on the real number line. For example, the exact amount of gas
purchased at the pump for cars with 20-gallon tanks would be continuous data from 0 gallons
to 20 gallons.
Interval Data
Interval values represent ordered units that have the same difference. Therefore we speak of
interval data when we have a variable that contains numeric values that are ordered and where
we know the exact differences between the values. An example would be a feature that contains
the temperature of a given place. The problem with interval data is that it doesn't have a
"true zero". In regard to our example, that means there is no such thing as no temperature.
With interval data we can add and subtract, but we cannot multiply, divide or calculate ratios,
because there is no true zero.
Ratio Data
Ratio values are also ordered units that have the same difference. Ratio values are the same as
interval values, with the difference that they do have an absolute zero. Good examples are
height, weight and length.
TYPES OF DATA ANALYTICS
a) Descriptive Analytics
It consists of asking the question: What happened? It summarizes past, historical data to
describe what has occurred.
b) Diagnostic Analytics
It consists of asking the question: Why did it happen? Diagnostic analytics looks for the root
cause of a problem. It is used to determine why something happened. This type attempts to find
and understand the causes of events and behaviors.
c) Predictive Analytics
It consists of asking the question: What is likely to happen?
It uses past data in order to predict the future. It is all about forecasting. Predictive analytics
uses many techniques like data mining and artificial intelligence to analyze current data and
make scenarios of what might happen.
d) Prescriptive Analytics
It consists of asking the question: What should be done? It is dedicated to finding the right
action to be taken. Descriptive analytics provides historical data, and predictive analytics
helps forecast what might happen. Prescriptive analytics uses these parameters to find the best
solution.
UNIT-IV
Charts
The easiest way to show the development of one or several data sets is a chart. Charts vary
from bar and line charts that show relationships between elements over time, to pie charts
that demonstrate the components or proportions of one whole.
Plots
Plots allow you to distribute two or more data sets over a 2D or even 3D space to show the
relationship between these sets and the parameters on the plot. Plots also vary: scatter and
bubble plots are the most traditional. When it comes to big data, though, analysts use box
plots that enable them to visualize the relationship between large volumes of different data.
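A minimal matplotlib sketch of a box plot, using random toy data in place of a large data set:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
samples = [rng.normal(loc, 1.0, 500) for loc in (0, 2, 5)]  # three data sets

plt.boxplot(samples)
plt.xticks([1, 2, 3], ["A", "B", "C"])
plt.title("Box plots of three data sets")
plt.show()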
Maps
Maps are widely used in different industries. They allow you to position elements on relevant
objects and areas: geographical maps, building plans, website layouts, etc. Among the most
popular map visualizations are heat maps, dot distribution maps and cartograms.
Diagrams and matrices
Diagrams are usually used to demonstrate complex data relationships and links, and can
include various types of data in one visualization. They can be hierarchical, multidimensional,
and tree-like. A matrix is a big data visualization technique that reflects the correlations
between multiple constantly updating (streaming) data sets.
Analog data to Digital signals − This process can be termed digitization, which is done by
Pulse Code Modulation (PCM). Hence, it is nothing but digital modulation. As we have already
discussed, sampling and quantization are the important factors in this. Delta Modulation gives
a better output than PCM.
Digital data to Analog signals − The modulation techniques such as Amplitude Shift
Keying (ASK), Frequency Shift Keying (FSK), Phase Shift Keying (PSK), etc., fall under this
category. These will be discussed in subsequent chapters.
Digital data to Digital signals − These are covered in this section. There are several ways to
map digital data to digital signals.
In one-hot encoding, the number of new columns produced is equal to the number of categories,
while dummy encoding produces one less. This should ultimately be handled by the modeller
accordingly in the validation process.
Target or Impact or Likelihood encoding
Target encoding is similar to label encoding, except here the labels are correlated directly with
the target. For example, in mean target encoding, the encoding for each category of the feature
is the mean value of the target variable on the training data. This encoding method brings out
the relation between similar categories, but the relations are bounded within the categories and
the target itself. The advantages of mean target encoding are that it does not affect the volume
of the data and helps in faster learning; the disadvantage is that it is harder to validate.
Regularization is required when implementing this encoding methodology. Target encoder
implementations are available in both Python and R; a sketch follows below.
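A minimal sketch of mean target encoding with pandas (toy data; a real implementation would add the regularization mentioned above to avoid target leakage):
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "C"],
    "bought": [1, 0, 1, 1, 0],   # binary target
})

# Replace each category with the mean of the target for that category.
means = df.groupby("city")["bought"].mean()
df["city_encoded"] = df["city"].map(means)
print(df)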
UNIT-V
What is Python?
Python is a popular programming language. It was created by Guido van Rossum, and
released in 1991.
It is used for web development (server-side), software development, mathematics and system
scripting.
Characteristics of Python
Following are important characteristics of Python Programming −
It supports functional and structured programming methods as well as OOP.
It can be used as a scripting language or can be compiled to byte-code for building
large applications.
It provides very high-level dynamic data types and supports dynamic type checking.
It supports automatic garbage collection.
It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
Applications of Python
As mentioned before, Python is one of the most widely used languages over the web. A few
of its strengths are listed here:
Easy-to-learn − Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
Easy-to-read − Python code is more clearly defined and visible to the eyes.
Easy-to-maintain − Python's source code is fairly easy-to-maintain.
A broad standard library − The bulk of Python's library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable − You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
Databases − Python provides interfaces to all major commercial databases.
GUI Programming − Python supports GUI applications that can be created and ported
to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs than shell
scripting.
Example: type conversion
a = 10
b = 2.5
f = float(a)   # int to float
print(f)       # 10.0
i = int(b)     # float to int (truncates)
print(i)       # 2
List
Lists are used to store multiple items in a single variable.
Lists are one of 4 built-in data types in Python used to store collections of data, the other 3
are Tuple, Set, and Dictionary, all with different qualities and usage.
Lists are created using square brackets:
L=["apple","banana","cherry"]
print(L)
Example
sub=['phy','chem',96,96.5]
print(sub)
print(sub[0:2])
print(sub[0:3])
String
A="welcome to python tutorial"
print(A)
print(len(A))
print(A[8:10])
print(A[::-1])
print(A.lower())
print(A.upper())
SET
A={1,2,3}
print(A)
B={3,4,5,6}
print(B)
print(A|B)#union
print(A&B)#intersection
output
{1, 2, 3}
{3, 4, 5, 6}
{1, 2, 3, 4, 5, 6}
{3}
A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set,
or a string).
This is less like the for keyword in other programming languages, and works more like an
iterator method as found in other object-orientated programming languages.
With the for loop we can execute a set of statements, once for each item in a list, tuple, set
etc.
Example: loop through a list
fruits = ["apple", "banana"]
for x in fruits:
    print(x)
output
apple
banana
for x in range(6):
    print(x)

i = 1
while i < 6:
    print(i)
    i += 1

i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1
What is a Module?
A module is a file consisting of python code.
A module can define functions, classes and variables.
Example
Save this code in a file named mymod.py
def greeting(name):
    print("Hello, " + name)
Example
Import the module named mymod, and call the greeting function:
import mymod
mymod.greeting("Jonathan")
User Defined Module
Example
Save this code in a file named mymod.py:
def fun(x, y):
    z = x + y
    print(z)

def fun1(sarkar):
    print(sarkar)

Then import and use it:
import mymod
mymod.fun(10, 20)    # 30
mymod.fun1("hello")
Built in module
import calendar
cal=calendar.month(2019,1)
print(cal)
output
January 2019
Mo Tu We Th Fr Sa Su
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
import calendar
cal=calendar.month(2019,1)
x=dir(calendar)
print(cal)
print(x)
PYTHON LIBRARY
NumPy (short for Numerical Python) is an open source Python library for doing scientific
computing with Python.
It gives an ability to create multidimensional array objects and perform faster mathematical
operations. The library contains a long list of useful mathematical functions, including some
functions for linear algebra and complex mathematical operations such as the Fourier
transform (FT) and random number generation (RNG).
import numpy as np
a = np.array([0, 7, 0, 27, 13, 0, 0])   # a one-dimensional array
print(a)
output- [ 0  7  0 27 13  0  0]
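A slightly fuller sketch of the operations mentioned above (values are illustrative):
import numpy as np

m = np.arange(6).reshape(2, 3)             # a 2x3 multidimensional array
print(m + 10)                              # elementwise arithmetic
print(m.T @ m)                             # linear algebra: matrix product
print(np.fft.fft([1, 0, 0, 0]))            # Fourier transform
print(np.random.default_rng(0).random(3))  # random number generation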
Pandas is a Python library comprising high-level data structures and tools designed to help
Python programmers implement robust data analysis. The utmost purpose of Pandas is to help
us identify intelligence in data. Pandas is used in a wide range of academic and commercial
domains, including finance, neuroscience, economics, statistics, advertising, and web
analytics.
import pandas as pd
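A minimal pandas sketch (the records are invented for illustration):
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "salary": [52000, 48000, 61000],
})
print(df.head())            # first rows of the table
print(df["salary"].mean())  # column statistics: 53666.666...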
Matplotlib is a versatile Python library that generates plots for data visualization. It offers a
simple and powerful plotting interface, versatile plot types and robust customization. With
the diverse plot types and elegant styling options available, it works well for creating
professional figures for demonstrations and scientific reports.
import matplotlib.pyplot as plt
#Plot a line graph
plt.plot([5, 15])
plt.title("Interactive Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
UNIT-VI
1. Google Chart
Google is an obvious benchmark, well known for the user-friendliness of its products, and
Google Chart is no exception. It is one of the easiest tools for visualising huge data sets.
Google Chart offers a wide chart gallery, from a simple line graph to a complex hierarchical
tree-like structure, and you can use any chart that fits your requirement.
2. Tableau
Tableau Desktop is an amazing data visualisation tool for manipulating big data, and it's
available to everyone. It has two other variants “Tableau Server” and cloud-based “Tableau
Online” which are dedicatedly designed for big data-related organisations.
3. D3
D3 or Data Driven Document is a Javascript library for big data visualisations in virtually any
way you want. Unlike the others, this is not a point-and-click tool, and the user needs a good
grasp of Javascript to give the collected data a shape. The manipulated data are rendered
through HTML, SVG and CSS, so there is no place for old browsers (IE 7 or 8) as they don't
support SVG (Scalable Vector Graphics). D3 is extremely fast and supports large data sets in
real time. It also produces dynamic interaction and animation in both 2D and 3D with minimal
overhead. The functional style of D3 allows you to reuse code through its various collections
of components and plug-ins.
4. Fusion Chart
FusionCharts XT is a Javascript charting library for the web and mobile devices, used across
120 countries, with clients such as Google, Intel, Microsoft and many others. However, you
need a bit of knowledge of Javascript to implement it. Technically, it collects data in XML or
JSON format and renders it through charts using Javascript (HTML5), SVG and VML. It
provides more than 90 chart styles in both 2D and 3D visual formats, with an array of features
like scrolling, panning, and animation effects. However, this tool doesn't come for free: pricing
starts from $199 (for individual developers or freelancers) for one year of updates with one
month of priority support.
5. Highcharts
Highcharts is a charting library written purely in Javascript, hence a bit of knowledge of
Javascript is necessary for implementing this tool. It uses HTML5, SVG and VML to display
charts across various browsers and devices like Android and iPhone. For any execution it
requires two .js files. This tool is efficient enough to process real-time JSON data and render
it as the chart type specified by the user. If you are an enthusiastic programmer, you can
download its source code and modify it as per your need.
6. Canvas
Canvas.js is a Javascript charting library with a simple API design that comes with a bunch of
eye-catching themes. It is a lot faster than conventional SVG or Flash charts. It also has a
responsive design so that it can run on various devices like Android, iPhone, tablets,
Windows, Mac etc.
7. Qlikview
Qlik is one of the major players in the data analytics space with their Qlikview tool which is
also one of the biggest competitors of Tableau. Qlikview boasts over 40,000 customers
spanning over 100 countries. Qlik is particularly known for its highly customisable setup
and a host of features that help create visualisations much faster. However, the breadth of
available options means there can be a learning curve to get accustomed to the tool and use
it to its full potential. Apart from its data visualisation prowess, Qlikview also offers
analytics, business intelligence and enterprise reporting features. The clean and clutter-free
user experience is one of the notable aspects of Qlikview.
8. Datawrapper
Datawrapper is a data visualisation tool that’s gaining popularity fast, especially among media
companies which use it for presenting statistics and creating charts. It has an easy-to-navigate
user interface where you can upload a CSV file to create maps, charts and visualisations
that can be quickly added to reports. Although the tool is primarily aimed at journalists, its
flexibility should accommodate a host of applications apart from media usage.
9. Microsoft Power BI
Microsoft Power BI is a suite of business analytics tools from Microsoft primarily meant for
analysing data and sharing the insights. It enables you to explore and dig insights out of your
data via any device you use – desktops, tablets or smartphones. It helps you derive quick
answers from the data and can also connect to on-premises data sources for real-time mapping
and analysis.
c) PREATTENTIVE ATTRIBUTES
These attributes are what immediately catch our eye when we look at a visualization. They can
be perceived in less than 10 milliseconds, even before we make a conscious effort to notice
them.
e) POTENTIAL SOLUTIONS
Following are the proposed solutions to some challenges or problems of big data
visualization:
1. Meeting the need for speed: One possible solution is hardware: increased memory and
powerful parallel processing can be used. Another method is putting data in-memory and
using a grid computing approach, where many machines are used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of
data governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or create
a separate chart for the outliers.