Unit 1 Notes

BIG DATA

What is Data?

The quantities, characters, or symbols on which operations are performed by a computer, which
may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical,
or mechanical recording media.

What is Big Data?

Big Data is data, but of enormous size. Big Data is a term used to describe a collection
of data that is huge in volume and yet grows exponentially with time. In short, such data
is so large and complex that none of the traditional data management tools can store or
process it efficiently.

“Extremely large data sets that may be analyzed computationally to reveal patterns,
trends and associations, especially relating to human behavior and interactions, are
known as Big Data.”

Examples Of Big Data:


Following are some examples of Big Data:

Stock Exchange
The New York Stock Exchange generates about one terabyte of new trade data per day.

Social Media
Statistics show that 500+ terabytes of new data are ingested into the databases of the
social media site Facebook every day. This data is mainly generated through photo and
video uploads, message exchanges, comments, etc.

Jet Engines
A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time.
With many thousands of flights per day, data generation reaches many petabytes.

Tabular Representation of Various Memory Sizes:

Name       Equal To           Size (in bytes)
Bit        1 bit              1/8
Nibble     4 bits             1/2 (rare)
Byte       8 bits             1
Kilobyte   1,024 bytes        1,024
Megabyte   1,024 kilobytes    1,048,576
Gigabyte   1,024 megabytes    1,073,741,824
Terabyte   1,024 gigabytes    1,099,511,627,776
Petabyte   1,024 terabytes    1,125,899,906,842,624
Exabyte    1,024 petabytes    1,152,921,504,606,846,976
Zettabyte  1,024 exabytes     1,180,591,620,717,411,303,424
Yottabyte  1,024 zettabytes   1,208,925,819,614,629,174,706,176
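
Each unit in the table is 1,024 times the previous one, so every whole-byte size is a
power of 1,024. A minimal Python sketch, for illustration only, that reproduces the
whole-byte rows of the table:

# Each unit is 1,024 times the previous one, i.e. a power of 1024 (2**10).
units = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]

for power, name in enumerate(units):
    size_in_bytes = 1024 ** power  # Byte = 1024**0, Kilobyte = 1024**1, ...
    print(f"{name:<9} = {size_in_bytes:,} bytes")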

Types of Digital Data:

1. Structured
2. Unstructured
3. Semi-structured

Structured

● Any data that can be stored, accessed and processed in a fixed format is termed
'structured' data.
● Over time, computer science has achieved great success in developing techniques for
working with this kind of data (where the format is well known in advance) and
deriving value from it.
● However, nowadays, we foresee issues when the size of such data grows to a huge
extent, typical sizes being in the range of multiple zettabytes.
● Data stored in a relational database management system is one example of
'structured' data.

Note: 10²¹ bytes (one billion terabytes) equal one zettabyte.


Examples Of Structured Data

An 'Employee' table in a database is an example of structured data:

Employee_ID  Employee_Name    Gender  Department  Salary (INR)
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
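
Structured data like this maps directly onto SQL. Below is a minimal, illustrative
sketch using Python's built-in sqlite3 module (any relational DBMS would do), loading
three of the rows above into a fixed schema:

import sqlite3

# A fixed, known-in-advance schema is exactly what makes data "structured".
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Employee_ID   INTEGER PRIMARY KEY,
    Employee_Name TEXT,
    Gender        TEXT,
    Department    TEXT,
    Salary        INTEGER)""")

conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
])

# Because the format is fixed, querying and aggregating are straightforward.
for dept, avg_salary in conn.execute(
        "SELECT Department, AVG(Salary) FROM Employee GROUP BY Department"):
    print(dept, avg_salary)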

Unstructured

● Any data whose form or structure is unknown is classified as unstructured data.
● In addition to its huge size, unstructured data poses multiple challenges in terms
of processing it to derive value from it.
● A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc.
● Nowadays organizations have a wealth of data available to them but, unfortunately,
they don't know how to derive value from it, since this data is in its raw,
unstructured form.

Examples Of Un-structured Data

The output returned by ‘Google Search’
Semi-structured

● Semi-structured data can contain both forms of data.
● Semi-structured data appears structured in form, but it is not actually defined by,
e.g., a table definition in a relational DBMS.
● An example of semi-structured data is data represented in an XML (eXtensible Markup
Language) file.

Examples Of Semi-structured Data

Personal data stored in an XML file-


<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
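
Because semi-structured data carries self-describing tags but no fixed schema, generic
tools can parse it. A minimal sketch using Python's built-in xml.etree.ElementTree to
read records like those above (a <root> wrapper is added here only because the fragment
needs a single root element to be well-formed XML):

import xml.etree.ElementTree as ET

# The fragment above has no single root element, so one is added for parsing.
xml_data = """<root>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</root>"""

root = ET.fromstring(xml_data)
for rec in root.findall("rec"):
    # Tags describe each field, but nothing enforces which fields appear.
    print(rec.findtext("name"), rec.findtext("sex"), rec.findtext("age"))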

Characteristics Of Big Data


There are 4 characteristics, called the 4 V's:
1. Volume
2. Velocity
3. Variety
4. Veracity

Volume:
Volume means “how much data is generated”. Nowadays, organizations, human beings and
systems generate or receive vast amounts of data, from terabytes (TB) to petabytes (PB)
to exabytes (EB) and more.
The size of data plays a very crucial role in determining its value. Whether particular
data can actually be considered Big Data also depends on its volume. Hence, ‘Volume’ is
one characteristic which needs to be considered while dealing with Big Data solutions.
Volume = very large amount of data

Velocity:
Velocity means “how fast data is produced”. Nowadays, organizations, human beings and
systems generate huge amounts of data at a very fast rate.
Big Data velocity deals with the speed at which data flows in from sources like business
processes, application logs, networks, social media sites, sensors, mobile devices, etc.
The flow of data is massive and continuous.
Velocity = data produced at a very fast rate

Variety:
Variety means “different forms of data”. Nowadays, organizations, human beings and
systems generate huge amounts of data at a very fast rate in many different formats.
Variety refers to heterogeneous sources and the nature of data, both structured and
unstructured. In earlier days, spreadsheets and databases were the only data sources
considered by most applications. Nowadays, data in the form of emails, photos, videos,
monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This
variety of unstructured data poses certain issues for storing, mining and analyzing
data.
Variety = data produced in different formats

Veracity:
Veracity means “the quality, correctness or accuracy of captured data”. Of the 4 V's, it
is the most important for any Big Data solution, because without correct information or
data there is no point in storing large amounts of it at a fast rate and in different
formats. The data should deliver correct business value.
Veracity = the correctness of data
History of big data
The first trace of big data dates back to 1663, when John Graunt dealt with overwhelming
amounts of information while studying the bubonic plague, which was haunting Europe at
the time. Graunt was the first person ever to use statistical data analysis.
Later, in the early 1800s, the field of statistics expanded to include collecting and
analyzing data.
The world first saw the problem of overwhelming amounts of data in 1880: the US Census
Bureau announced that it estimated it would take eight years to handle and process the
data collected during that year's census program.
In 1881, a man from the Bureau named Herman Hollerith invented the Hollerith Tabulating
Machine, which greatly reduced the calculation work.
Throughout the 20th century, data evolved at an unexpected speed, and big data became
the core of that evolution. Machines for storing information magnetically, machines for
scanning patterns in messages, and computers were all created during that time.
In 1965, the US government built the first data centre, with the intention of storing
millions of fingerprint sets and tax returns.

Big Data Platform

A big data platform is a type of IT solution that combines the features and capabilities
of several big data applications and utilities within a single solution.
It is an enterprise-class IT platform that enables organizations to develop, deploy,
operate and manage a big data infrastructure/environment.
Big data platforms generally consist of big data storage, servers, databases, big data
management, business intelligence and other big data management utilities. A platform
also supports custom development, querying and integration with other systems. The
primary benefit of a big data platform is that it reduces the complexity of multiple
vendors and solutions into one cohesive solution.
Big data platforms are also delivered through the cloud, where the provider offers an
all-inclusive big data solution and services.

Features of Big Data Platform


Here are the most important features of any good Big Data analytics platform:

● A Big Data platform should be able to accommodate new platforms and tools based on
business requirements, because business needs can change due to new technologies or
changes in business processes.
● It should support linear scale-out.
● It should be capable of rapid deployment.
● It should support a variety of data formats.
● It should provide data analysis and reporting tools.
● It should provide real-time data analysis software.
● It should have tools for searching through large data sets.

Drivers for Big Data

1. The digitization of society


2. The drop in technology costs
3. Connectivity through cloud computing
4. Increased knowledge about data science
5. Social media applications
6. The rise of the Internet of Things (IoT)

Big data architecture

There are four main layers to a Big Data architecture:

1. Data Ingestion

This layer is responsible for collecting and storing data from various sources. In Big
Data, data ingestion is the process of extracting data from various sources and loading
it into a data repository. Data ingestion is a key component of a Big Data architecture
because it determines how data will be ingested, transformed, and stored.

2. Data Processing
Data processing is the second layer, responsible for collecting, cleaning, and preparing
the data for analysis. This layer is critical for ensuring that the data is of high
quality and ready to be used in the future.

3. Data Storage
Data storage is the third layer, responsible for storing the data in a format that can be easily
accessed and analyzed. This layer is essential for ensuring that the data is accessible and
available to the other layers.
4. Data Visualization
Data visualization is the fourth layer and is responsible for creating visualizations of the data that
humans can easily understand. This layer is important for making the data accessible.

Most big data architectures include some or all of the following components:
● Data sources: All big data solutions start with one or more data sources. Examples include:
○ Application data stores, such as relational databases.
○ Static files produced by applications, such as web server log files.
○ Real-time data sources, such as IoT devices.

● Data storage: Data for batch processing operations is typically stored in a distributed file
store that can hold high volumes of large files in various formats. This kind of store is often
called a data lake. Options for implementing this storage include Azure Data Lake Store or
blob containers in Azure Storage.

● Batch processing: Because the data sets are so large, often a big data solution must process
data files using long-running batch jobs to filter, aggregate, and otherwise prepare the data
for analysis. Usually these jobs involve reading source files, processing them, and writing
the output to new files. Options include running U-SQL jobs in Azure Data Lake Analytics,
using Hive, Pig, or custom Map/Reduce jobs in an HDInsight Hadoop cluster, or using Java,
Scala, or Python programs in an HDInsight Spark cluster.

● Real-time message ingestion: If the solution includes real-time sources, the architecture
must include a way to capture and store real-time messages for stream processing. This
might be a simple data store, where incoming messages are dropped into a folder for
processing. However, many solutions need a message ingestion store to act as a buffer for
messages, and to support scale-out processing, reliable delivery, and other message queuing
semantics. Options include Azure Event Hubs, Azure IoT Hubs, and Kafka.

● Stream processing: After capturing real-time messages, the solution must process them by
filtering, aggregating, and otherwise preparing the data for analysis. The processed stream
data is then written to an output sink.
● Analytical data store:
● Many big data solutions prepare data for analysis and then serve the processed data in
a structured format that can be queried using analytical tools.
● The data could be presented through a low-latency NoSQL technology such as HBase,
or an interactive Hive database that provides a metadata abstraction over data files in
the distributed data store.

● Analysis and reporting:


● The goal of most big data solutions is to provide insights into the data through analysis
and reporting.
● To empower users to analyze the data, the architecture may include a data modeling
layer, such as a multidimensional OLAP cube or tabular data model in Azure Analysis
Services.
● Analysis and reporting can also take the form of interactive data exploration by data
scientists or data analysts.

● Orchestration:
● Most big data solutions consist of repeated data processing operations, encapsulated in
workflows, that transform source data, move data between multiple sources and sinks,
load the processed data into an analytical data store, or push the results straight to a
report or dashboard.
● To automate these workflows, we can use an orchestration technology such as Azure
Data Factory or Apache Oozie and Sqoop.
Big Data Technology Components:

Following are the components of Big Data technology:

1. Machine Learning (ML):


● It is the science of making computers learn things by themselves.
● In machine learning, a computer is expected to use algorithms and statistical
models to perform specific tasks without any explicit instructions.
● Machine learning applications provide results based on past experience (a minimal
sketch appears after this list).
● For example, these days there are mobile applications that will give you a summary
of your finances, remind you of your bill payments, and suggest savings plans.
These functions are performed by reading your emails and text messages.
2. Natural Language Processing (NLP):
● It is the ability of a computer to understand human language as it is spoken.
● For example, when writing an email, any mistakes are automatically corrected,
and these days auto-suggestions are offered for completing the email.
3. Business Intelligence (BI):
● Business Intelligence (BI) is a technology-driven method or process for
analyzing data and delivering actionable information that helps executives,
managers and workers make informed business decisions.
● The ultimate goal of BI initiatives is to drive better business decisions that
enable organizations to increase revenue, improve operational efficiency and
gain a competitive advantage.
4. Cloud Computing:
● Cloud Computing is the delivery of computing services, including servers, storage,
databases, and networking, over the internet.
● For example, Dropbox allows users to access files and store up to one terabyte of
data.
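
As referenced in the Machine Learning item above, here is a minimal, illustrative sketch
of "learning from past experience": a model is fitted on labeled past examples and then
predicts on unseen inputs. The tiny transaction data set is invented for this example,
and scikit-learn is assumed to be available; this is a sketch, not any particular
product's implementation.

# Toy example: learn from "past experience" (labeled amounts) without
# explicit rules, then predict for unseen amounts.
from sklearn.linear_model import LogisticRegression

amounts = [[100], [250], [400], [5000], [7500], [9000]]  # past transactions
labels = [0, 0, 0, 1, 1, 1]                              # 0 = small, 1 = large

model = LogisticRegression()
model.fit(amounts, labels)             # training = absorbing past experience

print(model.predict([[300], [6000]]))  # expected output: [0 1]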

Importance of Big Data


Big data is important for the following reasons:

1. Cost Saving:
Big Data tools like Apache Hadoop, Spark, etc. bring cost-saving benefits to businesses
that have to store large amounts of data. These tools also help organizations identify
more effective ways of doing business.

2. Time Saving:
Tools like Hadoop let organizations analyze data immediately, helping them make quick
decisions based on the learnings.

3. Understanding Market Conditions:

Big Data analysis helps businesses get a better understanding of market conditions.
For example, analyzing customer purchasing behavior helps a company identify the
products that sell best and produce them accordingly. This helps it get ahead of its
competitors.

4. Social media Listening:


Big data tools can do sentiment analysis. Therefore, we can get feedback about who is
saying what about our company.

5. Boost Customer Acquisition and Retention:

Customers are a vital asset on which any business depends. No business can succeed
without building a robust customer base, but even with a solid customer base, companies
cannot ignore market competition.
Big data analytics helps businesses identify customer-related trends and patterns.
Customer behavior analysis leads to a profitable business.

6. Solve Advertisers' Problems and Offer Marketing Insights:

Big data analytics shapes all business operations. It enables companies to fulfill
customer expectations, helps in changing a company's product line, and ensures powerful
marketing campaigns.

Real-Time Benefits of Big Data

Big Data analytics has expanded its roots into all fields. As a result, Big Data is used
in a wide range of industries, including Finance and Banking, Healthcare, Education,
Government, Retail, Manufacturing, and many more.

Many companies, like Amazon, Netflix, Spotify, LinkedIn, Swiggy, etc., use big data
analytics. The banking sector makes the most use of Big Data analytics. The education
sector is also using data analytics to enhance students' performance as well as to make
teaching easier for instructors.

Big Data analytics helps retailers, from traditional stores to e-commerce, understand
customer behavior and recommend products matching customer interests. This helps them
develop new and improved products, which benefits the firm enormously.

Big Data Applications

The term Big Data refers to a large amount of complex, unprocessed data.

Travel and Tourism

Travel and tourism are major users of Big Data. It enables providers to forecast the
travel facilities required at multiple locations, improve business through dynamic
pricing, and much more.

Financial and Banking Sector

The financial and banking sectors use big data technology extensively. Big data
analytics helps banks understand customer behaviour on the basis of investment patterns,
shopping trends, motivation to invest, and inputs obtained from personal or financial
backgrounds.

Healthcare

Big data has started making a massive difference in the healthcare sector. With the help
of predictive analytics, medical professionals and healthcare personnel can now provide
personalized care to individual patients.

Telecommunication and Media

Telecommunications and the multimedia sector are major users of Big Data. Zettabytes of
data are generated every day, and handling data at that scale requires big data
technologies.

Government and Military

The government and military also use this technology heavily, as seen in the figures the
government keeps on record. In the military, a fighter plane is required to process
petabytes of data.

Government agencies use Big Data to run many programs: managing utilities, dealing with
traffic jams, and tackling crimes such as hacking and online fraud.

E-commerce

E-commerce is another application of Big Data. Maintaining customer relationships is
essential for the e-commerce industry, and e-commerce websites use Big Data in their
marketing ideas to retain customers, manage transactions, and implement better,
innovative strategies to improve the business.
Big Data Security

Big Data security is the collective term for all the measures and tools used to guard
both the data and the analytics processes against attacks, theft, or other malicious
activities that could compromise or negatively affect them. As with other forms of
attack, big data can be compromised by attacks originating either online or offline.

As in other areas, security issues and attacks happen every single minute, and these
attacks can target different components of a big data system, such as the stored data or
the data source.

Why is it important?

Today almost every organization is thinking of adopting Big Data because they see its
potential and want to utilize its power. They use Hadoop to process these large data
sets, and securing that data is the step they are most concerned about; regardless of
organization size, everyone is trying to secure their data.

Because a big data system stores different kinds of data from various sources, security
is essential: almost every enterprise using big data holds some form of sensitive data
that needs to be protected, such as users' credit card details, banking details, and
passwords. To secure it, one can apply various strategies, like keeping out unauthorized
users and intrusions with firewalls, making user authentication reliable, training end
users, and many others.

What is the architecture of Big Data Security?

The basic architecture to secure any platform contains the following stages (a small
illustrative sketch follows the list):

1. Data Classification: In this phase, a training data set is provided to a
classification algorithm to categorize data into two categories, such as normal and
sensitive, by considering different types of possible attacks and the history of usage
data.
2. Sensitive Data Encryption: In this step, sensitive data is encrypted with a
homomorphic cryptosystem.
3. Data Storage: This stage focuses on storing normal and encrypted sensitive data on
separate system nodes.
4. Data Access through a Path-Hiding Approach: During this phase, any end user seeking
specific data can use the path-hiding technique to obtain the data while preserving data
privacy. The path-hiding technique prevents third parties from guessing data access
patterns, thereby securing the overall system.
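
A minimal sketch of stages 1-3, under loudly stated assumptions: a simple field-name
rule stands in for the trained classifier of stage 1, and Fernet symmetric encryption
(from the third-party cryptography package) stands in for the homomorphic cryptosystem,
which is beyond a short example. The records and the "node" dictionaries are invented
for illustration.

from cryptography.fernet import Fernet

key = Fernet.generate_key()
cipher = Fernet(key)

records = [("name", "Prashant Rao"), ("card_number", "4111111111111111")]
SENSITIVE_FIELDS = {"card_number", "password", "bank_account"}  # stand-in classifier

normal_node, sensitive_node = {}, {}  # stand-ins for separate storage nodes

for field, value in records:
    if field in SENSITIVE_FIELDS:
        # Stage 2: encrypt sensitive data (Fernet here, not homomorphic).
        sensitive_node[field] = cipher.encrypt(value.encode())
    else:
        # Stage 3: normal data is stored unencrypted on a separate node.
        normal_node[field] = value

print(normal_node)
print(cipher.decrypt(sensitive_node["card_number"]).decode())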

Five Principles for Big Data Ethics

Data ethics encompasses the moral obligations of gathering, protecting, and using personally
identifiable information and how it affects individuals.

1. Private customer data and identity should remain private: Privacy does not mean
secrecy (personal data might need to be audited based on legal requirements), but it
does mean that private data obtained from a person with their consent should not be
exposed for use by other businesses or individuals with any traces to their identity.

2. Shared private information should be treated confidentially: Third-party companies
share sensitive data (medical, financial or locational) and need restrictions on whether
and how that information can be shared further.

3. Customers should have a transparent view of how their data is being used or sold, and
the ability to manage the flow of their private information across massive, third-party
analytical systems.

4. Big Data should not interfere with human will: Big data analytics can moderate and
even determine who we are before we make up our minds. Companies need to consider which
kinds of predictions and inferences should be allowed and which should not.

5. Big Data should not institutionalize unfair biases like racism or sexism: Machine
learning algorithms can absorb unconscious biases in a population and amplify them via
training samples.
BIG DATA ANALYTICS

● Big Data analytics is a process used to extract meaningful insights, such as hidden
patterns, unknown correlations, market trends, and customer preferences. Data analytics
technologies and techniques give organizations a way to analyze data sets and gather new
information

● Private companies and research institutions capture terabytes of data about their
users' interactions, business, social media, and sensors from devices such as mobile
phones and automobiles.

● Big data analytics involves collecting data from different sources and managing it so
that it can be consumed by analysts, and finally delivering data products useful to
the organization's business.

● The process of converting large amounts of unstructured raw data, retrieved from
different sources, into data products useful for organizations forms the core of Big
Data analytics.

How does big data analytics work?


Here is an overview of the four steps of the big data analytics process:

1. Data professionals collect data from a variety of different sources. Often, it is a
mix of semi-structured and unstructured data. While each organization will use
different data streams, some common sources include:

● internet clickstream data;

● web server logs;

● cloud applications;

● mobile applications;

● social media content;

● text from customer emails and survey responses;


● mobile phone records; and

● machine data captured by sensors connected to the internet of things (IoT).

2. Data is prepared and processed. After data is collected and stored in a data warehouse or
data lake, data professionals must organize, configure and partition the data properly for
analytical queries. Thorough data preparation and processing makes for higher
performance from analytical queries.

3. Data is cleansed to improve its quality. Data professionals scrub the data using scripting
tools or data quality software. They look for any errors or inconsistencies, such as
duplications or formatting mistakes, and organize and tidy up the data.

4. The collected, processed and cleaned data is analyzed with analytics software (a
small end-to-end sketch follows this list). This includes tools for:

● data mining, which sifts through data sets in search of patterns and relationships

● predictive analytics, which builds models to forecast customer behavior and other
future actions, scenarios and trends

● machine learning, which taps various algorithms to analyze large data sets

● deep learning, which is a more advanced offshoot of machine learning

● text mining and statistical analysis software

● artificial intelligence (AI)

● mainstream business intelligence software

● data visualization tools
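
The four steps above can be illustrated end to end with a small pandas sketch. The toy
table, column names, and cleaning rules are invented for illustration, and pandas is
assumed to be installed; real pipelines would read from the sources listed in step 1.

import pandas as pd

# Step 1: collect — in practice, from clickstreams, logs, sensors, etc.
# Here a small invented customer-orders table stands in for the raw source.
raw = pd.DataFrame({
    "customer": ["a", "b", "a", "b", None, "c"],
    "amount": [10.0, 20.0, 10.0, None, 5.0, 30.0],
})

# Step 2: prepare/process — organize the data for analytical queries.
raw["amount"] = raw["amount"].fillna(0.0)

# Step 3: cleanse — remove errors and inconsistencies such as duplicates.
clean = raw.dropna(subset=["customer"]).drop_duplicates()

# Step 4: analyze — e.g., total spend per customer.
print(clean.groupby("customer")["amount"].sum())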

Big data analytics benefits

The benefits of using big data analytics include:


● Quickly analyzing large amounts of data from different sources, in many different
formats and types.

● Rapidly making better-informed decisions for effective strategizing, which can
benefit and improve the supply chain, operations and other areas of strategic
decision-making.

● Cost savings, which can result from new business process efficiencies and
optimizations.

● A better understanding of customer needs, behavior and sentiment, which can lead to
better marketing insights, as well as provide information for product development.

● Improved, better informed risk management strategies that draw from large sample
sizes of data.

Conventional Systems
● Big data is a huge amount of data that is beyond the processing capacity of
conventional database systems to manage and analyze within a specific time interval.

Challenges of conventional systems:

● Big data is the storage and analysis of large data sets.

● These are complex data sets that can be either structured or unstructured.

● They are so large that it is not possible to work on them with traditional analytical tools.

● One of the major challenges for conventional systems is the uncertainty of the data
management landscape.

● Big data is continuously expanding, and new companies and technologies are being
developed every day.
● A big challenge for companies is to find out which technology works best for them
without introducing new risks and problems.


Intelligent Data Analysis (IDA)

Intelligent Data Analysis (IDA) is one of the most important approaches in the field of data
mining.

Based on the basic principles of IDA and the features of the datasets that IDA handles,
the development of IDA can be briefly summarized from three aspects:

● Algorithm principle

● The scale

● Type of the dataset

Intelligent Data Analysis (IDA) is one of the major topics in artificial intelligence
and information science.

Intelligent data analysis discloses hidden facts that were not previously known and
provides potentially important information or facts from large quantities of data.

It also helps in making decisions.

Based on machine learning, artificial intelligence, pattern recognition, and records and
visualization technology, IDA helps obtain useful information, necessary data and
interesting models from the large amounts of data available online in order to make the
right choices.

IDA includes three stages:

(1) Preparation of data: Data Preparation involves selecting the required data from the relevant
data sources and integrating this into a data set to be used for data mining.
(2) Rule finding: Rule finding is working out the rules contained in the data set by
means of certain methods or algorithms (see the sketch after this list).

(3) Data validation and Explanation: Result validation requires examining these rules, and
result explanation is giving intuitive, reasonable and understandable descriptions using logical
reasoning.
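
As an illustration of stage (2), rule finding can be as simple as counting which items
co-occur in transactions and keeping pairs that occur often. The tiny transaction list
and the support threshold below are invented for illustration; real systems use
algorithms such as Apriori on far larger data.

from itertools import combinations
from collections import Counter

# Invented transactions; each is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

# Stage 2 (rule finding): count how often each item pair occurs together.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

min_support = 3  # keep pairs seen in at least 3 of the 4 transactions
rules = [pair for pair, count in pair_counts.items() if count >= min_support]
print(rules)  # expected: [('bread', 'milk')]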

Analytic processes
Big Data Analytics is the process of collecting large chunks of structured/unstructured data,
segregating and analyzing it and discovering the patterns and other useful business insights from
it.

These days, organizations are realizing the value they get out of big data analytics and hence
they are deploying big data tools and processes to bring more efficiency in their work
environment.

Many big data tools and processes are being utilized by companies these days in the processes of
discovering insights and supporting decision making.

Big data processing is a set of techniques or programming models to access large-scale
data to extract useful information for supporting and providing decisions.

Steps of the Analytic Process:

Following are the steps of the analytic process:

1. Deployment:

● In this phase, we need to plan the deployment, monitoring and maintenance.

● We produce a final report and review the project.

● We deploy the results of the analysis; this is also known as reviewing the
project.

2. Business Understanding:
● Business objectives are defined in this phase.

● Whenever any requirement occurs, we need to assess the situation, determine data
mining goals and then produce the project plan as per the requirement.

3. Data Exploration:

● This step consists of data understanding.

● This is necessary to verify the quality of data collected.

● In this phase, we gather initial data, describe and explore the data and verify data
quality to ensure it contains the data we require.

● Data collected from the various sources is described in terms of its application
and the need for the project in this phase. This is also known as data exploration.
4. Data preparation:
● We need to format the data to get the appropriate data.
● Data is selected, cleaned, and integrated into the format finalized for the analysis
in this phase.
5. Data modeling:
● In this phase, we select the modeling techniques, generate test designs, build a
model and assess the model built.
● The data model is built to analyze relationships between various selected objects
in the data.
● Test cases are built for assessing the model, and the model is tested and
implemented on the data in this phase.

Reporting and Analytics


The terms reporting and analytics are often used interchangeably. This is not surprising since
both take in data as “input” — which is then processed and presented in the form of charts,
graphs, or dashboards.

Reports and analytics help businesses improve operational efficiency and productivity, but in
different ways. While reports explain what is happening, analytics helps identify why it is
happening. Reporting summarizes and organizes data in easily digestible ways while analytics
enables questioning and exploring that data further. It provides invaluable insights into trends
and helps create strategies to help improve operations, customer satisfaction, growth, and other
business metrics.

Reporting and analysis are both important for an organization to make informed decisions by
presenting data in a format that is easy to understand. In reporting, data is brought together from
different sources and presented in an easy-to-consume format. Typically, modern reporting apps
today offer next-generation dashboards with high-level data visualization capabilities. There are
several types of reports being generated by companies including financial reports, accounting
reports, operational reports, market reports, and more. This helps understand how each function
is performing at a glance. But for further insights, it requires analytics.

Analytics enables business users to cull out insights from data, spot trends, and help make better
decisions. Next-generation analytics takes advantage of emerging technologies like AI, NLP, and
machine learning to offer predictive insights based on historical and real-time data.

Steps Involved in Building a Report and Preparing Data for Analytics

To build a report, the steps involved broadly include:

● Identifying the business need
● Collecting and gathering relevant data
● Translating the technical data
● Understanding the data context
● Creating reporting dashboards
● Enabling real-time reporting
● Offering the ability to drill down into reports

For data analytics, the steps involved include:

● Creating a data hypothesis
● Gathering and transforming data
● Building analytical models to ingest data, process it and offer insights
● Using tools for data visualization, trend analysis, deep dives, etc.
● Using data and insights to make decisions

Five Key Differences Between Reporting and Analysis

One of the key differences between reporting and analytics is that, while reporting
involves organizing data into summaries, analysis involves inspecting, cleaning,
transforming, and modeling that data to gain insights for a specific purpose.

1. Purpose: Reporting involves extracting data from different sources within an
organization and monitoring it to understand the performance of the various functions.
By linking data from across functions, it creates a cross-channel view that facilitates
easy comparison and understanding of the data. Analysis means interpreting data at a
deeper level and providing recommendations for action.

2. The Specifics: Reporting involves activities such as building, consolidating,
organizing, configuring, formatting, and summarizing. It requires clean raw data, and
reports may be generated periodically: daily, weekly, monthly, quarterly, or yearly.
Analytics includes asking questions, examining, comparing, interpreting, and confirming.
Enriching the data with big data can also help predict future trends.
3. The Final Output: In the case of reporting, outputs such as canned reports, dashboards, and
alerts push information to users. Through analysis, analysts try to extract answers using business
queries and present them in the form of ad hoc responses, insights, recommended actions, or a
forecast. Understanding this key difference can help businesses leverage analytics better.

4. People: Reporting requires repetitive tasks that can be automated. It is often used by
functional business heads who monitor specific business metrics. Analytics requires
customization and therefore depends on data analysts and scientists. Also, it is used by business
leaders to make data-driven decisions.

5. Value Proposition: This is like comparing apples to oranges. Both reporting and analytics
serve a different purpose. By understanding the purpose and using them correctly, businesses can
derive immense value from both.
