
Da-I Unit

The document discusses three main challenges of big data: volume, processing, and management. It describes how the volume of data is exploding and growing exponentially each year. More than 80% of data is unstructured and too large to manage effectively. It also discusses challenges around processing vast amounts of unstructured data from various sources and managing data that has complex or unstructured formats. The document then provides an overview of the basic steps in web analytics: data collection, processing data into metrics and information, developing key performance indicators, and formulating online strategies. It also notes the importance of experiments and testing, like A/B testing, in optimizing websites.

Uploaded by

G Kalaiarasi
Copyright
© All Rights Reserved

Challenges of conventional systems in big data

Big data faces three challenges:

• Volume

• Processing

• Management

Volume
1. The volume of data, especially machine-generated data, is exploding.
2. The data grows faster every year as new sources of data emerge.
3. For example, in the year 2000, 800,000 petabytes (PB) of data were stored in the world, and this was expected to
reach 35 zettabytes (ZB) by 2020 (according to IBM).
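
As a quick check on the IBM figures above, the sketch below puts both values in the same unit and derives the growth they imply. It assumes decimal SI units (1 ZB = 1,000,000 PB) and smooth compound growth over the 20 years. Neither assumption is stated in the source, and the function names are illustrative only.

```python
PB_PER_ZB = 1_000_000  # assumption: decimal SI units, 1 ZB = 10^6 PB


def growth_factor(start_pb: float, end_zb: float) -> float:
    """Return how many times larger the end volume is than the start volume."""
    return (end_zb * PB_PER_ZB) / start_pb


def implied_annual_growth(factor: float, years: int) -> float:
    """Return the constant yearly growth rate that compounds to `factor`."""
    return factor ** (1 / years) - 1


# Figures quoted in the notes: 800,000 PB in 2000, 35 ZB expected by 2020.
factor = growth_factor(start_pb=800_000, end_zb=35)
rate = implied_annual_growth(factor, years=2020 - 2000)

print(f"Growth factor: {factor:.2f}x")
print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the stored volume grows by a factor of about 44, which corresponds to compound growth of a little over 20% per year.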

Processing

More than 80% of today’s information is unstructured, and it is typically too big to manage effectively.
Today, companies are looking to leverage a lot more data from a wider variety of sources, both inside and outside the organization.
These sources include documents, contracts, machine data, sensor data, social media, health records, emails, and so on. The list is
practically endless.

Management

A lot of this data is unstructured, or has a complex structure that’s hard to represent in rows and columns.

Web Data

Web analytics is the collection, analysis, and reporting of web data for purposes of understanding and optimizing web usage.
However, web analytics is not just a process for measuring web traffic; it can also be used as a tool for business and market
research, and to assess and improve the effectiveness of a website. Web analytics applications can also help companies measure
the results of traditional print or broadcast advertising campaigns.

An advertising campaign is a series of advertisement messages that share a single idea and theme and together
make up an integrated marketing communication (IMC). An IMC is a platform in which a group of people can combine their
ideas, beliefs, and concepts into one large media base. Advertising campaigns utilize diverse media channels over a
particular time frame and target identified audiences.

Basic steps of the web analytics process

[Figure: Basic Steps of Web Analytics Process. Collection of data → Processing data into information → Developing Key Performance Indicators → Formulating Online Strategy]

Most web analytics processes come down to four essential stages or steps, which are:
• Collection of data: This stage is the collection of the basic, elementary data. Usually, these data are counts of things. The
objective of this stage is to gather the data.
• Processing of data into information: This stage usually takes counts and turns them into ratios, although there may still be some
counts. The objective of this stage is to take the data and shape it into information, specifically metrics.
• Developing KPI: This stage focuses on using the ratios (and counts) and infusing them with business strategies; the results are referred to as
Key Performance Indicators (KPI). Many times, KPIs deal with conversion aspects, but not always. It depends on the
organization.
• Formulating online strategy: This stage is concerned with the online goals, objectives, and standards for the organization
or business. These strategies are usually related to making money, saving money, or increasing market share.
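
The first three stages above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the field names (`visits`, `single_page_visits`, `orders`), the sample numbers, and the 3% conversion target are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RawCounts:
    """Stage 1: elementary counts collected from the site (hypothetical fields)."""
    visits: int
    single_page_visits: int
    orders: int


def to_metrics(counts: RawCounts) -> dict[str, float]:
    """Stage 2: turn counts into ratios (metrics)."""
    if counts.visits == 0:
        return {"bounce_rate": 0.0, "conversion_rate": 0.0}
    return {
        "bounce_rate": counts.single_page_visits / counts.visits,
        "conversion_rate": counts.orders / counts.visits,
    }


def evaluate_kpi(metrics: dict[str, float], target_conversion: float) -> bool:
    """Stage 3: a KPI ties a metric to a business target."""
    return metrics["conversion_rate"] >= target_conversion


counts = RawCounts(visits=2000, single_page_visits=900, orders=50)
metrics = to_metrics(counts)
kpi_met = evaluate_kpi(metrics, target_conversion=0.03)  # assumed target
print(metrics, kpi_met)
```

The fourth stage, formulating online strategy, is where a target such as `target_conversion` would come from, which is one way the stages drive each other.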
Another essential tool that analysts use to optimize websites is experimentation:
• Experiments and testing: A/B testing is a controlled experiment with two variants in an online setting, such as web
development.
The goal of A/B testing is to identify changes to web pages that increase or maximize a statistically tested result of interest.
Each stage impacts or can impact (i.e., drives) the stage preceding or following it. So, sometimes the data that is available for
collection impacts the online strategy. Other times, the online strategy affects the data collected.
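
To make "statistically tested" concrete for the A/B testing described above, here is a minimal sketch that compares the conversion rates of two page variants with a two-proportion z-test. The choice of test is an assumption (the notes do not name one), and the visitor and conversion numbers are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Variant A: 100 conversions from 2000 visitors; variant B: 150 from 2000.
z, p_value = two_proportion_z_test(conv_a=100, n_a=2000, conv_b=150, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone. The threshold to act on (for example 0.05) is a business decision, not something the test provides.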

Process of data analysis


Analysis refers to breaking a whole into its separate components for individual
examination. Data analysis is a process for obtaining raw data and converting it into information useful for
decision-making by users. Data is collected and analyzed to answer questions, test hypotheses or disprove
theories.
Data requirements
The data necessary as inputs to the analysis is specified based upon the
requirements of those directing the analysis or of the customers (who will use the finished product of the analysis).
The general type of entity upon which the data will be collected is referred to as an experimental unit.
Data collection
Data is collected from a variety of sources. The requirements may be
communicated by analysts to custodians of the data, such as information technology personnel
within an organization. The data may also be collected from sensors in the environment, such as traffic cameras,
satellites, recording devices, etc. It may also be obtained through interviews, downloads from online sources, or
reading documentation.
Data processing

Data initially obtained must be processed or organised for analysis. For instance, this
may involve placing data into rows and columns in a table format (i.e., structured data) for further analysis, such
as within a spreadsheet or statistical software.[4]

Data cleaning
Once processed and organized, the data may be incomplete, contain duplicates, or
contain errors. The need for data cleaning will arise from problems in the way that data is entered and stored.
Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching,
identifying inaccuracy of data, overall quality of existing data,[5] deduplication, and column segmentation.
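
Two of the cleaning tasks mentioned above, handling incomplete records and deduplication, can be sketched as follows. The record layout (`name`, `email`) and the rule of matching records on a normalised email address are assumptions made for the example.

```python
def clean_records(records):
    """Drop incomplete rows and rows whose normalised email was already seen."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalise text so that trivially different entries match.
        name = (rec.get("name") or "").strip()
        email = (rec.get("email") or "").strip().lower()
        if not name or not email:
            continue  # incomplete record
        if email in seen:
            continue  # duplicate of an earlier record
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": "Asha ", "email": "ASHA@example.com"},
    {"name": "Asha", "email": "asha@example.com "},  # duplicate after normalising
    {"name": "", "email": "nobody@example.com"},     # missing name
    {"name": "Ravi", "email": None},                 # missing email
    {"name": "Ravi", "email": "ravi@example.com"},
]
cleaned = clean_records(raw)
print(cleaned)
```

Here incomplete rows are simply dropped. In practice they might instead be sent back to the data custodians for correction, which is the "preventing" side of data cleaning.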

Exploratory data analysis


Once the data is cleaned, it can be analyzed. Analysts may apply a variety of techniques
referred to as exploratory data analysis to begin understanding the messages contained in the data. The
process of exploration may result in additional data cleaning or additional requests for data, so these activities
may be iterative in nature. Descriptive statistics, such as the average or median, may be generated to help
understand the data. Data visualization may also be used to examine the data in graphical format, to obtain
additional insight regarding the messages within the data.
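
As a small illustration of descriptive statistics in exploratory analysis, the sketch below uses Python's standard `statistics` module on a made-up sample of page load times that contains one outlier.

```python
import statistics

# Made-up page load times in milliseconds; the last value is an outlier.
page_load_ms = [120, 135, 128, 140, 131, 900]

mean_ms = statistics.mean(page_load_ms)
median_ms = statistics.median(page_load_ms)
stdev_ms = statistics.stdev(page_load_ms)

print(f"mean = {mean_ms}, median = {median_ms}, stdev = {stdev_ms:.1f}")
```

The mean is pulled far above the typical value by the single outlier while the median is not. This kind of finding can send the analyst back to data cleaning, which is why the steps are iterative.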
Modeling and algorithms
Mathematical formulas or models called algorithms may be applied to the data to identify
relationships among the variables, such as correlation or causation. In general terms, a model may be developed to evaluate
a particular variable in the data based on other variable(s) in the data, with some residual error depending on model
accuracy (i.e., Data = Model + Error).
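
The relation Data = Model + Error can be shown with a straight-line fit. The sketch below uses ordinary least squares as one possible model (an assumption; the notes do not prescribe a model), with made-up data points.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up observations (the "Data")

slope, intercept = fit_line(xs, ys)
model = [intercept + slope * x for x in xs]      # the "Model" part
errors = [y - m for y, m in zip(ys, model)]      # the residual "Error" part

for y, m, e in zip(ys, model, errors):
    print(f"data = {y:5.2f}  model = {m:5.2f}  error = {e:+.2f}")
```

Each observation equals the model's prediction plus its residual. A more accurate model leaves smaller residuals.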
Data product
A data product is a computer application that takes data inputs and generates outputs, feeding them back
into the environment. It may be based on a model or algorithm. An example is an application that analyzes data
about customer purchasing history and recommends other purchases the customer might enjoy.
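
A toy version of the purchase-recommendation example might look like this. It is only a sketch: the scoring rule (count how often an item appears among customers who share at least one purchase) and the sample purchase histories are assumptions, not a description of any real product.

```python
from collections import Counter


def recommend(history, customer, top_n=2):
    """Suggest items bought by customers who share a purchase with `customer`."""
    owned = history[customer]
    scores = Counter()
    for other, items in history.items():
        if other == customer or not owned & items:
            continue  # skip the customer themselves and unrelated customers
        for item in items - owned:
            scores[item] += 1
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical purchase histories: customer -> set of items bought.
history = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "carol": {"mouse", "keyboard", "monitor"},
    "dave": {"phone"},
}

suggestions = recommend(history, "alice")
print(suggestions)
```

The output feeds back into the environment: if the customer buys a suggested item, that purchase becomes new input data for the next run.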
Communication

Once the data is analyzed, it may be reported in many formats to the users of the analysis to support their
requirements. The users may have feedback, which results in additional analysis.
When determining how to communicate the results, the analyst may consider data visualization
techniques to help clearly and efficiently communicate the message to the audience.

Leading Data Analytics Tools

1) Microsoft Excel
Probably not the first thing that comes to mind, but Excel is one of the most widely used analytics tools in the
world given its massive installed base. You won’t use it for advanced analytics to be sure, but Excel is a great
way to start learning the basics of analytics not to mention a useful tool for basic grunt work. It supports all the
important features like summarizing data, visualizing data, and basic data manipulation. It has a huge user
community with plenty of support, tutorials and free resources.
2) IBM Cognos Analytics
IBM’s Cognos Analytics is an upgrade to Cognos Business Intelligence (Cognos BI). Cognos Analytics has a
Web-based interface and offers data visualization features not found in the BI product. It provides self-service
analytics with enterprise security, data governance and management features. Data can be sourced from
multiple sources to create visualizations and reports.
3) The R language
R has been around more than 20 years as a free and open source project, making it quite popular, and R was
designed to do one thing: analytics. There are numerous add-on packages and Microsoft supports it as part of its
Big Data efforts. Extra packages include Big Data support, connecting to external databases, visualizing data,
mapping data geographically and performing advanced statistical functions. On the down side, R has been
criticized for being single threaded in an era where parallel processing is imperative.
4) Sage Live
Sage Live is a cloud-based accounting platform for small and mid-sized businesses, with features like the ability
to create and send invoices, accept payments, pay bills, record receipts and record sales, all from within a
mobile-capable platform. It supports multiple companies, currencies and banks and integrates with Salesforce
CRM for no additional charge.
5) Sisense
Sisense’s self-titled product is a BI solution that provides advanced analytical tools for analysis, visualization
and reporting. Sisense allows businesses to combine data from many sources into a single database
where it does the analysis. It can be deployed on-premises or hosted in the cloud as a SaaS application.
6) Chart.io
Chart.io is a drag-and-drop chart creation tool that works on a tablet or laptop to build connections to databases,
ranging from MySQL to Oracle, and then creates scripts for data analysis. Data can be
blended from multiple sources with a single click before executing analysis. It makes a variety of
charts, such as bar graphs, pie charts, scatter plots, and more.
7) SAP BusinessObjects
SAP’s BusinessObjects provides a set of centralized tools to perform a wide variety of BI and
analytics tasks, from ETL to data cleansing to predictive dashboards and reports. It’s modular, so
customers can start small with just the functions they need and grow the app with their business.
It supports everything from SMBs to large enterprises and can be configured for a number of
vertical industries. It also supports Microsoft Office and Salesforce SaaS.
8) Netlink Business Analytics
Netlink’s Business Analytics platform is a comprehensive on-demand solution, meaning no
Capex investment. It can be accessed via a Web browser from any device and scales from a
department to a full enterprise. Dashboards can be shared among teams via the collaboration
features. The features are geared toward sales, with advanced analytic capabilities around sales
and inventory forecasting, voice and text analytics, fraud detection, buying propensity, sentiment,
and customer churn analysis.
9) Domo
Domo is another cloud-based business management suite that is browser-accessible and scales
from a small business to a giant enterprise. It provides analysis on all business-level activity,
like top-selling products, forecasting, marketing return on investment and cash balances. It
offers interactive visualization tools and instant access to company-wide data via customized
dashboards.
10) InetSoft Style Intelligence
Style Intelligence is a business intelligence software platform that allows users to create
dashboards, visual analyses and reports via a data engine that integrates data from multiple
sources such as OLAP servers, ERP apps, relational databases and more. InetSoft’s
proprietary Data Block technology enables the data mashups to take place in real time. Data
and reports can be accessed via dashboards, enterprise reports, scorecards and exception
alerts.
11) Dataiku
Dataiku develops Dataiku Data Science Studio (DSS), a data analysis and collaboration platform
that helps data analysts work together with data scientists to build more meaningful data
applications. It helps prototype and build data-driven models and extract data from a variety of
sources, from databases to Big Data repositories.
12) Python
Python is already a popular language because it’s powerful and easy to learn. Over the years,
analytics features have been added, making it increasingly popular with developers looking to build
analytics apps but wanting more power than the R language. R is built for one thing, statistical
analysis, but Python can do analytics plus many other functions and types of apps, including
machine learning.
13) Apache Spark
Spark is a Big Data analytics engine designed to run in memory. Earlier Big Data systems like Hadoop
ran disk-based batch jobs, often scheduled for periods of low utilization (at night). Spark is
designed to keep working data in memory, allowing for much faster, near-real-time
analytics. Spark has easy integration with the Hadoop ecosystem and its own machine learning
library. It is also open source and free to use.
14) SAS Institute
SAS is a long-time BI vendor, so its move into analytics was only natural, and it continues to be widely used in
the industry. Two of its major apps are SAS Enterprise Miner and SAS Visual Analytics.
Enterprise Miner is good for core statistical analysis, data analytics and machine learning. It’s
mature and has been around a while, with a lot of macros and code for specific uses. Visual
Analytics is newer and designed to run in distributed memory on top of Hadoop.
15) Tableau
Tableau is a data visualization software package and one of the most popular on the market.
It’s a fast visualization tool that lets you explore data and make all kinds of analyses and
observations through drag-and-drop interfaces. Its intelligent algorithms figure out the type of data
and the best method available to process it. You can easily build dashboards with the GUI and
connect to a host of analytical apps, including R.
16) Splunk
Splunk Enterprise started out as a log-analysis tool, but has grown to become a broad-based
platform for searching, monitoring, and analyzing machine-generated Big Data. The software
can import data from a variety of sources, from logs to data collected by Big Data applications
such as Hadoop or sensors. It then generates reports a non-IT business person can easily read
and understand.

Analysis versus reporting


Living in the era of digital technology and big data has made organizations dependent on the
wealth of information data can bring. You might have seen reporting and analysis used
interchangeably, especially in the way outsourcing companies market their services. While
both areas are part of web analytics (note that analytics is not the same as analysis), there’s a vast
difference between them, and it’s more than just spelling.
It’s important that we differentiate the two because some organizations might be selling
themselves short in one area and not reaping the benefits that web analytics can bring to the
table. The first core component of web analytics, reporting, is merely organizing data into
summaries. On the other hand, analysis is the process of inspecting, cleaning, transforming,
and modeling these summaries (reports) with the goal of highlighting useful information.
Simply put, reporting translates data into information while analysis turns information into
insights. Also, reporting should enable users to ask “What?” questions about the information,
whereas analysis should answer “Why?” and “What can we do about it?”
Here are five differences between reporting and analysis:

1. Purpose
Reporting has helped companies monitor their data since before digital technology boomed. Various
organizations have been dependent on the information it brings to their business, as reporting
extracts that information and makes it easier to understand.
Analysis interprets data at a deeper level. While reporting can link data across channels,
provide comparisons, and make information easier to understand (think of dashboards,
charts, and graphs, which are reporting tools and not analysis reports), analysis interprets this
information and provides recommendations on actions.
2. Tasks
As reporting and analysis have a very fine line dividing them, it’s sometimes easy to mistake
a task labeled as analysis for analysis when all it does is reporting. Hence, ensure that
your analytics team has a healthy balance of both.
Here’s a useful way to tell whether what you’re doing is reporting or analysis:
Reporting includes building, configuring, consolidating, organizing, formatting, and
summarizing. This covers the activities mentioned above, like turning data into charts and graphs and
linking data across multiple channels.
Analysis consists of questioning, examining, interpreting, comparing, and confirming.
With big data, predicting is possible as well.
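
The split between the two kinds of task can be sketched in code. In this illustration `report` only organizes and summarizes, while `analyse` compares and points at something worth questioning. The data, the function names, and the "largest drop" rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data: (week, channel, visits)
rows = [
    (1, "email", 400), (1, "search", 900), (1, "social", 300),
    (2, "email", 380), (2, "search", 520), (2, "social", 310),
]


def report(rows):
    """Reporting: organize raw rows into a summary, week -> channel -> visits."""
    summary = defaultdict(dict)
    for week, channel, visits in rows:
        summary[week][channel] = summary[week].get(channel, 0) + visits
    return dict(summary)


def analyse(summary, before, after):
    """Analysis: compare two weeks and find the channel with the largest drop."""
    changes = {
        channel: summary[after].get(channel, 0) - visits
        for channel, visits in summary[before].items()
    }
    channel = min(changes, key=changes.get)
    return channel, changes[channel]


summary = report(rows)
channel, change = analyse(summary, before=1, after=2)
print(summary)
print(f"Largest drop: {channel} ({change} visits)")
```

The summary answers "What happened?". The comparison is only the start of analysis: finding out why that channel dropped and what to do about it still needs a person to interpret it.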

3. Outputs
Reporting and analysis reach their users in opposite ways, push and pull, through their outputs.
Reporting has a push approach, as it pushes information to users, and its outputs come in the
form of canned reports, dashboards, and alerts.
Analysis has a pull approach, where a data analyst draws out information to probe further and to
answer business questions. Its outputs can be in the form of ad hoc responses and
analysis presentations. Analysis presentations comprise insights, recommended actions,
and a forecast of their impact on the company, all in language that’s easy to understand at the
level of the user who’ll be reading and deciding on it.
This is important for organizations that want to realize the true value of data: a standard
report is not the same as meaningful analysis.

4. Delivery
Considering that reporting involves repetitive tasks, often with truckloads of data, automation
has been a lifesaver, especially now with big data. It’s not surprising that data entry services are among
the first things to be outsourced, since outsourcing companies are perceived as data reporting
experts.
Analysis requires a more custom approach, with human minds doing the reasoning and
analytical thinking needed to extract insights, and technical skills to provide efficient steps towards
accomplishing a specific goal. This is why data analysts and scientists are in demand these
days, as organizations depend on them to come up with recommendations that help leaders and
business executives make decisions about their businesses.

5. Value
This isn’t about identifying which one brings more value, but rather understanding that both are
indispensable when looking at the big picture. Together they should help businesses grow, expand, move
forward, and make more profit or increase their value.
The Path to Value diagram illustrates how data converts into value through reporting and analysis,
such that neither step is sufficient without the other:
Data → Reporting → Analysis → Decision-making → Action → VALUE
Data alone is useless, and action without data is baseless. Both reporting and analysis are
vital to bringing value to your data and operations.

Reporting and Analysis are Valuable


This is not to undermine the role of reporting in web analytics, but organizations need to understand
that reporting by itself is just numbers. Without drawing insights and
aligning reports with your organization’s big picture, you can’t make sound decisions
based on reports alone.
Data analysis is the most powerful tool to bring into your business. Employing the powers of
analysis can be comparable to finding gold in your reports, which allows your business to
increase profits and further develop.
Having accurate research is crucial in devising various marketing and advertising materials for
your target market, while taking into account their needs as well as the advantage of your
competitors.
Why Data Analysis?
Companies that are not leveraging modern data analytics tools and techniques are falling behind.
Because data analytics tools automatically glean and analyze data and deliver
information and predictions, you can use them to improve prediction accuracy and refine your models.
Goals of Performing Data Analysis

• Analyze data.
• Extract actionable and commercially relevant information to boost performance.
• Leverage the many free and open-source analytical tools available to
enhance your business and develop skills.
