What is Data Analytics

Data analytics encompasses processes, tools, and technologies for managing data to uncover patterns, organize information, and generate business insights. It involves steps such as data collection, pre-processing, analysis, and model deployment, with types including descriptive, diagnostic, predictive, prescriptive, real-time, and augmented analytics. Additionally, data can be structured or unstructured, each with its own characteristics, challenges, and tools for management.

Uploaded by

gajrakrupa
Copyright
© All Rights Reserved

What is Data Analytics?

Data analytics is a set of processes, tools, and technologies for managing
qualitative and quantitative data. It enables discovery (finding useful
patterns or trends), simplifies organization (arranging information in a simpler
way), supports governance (ensuring data is used and managed properly), and
generates insights for a business.

Data Analytics Steps:


1. Data Collection: Collect data from different IoT sources like audio, image,
and light sensors. Since the data comes in various forms, IoT analytics becomes
essential to make sense of it.
2. Data Pre-processing: Prepare the data by importing the necessary libraries,
filling in missing information, converting categories into numbers, and
normalizing values for better analysis.
3. Data Analysis: Explore the data to find patterns and key statistics. This step
helps form ideas and decide the best methods for creating models.
4. Train and Test the Data: Use machine learning or deep learning to create models
based on the data. These models are tested and fine-tuned to ensure they work
effectively.
5. Deployment and Improvement: Put the model into real-world use to solve
problems. Keep improving it based on feedback and new data.
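
The pre-processing step can be sketched in a few lines of plain Python. This is only an illustration: the sensor readings, the field names, and the choice of mean imputation and min-max normalization are assumptions made for the example, not something the notes prescribe.

```python
from statistics import mean

# Hypothetical sensor readings; None marks a missing value.
readings = [
    {"sensor": "light", "value": 120.0},
    {"sensor": "audio", "value": None},
    {"sensor": "light", "value": 80.0},
    {"sensor": "image", "value": 100.0},
]

def preprocess(rows):
    """Fill missing values, encode categories as integers, min-max normalize."""
    known = [r["value"] for r in rows if r["value"] is not None]
    fill = mean(known)  # impute missing values with the mean of the known ones
    # Give each category an integer code, assigned in sorted order.
    codes = {name: i for i, name in enumerate(sorted({r["sensor"] for r in rows}))}
    values = [r["value"] if r["value"] is not None else fill for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [
        {"sensor_code": codes[r["sensor"]], "value": (v - lo) / span}
        for r, v in zip(rows, values)
    ]

clean = preprocess(readings)
```

After this step every value lies between 0 and 1 and every category is a number, which is the form most modeling tools expect.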

Types of Data Analytics


1. Descriptive Analytics:
• Describes what has happened over time using past data.
• Helps identify trends or patterns.
• Example: Have more people visited our website recently? Are this
month’s sales higher than last month’s?
• Purpose: To summarize past events and show progress.
2. Diagnostic Analytics:
• Explains why something happened by analyzing the reasons behind the
data.
• Uses different kinds of data and asks questions to find connections.
• Example: Did beer sales increase because of hot weather? Was the rise in
sales due to a new marketing campaign?
• Purpose: To uncover causes and make sense of data.
3. Predictive Analytics:
• Predicts what is likely to happen next based on past data and trends.
• Uses historical patterns to forecast future outcomes.
• Example: How do sales change during hot summers? Are weather models
predicting another hot summer?
• Purpose: To plan for the future by understanding what might happen.
4. Prescriptive Analytics:
• Suggests what actions to take based on predictions.
• Combines past data, current trends, and possible outcomes to give advice.
• Example: If there’s a 58% chance of a hot summer, the brewery could add
an evening shift and rent another tank to meet demand.
• Purpose: To help make decisions and improve results.
5. Real-time Data Analytics:
• Works with live data as it comes in.
• Unlike other types that rely on old data, it analyzes new data right away.
• Example: Tracking customer orders immediately or identifying delays as
they happen.
• Purpose: To act quickly using up-to-date information.
6. Augmented Data Analytics:
• Uses Artificial Intelligence (AI), Machine Learning (ML), and Natural
Language Processing (NLP) to make data analysis faster and easier.
• Automates complex tasks like exploring data or generating insights,
making them accessible to non-technical users.
• Example: AI tools that summarize sales reports or suggest trends without
needing coding.
• Purpose: To simplify analysis and save time.
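
The brewery example under prescriptive analytics can be worked as a small expected-value calculation. Only the 58% probability comes from the notes; the profit figures and action names below are hypothetical and chosen just to show the arithmetic.

```python
P_HOT = 0.58  # probability of a hot summer, taken from the example above

# Hypothetical profit (in thousands) for each action under each outcome.
payoffs = {
    "expand": {"hot": 100, "mild": -20},        # add a shift and rent a tank
    "keep_current": {"hot": 40, "mild": 30},    # change nothing
}

def expected_profit(action):
    """Probability-weighted profit of one action."""
    p = payoffs[action]
    return P_HOT * p["hot"] + (1 - P_HOT) * p["mild"]

# Prescriptive step: recommend the action with the highest expected profit.
best_action = max(payoffs, key=expected_profit)
```

With these assumed figures, expanding has the higher expected profit, so it would be the recommended action. Different payoffs could easily reverse that result.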
Data Analytics Techniques
1. Regression Analysis:
o This technique looks at the relationship between one or more
independent variables (the inputs or predictors) and a dependent
variable (the outcome we want to understand).
o It helps show how changes in the independent variables affect the
dependent variable.
o Example: How does advertising spending (independent variable)
affect sales (dependent variable)?
2. Factor Analysis:
o This involves taking a complex dataset with many variables and
reducing them to a smaller set.
o The goal is to find hidden trends or patterns that are harder to see
in the full data.
o Example: Simplifying customer preferences from a long list of
attributes to discover key buying factors.
3. Cohort Analysis:
o This involves grouping data into categories that share similar
characteristics, like customer demographics.
o It helps analysts dive deeper into specific groups of data to
understand trends or behaviors better.
o Example: Looking at how different age groups (cohorts) behave
differently when purchasing a product.
4. Monte Carlo Simulations:
o These simulations estimate the probability of different outcomes by
running a model many times with randomly sampled input values.
o They are often used for risk management and preventing potential
losses.
o Example: Estimating the chance of various stock prices at the end
of the year based on different market conditions.
5. Time Series Analysis:
o This technique tracks data over time to understand how a data point
changes as time passes.
o It’s useful for spotting cyclical trends (like seasonal changes) or for
making financial forecasts.
o Example: Analyzing monthly sales over the past year to predict
sales for the next quarter.
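
Three of the techniques above are small enough to sketch with the Python standard library. The data, the function names, and the toy random-walk price model are assumptions made for illustration. Real analyses would normally use a statistics or data science library.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def moving_average_forecast(series, window):
    """Time series: forecast the next point as the mean of the last `window` points."""
    return mean(series[-window:])

def prob_price_above(start, target, days, daily_sigma, trials, seed=0):
    """Monte Carlo: estimate P(final price > target) under a toy random walk."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        price = start
        for _ in range(days):
            price *= 1 + rng.gauss(0, daily_sigma)  # random daily return
        hits += price > target
    return hits / trials

# Hypothetical data: advertising spend and the sales that followed.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]
slope, intercept = fit_line(ad_spend, sales)     # sales change per unit of spend
next_month = moving_average_forecast(sales, 3)   # mean of the last three values
p_up = prob_price_above(100.0, 100.0, 30, 0.01, 2000)
```

Here the fitted slope says how much sales change for each extra unit of advertising, the moving average gives a simple forecast, and `p_up` is the share of simulated paths that end above the target price.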

Importance of Data Analytics


o Reduce the cost of operation
o Predict future trends
o Monitor product performance
o Strengthen security

What Is Structured Data?


Structured data is well-organized and follows a specific format, usually stored
in databases.
It’s like data neatly arranged in rows and columns, which makes it easy to
process and analyze.
Because of its organized structure, it allows for quick access and use, helping
businesses run smoothly and efficiently.

Main Characteristics of Structured Data


1. Organization and Format
o Structured data is organized in a specific format, often in tables
with rows and columns.
o Each column represents a certain type of data (like name, age, or
date), and each row represents a record (like a person or a product).
o This organization makes it easy to search, categorize, and store the
data.
2. Relational Integrity
o Structured data maintains relationships between different data
points.
o For example, if a customer’s name is listed in one table and their
order details in another, these tables are linked by a common
identifier (like customer ID).
o This ensures that the data remains consistent and accurate,
preventing conflicting or missing information.
3. Ease of Querying and Analysis
o Because the data is well-organized, it’s easy to retrieve or search
for specific information.
o For example, you can quickly find a list of all customers who
purchased a product in the last month.
o This helps in making quick decisions and allows businesses to
analyze data easily to spot trends or make predictions.
4. Simplified Processing
o Since structured data follows a predefined format, it’s
straightforward to process.
o There’s little need to guess the structure, and the data usually needs
less cleaning because it’s already in a usable form.
o This makes data handling faster and more efficient, especially for
tasks like adding, updating, or deleting data.

Examples of Structured Data


1. E-commerce:
In online shopping, structured data is used to store information like:
o Product reviews
o Pricing details
o SKU numbers (unique product identifiers)
2. Healthcare:
Hospitals and clinics use structured data to organize:
o Patient information (name, age, medical conditions)
o Medical history
o Prescription and pharmacy records
o Hospital administration details (appointments, staff schedules)
3. Banking:
Banks store structured data about:
o Financial transactions (e.g., deposits, withdrawals)
o Account details (account number, balance)
o Beneficiary information (who is receiving money)
o Sender/Receiver information for transfers
4. Customer Relationship Management (CRM) Software:
CRM systems use structured data to track:
o Lead information (potential customers)
o Source of the lead (where the lead came from)
o Activity (actions taken by leads, such as email responses)
5. Travel Industry:
Travel companies and airlines organize structured data for:
o Passenger details (names, contact info)
o Flight schedules and booking info
o Travel transactions (tickets purchased, payment info)

Challenges of Structured Data:


1. Not Flexible: Structured data can be hard to adjust or expand as the
amount of data grows. It has a fixed format, so it’s not easy to change or
scale for larger databases.
2. Depends on a Fixed Structure: Structured data depends on a specific
format (schema). This means every data entry needs to follow a set
structure, which makes it harder to change or add new types of data when
needed.
3. Takes Time to Load and Store: Loading structured data can take longer
than expected. Problems with the source system, like outdated data, can
make it hard to update or retrieve the information quickly, and it can take
up more cloud storage space.
4. Hard to Adjust to Changing Business Needs: Structured data doesn’t
handle changes well. As business needs change, it can be difficult to
know in advance what kind of queries will give the best results.
5. Manual Data Entry: Structured data often requires users to enter and
retrieve information manually using SQL statements like CREATE, INSERT, and
SELECT. This process can be slow and prone to mistakes.
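
The fixed-schema challenge can be shown with `sqlite3`. In this sketch (the table and column names are hypothetical), a row with a new attribute is rejected until the schema itself is changed.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, price REAL NOT NULL)")
db.execute("INSERT INTO products (sku, price) VALUES ('A-1', 9.99)")

def insert_with_color():
    """Try to store a new kind of attribute (color) alongside the product."""
    db.execute("INSERT INTO products (sku, price, color) VALUES ('B-2', 4.50, 'red')")

try:
    insert_with_color()
    rejected = False
except sqlite3.OperationalError:
    rejected = True  # the table has no column named color

# The schema must be altered before the new attribute can be stored.
db.execute("ALTER TABLE products ADD COLUMN color TEXT")
insert_with_color()
rows = db.execute("SELECT sku, color FROM products ORDER BY sku").fetchall()
```

Existing rows get an empty (NULL) value for the new column. On a large production database a schema change like this usually needs planning, which is the rigidity the list above describes.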

Tools of Structured Data


MySQL
PostgreSQL
Oracle Database
Microsoft SQL Server
Apache Hive
IBM Db2

What is unstructured data?


Unstructured data is usually qualitative and cannot be processed with regular
data tools.
It’s often called "schema on read" or "schema independent" data because it
doesn’t have a set structure when it’s created.
Examples of unstructured data include things like text, videos, audio files,
mobile activity, social media posts, satellite images, and surveillance footage.
This type of data is hard to break down and analyze because it doesn’t follow a
specific data model or organization, so it can’t be stored in traditional relational
databases.
To manage unstructured data, NoSQL databases are often used, as they don’t
require a fixed schema.
Another method to handle unstructured data is by putting it into a data lake,
where it can stay in its raw, unstructured form until needed.
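
"Schema on read" can be illustrated with a few raw JSON records. The records and field names below are hypothetical. Nothing is enforced when the data is stored, and a structure is applied only when a reader asks a specific question.

```python
import json

# Raw records kept as-is, the way a data lake or document store might hold them.
raw_records = [
    '{"type": "review", "text": "Great keyboard", "stars": 5}',
    '{"type": "email", "subject": "Invoice", "body": "See attached"}',
    '{"type": "review", "text": "Stopped working"}',
]

def read_reviews(lines):
    """Apply a 'review' schema at read time and skip records that do not fit."""
    reviews = []
    for line in lines:
        doc = json.loads(line)
        if doc.get("type") != "review":
            continue  # other record kinds stay in storage, untouched
        # Missing fields are tolerated: stars becomes None when absent.
        reviews.append({"text": doc["text"], "stars": doc.get("stars")})
    return reviews

reviews = read_reviews(raw_records)
```

The same raw records could be read again later with a different schema, for example one that extracts only emails.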
Examples of unstructured data:
• Retail Website Data: By analyzing unstructured data from a retail
website, businesses can understand customer buying habits, timing of
purchases, product sentiment, and more.
• Predictive Analytics: Unstructured data, like sensor data from industrial
machinery, can help predict and alert manufacturers to unusual behavior
or potential failures before they happen.
• Rich Media: This includes data from social media posts, entertainment
content (like movies and music), surveillance footage, satellite imagery,
geospatial data, weather forecasts, and podcasts.
• Documents: Examples of unstructured data include invoices, records, web
browsing history, emails, and files from productivity apps like Word or
Excel.
• Internet of Things (IoT): Data generated by IoT devices such as sensor
data or stock ticker information can be unstructured.
• Analytics: Machine learning and artificial intelligence (AI) often deal
with unstructured data to make predictions or derive insights.

Challenges of unstructured data:


• Difficult to Understand: Unstructured data is harder to interpret because it
doesn’t follow a set structure. Analyzing it requires specialized skills and
tools.
• Requires Expertise: To properly analyze and integrate unstructured data
with machine learning algorithms, users need a strong background in data
science and machine learning.
• Security Risks: Unstructured data often resides on shared, less secure
servers, making it more vulnerable to ransomware and cyberattacks.
• Limited Tools: There are fewer tools available for processing
unstructured data, with most relying on cloud services and open-source
NoSQL databases.

Tools of unstructured data


o MongoDB
o Cassandra
o CouchDB
o Amazon S3 (Simple Storage Service)
o Google Cloud Storage

Structured Vs. Unstructured Data

Three States of Data


1. Data at Rest
• Refers to data that is stored and not actively moving or being processed.
• It is typically found on storage media such as hard drives, databases,
cloud storage, or external devices like USB drives.
• This data is at risk of theft, unauthorized access, or corruption if not
adequately secured.
• Examples: Files saved on a company server, archived emails, or sensitive
records in a database.

2. Data in Transit (Data in Motion)
• Refers to data that is actively being transmitted from one system or
location to another.
• It could be over private networks, public networks, or via communication
tools.
• This data can be intercepted, eavesdropped on, or altered while it is being
sent if the channel is not protected.
• Examples: Sending an email, transferring files through FTP, or streaming
video data over the internet.

3. Data in Use
• Refers to data that is currently being accessed, processed, or modified by
an application or user.
• It includes data held temporarily in system memory (RAM) or in CPU
registers and caches during processing.
• This data is more exposed to threats like unauthorized viewing or
malicious manipulation.
• Examples: Editing a document in a word processor, updating a database
entry, or analyzing a dataset in a software tool.

Protecting the Three States of Data


1. Protecting Data at Rest
• Encryption: Use strong encryption (e.g., AES) to make data unreadable if
accessed without authorization.
• Access Control: Implement strong passwords, multi-factor authentication
(MFA), and role-based access.
• Physical Security: Keep servers, hard drives, and backups in secure
locations.
• Backup Strategy: Regularly back up data and store it securely to prevent
loss.
• Data Masking: Mask sensitive data in testing environments to minimize
exposure.
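
Data masking is the simplest of these measures to show in code. The function below is a hypothetical helper that hides all but the last few characters of an account number before it is copied into a test environment.

```python
def mask_account_number(number, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    digits = str(number)
    if len(digits) <= visible:
        return "*" * len(digits)  # too short to reveal anything safely
    return "*" * (len(digits) - visible) + digits[-visible:]

masked = mask_account_number("1234567890")
```

Masking of this kind keeps the value recognizable for testing while hiding most of it. It is not a substitute for encrypting the stored data.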

2. Protecting Data in Transit
• Encryption in Transit: Use protocols like TLS (the successor to the
now-deprecated SSL) to secure data during transfer.
• Secure Channels: Utilize VPNs or encrypted platforms for
communication.
• Data Integrity Checks: Verify data integrity using checksums or similar
methods.
• Endpoint Protection: Protect devices and servers with firewalls and
antivirus software.
• Secure APIs: Use encrypted connections and tokens for API data
transfers.
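
The integrity check can be sketched with Python's `hashlib` and `hmac` modules. The payload and the shared key are made up for the example. A plain checksum catches accidental corruption, while a keyed HMAC also catches deliberate tampering, provided the key stays secret.

```python
import hashlib
import hmac

def checksum(payload: bytes) -> str:
    """SHA-256 digest of the payload, sent alongside the data."""
    return hashlib.sha256(payload).hexdigest()

def sign(payload: bytes, key: bytes) -> str:
    """Keyed HMAC-SHA256 tag; only holders of `key` can produce it."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag on receipt and compare in constant time."""
    return hmac.compare_digest(sign(payload, key), tag)

# Hypothetical transfer
key = b"shared-secret-for-demo-only"
message = b"transfer 100 to account 42"
tag = sign(message, key)

received_ok = verify(message, key, tag)
received_tampered = verify(b"transfer 900 to account 42", key, tag)
```

The receiver accepts the message only when the recomputed tag matches, so a payload changed in transit is rejected.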

3. Protecting Data in Use
• Controlled Access: Restrict access to data based on roles and needs.
• Secure Environments: Process sensitive data in secure memory or
encrypted enclaves.
• Data Minimization: Limit data processed to only what is necessary.
• Real-time Monitoring: Detect unauthorized activity with monitoring
tools.
• Software Patching: Keep systems and software updated to fix
vulnerabilities.
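
Controlled access and data minimization can be combined in one small filter. The roles, field names, and record below are hypothetical. The idea is that each role sees only the fields it needs, and an unknown role sees nothing.

```python
# Fields each role is allowed to see (an assumed policy for illustration).
ROLE_FIELDS = {
    "billing": {"name", "account_number"},
    "support": {"name"},
}

def view_for(role, record):
    """Return only the fields of `record` that `role` is allowed to see."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles get no fields
    return {k: v for k, v in record.items() if k in allowed}

customer = {"name": "Asha", "account_number": "1234567890", "diagnosis": "n/a"}
support_view = view_for("support", customer)
billing_view = view_for("billing", customer)
```

Because the filter runs before the data reaches the application code for a given role, fields outside the policy never enter that part of the program.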

Data in Motion vs. Data at Rest
