
Dpa M.tech

The document discusses several key concepts in data analysis: 1) The four main data types are text, number, logical, and error. It is important to know which type to use for different functions and how types may change when exporting data. 2) Data parsing converts data between formats, often making unstructured data more comprehensible for tasks like data structuring. 3) Data cleaning fixes or removes incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data to improve dataset quality. Combining multiple sources introduces duplication and mislabeling risks.

Uploaded by

NAKKA PUNEETH
Copyright
© © All Rights Reserved

What are the 4 data types?

The four types of data are text, number, logical, and error. Each type supports different
functions, so it's important to know which type to use and when to use it. Note also that
some data types may change when data is exported into a spreadsheet.
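
As a rough illustration, the four types can be told apart programmatically. The sketch below is only an assumption-laden example: the function name classify_value and the list of spreadsheet-style error codes are illustrative choices, not part of any spreadsheet API.

```python
# Spreadsheet-style error codes (an assumed, non-exhaustive list).
ERROR_CODES = {"#DIV/0!", "#N/A", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#NULL!"}


def classify_value(value):
    """Return 'logical', 'number', 'error', or 'text' for a single cell value."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "logical"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str) and value.strip().upper() in ERROR_CODES:
        return "error"
    return "text"


cells = ["hello", 42, 3.5, True, "#DIV/0!"]
types = [classify_value(c) for c in cells]
print(types)  # ['text', 'number', 'number', 'logical', 'error']
```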

Data parsing is the conversion of data from one format to another. Widely used for data
structuring, it generally makes existing, often unstructured and unreadable, data more comprehensible.
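
A minimal sketch of parsing, assuming a made-up "key=value; key=value" input format: the function parse_record and the sample line are hypothetical and only show how a raw string can become a structured record.

```python
def parse_record(line):
    """Parse a 'key=value; key=value' string into a dictionary."""
    record = {}
    for field in line.split(";"):
        field = field.strip()
        if not field:
            continue  # skip empty pieces such as the one after a trailing ';'
        key, _, value = field.partition("=")
        key, value = key.strip(), value.strip()
        # Convert purely numeric values to int; leave everything else as text.
        record[key] = int(value) if value.isdigit() else value
    return record


raw = "name=Asha; age=29; city=Pune;"
parsed = parse_record(raw)
print(parsed)  # {'name': 'Asha', 'age': 29, 'city': 'Pune'}
```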

What is data cleaning?

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted,
duplicate, or incomplete data within a dataset. When combining multiple data sources, there are
many opportunities for data to be duplicated or mislabeled.
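
The cleaning steps named above (fixing formatting, dropping incomplete rows, removing duplicates) might look like the following sketch. The records, the field names, and the choice of the email address as the duplicate key are all assumptions made for illustration.

```python
def clean_records(records):
    """Normalize formatting, drop incomplete rows, and remove duplicates."""
    cleaned = []
    seen = set()
    for row in records:
        # Fix formatting: trim whitespace and lowercase the email address.
        name = (row.get("name") or "").strip()
        email = (row.get("email") or "").strip().lower()
        # Drop incomplete rows (a missing or empty name or email).
        if not name or not email:
            continue
        # Drop duplicates, using the normalized email as the identity key.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


records = [
    {"name": " Ravi ", "email": "RAVI@example.com"},
    {"name": "Ravi", "email": "ravi@example.com"},    # duplicate after normalization
    {"name": "Meera", "email": None},                  # incomplete
    {"name": "Meera", "email": "meera@example.com"},
]
result = clean_records(records)
print(result)
```

Here the second row is dropped as a duplicate of the first, and the third is dropped as incomplete.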

Data Segmentation is the process of taking the data you hold and dividing it up and grouping
similar data together based on the chosen parameters so that you can use it more efficiently
within marketing and operations. Examples of Data Segmentation could be: Gender.

Demographic, psychographic, geographic, and behavioral are the four pillars of market
segmentation, but consider using these four extra types to enhance your marketing efforts.
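
Grouping records by a chosen parameter can be sketched as below. The customer list and the segment helper are hypothetical; gender stands in for a demographic parameter and region for a geographic one.

```python
from collections import defaultdict


def segment(records, key):
    """Group customer names into segments that share the same value for `key`."""
    segments = defaultdict(list)
    for row in records:
        segments[row[key]].append(row["name"])
    return dict(segments)


# Made-up customer records used only for this example.
customers = [
    {"name": "A", "gender": "F", "region": "North"},
    {"name": "B", "gender": "M", "region": "South"},
    {"name": "C", "gender": "F", "region": "South"},
    {"name": "D", "gender": "M", "region": "North"},
]
by_gender = segment(customers, "gender")
by_region = segment(customers, "region")
print(by_gender)  # {'F': ['A', 'C'], 'M': ['B', 'D']}
print(by_region)  # {'North': ['A', 'D'], 'South': ['B', 'C']}
```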

Clustering is used to identify groups of similar objects in datasets with two or more variable
quantities. In practice, this data may be collected from marketing, biomedical, or geospatial
databases, among many other places.

Clustering itself can be categorized into two types, viz. Hard Clustering and Soft Clustering. In
hard clustering, each data point belongs to exactly one cluster. In soft clustering, the output
is instead the probability of a data point belonging to each of a predefined number of clusters.
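
The difference can be sketched on one-dimensional data. The centers and the point below are made up, and the soft memberships use a softmax over negative squared distances, which is an illustrative assumption and not a specific published algorithm.

```python
import math


def hard_assignment(point, centers):
    """Return the index of the single nearest center (hard clustering)."""
    distances = [abs(point - c) for c in centers]
    return distances.index(min(distances))


def soft_assignment(point, centers):
    """Return one membership probability per center (soft clustering).

    Assumption: weights are a softmax over negative squared distances,
    chosen only for illustration.
    """
    weights = [math.exp(-((point - c) ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


centers = [0.0, 2.0]  # two predefined cluster centers (made-up values)
point = 0.5
hard = hard_assignment(point, centers)
soft = soft_assignment(point, centers)
print(hard)                         # index of the one cluster the point belongs to
print([round(p, 3) for p in soft])  # one probability per cluster, summing to 1
```

The hard result is a single cluster index, while the soft result keeps a probability for every cluster, with the nearer center receiving the larger share.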

Grouping unlabeled examples is called clustering. As the examples are unlabeled, clustering
relies on unsupervised machine learning. If the examples are labeled, the task becomes
classification, which is a supervised learning problem, rather than clustering.
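
To show grouping without labels, here is a minimal sketch using a basic k-means loop on one-dimensional values. k-means is chosen here only as a familiar example (the text above does not name an algorithm), and the data and starting centers are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values with a basic k-means loop from the given starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group leaves its center where it was).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# No labels are supplied; the groups emerge from the values alone.
values = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers, groups = kmeans_1d(values, centers=[0.0, 10.0])
print(groups)  # [[1.0, 1.2, 0.8], [8.0, 8.3, 7.9]]
```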

The visualization techniques include Pie and Donut Charts, Histogram Plot, Scatter Plot,
Kernel Density Estimation for Non-Parametric Data, Box and Whisker Plot for Large Data,
Word Clouds and Network Diagrams for Unstructured Data, and Correlation Matrices.

BASIC VISUALIZATIONS
• Basic graphs in R can be created quite easily. The plot command is the one to note.
• It takes many parameters, including x-axis data, y-axis data, x-axis labels, y-axis labels, color,
and title. ...
• For a box plot, use the boxplot function; for a bar plot, use the barplot function.
