DAV practical 2
Practical Number 2
Title of Practical: Data Exploration: Knowing the Data, Data Preparation and Cleaning.
Prior Concepts:
Exploring data is a crucial step in understanding its characteristics, trends, and
underlying patterns. Conducting experiments in data exploration involves various
techniques and tools to gain insights into the dataset. The following are different
approaches to conducting data exploration.
visualization in lower dimensions.
- Cluster analysis to identify natural groupings within the data.
8. Iterative Process:
- Data exploration is often iterative. Revisit steps, try different techniques,
and compare results to gain a comprehensive understanding of the dataset.
9. Ethical Considerations:
- Ensure ethical use of data, especially regarding privacy, biases, and the
implications of insights drawn from the data.
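The dimensionality reduction and cluster analysis mentioned in the list above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: PCA via `prcomp()` stands in for the reduction technique, k-means via `kmeans()` with an assumed k = 3 stands in for the clustering method, and the built-in `iris` data is used only as a convenient example.

```r
# Multivariate exploration sketch on the built-in iris data (example dataset only)
data(iris)
num <- iris[, 1:4]                      # keep the four numeric measurement columns

# Principal component analysis on centered and scaled variables
pca <- prcomp(num, center = TRUE, scale. = TRUE)
summary(pca)                            # proportion of variance explained per component
scores <- pca$x[, 1:2]                  # first two components for a 2-D view
plot(scores, col = iris$Species, pch = 19,
     main = "iris projected onto PC1 and PC2")

# k-means cluster analysis on standardized data (k = 3 is an assumption)
set.seed(42)                            # k-means starts are random; fix them
km <- kmeans(scale(num), centers = 3, nstart = 25)
table(cluster = km$cluster, species = iris$Species)
```

The cross-tabulation against `Species` is possible only because `iris` happens to carry labels; on unlabeled data the clusters would be inspected directly, for example by plotting them on the PCA scores.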
New Concept:
Data loading in R
1. CSV File
# Load a CSV file
data <- read.csv("your_file.csv")
2. Excel File
# Load an Excel file (assuming the 'readxl' package is installed)
library(readxl)
data <- read_excel("your_file.xlsx")
# View the first few rows of the dataset
head(data)
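To try the loading step without an external file, the sketch below writes R's built-in `mtcars` dataset to a temporary CSV file (the temporary path stands in for a hypothetical `your_file.csv`) and reads it back with `read.csv()`. Besides `head()`, it calls base R's `dim()`, `str()`, `summary()` and `is.na()` as a suggested first look at the data; these extra calls are an assumption about a useful workflow, not steps listed in the practical.

```r
# Self-contained sketch: save mtcars to a temporary CSV, reload it, inspect it
csv_path <- tempfile(fileext = ".csv")      # hypothetical stand-in for your_file.csv
write.csv(mtcars, csv_path, row.names = FALSE)

data <- read.csv(csv_path)                  # load the CSV into a data frame
head(data)                                  # first six rows
dim(data)                                   # number of rows and columns
str(data)                                   # column types and sample values
summary(data)                               # min, quartiles, mean, max per column
colSums(is.na(data))                        # count of missing values per column

# The Excel equivalent needs the readxl package installed:
# data <- readxl::read_excel("your_file.xlsx")
```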
Data preparation and cleaning involve various steps to ensure that the dataset is in a
suitable format for analysis or modeling.
Data Cleaning Steps:
1. Feature Engineering:
- Create new features from existing ones based on domain knowledge or insights.
- Transform variables (e.g., log transformation for skewed data) to improve the
distribution of data.
2. Standardization and Normalization:
- Scale numerical variables to a standard scale using methods like `scale()` for
standardization or min-max scaling for normalization.
- Normalize features to bring them onto a similar scale, especially when using
distance-based algorithms such as k-nearest neighbors or k-means clustering.
3. Handling Categorical Variables:
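- Convert categorical variables to factors using `as.factor()`, or one-hot encode them
with base R's `model.matrix()` or helper packages such as `dummies`.
- Handle ordinal variables by creating ordered factors with the levels listed in their
natural order, e.g. `factor(x, levels = c("low", "medium", "high"), ordered = TRUE)`.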
4. Data Splitting:
- Split the dataset into training and testing subsets using base R's `sample()` or
helper functions from packages such as `caret` (`createDataPartition()`) or
`tidymodels` (`rsample::initial_split()`).
5. Handling Date and Time Variables:
- Convert date/time variables to appropriate formats using functions like `as.Date()` or
`as.POSIXct()`.
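Step 1 (Feature Engineering) above can be illustrated with a minimal sketch on the built-in `mtcars` data. The derived columns `power_to_weight` and `log_disp` are hypothetical names chosen for this example; the log transformation is the kind commonly applied to positive, right-skewed variables.

```r
# Feature engineering sketch on mtcars (illustrative feature choices)
df <- mtcars

# New feature from existing ones: horsepower per 1000 lbs of weight
df$power_to_weight <- df$hp / df$wt

# Log transformation of a strictly positive variable to compress large values
df$log_disp <- log(df$disp)

summary(df[, c("disp", "log_disp", "power_to_weight")])
```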
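Step 2 (Standardization and Normalization) can be sketched as follows. Standardization maps each value to (x - mean) / sd, which is what `scale()` does by default; min-max normalization maps it to (x - min) / (max - min), giving the range [0, 1]. The helper `min_max()` is a hypothetical function written for this example, and the three `mtcars` columns are chosen only for illustration.

```r
# Standardization and min-max normalization of numeric columns
num <- mtcars[, c("mpg", "hp", "wt")]

# Standardization with scale(): each column gets mean 0 and standard deviation 1
standardized <- as.data.frame(scale(num))

# Hypothetical helper for min-max normalization: rescales a vector to [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
normalized <- as.data.frame(lapply(num, min_max))

round(colMeans(standardized), 10)   # approximately 0 for every column
sapply(standardized, sd)            # 1 for every column
sapply(normalized, range)           # minimum 0 and maximum 1 for every column
```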
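Step 3 (Handling Categorical Variables) can be sketched in base R alone. Here `model.matrix()` is used for one-hot encoding instead of an add-on package, and the `sizes` vector is hypothetical example data for an ordinal variable.

```r
# Categorical variables: factors, one-hot encoding, and ordered factors
df <- mtcars
df$cyl <- as.factor(df$cyl)          # treat 4, 6, 8 cylinders as categories
levels(df$cyl)

# One-hot encoding: "- 1" drops the intercept so each level gets its own column
one_hot <- model.matrix(~ cyl - 1, data = df)
head(one_hot)

# Ordinal variable: ordered factor with levels in their natural order
sizes <- c("small", "large", "medium", "small")   # hypothetical example values
size_f <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
size_f < "large"                     # comparisons follow the declared level order
```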
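Step 4 (Data Splitting) with base R's `sample()` might look like the sketch below. The 70/30 ratio and the seed value are assumptions for illustration; `caret::createDataPartition()` and `rsample::initial_split()` offer the same idea with extras such as stratification.

```r
# Random train/test split using sample() (70/30 ratio is an assumption)
set.seed(123)                               # make the random split reproducible
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

train <- mtcars[train_idx, ]                # rows drawn for training
test  <- mtcars[-train_idx, ]               # all remaining rows for testing

c(train = nrow(train), test = nrow(test))   # 22 and 10 rows
```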
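Step 5 (Handling Date and Time Variables) can be sketched with `as.Date()` and `as.POSIXct()`. The date strings and the UTC time zone are hypothetical example values; note that `months()` and `weekdays()` return names in the current locale.

```r
# Converting strings to Date and POSIXct, then extracting components
dates <- as.Date(c("2024-01-15", "2024-03-01"))          # ISO format is the default
dates_dmy <- as.Date("15/01/2024", format = "%d/%m/%Y")   # explicit format string
diff(dates)                                              # difference in days

times <- as.POSIXct("2024-01-15 14:30:00", tz = "UTC")   # date-time with time zone
format(times, "%H:%M")                                   # "14:30"

# Components that can serve as new features
format(dates, "%Y")          # year as text
months(dates)                # month names (locale dependent)
weekdays(dates)              # weekday names (locale dependent)
```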
These steps ensure that your data is clean, formatted correctly, and ready for
analysis or modeling tasks in R.
Adjust these methods based on your specific dataset and analysis requirements.
Learning Objectives:
To understand the different techniques for data exploration, data preparation and
cleaning.
Conclusion/Learning outcome:
Different tools and commands for understanding the data were studied and
implemented. The concepts of data preparation and cleaning were understood and
implemented in the R language.
DOP | DOS | R1: Conduction | R2: File Record | R3: Viva Voce | Total    | Signature
    |     | 5 Marks        | 5 Marks         | 5 Marks       | 15 Marks |