
| MD Riyan Nazeer | 160921748036 | CSM3A |

Foundation of Data Science Bank- CIE II


UNIT - 3
Short Answer Questions
1 Define Data Mining and mention the applications of Data Mining.
A Data mining is the process of discovering relationships between elements of data and hidden patterns in
data that can be used to make informed decisions or predictions.

Some of the applications of data mining are:


• Marketing: Data mining can help identify customer segments and target marketing campaigns based
on their preferences and behavior.
• Finance: Data mining can help forecast future trends, analyze risks, and detect fraud or anomalies in
transactions.
• Healthcare: Data mining can help identify risk factors for diseases, develop personalized treatment
plans, and improve diagnosis and prognosis.
• Telecommunications: Data mining can help optimize network performance, reduce churn, and
enhance customer satisfaction
2 List out the various data types or attributes of Data mining.
A Data types or attributes are the properties or characteristics of data objects that can be used for data
analysis.
The six main types of data or attributes are:
• Nominal: Categorical data that has no order or hierarchy.
• Ordinal: Categorical data that can be ordered or ranked.
• Binary: Data that has only two possible values.
• Interval: Numerical data that has equal intervals, but no zero point.
• Ratio: Numerical data that has equal intervals and a zero point.
• Text: Unstructured data in the form of text.
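
These attribute types map naturally onto R's basic data structures. The following is a small illustrative sketch (the variable names and values are invented, not part of the original answer) showing one way to represent each type:

```
# Nominal: categories with no order
colour <- factor(c("red", "green", "red"))

# Ordinal: categories with a meaningful order
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)

# Binary: only two possible values
passed <- c(TRUE, FALSE, TRUE)

# Interval: equal intervals but no true zero point (e.g. temperature in Celsius)
temp_c <- c(-5, 0, 18)

# Ratio: equal intervals and a true zero point (e.g. height in cm)
height <- c(150, 172, 165)

# Text: unstructured character data
review <- "The product arrived quickly and works well."
```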
Long Answer Questions
1 Explain Graph database.
A A graph database is a type of database that uses graph structures to store and query data. It is based on
graph theory, which is a branch of mathematics that studies the properties and patterns of graphs.
Graphs are composed of nodes and edges, where nodes represent entities and edges represent
relationships between them. For example, in a social network, nodes could be users and edges could be
friendships.

Graph databases are useful for data science because they can handle complex and dynamic data that
involve many-to-many relationships, such as social networks, recommendation systems, fraud
detection, knowledge graphs, etc. Graph databases allow data scientists to perform efficient and flexible
graph analytics, such as finding shortest paths, clustering, centrality, community detection, etc. Graph
databases also enable natural and intuitive data modeling, as the data is stored in the same way as it is
conceptualized.

Some examples of graph databases are Neo4j, Amazon Neptune, Microsoft Azure Cosmos DB, and
ArangoDB
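
As a rough illustration of the social-network example above, a small graph can be built and queried in R, assuming the igraph package is installed (the node and edge names below are made up for illustration):

```
library(igraph)

# Nodes are users, edges are friendships
friendships <- data.frame(from = c("Alice", "Bob",   "Alice"),
                          to   = c("Bob",   "Carol", "Dave"))
g <- graph_from_data_frame(friendships, directed = FALSE)

# Typical graph analytics mentioned above
shortest_paths(g, from = "Alice", to = "Carol")$vpath  # shortest path Alice -> Bob -> Carol
degree(g)                                              # a simple centrality measure
```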
2 Briefly explain types of graph databases
A There are two main types of graph databases based on their data model:
RDF graphs and property graphs.
• RDF graphs use the concept of a triple, which is a statement composed of three elements: subject-
predicate-object. For example, “Alice-knows-Bob” is a triple that represents a relationship between
two entities. RDF graphs are useful for data integration, as they can link data from different sources
using common vocabularies and standards.
• Property graphs use the concept of a node and an edge, where nodes represent entities and edges
represent relationships between them. Both nodes and edges can have properties, which are key-
value pairs that store additional information. For example, a node representing a person can have
properties such as name, age, gender, etc. Property graphs are useful for queries and analytics, as
they can perform complex operations on the graph structure and properties.


Some examples of RDF graph databases are Apache Jena, Stardog, and GraphDB. Some examples of
property graph databases are Neo4j, TigerGraph, and AWS Neptune

3 How Online Analytical Processing (OLAP) is implemented in Data Mining?


A Online Analytical Processing (OLAP) is software that allows users to extract and analyze data from multiple perspectives, providing insights into complex business processes. It provides a multidimensional view of data that enables faster and more efficient analysis.

OLAP is implemented in data mining by integrating online analytical processing with data mining
functionalities, such as classification, clustering, association, etc. This integration is called Online
Analytical Mining (OLAM) and it allows users to perform data mining on different subsets of data and at
different levels of abstraction. OLAM can achieve this by drilling, pivoting, filtering, dicing, and slicing on
a data cube and intermediate data mining outcomes.

Some of the benefits of OLAM are:


• It supports interactive and exploratory data analysis, as users can dynamically change the scope and
focus of their queries.
• It enhances the efficiency and effectiveness of data mining, as users can apply data mining
techniques on relevant and refined data.
• It facilitates the interpretation and evaluation of data mining results, as users can visualize and
compare them with the original data and other views.

Some of the challenges of OLAM are:


• It requires a large amount of storage and computation resources, as it involves building and
processing data cubes and data mining models.
• It demands a high level of integration and coordination between OLAP and data mining systems, as
they need to share data, metadata, and query languages.
• It poses a trade-off between the complexity and accuracy of data mining, as users need to balance
the granularity and dimensionality of data.
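
To make the drilling, slicing, and roll-up idea concrete, here is a minimal R sketch (the sales data is invented, not from the source) that aggregates a small "cube" at different levels of abstraction:

```
sales <- data.frame(region  = c("North", "North", "South", "South"),
                    quarter = c("Q1", "Q2", "Q1", "Q2"),
                    amount  = c(100, 150, 80, 120))

# Roll-up: total sales per region (higher level of abstraction)
aggregate(amount ~ region, data = sales, FUN = sum)

# Drill-down: sales per region and quarter (finer granularity)
aggregate(amount ~ region + quarter, data = sales, FUN = sum)

# Slice: restrict the cube to one quarter before applying mining techniques
subset(sales, quarter == "Q1")
```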
4 What are the other types of databases in addition to relational database, explain them in detail
A In addition to relational databases, which store data in tables and use SQL to query and manipulate data,
there are other types of databases that have different data models and features.
Some of the common types of databases are:
1. NoSQL databases: These are databases that do not follow the relational model and do not use SQL.
They are designed to handle large volumes of unstructured or semi-structured data, such as
documents, graphs, key-value pairs, etc. They are flexible, scalable, and performant for various
applications, such as web development, big data analytics, real-time processing, etc. Some
examples of NoSQL databases are MongoDB, Cassandra, Redis, and Neo4j.

2. Hierarchical databases: These are databases that store data in a tree-like structure, where each
record has one parent record and zero or more child records. They are useful for representing data
that has a natural hierarchy, such as organizational charts, file systems, etc. They are fast and
efficient for retrieving data, but they are rigid and difficult to modify. Some examples of hierarchical
databases are IMS, Windows Registry, and LDAP.

3. Network databases: These are databases that store data in a network-like structure, where each
record can have multiple parent and child records. They are useful for representing data that has
complex and many-to-many relationships, such as social networks, transportation networks, etc.
They are more flexible and powerful than hierarchical databases, but they are also more complex and
harder to maintain. Some examples of network databases are IDMS, RDM, and CODASYL.

4. Object-oriented databases: These are databases that store data as objects, which have attributes
and methods. They are useful for representing data that has complex structures and behaviors, such
as multimedia, computer-aided design, etc. They are compatible with object-oriented programming
languages, such as Java, C++, etc. They are more expressive and reusable than relational databases,
but they are also less standardized and less efficient for simple queries. Some examples of object-
oriented databases are ObjectDB, db4o, and Versant.


5 Differentiate between OLAP & OLTP.


A
| OLAP | OLTP |
| Online Analytical Processing | Online Transaction Processing |
| Used for complex data analysis and reporting | Used for real-time processing of online transactions |
| Handles large volumes of historical and aggregated data | Handles small volumes of current and operational data |
| Queries are ad-hoc, multidimensional, and long-running | Transactions are predefined, simple, and short-lived |
| Data is read-only and updated periodically | Data is read-write and updated constantly |
| Performance is measured by response time and accuracy | Performance is measured by throughput and availability |
| Supports operations like drilling, slicing, and dicing | Supports operations like insert, update, and delete |
| Examples: data warehouse, data mart, data cube | Examples: ATM, online banking, online shopping |

6 Write briefly about relational & transactional database.


A A relational database is a type of database that stores data in tables, where each row represents a
record and each column represents an attribute. A relational database uses SQL (Structured Query
Language) to query and manipulate data, and enforces data integrity and consistency through
constraints and rules. A relational database is suitable for applications that require complex queries,
data analysis, and reporting

A transactional database is a type of database that supports online transaction processing (OLTP),
which is the real-time processing of online transactions, such as e-commerce sales, banking, insurance,
etc. A transactional database ensures data accuracy and reliability by following the ACID properties
(Atomicity, Consistency, Isolation, Durability), which guarantee that each transaction is complete, valid,
independent, and persistent. A transactional database can be a relational database or a NoSQL
database, depending on the data model and the application needs
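
A minimal sketch of a transactional workload driven from R, assuming the DBI and RSQLite packages are installed (the table and column names are invented for illustration):

```
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "accounts", data.frame(id = 1:2, balance = c(500, 200)))

# A simple transaction: the transfer either commits fully or not at all (atomicity)
dbBegin(con)
dbExecute(con, "UPDATE accounts SET balance = balance - 50 WHERE id = 1")
dbExecute(con, "UPDATE accounts SET balance = balance + 50 WHERE id = 2")
dbCommit(con)

dbGetQuery(con, "SELECT * FROM accounts")
dbDisconnect(con)
```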
UNIT - 4
Short Answer Questions
1 Write the formulas for Jaccard’s Index and Cosine similarity.
A Jaccard's Index value and Cosine similarity are two measures of similarity between two sets or vectors of
data. They are calculated as follows:

Jaccard's Index: J(x, y) = |x ∩ y| / |x ∪ y|

Cosine similarity: cos(x, y) = (x · y) / (||x|| ||y||)

These measures can be used to compare different data objects, such as documents, images, or clusters.
For example, Jaccard’s Index value can be used to measure the overlap between two sets of keywords or
tags, while Cosine similarity can be used to measure the angle between two vectors of word frequencies
or pixel values.
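
A quick R sketch of both measures (the keyword sets and vectors below are made up purely for illustration):

```
# Jaccard's Index for two sets of keywords
a <- c("data", "mining", "olap")
b <- c("data", "cube", "olap", "etl")
length(intersect(a, b)) / length(union(a, b))   # 2 / 5 = 0.4

# Cosine similarity for two numeric vectors (e.g. word frequencies)
x <- c(1, 2, 0, 3)
y <- c(2, 1, 1, 3)
sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))  # ≈ 0.897
```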
2 Define Data Visualization. How is data visualization implemented?
A Data visualization is the graphical representation of information and data using visual elements like
charts, graphs, maps, and other tools. It helps us to see and understand trends, patterns, and outliers in
data, and to communicate data insights effectively to others.

Data visualization can be implemented using various software tools, such as Tableau, Power BI, Google
Charts, D3.js, and more. These tools allow us to create different types of data visualizations, such as bar
charts, pie charts, scatter plots, line charts, heat maps, histograms, etc. Depending on the data and the
purpose, we can choose the most suitable type of visualization to display our data
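
For example, base R graphics alone can produce several of the chart types listed above; the data below is invented purely for illustration:

```
sales   <- c(120, 90, 150, 60)
regions <- c("North", "South", "East", "West")

# Bar chart of sales by region
barplot(sales, names.arg = regions, main = "Sales by Region",
        xlab = "Region", ylab = "Units sold")

# Histogram of a random sample
hist(rnorm(100), main = "Distribution of a random sample", xlab = "Value")
```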


3 Explain ETL operations with examples.


A ETL stands for Extract, Transform, and Load, and it is a process used in data warehousing to extract data
from various sources, transform it into a format suitable for loading into a data warehouse, and then load
it into the warehouse.

• Extract: The first step of ETL is to extract raw data from different sources, such as databases, APIs,
files, or web pages.
For example, a data warehouse for an e-commerce company might extract data from the online
store, the inventory system, the payment gateway, and the customer feedback platform.

• Transform: The second step of ETL is to transform the extracted data according to specific
requirements, such as filtering, aggregating, joining, or converting.
For example, the e-commerce data warehouse might transform the data by removing duplicates,
calculating sales metrics, merging product and customer information, and converting currencies and
dates.

• Load: The final step of ETL is to load the transformed data into a target system, such as a data
warehouse, a data lake, or a database.
For example, the e-commerce data warehouse might load the data into a relational database with a
star schema, where each table represents a dimension or a fact
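
A minimal R sketch of the three steps, using invented e-commerce data in place of real source systems:

```
# Extract: pull raw data (in practice from databases, APIs, or files, e.g. read.csv)
orders <- data.frame(order_id = c(1, 2, 2, 3),
                     product  = c("A", "B", "B", "A"),
                     amount   = c(100, 250, 250, 80))
products <- data.frame(product  = c("A", "B"),
                       category = c("Books", "Toys"))

# Transform: remove duplicates, join product information, aggregate sales
orders <- orders[!duplicated(orders), ]
merged <- merge(orders, products, by = "product")
sales_by_category <- aggregate(amount ~ category, data = merged, FUN = sum)

# Load: write the transformed table to the target (a CSV file standing in for a warehouse)
write.csv(sales_by_category, "sales_by_category.csv", row.names = FALSE)
```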

4 How do you measure the fitness of a data set? What are data objects?
A Fitness of data set: The fitness of a data set is the degree to which it meets the requirements and
expectations of the data user for a specific purpose. It can be measured by using data quality metrics,
such as accuracy, completeness, consistency, timeliness, validity, duplication, and uniqueness.

For example, a data set that is accurate, complete, consistent, timely, valid, non-duplicated, and unique
can be considered fit for use for most data analysis tasks.

Data objects: Data objects are collections of one or more data points that create meaning as a whole.
They are the units of data that can be manipulated, stored, or exchanged by data systems. Data objects
can have different types, such as tables, arrays, pointers, records, files, sets, and scalar types.

For example, a data table is a data object that consists of rows and columns of data points, and it can be
queried, updated, or exported by a database system.
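
A few of the quality metrics mentioned above can be checked directly in R; the small data frame below is invented for illustration:

```
df <- data.frame(id = c(1, 2, 2, 4), score = c(90, NA, 85, 70))

colSums(is.na(df))        # completeness: missing values per column
sum(duplicated(df$id))    # uniqueness: duplicated identifiers
mean(!is.na(df$score))    # completeness ratio for one attribute
```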

Long Answer Questions


1 Differentiate between Data Cleaning and Data Integration.
A
| Data Cleaning | Data Integration |
| Data Cleaning is the process of identifying and fixing incorrect data. It can be in an incorrect format, duplicated, corrupt, inaccurate, incomplete, or irrelevant. | Data Integration is the process of combining data from multiple sources into a single, unified view. It can involve cleaning and transforming the data, as well as resolving any inconsistencies or conflicts that may exist between the different sources. |
| It facilitates the integration of diverse datasets by standardizing formats, resolving inconsistencies, and handling missing values. | It involves cleaning and transforming the data, as well as resolving any inconsistencies or conflicts that may exist between the different sources. |
| Data Cleaning techniques include removing duplicates, irrelevant data, outliers, and errors; standardizing capitalization and data types; handling missing values; and language translation. | Data Integration techniques include extracting, transforming, and loading (ETL) data, data warehousing, data lakes, data federation, data virtualization, and data lineage. |
| Data Cleaning is essential for ensuring data quality, accuracy, and consistency. It can improve the performance and reliability of data analysis and machine learning models. | Data Integration is essential for providing a comprehensive and holistic view of the data. It can enable cross-domain analysis, data sharing, and collaboration. |
| It lowers errors and raises the caliber of the data. | It enables analysts to perform comprehensive analysis and derive insights from combined data sources. |
| It can be accomplished using a variety of data mining approaches, such as clustering, outlier detection, or data quality mining. | It can be implemented using various techniques, such as schema matching, entity resolution, or data fusion. |
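
The contrast can be seen in a short R sketch with invented customer and order tables: the first half cleans a single table, the second integrates it with another source:

```
# Data Cleaning: fix formats, drop duplicates and incomplete rows
customers <- data.frame(id = c(1, 2, 2, 3),
                        name = c(" Asha", "Ravi", "Ravi", NA))
customers$name <- trimws(customers$name)               # standardize formatting
customers <- customers[!duplicated(customers), ]       # remove duplicate rows
customers <- customers[!is.na(customers$name), ]       # handle missing values

# Data Integration: combine two sources into a single, unified view
orders <- data.frame(id = c(1, 2, 3), amount = c(500, 300, 250))
merge(customers, orders, by = "id")
```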
2 List the applications of Data Transformation.
A Data Transformation is a technique used to transform raw data into a more appropriate format that
enables efficient data mining and model building.

Some of the applications of Data Transformation are:


1. Data Smoothing: This is used to remove noise in the data, and it helps inherent patterns to stand out.
Therefore, Data Smoothing can help in identifying trends, outliers, and anomalies in the data.
2. Attribute Construction: This is used to create new attributes from existing ones, which can help in
enhancing the features and improving the performance of the ML models.
3. Data Aggregation: This is used to summarize or group data into higher-level representations, which
can help in reducing the complexity and dimensionality of the data.
4. Data Normalization: This is used to scale the data values into a specific range, which can help in removing bias and improving the comparability of the data (see the R sketch after this list).
5. Data Standardization: This is used to convert the data into a common format, which can help in
ensuring consistency and compatibility of the data.
6. Data Cleansing: This is used to identify and fix incorrect, incomplete, or inconsistent data, which
can help in improving the quality and accuracy of the data.
7. Data Integration: This is used to combine data from multiple sources into a single, unified view,
which can help in providing a comprehensive and holistic view of the data.
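
As a concrete example of points 4 and 5 above, here is a small R sketch of min-max normalization and z-score standardization on invented values:

```
x <- c(25, 40, 55, 70, 100)

x_minmax <- (x - min(x)) / (max(x) - min(x))  # normalization: scale to [0, 1]
x_z      <- (x - mean(x)) / sd(x)             # standardization: mean 0, sd 1

round(x_minmax, 2)
round(x_z, 2)
```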
3 Briefly explain and write the data similarity coefficients of the following
I. Euclidean
II. Jaccard’s Index
III. Cosine Similarity
A Data similarity coefficients are numerical measures that quantify how similar two data objects are. They
are often used in data mining, machine learning, and information retrieval to compare data sets, clusters,
documents, or features.

Here are some brief explanations and formulas for the three data similarity coefficients you mentioned:

1. Euclidean similarity coefficient is based on the Euclidean distance between two data objects, which
is the length of the straight line connecting them. It is calculated as the square root of the sum of the
squared differences between the corresponding attributes of the two objects. The smaller the
Euclidean distance, the higher the similarity. The formula is:

sim(x, y) = 1 / (1 + d(x, y)) = 1 / (1 + sqrt((x_1 − y_1)² + ... + (x_n − y_n)²))

Where x and y are two data objects with n attributes, and d(x, y) is the Euclidean distance between them.

2. Jaccard’s index is a similarity coefficient that measures the overlap between two data objects, which
are usually represented as sets. It is calculated as the ratio of the size of the intersection of the two
sets to the size of their union. The larger the intersection, the higher the similarity. The formula is:

sim(x, y) = |x ∩ y| / |x ∪ y|

Where x and y are two data sets, and |x| denotes the size of the set x.


3. Cosine similarity coefficient is based on the angle between two data objects, which are usually
represented as vectors. It is calculated as the dot product of the two vectors divided by the product of
their magnitudes. The smaller the angle, the higher the similarity. The formula is:

sim(x, y) = (x · y) / (||x|| ||y||) = (x_1·y_1 + ... + x_n·y_n) / (sqrt(x_1² + ... + x_n²) · sqrt(y_1² + ... + y_n²))

Where x and y are two data vectors with n attributes.
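
The Euclidean-based similarity defined above can be computed directly in R; the vectors below are invented for illustration:

```
x <- c(1, 2, 3)
y <- c(2, 4, 6)

d <- sqrt(sum((x - y)^2))  # Euclidean distance = sqrt(1 + 4 + 9) ≈ 3.742
1 / (1 + d)                # Euclidean similarity coefficient ≈ 0.211
```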


4 What is Data Discretization? Explain Hierarchy generation.
A Data Discretization is a technique used to transform continuous or numerical data into discrete or
categorical data. It can help reduce the complexity and dimensionality of the data, as well as enhance
the features and performance of the data analysis and machine learning models.

Hierarchy generation is a process of creating a concept hierarchy for a given attribute or data set. A
concept hierarchy is a sequence of mappings from a set of low-level concepts to a set of high-level
concepts, based on some criteria of importance or abstraction
For example, a concept hierarchy for the attribute “city” can be generated by mapping it to higher-level
concepts such as “state”, “country”, and “continent”.

Data Discretization and Hierarchy generation are often used together to provide a hierarchical or multi-
resolution partitioning of the data values, which can enable mining at different levels of abstraction

Some common techniques used for Data Discretization and Hierarchy generation are:


1. Histogram analysis: This is a technique that divides the data values into equal-width or equal-
frequency intervals, based on the frequency distribution of the data.

2. Binning: This is a technique that groups the data values into smaller bins, based on some similarity measure, such as mean, median, or boundary values (see the R sketch after this list).

3. Cluster analysis: This is a technique that partitions the data values into clusters, based on some
distance measure, such as Euclidean, Manhattan, or Jaccard.

4. Decision tree analysis: This is a technique that splits the data values into disjoint intervals, based on
some splitting criterion, such as entropy, information gain, or gini index.

5. Correlation analysis: This is a technique that merges the data values into overlapping intervals,
based on some correlation measure, such as linear regression, Pearson, or Spearman.
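
A small R sketch of binning (technique 2), using cut() to discretize invented ages into equal-width intervals and then into labeled concept levels:

```
age <- c(5, 17, 23, 41, 58, 72)

# Equal-width binning into three intervals
cut(age, breaks = 3)

# Discretization with a simple concept hierarchy: age -> age group
cut(age, breaks = c(0, 18, 60, 100),
    labels = c("minor", "adult", "senior"))
```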
5 Explain some important data discretization techniques.
A Data Discretization is a technique used to transform continuous or numerical data into discrete or
categorical data. It can help reduce the complexity and dimensionality of the data, as well as enhance
the features and performance of the data analysis and machine learning models.

Some of the important data discretization techniques are:


1. Histogram analysis: This is a technique that divides the data values into equal-width or equal-
frequency intervals, based on the frequency distribution of the data.
2. Binning: This is a technique that groups the data values into smaller bins, based on some similarity
measure, such as mean, median, or boundary values.
3. Cluster analysis: This is a technique that partitions the data values into clusters, based on some
distance measure, such as Euclidean, Manhattan, or Jaccard.
4. Decision tree analysis: This is a technique that splits the data values into disjoint intervals, based on
some splitting criterion, such as entropy, information gain, or gini index.
5. Correlation analysis: This is a technique that merges the data values into overlapping intervals,
based on some correlation measure, such as linear regression, Pearson, or Spearman.


UNIT - 5
Short Answer Questions
1 How are R variables and R data types declared in RStudio?
A In R, variables are created with the assignment operator <-, and the data type is determined by the value that is assigned.

R has five main data types:


1. numeric, which can be declared as x <- 3.14,
2. integer, which can be declared as y <- 42L,
3. complex, which can be declared as z <- 1 + 2i,
4. character, which can be declared as w <- "Hello, world!",
5. logical, which can be declared as v <- TRUE.

You can check the data type of an R object using functions such as typeof(), mode(), storage.mode(), class(), and str().
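
For example, a few of these functions in use (a small illustrative snippet):

```
x <- 3.14
class(x)          # "numeric"
typeof(x)         # "double"

y <- 42L
class(y)          # "integer"

str(c("a", "b"))  # chr [1:2] "a" "b"
```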
2 Write a short note on vectors. Write the syntax for implementation of vectors in R.
A Vectors are a fundamental data structure in R. A vector is an ordered collection of values that are of the
same data type. R has five main data types: numeric, integer, complex, character, and logical.
To create a vector, you can use the c() function and separate the items by a comma.

Here’s an example of how to create a vector of strings:


fruits <- c("banana", "apple", "orange")

To create a vector of numerical values, you can use the : operator to create a sequence of numbers:
numbers <- 1:10

3 Write the implementation of any two arithmetic operations in R


A Two arithmetic operations in R:
1. Addition: The “+” operator is used to add two numbers.
For example, to add 2 and 3, you would write: 2 + 3
This will return the result 5.

2. Exponentiation: The “^” operator is used to raise a number to a power.


For example, to calculate 2 raised to the power of 3, you would write: 2 ^ 3
This will return the result 8.
Long Answer Questions
1 Write the installation steps of R Software & R Studio.
A Here are the steps to install R and RStudio:
1. Install R: To install R, follow these steps:
• Go to the R Project website.
• Click on the “download R” link in the middle of the page under “Getting Started.”
• Select a CRAN location (a mirror site) and click the corresponding link.
• Click on the “Download R for (Mac) OS X” link at the top of the page.
• Click on the file containing the latest version of R under “Files.”
• Save the .pkg file, double-click it to open, and follow the installation instructions.

2. Install RStudio: To install RStudio, follow these steps:


• Go to the RStudio download page.
• Under “Installers,” select the version of RStudio for your operating system.
• Click on the download link for your operating system.
• Install RStudio like you would any other application.
or
install R and RStudio using the command line.
1. Install R: To install R, follow these steps:
• Open a terminal window.
• Update the package’s cache by running the following command: sudo apt-get update


• Install R by running the following command: sudo apt-get install r-base

2. Install RStudio: To install RStudio, follow these steps:

• Download the latest version of RStudio from their official website using the following command:
wget https://download1.rstudio.org/desktop/bionic/amd64/rstudio-1.4.1717-amd64.deb

• Install the downloaded package using the following command:


sudo dpkg -i rstudio-1.4.1717-amd64.deb

• If there are any missing dependencies, install them by running the following command:
sudo apt-get -f install
2 How can we implement different Matrix operations in R? Explain with examples.
A Matrices are a fundamental data structure in R. You can perform various operations on matrices, such as
addition, subtraction, multiplication, calculating the power, the rank, the determinant, the diagonal, the
eigenvalues and eigenvectors, the transpose and decomposing the matrix by different methods.

Here are some examples of matrix operations in R:

1. Addition: The `+` operator is used to add two matrices. For example, to add two matrices `A` and
`B`, you would write:
```
A <- matrix(c(10, 8, 5, 12), ncol = 2, byrow = TRUE)
B <- matrix(c(5, 3, 15, 6), ncol = 2, byrow = TRUE)
A+B
```

This will return the result:


```
[,1] [,2]
[1,] 15 11
[2,] 20 18
```

2. Multiplication: The `%*%` operator is used to multiply two matrices. For example, to multiply two
matrices `A` and `B`, you would write: A %*% B

This will return the result:


```
[,1] [,2]
[1,] 170 78
[2,] 205 87
```

3. Transpose: The `t()` function is used to find the transpose of a matrix. For example, to find the
transpose of a matrix `A`, you would write: t(A)

This will return the result:


```
[,1] [,2]
[1,] 10 5
[2,] 8 12
```


4. Determinant: The `det()` function is used to find the determinant of a matrix. For example, to find the
determinant of a matrix `A`, you would write: det(A)

This will return the result `80`.

5. Rank: The `qr()` function is used to find the rank of a matrix. For example, to find the rank of a matrix
`A`, you would write: qr(A)$rank

This will return the result `2`.

3 Write R code that illustrates the usage of all logical operators in R.


A # Logical operators in R: example
a <- 5
b <- 10
c <- TRUE
d <- FALSE

# AND operator
if (a > 3 & b < 15) {
  print("Both conditions are true")
}
# output: "Both conditions are true"

# OR operator
if (a > 3 | b < 5) {
  print("At least one condition is true")
}
# output: "At least one condition is true"

# NOT operator
if (!d) {
  print("d is false")
}
# output: "d is false"

# Exclusive OR, using the xor() function
if (xor(c, d)) {
  print("One condition is true and the other is false")
}
# output: "One condition is true and the other is false"
4 Elaborate the concept of data frames in R using an example.
A A data frame is a two-dimensional data structure in R that stores data in tabular format. It is similar to a
matrix, but unlike a matrix, it can store different data types in each column. A data frame has rows and
columns, and each column can be a different vector. You can think of a data frame as a spreadsheet or a
SQL table.

# Create a data frame


df <- data.frame(
  name = c("Riyan", "bilal", "faisal"),
  age = c(21, 19, 20),
  married = c(TRUE, FALSE, TRUE)
)

# Print the data frame


print(df)


output:
    name age married
1  Riyan  21    TRUE
2  bilal  19   FALSE
3 faisal  20    TRUE
5 Show the implementation of functions in R. Explain with example
A • Functions are a fundamental concept in R programming.
• A function is a block of code that performs a specific task. It takes input, performs some operations
on the input, and returns output.
• Functions are used to break down complex problems into smaller, more manageable parts.

To write a function in R, we use the "function" keyword followed by a pair of parentheses, inside which we
specify the input arguments. The body of the function is contained within a pair of curly braces, and
within this body, we can perform any operations we want on the input arguments and return the desired
output.

Example of a simple function that takes a numeric vector as input and returns the sum of its
elements:
# Define the function
sum_vector <- function(x) {
  # Calculate the sum of the vector elements
  sum_x <- sum(x)
  # Return the sum
  return(sum_x)
}

# Call the function with a sample vector


v <- c(1, 2, 3, 4, 5)
sum_vector(v)

The function first calculates the sum of the elements in the input vector using the "sum()" function and
then returns this value using the "return()" function. When we call the function with a sample vector, it
will output the sum of the vector elements.
