FDS qb cie-2
Graph databases are useful for data science because they can handle complex and dynamic data that
involve many-to-many relationships, such as social networks, recommendation systems, fraud
detection, knowledge graphs, etc. Graph databases allow data scientists to perform efficient and flexible
graph analytics, such as finding shortest paths, clustering, centrality, community detection, etc. Graph
databases also enable natural and intuitive data modeling, as the data is stored in the same way as it is
conceptualized.
Some examples of graph databases are Neo4j, Amazon Neptune, Microsoft Azure Cosmos DB, and
ArangoDB.
2 Briefly explain types of graph databases
A There are two main types of graph databases based on their data model: RDF graphs and property graphs.
• RDF graphs use the concept of a triple, which is a statement composed of three elements: subject-
predicate-object. For example, “Alice-knows-Bob” is a triple that represents a relationship between
two entities. RDF graphs are useful for data integration, as they can link data from different sources
using common vocabularies and standards.
• Property graphs use the concept of a node and an edge, where nodes represent entities and edges
represent relationships between them. Both nodes and edges can have properties, which are key-
value pairs that store additional information. For example, a node representing a person can have
properties such as name, age, gender, etc. Property graphs are useful for queries and analytics, as
they can perform complex operations on the graph structure and properties.
Some examples of RDF graph databases are Apache Jena, Stardog, and GraphDB. Some examples of property graph databases are Neo4j, TigerGraph, and Amazon Neptune.
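As a rough sketch, a property-graph-style structure (nodes and edges that each carry key-value properties) can be modeled in R with the igraph package; the entities, relationships, and property values below are invented for illustration:
```
# Nodes (entities) with a property, and edges (relationships) with a property
library(igraph)

nodes <- data.frame(name = c("Alice", "Bob"), age = c(30, 25))
edges <- data.frame(from = "Alice", to = "Bob", relation = "knows")

# Extra columns in the data frames become node and edge properties
g <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
V(g)$age        # node (vertex) properties
E(g)$relation   # edge properties
```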
OLAP is implemented in data mining by integrating online analytical processing with data mining
functionalities, such as classification, clustering, association, etc. This integration is called Online
Analytical Mining (OLAM) and it allows users to perform data mining on different subsets of data and at
different levels of abstraction. OLAM can achieve this by drilling, pivoting, filtering, dicing, and slicing on
a data cube and intermediate data mining outcomes.
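As a rough illustration of slicing, dicing, and rolling up, a small data cube can be modeled as a 3-D array in base R; the dimensions and values below are hypothetical:
```
# A 2 x 3 x 4 sales cube: product x region x month (values are made up)
cube <- array(1:24, dim = c(2, 3, 4),
              dimnames = list(product = c("P1", "P2"),
                              region  = c("R1", "R2", "R3"),
                              month   = paste0("M", 1:4)))

cube[, , "M1"]                  # slice: fix the month dimension
cube["P1", c("R1", "R2"), 1:2]  # dice: select sub-ranges of each dimension
apply(cube, c(1, 2), sum)       # roll-up: aggregate sales over all months
```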
2. Hierarchical databases: These are databases that store data in a tree-like structure, where each
record has one parent record and zero or more child records. They are useful for representing data
that has a natural hierarchy, such as organizational charts, file systems, etc. They are fast and
efficient for retrieving data, but they are rigid and difficult to modify. Some examples of hierarchical
databases are IMS, Windows Registry, and LDAP.
3. Network databases: These are databases that store data in a network-like structure, where each
record can have multiple parent and child records. They are useful for representing data that has
complex and many-to-many relationships, such as social networks, transportation networks, etc.
They are more flexible and powerful than hierarchical databases, but they are also more complex and
harder to maintain. Some examples of network databases are IDMS, RDM, and CODASYL.
4. Object-oriented databases: These are databases that store data as objects, which have attributes
and methods. They are useful for representing data that has complex structures and behaviors, such
as multimedia, computer-aided design, etc. They are compatible with object-oriented programming
languages, such as Java, C++, etc. They are more expressive and reusable than relational databases,
but they are also less standardized and less efficient for simple queries. Some examples of object-
oriented databases are ObjectDB, db4o, and Versant.
A transactional database is a type of database that supports online transaction processing (OLTP),
which is the real-time processing of online transactions, such as e-commerce sales, banking, insurance,
etc. A transactional database ensures data accuracy and reliability by following the ACID properties
(Atomicity, Consistency, Isolation, Durability), which guarantee that each transaction is complete, valid,
independent, and persistent. A transactional database can be a relational database or a NoSQL
database, depending on the data model and the application needs.
UNIT - 4
Short Answer Questions
1 Write the values of Jaccard's Index value, Cosine similarity.
A Jaccard's Index value and Cosine similarity are two measures of similarity between two sets or vectors of
data. They are calculated as follows:
Jaccard's Index value: J(x, y) = |x ∩ y| / |x ∪ y|
Cosine similarity: cos(x, y) = (x ⋅ y) / (|x| |y|)
These measures can be used to compare different data objects, such as documents, images, or clusters.
For example, Jaccard’s Index value can be used to measure the overlap between two sets of keywords or
tags, while Cosine similarity can be used to measure the angle between two vectors of word frequencies
or pixel values.
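A quick sketch of both measures in base R; the sets and vectors below are made-up examples:
```
# Jaccard's Index for two sets of tags
x <- c("data", "mining", "graphs")
y <- c("data", "graphs", "clusters", "statistics")
jaccard <- length(intersect(x, y)) / length(union(x, y))
jaccard  # 0.4

# Cosine similarity for two numeric vectors (e.g., word frequencies)
a <- c(1, 3, 0, 2)
b <- c(2, 1, 1, 2)
cosine <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cosine   # about 0.76
```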
2 Define Data Visualization. How is data visualization implemented?
A Data visualization is the graphical representation of information and data using visual elements like
charts, graphs, maps, and other tools. It helps us to see and understand trends, patterns, and outliers in
data, and to communicate data insights effectively to others.
Data visualization can be implemented using various software tools, such as Tableau, Power BI, Google
Charts, D3.js, and more. These tools allow us to create different types of data visualizations, such as bar
charts, pie charts, scatter plots, line charts, heat maps, histograms, etc. Depending on the data and the
purpose, we can choose the most suitable type of visualization to display our data.
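As a tiny base-R illustration (the sales figures are invented):
```
# A simple bar chart of hypothetical monthly sales
sales <- c(Jan = 120, Feb = 150, Mar = 90)
barplot(sales, main = "Monthly Sales", xlab = "Month", ylab = "Units sold")
```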
• Extract: The first step of ETL is to extract raw data from different sources, such as databases, APIs,
files, or web pages.
For example, a data warehouse for an e-commerce company might extract data from the online
store, the inventory system, the payment gateway, and the customer feedback platform.
• Transform: The second step of ETL is to transform the extracted data according to specific
requirements, such as filtering, aggregating, joining, or converting.
For example, the e-commerce data warehouse might transform the data by removing duplicates,
calculating sales metrics, merging product and customer information, and converting currencies and
dates.
• Load: The final step of ETL is to load the transformed data into a target system, such as a data
warehouse, a data lake, or a database.
For example, the e-commerce data warehouse might load the data into a relational database with a
star schema, where each table represents a dimension or a fact.
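A minimal sketch of these three steps in R; the file names and column names are hypothetical:
```
# Extract: read raw order data exported from the online store
orders <- read.csv("orders.csv")

# Transform: remove duplicate rows and aggregate sales per product
orders <- unique(orders)
sales  <- aggregate(amount ~ product_id, data = orders, FUN = sum)

# Load: write the transformed table to the warehouse staging area
write.csv(sales, "fact_sales.csv", row.names = FALSE)
```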
4 How do you measure the fitness of data set? What are data objects?
A Fitness of data set: The fitness of a data set is the degree to which it meets the requirements and
expectations of the data user for a specific purpose. It can be measured by using data quality metrics,
such as accuracy, completeness, consistency, timeliness, validity, duplication, and uniqueness.
For example, a data set that is accurate, complete, consistent, timely, valid, non-duplicated, and unique
can be considered fit for use for most data analysis tasks.
Data objects: Data objects are collections of one or more data points that create meaning as a whole.
They are the units of data that can be manipulated, stored, or exchanged by data systems. Data objects
can have different types, such as tables, arrays, pointers, records, files, sets, and scalar types.
For example, a data table is a data object that consists of rows and columns of data points, and it can be
queried, updated, or exported by a database system.
Data Cleaning | Data Integration
It lowers errors and raises the caliber of the data | It enables analysts to perform comprehensive analysis and derive insights from combined data sources
It can be accomplished using a variety of data mining approaches, such as clustering, outlier detection, or data quality mining | It can be implemented using various techniques, such as schema matching, entity resolution, or data fusion
2 List the applications of Data Transformation.
A Data Transformation is a technique used to transform raw data into a more appropriate format that
enables efficient data mining and model building.
Here are brief explanations and formulas for three common data similarity coefficients:
1. Euclidean similarity coefficient is based on the Euclidean distance between two data objects, which
is the length of the straight line connecting them. It is calculated as the square root of the sum of the
squared differences between the corresponding attributes of the two objects. The smaller the
Euclidean distance, the higher the similarity. The formula is:
sim(x, y) = 1 / (1 + d(x, y)) = 1 / (1 + √(Σᵢ₌₁ⁿ (xᵢ − yᵢ)²))
Where x and y are two data objects with n attributes, and d(x, y) is the Euclidean distance between them.
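A direct translation of this formula into R (the example vectors are arbitrary):
```
# Euclidean similarity: 1 / (1 + Euclidean distance)
euclid_sim <- function(x, y) {
  1 / (1 + sqrt(sum((x - y)^2)))
}
euclid_sim(c(1, 2, 3), c(2, 4, 3))  # about 0.31
```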
2. Jaccard’s index is a similarity coefficient that measures the overlap between two data objects, which
are usually represented as sets. It is calculated as the ratio of the size of the intersection of the two
sets to the size of their union. The larger the intersection, the higher the similarity. The formula is:
sim(x, y) = |x ∩ y| / |x ∪ y|
Where x and y are two data sets, and |x| denotes the size of the set x.
3. Cosine similarity coefficient is based on the angle between two data objects, which are usually represented as vectors. It is calculated as the dot product of the two vectors divided by the product of their magnitudes. The smaller the angle, the higher the similarity. The formula is:
sim(x, y) = (x ⋅ y) / (|x| |y|)
Where x ⋅ y is the dot product of the two vectors, and |x| denotes the magnitude of the vector x.
Hierarchy generation is a process of creating a concept hierarchy for a given attribute or data set. A
concept hierarchy is a sequence of mappings from a set of low-level concepts to a set of high-level
concepts, based on some criteria of importance or abstraction.
For example, a concept hierarchy for the attribute “city” can be generated by mapping it to higher-level
concepts such as “state”, “country”, and “continent”.
Data Discretization and Hierarchy generation are often used together to provide a hierarchical or multi-resolution partitioning of the data values, which can enable mining at different levels of abstraction.
2. Binning: This is a technique that groups the data values into a number of bins and smooths them using, for example, the bin mean, median, or boundary values.
3. Cluster analysis: This is a technique that partitions the data values into clusters, based on some
distance measure, such as Euclidean, Manhattan, or Jaccard.
4. Decision tree analysis: This is a technique that splits the data values into disjoint intervals, based on some splitting criterion, such as entropy, information gain, or Gini index.
5. Correlation analysis: This is a technique that merges the data values into overlapping intervals,
based on some correlation measure, such as linear regression, Pearson, or Spearman.
5 Explain some important data discretization techniques.
A Data Discretization is a technique used to transform continuous or numerical data into discrete or
categorical data. It can help reduce the complexity and dimensionality of the data, as well as enhance
the features and performance of the data analysis and machine learning models.
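For instance, binning with base R's cut() function; the ages and bin edges below are arbitrary:
```
# Discretize a numeric attribute into labeled intervals
ages <- c(5, 17, 23, 34, 41, 58, 62, 79)
age_groups <- cut(ages, breaks = c(0, 18, 40, 65, 100),
                  labels = c("child", "young adult", "middle-aged", "senior"))
table(age_groups)  # counts per interval
```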
UNIT - 5
Short Answer Questions
1 How are R variables and R data types declared in RStudio?
A In R, variables are declared using the assignment operator <-; the data type is inferred from the assigned value rather than declared explicitly.
You can check the data type of an R object using functions such as typeof(), mode(), storage.mode(), class(), and str().
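For example:
```
x <- 42.5        # numeric (double)
n <- 10L         # integer (note the L suffix)
s <- "hello"     # character
flag <- TRUE     # logical

typeof(x)   # "double"
class(n)    # "integer"
mode(s)     # "character"
str(flag)   # logi TRUE
```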
2 Write a short note on vectors. Write the syntax for implementation of vectors in R.
A Vectors are a fundamental data structure in R. A vector is an ordered collection of values that are of the
same data type. R has five main data types: numeric, integer, complex, character, and logical.
To create a vector, you can use the c() function and separate the items by a comma.
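For example (the values are arbitrary):
```
fruits <- c("apple", "banana", "cherry")   # character vector
scores <- c(90, 85.5, 78)                  # numeric vector
scores[2]                                  # access the second element: 85.5
```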
To create a vector of numerical values, you can use the : operator to create a sequence of numbers:
numbers <- 1:10
• Download the latest version of RStudio from their official website using the following command:
wget https://download1.rstudio.org/desktop/bionic/amd64/rstudio-1.4.1717-amd64.deb
• Install the downloaded package by running the following command:
sudo dpkg -i rstudio-1.4.1717-amd64.deb
• If there are any missing dependencies, install them by running the following command:
sudo apt-get -f install
2 How can we implement different Matrix operations in R? Explain with examples.
A Matrices are a fundamental data structure in R. You can perform various operations on matrices, such as addition, subtraction, and multiplication, as well as computing the power, rank, determinant, diagonal, eigenvalues and eigenvectors, and transpose, and decomposing the matrix by different methods.
1. Addition: The `+` operator is used to add two matrices. For example, to add two matrices `A` and
`B`, you would write:
```
A <- matrix(c(10, 8, 5, 12), ncol = 2, byrow = TRUE)
B <- matrix(c(5, 3, 15, 6), ncol = 2, byrow = TRUE)
A+B
```
2. Multiplication: The `%*%` operator is used to multiply two matrices. For example, to multiply two
matrices `A` and `B`, you would write: A %*% B
3. Transpose: The `t()` function is used to find the transpose of a matrix. For example, to find the
transpose of a matrix `A`, you would write: t(A)
4. Determinant: The `det()` function is used to find the determinant of a matrix. For example, to find the
determinant of a matrix `A`, you would write: det(A)
5. Rank: The `qr()` function is used to find the rank of a matrix. For example, to find the rank of a matrix
`A`, you would write: qr(A)$rank
3 Explain the logical operators in R with examples.
A R provides the logical operators & (AND), | (OR), ! (NOT), and xor() (exclusive OR). The examples below assume the following values:
a <- 5
b <- 10
c <- TRUE
d <- FALSE
# AND operator
if (a > 3 & b < 15) {
print("Both conditions are true")
}
#output: "Both conditions are true"
# OR operator
if (a > 3 | b < 5) {
print("At least one condition is true")
}
#output: "At least one condition is true"
# NOT operator
if (!d) {
print("d is false")
}
#output: "d is false"
# XOR operator
if (c & !d | !c & d) {
print("One condition is true and the other is false")
}
#output: "One condition is true and the other is false"
4 Elaborate the concept of data frames in R using an example.
A A data frame is a two-dimensional data structure in R that stores data in tabular format. It is similar to a
matrix, but unlike a matrix, it can store different data types in each column. A data frame has rows and
columns, and each column can be a different vector. You can think of a data frame as a spreadsheet or a
SQL table.
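A minimal example that would produce the output shown below (the values are taken from that output):
```
# A data frame with a character, a numeric, and a logical column
df <- data.frame(
  name    = c("Riyan", "Bilal", "faisal"),
  age     = c(21, 19, 20),
  married = c(TRUE, FALSE, TRUE)
)
df
```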
output:
name age married
1 Riyan 21 TRUE
2 Bilal 19 FALSE
3 faisal 20 TRUE
5 Show the implementation of functions in R. Explain with example
A • Functions are a fundamental concept in R programming.
• A function is a block of code that performs a specific task. It takes input, performs some operations
on the input, and returns output.
• Functions are used to break down complex problems into smaller, more manageable parts.
To write a function in R, we use the "function" keyword followed by a pair of parentheses, inside which we
specify the input arguments. The body of the function is contained within a pair of curly braces, and
within this body, we can perform any operations we want on the input arguments and return the desired
output.
Example of a simple function that takes a numeric vector as input and returns the sum of its
elements:
# Define the function
sum_vector <- function(x) {
# Calculate the sum of the vector elements
sum_x <- sum(x)
# Return the sum
return(sum_x)
}
The function first calculates the sum of the elements in the input vector using the "sum()" function and
then returns this value using the "return()" function. When we call the function with a sample vector, it
will output the sum of the vector elements.
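For example, a sample call:
```
sum_vector(c(1, 2, 3, 4))
# [1] 10
```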