UNIT 05 Data Science PDF

Reproducible research in data science emphasizes the importance of making data, code, and analysis methods publicly accessible for verification and collaboration, thereby enhancing transparency and trust in scientific findings. Key tools for achieving reproducibility include version control systems like Git, dynamic document generation with R Markdown, and containerization with Docker. R Markdown is particularly useful for documenting analyses, allowing for easy sharing and conversion into various formats, while tools like knitr facilitate the integration of R code into reports.

Uploaded by

faseeha1812

Reproducible research in data science involves making research findings, including
data analysis and code, publicly accessible so others can verify and build upon
them. This ensures transparency and allows for scrutiny of the process and results,
promoting trust in scientific claims.
Key aspects of reproducible research in data science:
Transparency:
Sharing data, code, and analysis methods allows others to understand how results
were obtained.
Verification:
Others can attempt to reproduce the analysis using the provided materials to
confirm the original findings.
Building upon existing work:
A reproducible research workflow allows researchers to easily use and extend the
work of others, accelerating scientific progress.
Trust in science:
Reproducibility underpins trust in science by enabling others to verify the results
and identify potential errors or biases.
Increased accuracy:
Reproducible research increases the likelihood that the research is correct and
reliable, as it allows for more rigorous scrutiny.
Tools and techniques for reproducible research:
Version control:
Using tools like Git to track changes to code and data throughout the project,
allowing for easy rollback to previous versions.
Dynamic document generation:
Using tools like R Markdown to combine code, data, and plain language
explanations into a single document that can be easily executed and updated.
Containerization:
Using Docker to package the environment, including software and dependencies,
so that the code can be run consistently across different systems.
Data management and sharing:
Using tools and platforms to store and share data securely and efficiently, while
adhering to ethical and legal guidelines.
Workflow management:
Using tools and platforms to manage the different steps in the analysis pipeline,
from data collection to model training and evaluation.
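
As a lightweight complement to the tools listed above, the sketch below shows two habits that help an R analysis give the same result on a second run: fixing the random seed and recording the software environment. The function name run_analysis and the seed value 42 are hypothetical choices made for this illustration, not part of the original notes.

```r
# Hypothetical analysis step: summarise a simulated sample.
run_analysis <- function(n = 100) {
  x <- rnorm(n)
  c(mean = mean(x), sd = sd(x))
}

# Setting the same seed before each run makes the random draws repeat exactly.
set.seed(42)  # 42 is an arbitrary value chosen for this sketch
first_run <- run_analysis()

set.seed(42)
second_run <- run_analysis()

# TRUE when both runs produced exactly the same numbers.
print(identical(first_run, second_run))

# Record the R version and loaded packages alongside the results.
env_info <- sessionInfo()
print(env_info$R.version$version.string)
```

Saving the output of sessionInfo() with the results gives readers a record of the R version and packages that were in use, which is useful even when a full Docker image is not available.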

Why is reproducible research important in data science?

Complex data analyses:
Data science often involves complex data analyses and models, making
reproducibility essential for verifying the results and identifying potential errors.
Sharing and collaboration:
Data science research is often collaborative, and reproducibility facilitates the
sharing and reuse of data, code, and analysis methods.
Building trust:
Reproducibility builds trust in data science research by allowing others to verify
the results and identify potential biases.
Improving scientific progress:
Reproducibility accelerates scientific progress by allowing researchers to build
upon the work of others and to quickly identify and correct errors.

Tools behind reporting modern data analyses in reproducible research.

Reproducible research in data science relies on a combination of tools and
practices, including programming languages (like R or Python), version control
systems (like Git), and platforms for collaborative coding and sharing (like
GitHub). Tools like Jupyter notebooks and R Markdown allow for combining
code, data, and results in a dynamic format, while platforms like BinderHub
facilitate sharing of entire computing environments.
Here's a more detailed breakdown:
Programming Languages:
 R and Python: These are the dominant languages in data science and offer
extensive packages for statistical analysis, data manipulation, and machine
learning.
Version Control:
 Git:
A distributed version control system that allows tracking changes to code over
time and collaborating with others.
 GitHub:
A platform that hosts Git repositories, providing a centralized location for storing
and sharing code.
Collaborative Coding and Sharing:
 Jupyter Notebooks:
Interactive computing environments that allow combining code, text, and
visualizations in a single document.
 R Markdown:
A tool for creating reproducible reports and documents that combine R code, text,
and visualizations.

To write a document using R Markdown.

R Markdown is a file format for making dynamic documents with R.
You can create an R Markdown file to save, organize, and document your analysis
using code chunks and comments. An R Markdown file helps your team communicate
about an analysis, and it can also be used to summarize your visuals for
stakeholders. R Markdown documents are written in Markdown, a syntax for
formatting plain text files that is used to produce rich-format text in your
document.
Why use an R Markdown document?
Documenting your work makes it easy to share your analysis with anyone. R
Markdown lets you create a record of your analysis, conclusions, and decisions in a
document. It binds together your code and your report so you can share every step
of your analysis. R Markdown documents help stakeholders and team
members understand what you did in your analysis to reach your conclusions. There
is also an interactive option called R Notebook that lets the user run their code
and see the resulting graphs and charts alongside it. R Markdown also lets you
convert files into other formats such as HTML, PDF, Word documents, slide
presentations, and dashboards.
Creating an R Markdown document
R Markdown is a convenient tool for documenting your analysis, and it is
easy to create and run.
To create an R Markdown file, open RStudio and, in the menu bar, click File -> New File
-> R Markdown...
A dialog box opens.

In the dialog box, enter the name of the document in the Title box. A name
uniquely identifies your document and helps you
easily recognize what your analysis is about. For example, we use the penguin
dataset in this article, so the R Markdown file is named "Penguins_Plots".
In the Author box, enter the author's name.
Next, we can choose our output format. For now, leave the file in the default
output format, which is HTML.
Under Presentation, we can create a slide show from the R Markdown file.
Under Shiny, we can create a Shiny document or a Shiny presentation.
In From Template, we can use predefined te

Code Chunk
The next part of the R Markdown file, shown with a gray background, is the code chunk. We can run
code chunks at any time.

RStudio automatically adds a default code chunk with this formatting to the
document. A code chunk starts with three backticks followed by {r} and ends with
three backticks.
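
To make the chunk structure concrete, here is a minimal sketch that writes a small R Markdown file from R. The title reuses the "Penguins_Plots" name from the text; the narrative line, the chunk contents (summary(cars)), and the temporary file location are assumptions made for illustration only.

```r
# Build the chunk delimiter (three backticks) programmatically.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                   # YAML header: title and output format
  'title: "Penguins_Plots"',
  "output: html_document",
  "---",
  "",
  "Some narrative text written in Markdown.",
  "",
  paste0(fence, "{r}"),                    # opening delimiter of the code chunk
  "summary(cars)",                         # R code that runs when the file is knitted
  fence                                    # closing delimiter of the code chunk
)

# Write the document to a temporary folder and show its contents.
rmd_path <- file.path(tempdir(), "Penguins_Plots.Rmd")
writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")
```

Opening the resulting file in RStudio should show the header, the text, and one gray code chunk.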

An R Markdown document can be rendered in two ways:

Run rmarkdown::render("<file_path>")
Click the Knit button at the top of the document
The Knit drop-down menu includes three main options: HTML, PDF, and Word
document. You can use Knit to convert your file to any of these types.
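
The first option can be sketched as follows. This is an illustrative example that assumes the rmarkdown package and pandoc are installed; the helper render_if_possible and the file name render_demo.Rmd are hypothetical, and the helper simply skips rendering when the tools are missing.

```r
# Create a tiny R Markdown file to render (hypothetical demo content).
fence <- strrep("`", 3)
rmd_path <- file.path(tempdir(), "render_demo.Rmd")
writeLines(
  c("---", 'title: "Render demo"', "---", "",
    paste0(fence, "{r}"), "1 + 1", fence),
  rmd_path
)

# Hypothetical helper: render only when rmarkdown and pandoc are available.
render_if_possible <- function(input, output_format = "html_document") {
  if (!requireNamespace("rmarkdown", quietly = TRUE) ||
      !rmarkdown::pandoc_available()) {
    message("rmarkdown or pandoc not available; skipping render")
    return(NULL)
  }
  # render() returns the path of the generated output file.
  rmarkdown::render(input, output_format = output_format, quiet = TRUE)
}

html_out <- render_if_possible(rmd_path)
print(html_out)
```

Passing "word_document" or "pdf_document" as output_format corresponds to the other two Knit menu options; PDF output additionally needs a LaTeX installation.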

Knitr
The knitr package takes an input file, extracts the R code from it, runs it, and
writes the results into an output file. It is a dynamic report generation package.
knitr integrates R code into various document types such as HTML, Markdown, and
LaTeX. The steps below use kable(), a function from the knitr package, as an
example.
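
The input-to-output behaviour described above can be illustrated with knitr::knit(). This is a minimal sketch assuming knitr is installed; the file names knit_demo.Rmd and knit_demo.md and the chunk contents are hypothetical.

```r
# Write a small input file containing text and one R code chunk.
fence <- strrep("`", 3)
input_path <- file.path(tempdir(), "knit_demo.Rmd")
output_path <- file.path(tempdir(), "knit_demo.md")
writeLines(
  c("The mean of 1 to 10:", "",
    paste0(fence, "{r}"), "mean(1:10)", fence),
  input_path
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() runs the chunk and writes a Markdown file with the code and its result.
  knitr::knit(input_path, output = output_path, quiet = TRUE)
  cat(readLines(output_path), sep = "\n")
}
```

The output file is plain Markdown, which other tools can then convert to HTML, PDF, or Word.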

Step 1 - Install necessary library

install.packages('knitr')

library("knitr")

Step 2 - kable() in R

kable() is a function of knitr package, used for generating tables in R.

data <- iris  # using the built-in iris dataset (a data frame)
head(data)    # preview the first six rows

Step 3 - Converting into html format
html_file <- kable(data, format = "html")  # converting into html format
html_file

Step 4 - Converting into table format

# converting the first rows to simple table format, keeping row names
tab <- kable(head(data), format = "simple", row.names = TRUE)
tab
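
Beyond the format argument, kable() accepts a few other commonly used arguments. The sketch below shows digits, col.names, and caption on the first three rows of iris; the column labels and caption text are made up for illustration.

```r
library(knitr)

# Build a pipe-style Markdown table with custom labels and a caption.
tab <- kable(
  head(iris, 3),
  format = "pipe",
  digits = 1,                       # round numeric columns to one decimal place
  col.names = c("Sepal length", "Sepal width",
                "Petal length", "Petal width", "Species"),
  caption = "First three rows of iris"
)
tab
```

The "pipe" format produces a Markdown table, so the same call works unchanged inside an R Markdown code chunk.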
