Article Review 3 Eng

The document discusses Python and R, two popular programming languages for data science. Python is a general purpose language suited for data manipulation, machine learning, and production systems, while R focuses on statistical analysis. Key differences include Python's versatility in data collection and R's strengths in statistical modeling and visualization. The best choice depends on one's experience, colleagues, problem to be solved, and visualization needs. Many organizations use both languages together.

Uploaded by

Cecilia Fauziah

Programming Language

Scripting
Programming for Data Science
Table of Contents
Introduction
What is Python?
What is R?
The main difference between R and Python: Data analysis goals
Other key differences
Python vs. R: Which is right for you?
Scripting Languages for Data Science
Why Use Programming and Scripting Languages for Data Science?
CRUD Dataframe and Table, Merge, Export, Import
Create Dataframe
Read Dataframe
Update Dataframe
Delete Dataframe
Export Dataframe
Import Dataframe
Function and Class
Object-Oriented Programming (OOP)
Functions
Classes
Use Case
References

Introduction
This review explores the basics of Python and R, two open-source programming
languages popular for data science, the key differences that set them apart, and
how to choose the right one for your situation. If you work in data science or
analytics, you’re probably well aware of the Python vs. R debate. Both languages
are driving advances in artificial intelligence, machine learning, and
data-driven innovation, but each has strengths and weaknesses that come into play.
In many ways, the two open-source languages are very similar. Free to
download for everyone, both languages are well suited for data science tasks,
from data manipulation and automation to business analysis and big data
exploration. The main difference is that Python is a general-purpose programming
language, while R has its roots in statistical analysis. Increasingly, the question isn’t
which to choose, but how to make the best use of both programming languages for
your specific use cases.

What is Python?
Python is a general-purpose, object-oriented programming language that
emphasizes code readability through its use of significant indentation. First
released in 1991, Python is easy to learn and a favorite of programmers and
developers. In fact, Python is one of the most popular programming languages in
the world.
Several Python libraries support data science tasks, including the following:
● NumPy for handling large multidimensional arrays
● Pandas for data manipulation and analysis
● Matplotlib for building data visualizations

Plus, Python is particularly well suited for deploying machine learning at a
large scale. Its suite of specialized deep learning and machine learning libraries
includes tools like scikit-learn, Keras and TensorFlow, which enable data scientists
to develop sophisticated data models that plug directly into a production system.
In addition, Jupyter Notebook is an open-source web application for easily sharing
documents that contain your live Python code, equations, visualizations and data
science explanations.

What is R?
R is an open source programming language that’s optimized for statistical
analysis and data visualization. Developed in 1992, R has a rich ecosystem with
complex data models and elegant tools for data reporting. At last count, more than
13,000 R packages were available via the Comprehensive R Archive Network (CRAN)
for deep analytics.
Popular among data science scholars and researchers, R provides a broad
variety of libraries and tools for the following:
● Cleansing and prepping data
● Creating visualizations
● Training and evaluating machine learning and deep learning algorithms
R is commonly used within RStudio, an integrated development environment
(IDE) for simplified statistical analysis, visualization and reporting. R applications can
be used directly and interactively on the web via Shiny.

The main difference between R and Python: Data analysis goals
The main distinction between the two languages is in their approach to data
science. Both open source programming languages are supported by large
communities, continuously extending their libraries and tools. But while R is mainly
used for statistical analysis, Python provides a more general approach to data
wrangling.
Python is a multi-purpose language, much like C++ and Java, with a readable
syntax that’s easy to learn. Programmers use Python to delve into data analysis or
use machine learning in scalable production environments. For example, you might
use Python to build face recognition into your mobile API (Application Programming
Interface) or for developing a machine learning application.
R, on the other hand, is built by statisticians and leans heavily into statistical
models and specialized analytics. Data scientists use R for deep statistical analysis,
supported by just a few lines of code and beautiful data visualizations. For example,
you might use R for customer behavior analysis or genomics research.

Other key differences

● Data collection: Python supports all kinds of data formats, from
comma-separated value (CSV) files to JSON (JavaScript Object Notation)
sourced from the web. You can also import SQL tables directly into your
Python code. For data on the web, the Python requests library lets you
easily download it for building datasets. In contrast, R is designed
for data analysts to import data from Excel, CSV and text files. Files built in
Minitab or in SPSS (Statistical Package for the Social Sciences) format can
also be turned into R data frames. While Python is more versatile for pulling
data from the web, modern R packages like rvest are designed for basic web
scraping.
● Data exploration: In Python, you can explore data with Pandas, the data
analysis library for Python. You’re able to filter, sort and display data in a
matter of seconds. R, on the other hand, is optimized for statistical analysis of
large datasets, and it offers a number of different options for exploring data.
With R, you’re able to build probability distributions, apply different statistical
tests, and use standard machine learning and data mining techniques.
● Data modeling: Python has widely used libraries for data modeling, including
NumPy for numerical modeling analysis, SciPy for scientific computing and
calculations and scikit-learn for machine learning algorithms. For specific
modeling analysis in R, you’ll sometimes have to rely on packages outside of
R’s core functionality. But the specific set of packages known as the
Tidyverse makes it easy to import, manipulate, visualize and report on data.
● Data visualization: While visualization is not a strength in Python, you can use
the Matplotlib library for generating basic graphs and charts. Plus, the
Seaborn library allows you to draw more attractive and informative statistical
graphics in Python. However, R was built to demonstrate the results of
statistical analysis, with the base graphics module allowing you to easily
create basic charts and plots. You can also use ggplot2 for more advanced
plots, such as complex scatter plots with regression lines.
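
The data collection and exploration points above can be illustrated with a minimal Pandas sketch. The CSV text, the JSON payload, and every column name here are hypothetical stand-ins for a real file or web response; they are not taken from the article.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "customer,region,spend\nAna,East,120\nBudi,West,80\nCitra,East,200\n"
customers = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON payload, shaped like a response from a web API.
json_text = '[{"customer": "Ana", "visits": 3}, {"customer": "Citra", "visits": 5}]'
visits = pd.DataFrame(json.loads(json_text))

# Exploration: keep the East region, then sort by spend, highest first.
east = customers[customers["region"] == "East"].sort_values("spend", ascending=False)
print(east)
print(visits)
```

With a real file, `pd.read_csv` would typically be given a path instead of an in-memory buffer.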

Python vs. R: Which is right for you?
Choosing the right language depends on your situation. Here are some things to
consider:
● Do you have programming experience? Thanks to its easy-to-read syntax,
Python has a learning curve that’s linear and smooth. It’s considered a good
language for beginning programmers. With R, novices can be running data
analysis tasks within minutes. But the complexity of advanced functionality in
R makes it more difficult to develop expertise.
● What do your colleagues use? R is a statistical tool used by academics,
engineers and scientists, many of whom have no programming background.
Python is a production-ready language used in a wide range of industry,
research and engineering workflows.
● What problems are you trying to solve? R programming is better suited for
statistical learning, with unmatched libraries for data exploration and
experimentation. Python is a better choice for machine learning and
large-scale applications, especially for data analysis within web applications.
● How important are charts and graphs? R applications are ideal for visualizing
your data in beautiful graphics. In contrast, Python applications are easier to
integrate in an engineering environment.
● Note that many tools, such as Microsoft Machine Learning Server, support
both R and Python. That’s why many organizations use a combination of both
languages, and the R vs. Python debate is often beside the point. In fact, you
might conduct early-stage data analysis and exploration in R and then switch to
Python when it’s time to ship some data products.

Scripting Languages for Data Science

In addition to programming languages, data scientists also use scripting languages.
Scripting languages are interpreted languages that are typically used to automate
tasks. The most popular scripting languages for data science are Python, Bash, and
SQL. Python is a general-purpose scripting language that is easy to learn and use. It
is also very powerful, making it suitable for a wide variety of data science tasks.
Bash is a Unix shell scripting language that is used to automate tasks on Unix and
Linux systems. SQL is a database query language that is used to interact with
databases. Scripting languages help data scientists extract insights from
raw data through effective data manipulation and analysis. With Python's Pandas
library, for instance, data can be cleansed, transformed, and aggregated in a few
lines of code. This ability to reshape data is essential for creating a consistent
and structured foundation for analysis. Using Pandas' syntax, data scientists can
filter, sort, and group data to uncover patterns and trends.

In addition to data manipulation, scripting languages offer a rich ecosystem of
statistical and machine learning tools. Libraries such as scikit-learn in Python and
caret in R provide pre-built algorithms for classification, regression, clustering, and
more. Through scripting, data scientists can experiment with various models,
fine-tuning parameters and evaluating performance metrics to select the best-fit
solution for their data.
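
As a rough illustration of the cleansing, transforming, and aggregating steps mentioned in this section, the sketch below uses a small made-up sales table. The column names and values are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical raw sales records with one missing amount.
raw = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "amount": [100.0, None, 50.0, 70.0],
})

# Cleanse: drop rows where the amount is missing.
clean = raw.dropna(subset=["amount"]).copy()

# Transform: add a derived column (amount in thousands).
clean["amount_k"] = clean["amount"] / 1000

# Aggregate: total amount per store.
totals = clean.groupby("store")["amount"].sum()
print(totals)
```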

Why Use Programming and Scripting Languages for Data Science?

There are several reasons why data scientists use programming and scripting
languages. First, these languages allow data scientists to automate tasks. This can
save a lot of time and effort, especially when working with large datasets. Second,
these languages allow data scientists to write code that is reusable and portable.
This means that the code can be used on different computers and with different
datasets. Third, these languages allow data scientists to create custom tools and
applications. This can be useful for solving specific problems or for conducting
research.

CRUD Dataframe and Table, Merge, Export, Import

Performing CRUD (Create, Read, Update, Delete) operations on a DataFrame in
Python is a common task, especially when working with data analysis and
manipulation. Additionally, merging, exporting, and importing data are also essential
operations. Below is a guide on how to perform these operations using the Pandas
library in Python.
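
Merging is named in this section's title but has no subsection of its own, so here is a minimal sketch of one common approach, a left join with `DataFrame.merge`. The two tables and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share a customer_id key.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Budi", "Citra"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "total": [25, 40, 15]})

# Left join: keep every customer, matched to their orders where any exist.
merged = customers.merge(orders, on="customer_id", how="left")
print(merged)
```

Customers without orders stay in the result with a missing `total`; passing `how="inner"` would drop them instead.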

Create Dataframe

Figure 1. Create Dataframe
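
The original figure is an image and is not reproduced in this text. A minimal sketch of creating a DataFrame from a dictionary might look like the following; the column names and values are assumptions, not the figure's contents.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column.
data = {
    "name": ["Ana", "Budi", "Citra"],
    "age": [28, 35, 42],
    "city": ["Jakarta", "Bandung", "Surabaya"],
}
df = pd.DataFrame(data)
print(df)
```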

Read Dataframe

Output:

Figure 2. Read Dataframe
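
Since the figure is not available here, the sketch below shows a few common ways to read from a DataFrame. The table itself is hypothetical.

```python
import pandas as pd

# Hypothetical table used only for this illustration.
df = pd.DataFrame({
    "name": ["Ana", "Budi", "Citra"],
    "age": [28, 35, 42],
    "city": ["Jakarta", "Bandung", "Surabaya"],
})

first_two = df.head(2)          # first two rows
ages = df["age"]                # a single column as a Series
budi_city = df.loc[1, "city"]   # one cell, by row label and column name
over_30 = df[df["age"] > 30]    # rows matching a condition

print(first_two)
print(budi_city)
print(over_30)
```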

Update Dataframe

Output:

Figure 3. Update Dataframe
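
In place of the missing figure, here is a minimal sketch of two typical updates: changing a value with `.loc` and adding a derived column. The data is hypothetical.

```python
import pandas as pd

# Hypothetical table used only for this illustration.
df = pd.DataFrame({"name": ["Ana", "Budi", "Citra"], "age": [28, 35, 42]})

# Update one value: set Budi's age using a boolean row selector.
df.loc[df["name"] == "Budi", "age"] = 36

# Add a new column derived from an existing one.
df["senior"] = df["age"] >= 40
print(df)
```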

Delete Dataframe

Output:

Figure 4. Delete Dataframe
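
The figure is not reproduced here. One way to delete columns or rows is `DataFrame.drop`, sketched below on hypothetical data. Note that `drop` returns a new DataFrame and leaves the original unchanged unless the result is reassigned.

```python
import pandas as pd

# Hypothetical table used only for this illustration.
df = pd.DataFrame({
    "name": ["Ana", "Budi", "Citra"],
    "age": [28, 35, 42],
    "city": ["Jakarta", "Bandung", "Surabaya"],
})

# Delete a column.
without_city = df.drop(columns=["city"])

# Delete a row by its index label.
without_ana = df.drop(index=0)

print(without_city)
print(without_ana)
```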

Export Dataframe

Figure 5. Export Dataframe
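
As a stand-in for the missing figure, this sketch exports a DataFrame to CSV with `to_csv`. The file name `people.csv` and the data are hypothetical, and a temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical table used only for this illustration.
df = pd.DataFrame({"name": ["Ana", "Budi"], "age": [28, 35]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")  # hypothetical file name
    # index=False leaves the row index out of the file.
    df.to_csv(path, index=False)
    with open(path) as f:
        exported_text = f.read()

print(exported_text)
```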

Import Dataframe

Figure 6. Import Dataframe
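
The figure is not available in this text. A minimal import sketch with `read_csv` follows; an in-memory buffer stands in for a real file path, and the CSV content is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; with a real file, pass its path to read_csv instead.
csv_text = "name,age\nAna,28\nBudi,35\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)
```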

Function and Class

Object-Oriented Programming (OOP)

Python supports object-oriented programming (OOP), and both functions and
classes play a crucial role in structuring Python code. Functions are the basic
unit of procedural programming, while classes support the creation of objects
and the encapsulation of data and behavior.

Understanding and using functions and classes is essential for writing clean,
modular, and maintainable code. They provide a way to organize and structure code
logically, making it easier to manage and extend.

Functions

A function in Python is a block of reusable code that performs a specific task.
Functions are defined using the def keyword, followed by the function name,
parameters in parentheses, and a colon. The function body is indented.

Example Functions:

Output:

Figure 7. Example Functions

In the example above, greet is a simple function that takes a name parameter and
prints a greeting. Functions can also return values using the return statement.
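
The original figure is not reproduced here, so the following is only a sketch of what such a function could look like. The exact greeting text and the second function, `add`, are assumptions added to show a return value.

```python
def greet(name):
    """Print a greeting for the given name."""
    print(f"Hello, {name}!")


def add(a, b):
    """Return the sum of two numbers."""
    return a + b


greet("Cecilia")
result = add(2, 3)
print(result)
```

`greet` only prints, so calling it yields `None`, while `add` hands its result back to the caller through `return`.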

Classes

A class is a blueprint for creating objects. Objects are instances of a class, and
classes define attributes (characteristics) and methods (functions) that operate on
those attributes. Classes are defined using the class keyword.

Output:

Figure 8. Example Class

In this example, Dog is a class with an __init__ method (constructor) that initializes
the attributes name and age. The bark method is a behavior associated with the Dog
class.
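
Because the figure is missing from this text, the class below is a reconstruction from the description only. The wording returned by `bark` and the sample values are assumptions.

```python
class Dog:
    """A dog with a name and an age."""

    def __init__(self, name, age):
        # The constructor stores the attributes on the new instance.
        self.name = name
        self.age = age

    def bark(self):
        # A behavior that uses the instance's own data.
        return f"{self.name} says woof!"


my_dog = Dog("Rex", 3)  # hypothetical example values
print(my_dog.bark())
print(my_dog.age)
```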

Use Case

Background and Problem Statement

You are a data scientist at ID/X Partners working on a data science project. The
project involves building a machine learning model to predict whether customers
will churn.

Create a function to evaluate the machine learning model that has been created.
The function takes the actual results and the predicted results as input (the
solution below also accepts a model name to label the printout) and must output
the accuracy and precision metrics.

Solution
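from sklearn.metrics import confusion_matrix


def classification_eval(actual, predicted, name):
    """Print and return accuracy and precision (in percent) for a binary classifier."""
    # Rows are actual classes and columns are predicted classes.
    # This assumes two classes, with the positive (churn) class as the second label.
    cm = confusion_matrix(actual, predicted)
    tp = cm[1][1]
    tn = cm[0][0]
    fp = cm[0][1]
    fn = cm[1][0]

    accuracy = round((tp + tn) / (tp + tn + fp + fn) * 100, 2)
    # Guard against division by zero when no positives were predicted.
    precision = round(tp / (tp + fp) * 100, 2) if (tp + fp) > 0 else 0.0

    print('Evaluation Model:', name)
    print(cm)
    print('Accuracy :', accuracy, '%')
    print('Precision :', precision, '%')
    return accuracy, precision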

This function evaluates a classification model based on the predictions and the
actual values (ground truth) provided. It works in three steps:

Calculating Confusion Matrix: Using predicted and actual values, this function
calculates the confusion matrix (cm) using the confusion_matrix function.

Calculating Evaluation Metrics: After getting the confusion matrix, this function
calculates the following evaluation metrics:

Accuracy: The proportion of correct predictions out of all predictions made,
(tp + tn) / (tp + tn + fp + fn).
Precision: The proportion of correct positive predictions out of all positive
predictions made, tp / (tp + fp).

Displaying Evaluation Results: After calculating the metrics above, this function
prints the model evaluation results: the confusion matrix, accuracy, and precision.
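
To make the accuracy and precision arithmetic concrete, here is a dependency-free sketch that counts the confusion-matrix cells by hand. The label lists are made up for illustration, with 1 assumed to mean "churn" and 0 "no churn".

```python
# Hypothetical ground truth and predictions (1 = churn, 0 = no churn).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

accuracy = round((tp + tn) / (tp + tn + fp + fn) * 100, 2)
precision = round(tp / (tp + fp) * 100, 2)

print("tp, tn, fp, fn:", tp, tn, fp, fn)
print("Accuracy :", accuracy, "%")
print("Precision :", precision, "%")
```

Here six of the eight predictions are correct, and three of the four positive predictions are correct, so both metrics come out to 75 percent.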

References

https://www.ibm.com/cloud/blog/python-vs-r

https://www.python.org/about/gettingstarted/

http://www.sthda.com/english/wiki/r-basics-quick-and-easy

https://towardsdatascience.com/python-basics-for-data-science-6a6c987f2755

https://medium.com/datactw/a-complete-introduction-to-r-for-data-science-1858c69f76b0

https://towardsdatascience.com/getting-started-with-r-programming-2f15e9256c9

https://r4ds.had.co.nz/index.html

https://dplyr.tidyverse.org/

https://datacarpentry.org/R-ecology-lesson/03-dplyr.html

https://medium.com/analytics-vidhya/python-data-manipulation-fb86d0cdd028
