
RapidMiner: Data Science Platform Guide

The document provides an overview of RapidMiner, an open-source data science platform. It then outlines the steps to install RapidMiner on a Windows system and provides a real example of using RapidMiner to build a decision tree model to predict customer behavior using sample data.
RAPIDMINER
THE BEST DATA SCIENCE PLATFORM IN THE WORLD

International School
Group 2

Dinh Quoc Thinh, Vuong Thao Chi, Hoang Dieu Linh, Ta Dang Quang, Bui Tuong Minh Quang
Outline of Presentation
Part 1: Overview of RapidMiner
Part 2: How to install RapidMiner
Part 3: Real example using RapidMiner
Part 1: Overview
RapidMiner facilitates data integration by connecting to different sources such as databases,
spreadsheets, and big data platforms. It offers ETL (extract, transform, load) tools to
preprocess and clean data.

RapidMiner's visual interface aids data preparation with tasks such as missing value imputation, outlier
detection, feature selection, and normalization, using various operators and transformations.
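
Two of the data preparation tasks named above, missing value imputation and normalization, can be sketched in plain Python. This is only an illustration of the ideas, not how RapidMiner implements its operators; the function names and the `ages` column are made up for the example.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up column with one missing entry (assumption, not from the slides).
ages = [20, None, 40, 60]
filled = impute_mean(ages)
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```

Mean imputation and min-max scaling are only one choice each; median imputation or z-score standardization would follow the same pattern.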

For machine learning, RapidMiner offers various algorithms (classification, regression, clustering,
association rules, time series analysis) through a user-friendly drag-and-drop interface, so
predictive models can be created without programming skills.
RapidMiner offers model evaluation techniques such as cross-validation and holdout validation, along with
visualizations and statistical metrics to interpret and compare model performance.
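
The difference between holdout validation and cross-validation comes down to how rows are assigned to training and testing. The sketch below shows both splits on row indices only; the helper names are hypothetical, no shuffling is done, and it is not meant to mirror RapidMiner's own validation operators.

```python
def holdout_split(n_rows, test_fraction):
    """Split row indices once into a training part and a held-out test part."""
    n_test = int(round(n_rows * test_fraction))
    indices = list(range(n_rows))
    return indices[:n_rows - n_test], indices[n_rows - n_test:]

def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists so that every row is tested exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

train_idx, test_idx = holdout_split(10, 0.3)
folds = list(k_fold_indices(10, 5))
print(train_idx, test_idx)
for train, test in folds:
    print(test)
```

With holdout, a model is scored once on the held-out rows; with k folds, it is trained and scored k times and the scores are averaged.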

RapidMiner facilitates model deployment by offering options to export models in formats such as PMML or
as executable code for integration into other systems.

RapidMiner's Auto Model feature automates model selection and hyperparameter tuning, finding the best
algorithm and settings for the data to save time and effort in modeling.
Part 2: How to install RapidMiner
RapidMiner is available on multiple operating systems, including macOS and Microsoft Windows.

Because the installation process is the same on both operating systems, this presentation focuses on how to install RapidMiner on Windows.
Step 1: Visit the official RapidMiner website at [Link] in any web browser. Click the DOWNLOAD button.

Step 2: Clicking the DOWNLOAD button redirects to another web page. Click the Downloads button next to My Account.

Step 3: To download RapidMiner, select the 64-bit Windows installer on that page. The 273 MB file will begin downloading.
Step 4: Find the executable file in your Downloads folder and run it.

Step 5: Windows will ask for confirmation to make changes to your system. Click Yes.

Step 6: The installation process then starts and takes barely a minute to complete.

Step 7: RapidMiner is now installed on the system, and an icon is created on the desktop.

Step 8: Run the software; its files will be initialized.

Step 9: RapidMiner starts successfully and the interface is initialized.


Part 3: Real example of using RapidMiner
Step 1: From the Repository, choose Samples, then Data, then the data file named “Deals”, and drag it into the Process.

This data set includes information about age, gender, and payment method, plus the answer to whether or not the person will be a future customer. If positive, the result is Yes; if negative, the result is No.
Step 2: Find “Set Role” in Operators, then drag it into the Process.

Step 3: Draw a connection line from the “out” port of “Deals” to the “exa” input port of “Set Role”.
Step 4: Click “Set Role”, then choose “Future Customer” for the “Attribute Name”.
For the “Target Role”, choose “label”.

=> Finally, click “Apply”.
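
In code terms, Step 4 amounts to marking one column as the prediction target and treating the rest as inputs. A minimal sketch in plain Python, assuming made-up rows and assumed column names (only “Future Customer” comes from the steps above; `set_label_role` is a hypothetical helper, not a RapidMiner API):

```python
def set_label_role(rows, label_name):
    """Separate the label column from the remaining attribute columns."""
    features = [
        {name: value for name, value in row.items() if name != label_name}
        for row in rows
    ]
    labels = [row[label_name] for row in rows]
    return features, labels

# Made-up example rows; the real "Deals" data is not reproduced here.
rows = [
    {"Age": 25, "Gender": "male", "Payment Method": "cash", "Future Customer": "yes"},
    {"Age": 58, "Gender": "female", "Payment Method": "credit card", "Future Customer": "no"},
]

features, labels = set_label_role(rows, "Future Customer")
print(features)
print(labels)
```

Once the label is set apart like this, a learner knows which column to predict and which columns to predict it from.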


Step 5: Similarly, type “Decision Tree” in Operators, then drag it into the Process.

Step 6: Click “Decision Tree” and leave its Parameters at their default values.

Step 7: Connect the operators as shown in the slide's screenshot.

Step 8: Select the blue arrow icon at the top to run the process.

Result: The software automatically draws a decision tree based on the input data.
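
To give a feel for how a decision tree picks its splits, the sketch below finds the single best threshold on one numeric attribute, which is the first step of growing a tree. It scores splits with Gini impurity as a stand-in; that is an assumption for illustration and not necessarily the criterion RapidMiner's operator applies. The ages and labels are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # The largest value is skipped: splitting there leaves the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Made-up toy data, not the "Deals" sample set.
ages = [22, 25, 31, 45, 52, 60]
future_customer = ["yes", "yes", "yes", "no", "no", "no"]

threshold, impurity = best_threshold(ages, future_customer)
print(threshold, impurity)
```

A full tree repeats this search over every attribute and then recurses into each side of the chosen split until a stopping rule is met.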
Step 9: Choose another data set named “Deals-Testset”, then drag it into the Process.

This is a test set used to predict whether each person will become a customer in the future (the data set to test with after interviewing customers).
Step 10: Choose “Apply Model” in Operators.

Step 11: Connect the lines as shown in the slide's screenshot.

Step 12: Search for “Performance” in Operators, click “Performance (Classification)”, then drag it into the Process.

Step 13: Connect the lines as shown in the slide's screenshot.

Step 14: As in Step 8, select the blue arrow icon at the top to run the process.

Result:
The results table shows the following.
Class precision:
- In “Predicted yes”, 232 examples are true yes and 17 are true no => The share of true yes among those classified as yes is 93.17%
- In “Predicted no”, 246 examples are true no and 5 are true yes => The share of true no among those classified as no is 98.01%
Class recall:
- The share of correctly predicted yes among all true yes examples is 97.89%
- The share of correctly predicted no among all true no examples is 93.54%
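
The precision and recall figures above can be reproduced from the four counts in the results table. The variable names below are chosen for this sketch; the counts themselves are the ones reported above.

```python
# Counts from the results table: (true class, predicted class).
true_yes_pred_yes = 232
true_no_pred_yes = 17
true_no_pred_no = 246
true_yes_pred_no = 5

# Precision: of everything predicted as a class, how much really is that class.
precision_yes = true_yes_pred_yes / (true_yes_pred_yes + true_no_pred_yes)
precision_no = true_no_pred_no / (true_no_pred_no + true_yes_pred_no)

# Recall: of everything that really is a class, how much was predicted as it.
recall_yes = true_yes_pred_yes / (true_yes_pred_yes + true_yes_pred_no)
recall_no = true_no_pred_no / (true_no_pred_no + true_no_pred_yes)

print(f"precision yes: {precision_yes:.2%}")
print(f"precision no:  {precision_no:.2%}")
print(f"recall yes:    {recall_yes:.2%}")
print(f"recall no:     {recall_no:.2%}")
```

Precision divides by a row of predictions (249 predicted yes, 251 predicted no), while recall divides by a column of true classes (237 true yes, 263 true no).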

Compare:
Precision of “No” and “Yes”: The predictions of “No” are more precise than the predictions of “Yes” (98.01% > 93.17%).
Recall of “No” and “Yes”: The true “Yes” rate is high (97.89%), meaning few truly “Yes” examples are missed. Fewer truly “Yes” examples are missed than truly “No” examples (97.89% > 93.54%).
Performance Vector:
Accuracy: 95.60% => The model classifies 95.60% of the test examples correctly.
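
As a quick check, the accuracy follows from the same counts reported in the results: correct predictions divided by all predictions. The variable names are chosen for this sketch.

```python
# Correctly classified examples: true yes predicted yes, true no predicted no.
correct = 232 + 246
# Misclassified examples: true no predicted yes, true yes predicted no.
wrong = 17 + 5

total = correct + wrong
accuracy = correct / total
print(total)
print(f"accuracy: {accuracy:.2%}")
```

The 22 misclassified examples out of 500 account for the remaining 4.40%.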
Thank you!
Do you have any questions so far?