Data cleaning (data preparation or preprocessing)

Uploaded by

Dang Phuc Vinh
Copyright © All Rights Reserved
Data Preparation and preprocessing

By Nguyen Tien Huy (Nagasaki University)

Using Excel database and SPSS

1. Copy all information into one data sheet


2. Save the file as version 2
3. Remember to “undo” if you make a mistake
4. Assign an ID number to each row (from 1 to the final row); this helps you locate the
row when you make a mistake
Find duplicate rows: select the name column (then address, birthday, or similar
information), selecting only the data rows (not the title row). == > Compare the whole
row to check whether it is a complete duplicate. If each of the two papers contributes
some new information, keep both. If one is larger or more complete, keep the larger one.
5. Delete unnecessary columns (such as noted information)
6. Correct inconsistent formats, such as:
.5-6.9 == > 0.5-6.9
Breastmilk, breast milk in different papers == > choose one spelling only
The ID of the original article should be uniform:
M. AlGhamdi/2014 == > choose one format only, e.g. AlGhamdi/2014/country
greece == > Greece
7. Correct strange characters using FIND/REPLACE
Find missing data by counting cells: the formula =COUNT(B3:B119) counts the cells
that contain numbers. Missing data should be left blank in Excel. == > Look back to the
original papers or source of data

8. Use the COUNTA function to count cells that are not empty
(=COUNTA(B3:B119))
9. Find outlier mistakes by sorting each column; check the Max and Min of the data
(is any value strangely large or small?)
== > Look back to the original papers or source of data
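
Steps 4 to 8 above can also be sketched outside Excel. The following is a minimal Python illustration, not part of the original workflow: the rows, the column names (`name`, `country`, `range`, `feeding`) and the helper functions are all hypothetical, and "Breastmilk" is an arbitrary choice of the single spelling to keep.

```python
import re
from collections import defaultdict

# Hypothetical rows standing in for the Excel sheet; column names are assumptions.
rows = [
    {"name": "AlGhamdi/2014", "country": "greece", "range": ".5-6.9", "feeding": "breast milk"},
    {"name": "AlGhamdi/2014", "country": "Greece", "range": "0.5-6.9", "feeding": "Breastmilk"},
    {"name": "Tanaka/2010", "country": "Japan", "range": "", "feeding": "Breastmilk"},
]

# Step 4: assign an ID from 1 to the final row.
for i, row in enumerate(rows, start=1):
    row["id"] = i


def fix_leading_decimal(text):
    """Turn '.5-6.9' into '0.5-6.9' by adding the zero before a bare decimal point."""
    return re.sub(r"(?<![\d.])\.(\d)", r"0.\1", text)


def clean_row(row):
    """Step 6: apply the same formatting rules to every row."""
    cleaned = dict(row)
    cleaned["range"] = fix_leading_decimal(row["range"].strip())
    cleaned["country"] = row["country"].strip().title()
    # Keep one spelling only; "Breastmilk" is an arbitrary choice for this sketch.
    if row["feeding"].replace(" ", "").lower() == "breastmilk":
        cleaned["feeding"] = "Breastmilk"
    return cleaned


def find_duplicates(rows, keys):
    """Group the IDs of rows that share the same values in the key columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def count_filled(rows, column):
    """Count non-blank cells, similar to COUNTA; blank cells are treated as missing."""
    return sum(1 for row in rows if str(row[column]).strip() != "")


cleaned = [clean_row(r) for r in rows]
duplicates = find_duplicates(cleaned, keys=["name", "country"])
missing_range = len(cleaned) - count_filled(cleaned, "range")

print(duplicates)     # [[1, 2]]
print(missing_range)  # 1
```

Rows 1 and 2 only show up as duplicates after the formats are corrected, which is one reason to fix the formats before comparing rows. The flagged IDs still need a manual comparison of the whole row, as described in step 4.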

10. Coding information for SPSS: make a coding table like the following table.
Variable names for SPSS cannot contain spaces; use “_” instead.

Number   Variables (for SPSS)   Variables (in Excel)   Data   Coding
2        Day_of_fever           Day of fever           days
15       Vomit                                         Yes    1
                                                       No     0
16       Petechiae                                     Yes    1
                                                       No     0
17       Convulsion                                    Yes    1
                                                       No     0
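
The coding rules in the table can be illustrated with a short Python sketch. The record, the `YES_NO` mapping and the two helper functions are hypothetical names introduced for illustration; treating a blank cell as missing (`None`) is an assumption consistent with the advice to leave missing data blank.

```python
# Hypothetical coding table for Yes/No variables, mirroring the rows shown above.
YES_NO = {"yes": 1, "no": 0}


def spss_name(excel_name):
    """Replace spaces with '_' so the column header can be used as an SPSS variable name."""
    return "_".join(excel_name.split())


def code_yes_no(value):
    """Map Yes/No to 1/0; a blank stays missing (None); any other value raises KeyError."""
    text = value.strip().lower()
    if text == "":
        return None
    return YES_NO[text]


# One hypothetical row as it might appear in the Excel sheet.
record = {"Day of fever": "3", "Vomit": "Yes", "Petechiae": "No", "Convulsion": ""}

coded = {}
for excel_name, value in record.items():
    name = spss_name(excel_name)
    # Day_of_fever is a count of days, so it is kept as a number rather than coded.
    coded[name] = int(value) if name == "Day_of_fever" else code_yes_no(value)

print(coded)
# {'Day_of_fever': 3, 'Vomit': 1, 'Petechiae': 0, 'Convulsion': None}
```

Raising an error on an unexpected value (for example "Y" or "yes?") is deliberate in this sketch: it forces a look back at the source rather than silently coding the cell.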

11. Click the SPSS icon

12. Check “Open an existing data source”
13. Select the Excel file, then select the sheet
14. There are two views, Data View and Variable View (see the lecture from

15. NOTES:

Continuous variables: set the measure to “Scale”

Categorical (non-continuous) variables: set the measure to “Nominal”, or to “Ordinal” if the categories have a natural order

16. Identify outliers, duplicates, sorting

Check detected cases in the raw data
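
The outlier check (sort a column, inspect its Min and Max, then trace suspicious values back by ID) can be sketched in Python as follows. The values and the plausible limits `LOW` and `HIGH` are assumptions made for this illustration only; in practice the limits depend on the variable.

```python
# Hypothetical (id, day_of_fever) pairs; None marks a missing value.
values = [(1, 3), (2, 5), (3, 45), (4, None), (5, 2), (6, -1)]

# Plausible limits are an assumption chosen for this illustration only.
LOW, HIGH = 0, 30

# Leave out missing values, then sort by value (the same idea as sorting the column).
present = [(row_id, v) for row_id, v in values if v is not None]
ordered = sorted(present, key=lambda pair: pair[1])
minimum, maximum = ordered[0], ordered[-1]

# IDs to look up in the raw data or the original papers.
to_check = [row_id for row_id, v in ordered if v < LOW or v > HIGH]

print(minimum, maximum)  # (6, -1) (3, 45)
print(to_check)          # [6, 3]
```

The sketch only lists the IDs to check; whether a flagged value is a typing mistake or a genuine extreme value has to be decided from the raw data.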

References:

R. Jason Weiss and Robert T Townsen. Using Excel to Clean and Prepare Data for
Analysis.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1198040/

http://survey.cvent.com/blog/conducting-online-surveys/7-steps-to-prepare-data-for-analysis
