Stata Interface (With Data and Commands, Windows)

This document provides a summary of Stata commands for data management, descriptive statistics, graphs, regression models, and other analyses. It includes explanations and examples of commands for importing and cleaning data, descriptive statistics, categorical variable analyses, plots, hypothesis testing, regression, and other analyses. Loops, data management techniques like preserve/restore, and a variety of modeling approaches like logistic regression are also demonstrated through examples.

B) Stata interface (with data and commands, windows)

Command           Explanation
clear             Clears the previous data set without saving the data
input x           Creates a new numeric variable called x
1
2
3                 These are the 3 values for x
end               The introduction of data has finished
summarize x       Descriptive statistics for variable x (the minimum
                  abbreviation, su, is enough to make the command work)

1. Description: summarize, detail


Command                         Explanation
set more off, perm              Suppresses the repeated "more" prompt (which
                                would otherwise keep asking you to press the
                                space bar to continue)
clear                           Clears the previous data set
version 12                      Do this only if you are using version 13 or
                                higher, to keep the routines (especially the
                                generation of random values) as they were in
                                version 12. Always skip this step if you are
                                using version 11 or older
set obs 100                     Creates 100 empty rows
set seed 333                    Sets a random seed: new data will be randomly
                                generated, but they will be the same when you
                                run this sequence of commands again
generate speed=rnormal(80,20)   Generates a new random variable called speed
                                with mean = 80 and standard deviation = 20
                                (approximately)
replace speed=20 in 1           You have changed the values of the variable
replace speed=33 in 2           speed: in row 1, speed=20; in row 2, speed=33;
replace speed=180 in 100        in row 100, speed=180
summarize speed, detail         Descriptive statistics are requested
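
The reproducibility that set seed provides can be checked directly. The lines below are a minimal sketch rather than part of the handout; the variable names a and b are illustrative assumptions.

```stata
* Minimal sketch: resetting the same seed reproduces the same random draws.
* The variable names a and b are illustrative, not taken from the handout.
clear
set obs 100
set seed 333
generate a = rnormal(80,20)
set seed 333
generate b = rnormal(80,20)
* assert stops with an error if any pair of values differs
assert a == b
```

If the second set seed line is removed, b gets fresh draws and the assert should fail.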

4. Labels and codes of variables: label codebook


clear
input smk weight sex
0 66 0
0 72 0
0 78 1
1 79 0
1 80 1
1 81 1
2 68 0
2 70 0
2 72 1
end
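
A possible way to use the commands named in this heading is sketched below. The label texts are assumptions made for illustration only, since the handout does not say what the codes of smk and sex stand for, and only three of the rows are repeated here.

```stata
* Sketch only: the meanings attached to the codes are assumed.
clear
input smk weight sex
0 66 0
1 79 0
2 72 1
end
* describe the variable itself
label variable smk "Smoking status"
* define value labels, then attach them to the variables
label define smk_lbl 0 "Never" 1 "Former" 2 "Current"
label values smk smk_lbl
label define sex_lbl 0 "Male" 1 "Female"
label values sex sex_lbl
* codebook reports type, range, labels and missing values
codebook smk sex
```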

5. Categorical variables I: tabulate tables tabstat


clear
input smoking weight
0 66
0 72
0 78
1 79
1 80
1 81
2 68
2 70
2 72
end

6. Categorical variables II: tab1 tab2


input sex smoke OH cancer n
1 1 1 1 50
0 1 1 1 40
1 0 1 1 10
0 0 1 1 8
1 1 0 1 25
0 1 0 1 20
1 0 0 1 5
0 0 0 1 4
1 1 1 0 650
0 1 1 0 860
1 0 1 0 690
0 0 1 0 892
1 1 0 0 675
0 1 0 0 880
1 0 0 0 695
0 0 0 0 896
end

7. Categorization and quantiles: recode, xtile


clear
version 12
set obs 500
set seed 1234
gen BMI=rnormal(24,4)
replace BMI=20 if BMI>19.5 & BMI<20.5
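
One way the two commands in this heading might be applied to BMI is sketched here. The cutoffs 18.5, 25 and 30 and the new variable names BMI_q4 and BMI_cat are assumptions for illustration; the handout does not give them.

```stata
* Sketch: quartiles with xtile and fixed categories with recode.
* Cutoffs and new variable names are assumed, not from the handout.
clear
set obs 500
set seed 1234
gen BMI=rnormal(24,4)
* four groups of (nearly) equal size, coded 1 to 4
xtile BMI_q4 = BMI, nquantiles(4)
* rules are applied left to right, so a value exactly on a
* boundary is taken by the first rule that matches it
recode BMI (min/18.5=1) (18.5/25=2) (25/30=3) (30/max=4), generate(BMI_cat)
tabulate BMI_cat BMI_q4
```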

8. Histograms: hist
clear
version 12
set obs 500
set seed 1234
gen x=rnormal(100,10)

9. Boxplots: graph box


clear
version 12
set obs 500
set seed 1234
gen x=rnormal(100,10)

10. Scatter plots: twoway


clear
input hours weight_chg
0 -1
0 0
1 0
2 .5
3 2.5
4 2
8 3
end

clear
input str15 country Chocolate Nobel
China 0.700 0.060
Brazil 2.900 0.050
Spain 3.600 1.701
Poland 3.600 3.124
Australia 4.500 5.451
Canada 3.900 6.122
Netherlands 4.500 11.356
USA 5.300 10.770
Ireland 8.800 12.706
Germany 11.600 12.668
UK 9.700 18.875
Switzerland 11.900 31.544
Sweden 6.480 31.855
end

11. More on twoway: bar line


clear
input mark n
3 1
4 5
6 4
7 7
8 10
9 12
10 4
end

clear
input lowfatdai averageSBP
1 140
2 132
3 128
4 122
5 110
end

12. Plots with confidence intervals and optional error bars: rcap

clear
version 12
set obs 500
set seed 999
g age=round(100*runiform())
replace age=age+20 if age<20
g SBP=95+(.4*age)+rnormal(0,15)

clear
input Q4 RR low upper
1 1 1 1
2 0.8 0.4 1.6
3 0.6 0.4 0.9
4 0.5 0.3 0.833
end

13. Text in graph, fine tuning of graphs


clear
input Q4 RR low upper
1 1 1 1
2 0.8 0.4 1.6
3 0.6 0.4 0.9
4 0.5 0.3 0.833
end

16. Dates and strings: mdy format substr display


clear
input str10 name day month year
Mary_Ann 3 3 1981
John 1 12 1979
Peter 23 1 2009
Grandpa 26 6 1927
end
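
The functions listed in this heading could be combined as in the sketch below. The data rows are a reading of the block above, and the new variable names birth and first4 are illustrative assumptions.

```stata
* Sketch: build a Stata date from day, month and year, then take a substring.
* birth and first4 are illustrative names, not from the handout.
clear
input str10 name day month year
Mary_Ann 3 3 1981
John 1 12 1979
Peter 23 1 2009
Grandpa 26 6 1927
end
* mdy() takes month, day, year (in that order) and returns days since 01jan1960
generate birth = mdy(month, day, year)
* %td only changes how the number is shown, not the stored value
format birth %td
* first four characters of the name
generate first4 = substr(name, 1, 4)
list name first4 birth
* day zero of the Stata calendar
display mdy(1, 1, 1960)
```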

17. Algebra: return list scalar + - * / ^


clear
version 12
set obs 500
set seed 999
g age=round(100*runiform())

20. Distributions: uniform binomial poisson


clear
version 12
set obs 1000
set seed 333
g x = uniform()
hist x, freq w(.2)

22. Graphics to assess


clear
version 12
se obs 999
se seed 1234

24. Hypothesis testing and p values


clear
input married case n
1 1 10
0 1 20
1 0 90
0 0 80
end
expand n

25. Testing normality: sktest, ladder, gladder


clear
se obs 150
se seed 1234
g z=rnormal(0,1)
g x=0.1+round(4*uniform()) in 1/50
replace x=abs(exp(rnormal(0.8,1.2))) if x==.
*in versions >13, you'll need to add version 12.0 at the beginning
clear
se obs 150
se seed 1234
g a=ceil(10*uniform())
g b=rnormal(20,4)
g x=a+b+abs(exp(rnormal(0.1,1)))

26. Comparing proportions: chi2, exact


clear
input married case n
1 1 10
0 1 20
1 0 90
0 0 80
end
expand n

clear
input genotype case n
1 1 5
1 0 7
0 1 1
0 0 11
end
expand n
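
A minimal sketch of how the options in this heading might be requested for the married/case table follows. It is an illustration, not the handout's own command sequence.

```stata
* Sketch: chi-squared and Fisher's exact test on a 2x2 table.
clear
input married case n
1 1 10
0 1 20
1 0 90
0 0 80
end
* one row per subject: each line is replaced by n copies of itself
expand n
* chi2 adds Pearson's chi-squared, exact adds Fisher's exact test
tabulate married case, chi2 exact
```

After expand n the data set holds 200 observations (10 + 20 + 90 + 80), one per subject.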

27. Categorical variables, epidemiologic analyses: cs


clear
input smoke OH cancer n
0 0 0 1591
0 1 0 1582
0 0 1 9
0 1 1 18
1 0 0 1555
1 1 0 1510
1 0 1 45
1 1 1 90
end
expand n

28. Case-control studies, matched case-control: cc mcc


clear
input case exposure n
1 1 50
1 0 50
0 1 20
0 0 80
end
expand n
g sex=1 in 25/75
replace sex=1 in 110/175
replace sex=0 if sex==.

clear
input exp_case exp_contr n
1 1 10
1 0 75
0 1 25
0 0 110
end
expand n

29. Comparison between two means: ttest sdtest


clear
version 12.0
set obs 300
g group=_n>120
set seed 1234
g bmi=rnormal(25,4) in 1/120
replace bmi=rnormal(24,4) in 121/300

version 12
set seed 1234
g e=rnormal(0,2)
g bmi_2yr=bmi+0.25+e
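
The two commands in this heading could be applied to these variables roughly as follows. This is a sketch: it regenerates similar data in one pass with a single seed, so the numbers will not match the handout's two-step sequence exactly.

```stata
* Sketch: comparing two means, independent and paired.
* Data are regenerated in simplified form, not identical to the handout's.
clear
set obs 300
g group=_n>120
set seed 1234
g bmi=rnormal(25,4) in 1/120
replace bmi=rnormal(24,4) in 121/300
g bmi_2yr=bmi+0.25+rnormal(0,2)
* test of equal variances between the two groups
sdtest bmi, by(group)
* t test for independent samples, assuming equal variances
ttest bmi, by(group)
* same comparison without assuming equal variances
ttest bmi, by(group) unequal
* paired t test: baseline versus 2-year value in the same subjects
ttest bmi_2yr == bmi
```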

30. Non-parametric tests: ranksum signrank


clear
version 12
set obs 300
g group=_n>120
set seed 1234
g bmi=rnormal(25,4) in 1/120
replace bmi=rnormal(24,4) in 121/300
version 12
set seed 1234
g e=rnormal(0,2)
g bmi_2yr=bmi+0.25+e

31. Comparison between >2 means: oneway kwallis


clear
input x group
4 1
4 1
6 1
8 1
8 1
4 2
8 2
8 2
8 2
8 3
8 3
10 3
12 3
12 3
end

32. Simple linear regression: regress


clear
input hours weight_chg
0 -1
0 0
1 0
2 .5
3 2.5
4 2
8 3
end

35. Multiple regression, dummy variables


clear
input w_chg ses smk age
9 1 1 68
8 1 1 65
7 1 1 65
6 1 1 63
3 2 0 71
3 2 1 71
0 1 1 58
-2 4 1 53
-3 2 0 49
-5 2 2 52
-5 3 0 48
-5 4 2 55
-6 4 0 52
-8 3 2 45
end

36. Factorial anova, repeated measures anova


clear
input w_chg ses smk age
9 1 1 68
8 1 1 65
7 1 1 65
6 1 1 63
3 2 0 71
3 2 1 71
0 1 1 58
-2 4 1 53
-3 2 0 49
-5 2 2 52
-5 3 0 48
-5 4 2 55
-6 4 0 52
-8 3 2 45
end

clear
input id time weight diet
1 1 76 1
1 2 65 1
1 3 63 1
2 1 82 1
2 2 70 1
2 3 68 1
3 1 80 1
3 2 78 1
3 3 70 1
4 1 83 2
4 2 83 2
4 3 84 2
5 1 79 2
5 2 80 2
5 3 79 2
6 1 84 2
6 2 84 2
6 3 84 2
end

37. ANCOVA: margins marginsplot


clear
input SBP MedDiet age
145 12 68
165 9 65
153 10 66
105 14 63
115 5 45
123 6 38
98 13 43
133 3 35
160 5 69
133 7 52
140 6 48
102 12 35
166 4 72
120 7 55
end

38. Transforming columns/rows: reshape


clear
input id weight1 weight2 weight3 diet
1 76 65 63 1
2 82 70 68 1
3 80 78 70 1
4 83 83 84 2
5 79 80 79 2
6 84 84 84 2
end

clear
input subject SBP hour
1 120 9
1 125 12
1 121 18
1 116 24
2 118 9
2 122 12
2 122 18
2 124 24
end
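
A short sketch of reshape on the wide weight data is given below, using only the first three subjects. The name time for the new index variable is an assumption chosen for illustration.

```stata
* Sketch: wide to long and back again with reshape.
* "time" is an assumed name for the index created from the suffixes 1, 2, 3.
clear
input id weight1 weight2 weight3 diet
1 76 65 63 1
2 82 70 68 1
3 80 78 70 1
end
* one row per subject and time point: weight1-weight3 become weight
reshape long weight, i(id) j(time)
list, sepby(id)
* back to one row per subject
reshape wide weight, i(id) j(time)
* the three original rows are recovered
assert _N == 3
```

In the long form there are 9 rows (3 subjects by 3 time points); diet is constant within id, so it is carried along unchanged.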

39. Protecting your data: preserve restore


clear
sysuse citytemp4.dta

clear
input id weight1 weight2 weight3 diet
1 76 65 63 1
2 82 70 68 1
3 80 78 70 1
4 83 83 84 2
5 79 80 79 2
6 84 84 84 2
end
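
The idea behind preserve and restore can be illustrated with a destructive command such as collapse. This sketch uses the auto data set shipped with Stata, which is an assumption made for convenience; the handout itself loads citytemp4.

```stata
* Sketch: protect the data in memory around a destructive command.
* The auto data set is used here for illustration only.
sysuse auto, clear
local n_before = _N
* take a snapshot of the data in memory
preserve
* collapse replaces the data with one row per group
collapse (mean) price, by(foreign)
list
* bring the snapshot back
restore
* the original number of observations is back in memory
assert _N == `n_before'
```

Because the block uses a local macro, it should be run as a whole from a do-file rather than line by line.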

41. Logistic regression: logistic logit


clear
input OH car_crash n
0 0 50
0 1 10
1 0 40
1 1 20
end
expand n
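
With the OH and car_crash data, the two commands in this heading might be run as sketched here. This is an illustration under the assumption that car_crash is the outcome and OH the exposure.

```stata
* Sketch: logistic regression with one binary predictor.
* Assumes car_crash is the outcome and OH the exposure.
clear
input OH car_crash n
0 0 50
0 1 10
1 0 40
1 1 20
end
expand n
* logistic reports odds ratios; with a single binary predictor this is
* the crude odds ratio of the 2x2 table, (20/40)/(10/50) = 2.5
logistic car_crash OH
* logit fits the same model but reports coefficients (log odds ratios)
logit car_crash OH
```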

clear
sysuse cancer

42. ROC curves: roctab roccomp


clear
sysuse cancer
quietly logistic died i.drug age
predict p_death
clear
sysuse cancer
quietly logistic died age
predict p_death1
quietly logistic died i.drug age
predict p_death2

43. Kaplan-Meier curves: stset sts graph


clear
sysuse cancer

44. Confounding and interaction: est store lrtest


clear
input smoke alcohol CHD n
1 1 1 400
1 0 1 120
1 1 0 7600
1 0 0 1880
0 1 1 30
0 0 1 84
0 1 0 2970
0 0 0 6916
end
expand n

clear
input FVL OC DVT n
1 1 1 61
1 0 1 6
1 1 0 710
1 0 0 940
0 1 1 30
0 0 1 8
0 1 0 9700
0 0 0 9920
end
expand n

45. Loops: foreach


clear
input date2 date4 date6 date8 date10 date12 date14
15500 16265 17008 17770 18518 19288 20032
14850 15590 16330 17070 17810 18550 19290
14950 15601 16430 17179 17925 18550 .
16100 16800 17500 18000 18500 . .
17290 17903 20431 20655 . . .
14911 15601 . 18001 . . .
15120 16043 . . . . .
19999 . . . . . .
end
format date* %td
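
A foreach loop over date variables of this kind might look like the sketch below. It uses only three of the columns and three of the rows to stay short, and counting missing values is an assumed task chosen for illustration.

```stata
* Sketch: repeat the same command for each variable in a list.
* Counting missing dates is an assumed example task.
clear
input date2 date4 date6
15500 16265 17008
15120 16043 .
19999 . .
end
format date* %td
foreach v of varlist date2-date6 {
    * `v' is replaced in turn by date2, date4 and date6
    quietly count if missing(`v')
    display "`v': " r(N) " missing"
}
```

The loop body runs once per variable, so extending the varlist to date2-date14 would cover all seven columns without further changes.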
