Stata Guide
Hemanshu Kumar
August 2015
explicitly ask for the data to be saved to file. This minimizes the
chance that any mistakes you make permanently destroy your data.
Stata variables are actually entire vectors, with one value for each
observation. If you want to store a simple number or string, it is
more appropriate to use scalars. Stata also provides a simple way to
store matrices of numbers. Variables, scalars, and matrices can all sit in
memory at the same time, although only the variables (with their labels)
are written to disk when you save a Stata dataset.
If you need to give Stata a command that involves a filename and/or directory name that contains spaces, you must enclose the file/directory
name in a pair of double quotation marks ("").
Stata commands and variable names are case-sensitive.
In any command, Stata is generally insensitive to the number of consecutive spaces.
Command Syntax
For the bulk of this document, we will assume that you are working interactively in Stata, by typing commands in the command window.
4.1 Getting Started
Once we have launched Stata from Windows, we take immediate note of two
things on the screen:
the status bar at the bottom shows Stata's current working directory in
its left corner. Say we want to use and save datasets in the directory
C:\My Documents\C003. For this, just type
.cd "C:\My Documents\C003"
in the command window, without the initial dot (.). I will include the dot
at the beginning of each command, since this is the way the command
4.2 Getting Help
Stata has an extensive help system. You can get help on Stata commands
at any time by typing
.help commandname
As an example, try asking for help on matsize. In addition, you can
search Stata's help system for any word(s) of your choice. For example, try
.search memory
In the help on a command, you will notice a portion of the command
underlined. This tells you the extent to which you can abbreviate that command in Stata. For example, in setting memory allocation above, we could
have typed just
.set mem 100m
.set mat 300
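To see abbreviations in action, here is a minimal sketch that runs the same commands in full and in abbreviated form. It assumes Stata's bundled example dataset auto.dta (loaded with sysuse), which is not part of this guide; price, mpg, and foreign are variables in that dataset.

```stata
* Assumes Stata's bundled example dataset auto.dta, loaded with sysuse
sysuse auto, clear

* Each pair runs the same command, first in full, then abbreviated
describe price mpg
d price mpg

summarize price
su price

tabulate foreign
tab foreign
```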
4.3
5
5.1
Arguably the most professional way to use Stata is to do all your work
with do files. A do file is nothing but a text file that contains a series of
commands. When the file is run in Stata, the commands are processed
together in a batch. And since the do file is a simple text file, it can be
opened by any text editor, such as Windows' native Notepad or WordPad
programs. I personally prefer to use a program called WinEdt.
However, Stata has its own do file editor as well, and to pull it up, you
can just type
.doedit
in the command window, or alternatively click on the envelope icon in
the toolbar near the top of Stata's main window. As an example of our first
5.2 Comments
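As a sketch of the comment styles a do file can contain, the snippet below shows the four forms Stata recognizes. It assumes the lines are run from a do file: the // and /// forms are not available when typing directly in the command window.

```stata
* A line beginning with an asterisk is a comment
// Two slashes start a comment that runs to the end of the line
display 2 + 2   // a comment can also follow a command

/* A block comment can
   span several lines */

* Three slashes continue a long command onto the next line
display "This command " ///
    "spans two lines"
```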
5.3 Backward Compatibility
Since new versions of Stata are constantly coming out, it is quite conceivable
that a do file you write today may not work a few months or years down
the line. The solution is to specify the Stata version you created the file in,
at the top of the file, using the version command. This is what you see in
the do file example in section 5.1 above.
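A do file header using the version command might look like the minimal sketch below. Version 13 is only an assumption (a release current around the time of this guide); substitute whichever version you actually wrote the file in.

```stata
* Interpret the rest of this file under Stata 13 rules,
* even if it is later run by a newer release (13 is an assumed version)
version 13
clear
set obs 3
generate x = _n
list
```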
5.4 Clearing Memory
The clear command is very useful and often appears at the beginning
of do files. Typed on its own, it clears memory of all variables and
observations (along with their value labels). To also remove other Stata
structures such as scalars and matrices, type clear all instead.
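In current versions of Stata, plain clear removes the data but leaves scalars and matrices in memory, while clear all removes those too. The minimal sketch below illustrates the difference; the names x, s, and M are hypothetical.

```stata
clear all
set obs 2
generate x = _n
scalar s = 42
matrix M = (1, 2 \ 3, 4)

* Plain clear removes the variables and observations...
clear
* ...but the scalar and the matrix survive it
display s
matrix list M

* clear all removes scalars, matrices and other structures as well
clear all
```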
Generating Data
In this section, we will consider a situation where you are interested in creating
a dataset from scratch, rather than using a pre-existing one.
Stata thinks of its dataset as consisting of several variables, all of
which have the same number of observations. In a spreadsheet or matrix
representation, the variables form the columns and the observations sit
on individual rows. The first thing to do when generating a dataset is to
tell Stata how many observations the variables will have:
.set obs #
where # should be replaced with the requisite positive integer. This
command is usually given when there are no pre-existing variables in memory.[1]
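A minimal sketch of building a dataset from scratch with set obs, using the built-in observation counter _n; the variable names id and square are hypothetical. The second set obs shows how the command enlarges a dataset that already has variables, leaving the new observations missing.

```stata
clear
* An empty dataset with five observations
set obs 5
* _n is Stata's built-in observation number, so id runs 1, 2, ..., 5
generate id = _n
generate square = id^2
list

* With variables already in memory, set obs enlarges the dataset;
* the three new observations get missing values
set obs 8
list
```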
Remember that Stata works with data in memory. Until you explicitly ask
Stata to save the data, no change will be made to any file on disk. Saving
your dataset in Stata's own proprietary format is simplicity itself. Suppose
you want to save to a file called mywork.dta in the working directory. You
need to type:
.save mywork
Notice that we did not need to specify the .dta extension: Stata adds it
automatically. If a file called mywork.dta already exists, Stata will promptly
give you an error. If you are sure you want to overwrite the existing file with
the dataset in memory, you should add the replace option to the command:
.save mywork, replace
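A minimal round trip, as a sketch: save takes the filename directly (no using keyword), and use reads the file back into memory. The variable x is hypothetical, and the file lands in the current working directory.

```stata
clear
set obs 3
generate x = _n
* Write the data to mywork.dta; replace allows overwriting an older copy
save mywork, replace
* Wipe memory, then read the file back in
clear
use mywork
describe
```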
8.1 Browse command
Having loaded your dataset, you will find the Variables window populated by the various variable names that were found in your data. If you
imported a text file into Stata, Stata may have converted the variable
names to lowercase even if they were originally not so. Stata variable
names are conventionally written in lowercase (although, being case-sensitive,
they may contain capitals), and no Stata variable name can begin with a digit.
Perhaps the first thing to do is to just look at the spreadsheet of your
data. This is achieved by typing
.browse
As an aside, note that the browse command can be abbreviated to as
little as br.
[1] If there are variables in memory, then this command can be used to increase the
number of observations in the dataset. In this case, the new observations will all have
missing values.
The Data Browser pane that opens up allows you to look at your data,
but not edit it. Also, while the Data Browser is open, no other commands
can be executed by Stata. These restrictions are for your protection! Stata strongly
discourages direct editing of data; you should use commands, so that you
keep better track of what changes are made, and are forced to change
data in a consistent manner. If you are sure you want to manually change
the data, you can always use the edit command.
If you have missing observations/cells in a numeric variable, these are recorded
as a single dot (.); a missing value in a string variable is simply an empty string ("").
8.2
Sometimes, you may wish to look at only part of your data. For example,
you might have a variable country, which stores names of various countries,
and you may wish to see only those observations for which country takes
on the value India. To do this, type
.br if country == "India"
Most commands in Stata accept the if qualifier. Notice that this is not
an option: it does not need to be preceded by a comma. if executes a
command for those observations for which the succeeding logical expression
holds true. You should note that in a logical expression for equality, we
must use a double = sign. See help operator for more.
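A minimal sketch of the if qualifier on a tiny hypothetical dataset (entered with input purely for illustration), including a combined condition using & (and).

```stata
clear
* Hypothetical toy data, for illustration only
input str12 country year
"India" 1990
"India" 1995
"Nepal" 1995
end

* Double == tests equality inside the if qualifier
list if country == "India"
* Conditions can be combined with & (and) and | (or)
list if country == "India" & year > 1990
```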
Instead of performing a command (such as browse) on the observations that satisfy a condition, we can execute it over a specific
range of observation numbers:
.br in 50/l
(where l is the lowercase L). This would browse the observations numbered
50 through the last. (f, for the first observation, is also available as a special
character.) Notice the use of the forward slash (/) to give an observation
range.
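The sketch below uses list instead of browse, so that the selected observations appear in the Results window. It builds a hypothetical 60-observation dataset and also shows that a negative number counts back from the end.

```stata
clear
set obs 60
generate id = _n

list id in 50/l     // observations 50 through the last (l = last)
list id in f/5      // the first five observations (f = first)
list id in -3/l     // negative numbers count back from the end: the last three
```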
We could also choose to browse only a subset of the variables in our data.
Suppose our dataset had five variables, country, year, gdp, gnp, exrate,
listed in that order in our Variables window. Then
.br country year gdp gnp if year>1990
would show us the specified variables for the data from after 1990. Notice
that in a list of variables, the individual variables are separated by spaces.
Stata also allows us to abbreviate variable names as long as it can
uniquely identify a variable from its abbreviation. In addition, wildcards
such as * and ? are permitted. Thus, the same result as above could be
obtained by typing
.br c y g* if y>1990
You can also use - to shorten a list of variables, using the order in the
Variables window. Thus, the same result as above could also be obtained by
.br c-gnp if y>1990
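To check the three variable-list shortcuts side by side, the minimal sketch below creates a hypothetical dataset with the five variables from the text, in the same order, and runs each form with list (which, unlike browse, prints to the Results window).

```stata
clear
* Hypothetical dataset: country, year, gdp, gnp, exrate, in that order
set obs 4
generate str5 country = "India"
generate year = 1988 + _n
generate gdp = 100 * _n
generate gnp = 105 * _n
generate exrate = 40 + _n

* Three equivalent ways to name country, year, gdp and gnp
list country year gdp gnp if year > 1990
list c y g* if y > 1990
list c-gnp if y > 1990
```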
8.3
Typing just
.describe
gives you a basic summary of your data, including the source dataset, the
number of observations and variables, the amount of memory in use, and
a list of all variables with their respective storage types, display formats
and labels (more later about labels). To describe only specific variables, the
syntax is
.describe varlist
where varlist is a list of variables.
For numeric variables, the inspect command provides a useful first pass
at the nature of the data: it gives a small histogram, tells you the number
of unique and missing values, and the number of values which are positive/zero/negative and integer or not.
For categorical (nominal) variables, we can quickly obtain a frequency
distribution of the data using the tabulate command.
The list command displays the values of any variable in the dataset, optionally restricted to a specified range of observations.
Basic descriptive statistics for cardinal variables can be obtained with
the summarize command. For example,
.sum gnp gdp if year<=1990
(where sum is the abbreviated version of summarize) provides the number of
observations, mean, standard deviation, and minimum and maximum values of gnp
and gdp for years up to and including 1990. Adding the detail option to the command
provides a larger set of statistics, including percentiles, skewness, and kurtosis.
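The exploration commands above can be tried together in a short do file. This is a sketch that assumes Stata's bundled auto.dta example dataset (not part of this guide), whose variables include make, price, mpg, and foreign.

```stata
* Assumes Stata's bundled example dataset auto.dta
sysuse auto, clear

describe price mpg foreign            // storage types, display formats, labels
inspect mpg                           // mini histogram, unique/missing counts
tabulate foreign                      // frequency table of a categorical variable
list make price in 1/5                // raw values for the first five cars
summarize price mpg if foreign == 1   // basic statistics for a subset
summarize price, detail               // adds percentiles, skewness, kurtosis
```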
9
9.1
9.2 Deleting Variables
We can use drop varlist to drop a specific list of variables, or use keep
varlist to drop all variables except those in the specified list.
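A minimal sketch contrasting drop and keep on a hypothetical dataset with the five variables used earlier: drop removes what you name, keep removes everything you do not name.

```stata
clear
* Hypothetical dataset with the five variables used earlier
set obs 3
generate str5 country = "India"
generate year = 1990 + _n
generate gdp = 100 * _n
generate gnp = 105 * _n
generate exrate = 40 + _n

drop exrate               // removes exactly the listed variable(s)
keep country year gdp     // removes every variable NOT listed (here, gnp)
describe
```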
9.3 Macros
Macros come in two types, global and local. Global macros, once defined, are
available anywhere in Stata. Local macros exist solely within the program
or do-file in which they are defined. If that program or do-file calls another
program or do-file, the local macros previously defined temporarily cease to
exist, and their existence is reestablished when the calling program regains
control. When a program or do-file ends, its local macros are permanently
deleted.
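The two kinds of macro are also referenced differently: a global with a leading $, a local by wrapping its name in a left quote (`) and a right quote ('). A minimal sketch, with hypothetical macro names project and n:

```stata
* A global macro is referenced with $ and is visible everywhere in the session
global project "C003"
* A local macro is referenced as `name' and is visible only where it is defined
local n = 4

display "Project folder: $project"
display "n squared is " `n'^2
```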
For example, to create a local containing a value, you could type:
.local x = 4
Locals can be numeric or text strings. For example, you could type
.local y = "Hello"
The list of macros in memory at any time, and their values, can be obtained using
.macro list
To see the content of a particular local, refer to it by enclosing its name
between a left single quote (`) and a right single quote ('):
.display "`x' `y'"
which prints 4 Hello.
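A local that is set to nothing ceases to exist, so to delete a local from
memory (for example y), we can simply redefine it as empty:
.local y
(Global macros, by contrast, can be deleted with macro drop name.)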
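Putting the local-macro steps together, here is a minimal sketch of a local's life cycle in a do file: define, list, dereference with ` and ', then delete by redefining as empty. The names x and y follow the text.

```stata
local x = 4             // = makes Stata evaluate the expression first
local y = "Hello"
macro list              // lists the macros currently defined

* Dereference a local by wrapping its name in ` and '
display `x' + 1
display "`y', world"    // string contents must sit inside double quotes

* Redefining y as empty deletes it
local y
display "y now contains: [`y']"
```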
10
10.1
We can do simple math on the command line by using the display command. display will simply output the result of the computation for us. For
example, we can ask Stata to
.display 5 + ln((3-1)/2)
and Stata will just output 5.
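A few more calculator-style uses of display, as a sketch. The comments give the values these expressions evaluate to; the last line shows that text and results can be mixed in one command.

```stata
display 5 + ln((3-1)/2)        // ln(1) = 0, so this shows 5
display sqrt(16) * 2           // shows 8
display "Half of 7 is " 7/2    // text followed by the result, 3.5
```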
10.2 Using Scalars