Matlab Notes 2019
Kevin Sheppard
University of Oxford
1
Formerly Financial Econometrics MFE MATLAB Notes.
• Working with heterogeneous data using tables, which provide a data structure that can easily
handle heterogeneous data (e.g., strings, numbers and dates)
• varfun, rowfun, findgroups and splitapply for computing functions on grouped data
• The new datetime format as a replacement for serial dates, including durations and calendarDurations
for working with datetimes and NaT (not a time)
• Moving Statistics Functions: Calculate moving statistics using the movmean, movsum, movmedian, movmax,
movmin, movvar, and movstd functions
• bsxfun, which provides a high-performance method to perform math on arrays with different
dimensions, has been added to chapter 6.
• nan, which is similar to zeros and ones, has been added to chapter 5.
• A new chapter (22) containing an extensive set of complete examples has been added.
• A new chapter covering the basics of parallel programming in MATLAB has been added. This chapter
covers two scenarios. The first considers parallel coding when the parallel toolbox is available and
the second discusses methods to achieve simple parallelism using the file system.
• All code has been tested on R2012a, the current release at the time of writing.
• A major rewrite of chapter 14 on importing data. MATLAB’s importer has improved substantially
over the past few years and importing data is now much simpler. The chapter also covers some
useful improvements to load which allow for selective loading from a mat file containing more than
one variable.
• Combined the chapters covering control flow with the chapter covering loops for both improved
organization and a reduction in the chapter count.
1 Introduction to MATLAB
1.1 The Interface
1.2 The Editor
1.3 Help
1.4 Demos
1.5 Exercises
2 Basic Input
2.1 Variable Names
2.2 Entering Vectors
2.3 Entering Matrices
2.4 Higher Dimension Arrays
2.5 Empty Matrices ([])
2.6 Concatenation
2.7 Accessing Elements of Matrices
2.8 Calling Functions
2.9 Exercises
3 Basic Math
3.1 Operators
3.2 Matrix Addition (+) and Subtraction (-)
3.3 Matrix Multiplication (*)
3.4 Matrix Left Division (\)
3.5 Matrix Right Division (/)
3.6 Matrix Exponentiation (^)
3.7 Parentheses
3.8 Dot (.) Operations
3.9 Transpose
3.10 Operator Precedence
3.11 Exercises
4 Basic Functions
4.1 Moving window functions
4.2 Exercises
6 Matrix Functions
6.1 Matrix Manipulation
6.2 Broadcastable Operations: bsxfun
6.3 Linear Algebra Functions
8 Logical Operators
8.1 >, >=, <, <=, == , ∼=
8.2 & (AND), | (OR) and ∼ (NOT)
8.3 Logical Indexing
8.4 Logical Functions
8.5 Exercises
9 Control Flow
9.1 Choice
9.2 Loops
9.3 Exception Handling
9.4 Exercises
10 Graphics
10.1 Support Functions
10.2 2D Plotting
10.3 3D Plotting
10.4 Multiple Graphs
10.5 Advanced Graphics
10.6 Exporting Plots
10.7 Exercises
12 String Manipulation
12.1 String Functions
12.2 String Conversion
12.3 Exercises
19 Optimization
19.1 Unconstrained Derivative-based Optimization
19.2 Unconstrained Derivative-free Optimization
19.3 Bounded scalar optimization
19.4 Constrained Derivative-based Optimization
22 Examples
22.1 Estimating the Parameters of a GARCH Model
22.2 Estimating the Risk Premia using Fama-MacBeth Regressions
22.3 Estimating the Risk Premia using GMM
22.4 Outputting LATEX
Chapter 1
Introduction to MATLAB
These notes provide an introduction to MATLAB with an emphasis on the tools most useful in econometrics
and statistics. All topics relevant to the MFE curriculum should be covered but if any relevant topic is
missing or under-explained, please let me know and I’ll add examples as necessary.
This set of notes follows a few conventions. Typewriter font is used to denote MATLAB commands
and code snippets. MATLAB keywords such as if, for and break are highlighted in blue and existing MATLAB
functions such as sum, abs and plot are highlighted in cyan. In general, both keywords and standard
function names should not be used for variable names, although only keywords are formally excluded
from being redefined. Strings are highlighted in purple, and comments are in green. The double arrow
symbol >> is used to indicate the MATLAB command prompt – it is also the symbol used in the MATLAB
command window. Math font is used to denote algebraic expressions.
For more information on programming in MATLAB, see MATLAB: An Introduction with Applications by
Amos Gilat (ISBN:0470873736), Matlab: A Practical Introduction to Programming and Problem Solving by
Stormy Attaway (ISBN: 0128045256) or Mastering MATLAB 8 by Bruce L. Littlefield and Duane C. Hanselman
(ISBN: 0136013309). The first book provides more examples for beginners, the second is similar to
this set of notes while the final is comprehensive, ranging from basic concepts to advanced applications,
and was the first book I used – back when the title was Mastering MATLAB 5.
1.1 The Interface
Figure 1.1 contains an image of the main MATLAB window. There are three sub-windows visible. The
command window, labeled A, is where commands are entered, functions are called and m-files – batches
of MATLAB commands – are run. The current directory window, labeled B, shows the files located in the
current directory. Normally these will include m- and data-files. On the left side of the command window
is the workspace (C), which contains a list of the variables in memory, such as data loaded or variables
entered in the command window. The workspace can be customized using Home>Layout, and other
available panes include command history or an integrated editor.
1.2 The Editor
MATLAB contains a syntax-aware editor that highlights code to improve readability, provides limited error
checking and encourages best practices, such as using a semicolon at the end of each statement. The
editor can be launched from the main window in one of two ways, either by clicking Home>New Script or
entering edit into the command window directly. Figure 1.2 contains an example of the editor and shows
the syntax highlighting.
Figure 1.1: Basic MATLAB Window. The standard setup has four panes. 1: The Command Window, 2:
Current Directory, 3: Workspace, and 4: Command History
M-files may contain either lists of commands or complete functions (but not both).1 M-file names
can include letters, numbers, and underscores, although they must begin with a letter. Names should be
distinct from reserved words (if, else, for, end, while, . . .) and existing function names (mean, std, var,
cov, sum, . . .). To verify whether a name is already in use, the command which filename can be used to list
the file which would be executed if filename was entered in the command window.2
>> which for
built-in (C:\Program Files\MATLAB\R2012a\toolbox\matlab\lang\for)
>> which mean
C:\Program Files\MATLAB\R2012a\toolbox\matlab\datafun\mean.m
>> which mymfile
’mymfile’ not found.
To check whether an existing file duplicates the name of an existing function, use the command which
filename -all to produce a list of all matching files.
>> which mean -all
C:\Program Files\MATLAB\R2012a\toolbox\matlab\datafun\mean.m
C:\Program Files\MATLAB\R2012a\toolbox\finance\ftseries\@fints\mean.m
C:\Program Files\MATLAB\R2012a\toolbox\stats\stats\@ProbDistUnivParam\mean.m
C:\Program Files\MATLAB\R2012a\toolbox\matlab\timeseries\@timeseries\mean.m
When multiple files exist on the MATLAB path with the same name, the first listed will be executed.
1
MATLAB also supports the object-oriented programming paradigm, which allows for richer structure within an m-file. OOP,
while useful in large, complex code bases, requires a deeper understanding of programming and is not essential for solving
econometric problems.
2
The exact path will depend on both the version of MATLAB used and the underlying operating system.
Figure 1.2: MATLAB editor. The editor is a useful programming tool. It can be used to create batch files
or custom functions (both called m-files). Note the syntax highlighting emphasizing the different types of
commands and data.
The semicolon (;) is used at the end of a line to suppress the display of the result of a command. To
understand the effect of a ;, examine the result of these two commands,
>> x=ones(3,1);
>> x=ones(3,1)
x =
1
1
1
It is generally a good idea to suppress the output of commands, although in certain cases, such as debugging
or examining the output of a particular command, it may be useful to omit the semicolon until the
code is performing as expected.
Comments
Comments assist in tracking completed tasks, documenting unique approaches to solving a difficult problem
and are useful if the code needs to be shared. The percentage symbol (%) is used to identify a comment.
When a % is encountered, processing stops on the current line and continues on the next line. Comment
blocks are usually written with a % in front of each line, although MATLAB also accepts %{ and %} on
lines of their own to open and close a block comment.
% This is the start of a
% comment block.
% Every line must have a %
% symbol before the comment
. . . (dot-dot-dot)
. . . is a special expression that can be used to break a long code expression across multiple lines in an
m-file. . . . concatenates the next line onto the end of the present line when processing, and exists purely to
improve the readability of code. These two expressions are identical to the MATLAB interpreter.
x = 7;
x = x + x * x - x + exp(x) / log(x) * sqrt(2*pi);
x = 7;
x = x + x * x - x ...
+ exp(x) / log(x) * sqrt(2*pi);
1.3 Help
MATLAB contains a comprehensive help system which is available both in the command window and in a
separate browser. The browser-based help is typically more complete and is both indexed and searchable.
Two types of help are available from the command line: toolbox and function. Toolbox help returns
a list of available functions in a toolbox. It can be called by help toolbox where toolbox is the short name
of the toolbox (e.g. stats, optim, etc.). help, without a second argument, will produce a list of toolboxes,
while function-specific help can be accessed by calling help function (e.g. help mean).
The help browser can be accessed by hitting the F1 key, selecting Help>Full Product Family Help at the top of
the command window, or entering doc in the command window. The documentation of a specific function
can be directly accessed using doc function (e.g. doc mean).
1.4 Demos
MATLAB contains an extensive selection of demos. To access the list of available demos, enter demo in the
command window.
1.5 Exercises
2. Launch the help browser and read the section MATLAB, Getting Started, Introduction.
4. Enter demo in the command window and play with some of the demos. The demos in the Graphics
section are particularly entertaining.
Chapter 2
Basic Input
MATLAB does not require users to directly manage memory and so variables can be input with no setup.
The generic form of a MATLAB expression is
Variable Name = Expression
and expressions are processed by assigning the value on the right to the variable on the left. For instance,
x = 1;
y = x;
x = exp(y);
are all valid assignments. The first assigns 1 to x, the second assigns the value of another variable, x,
to y and the third assigns the output of exp(y) to x. Assigning one variable to another assigns the value of
that variable, not the variable itself – after y = 1 and x = y, later changes to y will not be reflected in
the value of x.
>> y = 1;
>> x = y;
>> x
x =
1
>> y = 2;
>> x
x =
1
>> y
y =
2
2.1 Variable Names
Variable names can take many forms, although they can only contain numbers, letters (both upper and
lower), and underscores (_). They must begin with a letter and are CaSe SeNsItIve. For example,
x
X
X1
X_1
x_1
dell
dell_returns
2.1.1 Keywords
Like all programming languages, MATLAB has a list of reserved keywords which cannot be used as variable
names. The current list of keywords is
break case catch classdef continue else elseif end for function
global if otherwise parfor persistent return spmd switch try while
2.2 Entering Vectors
Most data structures used in MATLAB are matrices by construction, even if they are 1 by 1 (scalar), K by
1 or 1 by K (vectors).1 Vectors, both row (1 by K ) and column (K by 1), can be entered directly into the
command window. The mathematical expression
x = [1 2 3 4 5]
is entered as
>> x=[1 2 3 4 5];
In the above input, [ and ] are reserved symbols which are interpreted as begin array and end array,
respectively. The column vector,
\[ x = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix} \]
is entered as
>> x=[1; 2; 3; 4; 5];
where ; is interpreted as new row when used inside square brackets ([ ]).
1
An important exception to the “everything is a matrix” rule occurs in cell arrays, which are matrices composed of other
matrices (formally arrays of arrays or ragged (jagged) arrays). See chapter 13 for more on the use of and caveats to cell arrays.
2.3 Entering Matrices
Matrices are essentially a column vector composed of row vectors. For instance, to construct
\[ x = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \]
enter the matrix one row at a time, separating the rows with semicolons,
>> x = [1 2 3 ; 4 5 6; 7 8 9];
Note that it is not necessary to use brackets to denote the inner row vectors.
2.4 Higher Dimension Arrays
Multi-dimensional (N -dimensional) arrays are available for N up to about 30, depending on the size of
each matrix dimension. Higher dimensional arrays are particularly useful for storing panel data – time
series of cross-sections, such as a time-varying covariance. Unlike scalars, vectors and matrices, higher
dimension arrays cannot be directly allocated and are typically constructed by calling functions such as
zeros(2, 2, 2).
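As a minimal sketch of the construction described above, the snippet below allocates a 2 by 2 by 3 array with zeros and fills each 2 by 2 panel in turn. The variable names (sigma, t, dims, lastPanel) and the values stored are illustrative assumptions and are not taken from these notes.

```matlab
% Allocate a 2 by 2 by 3 array: three 2 by 2 panels, all zeros
sigma = zeros(2, 2, 3);
for t = 1:3
    % Fill panel t with a scaled identity matrix (illustrative values only)
    sigma(:, :, t) = t * eye(2);
end
dims = size(sigma);           % size of each dimension: 2 2 3
lastPanel = sigma(:, :, 3);   % a single panel is an ordinary 2 by 2 matrix
```

Selecting one panel with sigma(:,:,t) returns a standard matrix, so each cross-section can be passed to functions that expect two-dimensional input.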
2.5 Empty Matrices ([])
An empty matrix contains no elements, x = []. Empty matrices may be returned from functions in certain
cases (e.g. if some criterion is not met). Empty matrices often cause problems, occasionally in
difficult-to-predict ways, although they do have some useful applications. First, they can be used for lazy vector
construction using repeated concatenation. For example,
>> x=[]
x =
[]
>> x=[x 1]
x =
1
>> x=[x 2]
x =
1 2
>> x=[x 3]
x =
1 2 3
is a legal operation that builds a 3-element vector by concatenating the previous value with a new value.
This type of concatenation is bad from a code performance point-of-view and so it should generally be
avoided by pre-allocating the data array using zeros (see page 33), if possible. Second, empty matrices
are needed for calling functions when multiple inputs are required but some are not used. For example,
std(x,[],2) uses x as the first argument, 2 as the third and leaves the second empty.
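The two uses of empty matrices can be placed side by side in a short sketch. It builds the same vector first by repeated concatenation and then by pre-allocating with zeros, and finally uses [] as a placeholder for the unused second input of std. The variable names and the data are assumptions made for illustration.

```matlab
n = 5;
% Repeated concatenation: x grows (and is reallocated) on every iteration
x = [];
for i = 1:n
    x = [x i^2];
end
% Pre-allocation: memory is reserved once and elements are overwritten
y = zeros(1, n);
for i = 1:n
    y(i) = i^2;
end
% [] fills the unused second input so that the third (dimension) can be set
z = [1 2 3; 4 6 8];
rowStd = std(z, [], 2);   % standard deviation of each row
```

Both loops produce the same vector. The difference is only in how memory is managed, which matters as n becomes large.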
2.6 Concatenation
Concatenation is the process by which one vector or matrix is appended to another. Both horizontal and
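vertical concatenation are possible. For instance, suppose
\[ x = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}. \]
Suppose
\[ z = \begin{bmatrix} x \\ y \end{bmatrix} \]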
needs to be constructed. This can be accomplished by treating x and y as elements of a new matrix.
>> x=[1 2; 3 4]
x =
1 2
3 4
>> y=[5 6; 7 8]
y =
5 6
7 8
>> z=[x; y]
z =
1 2
3 4
5 6
7 8
This is an example of vertical concatenation. x and y can be horizontally concatenated in a similar fashion:
>> z=[x y]
z =
1 2 5 6
3 4 7 8
Note that concatenating is the code equivalent of block-matrix forms in standard matrix algebra.
2.7 Accessing Elements of Matrices
Once a vector or matrix has been constructed, it is important to be able to access the elements individually.
Data in matrices is stored in column-major order. This means elements are indexed by first counting down
rows and then across columns. For example, in the matrix
\[ x = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]
the first element of x is 1, the second element is 4, the third is 7, the fourth is 2, and so on.
Elements can be accessed by element number using parentheses (x(#)). After defining x, the elements
of x can be accessed
>> x=[1 2 3; 4 5 6; 7 8 9]
x =
1 2 3
4 5 6
7 8 9
>> x(1)
ans =
1
>> x(2)
ans =
4
>> x(3)
ans =
7
>> x(4)
ans =
2
>> x(5)
ans =
5
The single index notation works well if x is a vector, in which case the indices correspond directly to
the order of the elements. However single index notation can be tedious when x is a matrix, and double
indexing of matrices is available using the notation x(r,c) where r and c are the row and column indices,
respectively.
>> x(1,1)
ans =
1
>> x(1,2)
ans =
2
>> x(1,3)
ans =
3
>> x(2,1)
ans =
4
>> x(3,3)
ans =
9
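The link between single and double indexing follows from column-major storage: in a matrix with R rows, element (r,c) sits at position r + (c-1)*R. The sketch below checks this by hand and against the built-in converters sub2ind and ind2sub. The variable names are illustrative assumptions.

```matlab
x = [1 2 3; 4 5 6; 7 8 9];
R = size(x, 1);                        % number of rows
r = 2; c = 3;
k = r + (c - 1) * R;                   % position of element (2,3) in column-major order
sameElement = (x(k) == x(r, c));       % both forms select the same element
kBuiltin = sub2ind(size(x), r, c);     % built-in (row, column) to single index
[rBack, cBack] = ind2sub(size(x), k);  % built-in single index to (row, column)
```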
Higher dimension matrices can also be accessed in a similar manner using one index for each dimension,
x(#, #, #). For example, x(1,2,3) would return the element in the first row of the second column of the
third panel.
The colon operator (:) plays a special role in accessing elements. It is interpreted as all elements in
that dimension. For example, x(:,1), returns all elements from matrix x in column 1. Similarly, x(2,:)
returns all elements from x in row 2. Double : notation produces all elements of the original matrix –
x(:,:) returns x. Finally, vectors can be used to access elements of x. For instance, x([1 2],[1 2]), will
return the elements from x in rows 1 and 2 and columns 1 and 2, while x([1 2],:) will return all columns
from rows 1 and 2 of x.
>> x(1,:)
ans =
1 2 3
>> x(2,:)
ans =
4 5 6
>> x(:,:)
ans =
1 2 3
4 5 6
7 8 9
>> x
x =
1 2 3
4 5 6
7 8 9
>> x([1 2],[1 2])
ans =
1 2
4 5
>> x([1 3],[2 3])
ans =
2 3
8 9
>> x([1 3],:)
ans =
1 2 3
7 8 9
end
end is a keyword which has a number of uses. One of the uses is to automatically select the final element
in an array when using a slice. Suppose x is a 2 by 3 matrix. x(1,2:end) is the same as x(1,2:3). The
advantage of end is that it will automatically select the last index in a particular dimension without needing
to know the array size.
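A brief sketch of end inside an index expression, using the 2 by 3 case mentioned above. The variable names a, b, c and d are illustrative.

```matlab
x = [1 2 3; 4 5 6];
a = x(1, 2:end);   % same as x(1, 2:3) since x has 3 columns
b = x(end, :);     % last row, whatever the number of rows
c = x(end, end);   % bottom-right element
d = x(end);        % last element when counting in column-major order
```

If a column were later appended to x, the same expressions would still select the final column or element without any change to the code.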
2.8 Calling Functions
Function calls have different conventions from other expressions. The most important difference is that
functions can take more than one input and return more than one output. The generic structure of a function
call is [out1, out2, out3, . . .]=functionname(in1, in2, in3, . . .). The important aspects of this structure are
• If only one output is needed, brackets ([ ]) are optional, for example y=mean(x).
• If multiple outputs are required, the outputs must be encapsulated in brackets, such as in
[y, index] = min(x).
• The number of output variables determines how many outputs will be returned. Asking for more
outputs than the function provides will result in an error.
• Inputs can be the result of other functions as long as only the first output is required. For example,
the following are equivalent,
y = var(x);
mean(y)
and
mean(var(x))
• Inputs can contain only selected elements of a matrix or vector (e.g. mean(x([1 2] ,[1 2]))).
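The calling conventions listed above can be sketched in a few lines. The matrix x and the output names are assumptions chosen for illustration; min, mean and var are the standard functions.

```matlab
x = [3 1 2; 0 5 4];
y = min(x);                  % one output: brackets are optional
[m, index] = min(x);         % two outputs: brackets are required
z = mean(var(x));            % the first output of var is passed directly to mean
w = mean(x([1 2], [1 2]));   % only selected elements of x are passed to mean
```

Here m contains the column minima and index the row in which each minimum occurs.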
2.9 Exercises
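\[ u = \begin{bmatrix} 1 & 1 & 2 & 3 & 5 & 8 \end{bmatrix} \]
\[ v = \begin{bmatrix} 1 \\ 1 \\ 2 \\ 3 \\ 5 \\ 8 \end{bmatrix} \]
\[ x = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
\[ y = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
\[ z = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 3 & 4 & 3 & 4 \\ 1 & 2 & 1 & 2 \end{bmatrix} \]
\[ w = \begin{bmatrix} x & x \\ y & y \end{bmatrix} \]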
3. What command would pull [x; y] out of w? Is there more than one? If there are, list all alternatives.
Chapter 3
Basic Math
Mathematical operations in MATLAB code closely follow the rules of linear algebra. Operations legal in
linear algebra are legal in MATLAB; operations that are not legal in linear algebra are not legal in MATLAB.
For example, matrices must be conformable along their inside dimensions to be multiplied – attempting
to multiply nonconforming matrices produces an error.
3.1 Operators
When x and y are scalars, the behavior of these operators is obvious. When x and y are matrices, things
are a bit more complex.
3.2 Matrix Addition (+) and Subtraction (-)
Addition and subtraction require x and y to have the same dimensions or to be scalar. If they are both
matrices, z=x+y produces a matrix with z(i,j)=x(i,j)+y(i,j). If x is scalar and y is a matrix, z=x+y results
in z(i,j)=x+y(i,j).
Suppose z=x+y:
                y: Scalar              y: Matrix
x: Scalar       Any                    Any
                z = x + y              z_{ij} = x + y_{ij}
x: Matrix       Any                    Both Dimensions Match
                z_{ij} = y + x_{ij}    z_{ij} = x_{ij} + y_{ij}
These conform to the standard rules of matrix addition and subtraction. x_{ij} is the element from row i and
column j of x.
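The cases in the table can be checked with a short sketch. The matrices are illustrative assumptions. Note also that releases from R2016b onward automatically expand some inputs with compatible but unequal sizes (implicit expansion), so not every size mismatch produces an error in current versions; the sketch sticks to the cases in the table.

```matlab
x = [1 2; 3 4];
y = [10 20; 30 40];
z1 = x + y;   % matrix + matrix: z1(i,j) = x(i,j) + y(i,j)
z2 = 5 + x;   % scalar + matrix: the scalar is added to every element
z3 = x - 1;   % matrix - scalar: the scalar is subtracted from every element
```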
3.3 Matrix Multiplication (*)
Multiplication requires the inside dimensions to be the same or for one input to be scalar. If x is N by M
and y is K by L and both are non-scalar matrices, x*y requires M = K . Similarly, y*x requires L = N . If x
is scalar and y is a matrix, then z=x*y produces z(i,j)=x*y(i,j).
Suppose z=x*y:
                y: Scalar              y: Matrix
x: Scalar       Any                    Any
                z = xy                 z_{ij} = x y_{ij}
x: Matrix       Any                    Inside Dimensions Match
                z_{ij} = y x_{ij}      z_{ij} = \sum_{k=1}^{M} x_{ik} y_{kj}
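A small sketch of the conformability rule, assuming a 2 by 3 matrix x and a 3 by 2 matrix y (so N = 2, M = K = 3 and L = 2). One element of the product is also computed from the summation in the table. All names and values are illustrative.

```matlab
x = ones(2, 3);          % N = 2, M = 3
y = [1 2; 3 4; 5 6];     % K = 3, L = 2
z = x * y;               % valid since M equals K; the result is N by L
% Element (1,2) computed directly from the sum over the inside dimension
z12 = 0;
for k = 1:size(x, 2)
    z12 = z12 + x(1, k) * y(k, 2);
end
v = y * x;               % also valid here since L equals N; the result is K by M
w = 2 * y;               % scalar times matrix scales every element
```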
3.4 Matrix Left Division (\)
Matrix division is not defined in linear algebra. The intuition for the definition of matrix division in MATLAB
follows from solving a set of linear equations. Suppose there is some z, an M by L matrix, such that
xz = y
where x is N by M and y is N by L. Division finds z as the solution to this set of linear equations by least
squares, and so $z = (x'x)^{-1}(x'y)$.
Suppose z=x\y:
                y: Scalar              y: Matrix
x: Scalar       Any                    Any
                z = y/x                z_{ij} = y_{ij}/x
x: Matrix       N/A                    Left Dimensions Match
                –                      z = (x'x)^{-1} x'y
Note: Like linear regression, matrix left division is only well defined if x is nonsingular (has full rank).
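The least squares interpretation can be checked numerically. The sketch below assumes a small regression with a constant and a time trend; the data values are invented purely for illustration. It compares x\y with the formula $(x'x)^{-1}(x'y)$ computed using inv.

```matlab
% Regressors: a constant and a trend (illustrative data only)
x = [ones(5, 1) (1:5)'];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
b1 = x \ y;                        % least squares solution via left division
b2 = inv(x' * x) * (x' * y);       % the textbook formula, written out directly
maxDiff = max(abs(b1 - b2));       % should be zero up to rounding error
```

The two agree up to floating point error. In practice the backslash form is usually preferred since it avoids forming an explicit inverse.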
3.5 Matrix Right Division (/)
Matrix right division is simply the opposite of matrix left division, and z = y/x is identical to z = (x’\y’)’,
and so there is little reason to use matrix right division. Suppose z = y/x:
                y: Scalar              y: Matrix
x: Scalar       Any                    Any
                z = y/x                z_{ij} = y_{ij}/x
x: Matrix       N/A                    Right Dimensions Match
                –                      z = y x'(x x')^{-1}
3.6 Matrix Exponentiation (^)
Suppose z=x^y:
                y: Scalar              y: Matrix
x: Scalar       Any                    y Square
                z = x^y                Not useful
x: Matrix       x Square               N/A
                z = x^y
In the case where x is a matrix and y is an integer, z=x*x* . . . *x (y times). If y is not an integer, this
function involves eigenvalues and eigenvectors.1
3.7 Parentheses
Parentheses can be used in the usual way to control the order in which mathematical expressions are
evaluated, and can be nested to create complex expressions. See section 3.10 on Operator Precedence for
more information on the order mathematical expressions are evaluated.
3.8 Dot (.) Operations
The . operator (read dot operator) changes matrix operations into element-by-element operations. Suppose
x and y are N by N matrices. z=x*y results in usual matrix multiplication where z(i,j) = x(i,:) * y(:,j),
while z = x .* y produces z(i,j) = x(i,j) * y(i,j). Multiplication (.*), right division (./), left division (.\),
and exponentiation (.^) all have dot forms.
z=x.*y z(i,j)=x(i,j)*y(i,j)
z=x./y z(i,j)=x(i,j)/y(i,j)
z=x.\y z(i,j)=x(i,j)\y(i,j)
z=x.^y z(i,j)=x(i,j)^y(i,j)
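The difference between the matrix and dot forms is easiest to see on a small example. The matrices below are illustrative assumptions.

```matlab
x = [1 2; 3 4];
y = [2 2; 2 2];
matProd  = x * y;    % matrix product: rows of x times columns of y
elemProd = x .* y;   % element-by-element product
elemDiv  = x ./ y;   % x(i,j) / y(i,j)
elemLeft = x .\ y;   % x(i,j) \ y(i,j), which is y(i,j) / x(i,j)
elemPow  = x .^ y;   % x(i,j) ^ y(i,j)
```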
1
If x is a scalar and y is a real symmetric matrix, then x^y is defined as V * diag(x.^diag(D))*V’ where V is the matrix
of eigenvectors and D is a diagonal matrix containing the corresponding eigenvalues of y.
3.9 Transpose
Matrix transpose is expressed using the ’ operator. For instance, if x is an M by N matrix, x’ is its transpose
with dimensions N by M .
3.10 Operator Precedence
Computer math, like standard math, has operator precedence which determines how mathematical expressions
such as
2^3+3^2/7*13
are evaluated. In the case of a tie, operations are executed left-to-right. For example, x^y^z is interpreted as (x^y)^z.
Unary operators are + or - operations that apply to a single element. For example, consider the expression
(-4). This is an instance of a unary - since there is only one operand. (-4)^2 produces 16. -4^2
produces -16 since ^ has higher precedence than unary negation and so is interpreted as -(4^2). -4 * -4
produces 16 since it is interpreted as (-4) * (-4) because unary negation has a higher precedence than
multiplication.
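The precedence rules can be verified by comparing an expression with a fully parenthesized version of itself. The sketch below uses the expression from this section along with the unary examples; the variable names are illustrative.

```matlab
a = 2^3 + 3^2/7*13;          % exponentiation first, then / and * left-to-right, then +
b = (2^3) + (((3^2)/7)*13);  % the same expression with the order made explicit
c = 2^3^2;                   % ties run left-to-right: (2^3)^2
d = -4^2;                    % ^ binds tighter than unary minus: -(4^2)
e = (-4)^2;                  % parentheses force the negation first
f = -4 * -4;                 % unary minus binds tighter than *: (-4)*(-4)
```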
3.11 Exercises
3. Compute the values (x+y)^2 and x^2+x*y+y*x+y^2. Are they the same?
6. Is a*b+a*c the same as a*b+c? If so, show it, if not, how can the second be changed so they are equal.
7. Suppose a command x^y*w+z was entered. What restrictions on the dimensions of w, x, y and z must
be true for this to be a valid statement?
Chapter 4
Basic Functions
This chapter discusses a set of core functions which are frequently encountered.
length
length returns the size of the maximum dimension of a matrix. If x is T by K, T > K, then length(x) is
T. If K > T, the length is K. Using length is risky since the value returned can be either the number of
columns or the number of rows, depending on which is larger.1 In practice, size should be used since the
dimension can be explicitly provided.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> length(x)
ans =
3
>> length(x’)
ans =
3
size
size returns the size of either a particular dimension or the entire array. To determine the size of a partic-
ular dimension, use z=size(x,DIM ), where DIM is the dimension. Dimension 1 corresponds to rows and
dimension 2 is columns, so if x is T by K , z=size(x,1) returns T while z=size(x,2) returns K . Alterna-
tively, s=size(x) returns a vector s with the size of each dimension. size can also be used with as many
outputs as dimensions (the jth output contains the length of the jth dimension).
1
When used on higher dimensional arrays, length(x) is the same as max(size(x)) and so it returns the maximum
dimension size across the entire array.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> size(x,1)
ans =
2
>> size(x,2)
ans =
3
>> size(x’,1)
ans =
3
>> s=size(x)
s =
2 3
>> [m,n] = size(x)
m =
2
n =
3
sum
\[ z = \sum_{t=1}^{T} x_t . \]
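A short sketch of sum on a matrix and on a vector follows. The description of sum is brief in this copy of the notes, so the example relies on the standard behavior of the built-in function: sum(x) adds down each column of a matrix and sum(x,DIM) changes the dimension. The variable names are illustrative.

```matlab
x = [1 2 3; 4 5 6];
colSum = sum(x);          % sums down each column
rowSum = sum(x, 2);       % sums across each row
total  = sum(x(:));       % x(:) stacks all elements into one column first
vecSum = sum([1 2 3]);    % for a vector, the result is a scalar
```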
min, max
\[ \min x_{it}, \quad i = 1, 2, \ldots, K, \]
column-by-column (max is identical to min, only computing the maximum). If x is a vector, min(x) is scalar.
If x is a T by K matrix, min(x) is a 1 by K vector containing the minimum values of each column.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> min(x)
ans =
1 2 3
>> min(x’)
ans =
1 4
min(x,[],DIM ) can be used with an optional 3rd input to indicate the dimension to compute the minimum
across (e.g. min(x,[],1) for columns, min(x,[],2) for rows). The DIM argument occurs in the 3rd
position since there is a rarely used 2-input form of min which computes the minimum of 2 matrices (with
the same size) or of a matrix and a scalar. Both min and max can be used with a 2nd output to return the
index or indices of the smallest and largest elements, respectively.
>> x=[1 5 3; 4 2 6];
>> [minX, ind] = min(x)
minX =
1 2 3
ind =
1 2 1
prod
\[ z = \prod_{t=1}^{T} x_t . \]
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> prod(x)
ans =
4 10 18
>> prod(x’)
ans =
6 120
>> prod(x,2)
ans =
6
120
cumsum, cumprod
\[ z_{ij} = \sum_{k=1}^{i} x_{kj} . \]
cumsum(x,DIM ) changes the dimension used to compute the cumulative sum. cumprod is identical to
cumsum, only computing the cumulative product,
\[ z_{ij} = \prod_{k=1}^{i} x_{kj} . \]
cummax(x) and cummin(x) compute the cumulative max and minimum of an array, respectively.
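The cumulative functions can be illustrated on the same small matrix used elsewhere in this chapter. The variable names are illustrative assumptions.

```matlab
x = [1 2 3; 4 5 6];
cs  = cumsum(x);             % running sum down each column
cs2 = cumsum(x, 2);          % running sum across each row
cp  = cumprod(x);            % running product down each column
cm  = cummax([1 3 2 5 4]);   % running maximum of a vector
cn  = cummin([5 3 4 1 2]);   % running minimum of a vector
```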
sort
sort orders the values in a vector or the columns of a matrix from smallest to largest. If x is a vector, z=sort(x)
is a vector where z(1)=min(x) and z(i)≤z(i+1). If x is a matrix, sort(x) is a matrix of the same size where
the sort is performed column-by-column.
>> x=[1 5 2; 4 3 6]
x =
1 5 2
4 3 6
>> sort(x)
ans =
1 3 2
4 5 6
>> sort(x’)
ans =
1 3
2 4
5 6
sort(x,DIM ) can be used to change the dimension of the sort. sort can be used with a second output
to return a list of the indices used to sort. This is especially useful when one matrix needs to be sorted
according to the data in another matrix.
>> x=[9 1 8 2 7 3 6 4 5];
>> [sortedX,ind] = sort(x)
sortedX =
1 2 3 4 5 6 7 8 9
ind =
2 4 6 8 9 7 5 3 1
>> y = x;
>> y(ind)
ans =
1 2 3 4 5 6 7 8 9
The related command sortrows can be used to perform a lexicographic sort of a matrix, which first
sorts the first column, then the second column for those rows with the same value in the first column, and
so on.
>> x=[1 5 2; 4 3 6; 4 1 6]
x =
1 5 2
4 3 6
4 1 6
>> sortrows(x)
ans =
1 5 2
4 1 6
4 3 6
Like sort, sortrows can be used with a 2nd output to produce a vector containing the indices used in the
sort.
exp
exp computes the exponential of each element, $e^{x}$.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> exp(x)
ans =
2.7183 7.3891 20.0855
54.5982 148.4132 403.4288
log
log computes the natural logarithm of each element, $\ln x$.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> log(x)
ans =
0 0.6931 1.0986
1.3863 1.6094 1.7918
sqrt
sqrt computes the square root of each element, $\sqrt{x_{ij}}$.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> sqrt(x)
ans =
1.0000 1.4142 1.7321
2.0000 2.2361 2.4495
Note: This command produces the same result as dot-operator command z=x.^(1/2).
mean
$$z = \frac{\sum_{t=1}^{T} x_t}{T}$$
If x is a T by K matrix, z=mean(x) is a 1 by K vector containing the means of each column, so
z(i) = sum(x(:,i)) / size(x,1). mean(x,DIM) can be used to alter the dimension used.
Warning: When x is a vector, mean behaves like sum and so will compute the mean of the vector, whether it is a row or a column.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> mean(x)
ans =
2.5000 3.5000 4.5000
>> mean(x')
ans =
2 5
var
$$\hat{\sigma}^2 = \frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T-1}$$
If x is a vector, var(x) is scalar. If x is a matrix, var(x) is a 1 by K vector containing the sample variances of
each column. var(x,[],DIM) can be used to alter the dimension used. Note: This command uses T − 1
in the denominator by default. This behavior can be altered using an optional second argument (var(x,1) uses T).
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> var(x)
ans =
4.5000 4.5000 4.5000
>> var(x')
ans =
1 1
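As a hedged illustration of the optional second argument, the sketch below compares the default T − 1 denominator with the T denominator on the same matrix (the variable names are illustrative):

```matlab
x = [1 2 3; 4 5 6];
v0 = var(x);          % default, denominator T-1: [4.5 4.5 4.5]
v1 = var(x, 1);       % second argument 1 uses denominator T: [2.25 2.25 2.25]
vRow = var(x, 0, 2);  % default denominator, computed across rows: [1; 1]
```

With T = 2 observations per column, the two column-wise estimates differ by the factor (T − 1)/T = 1/2.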
cov
$$\hat{\Sigma} = \frac{1}{T-1} \sum_{t=1}^{T} (x_t - \bar{x})'(x_t - \bar{x}).$$
If x is a vector, cov(x) is scalar (and is identical to var(x)). If x is a matrix, cov(x) is a K by K matrix with
sample variances in the diagonal elements and sample covariances in the off-diagonal elements. Note:
Like var, cov uses T − 1 in the denominator unless an optional second argument is used.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> cov(x)
ans =
4.5000 4.5000 4.5000
4.5000 4.5000 4.5000
4.5000 4.5000 4.5000
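std
std computes the sample standard deviation, which is the square root of the sample variance computed by var.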
skewness
$$\text{skew} = \frac{\frac{\sum_{t=1}^{T} (x_t - \bar{x})^3}{T}}{\hat{\sigma}^3}.$$
If x is a vector, skewness(x) is scalar. If x is a matrix, skewness(x) is a 1 by K vector containing the sample
skewness of each column. skewness(x,[],DIM) changes the dimension used.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> skewness(x)
ans =
0 0 0
>> skewness(x')
ans =
0 0
kurtosis
$$\kappa = \frac{\frac{\sum_{t=1}^{T} (x_t - \bar{x})^4}{T}}{\hat{\sigma}^4}.$$
If x is a vector, kurtosis(x) is scalar. If x is a matrix, kurtosis(x) is a 1 by K vector containing the sample
kurtosis of each column. kurtosis(x,[],DIM) changes the dimension used.
>> x=[1 2 3; 4 5 6]
x =
1 2 3
4 5 6
>> kurtosis(x)
ans =
1 1 1
>> kurtosis(x')
ans =
1.5000 1.5000
4.1 Moving window functions
The most common statistics functions are available in moving window versions which compute the
function using all blocks of data within an array. There are moving window versions of mean, median, var,
std, max, min and sum. These all follow the pattern movfunc where func is one of the previously named
functions, for example, movmean. The basic use of these functions requires the specification of the window
length, and the function will be computed for all contiguous blocks with this length. By default the window
is centered on each element and is truncated at the ends of the array, so the output has the same length as the input.
>> x=[1 7 2 10 0 -1];
>> movmean(x, 3)
ans =
4.0000 3.3333 6.3333 4.0000 3.0000 -0.5000
>> movmax(x, 4)
ans =
7 7 10 10 10 10
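The window can also be specified in other ways. The sketch below, using the same x, shows a trailing window given as [back forward] and the 'Endpoints','discard' option, both of which are assumed to be available in the MATLAB release in use (the variable names are illustrative):

```matlab
x = [1 7 2 10 0 -1];
trailing = movmean(x, [2 0]);                   % current element and the 2 before it
full = movmean(x, 3, 'Endpoints', 'discard');   % keep only complete 3-element windows
```

Here trailing is approximately [1 4 3.3333 6.3333 4 3], and full contains only the four complete windows, approximately [3.3333 6.3333 4 3].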
4.2 Exercises
1. Load the MATLAB data file created in the Chapter 14 exercises and compute the mean, standard
deviation, variance, skewness and kurtosis of both returns (SP500 and XOM).
2. Create a new matrix, returns = [SP500 XOM]. Repeat exercise 1 on this matrix.
4. Find the max and min of the variable SP500 (see the Chapter 14 exercises). Create a new variable
SP500sort which contains the sorted values of this series. Verify that the min corresponds to the
first value of this sorted series and the max corresponds to the last. Hint: Use length or size.
Chapter 5
Special Vectors and Matrices
: operator
The : operator has multiple uses. The first allows elements in a matrix or vector to be accessed (e.g.
x(1,:) as previously described). The second allows a matrix to be collapsed into a column vector (e.g.
x(:), which is identical to reshape(x,prod(size(x)),1)). The final constructs row vectors with evenly
spaced points. In this context, the : operator has two forms, first:last and first:increment:last. The basic
form, first:last, produces a row vector of the form
$$[\text{first}, \text{first}+1, \ldots, \text{first}+N]$$
where N is the largest integer such that first+N ≤ last. When first and last are both integers and first ≤ last,
then N = last−first. These examples demonstrate the use of the : operator.
>> x=1:5
x =
1 2 3 4 5
>> x=1:3.5
x =
1 2 3
>> x=-4:6
x =
-4 -3 -2 -1 0 1 2 3 4 5 6
The second form for the : operator includes an increment. The resulting sequence will have the form
$$[\text{first}, \text{first}+\text{increment}, \ldots, \text{first}+N(\text{increment})]$$
where N is the largest integer such that first+N(increment) ≤ last. Consider these two examples:
>> x=0:.1:.5
x =
0 0.1000 0.2000 0.3000 0.4000 0.5000
>> x=0:pi:10
x =
0 3.1416 6.2832 9.4248
>> x=-1:-1:-5
x =
-1 -2 -3 -4 -5
>> x=0:-pi:-10
x =
0 -3.1416 -6.2832 -9.4248
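One consequence of these rules is worth a quick check: when first exceeds last and the increment is positive, no value satisfies the condition and the result is empty. A minimal sketch (variable names are illustrative):

```matlab
x = 5:1;                % first > last with the default increment of 1
isEmptyX = isempty(x);  % true: the sequence contains no elements
y = 5:-1:1;             % a negative increment is needed to count down
```

This matters in loops, since a for loop over an empty sequence simply does not execute.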
linspace
linspace is similar to the : operator. Rather than producing a row vector with a predetermined increment,
linspace produces a row vector with a predetermined number of nodes. The generic form is
linspace(lower,upper,N) where lower and upper are the two bounds of the series and N is the number of points
to produce.
If δ is defined as δ=(upper-lower)/(N-1), the resulting sequence will have the form
$$[\text{lower}, \text{lower}+\delta, \ldots, \text{lower}+(N-1)\delta]$$
where lower+(N − 1)δ is by construction equal to upper. Thus, the command linspace(lower,upper,N)
will produce the same output, up to rounding error, as lower:(upper-lower)/(N-1):upper.
Recall that : is a low precedence operator, and so operations involving : should always be enclosed in
parentheses when used with other mathematical expressions. Failure to do so can result in undesirable or
unexpected behavior. For example, consider
>> N=4;
>> lower=0;
>> upper=1;
>> linspace(lower,upper,N)-(lower:(upper-lower)/(N-1):upper) % Correct
ans =
1.0e-015 *
0 0 -0.1110 0
>> linspace(lower,upper,N)-lower:(upper-lower)/(N-1):upper % Unexpected
ans =
0 0.3333 0.6667 1.0000
The second command is interpreted as
>> (linspace(lower,upper,N)-lower):((upper-lower)/(N-1)):upper
which first generates a sequence, and then uses the colon operator with the sequence as the first argument.
This is not the correct method to produce a sequence using :.
logspace
>> logspace(0,1,4)
ans =
1.0000 2.1544 4.6416 10.0000
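logspace produces points that are evenly spaced in base-10 logarithms. As a sketch of this relationship (variable names are illustrative), the output above can be reproduced from linspace:

```matlab
a = logspace(0, 1, 4);         % 4 points between 10^0 and 10^1
b = 10.^linspace(0, 1, 4);     % same points built from evenly spaced exponents
maxDiff = max(abs(a - b));     % should be 0 up to rounding error
```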
zeros
zeros generates a matrix of 0s and is generally called with two arguments, the number of rows and the
number of columns.
>> M = 2; N = 5;
>> x = zeros(M,N)
will generate a matrix of 0s with M rows and N columns. zeros(M,N) and zeros([M N]) are equivalent;
the latter is more useful if the number of dimensions depends on data or some other input. zeros can also
be used with more than 2 inputs to create 3- or higher-dimensional arrays.
ones
ones produces a matrix of 1s in the same way zeros produces a matrix of 0s.
x = ones(M,N)
eye
eye generates an identity matrix (matrix with ones on the diagonal, zeros everywhere else). An identity
matrix is always square so it only takes one argument.
In = eye(N)
nan
nan produces a matrix populated with NaNs (see Ch. 7) in the same way zeros produces a matrix of 0s.
nan is useful for initializing a matrix for storing missing values where the missing values are left as NaNs.
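A minimal sketch of this use, assuming a hypothetical T by 2 array of returns where only some values have been observed:

```matlab
T = 5;
returns = nan(T, 2);                      % T by 2 matrix of NaNs
returns(1:3, 1) = [0.01; -0.02; 0.005];   % hypothetical observed values
nMissing = sum(isnan(returns(:)));        % entries never filled remain NaN
```

Here nMissing is 7, since only 3 of the 10 entries were assigned.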
5.1 Exercises
1. Produce two matrices, one containing all zeros and one containing only ones, of size 10 × 5.
3. Produce an identity matrix of size 5. Take the exponential of this matrix, element-by-element.
5. Using both the : operator and linspace, create the sequence 0, 0.01, 0.02, . . . , .99, 1.
6. Create a custom logspace using the natural log (base e ) rather than the logspace created in base 10
(which is what logspace uses). Hint: Use linspace AND exp.
Chapter 6
Matrix Functions
Some functions operate exclusively on matrix inputs. These functions can be broadly categorized as either
matrix manipulation functions, which alter the structure of an existing matrix, or mathematical
functions which are only defined for matrices, such as the computation of eigenvalues and eigenvectors.
repmat
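repmat replicates a matrix according to a specified size vector. The generic form of repmat is repmat(X,M,N)
where X is the matrix to be replicated, M is the number of rows in the new block matrix, and N is
the number of columns in the new block matrix. For example, suppose X is the matrix
$$X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$
and a block matrix with 2 rows and 3 columns of copies of X is needed. This can be constructed manually or,
equivalently, using repmat: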
>> x = [1 2; 3 4];
>> y = [x x x; x x x];
>> y = repmat(x,2,3);
repmat has two clear advantages over manual allocation. First, repmat can be executed using
parameters determined at run-time, such as the number of explanatory variables in a model. Second,
repmat can be used for arbitrary dimensions. Manual matrix construction is tedious and error prone with
as few as 4 rows or columns.
reshape
reshape transforms a matrix with one set of dimensions to one with a different set as long as the number
of elements does not change. reshape can transform an M by N matrix x into a K
by L matrix y as long as MN = KL. The most useful call to reshape switches a matrix into a vector or vice
versa. For example,
>> x = [1 2; 3 4];
>> y = reshape(x,4,1)
y =
1
3
2
4
>> z = reshape(y,1,4) % same as y'
z =
1 3 2 4
>> w = reshape(z,2,2)
w =
1 2
3 4
The crucial implementation detail of reshape is that matrices are stored in column-major order.
Elements in matrices are indexed first down the rows of a column and then across columns. reshape will
place elements of the old matrix into the same position in the new matrix and so after calling reshape,
x(1) = y(1), x(2) = y(2), and so on.
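The column-major ordering can be checked directly with linear indexing. This is a small sketch with illustrative variable names:

```matlab
x = [1 2; 3 4];
v = x(:)';                    % elements read down the columns: [1 3 2 4]
y = reshape(1:6, 2, 3);       % fills column by column: [1 3 5; 2 4 6]
fourth = y(4);                % linear index 4 is row 2, column 2
sameOrder = isequal(x(:), reshape(x, 4, 1));
```

Here fourth is 4 and sameOrder is true, since reshape never changes the linear position of an element.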
diag
diag can be used to convert a vector to a diagonal matrix or to extract the leading diagonal from a matrix.
The behavior depends on the format of the input. If the input is a vector, diag will return a matrix con-
taining the elements of the vector along the diagonal. If the input is a matrix, diag will return a column
vector containing the elements of the leading diagonal (i.e. positions (1,1), (2,2) ... up to the smaller of the
number of rows or columns). Consider the following example:
>> x = [1 2; 3 4]
x =
1 2
3 4
>> y = diag(x)
y =
1
4
>> z=diag(y)
z =
1 0
0 4
6.2 Broadcastable Operations: bsxfun
bsxfun is a convenience and performance function which allows basic mathematical operations on
arrays whose dimensions are not compatible under the rules of chapter 3. For example, suppose x is a T by K matrix
of data, and the studentized (mean 0 and variance 1) data are needed. The first step in studentizing a
matrix, subtracting the mean from each column, can be accomplished using a variety of functions, for
example using repmat or ones:
>> x = randn(100,10);
>> meanX = mean(x);
>> demeanedX_1 = x - repmat(meanX,100,1);
>> demeanedX_2 = x - ones(100,1) * meanX;
bsxfun simplifies this code by automatically performing singleton expansion. Singleton expansion
expands all array dimensions which have size 1 to match the corresponding dimension of the other array when
that size is not 1. Formally, singleton expansion is only possible for two arrays x and y when one of the
following statements is true for every dimension i:
• dim(x,i) = dim(y,i)
• dim(x,i) = 1 or dim(y,i) = 1
Note that if the number of dimensions of the two arrays differ, all “missing” dimensions have size 1.
Table 6.1 contains some examples with different array dimensions.
When two arrays are compatible, bsxfun requires 3 inputs. The first is the function to use, and can either
be a string or a function handle. Common functions are 'plus', 'minus', 'times', 'rdivide' and 'ldivide'
(or @plus, @minus, etc.). The second and third inputs are the two arrays. The previous example can be
rewritten using bsxfun in a single line:
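A minimal sketch of that single-line call, assuming the x and meanX variables from the example above (the name demeanedX_3 and the follow-on scaling step are illustrative assumptions):

```matlab
x = randn(100,10);
meanX = mean(x);
demeanedX_3 = bsxfun(@minus, x, meanX);         % meanX (1 by 10) is expanded down the rows
demeanedX_1 = x - repmat(meanX,100,1);          % repmat version, for comparison
maxDiff = max(abs(demeanedX_3(:) - demeanedX_1(:)));
studentizedX = bsxfun(@rdivide, demeanedX_3, std(x));   % divide each column by its std
```

maxDiff should be 0 since both versions subtract the same mean from each element. In releases that support implicit expansion (R2016b and later), x - meanX produces the same result without calling bsxfun.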
While this example produces the same output as the two previous examples, the bsxfun version is
faster since bsxfun avoids allocating the full T by K matrix of the means prior to
computing the difference. When the input is small, the approaches perform similarly. However, when the
input is large, bsxfun is substantially faster.
chol
chol computes the Cholesky factor of a positive definite matrix. The Cholesky factor is an upper triangular
matrix and is defined as C in
$$C'C = \Sigma$$
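As a quick check of this definition, the sketch below factors a small positive definite matrix (the values in Sigma are an arbitrary illustration) and verifies that the factor is upper triangular and reproduces the input:

```matlab
Sigma = [4 2; 2 3];              % arbitrary positive definite matrix
C = chol(Sigma);                 % upper triangular Cholesky factor
err = norm(C'*C - Sigma);        % should be 0 up to rounding error
isUpper = isequal(C, triu(C));   % true: nothing below the diagonal
```

For this Sigma, C is [2 1; 0 sqrt(2)].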
det
det computes the determinant of a square matrix, $|x|$.
eig
eig computes the eigenvalues and eigenvectors of a square matrix. When used with one output (val=eig(x)),
the vector of eigenvalues is returned. When used with two ([vec,val]=eig(x)), matrices containing the
eigenvectors and eigenvalues (diagonal) are returned so that x*vec is the same as vec*val. When x is
symmetric, vec*val*vec' is also the same as x.
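A short sketch on a symmetric matrix (chosen arbitrarily for illustration) confirms these relationships numerically:

```matlab
x = [2 1; 1 2];                     % symmetric, with eigenvalues 1 and 3
[vec, val] = eig(x);
errDef = norm(x*vec - vec*val);     % defining relationship, holds for any square x
errSym = norm(vec*val*vec' - x);    % reconstruction, relies on x being symmetric
lambda = sort(diag(val));           % eigenvalues as a sorted vector
```

Both errDef and errSym should be 0 up to rounding error, and lambda is [1; 3].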
inv
inv computes the inverse of a matrix. inv(x) can alternatively be computed using x^(-1) or x\eye(length(x))
– the latter form is preferred for both performance and precision.
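The same preference applies when solving a linear system: the backslash operator can be applied directly to the right-hand side rather than forming the inverse first. A minimal sketch with arbitrary illustrative values:

```matlab
A = [4 2; 2 3];
b = [2; 1];
x1 = A\b;            % solves A*x = b without forming the inverse
x2 = inv(A)*b;       % same solution, computed via the explicit inverse
resid = norm(A*x1 - b);
```

Both approaches give x = [0.5; 0] here, and resid should be 0 up to rounding error.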
kron
kron computes the Kronecker product of two matrices. z = kron(x,y) implements the mathematical ex-
pression
z = x ⊗ y.
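As an illustration, each element of x is multiplied by the whole of y and the resulting blocks are arranged in the pattern of x. The matrices below are arbitrary examples:

```matlab
x = [1 2; 3 4];
y = [1 1];
z = kron(x, y);          % each element of x is repeated across a 1 by 2 block
w = kron(eye(2), x);     % block-diagonal matrix with x repeated on the diagonal
```

Here z is [1 1 2 2; 3 3 4 4] and w is the 4 by 4 block-diagonal matrix [1 2 0 0; 3 4 0 0; 0 0 1 2; 0 0 3 4].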
trace
trace computes the trace of a square matrix (sum of diagonal elements) and so trace(x) equals sum(diag(x)).
Chapter 7
Inf, NaN and Numeric Limits
Three special expressions are reserved to indicate certain non-numerical “values”. Inf represents infinity
and Inf is distinct from -Inf. Inf can be constructed in a number of ways, for instance, 1/0 or exp(1000).
NaN stands for Not a Number. NaNs are created whenever a function produces a result that cannot be clearly
defined as a number or infinity. For instance, Inf/Inf produces a NaN.
All numeric software has limited precision and MATLAB is no different. The easiest limits to understand
are the upper and lower limits, $1.7977 \times 10^{308}$ and $-1.7977 \times 10^{308}$ (realmax and -realmax). Numbers larger (in
absolute value) than these are Inf. The smallest positive normalized number that can be expressed is $2.2251 \times 10^{-308}$
(realmin). Numbers much smaller in absolute value than this lose precision and are eventually rounded to 0.
The most difficult concept to understand about numerical accuracy is the limited relative precision.
The relative precision of MATLAB is $2.2204 \times 10^{-16}$. This value is returned from the command eps and
may vary based on the type of CPU and/or the operating system used. Numbers which differ by a relative
range of $2.2204 \times 10^{-16}$ are numerically the same. To explore the role of eps, examine the results of the
following:
>> x=1
x =
1
>> x=x+eps/2
x =
1
>> x-1
ans =
0
>> x=x+2*eps
x =
1
>> x-1
ans =
4.4408e-016
Next, consider how the order of execution matters to the final result:
>> x=1-1+eps/2
x =
1.1102e-16
>> x=1-(1+eps/2)
x =
0
The difference in these two expressions arises since, in the first, 1 is subtracted from 1, and then eps/2 is
added (which is distinct from 0), while in the second 1+eps/2 is numerically identical to 1, and so after
the expression in the parentheses is evaluated, the intermediate result is 1, which is then subtracted from
1 producing 0.
To better understand what is meant by relative range, consider the following output:
>> x=10
x =
10
>> x=x+2*eps
x =
10
>> x-10
ans =
0
In the initial example, eps/2 < eps so it has no effect (relative to 1) while 2*eps > eps so it does. However
in the second example, 2*eps/10 < eps, and so it has no effect when added. In other words, 2*eps is
sufficiently “big” relative to 1 to create a difference, while it is not relative to 10. This is a very tricky concept
to understand, but failure to understand numeric limits can result in errors or surprising results from code that
is otherwise correct.
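The relative nature of the precision can be seen using the one-input form eps(x), which returns the spacing between x and the next larger representable number. The following is a small sketch with illustrative variable names:

```matlab
spacing1  = eps(1);                        % identical to eps
spacing10 = eps(10);                       % larger spacing near 10 (2^-49)
changes1  = (1 + eps) > 1;                 % true: eps is large enough relative to 1
changes10 = (10 + eps(10)) > 10;           % true: a full spacing at 10 is detected
noChange  = (10 + eps(10)/4) == 10;        % true: too small relative to 10
```

Since 2*eps is only one quarter of eps(10), adding it to 10 has no effect, which matches the output above.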
7.1 Exercises
1. What is the value of log(exp(1000)) both analytically and in MATLAB? Why do these differ?
2. What is the value of eps/10?
3. Is .1 different from .1+eps/10?
4. Is 1e120 ($1 \times 10^{120}$) different from 1e120+1e102? (Hint: Test with ==)
Chapter 8
Logical Operators
Logical operators, when combined with flow control (such as if . . . else . . . end blocks, chapter 9),
allow for complex choices to be compactly expressed. They are additionally useful for selecting subsets of
vectors or matrices which satisfy some range restrictions.
Logical operators can be used on scalars, vectors or matrices. All comparisons are done element-by-
element and return either logical true (which has numeric value 1) or false (0).¹ For instance, suppose x
and y are matrices of the same size. z=x < y will be a matrix of the same size as x and y composed of 0s
and 1s. Alternatively, if one is scalar, say y, then the elements of z are z(i,j)= x(i,j) < y. The following
table examines the behavior when x and/or y are scalars or matrices. Suppose z = x < y:
            y scalar                   y matrix
x scalar    Any                        Any
            z = x < y                  z(i,j) = x < y(i,j)
x matrix    Any                        Both dimensions match
            z(i,j) = x(i,j) < y        z(i,j) = x(i,j) < y(i,j)
¹ Note that true and false are known as Boolean variables and are not standard numerical values. Boolean variables are stored
using 1 byte of computer memory, while typical numerical values require 8 bytes to store. Ch. 21 provides a more detailed
description of the data types available.
Aside from the different level of precedence (NOT (~) has higher precedence than AND (&) and OR
(|)), these operators follow the same rules as other logical operators, and so when used on matrices, all
dimensions must be identical. When one of the inputs is a scalar and the other is a matrix, the operator is
applied to the scalar and each element of the matrix.
Suppose x and y are logical variables (1s or 0s), and define z=x & y:
            y scalar                   y matrix
x scalar    Any                        Any
            z = x & y ²                z(i,j) = x & y(i,j)
x matrix    Any                        Both dimensions match
            z(i,j) = x(i,j) & y        z(i,j) = x(i,j) & y(i,j)
AND and OR (but not NOT) can be used in both standard and short-circuit forms. Short-circuit operators
(&& and ||) terminate as soon as the statement can be correctly evaluated and so offer higher performance, although
they can only be used with scalar logical expressions. In general, short-circuit operators should be used
when applicable.
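Short-circuiting is also useful for guarding an expression that would otherwise produce an error. In the sketch below (the variable names and messages are illustrative), the second condition is never evaluated when x is empty:

```matlab
x = [];
% With &&, x(1) > 0 is not evaluated when x is empty, so no indexing error occurs
if ~isempty(x) && x(1) > 0
    status = 'positive first element';
else
    status = 'empty or non-positive first element';
end
% The standard forms operate element-by-element on arrays
a = [true false];
b = [true true];
both = a & b;     % [1 0]
either = a | b;   % [1 1]
```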
Logical operators can be used to access a subset of the elements of a vector or matrix. Standard indexing
operates by using the numerical location (e.g. 1,2,...) of elements in a matrix. In contrast, logical indexing
is essentially a series of yes or no values indicating whether each element should be selected. Logical indexing uses
Boolean values, true or false (which must be logical 1 or 0, not numeric 1 or 0), as opposed to the
numeric values used in standard indexing. In essence, logical indices behave like a series of light
switches indicating which elements to select: 1 for on (selected) and 0 for off (not selected).
>> x=[-2 0 1 2];
>> y = x<=0
y =
1 1 0 0
>> x(y)
ans =
-2 0
>> x(x~=0)
ans =
-2 1 2
>> x(x>0) = -1
x =
-2 0 -1 -1
² When both inputs are scalar, short-circuit operators (&& and ||) should be used.
Logical indexing is very powerful when combined with other logical functions. For example, suppose
nan is used to represent missing data in an array. any(isnan(x),2) will return a logical vector indicating
whether any of the values in a row are nan, and so the negation of this statement indicates that all values in
a row are not nan. This expression can be used to remove rows with nans so that the results of mathematical
operations will not be nan.
>> x= ones(3,3);
>> x(2,2) = nan;
>> sum(x)
ans =
3 NaN 3
>> sum(x(~any(isnan(x),2),:))
ans =
2 2 2
8.4.1 logical
The command logical is used to convert non-logical elements to logical. Logical values and regular
numerical values are not exactly the same. Logical elements only take up 1 byte of memory (the smallest
unit of memory MATLAB can address) while regular numbers require 8 bytes. logical is useful to convert
the standard numerical data type in MATLAB to logical values.
As previously demonstrated, the elements of a matrix x can be accessed by x(#) where # can be a
vector of indices. Since the elements of x are indexed 1,2,. . ., an attempt to retrieve x(0) will produce an
error. However, if # is not a number but instead is a logical value, this behavior changes. The following
code shows how numeric indices differ from logical ones:
>> x = [1 2 3 4];
>> y = [1 1];
>> x(y) % Element number 1 twice
ans =
1 1
>> y = logical([1 1]); % True for elements 1 & 2
>> x(y)
ans =
1 2
>> y = logical([1 0 1 0]); % True for elements 1 & 3
>> x(y)
ans =
1 3
Note that logical turns any non-zero value into logical true (1), although a warning is generated if the
values differ from 0 or 1. For example
>> x=[0 1 2 3]
x =
0 1 2 3
>> logical(x)
Warning: Values other than 0 or 1 converted to logical 1.
ans =
0 1 1 1
The commands all and any are useful for aggregating logical values. all returns logical(1) if all logical
elements in a vector are 1. If all is called on a matrix of logical elements, it works column-by-column,
returning 1 if all elements of the column are logical true and 0 otherwise. any returns logical(1) if any
element of a vector is logical true. When used with a matrix input, any operates column-by-column, returning
logical true if any element of that column is true.
>> x = [1 2 3 4]
x =
1 2 3 4
>> y = x<=2
y =
1 1 0 0
>> all(y)
ans =
0
>> any(y)
ans =
1
>> x = [1 2 ; 3 4]
x =
1 2
3 4
>> y = x<=3
y =
1 1
1 0
>> all(y)
ans =
1 0
>> any(y)
ans =
1 1
8.4.3 find
find is a useful function for working with multiple data series. find is not logical itself, although it takes
logical inputs and returns the indices where the logical statement is true. There are two primary ways
to call find. indices = find(x < y) will return linear indices (values in 1,2,. . .,numel(x)) while [i,j] = find(x < y)
will return pairs of matrix indices (i, j) that correspond to the places where x<y.
>> x = [1 2 3 4];
>> y = x<=2
y =
1 1 0 0
>> find(y)
ans =
1 2
>> x = [1 2 ; 3 4];
>> y = x<=3
y =
1 1
1 0
>> find(y)
ans =
1
2
3
>> [i,j] = find(y)
i =
1
2
1
j =
1
1
2
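Two related idioms are sketched below using the same x (the variable names are illustrative): limiting find to the first match, and locating the position of the largest element.

```matlab
x = [1 2; 3 4];
y = x <= 3;
firstTrue = find(y, 1);          % linear index of the first true element only
nTrue = nnz(y);                  % number of true elements, same as numel(find(y))
[r, c] = find(x == max(x(:)));   % row and column of the largest element
```

Here firstTrue is 1, nTrue is 3, and the largest element is located at r = 2, c = 2.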
8.4.4 is*
A number of special purpose logical tests are provided to determine if a matrix has special characteristics.
Some operate element-by-element and produce a matrix of the same dimension as the input matrix while
others produce only scalars. These functions all begin with is.
There are a number of other special purpose is* expressions. For more details, search for is* in the help
file.
Note: isnan(x)|isinf(x)|isfinite(x) always equals 1, implying any element falls into one (and only
one) of these categories.
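This property can be verified on a vector containing each type of value. The sketch below uses an arbitrary example vector:

```matlab
x = [1 NaN Inf -Inf 0];
isN = isnan(x);                                  % [0 1 0 0 0]
isI = isinf(x);                                  % [0 0 1 1 0]
isF = isfinite(x);                               % [1 0 0 0 1]
exactlyOne = all((isN + isI + isF) == 1);        % each element is in exactly one category
```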
8.5 Exercises
1. Using the data file created in Chapter 14, count the number of negative returns in both the S&P 500
and ExxonMobil.
2. For both series, create an indicator variable that takes the value 1 if the return is larger than 2
standard deviations or smaller than -2 standard deviations. What is the average return conditional on
falling into this range for both returns?
3. Construct an indicator variable that takes the value of 1 when both returns are negative. Compute
the correlation of the returns conditional on this indicator variable. How does this compare to the
correlation of all returns?
5. What is the relationship between all and any? Write down a logical expression that allows one or
the other to be avoided (i.e. write myany = ? without using any and myall = ? without using all).
Chapter 9
Control Flow
9.1 Choice
Flow control allows different code to be executed depending on whether certain conditions are met. Two
flow control structures are available: if . . . elseif . . . else and switch . . . case . . . otherwise.
if . . . elseif . . . else blocks always begin with an if statement immediately followed by a scalar logical
expression and must be terminated with end. elseif and else are optional and can always be replicated
using nested if statements at the expense of more complex logic. The generic form of an if . . . elseif . . .
else block is
if logical_1
Code to run if logical_1
elseif logical_2
Code to run if logical_2
elseif logical_3
Code to run if logical_3
...
...
else
Code to run if all previous logicals are false
end
or
if logical
Code to run if logical true
else
Code to run if logical false
end
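As a hedged sketch of the two-branch form (the starting value x = 5 and the branch bodies are assumptions chosen for illustration, not taken from the original text):

```matlab
x = 5;            % assumed starting value
if x<5
    x=x+1;        % not run, since x<5 is false
else
    x=x-1;        % run, so x becomes 4
end
```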
>> x
x =
4
and
x = 5;
if x<5
x=x+1;
elseif x>5
x=x-1;
else
x=2*x;
end
>> x
x =
10
These examples have all used simple logical expressions. However, any scalar logical expression, such
as (x<0 || x>1) && (y<0 || y>1) or isinf(x) || isnan(x), can be used in if . . . elseif . . . else blocks.
switch . . . case . . . otherwise blocks allow for more advanced flow control although they can be completely
replicated using only if . . . elseif . . . else flow control blocks. Do not feel obligated to use these if not
comfortable in their application. The basic structure of this block is to find some variable whose value can
be used to choose a piece of code to execute (the switch variable). Depending on the value of this variable
(its case), a particular piece of code will be executed. If no cases are matched (otherwise), a default block
of code is executed. otherwise can safely be omitted and if not present no code is run if none of the cases
are matched. However, at most one block is matched. Matching a case causes that code block to execute
then the program continues running on the next line after the switch . . . case . . . otherwise block. The
generic form of a switch . . . case . . . otherwise block is
switch variable
case value_1
Code to run if variable=value_1
case value_2
Code to run if variable=value_2
case value_3
Code to run if variable=value_3
...
...
otherwise
Code to run if variable not matched
end
There is an equivalence between switch . . . case . . . otherwise and if . . . elseif . . . else blocks, although
if the logical expressions in the if . . . elseif . . . else block contain inequalities, variables must be created
prior to using a switch . . . case . . . otherwise block. switch . . . case . . . otherwise blocks also differ from
standard C behavior since only one case can be matched per block. The switch . . . case . . . otherwise block
is exited after the first match and the program resumes with the next line after the block.
A simple switch . . . case . . . otherwise example:
x=5;
switch x
case 4
x=x+1;
case 5
x=2*x;
case 6
x=x-2;
otherwise
x=0;
end
>> x
x =
10
cases can include multiple values for the switch variable using the notation case {case1 ,case2 ,. . . }. For
example,
x=5;
switch x
case {4,5}
x=x+1;
case {1,2}
x=2*x;
otherwise
x=0;
end
>> x
x =
6
x = 9;
switch x
case {4}
x=x+1;
case {1,2,5}
x=2*x;
otherwise
x=0;
end
>> x
x =
0
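The switch variable can also be text, which is a common way to select between named options. The following is a sketch only; the option names and the variables method, data and stat are hypothetical:

```matlab
method = 'Median';                 % hypothetical option name
data = [1 7 2 10 0 -1];
switch lower(method)               % lower makes the match case-insensitive
    case {'mean','average'}
        stat = mean(data);
    case 'median'
        stat = median(data);
    otherwise
        error('Unknown method: %s', method);
end
```

Here the second case is matched and stat is 1.5.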
9.2 Loops
Loops make many problems, particularly when combined with flow control blocks, simple and, in many
cases, feasible. Two types of loop blocks are available: for . . . end and while . . . end. for blocks iterate over
a predetermined set of values and while blocks loop as long as some logical expression is satisfied. All for
loops can be expressed as while loops although the opposite is not true. They are nearly equivalent when
break is used, although it is generally preferable to use a while loop rather than a for loop with a break statement.
for loops begin with for iterator=vector and finish with end. The generic structure of a for loop is
for iterator=vector
Code to run
end
iterator is the variable that the loop will iterate over. For example, i is a common name for an iterator.
vector is a vector of data. It can be an existing vector or it can be generated on the fly using linspace
or a:b:c syntax (e.g. 1:10). One subtle aspect of loops is that the iterator can contain any vector data,
including non-integer and/or negative values. Consider these three examples:
count=0;
for i=1:100
count=count+i;
end
count=0;
for i=linspace(0,5,50)
count=count+i;
end
count=0;
x=linspace(-20,20,500);
for i=x
count=count+i;
end
The first loop will iterate over i = 1, 2,. . . , 100 . The second loops over the values produced by the
function linspace which creates 50 uniform points between 0 and 5, inclusive. The final loops over x, a
vector constructed from a call to linspace. Loops can also iterate over decreasing sequences:
count=0;
x=-1*linspace(0,20,500);
for i=x
count=count+i;
end
The key to understanding for loop behavior is that for always iterates over the elements of vector in the
order they are presented (i.e. vector(1), vector(2), . . .).
Loops can also be nested:
count=0;
for i=1:10
for j=1:10
count=count+j;
end
end
One particularly useful construct is to loop over the length of a vector, which allows each element to be
accessed individually.
trend=zeros(100,1);
for i=1:length(trend)
trend(i)=i;
end
Finally, these ideas can be combined to produce nested loops with flow control.
matrix=zeros(10,10);
for i=1:size(matrix,1)
for j=1:size(matrix,2)
if i<j
matrix(i,j)=i+j;
else
matrix(i,j)=i-j;
end
end
end
or loops containing nested loops that are executed based on a flow control statement.
matrix=zeros(10,10);
for i=1:size(matrix,1)
if (i/2)==floor(i/2)
for j=1:size(matrix,2)
matrix(i,j)=i+j;
end
else
for j=1:size(matrix,2)
matrix(i,j)=i-j;
end
end
end
Note: The iterator variable should not be modified inside the for loop. Changing the iterator can produce
undesirable results. For instance,
for i=1:10
i
i=2*i;
i
end
while loops are useful when the number of iterations needed depends on the outcome of the loop con-
tents. while loops are commonly used when a loop should only stop if a certain condition is met, such as
the change in some parameter is small. The generic structure of a while loop is
while logical
Code to run
end
Two things are crucial when using a while loop: first, the logical expression should evaluate to true
when the loop begins (or the loop will be skipped) and second, the inputs to the logical expression must
be updated inside the loop. If they are not, the loop will continue indefinitely (hit CTRL+C to break an
interminable loop). The simplest while loops are drop-in replacements for for loops. For example, the
following loop is equivalent to a for loop over i=1:10:
count=0;
i=1;
while i<=10
count=count+i;
i=i+1;
end
while loops should generally be avoided when for loops will do. However, there are situations where no
for loop equivalent exists.
mu=1;
index=1;
while abs(mu) > .0001
mu=(mu+randn)/index;
index=index+1;
end
In the block above, the number of iterations required is not known in advance and since randn is a standard
normal pseudo-random number, it may take many iterations until this criterion is met. No finite for loop
can be guaranteed to meet the criterion.
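A deterministic version of the same idea, stopping when the change in a parameter is small, is sketched below. It uses Newton's iteration for a square root purely as an illustration; the tolerance, the iteration cap and all variable names are assumptions.

```matlab
target = 2;
x = 1;                 % starting guess for sqrt(target)
change = Inf;          % ensures the loop condition is true on entry
iter = 0;
maxIter = 100;         % safety cap so the loop cannot run forever
tol = 1e-10;
while abs(change) > tol && iter < maxIter
    xNew = 0.5*(x + target/x);   % Newton update
    change = xNew - x;           % the quantity tested in the loop condition
    x = xNew;
    iter = iter + 1;
end
```

Both inputs to the logical expression, change and iter, are updated inside the loop, and the cap on iter guards against a loop that never converges.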
9.2.3 break
break can be used to terminate a for loop and, as a result, for loops can be constructed to behave similarly
to while loops.
for iterator = vector
Code to run
if logical
break
end
end
The only difference between this loop and a standard while loop is that the while loop could potentially
run for more iterations than vector contains. break can also be used to end a while loop before running
the remaining code inside the loop. Consider this slightly strange loop:
while 1
x = randn;
if x < 0
break
end
y = sqrt(x);
end
The use of while 1 will produce a loop, if left alone, that will run indefinitely. However, the break
command will stop the loop if some condition is met. More importantly, the break will prevent the code
after it from being run, which is useful if the operations after the break will create errors if the logical
condition is not true.
9.2.4 continue
continue, when used inside a loop, has the effect of advancing the loop to the next iteration while skipping
any remaining code in the body of the loop. While continue can always be avoided using if . . .else blocks,
its use typically results in tidier code. The effect of continue is best seen through a block of code,
for i=1:10
if (i/2)==floor(i/2)
continue
end
i
end
demonstrating that continue is forcing the loop to the next iteration whenever i is even (and (i/2)==
floor(i/2) evaluates to logical true).
Exception handling is an advanced tool which allows programs to be tolerant of errors. It is not necessary
for most numerical applications since data values which would produce the error, such as dividing by 0,
can be checked, and if encountered, an alternative code path can be executed. Exception handling is more
useful when performing input/output (especially if over a network).
try . . . catch blocks can be used to execute code which may not always complete. They should not usually
be used in numeric code since it is better to anticipate and explicitly handle issues when they occur to
ensure correct results. try statements allow subsequent statements to be run and, more importantly, allow for
continuation even if the code contains an error. catch blocks execute at the point where the error occurs,
and so if the code in the try block does not produce an error, the catch block is skipped. catch blocks can
be used with a special syntax to capture the error, which may be useful for debugging or cleaning up any
resources which were used in the try block. Note that when an error occurs, the code in the try block
before the error is executed and any code after the error is skipped.
One scenario for using a try . . . catch block is when reading or writing data to a network drive if there
is some chance that the network drive may be temporarily down. The following code shows one method
to accomplish this. The catch block uses matlabError to capture the error so that information can be
displayed. It also checks to see if the file is open, in which case fid would be positive, and closes it if
needed.
notRead = true;
while notRead
    try
        fid = fopen('data.txt','rt');
        data = fgetl(fid);
        fclose(fid);
        notRead = false;
    catch matlabError
        if fid>0
            fclose(fid);
        end
        disp(matlabError.identifier)
        disp(matlabError.message)
        % Pause for 30 seconds before retrying
        pause(30)
    end
end
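The capture syntax can also be seen in isolation. The following is a minimal sketch that deliberately raises an error so that the catch block runs; the identifier demo:negativeRoot and the variable names are made up for illustration and are not part of MATLAB.

```matlab
% Illustrative sketch: capture an error object in a catch block.
% The identifier 'demo:negativeRoot' is hypothetical.
x = -1;
reached = false;
try
    if x < 0
        error('demo:negativeRoot', 'Cannot take the root of %g', x);
    end
    reached = true;        % skipped because the error occurs first
catch err
    id = err.identifier;   % identifier passed to error
    msg = err.message;     % formatted message passed to error
end
```

Code placed after the try . . . catch block continues to run as normal, which is what allows a program to tolerate the error.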
9.4 Exercises
1. Write a code block that would take a different path depending on whether the returns on two series
are simultaneously positive, both are negative, or they have different signs using an if . . . elseif . . .
else block.
2. Construct a variable which takes the values 1, 2 or 3 depending on whether the returns in exercise
1 are both positive (1), both negative (2) or different signs (3). Repeat exercise 1 using a switch . . .
case . . . otherwise block.
3. Simulate 1000 observations from an ARMA(2,2) where $\varepsilon_t$ are independent standard normal innovations.
The process of an ARMA(2,2) is given by
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \varepsilon_t$$
Use the values $\phi_1 = 1.4$, $\phi_2 = -0.8$, $\theta_1 = 0.4$ and $\theta_2 = 0.8$. Note: A T by 1 vector containing standard
normal random variables can be simulated using e = randn(T,1). When simulating a process,
always simulate more data than needed and throw away the first block of observations to avoid start-up
biases. Since this process is fairly persistent, at least 100 extra observations should be computed.
4. Simulate a GARCH(1,1) process where $\varepsilon_t$ are independent standard normal innovations. A GARCH(1,1)
process is given by
$$y_t = \sigma_t \varepsilon_t$$
Use the values $\omega = 0.05$, $\alpha = 0.02$, $\gamma = 0.07$ and $\beta = 0.9$ and set $h_0 = \omega / (1 - \alpha - \tfrac{1}{2}\gamma - \beta)$. Note
that some form of logical expression is needed in the loop. $I_{[\varepsilon_{t-1} < 0]}$ is an indicator variable that takes
the value 1 if the expression inside the [ ] is true.
$$y_t = \phi_1 y_{t-1} + \theta_1 \sigma_{t-1} \varepsilon_{t-1} + \lambda \sigma_t^2 + \sigma_t \varepsilon_t$$
Use the values from Exercise 3 for the GJR-GARCH model and use $\phi_1 = -0.1$, $\theta_1 = 0.4$ and
$\lambda = 0.03$.
7. Using a while loop, write a bit of code that will do a bisection search to invert a normal CDF. A
bisection search cuts the interval in half repeatedly, only keeping the sub-interval with the target
in it. Hint: keep track of the upper and lower bounds of the random variable value and use flow
control. This problem requires normcdf.
8. Test the loop by finding the inverse CDF of 0, -3 and pi. Verify it is working by taking the
absolute value of the difference between the final value and the value produced by norminv.
Chapter 10
Graphics
Extensive plotting facilities capable of producing a virtually limitless range of graphical data representa-
tions are available. This chapter will emphasize the basics of the most useful graphing tools.
All plotting functions have a set of support functions which are useful for providing labels for various
portions of the plot or making adjustments to the range.
• legend labels the various elements on a graph. The specific behavior of legend depends on the type
of plot and the order of the data. legend takes as many strings as unique plot elements. Standard
usage is legend(’Series 1’,’Series 2’) where the number of series is figure dependent.
• title places a title at the top of a figure. Standard usage is title(’Figure Title’).
• xlabel, ylabel and zlabel produce text labels on the x , y and z (if the plot is 3-D) axes respectively.
Standard usage is xlabel(’X Data Name’).
• axis can be used to both get the axis limits and set the axis limits. To retrieve the current axis limits,
enter AX = axis();. AX will be a row vector of the form [xlow xhigh ylow yhigh (zlow) (zhigh)] where
zlow and zhigh are only included if the figure is 3-D. The axis can be changed by calling axis([xlow
xhigh ylow yhigh (zlow) (zhigh)]) where the z-variables are only allowed if the figure is 3-D. axis
can also be used to tighten the axes to include only the minimum space required to express the data
using the command axis tight.
These four are the most important support functions, but there are many additional functions available
to customize figures (see section 10.5).
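The support functions above can be combined on a single plot. The following is a small sketch using made-up data (x squared and the square root of x); the series and label text are illustrative only.

```matlab
% Sketch: label a simple plot, then read and change the axis limits.
x = linspace(0,1,50);
plot(x, x.^2, 'b-', x, sqrt(x), 'r--')
legend('Series 1','Series 2')
title('Figure Title')
xlabel('X Data Name')
ylabel('Y Data Name')
AX = axis;            % [xlow xhigh ylow yhigh] since the plot is 2-D
axis([0 0.5 0 1])     % show only the left half of the data
newAX = axis;         % the limits that were just set
```

Since the figure is 2-D, AX has four elements; a 3-D plot would return six.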
10.2 2D Plotting
10.2.1 plot
plot is the most basic plotting command. Like most commands, it can be used many ways. The standard
usage for a single series is
plot(x1,y1,format1)
where x1 and y1 are vectors of the same size and format1 is a format string of the form color shape linespec.
color can be any of
b  blue     m  magenta
g  green    y  yellow
r  red      k  black
c  cyan
shape is a marker symbol such as s (square), o (circle), + (plus) or p (pentagram), and linespec can be any of
-  solid    -.  dashdot
:  dotted   --  dashed
(none)  no line
The three arguments are combined to produce a format string. For instance ’gs-’ will produce a green
solid line with squares at every data point while ’r+ ’ will produce a set of red + symbols at every data
point (note that the string is r-plus-space). Arguments which are not needed can be left out. For instance,
to produce a green dotted line with no symbol, use the format string ’g:’. If no format string is provided,
an automatic color scheme will be used with marker-less solid lines. Suppose the following x and y data
were created,
x = linspace(0,1,100);
y1 = 1-2*abs(x-0.5);
y2 = x;
y3 = 1-4*abs(x-0.5).^2;
Calling plot(x,y1,'rs:',x,y2,'bo-.',x,y3,'kp--') will produce the plot in figure 10.1. A line's color
information is lost when documents are printed in black and white, and so it is important to use physical
characteristics to distinguish multiple series – either different line types or different markers, or both.
All plots should be clearly labeled. The following code labels the axes, gives the figure a title, and
provides a legend. The results of running the code along with the plot command above can be seen in
figure 10.1.
xlabel('x');
ylabel('f(x)');
title('Plot of three series');
legend('f(x)=1-2|x-0.5|','f(x)=x','f(x)=1-4(x-0.5)^2');
[Figure 10.1: four panels, (a) to (d). Panels (a) and (b) show f(x) against x, panel (c) shows the scatter plot of y against x, and panel (d) shows the plotyy output.]
One final method for calling plot is worth mentioning. plot(y) will plot the data in vector y against a
simple series which labels each observation 1, 2, . . ., length(y). plot(y) is equivalent to plot(1:length(y),y)
when y is a vector. If y is a matrix, plot will draw each column of y as if it was a separate series and plot(y)
is equivalent to plot(1:length(y(:,1)), y(:,1), 1:length(y(:,2)), y(:,2), . . . ).
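The equivalence for matrix input can be checked directly. This is a small sketch using simulated data; the variable names h1, h2 and xdata are illustrative.

```matlab
% Sketch: plot(y) with a matrix draws one line per column against 1:size(y,1).
y = cumsum(randn(50,2));
h1 = plot(y);                    % one line handle per column of y
xdata = get(h1(1),'XData');      % the implied x values, 1, 2, ..., 50
h2 = plot(1:50, y(:,1), 1:50, y(:,2));   % the explicit equivalent call
```

Both calls return one handle per series, which is useful later when modifying lines using handle graphics.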
10.2.2 plotyy
plotyy is a special version of plot which allows two series to be plotted on the same graph using different
axes - a left and a right one. The basic use is plotyy(x1,y1,x2,y2). The following code plots a set of
simulated returns, y1, and the corresponding price, y2, which is just the exponential of the cumulative
sum of the returns. The output of this code can be seen in panel (d) of figure 10.1.
x = 1:1000;
y1 = .08/365 + randn(1000,1)*.2/sqrt(250);
y2 = exp(cumsum(y1));
plotyy(x,y1,x,y2)
legend('Returns','Price')
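Releases from R2016a onward also provide yyaxis, which produces a similar two-axis chart. The sketch below repeats the returns and price example under the assumption that yyaxis is available; it is an alternative, not the method used to produce figure 10.1.

```matlab
% Sketch (assumes R2016a or later): two y-axes using yyaxis instead of plotyy.
x = 1:1000;
y1 = .08/365 + randn(1000,1)*.2/sqrt(250);
y2 = exp(cumsum(y1));
yyaxis left            % subsequent commands apply to the left axis
plot(x,y1)
ylabel('Returns')
yyaxis right           % subsequent commands apply to the right axis
plot(x,y2)
ylabel('Price')
ax = gca;              % the axes now carries two y-axis rulers
```

A design difference is that yyaxis keeps a single axes object, so labels are set for each side with ordinary ylabel calls.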
10.2.3 scatter
scatter, like most graphing functions, is self-descriptive. It produces a scatter plot of the elements of a
vector x against the elements of a vector y. Formatting, such as color or marker shape, can be provided
using a format string as in plot. Other options, such as marker size, must be set using handle graphics or
interactive plot editing. A simple example of handle graphics is included at the end of this chapter. Consult
scatter's help file for further information. The following code produces a scatter plot of 1000 pseudo-random
numbers from a bivariate normal distribution with variances of 2 and 0.5 and correlation of 0.5. The output
of this code can be seen in panel (c) of figure 10.1.
x=randn(1000,2);
Sigma=[2 .5;.5 0.5];
x=x*Sigma^(0.5);
scatter(x(:,1),x(:,2),'rs')
xlabel('x')
ylabel('y')
legend('Data point')
title('Scatter plot of correlated normal random variables')
10.2.4 bar
bar produces a vertical bar chart, and can be used as bar(y) or bar(x,y) – the first form uses 1:length(y) as
the values for x, which are the bar locations. The following code produces a bar chart with only selected
columns present. The output of this code can be seen in panel (a) of figure 10.2.
x = [1 2 4 5 9];
y = 20-(5-x).^2;
bar(x,y)
title('Bar Chart')
Other bar charts can be produced using an optional style argument (bar(x,y,'style')), where style
is one of:
• 'grouped' - Produces a bar chart where values in each column of y are grouped together, but appear
in different colors.
• 'stacked' - Produces a bar chart by stacking the values in each column of y. This is only useful
if y is n by k where n > 1.
• 'hist' - Produces a grouped bar chart in which adjacent bars touch, as in a histogram.
Examples of the three styles appear in panels (b) – (d) of figure 10.2. These were generated (in order) using
bar(1:3,[1 2 3;2 3 4;3 4 5],'grouped')
bar(1:3,[1 2 3;2 3 4;3 4 5],'stacked')
bar(1:3,[1 2 3;2 3 4;3 4 5],'hist')
Figure 10.2: This plot contains four bar charts produced using variants of bar(x,y).
10.2.5 hist
hist constructs a histogram – a rough empirical PDF – of a vector of data. The following code simulates
10,000 $\chi^2_4$ random variables and produces a histogram of the simulated values using 50 bins
(10 bins are used by default).
x = chi2rnd(4,10000,1);
hist(x, 50)
The results of running this code are presented in panel (a) of figure 10.3.
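hist can also return the bin counts instead of drawing the figure, which is useful when the counts are needed for further calculations. The sketch below simulates the chi-squared data as the sum of four squared standard normals so that no toolbox function is required; this substitution and the variable names counts and centers are assumptions made for illustration.

```matlab
% Sketch: retrieve histogram counts and bin centers, then draw them manually.
x = sum(randn(10000,4).^2, 2);     % chi-squared with 4 degrees of freedom
[counts, centers] = hist(x, 50);   % no plot is produced when outputs are requested
bar(centers, counts, 1)            % width of 1 makes adjacent bars touch
title('Histogram drawn from counts')
```

Every observation falls in exactly one bin, so the counts sum to the number of observations.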
10.2.6 stairs
stairs produces a plot which is appropriate for discrete data - such as high-frequency price data. The
primary difference between stairs and plot is the mechanism used to connect the data points plotted.
stairs uses a step method to connect the points while plot uses simple linear interpolation. Panel (b) of
figure 10.3 shows the result of running the following code.
price = cumsum(randn(20,1));
stairs(price)
title('Stair plot of price')
Figure 10.3: Panel (a) demonstrates the use of hist. Panel (b) shows the use of stairs to plot discrete data.
Panel (c) demonstrates the use of errorbar and panel (d) shows the use of pie.
10.2.7 errorbar
errorbar adds error bars to a basic plot. The error bars can be provided using either a scalar, in which case
the error bars are plotted using 2 times the scalar – it is similar to a standard deviation when the data is
normal – or using vectors L and U which specify the lengths of the bars below and above each data point.
Both are measured as non-negative distances from the data. The following code produces an errorbar plot
using (random) lower and upper lengths for the error bars.
x = 1:20;
l = abs(randn(1,20));
u = abs(randn(1,20));
y = cumsum(randn(1,20));
errorbar(x,y,l,u)
10.2.8 pie
pie can be used to produce a pie chart. The basic structure is pie(y,explode,label) where y is the data
to use in the pie chart, explode is a vector with the same size as y which describes how far from a center
a slice should appear (default is 0), and label is a cell array of strings which can be used to provide labels
for each slice (See Chapter 13 for more on cell arrays). The following code produces the pie chart in panel
(d) of figure 10.3.
pie([.7 .2 .1],[.1 0 0],{'Stocks','Bonds','Cash'})
title('Asset allocation')
10.3 3D Plotting
10.3.1 plot3
plot3 behaves similarly to plot except that it plots a series against two other series in a 3-dimensional
space. All arguments are the same and the generic form is
plot3(x1,y1,z1,format1)
The results of this block of code can be seen in panel (a) of figure 10.4.
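As an illustration of the generic form, the sketch below draws a helix using the same format string, 'rs:', that the figure caption reports. The data are made up for this example and are not necessarily those used to produce figure 10.4.

```matlab
% Sketch with hypothetical data: a 3-D line drawn with plot3.
t = linspace(0,1,50);
x = cos(4*pi*t);
y = sin(4*pi*t);
z = t;
h = plot3(x,y,z,'rs:');   % red dotted line with square markers
xlabel('x')
ylabel('y')
zlabel('z')
```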
10.3.2 surf
The next three graphics tools all plot a matrix of z data against vectors of x and y data. All three use the
results from a bivariate normal probability density function. The PDF of a bivariate normal with mean 0
is given by
$$f_X(x) = \frac{1}{2\pi|\Sigma|^{1/2}} \exp\left(-\frac{1}{2} x' \Sigma^{-1} x\right)$$
Figure 10.4: Panel (a): 3-D lines can be plotted using the plot3 command. This line was plotted by calling
plot3(x,y,z,'rs:');. Panel (b): surf plots a 3-D surface from vectors of x and y data and a matrix of z data. This
surf contains the PDF of a bivariate normal, and was created using surf(x,y,pdf) where x, y
and pdf are defined in the text. Panel (c): mesh produces a figure similar to surf but with gaps between grid points,
allowing the backside of a figure to be seen in a single view. This mesh contains the PDF of a bivariate normal,
and was created using mesh(x,y,pdf) where x, y and pdf are defined in the text. Panel (d): A contour plot is a set
of slices through a surf plot. This particular contour plot contains iso-probability lines from a bivariate
normal distribution with mean 0, variances of 2 and 0.5, and correlation of 0.5.
A matrix of PDF values, pdf, was created with the following code:
N = 100;
x = linspace(-3,3,N);
y = linspace(-2,2,N);
Sigma = [2 .5; .5 .5];
pdf=zeros(N,N);
for i=1:length(y)
for j=1:length(x)
pdf(i,j)=exp(-0.5*[x(j) y(i)]*Sigma^(-1)*[x(j) y(i)]')/sqrt((2*pi)^2*det(Sigma));
end
end
The first three lines set the grid size and initialize the x and y values. Since x has a higher variance, it has a larger range. The surf
(panel (b) of figure 10.4) was created by
surf(x,y,pdf)
xlabel('x')
ylabel('y')
zlabel('PDF')
title('Surf of normal PDF')
shading interp
The command shading interp changes how the colors are applied from a discrete grid to a continuous
grid.
Note: The x and y arguments of surf must match the dimensions of the z argument. If [M,N]=size(z),
then length(y) must be M and length(x) must be N. This is true of all 3-D plotting functions that draw
matrix data. In the code above, i is the row iterator which corresponds to y and j is the column iterator,
corresponding to x.
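The same orientation rule can be seen using meshgrid, which builds matrices of x and y values with length(y) rows and length(x) columns. The following sketch computes the PDF matrix without loops; the names X, Y, P, Q and pdfGrid are illustrative and do not appear elsewhere in these notes.

```matlab
% Sketch: vectorized bivariate normal PDF on a grid using meshgrid.
N = 100;
x = linspace(-3,3,N);
y = linspace(-2,2,N);
Sigma = [2 .5; .5 .5];
[X, Y] = meshgrid(x, y);     % rows follow y, columns follow x
P = inv(Sigma);              % 2 by 2, so the explicit inverse is cheap
% Quadratic form [x y]*inv(Sigma)*[x y]' evaluated at every grid point
Q = P(1,1)*X.^2 + 2*P(1,2)*X.*Y + P(2,2)*Y.^2;
pdfGrid = exp(-0.5*Q) / (2*pi*sqrt(det(Sigma)));
```

Calling surf(x,y,pdfGrid) should then produce the same surface as the looped version, since element (i,j) corresponds to y(i) and x(j).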
10.3.3 mesh
mesh produces a graphic similar to surf but with empty space between grid points. mesh has the advantage
that the hidden side can be seen, potentially revealing more from a single graphic. It also produces much
smaller files which can be important when including multiple graphics in a presentation or report. Using
the same bivariate normal setup, the following code produces the mesh plot shown in panel (c) of figure
10.4.
mesh(x,y,pdf)
xlabel('x')
ylabel('y')
zlabel('PDF')
title('Mesh of normal PDF')
10.3.4 contour
contour is similar to surf and mesh in that it takes three arguments, x , y and z . contour differs in that it pro-
duces a 2D plot. contour plots, while not as eye-catching as surf or mesh plots, are often better at convey-
ing meaningful information. Contour plots can be either called as contour(x,y,z) or contour(x,y,z,N)
where N determines the number of contours drawn. If omitted, the number of contours is automatically
determined based on the variance of the z data. The code below and panel (d) of figure 10.4 demonstrate
the use of contour.
contour(x,y,pdf);
xlabel('x')
ylabel('y')
title('Contours of normal PDF')
10.4 Multiple Graphs
Subplots allow for multiple plots to be placed in the same figure. All calls to subplot must specify three
arguments, the number of rows, the number of columns and which cell to place the graphic. The generic
form is
subplot(M,N,#)
where M is the number of rows, N is the number of columns, and # indicates the cell to place the graphic.
Cells in a subplot are counted across then down. For instance, in a call to subplot(3,2,#), the #'s would be
1 2
3 4
5 6
A call to subplot should be immediately followed by some plotting function. In the simplest case, this
would be a call to plot. However, any graphic function can be used in a subplot. The code below and
output in figure 10.5 demonstrate how different data visualizations may be used in every cell. These also
show some of the available plotting functions that are not described in these notes.
subplot(2,2,1);
x = [5 3 0.5 2.5 2];
explode = [0 1 0 0 0];
pie(x,explode)
colormap jet
title('pie function')
axis tight
subplot(2,2,2);
Y = cool(7);
bar3(Y,'detached')
title('bar3, ''Detached''')
axis tight
subplot(2,2,3)
bar3(Y,'grouped')
title('bar3, ''Grouped''')
axis tight
subplot(2,2,4);
x = 1:10;
y = sin(x);
e = std(y)*ones(size(x));
errorbar(x,y,e)
title('errorbar')
axis tight
Note: The graphics code in each subplot was taken from the function’s help file (see doc function). The
help system is comprehensive and most functions are illustrated with example code.
Figure 10.5: Subplots allow for more than one graphic to be included in a figure. This particular subplot
contains three different types of graphics with two variants on the 3-D bar. The upper left contains a call
to pie, the upper right contains a call to bar3 specifying the option 'detached', the lower left contains a call
to bar3 specifying the option 'grouped' and the lower right contains the results of a call to errorbar.
10.5 Advanced Graphics
While the standard graphics functions are powerful, these functions are not flexible enough to express all
available options. For example, it is often useful to change the thickness of a line.
Point-and-click
The simplest method to improve plots is to use the editing facilities of the figure windows directly. A number
of buttons are available along the top edge of a plot. One of these is an arrow, (1) in figure 10.6.
Clicking on the arrow will highlight it and allow any element, such as a line, to be selected. Double-clicking
on a line will bring up a Property Editor (2) dialog which contains elements of the selected item that can
be changed. These include color, line width, and marker (3). For more information on editing plots, search
for Editing Plots in the help browser.
Figure 10.6: Most features of a plot can be edited using the interactive editing tools of a figure window.
Interactive editing can be started by first selecting the arrow icon along the top of the figure (1), then
clicking on the element to be edited (e.g. the line, the axes, any text label). This will bring up the Property
Editor (2) where the item-specific properties can be changed (3). Alternatively, the interactive editing
features can be enabled by selecting Edit>Figure Properties.
Handle Graphics
The MATLAB graphics system is fully programmable. Anything that can be accomplished through manual
editing of a plot can be accomplished by using handle graphics since every graphical element is assigned
a handle. The handle contains everything there is to know about the particular element, such as the color
or line width. Once familiar with handle graphics, they can be used to create spectacularly complex data
visualizations. The use of handle graphics will be illustrated through an example.
The example will illustrate the use of handle graphics by showing both before and after plots using
subplot.
e = randn(100,2);
y = cumsum(e);
subplot(2,1,1);
plot(y);
subplot(2,1,2);
h = plot(y);
l = legend('Random Walk 1','Random Walk 2','Location','Southwest');
t = title('Two Random Walks');
xl = xlabel('Day');
yl = ylabel('Level');
set(h(1),'Color',[1 0 0],'LineWidth',3,'LineStyle',':')
set(h(2),'Color',[1 .6 0],'LineWidth',3,'LineStyle','-.')
set(t,'FontSize',14,'FontName','Bookman Old Style','FontWeight','demi')
set(l,'FontSize',14,'FontName','Bookman Old Style','FontWeight','demi')
set(xl,'FontSize',14,'FontName','Bookman Old Style','FontWeight','demi')
set(yl,'FontSize',14,'FontName','Bookman Old Style','FontWeight','demi')
parent = get(h(1),'Parent');
set(parent,'FontSize',14,'FontName','Bookman Old Style','FontWeight','demi')
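In releases from R2014b onward, graphics handles are objects, and so properties can also be read and changed using dot notation in place of get and set. The sketch below repeats a few of the changes above under that assumption; it is an alternative syntax and does not add new functionality.

```matlab
% Sketch (assumes R2014b or later): handle properties using dot notation.
y = cumsum(randn(100,2));
h = plot(y);
h(1).Color = [1 0 0];      % same effect as set(h(1),'Color',[1 0 0])
h(1).LineWidth = 3;
h(1).LineStyle = ':';
t = title('Two Random Walks');
t.FontSize = 14;
ax = h(1).Parent;          % the axes that contain the line
ax.FontSize = 14;
```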
Most modifications that can be made using handle graphics can be implemented using the point-and-click
editing method previously outlined. The advantage of handle graphics is only apparent when a
figure needs to be updated or redrawn. If handle graphics have been used, it is only necessary to change
the data and then re-run the code. If using the point-and-click editing method, any change in the data or
model requires manually reapplying the edits. For more on handle graphics, please consult the Handle
Graphics Properties in the help file.
Figure 10.7: The top subplot is a standard call to plot while the bottom highlights some of the possibilities
when using handle graphics. It is worth noting that all of the changes evidenced in the bottom subplot
can be reproduced using the point-and-click method.
10.6 Exporting Plots
Once a plot has been finalized, it must be exported to be included in an assignment, report or project.
Exporting is straightforward. On the figure, click File, Save As (1 in figure 10.8). In the Save as type box,
select the desired format (TIFF for Microsoft Office, EPS or PDF file for LaTeX (2 in figure 10.9)), enter a file
name (1 in figure 10.9) and save. Figures 10.8 and 10.9 contain representations of the steps needed to
export from a figure box.
Figure 10.8: To export a figure, click Save As. . . in the file menu of a figure (1). The dialog in figure 10.9 will
appear.
If the exported figure does not appear as desired, it may be necessary to alter the shape of the figure's
window. Exported figures are What-You-See-Is-What-You-Get (WYSIWYG). Figure 10.10 contains an example
of a figure with reasonable proportions while the axes in Figures 10.11 and 10.12 are poorly scaled. The
following code will produce the three figures.
fig = figure(1);
x = linspace(0,1,100);
y = 1-abs(x-0.5);
plot(x,y,'r')
xlabel('x');
ylabel('y=1-|x-0.5|');
title('Roof-top plot');
legend('f(x)=1-|x-0.5|');
set(fig,'Position',[445 -212 957 764]);
fig = figure(2);
x = linspace(0,1,100);
y = 1-abs(x-0.5);
plot(x,y,'r')
xlabel('x');
ylabel('y=1-|x-0.5|');
title('Roof-top plot');
legend('f(x)=1-|x-0.5|');
set(fig,'Position',[445 -212 461 764]);
fig = figure(3);
x = linspace(0,1,100);
y = 1-abs(x-0.5);
plot(x,y,'r')
xlabel('x');
ylabel('y=1-|x-0.5|');
title('Roof-top plot');
legend('f(x)=1-|x-0.5|');
set(fig,'Position',[445 216 957 336]);
10.6.1 print
Figures can be programmatically exported using the print command. The basic structure of the command
is print -dformat filename where format is epsc2 for color encapsulated postscript (EPS, LaTeX or
Microsoft Office), pdf for portable document format (LaTeX) or tiff for TIFF (Microsoft Office). When
exporting to PDF, it is a good idea to use the additional flag -fillpage. Figures exported in EPS or PDF
formats are vector images and scale both up and down well. TIFF images are raster images and become blurry
when scaled.
Note: It is necessary to call set(gcf,'Color',[1 1 1],'InvertHardcopy','off') before print to remove
the gray border surrounding the figure.
fig = figure(1);
x = linspace(0,1,100);
y = 1-abs(x-0.5);
plot(x,y,'r')
xlabel('x');
ylabel('y=1-|x-0.5|');
title('Roof-top plot');
legend('f(x)=1-|x-0.5|');
set(fig,'Position',[445 -212 957 764]);
set(gcf,'Color',[1 1 1],'InvertHardcopy','off')
print -depsc2 ExportedFigure.eps
print -dtiff ExportedFigure.tiff
print -dpdf -fillpage ExportedFigure.pdf
Figure 10.9: To export a figure, enter a file name and use the drop-down box to select a file type. Select
TIFF image if using Microsoft Office or EPS File (Encapsulated Postscript) if using LaTeX.
Figure 10.10: Exporting figures is What-You-See-Is-What-You-Get. The axes in this figure are appropriately
scaled.
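print can also be called as a function, which is convenient when the file name or format is stored in a variable. The sketch below exports one figure in three formats inside a loop; writing to the temporary directory and the variable names outDir, formats and extensions are choices made for this illustration.

```matlab
% Sketch: function form of print used inside a loop.
fig = figure('Visible','off');        % hidden figure, suitable for scripts
x = linspace(0,1,100);
plot(x, 1-abs(x-0.5), 'r')
set(fig,'Color',[1 1 1],'InvertHardcopy','off')
outDir = tempdir;                     % hypothetical output location
formats = {'-depsc2','-dtiff','-dpdf'};
extensions = {'.eps','.tiff','.pdf'};
for k = 1:numel(formats)
    fileName = fullfile(outDir, ['ExportedFigure' extensions{k}]);
    print(fig, formats{k}, fileName)  % same as print -dformat filename
end
```

Passing the figure handle as the first argument makes the target figure explicit rather than relying on whichever figure is current.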
10.7 Exercises
1. Generate two random walks using a loop and randn. Plot these two on a figure and provide all of the
necessary labels.
3. Generate 1000 draws from a normal. Plot a histogram with 50 bins of the data.
4. Using the ExxonMobil and S&P 500 data (see the Chapter 14 exercises), produce a 2 × 2 subplot
containing:
5. Export the plot from exercise 1 as a TIFF, an EPS and a PDF. View the files created outside of MATLAB.
6. Use page setup to change the orientation and dimensions as described in this chapter. Re-export
the figure as both a TIFF and EPS (using different names) and compare the new images to the old
versions.
Figure 10.11: Exporting figures is What-You-See-Is-What-You-Get. The axes in this figure are poorly scaled
and the height is too large for the width.
Figure 10.12: Exporting figures is What-You-See-Is-What-You-Get. The axes in this figure are poorly scaled
and the width is too large for the height.
Chapter 11
Dates and Times
Tracking dates is crucial when working with time-series data. MATLAB provides two methods to store
dates and times. The modern implementation is known as a datetime. datetimes use an optimized format
that is human readable in the console and supports extended information such as time zones.
The legacy date format is known as MATLAB serial dates where dates are stored as days since January 0,
0000 (see footnote 1). For example, January 1, 0000 is 1 in MATLAB date format and January 1, 2000 is 730,486. Serial
dates store hours as fractional days, and so 12:00 January 1, 2000 is 730,486.5.
MATLAB datetimes are a modern, flexible data type for storing dates and times. They provide nanosecond
resolution and support time zone information. The latter is important in many domains since 18:15:32
GMT on January 31, 2016 is the same as 13:15:32 in New York.
The standard method to create a datetime array is to call datetime on a cell array of strings. This will
produce an array of datetime (see footnote 2).
>> dates = {'12/31/1999','1/1/2000','1/2/2000'}
>> datetimes = datetime(dates)
datetimes =
31-Dec-1999 01-Jan-2000 02-Jan-2000
datetimes can also be created from numeric values containing year, month, day, and optionally hour,
minute, second and millisecond.
>> year = [1999 2000 2000];
>> month = [12 1 1];
>> day = [31 1 2];
>> datetimes = datetime(year, month, day)
datetimes =
31-Dec-1999 01-Jan-2000 02-Jan-2000
>> hours = [23 6 18]
>> minutes = [59 0 30]
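The time components are passed to datetime after the date components. The sketch below completes the example under the assumption that the seconds are all zero; the seconds vector is not part of the original example. Note that variables named hours, minutes and seconds hide the MATLAB functions of the same names until they are cleared.

```matlab
% Sketch: datetimes from numeric date and time components.
year = [1999 2000 2000];
month = [12 1 1];
day = [31 1 2];
hours = [23 6 18];
minutes = [59 0 30];
seconds = [0 0 0];       % assumed values, not from the original example
datetimes = datetime(year, month, day, hours, minutes, seconds);
```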
Footnote 1: Serial dates in MATLAB are numeric values and do not require special treatment.
Footnote 2: While the implementation of a datetime is not directly exposed to users, in R2016a each datetime is stored using 16 bytes
of memory. This is twice as much storage as a MATLAB serial date and allows for additional information about the date and time,
such as a time zone, to be stored.
datetimes can be created from other formats using the optional argument ’ConvertFrom’ followed by a
supported format such as ’datenum’ (MATLAB Serial dates) or ’excel’ (Excel dates). Finally, a small num-
ber of frequently used dates can be created from string arguments, including ’now’, ’today’, ’tomorrow’
and ’yesterday’.
>> datetime('now')
ans =
07-Oct-2016 15:47:59
>> datetime('today')
ans =
07-Oct-2016
>> datetime('yesterday')
ans =
06-Oct-2016
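The 'ConvertFrom' argument can be illustrated using the serial date for January 1, 2000. The sketch below also converts the corresponding Excel serial number, 36526, which assumes Excel's default 1900 date system; the variable names d1 and d2 are illustrative.

```matlab
% Sketch: creating datetimes from other numeric date formats.
d1 = datetime(730486, 'ConvertFrom', 'datenum');   % MATLAB serial date
d2 = datetime(36526, 'ConvertFrom', 'excel');      % Excel date, 1900 system assumed
sameDay = (d1 == d2);                              % both represent January 1, 2000
```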
datetimes are objects and properties of a datetime can be accessed using dot notation. Available prop-
erties include the components of the date such as year, month, day, the components of the time such as
hour or second, and information about the timezone of the datetime.
>> n = datetime('now');
>> n.Year
ans =
2016
>> n.Hour
ans =
15
>> n.TimeZone
ans =
’’
>> n.SystemTimeZone
ans =
Europe/London
durations arise naturally through differencing datetimes. durations are expressed in terms of hours, minutes
and seconds.
>> datetime('now')
ans =
07-Oct-2016 15:58:20
>> datetime('now') - datetime('today')
ans =
15:58:20
>> datetime('now') - datetime('yesterday')
ans =
39:58:20
durations can be directly created by passing in the number of hours, minutes, seconds and milliseconds.
duration can also be used in mathematical expressions to construct datetimes.
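A short sketch of both uses follows. The particular date and the variable names start, finish, later and elapsed are made up for illustration.

```matlab
% Sketch: creating durations and using them in datetime arithmetic.
d = duration(1, 30, 0);                    % 1 hour, 30 minutes, 0 seconds
start = datetime(2016, 10, 7, 15, 0, 0);
finish = start + d;                        % 90 minutes after start
later = start + hours(3);                  % hours converts a number to a duration
elapsed = finish - start;                  % differencing recovers the duration
```

The functions minutes and seconds work in the same way as hours.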
calendarDurations are similar to durations except that they are expressed in terms of calendar units such as
years, months and days. They are convenient for generating sequences that are regular in
terms of a calendar but do not have a uniform duration in terms of hours.
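For instance, a sequence of month-end dates can be generated by adding calendar months to a starting date. This is a minimal sketch; calmonths is the function that creates calendarDurations measured in months, and the variable names are illustrative.

```matlab
% Sketch: month-end dates are regular in calendar time but not in elapsed time.
monthEnds = datetime(2000,1,31) + calmonths(0:3);
dayOfMonth = monthEnds.Day;           % the day is clipped to the end of each month
gaps = days(diff(monthEnds));         % elapsed days between consecutive dates
```

The gaps differ from month to month, which is why a fixed duration could not be used to build this sequence.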
11.1.3 NaT
Like NaN for numeric values, datetimes support a specific missing value – NaT (not a time). Importing
unrecognizable date strings will produce NaTs.
>> datetime({'12/31/1999','12/32/1999'})
ans =
31-Dec-1999 NaT
>> dt = datetime({'12/31/1999','12/32/1999'});
>> dt(1) - dt(2)
ans =
01-Jan-2000 00:00:00 NaT
11.2 MATLAB Serial Dates
Serial dates store dates as numbers based on the relative distance to January 0, 0000. Since these are simply
numbers, special purpose functions are required to convert to a human readable format. datetimes are
preferred to serial dates, and serial dates are primarily retained for legacy compatibility.
11.2.1.1 datenum
datenum converts either string dates ('01JAN2000') or numeric dates ([2000 01 01]) into MATLAB serial
dates. To call the function with string dates, use either datenum(stringdate) or datenum(stringdate,format)
where format is composed of blocks such as yyyy, mmm, mm and dd.
While common string formats are automatically recognized, format strings allow virtually any date
format to be converted to MATLAB serial dates. Format strings are particularly useful if the arguments
appear in a strange order, such as yyyyddmm (e.g. 20000101), or if the dates are delimited using nonstan-
dard characters, such as a ; or , (e.g. 2000;01;01). Consider the following examples showing both automatic
detection and the use of format strings.
>> datenum('01JAN2000')
ans =
730486
>> datenum('01JAN2000','ddmmmyyyy')
ans =
730486
>> datenum('01;JAN;2000','dd;mmm;yyyy')
ans =
730486
>> datenum('01012000','ddmmyyyy')
ans =
730486
>> strdates=char('01JAN2000','02JAN2000','03JAN2000')
strdates =
01JAN2000
02JAN2000
03JAN2000
>> datenum(strdates)
ans =
730486
730487
730488
datenum can additionally be used to convert numeric dates, such as [2000 01 01] to MATLAB serial date
format. For example,
>> datenum([2000 01 01])
ans =
730486
>> years=[2000;2000;2000];
>> months=[01;01;01];
>> days=[01;02;03];
>> [years months days]
ans =
2000 1 1
2000 1 2
2000 1 3
>> datenum(years,months,days)
ans =
730486
730487
730488
datenum can also be used to translate hours, minutes and seconds to fractional days (using [year month
day hour minute second] format).
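A brief sketch of the fractional-day behavior, using noon on January 1, 2000 and 18:00 on the previous day; the variable names noon and eve are illustrative.

```matlab
% Sketch: hours, minutes and seconds become fractions of a day.
noon = datenum([2000 01 01 12 0 0]);      % half a day after 730486
eve = datenum(1999, 12, 31, 18, 0, 0);    % three quarters of the way through 730485
```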
11.2.1.2 datestr
datestr is the “inverse” of datenum – it produces a human readable string from a MATLAB serial date.
By default, datestr returns string dates of the form ’dd-mmm-yyyy’. datestr also provides a number of
standard formats such as ’mm/dd/yy’ or ’mmm.dd,yyyy’. To produce one of standard date formats, use
datestr(serialdate, #) where # corresponds to one of the format strings (see doc datestr for a list).
datestr can also produce strings with arbitrary formats by providing a format string (e.g. use ’dd; mm;
yyyy’ to produce a date string with ; delimiters).
>> serial_date=datenum(’01JAN2000’)
serial_date =
730486
>> datestr(serial_date)
ans =
01-Jan-2000
>> datestr(serial_date,0)
ans =
01-Jan-2000 00:00:00
>> datestr(serial_date,’dd;mm;yyyy’)
ans =
01;01;2000
Like datenum, datestr can take a vector input and return a vector output.
>> serial_date=datenum(strvcat(’01JAN2000’,’02JAN2000’,’03JAN2000’))
serial_date =
730486
730487
730488
>> datestr(serial_date)
ans =
01-Jan-2000
02-Jan-2000
03-Jan-2000
11.2.1.3 datevec
datevec converts MATLAB serial dates into human parsable numeric formats. Specifically, given a K × 1
vector containing MATLAB serial dates, datevec will produce a K × 6 matrix of the form [Year Month Day
Hour Minute Second]. For example, applying datevec to the serial dates 730486, 730487 and 730488.5 returns
rows that correspond to 0:00 (midnight) on January 1 and 2, 2000 and 12:00 (noon) on January 3, 2000.
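A minimal sketch of this conversion is given below. The three serial dates are assumptions chosen to match
the description above (two midnights and one noon); the variable names are illustrative.

```matlab
% Midnight on January 1 and 2, 2000 and noon on January 3, 2000
serial = [730486; 730487; 730488.5];
% Each row of v is [Year Month Day Hour Minute Second]
v = datevec(serial);
```

The fractional part of the third serial date (0.5) appears as 12 in the hour column.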
now returns a MATLAB serial date representation of the computer clock. clock returns a 1 × 6 vector
(same format as datevec) of the computer clock. datevec(now) produces the same output as clock.
11.2.1.6 etime
The elapsed time between two calls to clock can be computed using etime.
>> c=clock;
>> j=1; for i=1:10000000; j=j+1; end;
>> e=etime(clock,c)
e =
0.0630
tic and toc can be used for timing code to find hot spots – segments of code which take the majority of the
computational time. For example,
>> tic
>> j=1; for i=1:1000000; j=j+1; end
>> toc
Elapsed time is 0.010740 seconds.
11.3 Converting between datetimes and Serial Dates
MATLAB serial dates are numbers while datetime requires string dates. Serial dates can be converted to
datetimes using the optional arguments ’ConvertFrom’,’datenum’ when calling datetime. datetimes can
be directly converted to serial dates using datenum.
>> dates = {’12/31/1999’,’1/31/2000’,’2/29/2000’}’
>> serial = datenum(dates)
serial =
730485
730516
730545
>> datetimes = datetime(serial,’ConvertFrom’,’datenum’)
datetimes =
31-Dec-1999 00:00:00
31-Jan-2000 00:00:00
29-Feb-2000 00:00:00
>> serial = datenum(datetimes)
serial =
730485
730516
730545
11.4 Dates on Figures
Plotting with dates can be implemented using either datetimes or serial dates. When plotting with datetimes,
the plot will automatically show human readable dates. When plotting with serial dates (which are just
numbers), datetick is required to convert an axis of a plot expressed in MATLAB serial dates to text dates.
For example,
>> dates = datenum(’01Jan2000’):datenum(’31Dec2000’);
>> rw = cumsum(randn(size(dates)));
>> subplot(3,1,1);
>> plot(dates, rw);
>> subplot(3,1,2);
>> plot(dates, rw);
>> datetick(’x’)
>> subplot(3,1,3);
>> plot(datetime(dates,’ConvertFrom’,’datenum’), rw);
produces the three panels in figure 11.1. The top panel contains MATLAB serial dates along the x-axis, the
middle panel contains string dates produced by datetick, and the bottom panel is plotted using datetimes.
datetick also understands both standard formatting commands (see datestr) and custom formatting
commands (see datenum). This function has an unfortunate tendency to produce few x-labels. The solution
is to first choose the axis label points (in serial dates) and then use
datetick(’x’,’keepticks’,’keeplimits’) as illustrated in figure 11.2.
>> figure()
>> h=plot(dates, rw);
>> axis tight
>> serial_dates=datenum(strvcat(’01/01/2000’,’01/02/2000’,’01/03/2000’,...
’01/04/2000’,’01/05/2000’,’01/06/2000’,...
’01/07/2000’,’01/08/2000’,’01/09/2000’,...
’01/10/2000’,’01/11/2000’,’01/12/2000’), ...
’dd/mm/yyyy’);
>> parent=get(h,’Parent’);
>> set(parent,’XTick’,serial_dates);
>> datetick(’x’,’dd/mm’,’keepticks’,’keeplimits’);
>> xlabel(’Date’)
>> ylabel(’Level’)
>> title(’Demo plot of datetick with keeplimits and keepticks’)
Figure 11.1: datetick converts MATLAB serial dates into text strings. The top panel shows raw serial
dates on the x-axis, the middle panel shows month labels produced by datetick, and the bottom panel
uses datetimes and so does not require a call to datetick. Unfortunately, datetick changes the
location of points and makes fairly bad choices. The solution is to use
datetick(’x’,’keepticks’,’keeplimits’).
Figure 11.2: datetick with keepticks and keeplimits (x-axis: Date, with ticks labeled 01/01 through
01/12; y-axis: Level). These two arguments ensure datetick behaves in a consistent manner. To use them,
set up the figure as it should look but with serial dates on the axis, and then call
datetick(’x’,’keepticks’,’keeplimits’).
Chapter 12
String Manipulation
While manipulating text is not MATLAB’s forté, the programming environment does provide a complete
set of tools for working with strings. Simple strings can be input from the command line
str = ’Econometrics is my favorite subject.’;
Strings are treated as matrices of character data, and so they respect the standard behavior of most com-
mands (e.g. str(1:10)). However, using commands designed for numerical data is tedious and special
purpose functions are provided to assist with string data.
The primary application of string functions is to parse data. Chapter 14 contains an example of parsing
a poorly formatted file. It uses a number of string functions to manipulate and parse the text of a file.
12.1 String Functions
char
char has two uses. The first is to convert integer numerical values between 1 and 127 into their ASCII
equivalent characters.1 Non-integer values are truncated to integers using floor and then converted.
>> char(65:100)
ans =
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_‘abcd
>> char(25*pi)
ans =
N
The second use of char is to vertically concatenate strings (stack) which do not (necessarily) have the
same length.
>> s1 = ’A string’;
>> s2 = ’A longer string’;
>> s3 = ’An even longer string’;
>> char(s1,s2,s3)
ans =
A string
A longer string
An even longer string
Note that char works similarly to strvcat, although the latter is deprecated and should not be used.
1 Values up to 65535 are permitted to allow Unicode characters.
double
double converts characters into their numeric (ASCII) codes, and so reverses the first use of char.
>> double(’MATLAB’)
ans =
77 65 84 76 65 66
upper and lower convert strings to all upper case and lower case, respectively.
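A short sketch of upper and lower follows. The example strings and variable names are illustrative only.

```matlab
str = 'Financial Econometrics';
upperStr = upper(str);
lowerStr = lower(str);
% Normalizing the case first makes a comparison case insensitive
sameText = strcmp(lower('IBM'), lower('ibm'));
```

Converting to a common case before comparing is one simple way to match tickers or labels that were
entered inconsistently.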
strcat
strcat horizontally concatenates strings. z=strcat(x,y) is the same as z=[x y] when x and y have the
same number of rows, except that strcat removes trailing whitespace from character array inputs. If one
has a single row, strcat concatenates it to every row of the other array.
>> strcat(char(’a’,’b’),char(’c’,’d’))
ans =
ac
bd
>> strcat(char(’a’,’b’),’c’)
ans =
ac
bc
strfind
strfind returns the indices of all matching strings in a text block, such as delimiting characters in a
block of text. For example, consider a single line from WRDS TAQ output
>> str = ’IBM,02JAN2001,9:30:07,84.5’;
>> strfind(str,’,’)
ans =
4 14 22
strfind returns all of the locations of ’,’. If more than one character is searched for, strfind can return
overlapping matches.
>> str = ’ababababa’
str =
ababababa
>> strfind(str,’aba’)
ans =
1 3 5 7
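The delimiter locations returned by strfind can be used to cut a line into its fields. The sketch below
applies this to the TAQ line used above; the variable names (edges, fields, ticker, price) are illustrative
assumptions, and the fields are stored in a cell array (see chapter 13).

```matlab
str = 'IBM,02JAN2001,9:30:07,84.5';
commas = strfind(str, ',');
% Add virtual delimiters before the first and after the last character
edges = [0 commas length(str)+1];
fields = cell(1, length(edges)-1);
for i = 1:length(fields)
    % Keep the characters strictly between two consecutive delimiters
    fields{i} = str(edges(i)+1:edges(i+1)-1);
end
ticker = fields{1};
price = str2double(fields{4});
```

Padding the list of comma locations with 0 and length(str)+1 allows the first and last fields to be handled
by the same indexing expression as the interior fields.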
strcmp compares two strings and returns (logical) 1 if they are the same, and is case sensitive. strcmpi
does the same but is not case sensitive.
>> strcmp(’a’,’a’)
ans =
1
>> strcmp(’a’,’A’)
ans =
0
>> strcmpi(’a’,’A’)
ans =
1
strncmp compares the first n characters of two strings and returns (logical) 1 if they are the same, and is
case sensitive. strncmpi does the same but is not case sensitive.
>> strncmp(’apple’,’apple1’,5)
ans =
1
>> strncmp(’apple’,’apple1’,6)
ans =
0
>> strncmp(’apple’,’Apple1’,5)
ans =
0
>> strncmpi(’apple’,’Apple1’,5)
ans =
1
strmatch
strmatch compares rows of a character matrix with a string and returns the index of all rows that begin
with the string. To match only the entire row, use the optional argument ’exact’.
>> str = strvcat(’alpha’,’beta’,’alphabeta’);
>> strmatch(’alpha’,str)
ans =
1
3
>> strmatch(’alpha’,str,’exact’)
ans =
1
strsplit allows a string to be split into a cell array using a character as a delimiter. strjoin is the in-
verse function and will join a cell array containing strings into a single string separated by a user provided
character.
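As a small illustration of strsplit and strjoin, the sketch below splits the TAQ line from the strfind
example on commas and then rejoins the pieces using a different delimiter. The variable names are
illustrative.

```matlab
str = 'IBM,02JAN2001,9:30:07,84.5';
% Split on commas; the result is a 1 by 4 cell array of strings
parts = strsplit(str, ',');
% Join the pieces back into one string, now separated by ;
rejoined = strjoin(parts, ';');
```

Because the pieces are returned in a cell array, each field can have a different length without padding.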
regexp is similar to strfind but takes standard regular expression syntax commands, and is case sensitive.
regexpi does the same but is not case sensitive. For examples of regexp, see doc regexp.
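A minimal sketch of regexp and regexpi is given below, again using the TAQ line from the strfind example.
The patterns are illustrative assumptions and only a small part of the regular expression syntax is used.

```matlab
str = 'IBM,02JAN2001,9:30:07,84.5';
% Start index of every comma, the same locations found by strfind(str, ',')
idx = regexp(str, ',');
% The 'match' option returns the matched text: here, every run of capital letters
words = regexp(str, '[A-Z]+', 'match');
% regexpi ignores case, so the lower case pattern finds JAN
idxInsensitive = regexpi(str, 'jan');
```

The same call with the lower case pattern and regexp (rather than regexpi) would return an empty result.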
12.2 String Conversion
str2num
str2num converts string values into numerical values. The input can be either vector or matrix valued.
>> strvcat(’1’,’2’,’3’)
ans =
1
2
3
>> str2num(strvcat(’1’,’2’,’3’))
ans =
1
2
3
>> str2num([’1 2 3’;’4 5 6’])
ans =
1 2 3
4 5 6
str2double
str2double converts string values into numerical values. Unlike str2num, it operates only on a single string
or a cell array of strings, and when used on a cell array, each cell must contain only a single string to
convert. str2double offers better performance when it is applicable.
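The two modes of str2double can be sketched as follows. The inputs are illustrative; text that cannot be
interpreted as a number is returned as NaN.

```matlab
% A single string
x = str2double('3.14');
% A cell array with one string per cell; 'N/A' is not a number
y = str2double({'1', '2.5', 'N/A'});
```

The NaN returned for ’N/A’ is convenient when importing data that uses text markers for missing values.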
num2str
num2str converts numerical values into strings. The input can be scalar, vector or matrix valued.
>> num2str([1;2;3])
ans =
1
2
3
>> num2str([1 2 3;4 5 6])
ans =
1 2 3
4 5 6
sscanf
sscanf can be used to convert text to numbers, and is by far the fastest method to convert large text blocks
to numbers. The generic form of sscanf is
sscanf(text, format)
where text is a numeric character string and format contains information about the format of the values
in text. sscanf operates column-by-column so that lines must be stored in columns (or if stored in rows,
the input can be transposed). The space character is used to delimit the end of an entry and so it is
essential that each line is padded with a space.2 The format string can handle a wide variety
of cases, although the most important are %d, which converts a string to a base-10 (32-bit) integer, and
%f, which converts a string to a floating point number. Consider the following example which generates 10,000
random numeric strings using randi and then parses the text using sscanf, str2num and str2double.
>> text = char(47+randi(10,10000,6)); % Random numeric string
>> text = [text repmat(’ ’,10000,1)]; % Pad with space
>> tic; numericValues = sscanf(text’,’%d’); toc
Elapsed time is 0.005850 seconds.
sscanf is about 100 times faster than str2num and str2double. Format strings can include multiple ele-
ments in which case the formats are sequentially applied until the end of the text string is reached.
2
Technically, sscanf operates on text(:) (which is a single column vector constructed by stacking the input text). This is
why it is essential that lines are padded by a space.
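Format strings with several elements can be sketched as follows. The input strings are illustrative
assumptions; the first format contains literal : characters that must be present in the text, and the
second format is reused once its end is reached.

```matlab
% Three integers separated by literal colons
timeStr = '9:30:07';
hms = sscanf(timeStr, '%d:%d:%d');         % column vector: hours, minutes, seconds
secondsSinceMidnight = [3600 60 1] * hms;

% The format alternates between an integer and a floating point value and
% is applied repeatedly until the end of the text
values = sscanf('1 2.5 3 4.5', '%d %f');
```

In both cases the output is a column vector with one entry per converted value.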
Note that sscanf terminates without an error when an unexpected string is encountered.
>> text = [num2str([pi floor(exp(1))]) ’ A ’ num2str((1+sqrt(5))/2)]
text =
3.1416 2 A 1.618
>> sscanf(text’,’%f’)
ans =
3.1416
2
In the example above, sscanf stops when it encounters the A and returns the first two values. It is
important to verify that the strings contain only the expected data (e.g. only numeric characters,
including .) prior to calling sscanf.
fprintf
12.3 Exercises
1. Load the file hardtoparsetext.mat and inspect the variable string_data. The data in this file are ;
delimited and contain stock name, date of observation, shares outstanding, and price. Write a program
that will loop over the rows and parse the data into four variables: ticker, date, shares and
price. Note: ticker should be a string, date should be a MATLAB serial date, and shares outstanding
and price should be numerical. For values of ’N/A’, use NaN. For help converting the dates to serial
dates, see chapter 11.
Chapter 13
Structures and Cell Arrays
Structures and cell arrays are advanced data storage formats that often provide useful scaffolding for work-
ing with mixed (i.e. string and numeric) or structured data.
13.1 Structures
Structures allow related pieces of data to be organized into a single variable. Structures are constructed
using
variable_name.field_name
syntax where both variable_name and field_name must be valid variable names. One application of struc-
tures is to organize data. Consider the case of working with data that comes in triples which correspond
to x-, y- and z-data. One alternative would be to store the data as a 3 by 1 vector. Alternatively, a structure
could be used with field names x, y and z to provide added guidance on what is expected.
>> coord.x = 0.5
coord =
x: 0.5000
>> coord.y = -1
coord =
x: 0.5000
y: -1
>> coord.z = 2
coord =
x: 0.5000
y: -1
z: 2
Structures can also be used in arrays (array of structures), which can either be constructed using the com-
mand struct or lazily initialized by concatenation. Continuing from the previous example,
>> coord(2).x = 3
coord =
1x2 struct array with fields:
x
y
z
>> coord(2).y = 2
coord =
1x2 struct array with fields:
x
y
z
>> coord(2).z = -1
coord =
1x2 struct array with fields:
x
y
z
The elements of the array of structures can be accessed like any other array with the caveat that the
selected element is itself a structure.
>> newCoord = coord(1)
newCoord =
x: 0.5000
y: -1
z: 2
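The same two-element array can also be built in a single call to struct, the alternative to lazy
initialization mentioned above. The sketch below reuses the coordinate values from the example; passing a
cell array for a field creates one array element per cell, and the names allX and secondY are illustrative.

```matlab
% Cell array arguments to struct create one array element per cell
coord = struct('x', {0.5, 3}, 'y', {-1, 2}, 'z', {2, -1});
n = numel(coord);
% Collect a single field across the whole array
allX = [coord.x];
% Access a field of one element
secondY = coord(2).y;
```

Constructing the array in one call avoids growing it element by element.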
The fundamental problem with structures in MATLAB is that they are difficult to work with, and that op-
erating on structures requires operating on the fields one-at-a-time. Structures are also difficult to preal-
locate and so performance issues arise when used in large arrays. Structures are still commonly used (for
example, in optimset), although they have been supplanted by a more useful object, the cell array. It is
tempting to use structures to push large collections of data, parameters and other values into and out of
functions. This is generally a bad practice and should be avoided.
13.2 Cell Arrays
Cell arrays are a powerful alternative to the “everything is a matrix” model of classic MATLAB. Cell arrays
are formally jagged (or ragged) arrays and are collections of other arrays (possibly other cell arrays). Cell
arrays can be thought of as generic containers where the final elements are one of the MATLAB primitive
data types (e.g. a matrix). They are most useful when handling either pure string data or mixed data which
contains both string values and numbers. Cell array manipulation is similar to matrix manipulation
although there are some important caveats.
Cell arrays can be initialized using the cell command or directly using braces ({}). In either case,
braces are used to access elements within a cell array. The example below shows how cell arrays can be
pre-allocated using cell and then populated using braces.
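A minimal sketch of this pattern is given below, assuming a 2 by 1 cell array named cellArray. The
variables firstIsEmpty and secondStillEmpty are added only to make the state of the cells visible.

```matlab
% Pre-allocate a 2 by 1 cell array; every cell is initially empty
cellArray = cell(2,1);
firstIsEmpty = isempty(cellArray{1});
% Populate the cells one at a time using braces
cellArray{1} = 'cell';
secondStillEmpty = isempty(cellArray{2});
cellArray{2} = 'array';
```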
Initially the variable was an empty cell array. After the string vector ’cell’ was added in the first
position, only the second was empty. Finally, the string vector ’array’ was placed into the second position.
This simple example shows the ease with which cell arrays can be used to handle strings. Matrices of
characters become problematic when the rows do not all have the same number of characters, since shorter
rows must be padded with blank characters (and then deblanked before being used).
Cell arrays are also adept at handling mixed data, as the next example shows.
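A minimal sketch of a cell array holding mixed data is given below, assuming a 2 by 1 cell array named
cellArray. The numeric vector is entered as a row, consistent with the displayed value of cellArray{2}
later in this section; the variable types is illustrative.

```matlab
cellArray = cell(2,1);
% A string in the first cell and a numeric vector in the second
cellArray{1} = 'string';
cellArray{2} = [1 2 3 4 5];
% The two cells hold different data types
types = {class(cellArray{1}), class(cellArray{2})};
```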
The cell array above has a string in the first position and a 1 by 5 numeric vector in the second. Cell arrays
can even contain other cell arrays, and so can be used to store virtually any data structure by nesting.
The primary method for accessing cell arrays is through the use of braces ({}) as the two previous exam-
ples demonstrated. Selecting an element using braces returns the contents of the cell and can be used to
assign the values for processing using functions that are not designed for cell arrays. Continuing from the
previous example,
>> x = cellArray{1}
x =
string
>> y = cellArray{2}
y =
1 2 3 4 5
Cell arrays can also be accessed using parentheses although this type of access is markedly different
from accessing cell arrays with braces. Unlike braces which access the contents of a cell, parentheses
access the cell itself and not its contents. The difference in behavior means that subsets of a cell array can
be assigned to another variable without iterating across the contents of the cell array.
>> cellArray = cell(3,1);
>> cellArray{1} = ’one’;
>> cellArray{2} = ’two’;
>> cellArray{3} = ’three’
cellArray =
’one’
’two’
’three’
% Correct method to reassign elements of a cell array to a new array using parentheses ( )
>> newCellArray = cellArray(1:2)
newCellArray =
’one’
’two’
% Incorrect method to reassign elements of a cell array to a new array using braces { }
>> newCellArray = cellArray{1:2}
newCellArray =
one
In the correct example above, newCellArray contains the first two elements of cellArray. Also note the
incorrect attempt to assign the first two elements using braces, which returns only the contents of the
first cell and so does not produce the desired result.
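The behavior of braces with more than one index can be sketched as follows. Braces return one value per
selected cell, so several outputs are needed to capture them all; the variable names subset, first, second
and onlyFirst are illustrative.

```matlab
cellArray = {'one'; 'two'; 'three'};
% Parentheses return a smaller cell array
subset = cellArray(1:2);
% Braces with several indices return one value per cell, so multiple
% outputs are required to capture all of them
[first, second] = cellArray{1:2};
% With a single output only the first value is kept
onlyFirst = cellArray{1:2};
```

This is why the assignment using braces above produced only the string one.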
Cell arrays, like structures, are useful data structures for working with strings or mixed data. Cell arrays
are generally superior to structures and there are many functions which can operate directly on cell arrays
of strings (e.g. sort, unique, ismember). They do come with some overhead and so are not appropriate for
every use. For example, a 2 by 1 vector containing [1 2]’ requires 16 bytes of memory. A cell array with 1
in its first cell and 2 in its second requires 240 bytes of memory, a 15 fold increase. Due to this overhead
cell arrays are undesirable in situations where data is highly regular and where the contents of each cell is
small.
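The overhead can be measured directly using whos, as in the sketch below. The exact number of bytes used
by the cell array depends on the MATLAB version and platform, and so only the comparison is meaningful;
the variable names are illustrative.

```matlab
x = [1 2]';
c = {1; 2};
infoX = whos('x');
infoC = whos('c');
% Ratio of the storage used by the cell array to that used by the vector
overhead = infoC.bytes / infoX.bytes;
```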
Chapter 14
Importing and Exporting Data
Importing data ranges from simple to very difficult, depending on the data size and format. A few princi-
ples can simplify this task:
• Variables in Excel files should have one variable per column with a distinct variable name in the top
cell of each column.
• Use readtable to import either delimited text or Excel files into MATLAB tables.
• When importing data using one of the fast but legacy import functions (e.g., csvread or xlsread),
the file imported should contain numbers only. The sole exception to this rule is that Excel files can
also contain dates (in Excel date format).
The simplest and most robust method to import data is to use a correctly formatted Excel file and the
import wizard. The key to the import is to make certain the data in the Excel file has been formatted
according to a simple set of rules:
• A valid, distinct variable name for the column in the first row
As an example, consider importing a month of GE prices downloaded from Yahoo! Finance. The original
data can be found in GEPrices.xlsx and is presented in Figure 14.1. This data file fits the requirements since
all columns contain either dates or numbers.
This file can be imported using the following steps. First, change the Current Directory to the directory
with the Excel file to be imported. Next, select the Current Directory browser in the upper left pane of
the main window.1 The Excel file should be present in this view. To import the file, right click on the file
name and select Import Data... (see figure 14.2). This will trigger the dialog in figure 14.3. To complete the
import, make sure Column Vectors is chosen (top left of Import Wizard) and click Import. If the import fails
the most likely cause is the format of the Excel file – make certain the file conforms to the rules above and
try again. Alternatively, select Table which will read the data into a single MATLAB table (see chapter 15
for more on tables).
1 If this pane is absent, it can be enabled in the Desktop menu along the top of the MATLAB window.
Figure 14.1: The raw data as taken from Yahoo! Finance. All of these columns are well formatted with
variable names in the first row and numeric (or date) content.
14.2 Importing Data in Code
The preferred method to import data from files through code is to use readtable which will import the
data into a table (see chapter 15 for more on tables). Other methods to read in data include xlsread, csvread,
textread and textscan which provide data-type specific readers. These lower level readers can be faster
than readtable although they are also more fragile in the sense that they produce errors when data is not
well formatted.
For many datasets stored as either delimited text or in an Excel file, readtable(filename) will import data
into a MATLAB table without any further options. The imported data will consist of columns, each with a
datatype optimized to hold the type of data in the file. In particular, columns of numbers will be imported
as numeric arrays while columns of strings or string dates will be imported in a cell array. If the file contains
variable names in the first row, the table will read these in automatically and use them as the column
names. readtable supports delimited text, Excel files, and Open Document Spreadsheets, and attempts
to infer the type of file from the file’s extension.
Figure 14.2: To import data, select the Current Directory view, right click on the Excel file to be imported,
and select Import. This will trigger the import wizard in figure 14.3.
The CSV below contains the first 10 rows of IBM_TAQ.txt and contains Trade-and-Quote data for IBM
on one day.
SYMBOL,DATE,TIME,PRICE,SIZE
IBM,20070103,9:30:03,97.18,100
IBM,20070103,9:30:08,96.6,373200
IBM,20070103,9:30:08,97.17,1000
IBM,20070103,9:30:08,97.17,100
IBM,20070103,9:30:08,96.61,200
IBM,20070103,9:30:08,96.75,200
IBM,20070103,9:30:08,97.15,100
IBM,20070103,9:30:08,97.15,100
IBM,20070103,9:30:08,97.15,100
IBM,20070103,9:30:08,97.15,100
This file can be imported using readtable and columns that are not numbers are imported as strings.
>> t = readtable(’IBM_TAQ_top_10.csv’)
t =
SYMBOL DATE TIME PRICE SIZE
Figure 14.3: As long as the data is correctly formatted, the import wizard should import the data and create
variables with the same name as the column headers. To complete this step, make sure that Column vectors
is selected from the Import as drop-down box and then select Import.
Some basic processing can be used to convert the DATE and TIME columns into a datetime.
>> times = datetime(t.TIME) - datetime(’today’);
>> dates = datetime(t.DATE,’ConvertFrom’,’yyyymmdd’);
>> t.datetimes = dates + times;
>> t(:,’datetimes’)
ans =
datetimes
____________________
03-Jan-2007 09:30:03
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
03-Jan-2007 09:30:08
Optional parameters that provide additional information can be passed using the format
readtable(filename, param, value). The most useful options for param are ’ReadVariableNames’ which takes
a value of true or false (1 or 0) and instructs readtable to read the names from the file, ’ReadRowNames’
which indicates that row names should be read, and ’TreatAsEmpty’ which takes a cell array of strings that
readtable should treat as missing values (e.g., {’ ’, ’N/A’}). When reading text files, optional inputs allow
the delimiter to be set (’Delimiter’) and lines at the top of the file containing non-data to be skipped
(’HeaderLines’). When reading Excel or OpenDocument Spreadsheet files, the sheet to read can be set
(’Sheet’) and the range to read, a rectangular region in the sheet, can be set (’Range’).
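The text-file options can be sketched as follows. The file contents, the temporary file name and the
variable names are assumptions made only so that the example is self-contained; the file is
semicolon-delimited, has one line of non-data at the top, and uses N/A to mark a missing price.

```matlab
% Write a small semicolon-delimited file with a comment line at the top
filename = [tempname '.txt'];
fid = fopen(filename, 'wt');
fprintf(fid, 'Example file for readtable\n');
fprintf(fid, 'ticker;price;shares\n');
fprintf(fid, 'IBM;84.5;100\n');
fprintf(fid, 'GE;N/A;200\n');
fclose(fid);
% Skip the first line, read variable names from the next line, and
% treat N/A as a missing value
t = readtable(filename, 'Delimiter', ';', 'HeaderLines', 1, ...
    'ReadVariableNames', true, 'TreatAsEmpty', {'N/A'});
delete(filename);
```

The missing price is imported as NaN, and so the price column remains numeric.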
Data in Excel sheets can also be imported using the function xlsread from the command window.
Accompanying this set of notes is an Excel file, deciles.xls, which contains returns for the 10 CRSP deciles
from January 1, 2004, to December 31, 2007. The first column contains the dates while columns 2 through
11 contain the portfolio returns from decile 1 through decile 10 respectively. To load the data, use the
command
>> data = xlsread(’deciles.xls’);
This command will read the data in sheet1 of file deciles.xls and assign it to the variable data. xlsread
can handle a number of other situations, including reading sheets other than sheet1 or reading only spe-
cific blocks of cells. For more information, see doc xlsread. Data can be exported to an Excel file using
xlswrite. Extended information about an Excel file, such as sheet names, can be read using the
command xlsfinfo.
MATLAB and Excel do not agree about dates. MATLAB dates are measured as days past January 0,
0000 while Excel dates are measured relative to December 31, 1899. In MATLAB serial date 1 corresponds
to January 1, 0000 while in Excel day 1 corresponds to January 1, 1900. To convert imported Excel dates
into MATLAB dates, datenum(’30DEC1899’) must be added to the column of data representing the dates.
The offset uses December 30 rather than December 31 because Excel treats 1900 as a leap year, and so
counts one extra day for all dates after February 1900.
Returning to the example above,
>> [A,finfo]=xlsfinfo(’deciles2.xls’)
A =
Microsoft Excel Spreadsheet
finfo =
’deciles’
>> data = xlsread(’deciles2.xls’,’deciles’,’A2:K1257’);
>> dates = data(:,1);
>> datestr(dates(1))
ans =
03-Jan-0104
>> dates = dates + datenum(’30DEC1899’);
>> datestr(dates(1))
ans =
02-Jan-2004
This example uses a file, deciles2.xls, which contains the sheet deciles. Opening the file in Excel shows that
deciles contains column labels as well as the data. To import data from this file, xlsread needs to know to
take the data from deciles in cells A2:K1257 (upper left and lower right corners of the block). Running the
command xlsread(’deciles2.xls’, ’deciles’, ’A2:K1257’) does this. Finally, the disagreement in the
base date is illustrated and the correction is shown to work. For more on dates, see Chapter 11.
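The date correction can be checked on a single value, as in the sketch below. The Excel serial date 37988
(January 2, 2004) is an assumption chosen to match the first date in the example. The sketch also shows
that datetime can convert Excel serial dates directly using ’ConvertFrom’,’excel’, which avoids the
manual offset.

```matlab
excelDate = 37988;   % Excel serial date for January 2, 2004
% Offset approach: add the serial date of December 30, 1899
matlabSerial = excelDate + datenum('30DEC1899');
% datetime can convert Excel serial dates directly
dt = datetime(excelDate, 'ConvertFrom', 'excel');
% Both approaches identify the same day
sameDay = (datenum(dt) == matlabSerial);
```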
Comma-separated value (CSV) data is similar to Excel data, although the CSV files must contain only
numeric values. If the file contains strings, such as variable names, the import will fail. The command to
read CSV data is virtually identical to the command to read Excel files,
% This command fails since deciles.csv contains variable names in the first row
>> data = csvread(’deciles.csv’) %Error
Error using dlmread (line 139)
Mismatch between file and format string.
Trouble reading number from file (row 1u, field 1u) ==>
caldt,CAP1RET,CAP2RET,CAP3RET,CAP4RET,CAP5RET,CAP6RET,CAP7RET,CAP8RET,CAP9RET,CAP10RET\n
However, like xlsread, csvread can be given a specific cell to begin reading, and so leading text can be avoided.
% This command works since it skips the first row
>> data = csvread(’deciles.csv’,1,0)
dlmread allows for text files with other delimiters to be read. These examples show the basic usage of
dlmread for reading a tab-delimited file and a CSV file.
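A self-contained sketch of this usage is given below. The data, the temporary file names and the variable
names are assumptions; the files are first created using dlmwrite so that the calls to dlmread can be run
as written.

```matlab
data = [1 2 3; 4 5 6];
% Write the same matrix to a tab-delimited file and to a CSV file
tabFile = [tempname '.txt'];
csvFile = [tempname '.csv'];
dlmwrite(tabFile, data, 'delimiter', '\t');
dlmwrite(csvFile, data);   % a comma is the default delimiter
% Read both files back by specifying the delimiter
fromTab = dlmread(tabFile, '\t');
fromCsv = dlmread(csvFile, ',');
% Zero-based row and column offsets skip leading rows or columns
lastRow = dlmread(csvFile, ',', 1, 0);
delete(tabFile);
delete(csvFile);
```

As with csvread, the row and column offsets allow a header row to be skipped.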
14.3 MATLAB Data Files (.mat)
The native file format is the MATLAB data file or mat file. Data from a mat file is loaded by entering
>> load deciles.mat
There is no need to specify an input variable as the mat file contains both variable names and data. See
below for saving data in mat format. load can also be used as a function, which allows for dynamically
generated file names.
% Function usage
>> load(’deciles.mat’)
load can be used with a single output which will load all variables in the mat file into a structure (see
chapter 13).
% Function usage
>> dec = load(’deciles.mat’)
dec =
data: [1256x11 double]
>> dec.data(1:2,1)
20040102
20040105
In recent versions of MATLAB (R2011b or later), it is possible to load only a subset of the variables in a file.
Suppose a mat file contained 3 variables, x, y and z. To load only x and y (but not z), use the following
command.
% Limited import
>> load(’datafile.mat’,’x’,’y’)
The related function whos can be used to generate a list of the variables in a mat file.
% Mat contents
>> contents = whos(’-file’,’deciles.mat’)
contents =
name: ’data’
size: [1006 11]
bytes: 88528
class: ’double’
global: 0
sparse: 0
complex: 0
nesting: [1x1 struct]
persistent: 0
In this example, contents is a structure. If the mat file contains more than one variable, an array of structures
is returned (see chapter 13 for more on structures).
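Listing the contents of a file and loading a subset of its variables can be combined, as in the sketch
below. The variables x, y and z and the temporary file name are assumptions used only to make the
example self-contained.

```matlab
x = 1; y = [1 2 3]; z = 'text';
filename = [tempname '.mat'];
save(filename, 'x', 'y', 'z');
% With several variables in the file, whos returns an array of structures
contents = whos('-file', filename);
names = sort({contents.name});
% Load only x and y into a structure, leaving z in the file
subset = load(filename, 'x', 'y');
delete(filename);
```

Using load with an output keeps the loaded variables inside a structure rather than placing them directly
in the workspace.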
14.4 Advanced Data Import
MATLAB can be programmed to read virtually any text (or even binary) format since it contains functions
for parsing and interpreting arbitrary file data. Reading poorly formatted data files is an advanced tech-
nique and should be avoided if possible. However, some data is only available in formats where reading
in data line-by-line is the best solution.2 For instance, the standard import method fails if the raw data is
very large (too large for Excel) and is poorly formatted. In this case, one solution is to write a program to
read and process the file line-by-line.
The file IBM_TAQ.txt contains a simple example of data that is difficult to import. This file was down-
loaded from WRDS and contains all prices for IBM from the TAQ database in the interval January 1, 2001,
through January 31, 2001. It is too large to use in Excel and has numbers, dates and text on each line.
The following code block demonstrates one approach to parsing this file.
fid=fopen(’IBM_TAQ.txt’,’rt’);
%Count number of lines
count=0;
while 1
line=fgetl(fid);
if ~ischar(line)
break
end
count=count+1;
end
%Close the file
fclose(fid);
2 Line-by-line importing of complex files is slow but relatively straightforward. More advanced users will find
that processing complex files in blocks using fread is substantially faster. See chapter 21 for more discussion
of file importing.
A second pass then re-reads the file, parsing each line by the location of the commas and using strfind to
locate the delimiting character.
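A minimal sketch of such a second pass is given below. It is not the original program: it assumes lines in
the SYMBOL,DATE,TIME,PRICE,SIZE layout of the CSV excerpt shown earlier in this chapter, with one
header line, and it writes a two-line temporary file so that the code can be run as written. All variable
names are illustrative.

```matlab
% Create a small file in the assumed layout
filename = [tempname '.txt'];
fid = fopen(filename, 'wt');
fprintf(fid, 'SYMBOL,DATE,TIME,PRICE,SIZE\n');
fprintf(fid, 'IBM,20070103,9:30:03,97.18,100\n');
fprintf(fid, 'IBM,20070103,9:30:08,96.6,373200\n');
fclose(fid);

% Pre-allocate using the number of data lines found in the first pass
count = 2;
dates = zeros(count,1);
times = cell(count,1);
prices = zeros(count,1);
sizes = zeros(count,1);

fid = fopen(filename, 'rt');
fgetl(fid);   % skip the header line
for i = 1:count
    thisLine = fgetl(fid);
    % Locate the delimiters, then cut out the text between them
    commas = strfind(thisLine, ',');
    dates(i)  = str2double(thisLine(commas(1)+1:commas(2)-1));
    times{i}  = thisLine(commas(2)+1:commas(3)-1);
    prices(i) = str2double(thisLine(commas(3)+1:commas(4)-1));
    sizes(i)  = str2double(thisLine(commas(4)+1:end));
end
fclose(fid);
delete(filename);
```

Pre-allocating the outputs is the reason for counting the lines first, since growing arrays inside the loop
would be slow for a large file.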
textscan is a relatively fast method to read files that contain mixed numeric and string data. A text file
must satisfy some constraints in order for textscan to be useful. First, the file must be regular in the sense
that it has the same number of columns in every row, and second each column must contain the same
type of data – that is, the file must not mix strings with numbers in a column. IBM_TAQ.txt satisfies
these two constraints and so can be read using the command block below. textscan uses a file handle
created using fopen as the file input, rather than the file name directly.
fid = fopen(’IBM_TAQ.csv’,’rt’);
data = textscan(fid, ’%s %f %s %f %f’, ’delimiter’, ’,’, ’HeaderLines’, 1)
fclose(fid);
The arguments to textscan instruct the function that the lines are formatted according to string-
number-string-number-number where %s indicates string and %f indicates number, that the columns
are delimited by a comma, and that the first line is a header and so should be skipped. The data read in by
textscan is returned as a cell array, where numeric columns are stored as vectors while string values (the
ticker and the time in this example) are stored as cell arrays of strings. The use of curly braces, {} indicates
that a cell array is being used. See chapter 13 for more on accessing the values in cell arrays.
>> data
data =
Columns 1 through 3
{558986x1 cell} [558986x1 double] {558986x1 cell}
Columns 4 through 5
[558986x1 double] [558986x1 double]
>> data{1}(1)
ans =
’IBM’
>> data{2}(1)
ans =
20070103
>> data{3}(1)
ans =
’9:30:03’
>> data{4}(1)
ans =
97.1800
>> data{5}(1)
ans =
100
Note that the time column would need further processing to be transformed into a useful format. For more
on reading poorly formatted data files, see the documentation for fopen, fscanf, fread, fgetl, dlmread, and
textscan. See chapter 12 for more on string manipulation. textscan is a good solution for files with mixed
data which are not excessively large – large files tend to be very slow due to the use of cell arrays.
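One possible way to process the date and time columns is sketched below, assuming numeric dates in
yyyymmdd form and times stored as a cell array of strings, as in the textscan output above. The two
observations and the variable names are illustrative.

```matlab
dateNum = [20070103; 20070103];     % numeric dates in yyyymmdd form
timeStr = {'9:30:03'; '9:30:08'};   % times as a cell array of strings
% Build one string per observation, e.g. '20070103 9:30:03'
combined = strcat(cellstr(num2str(dateNum, '%d')), {' '}, timeStr);
% Parse the combined strings using an explicit input format
stamps = datetime(combined, 'InputFormat', 'yyyyMMdd H:mm:ss');
```

Supplying the input format explicitly avoids relying on automatic detection of the date format.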
Once the data has been loaded, save it and any changes in the native MATLAB data format using save
>> save filename
This will produce the file filename.mat containing all variables in memory. filename can be replaced with
any valid filename. To save a subset of the variables in memory, use
>> save filename var1 var2 var3
which saves the file filename.mat containing var1, var2, and var3. save, like load, can also be used as a
function which allows for using a variable as the file name.
>> saveFileName = ’filename’;
>> save(saveFileName,’var1’,’var2’,’var3’)
Data can be exported to a delimited text file using save with the arguments -ascii -double. For
example,
>> save filename var1 -ascii -double
would save the data in var1 in a space-delimited text file (adding -tabs produces a tab-delimited file). It is
generally a good practice to only export one variable at a time using this method. Exporting more than one
variable results in a poorly formatted file that may be hard to import into another program. The restriction
to a single variable should not be seen as a severe limitation since var1 can always be constructed from
other variables (e.g. var1=[var2 var3];).
tables can be exported using writetable to either delimited text or Excel. writetable facilitates the
export of table-specific features such as variable names and row names. Alternative methods to export
data include xlswrite, csvwrite and dlmwrite.
14.6 Exercises
1. Use the import wizard to import exercise3.xls, which contains three columns of data, the date, the
return on the S&P 500, and the return on XOM (ExxonMobil).
2. Use xlsread to read the file exercise3.xls. Load in the three series into a new variable named returns.
3. Parse returns into three variables, dates, SP500 and XOM. (Hint, use the : operator).
5. Save a MATLAB data file dates with only the variable dates.
6. Construct a new variable, sumreturns as the sum of SP500 and XOM. Create another new variable,
outputdata as a horizontal concatenation of dates and sumreturns.
7. Export the variable outputdata to a new .xls file using xlswrite. See the help available for xlswrite.
Chapter 15
Working with Heterogeneous Data
Traditionally arrays in MATLAB were homogeneous: all data in an array had to have the same type. For
example, in the usual case, all values in a numeric array are stored as double precision floating point numbers.
Over time MATLAB has added support for arrays with many different data types including integers,
unsigned integers, single precision floating points and datetimes. Simultaneously MATLAB has offered
support for fully heterogeneous data using cell arrays. Cell arrays support all MATLAB data types and
each cell in an array can contain a different data type. While cell arrays are general purpose, they are necessarily
slow when storing homogeneous data. This arises due to the different way in which data is stored.
In a traditional array, data is contiguous so that the second element in the array is adjacent to the first
in memory, and the nth element is exactly (n − 1) steps away from the first, where the step size depends only
on the size of the data type used in the array (e.g., 8 bytes for a double precision floating point). Cell
arrays, on the other hand, are not contiguous in memory and each cell actually points to a different location
where the data is stored. As a result, accessing adjacent elements in a cell array requires additional
lookups and in most cases, additional fetches from the main memory of the computer. This makes cell
arrays too slow for any serious numeric computations.
Recently, tables have been introduced to bridge the gap between fast, homogeneous numeric
arrays and slow, heterogeneous cell arrays. tables are collections of columns with additional metadata
including variable names. Within each column, the data type is homogeneous,¹ while across columns
data types may differ. This structure has a number of advantages over purely numeric arrays since data
sets containing strings, dates, and numeric data can be aligned while preserving the ability to easily use
data in high-performance numeric applications.
15.1.1 Importing
In most cases, tables will be created by importing data into MATLAB. Data is imported into a table using
readtable, which can import delimited text files (e.g., comma or tab separated values), Excel or OpenDoc-
ument Spreadsheet files. If the data file to be imported is well formatted with variable names in the first
row and data in rows below, readtable will import into a table and automatically read the variable names.
Consider importing the following comma separated value file saved as animals.csv,
1
With the exception of a column that is actually a cell array, which is permitted. For example, columns in tables that store
strings will typically be cell arrays.
name,species,weight,height,birthday
Seabiscuit,horse,650000,1600,23/5/1933
Callie,dog,32000,550,3/1/2015
Grumpy Cat,cat,4000,240,4/4/2012
Jerry,mouse,19,25,10/2/1940
>> readtable('animals.csv')
ans =
name species weight height birthday
____________ _______ _______ ______ ___________
which demonstrates that the variable names are read from the first row. If a data file does not have obvious variable names,
readtable will generate automatic variable names using the pattern Var1, Var2, . . ..
Tables can also be directly created using the table function. When used with existing variables, the vari-
able name will be automatically used in the table.
Alternatively, variable names can be set when calling table by using the optional argument 'VariableNames'
followed by a cell array containing the variable names.
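Both ways of naming variables can be sketched with two short columns. The data below is a hypothetical subset of the animals example, and the names Animal and Grams are chosen only for illustration.

```matlab
name = {'Seabiscuit'; 'Callie'};
weight = [650000; 32000];
% Variable names are taken from the workspace variable names
t1 = table(name, weight);
% Variable names are set explicitly using 'VariableNames'
t2 = table(name, weight, 'VariableNames', {'Animal', 'Grams'});
```

The names used by each table can be inspected with t1.Properties.VariableNames and t2.Properties.VariableNames.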
Standard MATLAB arrays, cell arrays and arrays of structures can all be converted to tables using array2table,
cell2table, and struct2table. Using array2table to convert a 2-dimensional array to a table is similar
to calling table after splitting the columns into separate variables where each column is named after the
parent array and the column numbers.
>> x = reshape(1:12,4,3)
>> array2table(x)
ans =
x1 x2 x3
__ __ __
1 5 9
2 6 10
3 7 11
4 8 12
cell2table is similar except that it attempts to find a homogeneous datatype for each column.
If a homogeneous datatype cannot be detected, then the column will be stored as a cell array. In general, it
is not a good idea to store data with mixed types in a table, and in some cases converting cell arrays to
tables can result in a loss of information due to the method used by MATLAB for detecting a homogeneous
datatype. This issue is demonstrated in column 4 (x4) in the example below where the floating point value is rounded
to an integer and the third value is capped at 255, the maximum value of a uint8.
>> x = [{'a','b','c'}',{1,2,3}',...
{'a',2,datetime('12/31/1999')}', {uint8(1), 3.14, 2^31}'];
>> t = cell2table(x)
t =
x1 x2 x3 x4
___ __ _____________ ___
'a' 1 'a' 1
'b' 2 [ 2] 3
'c' 3 [31-Dec-1999] 255
>> iscell(t.x1) % String, so still cell array
1
>> iscell(t.x2) % Homogeneous array now
0
>> iscell(t.x3) % Mixed, so still a cell array
1
>> t.x4 % Note loss of information
ans =
1
3
255
Finally, struct2table creates a table from a structure containing arrays or an array of structures. Column
names are derived from the fields of the structure.
>> clear y % Ensure y is clear before assigning fields
>> y.y1 = [1;2;3];
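A complete conversion of this kind might look like the sketch below. The structure s and its field names are hypothetical and are not taken from the original example.

```matlab
clear s                      % ensure s is clear before assigning fields
s.numbers = [1; 2; 3];
s.letters = {'a'; 'b'; 'c'};
% Each field becomes a column; field names become variable names
ts = struct2table(s);
```

For a scalar structure, all fields need the same number of rows so that the columns line up in the table.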
In addition to providing a flexible container for storing heterogeneous data, tables have a number of ad-
ditional features when compared with either numeric arrays or cell arrays. These features are designed to
provide additional meaning to the data stored in a table and include both variable and row names.
Three properties of a table are dedicated to storing information about variables. The most useful property
is the capability to provide variable names (VariableNames). Traditional numeric arrays can only be
accessed by column number, and so it was necessary to remember which column contained which data
series. tables allow variables to have names which are displayed when viewing the data in a table and are
used in other table-specific functions. Variable names must be valid MATLAB variable names. Variable
descriptions (VariableDescriptions) provide a matched set of strings which can contain any information
required to describe a variable in a table. Variable units (VariableUnits) are strings which can be used to
store the unit of a variable (e.g., Million USD, $, or Hours).
Row Names
Tables can have named rows. Row names are set using the RowNames property. Row names must satisfy
two constraints: the row names must be strings and they must be unique.
Three additional fields are available to store table metadata. Description can be used to store a general
string description of a table. DimensionNames can be used to store the names of each dimension in a table.
Finally, UserData can be used to store any other information about a table that one wishes to store that
doesn’t cleanly fit into one of the other categories.
Properties can either be set when creating a table using optional arguments of the form 'PropertyName',
PropertyValue or using the field .Properties of a table. All properties can be read using table.Properties.
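The two ways of setting properties can be sketched as follows. The table contents, row names, units and descriptions are all hypothetical values chosen for illustration.

```matlab
% Set VariableNames and RowNames when creating the table
t = table([1; 2], [3.5; 4.5], 'VariableNames', {'id', 'price'}, ...
    'RowNames', {'first', 'second'});
% Set other properties after creation through .Properties
t.Properties.VariableUnits = {'', 'USD'};
t.Properties.VariableDescriptions = {'Identifier', 'Closing price'};
t.Properties.Description = 'Illustrative price table';
% Row names can be used to select rows
p = t{'second', 'price'};
```

Entering t.Properties at the prompt lists every property and its current value.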
15.3 Column data types
tables are designed to efficiently handle data with heterogeneous types across variables but homogeneous
types within a single variable. Columns can have different data types which allows for efficient storage
of large datasets.
15.3.1 Numeric
Numeric is a common format for storing data values. The default numeric column type uses double
precision floating point numbers. Each value requires 8 bytes of storage, and so when data have a more
limited range, for example, integer values less than some value, other numeric types can be used to reduce
the amount of memory required to store data. For example, 8-bit unsigned integers can hold values
between 0 and 255, inclusive, and require only 1 byte of storage per value.
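The storage difference can be checked directly with whos. This is a small sketch using hypothetical data that fits within the uint8 range.

```matlab
x = (1:200)';            % double precision, 8 bytes per value
x8 = uint8(x);           % 8-bit unsigned integer, 1 byte per value
sx = whos('x');
s8 = whos('x8');
bytesDouble = sx.bytes;
bytesUint8 = s8.bytes;
% Values outside the range are capped rather than wrapped
capped = uint8(300);
```

The conversion is only safe when every value is an integer in the range of the smaller type, since out-of-range values are silently capped at the limits.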
15.3.2 Strings
Strings are usually stored in a table using cell arrays. This allows for strings to have different lengths and
for simple manipulations of string values.
15.3.3 categoricals
Categoricals are used when a string variable only takes a relatively small number of values. For example,
country names in a large dataset of web visitor data can only take around 200 values. Categorical vari-
ables efficiently encode these strings to integers while preserving the ease of interpretation of the original
country names. A cell array of strings can be converted to a categorical using the command
categorical(cellarray). In this example, a large list of full country names requires around 1.1 MiB of storage while the
categorical version of the same data requires about 1% as much memory.
>> names = {'Afghanistan','Albania','Algeria','Andorra','Angola'}';
>> countries = names(randi(5,10000,1));
>> countries_cat = categorical(countries);
>> whos countries*
Data in an existing table can be converted to a categorical by assigning the output of a call to categorical
to the original variable.
>> t = table(countries);
>> t.countries = categorical(t.countries);
15.3.4 datetimes
datetimes represent another optimized format. Traditionally MATLAB used a proprietary serial date for-
mat that expresses a date as the number of days since January 1, 0000 12:00:00 AM (which was 1.0). These
dates are difficult to work with since it is not easy to interpret 730485 as December 31, 1999. datetimes
offer an alternative storage format that is visibly represented as human-readable dates while using an opti-
mized format for the storage of dates and times. Datetimes also bring support for timezone information,
which is missing in the MATLAB serial date format. datetimes are created by calling datetime on a cell
array of string dates and times.
>> dates = {'23/5/1933 12:00:00 AM','3/1/2015 6:30:15 PM',...
'4/4/2012 6:18:18 PM','10/2/1940 12:21:12 AM'}';
>> datetimes = datetime(dates)
>> whos date*
Name Size Bytes Class Attributes
dates 4x1 608 cell
datetimes 4x1 169 datetime
Like categoricals, datetimes can be added to an existing table by assigning the output of datetime.
>> animals = readtable('animals.csv')
animals =
name species weight height birthday
____________ _______ _______ ______ ___________
The difference between dates stored as strings and dates stored as datetime is in the representation of the
date and the lack of quotation marks.
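The conversion of a string column to datetime can be sketched as below. The table is a hypothetical in-memory stand-in for part of animals.csv, and the explicit 'InputFormat' is an addition made here because day-first dates such as 3/1/2015 are otherwise ambiguous.

```matlab
% Hypothetical table mirroring part of animals.csv, with birthdays as strings
name = {'Seabiscuit'; 'Callie'};
birthday = {'23/5/1933'; '3/1/2015'};
animals = table(name, birthday);
% Replace the string column with datetimes using a day/month/year format
animals.birthday = datetime(animals.birthday, 'InputFormat', 'd/M/yyyy');
```

Without 'InputFormat', datetime tries to infer the format from the strings, which may not give the intended result for dates where the day is 12 or less.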
15.4 Selection
Parentheses are the simplest method to access a table and selections made with parentheses will return
a table even if a single column is selected. Two inputs are required, one for rows and one for columns.
Selecting rows is identical to selecting rows of a matrix and any of the usual methods, scalar, numeric list
of indices, slice (using : notation), or logical array, can be used. Selecting columns supports the same 4
selection types in addition to selection by variable name. When selecting using a single variable name, the
name alone can be used. When selecting multiple columns the variable names should be entered using a
cell array using {}.
>> t = table([1,2,3]',[10,9,8]',[-1,0,1]',['a','b','c']',...
datetime({'12/31/1999','1/31/2000','2/29/2000'}'),...
'VariableNames',{'Alpha','Beta','Gamma','Delta','Epsilon'});
>> t(:,1:3) % Cols 1, 2 and 3
>> t(:,3) % Col 3 only
>> t(:,logical([1,1,0,1])) % Select cols 1, 2 and 4
Finally, order matters and so these two commands are not identical.
>> t([2,1],{'Alpha', 'Beta'}) % Name order is respected
>> t([2,1],{'Beta', 'Alpha'}) % Name order is respected
Braces (or curly braces, {}) can be used to extract values from a table. The important difference is that the
result is an array if the data are homogeneous. Arrays do not support heterogeneous data and so using
braces to select multiple columns with different types will produce an error. When using braces, two
arguments are required. Aside from the requirement for a homogeneous input and the return of an array,
using braces is virtually identical to using parentheses.
>> t{:,1:3} % Cols 1, 2 and 3
>> t{:,3} % Col 3 only
>> t{:,logical([1,1,0,1])} % Select cols 1, 2 and 4
Finally, note that using braces on a mixed table produces an error.
>> t{:,{'Alpha','Delta','Epsilon'}} % Error
Dot notation allows a single column to be extracted. Generally, the syntax used will be table.variable as
in
>> t.Alpha
Dot notation can additionally be used to select columns based on numeric position using the syntax
table.(#) where # is a number. This dot selection is identical to the previous one.
>> t.(1)
Note that when using dot selection the column selected is just a standard array and not a table. Dot se-
lection can be chained with other selectors to subset the column selected with the dot. For example,
>> t.Alpha(2:3)
Tables can be exported to other MATLAB data structures including homogeneous arrays, cell arrays and
structures containing arrays using table2array, table2cell, and table2struct, respectively. table2array
can only export tables that are homogeneous (e.g. all numbers).
>> table2array(t(:,1:3)) % Only the numbers
ans =
1 10 -1
2 9 0
3 8 1
>> table2cell(t)
ans =
[1] [10] [-1] 'a' [31-Dec-1999]
[2] [ 9] [ 0] 'b' [31-Jan-2000]
[3] [ 8] [ 1] 'c' [29-Feb-2000]
>> table2struct(t)
ans =
3x1 struct array with fields:
Alpha
Beta
Gamma
Delta
Epsilon
15.5 Table-specific features
tables can be saved to MATLAB data files using the same syntax as any other variable, save matfilename
tablename. tables can also be exported to either delimited text files or Excel files using writetable. The file
extension determines the file format written. By default, text files will be comma separated, although this can be changed using
an optional argument. Excel files can be exported in either old or new Excel file formats (.xls for old,
.xlsx or .xlsm for new).
Options can be passed using additional arguments. The most useful are 'WriteVariableNames' and 'WriteRowNames'
which determine whether these values will be exported.
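A short sketch of exporting with writetable follows. The table, the temporary file names and the choice of a tab delimiter are assumptions made for illustration.

```matlab
t = table([1; 2; 3], {'a'; 'b'; 'c'}, 'VariableNames', {'id', 'label'});
% The .csv extension produces a comma-separated text file
csvFile = [tempname '.csv'];
writetable(t, csvFile);
% A tab-delimited file without the header row of variable names
tabFile = [tempname '.txt'];
writetable(t, tabFile, 'Delimiter', '\t', 'WriteVariableNames', false);
% Read the comma-separated file back to confirm the round trip
back = readtable(csvFile);
delete(csvFile);
delete(tabFile);
```

Writing to a file name ending in .xlsx would use the same call and produce an Excel workbook instead.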
Tables support a range of SQL-like operations that allow tables to be merged or joined and which allow for
row-based set operations such as intersections or differences of two tables. join can be used to join two
tables on one or more variables using a SQL-like left join. join requires that the table being joined to the
existing table has all of the keys in the existing table. The related, and more useful, innerjoin and outerjoin
perform inner (retain only rows in both) or outer (retain if a row in either) joins. Neither innerjoin nor
outerjoin requires all values of a key to be available in both tables.
>> t1 = table({'dog','cat','horse'}',[1,2,3]',...
'VariableNames',{'animal','id'});
>> t2 = table({'bird','dog','cat','dolphin'}',...
[102.2,43.1,13.9,73.3]',...
'VariableNames',{'animal','weight'});
>> innerjoin(t1,t2,'Keys','animal')
ans =
animal id weight
______ __ ______
'cat' 2 13.9
'dog' 1 43.1
>> outerjoin(t1,t2,'Keys','animal')
ans =
animal_t1 id animal_t2 weight
_________ ___ _________ ______
A wide range of set operations are also available for finding sub- or super-sets of tables. intersect finds
the intersection of two tables and returns the common rows. setdiff returns the rows in one table that
are not in the other table; similarly setxor returns the rows that are in either table except those that are
available in both. union returns the set of all unique rows in two tables. unique returns the unique rows in a
single table while ismember returns a true/false value indicating if a row is in another table.
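The set operations described above can be sketched on two small tables that share the same variables. The tables a and b and their contents are hypothetical.

```matlab
a = table([1; 2; 3], {'x'; 'y'; 'z'}, 'VariableNames', {'id', 'label'});
b = table([2; 3; 4], {'y'; 'z'; 'w'}, 'VariableNames', {'id', 'label'});
common = intersect(a, b);    % rows in both a and b
onlyA = setdiff(a, b);       % rows in a that are not in b
either = setxor(a, b);       % rows in exactly one of the tables
combined = union(a, b);      % all unique rows from both tables
inB = ismember(a, b);        % logical flag for each row of a
```

These functions compare complete rows, so both tables must contain the same variables.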
15.5.4 Grouping
tables support computing statistics or applying other functions across groups. For example, if a data set
contains data for individual income and hours worked across states, it is interesting to see how income
varies with hours worked. This requires computing the average income and the average hours worked for
each state. varfun makes this type of calculation simple since it supports computing a function,
variable-by-variable, and allows automatic grouping on one or more variables.
>> states = {'NY','FL','CA','TX'}';
>> index = randi(4,100000,1);
>> income = 30000 + 2500 * index + 10000 * randn(100000,1);
>> hours = 35 + 2.5 * index + 8 * randn(100000,1);
>> state = states(index);
>> t = table(state, income, hours);
>> varfun(@mean,t,'InputVariables',{'hours','income'},...
'GroupingVariables', 'state')
ans =
state GroupCount mean_hours mean_income
_____ __________ __________ ___________
Other available table-specific functions include rowfun, which computes a function across the variables in
each row, findgroups, which generates a set of group indices for a table, and splitapply, which allows
more generality than varfun for computing grouped statistics.
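A minimal sketch of findgroups, splitapply and rowfun on a few hypothetical observations is given below. The data values and the output name wage are assumptions chosen so the results are easy to verify by hand.

```matlab
state = {'NY'; 'FL'; 'NY'; 'CA'; 'FL'; 'NY'};
income = [50; 40; 70; 60; 20; 30];
hours = [40; 40; 35; 40; 20; 30];
% Group numbers and the corresponding (sorted) group labels
[g, groupNames] = findgroups(state);
% Average income within each group
avgIncome = splitapply(@mean, income, g);
% rowfun passes the variables of each row as separate inputs
t = table(income, hours);
r = rowfun(@(inc, hrs) inc / hrs, t, 'OutputVariableNames', {'wage'});
```

Here groupNames is ordered CA, FL, NY, and avgIncome contains the group means in the same order.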
A number of table-specific functions are available to simplify working with tables. summary can be used to
compute a basic summary of the variables in a table.
>> summary(t)
Alpha: 3x1 double
Values:
min 1
median 2
max 3
Beta: 3x1 double
Values:
min 8
median 9
max 10
Gamma: 3x1 double
Values:
min -1
median 0
max 1
Delta: 3x1 char
Epsilon: 3x1 datetime
Values:
min 31-Dec-1999
median 31-Jan-2000
max 29-Feb-2000
istable returns true (1) if a variable is a table. height and width return the number of rows and columns,
respectively, in a table. These are mostly redundant since size can be used with tables as well.
Chapter 16
Probability and Statistics Functions
The most valuable functions in the Statistics Toolbox are the CDFs, PDFs, random number generators and
inverse CDFs. All distributions commonly encountered in econometrics have the complete set of four
provided, including
• χ² (chi2-)
• β (beta-)
• Exponential (exp-)
• F (f-)
• Γ (gam-)
• Lognormal (logn-)
• Poisson (poiss-)
• Student’s t (t-)
• Uniform (unif-)
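The prefixes above are combined with a suffix to form the function name. The sketch below assumes the usual Statistics Toolbox suffixes cdf, pdf, inv and rnd, and uses arbitrary illustrative arguments.

```matlab
% Requires the Statistics Toolbox
q = chi2inv(0.95, 1);     % inverse CDF: 95% critical value of a chi-squared(1)
p = chi2cdf(q, 1);        % CDF evaluated at the critical value
f = exppdf(1, 2);         % PDF of an exponential with mean 2, evaluated at 1
u = unifrnd(0, 1, 5, 1);  % five uniform pseudo-random numbers on (0,1)
```

The inverse CDF and CDF undo one another, so p recovers the probability 0.95 used to compute q.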
16.2.1 quantile
quantile returns the empirical quantiles of a vector. It requires two inputs. The first is a vector or matrix
(T by K) and the second is an M-element vector of quantiles to compute. When the input is a vector, the
output will have the same dimensions as the list of quantiles used (either 1 by M or M by 1). When the
input is a matrix, an M by K matrix is returned where each column of the computed quantiles corresponds
to a column of the input matrix. quantile is simple and can easily be replaced using sort, length and
floor or ceil.
>> x = randn(100000,1);
>> quantile(x,[.025 .05 .5 .95 .975])
-1.9567 -1.6430 0.0010 1.6375 1.9488
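One way to replace quantile with sort, length and ceil is sketched below using a small hypothetical vector. This version picks an order statistic and does not interpolate between observations, so it will generally differ slightly from the values quantile returns.

```matlab
x = [5; 1; 8; 2; 7; 3; 6; 4];
q = [0.25 0.5 0.75];
xs = sort(x);                    % order statistics
n = length(xs);
idx = max(ceil(q * n), 1);       % index of the order statistic for each q
empQuantiles = xs(idx);
```

Using floor in place of ceil selects the order statistic just below instead; in large samples the difference is negligible.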
16.2.2 prctile
prctile is identical to quantile except that it expects arguments between 0 and 100 rather than between 0 and
1.
16.2.3 regress
regress performs basic regression and returns key regression statistics. The Statistics Toolbox implementation
is not robust to many empirical realities in economic or financial data (such as heteroskedasticity)
and so is of limited use.
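The coefficient estimates themselves are simple to compute without the toolbox. The sketch below uses the backslash operator on hypothetical noise-free data; regress(y,X) from the Statistics Toolbox should return the same coefficients, but backslash is used here so that the example runs in base MATLAB.

```matlab
T = 5;
X = [ones(T, 1) (1:T)'];     % constant and a single regressor
y = 2 + 3 * (1:T)';          % hypothetical data with intercept 2 and slope 3
b = X \ y;                   % least squares coefficients
resid = y - X * b;           % residuals, zero here since there is no noise
```

Standard errors that are robust to heteroskedasticity have to be computed separately from the residuals.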
The MFE Toolbox contains a set of functions addressing many common problems in financial econometrics.
It is available on the course website. Note that the MFE Toolbox has superseded the UCSD_garch
toolbox.
16.4 Exercises
1. Have a look through the statistics toolbox in the help browser and explore the functions available.
2. Download the MFE toolbox and extract its contents. Have a look through the list of functions avail-
able.
Chapter 17
Custom Functions
Custom functions can be written to perform repeated tasks or to use as the objective of an optimization
routine. All functions must begin with a line of the form
function [out1, out2, ...] = functionname(in1, in2, ...)
where out1, out2, . . . are variables the function returns to the command window, functionname is the
name of the function (which should be unique and not a reserved word) and in1, in2, . . . are input
variables.
To begin, consider this simple function func1
function y = func1(x)
x = x + 1;
y = x;
This function, which is not particularly well written,¹ takes one input and returns one output, incrementing
the input variable (whether a scalar, vector or matrix) by one.
Functions have a few important differences relative to standard m-file scripts.
• Functions operate on a copy of the original data. Thus, the same variable names can be used inside
and outside of a function without risking any data.²
• Any variables created when the function is running, or any copies of variables made for the function,
are lost when the function completes unless they are explicitly returned.³
In the function above, this means that only the value of y is returned and everything else is lost – in par-
ticular, changes in x do not persist. For example, suppose the following was entered
>> x = 1;
>> y = 1;
>> z = func1(x);
>> x
x =
1
>> y
y =
1
>> z
z =
2
Thus, despite the function using variables named x and y, the values of x and y in the workspace do not
change when the function is called.
1
It has no comments, has superfluous commands and is trivial in nature. The function should only contain y = x + 1; and
a comment that describes the function’s purpose.
2
MATLAB uses a copy-on-change model where data is only copied if modified. If unmodified, variables passed to functions
behave as if passed by reference.
3
MATLAB supports global variables using the keyword global. Global variables can be seen both in the standard workspace
and inside functions. In general, global variables should be avoided. Use cases of global variables are discussed in Chapter 21.
Functions with multiple inputs and outputs can also be constructed. A simple example is given by
function [xpy, xmy] = func2(x,y)
xpy = x + y;
xmy = x - y;
This function takes two inputs and returns two outputs. It is important to note that despite the two outputs
of this function, it is not necessary to call the function with two outputs. For example, consider the following
use of this function.
>> x = 1;
>> y = 1;
>> z1 = func2(x, y)
z1 =
2
>> [z1, z2] = func2(x, y)
z1 =
2
z2 =
0
>> [~, z2] = func2(x, y)
z2 = 0
The final call shows the use of ~ to suppress leading outputs of functions when they are not used.
17.1 Function-specific functions
There are a number of advanced function-specific variables and commands that report how many input
variables were provided to the function (nargin) and how many outputs were requested (nargout), that
allow variable numbers of inputs and outputs (varargin and varargout, respectively), and that allow for
early termination of the function (return). This course can be completed without
using any of these, although they are useful, especially when producing code for other users.
17.1.1 nargin
nargin is available inside functions to determine the number of inputs provided in the function call. This
allows for default values to be used for trailing inputs. Note that an empty input ([]) is still an input, and
so it may be necessary to check whether an input is empty using isempty.
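Default values for trailing inputs can be sketched as follows. The function name scale_demo and the default value of 2 are hypothetical choices made for illustration.

```matlab
function y = scale_demo(x, a)
% SCALE_DEMO Multiply x by a, using a = 2 when a is omitted or empty.
if nargin < 2 || isempty(a)
    a = 2;   % default used when the second input is missing or []
end
y = a * x;
end
```

Calling scale_demo(3) or scale_demo(3, []) uses the default, while scale_demo(3, 5) overrides it. The short-circuit || ensures isempty(a) is only evaluated when a was actually provided.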
17.1.2 nargout
nargout is available inside functions to determine the number of outputs requested. It is useful to avoid
calculating some outputs when the number of outputs requested is smaller than the maximum number
of outputs supported by the function.
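Skipping an unneeded calculation with nargout might look like the sketch below. The function name meanvar_demo is hypothetical.

```matlab
function [m, v] = meanvar_demo(x)
% MEANVAR_DEMO Return the mean and, only if requested, the variance of x.
m = mean(x);
if nargout > 1
    v = var(x);   % computed only when a second output is requested
end
end
```

When called as m = meanvar_demo(x), the variance is never computed; [m, v] = meanvar_demo(x) computes both.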
17.1.3 varargin
varargin can be used as the last input in a function declaration to capture a variable number of inputs.
Consider the following code.
function varargin_demo(varargin)
% Iterates across all inputs and displays the contents
for i=1:length(varargin)
disp(varargin{i})
end
This function can accept any number of inputs (including 0) and will iterate across the inputs and display
their contents. Note that varargin is a cell array (see chapter 13).
17.1.4 varargout
varargout is similar to varargin, only that it allows for a variable number of outputs. varargout is rarely
encountered, but can be used to allow producing as many outputs as the number of inputs when varargin
is used.
function varargout = varargout_demo(varargin)
% Copies each input to the corresponding output
varargout = cell(size(varargin));
for i=1:length(varargin)
varargout{i} = varargin{i};
end
The following code demonstrates this function using different numbers of inputs.
>> [a,b,c] = varargout_demo(1,2,3)
a =
1
b =
2
c =
3
>> [d,e,f] = varargout_demo(10i,-1,Inf)
d =
0 +10.0000i
e =
-1
f =
Inf
17.1.5 return
return can be used to exit a function before all code has been executed, and is usually used inside an if
statement.
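A typical use of return inside an if statement is sketched below. The function name safe_log_demo and the choice of NaN for invalid inputs are assumptions made for illustration.

```matlab
function y = safe_log_demo(x)
% SAFE_LOG_DEMO Return log(x) for positive scalar x and NaN otherwise.
y = NaN;
if x <= 0
    return   % exit early; y keeps the value NaN
end
y = log(x);
end
```

All outputs should be assigned before return is reached, since MATLAB raises an error if a requested output was never set.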
17.2 Comments
Like batch m-files, comments in custom functions are made using the % symbol. However, comments
have an additional purpose in custom functions. Whenever help function is entered in the command
window, the first continuous block of comments is displayed in the command window. For instance, consider
the function func
function y = func(x)
% This function returns
% the value of the input squared.
y = x^2;
Entering help func displays the two comment lines.
Initial comments usually contain the possible combinations of input and output arguments as well as
a description of the function. While comments are optional, they should be included both to improve
readability of the function and to assist others if the function is shared.
17.3 Debugging
Since variables inside a function are not visible in the workspace while the function runs, debugging can be
difficult. There are four strategies to debug a function:
• Write the function as a script first, and convert it to a function once it works.
• Leave off ; as needed to write out the value of variables to the command window (or alternatively,
use disp).
• Use keyboard and return to interrupt the function to inspect the values.
• Set breakpoints in the MATLAB editor.
The first of these methods is often the easiest. Consider a script version of the function above,
x = 1;
y = 2;
%function [xpy, xmy] = func2(x,y)
xpy = x + y;
xmy = x - y;
Running this script would be equivalent to calling the function func2(1,2). However, when calling it as
a script, variables can be examined as they change. The second method can be useful although clumsy
– often the output window is quickly filled with numbers and so locating the problematic code becomes
difficult. The third option is more advanced. Adding keyboard to a function interrupts the function at
the location of keyboard and returns control to the command window. When in this situation, the usual
>> prompt changes to a K>>. When in keyboard mode, variables inside the function are treated as if they
were script variables. Once finished inspecting the variables, enter return to continue the execution of
the function. A simple example of keyboard can be adapted to the function above,
function [xpy, xmy] = func3(x,y)
keyboard
xpy = x + y;
xmy = x - y;
keyboard
Calling this function will result in an immediate keyboard session (note the K>>). Entering whos will list
two variables, x and y. When return is entered, a second keyboard session opens. Entering whos will now
list four variables, xpy and xmy in addition to the original two. When a function has been debugged,
either comment out or remove the keyboard commands.
The final option is to set breakpoints in the MATLAB editor. Breakpoints can be added either in the
editor or using the command dbstop in file at lineNumber. In practice, it is usually simpler to use the
editor to set the breakpoint. When using breakpoints, the function is stopped whenever a breakpoint is
encountered. This allows for values inside the function to be inspected. In addition, various methods of
“stepping” are available when using formal debugging:
• Step In - Proceed to the next line, and enter any sub-function (also in debugging mode)
• Step Out - Proceed out of the current function to the next line in the main program
• Continue - Resume normal execution, stopping at the end of the main function or when another
breakpoint is encountered
Figure 17.1 shows how a breakpoint is set in the MATLAB editor.
17.4 Exercises
1. Write a function summstat that takes one input, a T by K matrix, and returns a matrix of summary
statistics of the form
Figure 17.1: Break points can be set in the editor for debugging files by right-clicking in the left column
of the window.
2. Rewrite the function so that it outputs 4 vectors, one each for mean, std, skewness and kurtosis.
3. Write a function called normloglikihood that takes two arguments, params and data (in that order)
and returns the log-likelihood of a vector of data. Note: params = [mu sigma2]' consists of two elements,
the mean and the variance.
4. Append to the previous function a second output that returns the score of the log-likelihood (a 2 ×
1 vector) evaluated at params.
Chapter 18
Simulation and Random Number Generation
All pseudo-random numbers are generated by four core random number generators,
The distribution of the pseudo-random numbers generated will determine which of these are used. For example,
Weibull pseudo-random numbers use rand. Normal pseudo-random numbers obviously call randn.
Creating Student’s t pseudo-random numbers requires calls to both randn and randg, and χ² uses only
randg.
All of the pseudo-random number generators share a common state (by default). The state is a large
vector which determines the next pseudo-random number. This state allows a sequence of random numbers
to be repeated by first saving the state and then restoring it. The state is saved using state = rng(),
where state is a structure containing information about the type of generator in use, the seed and the
actual state vector. The state can be restored using rng(state).
>> state = rng()
state =
Type: 'twister'
Seed: 0
State: [625x1 uint32]
>> randn
ans =
0.5376671395461
>> randn
ans =
1.83388501459509
>> rng(state)
>> randn
ans =
0.5376671395461
>> randn
ans =
1.83388501459509
These two sequences are the same since the state was restored to its previous value.
Warning: The state is reset to the same default every time MATLAB is initialized. As a result, all of the random number
generators will produce the same sequence when starting from a fresh MATLAB session. This default
state can be restored using rng(0).
The state of all random number generators is reset each time MATLAB is opened. Thus, two programs
drawing pseudo-random numbers on different computers, or in two instances on the same computer, will
produce identical sequences. To avoid this problem the state needs to be initialized to a “random” value. This can be
accomplished in recent versions of MATLAB by
rng('shuffle')
which uses the current time to act as a “random” input to generate the state. This will ensure that simulations
running in different MATLAB sessions will not use the same sequence of random numbers.
Warning: Do not over-initialize the pseudo-random number generators. The generators should be
initialized once per session and then allowed to produce the sequence beginning with the state set by
rng('shuffle'). Repeatedly re-initializing the pseudo-random number generators will produce a sequence
that is much less random than the generator was designed to provide.
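When a simulation needs to be exactly reproducible rather than different in every session, a fixed seed can be set once at the start of the program instead of using 'shuffle'. The seed 12345 below is an arbitrary value chosen for illustration.

```matlab
rng(12345);          % hypothetical seed, set once at the top of a program
a = randn(3, 1);
rng(12345);          % resetting to the same seed only to show repeatability
b = randn(3, 1);
sameDraws = isequal(a, b);
```

In an actual simulation the seed would be set only once, in line with the warning above about over-initializing the generators.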
MATLAB has substantially overhauled its random number generators over the past decade. Fine-grained
control of the random number generator is available using RandStream, which is a class that can be used
to initialize a random stream.¹ The random stream, in turn, does the actual generation of the pseudo-random
numbers. Recent versions of MATLAB support 6 core random number generators, each with
different properties. The default algorithm is known as mt19937ar, or the Mersenne Twister. It is a widely
used algorithm with good properties. However, other choices may work better when using MATLAB in
parallel.
1
MATLAB supports object-oriented programming (OOP). RandStream is an example of a class, one of the core components of
OOP. Understanding OOP is not necessary to be a proficient MATLAB programmer where the dominant programming paradigm
is imperative programming.
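A short sketch of RandStream in use is given below. The seed value and the choice of mrg32k3a as a generator with multiple independent streams are assumptions made for illustration, not recommendations from the text.

```matlab
% Create a stream that uses the Mersenne Twister with a fixed seed
s = RandStream('mt19937ar','Seed',1);
x = randn(s,3,1);      % draw from this stream only
% Make this stream the global stream used by rand, randn and randi
RandStream.setGlobalStream(s);
% mrg32k3a supports multiple independent streams, which can be
% useful when separate workers each need their own sequence
s1 = RandStream.create('mrg32k3a','NumStreams',2,'StreamIndices',1);
s2 = RandStream.create('mrg32k3a','NumStreams',2,'StreamIndices',2);
u1 = rand(s1,3,1);
u2 = rand(s2,3,1);
```

Passing the stream as the first argument to rand or randn leaves the global stream untouched, which makes it easier to keep separate parts of a simulation independent.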
Chapter 19
Optimization
The optimization toolbox contains a number of routines to find the extremum of a user-supplied objective
function. Most of these implement a form of the Newton-Raphson algorithm which uses the gradient to
find the minimum of a function.1
A custom function that returns the function value at a set of parameters – for example a log-likelihood
or a GMM quadratic form – is required to use one of the optimizers. All optimization targets must have
the parameters as the first argument. First, consider finding the minimum of x^2. A function which allows
the optimizer to work correctly has the form
function x2 = optim_target1(x)
x2=x^2;
When multiple parameters (a parameter vector) are used, the objective function must take the form
function obj = optim_target2(params)
x=params(1);
y=params(2);
obj= x^2-3*x+3+y*x-3*y+y^2;
Optimization targets can have additional inputs that are not parameters (such as data or hyper-parameters).
function obj = optim_target3(params,hyperparams)
x=params(1);
y=params(2);
c1=hyperparams(1);
c2=hyperparams(2);
c3=hyperparams(3);
obj= x^2+c1*x+c2+y*x+c3*y+y^2;
This form is useful when optimization targets require at least two inputs: parameters and data. Once an
optimization target has been specified, the next step is to use one of the optimizers to find the minimum.
1
MATLAB’s optimization routines only find minima. However, if f is a function to be maximized, −f is a function with
its minimum located at the same point as the maximum of f.
where fun is the optimization target, p0 is the vector of starting values, options is a user supplied opti-
mization options structure (see 19.5), and var1 , var2 , . . . are optional variables containing data or other
constant values. Typically, three outputs are requested, the parameters at the optimum (p), the function
value at the optimum (fval) and a flag to determine whether the optimization was successful (exitflag).
For example, suppose
x=params(1);
y=params(2);
c1=hyperparams(1);
c2=hyperparams(2);
c3=hyperparams(3);
obj= x^2+c1*x+c2+y*x+c3*y+y^2;
was our objective function and was saved as optim_target4.m. To minimize the function, call
which produces
>> [p,fval,exitflag]=fminunc('optim_target4',p0,options,hyper)
First-order
Iteration Func-count f(x) Step-size optimality
0 3 3 3
1 6 0 0.333333 1.49e-008
Optimization terminated: relative infinity-norm of gradient less than options.TolFun.
p =
1 1
fval =
0
exitflag =
1
fminunc has minimized this function and returns the optimum value of 0 at x = (1, 1). exitflag has the
value 1, indicating the optimization was successful. Values less than or equal to 0 indicate that the
optimization did not converge successfully.
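The setup that precedes the call is not shown above. A sketch that is consistent with the displayed output is given below, assuming starting values p0 = [0 0] and hyper = [-3 3 -3] (both inferred from the output rather than stated in the text) and using an anonymous function in place of optim_target4.m so that the block is self-contained.

```matlab
% Requires the Optimization Toolbox
hyper = [-3 3 -3];   % assumed values of c1, c2 and c3
p0 = [0 0];          % assumed starting values
% Same objective as optim_target4, written as an anonymous function
target = @(params,h) params(1)^2 + h(1)*params(1) + h(2) ...
    + params(2)*params(1) + h(3)*params(2) + params(2)^2;
options = optimset('fminunc');
options = optimset(options,'Display','iter');
[p,fval,exitflag] = fminunc(@(params) target(params,hyper),p0,options);
```

Wrapping the extra input in an anonymous function is an alternative to passing it after options; both approaches give the optimizer a function of the parameters alone.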
19.2 Unconstrained Derivative-free Optimization
fminsearch also performs unconstrained optimization but uses a derivative-free method called a simplex
search. fminsearch uses an “amoeba” to crawl around in the parameter space and will always move to
lower objective function values.
fminsearch has the same generic form as fminunc
[p,fval,exitflag]=fminsearch('fun',p0,options,var1,var2,...)
where fun is the optimization target, p0 is the vector of starting values, options is a user supplied opti-
mization options structure (see 19.5), and var1 , var2 , . . . are (optional) variables of data or other constant
values. Returning to the previous example but using fminsearch,
>> options = optimset('fminsearch');
>> options = optimset(options,'Display','iter');
>> [x,fval,exitflag]=fminsearch('optim_target4',[0 0],options,hyper)
Iteration Func-count min f(x) Procedure
0 1 3
1 3 2.99925 initial simplex
2 5 2.99775 expand
3 6 2.99775 reflect
4 8 2.99475 expand
...
...
...
57 107 8.93657e-009 contract inside
58 109 3.71526e-009 contract outside
59 111 1.99798e-009 contract inside
60 113 5.82712e-010 contract inside
Optimization terminated:
the current x satisfies the termination criteria using OPTIONS.TolX of 1.000000e-004
and F(X) satisfies the convergence criteria using OPTIONS.TolFun of 1.000000e-004
x =
1.0000 1.0000
fval =
5.8271e-010
exitflag =
1
fminsearch requires more iterations and many more function evaluations and in general should not be
used if fminunc works satisfactorily. However, for certain problems, such as when the objective is not
continuously differentiable, fminsearch may be the only option.
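As an illustration of the last point, the sketch below minimizes an objective with kinks, so that the gradient is not defined at the minimum. The objective and starting values are hypothetical and were chosen only for this example.

```matlab
% Objective with kinks at p = (1,-2); not differentiable there
target = @(p) abs(p(1)-1) + abs(p(2)+2);
p0 = [0 0];
options = optimset('fminsearch');
options = optimset(options,'Display','off');
[p,fval,exitflag] = fminsearch(target,p0,options);
disp([p fval exitflag])
```

Derivative-free searches can stall on difficult problems, so it is worth checking exitflag and trying alternative starting values.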
fminbnd performs minimization of single parameter problems over a bounded interval using a golden
section algorithm. The generic form is
[p,fval,exitflag]=fminbnd('fun',lb,ub,options,var1,var2,...)
where fun is the optimization target, lb and ub are the lower and upper bounds of the parameter, options
is a user supplied optimization options structure (see 19.5), and var1 , var2 , . . . are (optional) variables
x=params(1);
c1=hyperparams(1);
c2=hyperparams(2);
c3=hyperparams(3);
obj= c1*x^2+c2*x+c3;
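A minimal sketch of a call to fminbnd for a quadratic of this form is shown below. The coefficient values and the bounds are hypothetical, and an anonymous function is used in place of a function file so that the block is self-contained.

```matlab
hyper = [1 -2 3];    % hypothetical values of c1, c2 and c3
target = @(x,h) h(1)*x^2 + h(2)*x + h(3);
options = optimset('fminbnd');
[x,fval,exitflag] = fminbnd(@(x) target(x,hyper),-5,5,options);
% The analytical minimum is at x = -c2/(2*c1) = 1, where the
% function value is 2
disp([x fval exitflag])
```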
fmincon performs constrained optimizations using linear and/or nonlinear constraints. The constraints
can take the form of either equality or inequality expressions (or both). fmincon minimizes f (x ) subject
to any combination of
• A_EQ x = b_EQ
• A x ≤ b
• C_NEQ(x) = d_NEQ
• C(x) ≤ d
x1=x(1);
x2=x(2);
u=x1^(lambda)*x2^(1-lambda);
u=-u; % Must change max problem to min!!!
0 3 -0.4 -0.2
1 6 -0.529134 0 1 -0.106 0.129
2 9 -0.529134 0 1 -4.14e-025 2.01e-009
Optimization terminated: first-order optimality measure less
than options.TolFun and maximum constraint violation is less
than options.TolCon.
Active inequalities (to within options.TolCon = 1e-006):
lower upper ineqlin ineqnonlin
3
x =
0.3333
0.6667
fval =
-0.5291
exitflag =
1
x1=x(1);
x2=x(2);
p1=prices(1);
p2=prices(2);
cost = p1*x1+p2*x2;
Since this problem has a nonlinear constraint, it is necessary to specify a nlcon function,
function [C, Ceq] = compensated_utility(x,prices,lambda,Ubar)
x1=x(1);
x2=x(2);
u=x1^(lambda)*x2^(1-lambda);
Note: The constraint function and the optimization must take the same optional arguments in the same
order, even if the arguments are not required. The solution to this problem can be found using
>> options = optimset('fmincon');
0.3333
0.6666
fval =
0.9999
exitflag =
1
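Parts of this example are missing above, so the following is only a sketch of how an expenditure minimization problem with a nonlinear utility constraint could be set up. The prices, lambda, the target utility Ubar and the starting values are assumptions chosen to be consistent with the displayed output, and anonymous functions stand in for the function files.

```matlab
% Requires the Optimization Toolbox
prices = [1 1];      % assumed prices
lambda = 1/3;        % assumed preference parameter
% Assumed target utility, the utility of the bundle (1/3, 2/3)
Ubar = (1/3)^lambda*(2/3)^(1-lambda);
cost = @(x) prices(1)*x(1) + prices(2)*x(2);
% Nonlinear constraint in the form C(x) <= 0, Ceq(x) = 0:
% utility must be at least Ubar, and there is no equality constraint
nlcon = @(x) deal(Ubar - x(1)^lambda*x(2)^(1-lambda), []);
x0 = [0.5; 0.5];
LB = [0; 0];
options = optimset('fmincon');
options = optimset(options,'Display','off');
[x,fval,exitflag] = fmincon(cost,x0,[],[],[],[],LB,[],nlcon,options);
```

The deal function is used here only so that a single anonymous function can return both constraint outputs; a function file with two outputs, as in the text, does the same job.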
These two examples are problems where the answers can be analytically verified. In many cases it
is impossible to verify that the global optimum has been found if there are local minima. The standard
practice for addressing the possibility of local minima is to start the optimization from different starting
values and then to use the lowest fval. If the optimizer is working well on the specified problem, many of
the starting values should produce similar parameter estimates and fvals.
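The multiple-starting-value strategy can be sketched as follows. The objective, which has several local minima, and the grid of starting values are hypothetical and serve only to illustrate the loop.

```matlab
% Objective with several local minima (illustrative only)
target = @(x) sin(3*x) + 0.1*x.^2;
starts = linspace(-4,4,9);
options = optimset('fminsearch');
options = optimset(options,'Display','off');
pAll = zeros(size(starts));
fAll = zeros(size(starts));
for i=1:length(starts)
    [pAll(i),fAll(i)] = fminsearch(target,starts(i),options);
end
% Keep the solution with the lowest function value
[bestF,idx] = min(fAll);
bestP = pAll(idx);
disp([pAll' fAll'])
```

Inspecting the full set of solutions, rather than only the best one, shows how many distinct local minima the starting values reach.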
Note: Many aspects of constrained optimization (and optimization in general) are more black magic
than science. Worse, most techniques are problem class specific and so general rules are hard to derive.
optimset sets optimization options and has two distinct forms. The initial call to optimset should always
be of the form options = optimset('fmintype') which will return the default options for the selected
optimizer. Once the options structure has been initialized, individual options can be changed by calling
options = optimset(options,'option1',optionvalue1,'option2',optionvalue2,...)
For example, to set options for fmincon,
>> options = optimset('fmincon');
>> options = optimset(options,'MaxFunEvals',1000,'MaxIter',1000);
>> options = optimset(options,'TolFun',1e-3);
For help on the available options or their specific meaning, see doc optimset.
Chapter 20
Accessing the File System
MATLAB uses standard DOS (or Unix, depending on the platform) file system commands to change work-
ing directories. For instance, to change directory, type
cd c:\MyDirectory
on Windows or
cd ~/MyDirectory/
on Unix.
Other standard file navigation commands, such as dir and mkdir are also available. Alternatively, the
current directory can be changed by clicking the button with . . . next to the Current Directory box at the top
of the command window (see figure 1.1).
The file system can be accessed in MATLAB code. One common application of programmatic access to the
file system is to perform some action on every file in a particular directory, which can be done by looping
over the output of dir.
% Create some files
for i=1:3;
fid = fopen(['file_' num2str(i) '.demotxt'],'wt');
fprintf(fid,'Nothing to see');
fclose(fid);
end
The example code below gets a list of files that have the extension demotxt and then loops across the files,
first displaying the file name and then using type to print the contents of the file. This method is very
useful for processing multiple data files.
>> d = dir('*.demotxt')
d =
3x1 struct array with fields:
name
date
bytes
isdir
datenum
>> for i=1:length(d);
>> disp(d(i).name)
>> type(d(i).name)
>> end
file_1.demotxt
Nothing to see
file_2.demotxt
Nothing to see
file_3.demotxt
Nothing to see
MATLAB contains a full set of platform-independent commands to access the file system. The platform
independence is derived from the availability of utility functions such as filesep, which returns the
platform-specific file separator, and copyfile, which operates like copy on Windows and cp on Unix platforms.
cd
cd can be used to change the current directory. Both relative and absolute paths are supported. cd can be
used both with a space, as in cd c:\temp or as a function, as in cd(’c:\temp’). cd can also be used with
string variables containing the path, in which case the function version must be used.
% Absolute
cd('c:\temp\')
% Relative, up one then down in temp
cd('..\temp\')
% Relative, up two levels
cd('..\..')
% String input
targetDir = 'c:\temp';
cd(targetDir)
% Non-function version
cd c:\temp
dir
dir can be used to list the contents of a directory. It can be used without any arguments, with a wildcard
argument, or with a full path. When used without an output variable, the listing is printed to the screen.
Using an output returns a structure containing the contents of the directory. Like cd, dir can be used
either with a space or as a function.
% Wildcard
files = dir('*.mat')
% Path
files = dir('c:\temp')
mkdir and rmdir can be used to create and remove directories, respectively. Like cd, both commands can
be used with a space or as a function, although only the function version can be used with string inputs.
delete
delete can be used to delete files. It can be used with a single filename, or with a wildcard expression to
delete all matches. Like cd, delete can be used either with a space or as a function.
copyfile and movefile can be used to copy and move files, respectively. Both commands require two
inputs, the source and the destination. The source can include wildcards, in which case the destination
must be a directory.
fullfile
fullfile is a useful utility for building full paths including the filename and extensions.
% Build a full path from its parts
>> fileLoc = fullfile('c:','temp','data.mat')
fileLoc =
c:\temp\data.mat
fileparts
fileparts splits a full file name into its parts:
• Path
• File name
• Extension
filesep
filesep can be used to get the platform-specific file separator. It is useful for manually building full
paths, although using fullfile is often easier.
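The three path utilities can be combined as in the following sketch, which builds a relative path, splits it and then rebuilds it by hand. The directory and file names are hypothetical.

```matlab
% Build a platform-independent relative path
p = fullfile('data','raw','prices.mat');
% Split it into the path, the file name and the extension
[folder,name,ext] = fileparts(p);
% Rebuild the path manually using filesep
rebuilt = [folder filesep name ext];
disp(p)
disp(strcmp(p,rebuilt))
```

Because fullfile inserts the correct separator, the same code runs unchanged on Windows and Unix platforms.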
MATLAB can launch other programs using system (or dos on Windows). The basic structure is
system('command_to_run') (which can also be executed using the syntax !command_to_run). An optional
output can be used to capture any output from the command that would have appeared in the DOS window
or terminal.
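A minimal sketch of capturing output is shown below. The command echo is used only because it is available on both Windows and Unix; the first output of system is a status code, where 0 indicates success.

```matlab
% Run a command and capture both the status and the text output
[status,output] = system('echo hello');
if status == 0
    disp(strtrim(output))
else
    disp('The command failed')
end
```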
While this section sounds like a Buddhist rite of passage, the path contains an important set of locations.
The path determines where MATLAB searches for files when running programs. All of the MATLAB toolbox
directories are automatically on the path, but it may be necessary to add new directories to use custom or
a non-standard toolbox.
To see the current path, enter path in the command window. Alternatively, there is a GUI path browser
available under File>Set Path. . . . The path is sorted from the most important directory to least, with the
present working directory (what pwd returns in the command window) silently atop the list. The path
determines which files MATLAB will use when evaluating a function or running a batch file.
Suppose a custom function is accidentally titled mean. When mean is entered in the command window,
MATLAB will find all occurrences of mean on the path and rank them based on the order the files appear.
The highest ranked file will then be executed. Because of this, it is crucial that existing function names
are avoided when writing m-files. which function -all will show all files that match function (functions,
m-files and mat files), returning them in the order they appear on the path. This is useful for detecting
duplicate file names.
New directories can be appended to the path using addpath or File>Set Path. . .. The GUI tool can be
used to re-rank directories on the path. To save any changes, use the command savepath or click on Save
Path in the Path GUI.
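The path commands can be sketched as follows. The directory name is hypothetical and is created in the temporary directory only so that the example cleans up after itself.

```matlab
% Hypothetical directory used only for this illustration
myDir = fullfile(tempdir,'mytoolbox_demo');
if ~exist(myDir,'dir')
    mkdir(myDir);
end
% addpath places the directory at the top of the path by default
addpath(myDir);
% List every mean on the path, in the order they would be used
which mean -all
% Undo the changes
rmpath(myDir);
rmdir(myDir);
```

Changes made with addpath last only for the current session unless savepath is called.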
startup.m
When using MATLAB in a shared environment, the MATLAB path will generally be read-only – hence it
cannot be permanently changed. The “work-around” for this issue is to create a file named startup.m
in the directory where MATLAB initially opens. startup.m is a special file that is automatically executed
when MATLAB is started and can contain lines with the addpath command.
% Example startup.m
addpath('c:\temp');
addpath('c:\temp\mytoolbox');
% Change the directory to where I keep my work
cd('c:\users\kevin\Dropbox')
20.4 Exercises
3. Create a new file named tobedeleted.m using the editor in this new directory (it can be empty).
5. Add this directory to the path using either addpath or the Path GUI. Save the changes using either
savepath or the Path GUI.
6. Delete the newly created m-file, and then delete this directory from the command line.
7. Remove this folder from the path using either rmpath or the Path GUI.
Chapter 21
Performance and Code Optimization
The final step in writing code is to optimize the performance of the code, if needed. Code optimization
can produce large improvements in speed over a naïve (but correct) implementation. In some cases the
improvements can be 100 times or greater and the largest gains come from removing superfluous memory
allocations.
Warning: Be careful not to over-optimize code. Over-optimizing code can produce code that is unread-
able and difficult to debug. A good practice is to use a simple, possibly slow, implementation as a starting
point. The optimized version can be built from the known-good code and the output from the optimized
code can be compared to the known-correct version.
Recent versions of MATLAB (R2015b or later) generate Low Level Virtual Machine (LLVM) intermediate
representation, which is then compiled to machine code using LLVM (http://llvm.org/). This is a common
strategy used by a number of projects, including Clang (C/C++), Apple’s Swift, Julia and Python’s Numba.
This produces code that runs quickly and often has performance indistinguishable from code written in C
or Fortran and compiled using an optimizing compiler (e.g., GCC, MSVC or ICC/IFort). This feature is known
as the MATLAB Execution Engine. While traditional code optimizations are still useful, the performance
improvements of these optimizations are decidedly lower when using versions of MATLAB that include the
Execution Engine.
Displaying results to the screen is a relatively slow action and excess printing to screen can (substantially)
reduce performance. Use ; to suppress output.
Pre-allocating data and pre-generating random numbers in large blocks is the most basic optimization.
While recent MATLAB improvements have reduced the performance impact of not pre-allocating, pre-allocation
still allows some expensive memory allocation to be avoided in the core of the program. Similarly,
pre-generating random numbers allows function overhead to be avoided. To see the effects of pre-allocating,
consider the following code:
clear y
y = 0;
tic;
for i=2:100000;
y(i) = y(i-1) + randn;
end;
toc
clear y
y = zeros(100000,1);
tic;
for i=2:100000;
y(i) = y(i-1) + randn;
end;
toc
The second version with a pre-allocated y is about 2 times faster. To see the effects of pre-generating
random numbers, consider the following code:
M = 1000000 ;
y = zeros(M,1);
tic;
for i=2:M;
y(i) = y(i-1) + randn;
end;
toc
y = zeros(M,1);
e=randn(M,1);
tic;
for i=2:M;
y(i) = y(i-1) + e(i);
end;
toc
Pre-generating random numbers to avoid many tiny function calls produces a doubling in performance.
One of the key advantages to using an environment such as MATLAB is that end-users are not required to
manage memory. This abstraction comes at the cost of performance and memory allocation is slow. For
an example of the penalty, consider the two implementations of the following recursion
epsilon = randn(10000,1);
y = zeros(10000,1);
parameters = [.1 .5 -.2 .8 1];
tic
for t=3:10000
y(t) = parameters * [1 y(t-1) y(t-2) epsilon(t-1) epsilon(t)]';
end
toc
tic
for t=3:10000
y(t) = parameters(1);
for i=1:2
y(t) = y(t) + parameters(i+1)*y(t-i);
end
for i=0:1
y(t) = y(t) + parameters(5-i)*epsilon(t-i);
end
end
toc
The second implementation is about 10 times as fast because it avoids allocating memory inside the
loop. In the first implementation, [1 y(t-1) y(t-2) epsilon(t-1) epsilon(t)] requires a new, empty
5-element vector to be allocated in memory and then the 5 elements to be copied into this vector in every
iteration. The second implementation uses more loops but avoids costly memory allocation.
Vector and matrix operations are highly optimized and writing code in matrix-vector notation is faster
than looping. Consider the problem of computing
$$X'X = \sum_{n=1}^{N} x_n x_n'$$
tic
for n=1:N
op = op + X(n,:)'*X(n,:);
end
toc
tic
op_fast = X'*X;
toc
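The setup for this comparison is not shown above. A self-contained sketch, assuming X is an N by K matrix of standard normals with the hypothetical sizes N = 10000 and K = 10, is given below together with a check that both versions agree.

```matlab
N = 10000;           % assumed number of observations
K = 10;              % assumed number of columns
X = randn(N,K);
% Loop version: accumulate the outer product of each row
op = zeros(K);
tic
for n=1:N
    op = op + X(n,:)'*X(n,:);
end
toc
% Vectorized version
tic
op_fast = X'*X;
toc
% The two results differ only by floating point rounding
disp(max(max(abs(op - op_fast))))
```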
Many operations in MATLAB are amenable to vectorization, not just matrix algebra. For example, logical
operators can be used on entire vectors or matrices, and the result can then be used to select the relevant
data points. Consider the following example:
x = randn(10000,1);
tic
y = x(x<0); % Vectorized select
toc
tic
y = zeros(10000,1);
count = 0;
for i=1:10000;
if x(i)<0;
count = count + 1;
y(count) = x(i);
end
end
y = y(1:count);
toc
In this simple example, the vectorized code requires about half the time of the for-loop code.
Many optimization targets depend on parameters, data and functions of data. In most cases, the functions
of the data do not depend on the parameter values and so they can be pre-computed. For example, if the
optimization target is a likelihood target that depends on the square of the data (e.g. the Gaussian log-
likelihood), pre-computing the square of the data and passing it as one of the optional arguments avoids
needlessly re-computing these values every time the objective function is called.
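The idea can be sketched with a zero-mean Gaussian likelihood in which only the variance is estimated. The function and variable names are hypothetical, and fminbnd is used only because the problem has a single parameter.

```matlab
rng(0);
data = randn(1000,1);
% Pre-compute the squared data once, outside of the objective
data2 = data.^2;
% Negative log-likelihood that takes the pre-computed squares as input
negLogLik = @(sigma2,d2) 0.5*sum(log(2*pi) + log(sigma2) + d2/sigma2);
sigma2Hat = fminbnd(@(s) negLogLik(s,data2),1e-6,10);
% The estimate should match the analytical MLE, the mean of the squares
disp([sigma2Hat mean(data2)])
```

Each call to the objective then uses data2 directly rather than squaring the data again.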
21.8 Use M-Lint
The editor provides M-Lint guidance when available. This advice is almost always correct and should only
be ignored if known to be wrong.
21.9 timeit
The function timeit can be used to quickly time and compare alternative versions of a function. Consider
these two implementations of a dot product,
function dp = dot_1(x,y)
dp = x'*y;
and
function dp = dot_2(x,y)
dp = 0;
for i=1:length(x)
dp = dp + x(i) * y(i);
end
The execution time can be examined using timeit and an anonymous function,
>> x = randn(1000000, 1);
>> y = randn(1000000, 1);
>> timeit(@() dot_1(x,y))
ans =
0.0011
>> timeit(@() dot_2(x,y))
ans =
0.0119
which shows that the manual version is about 10 times slower than the version which uses the built-in
multiplication operator.
Running code through the profiler records every line executed and the time required to execute. This allows
hot-spots in code – code segments which require the most time – to be identified so that optimization can
be focused on the code that spends the most time running.
The profiler is run using
>> profile on
>> code_to_profile
>> profile report
>> profile off
The first command turns the profiler on. The second runs the code to be profiled. The third opens the
profile report viewer and the final command turns the profiler off. The file below uses concatenation,
which is slow. Profiling will highlight that virtually all of the computational effort is spent in the
inner line in the loop.
% file: code_to_profile.m
% This is an example of a file that does not use best practices
text = [];
for i = 1:200000
text = [text char(mod(i,26) + 65)];
end
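For comparison, a version of the same loop that avoids concatenation is sketched below, along with a fully vectorized alternative. The variable names txt and txtVec are chosen here and are not part of the original file.

```matlab
n = 200000;
% Pre-allocate the character array instead of growing it
txt = repmat(' ',1,n);
for i = 1:n
    txt(i) = char(mod(i,26) + 65);
end
% Vectorized alternative that produces the same result
txtVec = char(mod(1:n,26) + 65);
disp(isequal(txt,txtVec))
```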
Under normal circumstances, variables are not available in functions unless explicitly passed as inputs.
Moreover, even when passed, the value of a passed variable cannot be changed inside the function and
changes are discarded when the function returns (unless explicitly passed out). Global variables, on the
other hand, are available both in the base MATLAB workspace and in functions. They also can be accessed
and modified at any time. As a general rule, global variables should not be used. Using global variables
makes debugging more difficult and lowers long-run code maintainability.
Some scenarios where globals are useful include:
• Tracking intermediate values when optimizing a function. The diagnostics available from the op-
timizers are limited, and using a global will allow any value visible to the optimization target (e.g.
parameter values) to be saved.
• Avoiding memory allocation when the memory allocation is an important component of the total
run-time of the function.
Global variables are declared using the global keyword. global should be called prior to initializing a
variable.
>> x = 1;
>> whos x
Name Size Bytes Class Attributes
x 1x1 8 double
>> clear x
>> global x
>> x = 1;
>> whos x
Name Size Bytes Class Attributes
x 1x1 8 double global
Global variables can then be accessed inside a function using the global keyword in the function.
function print_global()
global x
disp(['The value of x is ' num2str(x)])
Calling the function prints the value of the global variable. Note that if x is not a global it will be initialized
as a global with an empty value.
>> print_global()
The value of x is 1
Finally, note that a global is only available after using the global keyword, and so the existence of a global
variable with a particular name does not prevent a variable with the same name from being used in
functions in the usual, local manner.
In general, when a function is called, a=f(b) and b=f(b) have the same performance since the output
must be allocated from memory. Some functions which operate element-by-element can be evaluated
“in-place” so that a=f(b) and b=f(b) are no longer the same. The reason for the difference is that when f
operates element-by-element, it can be directly applied to b without allocating a new array – but only if the
function output is also b (otherwise it would overwrite the values in b). Functions which support in-place
evaluation include exp and log. To see the memory gains to using in-place evaluation, it is necessary to
track the memory usage of MATLAB and use very large matrices (5000 by 5000 or larger). For example, the
memory usage of
>> x = randn(5000,5000);
>> y = exp(x); % First new memory allocated
>> y = exp(x); % New memory allocated again
>> x = exp(x); % No memory allocation
shows that even repeated calls to y=exp(x) require memory allocation while x=exp(x) does not. Note that
it is necessary to overwrite the contents of an array to use in-place operations and so they are only useful
in certain situations.
Chapter 22
Examples
These examples are all actual econometric problems chosen to demonstrate the use of MATLAB in an
end-to-end manner, from importing data to presenting estimates. A reasonable familiarity with the underlying
econometric models and methods is assumed so that the focus can be on the translation of mathematics
to MATLAB.
This example will highlight the steps needed to estimate the parameters of a GJR-GARCH(1,1,1) model
with a constant mean. The volatility dynamics in a GJR-GARCH model are given by
$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i r_{t-i}^2 + \sum_{j=1}^{o} \gamma_j r_{t-j}^2 I_{[r_{t-j}<0]} + \sum_{k=1}^{q} \beta_k \sigma_{t-k}^2.$$
Returns are assumed to be conditionally normal, $r_t | \mathcal{F}_{t-1} \sim N(\mu, \sigma_t^2)$, and parameters are estimated by
3. Compute standard errors using a “sandwich” covariance estimator (also known as the Bollerslev &
Wooldridge (n.d.) covariance estimator)
The first task is to write the log-likelihood function which can be used in an optimizer. The log-likelihood
function will compute the recursion and the log-likelihood. It will also, optionally, return the T by 1 vector
of individual log-likelihoods which are useful for numerically computing the scores.
The log-likelihood can be defined using the normal distribution,
$$\ln f(r_t | \mu, \sigma_t^2) = -\frac{1}{2}\left(\ln 2\pi + \ln \sigma_t^2 + \frac{(r_t - \mu)^2}{\sigma_t^2}\right),$$
mu = parameters(1);
omega = parameters(2);
alpha = parameters(3);
gamma = parameters(4);
beta = parameters(5);
T = size(data,1);
eps = data - mu;
% Data and sigma2 are T by 1 vectors
sigma2 = zeros(T,1);
% Must use a back cast to start the algorithm
sigma2(1) = backCast;
for t = 2:T
sigma2(t) = omega + alpha * eps(t-1)^2 ...
+ gamma * eps(t-1)^2 * (eps(t-1)<0) + beta * sigma2(t-1);
end
The function also returns the conditional variances in the third output since the fit variances are often of
interest in addition to the model parameters.
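Only part of the function appears above. A complete sketch is given below under stated assumptions: the signature is inferred from the later call to gjr_garch_likelihood, the function returns the negative log-likelihood so that it can be minimized, and the residual is named resid rather than eps to avoid shadowing a built-in. It is not necessarily the original implementation.

```matlab
function [ll, lls, sigma2] = gjr_garch_likelihood(parameters, data, backCast)
% Sketch of the negative log-likelihood of a GJR-GARCH(1,1,1) model
mu = parameters(1);
omega = parameters(2);
alpha = parameters(3);
gamma = parameters(4);
beta = parameters(5);
T = size(data,1);
resid = data - mu;
% Conditional variances, initialized using the back cast
sigma2 = zeros(T,1);
sigma2(1) = backCast;
for t = 2:T
    sigma2(t) = omega + alpha * resid(t-1)^2 ...
        + gamma * resid(t-1)^2 * (resid(t-1)<0) + beta * sigma2(t-1);
end
% Individual negative log-likelihoods (T by 1) and their sum
lls = 0.5*(log(2*pi) + log(sigma2) + resid.^2./sigma2);
ll = sum(lls);
end
```

Returning the T by 1 vector lls as the second output makes it straightforward to compute numerical scores observation by observation.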
It is necessary to discuss one other function before proceeding with the main block of code. The
asymptotic variance takes the “sandwich” form, which is commonly expressed as
$$J^{-1} I J^{-1}$$
where J is the expected Hessian and I is the covariance of the scores. Both are numerically computed.
The strategy for computing the Hessian is to use the definition that
$$J_{ij} \approx \frac{f(\theta + e_i h_i + e_j h_j) - f(\theta + e_i h_i) - f(\theta + e_j h_j) + f(\theta)}{h_i h_j}$$
where hi is a scalar “step size” and ei is a vector of 0s except for element i , which is 1. A 2-sided version of
this approximation, which takes both forward and backward steps and then averages, is below. For more
on numerical derivatives, see ?.
function H = hessian_2sided(fun, theta, varargin)
if size(theta,2)>size(theta,1)
theta = theta';
end
f = feval(fun,theta,varargin{:});
h = 1e-5 * abs(theta);
thetah = theta + h;
h = thetah - theta;
K = size(theta,1);
h = diag(h);
fp = zeros(K,1);
fm = zeros(K,1);
for i = 1:K
fp(i) = feval(fun, theta+h(:,i),varargin{:});
fm(i) = feval(fun, theta-h(:,i),varargin{:});
end
fpp = zeros(K);
fmm = zeros(K);
for i = 1:K
for j = i:K
fpp(i,j) = fun(theta + h(:,i) + h(:,j), varargin{:});
fpp(j,i) = fpp(i,j);
fmm(i,j) = fun(theta - h(:,i) - h(:,j), varargin{:});
fmm(j,i) = fmm(i,j);
end
end
hh = diag(h);
hh = hh*hh';
H = zeros(K);
for i=1:K
for j=i:K
H(i,j) = (fpp(i,j) - fp(i) - fp(j) + f+ f - fm(i) - fm(j) + fmm(i,j))/hh(i,j)/2;
H(j,i) = H(i,j);
end
end
Finally, the code that does the actual work can be written. The first block imports the data, flips it
so that the oldest observations are first, and computes 100 times returns. Scaling data can be useful to
improve optimizer performance since ideally estimated parameters should have similar magnitude (i.e.
ω ≈ .01 and α ≈ .05)
% Import data
FTSE = readtable('FTSE_1984_2012_clean.xlsx');
% Flip upside down
FTSE = flipud(FTSE);
% Compute returns
FTSE.Date = datetime(FTSE.Date,'ConvertFrom','excel');
FTSE.Ret = [nan; 100*diff(log(FTSE.Close))];
Good starting values are important. These are a good guess based on more than a decade of fitting models.
An alternative is to implement grid search and use the best (smallest) value from the grid.
% Starting values
startingVals = [nanmean(FTSE.Ret),nanvar(FTSE.Ret) * .01,.03, .09, .90];
Bounds are used in estimation to ensure that all parameters are ≥ 0, and to set sensible upper bounds in
the parameters. A constraint is placed on α, γ and β which is sufficient to ensure stationarity of the pro-
cess. This is not technically necessary, although it is helpful since it prevents the volatility from exploding
which produces numerical issues.
% Estimate parameters
LB = [-10*nanmean(FTSE.Ret) realmin 0 0 0];
UB = [10*nanmean(FTSE.Ret) 10*nanvar(FTSE.Ret) 1 2 1];
% Sum constraint
A = [0 0 1 0.5 1];
b = 1;
Next, a back cast is constructed to initialize the conditional variance process. This is an example of an
exponentially weighted moving average, only running backward in time.
T = size(FTSE.Ret,1);
w = .06*.94.^(0:T-2);
backCast= w*FTSE.Ret(2:end).^2;
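The same calculation can be tried on simulated data when the FTSE file is not available. In the sketch below the return series is synthetic and the variable r is hypothetical; the weights follow the same 0.06 and 0.94 pattern used above.

```matlab
rng(0);
r = randn(500,1);            % synthetic returns, for illustration only
T = size(r,1);
% Weights decline geometrically, so early observations matter most
w = .06*.94.^(0:T-1);
backCast = w*r.^2;
% The weights sum to 1 - .94^T, which is essentially 1 for large T
disp([sum(w) backCast var(r)])
```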
The options are then specified, and the main optimization routine can be called. The two options used set
the display to be iterative so that the function value at each iteration is displayed, and set the algorithm
to SQP (sequential quadratic programming) which is a good choice for many constrained problems.
options = optimset('fmincon');
options.Display = 'iter';
options.Algorithm = 'sqp';
estimates = fmincon(@gjr_garch_likelihood, startingVals, ...
A, b, [], [], LB, UB, [], options, FTSE.Ret(2:end), backCast);
The optimized log-likelihood and the time series of variances are computed by calling the objective using
the parameters found by the optimizer.
[loglik, logliks, sigma2] = gjr_garch_likelihood(estimates, FTSE.Ret(2:end), backCast);
Next, the numerical scores and the covariance of the scores are computed. These exploit the definition of
a derivative, so that for a scalar function,
$$\frac{\partial f(\theta)}{\partial \theta_i} \approx \frac{f(\theta + e_i h_i) - f(\theta)}{h_i}.$$
The covariance is computed as the outer product of the scores since the scores should have mean 0 when
evaluated at the solution to the optimization problem.
% Covariance
step = 1e-5 * estimates;
scores = zeros(T-1,5);
for i=1:5
h = step(i);
delta = zeros(1,5);
delta(i) = h;
The final block of the numerical code calls hessian_2sided to estimate the Hessian and finally computes
the asymptotic covariance.
% Hessian
J = hessian_2sided(@gjr_garch_likelihood, estimates, FTSE.Ret(2:end), backCast);
J = J/T;
Jinv = J\eye(length(J));
vcv = Jinv*I*Jinv/T;
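Since the score loop above is incomplete, the full sequence of numerical scores, covariance of the scores, Hessian and sandwich covariance is sketched below on a toy problem. A Gaussian model with unknown mean and variance stands in for the GJR-GARCH likelihood, the Hessian uses the one-sided approximation rather than hessian_2sided, and all names and step sizes are assumptions made for this illustration.

```matlab
rng(0);
x = 1 + 2*randn(1000,1);
T = numel(x);
theta = [mean(x), var(x,1)];      % analytical MLE of mean and variance
% T by 1 vector of individual negative log-likelihoods
lls = @(p) 0.5*(log(2*pi) + log(p(2)) + (x-p(1)).^2/p(2));
K = numel(theta);
% Numerical scores using forward differences
step = 1e-5*abs(theta);
scores = zeros(T,K);
base = lls(theta);
for i=1:K
    delta = zeros(1,K);
    delta(i) = step(i);
    scores(:,i) = (lls(theta+delta) - base)/step(i);
end
% Covariance of the scores (outer product)
I = scores'*scores/T;
% Hessian of the average negative log-likelihood
f = @(p) mean(lls(p));
h = 1e-4*abs(theta);
J = zeros(K);
for i=1:K
    for j=1:K
        ei = zeros(1,K); ei(i) = h(i);
        ej = zeros(1,K); ej(j) = h(j);
        J(i,j) = (f(theta+ei+ej) - f(theta+ei) - f(theta+ej) + f(theta))/(h(i)*h(j));
    end
end
% Sandwich covariance and standard errors
vcv = (J\I/J)/T;
se = sqrt(diag(vcv));
disp([theta' se])
```

In this toy model the standard error of the mean should be close to the familiar square root of the variance divided by T, which provides a simple check on the calculation.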
The remaining steps are to pretty print the results and to produce a plot of the conditional variances,
% Pretty print parameters, standard error and t-stat
output = [estimates', sqrt(diag(vcv)), estimates'./sqrt(diag(vcv))];
disp(' Parameter Estimate Std. Err. T-stat')
param = {'mu','omega','alpha','gamma','beta'};
for i = 1:length(estimates)
fprintf('%10s %10.3f %13.3f %11.3f \n',param{i},output(i,1),output(i,2),output(i,3));
end
This final code block produces a plot of the annualized conditional standard deviations.
% Produce a plot
plot(FTSE.Date(2:end),sqrt(252*sigma2));
axis tight;
ylabel('Volatility')
title('FTSE Volatility (GJR GARCH(1,1,1))')
This example highlights how to implement a Fama-MacBeth 2-stage regression to estimate factor risk
premia, make inference on the risk premia, and test whether a linear factor model can explain a cross-
section of portfolio returns. This example closely follows Cochrane (n.d.) (See also Jagannathan et al.
(n.d.)).
First, the data are imported. I formatted the data downloaded from Ken French’s website into an easy-
to-import CSV which can be read by readtable. The data in the table is split into different variables (as
arrays), and the dimensions are determined using size.
% Import data
data = readtable('famafrench.csv');
% Split using slices
dates = data.date;
factors = data{:,{'VWMe','SMB','HML'}};
riskfree = data{:,'RF'};
portfolios = data{:,6:end};
% Shape information
[T,K] = size(factors);
[T,N] = size(portfolios);
% Compute excess returns
excessReturns = bsxfun(@minus,portfolios,riskfree);
1. Compute the time-series βs. This is done by regressing the full array of excess returns on the factors
(augmented with a constant) using \.
2. Compute the risk premia using a cross-sectional regression of average excess returns on the estimated
βs. This is a standard regression where the step-1 β estimates are used as regressors, and the
dependent variable is the average excess return.
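The two steps above can be sketched as follows. This is a minimal illustration on simulated data rather than the Fama-French data set. The simulated factors, trueBeta and noise level are assumptions, and the variable names only mirror those used elsewhere in this example.

```matlab
% Simulated factors and excess returns (illustrative only)
rng(1);
T = 600; K = 3; N = 10;
factors = 0.5 + randn(T,K);
trueBeta = 0.5 + rand(K,N);
excessReturns = factors*trueBeta + 0.05*randn(T,N);

% Step 1: time-series regressions of every portfolio on a constant and the factors
X = [ones(T,1) factors];
alphaBeta = X \ excessReturns;       % (K+1) by N
alpha = alphaBeta(1,:)';             % N by 1 intercepts
beta = alphaBeta(2:end,:);           % K by N factor loadings

% Step 2: cross-sectional regression of average excess returns on the betas
avgExcessReturns = mean(excessReturns)';   % N by 1
lam = beta' \ avgExcessReturns;            % K by 1 risk premia
```

In this simulation expected returns are exactly linear in the loadings, so lam should be close to the sample means of the factors.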
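The asymptotic variance requires computing the covariance of the demeaned returns and the weighted
pricing errors. The problem is formulated as a 2-step GMM estimation where the moment conditions are
\[
g_t(\theta) = \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{1t} f_t \\ \epsilon_{2t} \\ \epsilon_{2t} f_t \\ \vdots \\ \epsilon_{Nt} \\ \epsilon_{Nt} f_t \\ \beta u_t \end{bmatrix}
\]
where $\epsilon_{it}$ is the residual from the time-series regression for portfolio $i$ and $u_t$ is the vector of pricing errors. In
order to make inference on this problem, the derivative of the moments with respect to the parameters,
$\partial g_t(\theta)/\partial\theta'$, is needed. With some work, the estimator of this matrix can be seen to be
\[
G = E\left[\frac{\partial g_t(\theta)}{\partial \theta'}\right] = \begin{bmatrix} -I_N \otimes \Sigma_X & 0 \\ G_{21} & -\beta\beta' \end{bmatrix},
\]
where $X_t = [1 \; f_t']'$ and $\Sigma_X = E[X_t X_t']$. $G_{21}$ is a matrix with the structure
\[
G_{21} = \begin{bmatrix} G_{21,1} & G_{21,2} & \ldots & G_{21,N} \end{bmatrix}
\]
where
\[
G_{21,i} = \begin{bmatrix} 0_{K,1} & \mathrm{diag}\left(E[u_i] - \beta_i \lambda\right) \end{bmatrix}
\]
and where $E[u_i]$ is the expected pricing error. In estimation, all expectations are replaced with their sam-
ple analogues.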
% Moment conditions
p = alphaBeta;
epsilon = excessReturns - X*p;
moments1 = kron(epsilon,ones(1,K+1));
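A small sketch of how kron and repmat can be combined to form the products of every residual with every regressor, which is the pattern used for the first block of moment conditions. The random X and epsilon below are placeholders (assumptions for illustration), not the regression output.

```matlab
% Placeholder regressors and residuals (illustrative only)
rng(2);
T = 5; K = 2; N = 3;
X = [ones(T,1) randn(T,K)];      % T by (K+1)
epsilon = randn(T,N);            % T by N

% Repeat each residual column K+1 times, tile the regressors N times,
% then multiply element-by-element
moments1 = kron(epsilon, ones(1,K+1)) .* repmat(X,1,N);

% Column (i-1)*(K+1)+j holds epsilon(:,i).*X(:,j)
i = 2; j = 3;
col = (i-1)*(K+1) + j;
check = max(abs(moments1(:,col) - epsilon(:,i).*X(:,j)));
```

The result has N*(K+1) columns, ordered portfolio by portfolio.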
vcv = inv(G')*S*inv(G)/T;
The J test examines whether the average pricing errors, $\hat{\alpha}$, are zero. The J statistic has an asymptotic
$\chi^2_N$ distribution, and the model is badly rejected.
vcvAlpha = vcv(1:4:N*K+N,1:4:N*K+N);
J = alpha'*inv(vcvAlpha)*alpha;
Jpval = 1 - chi2cdf(J,25);
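As an aside, the same p-value can be computed without the Statistics Toolbox, since the $\chi^2$ CDF is a regularized incomplete gamma function. The sketch below uses made-up values for alpha and vcvAlpha (assumptions for illustration, not the estimates from this example) and solves the linear system with \ instead of forming the inverse.

```matlab
% Hypothetical pricing errors and their covariance (illustrative values only)
alpha = [0.10; -0.20];
vcvAlpha = [0.02 0.00; 0.00 0.04];
N = length(alpha);

% J statistic alpha'*inv(vcvAlpha)*alpha, computed with a linear solve
J = alpha' * (vcvAlpha \ alpha);

% Upper-tail chi-square probability with N degrees of freedom.
% gammainc(x,a,'upper') is the regularized upper incomplete gamma function,
% which equals 1 - chi2cdf(J,N).
Jpval = gammainc(J/2, N/2, 'upper');
```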
The next block formats the output to present all of the results in a readable manner. In particular, fprintf
is used to print the estimated parameters to screen.
riskPremia = lam;
vcvLam = vcv(N*K+N+1:end,N*K+N+1:end);
annualizedRP = 12*riskPremia;
arpSE = sqrt(12*diag(vcvLam));
fprintf(' Annualized Risk Premia\n')
fprintf(' Market SMB HML\n')
fprintf('--------------------------------------\n')
fprintf('Premia %0.4f %0.4f %0.4f\n',annualizedRP)
fprintf('Std. Err. %0.4f %0.4f %0.4f\n',arpSE)
fprintf('\n\n')
fprintf('J-test: %0.4f\n',J)
fprintf('P-value: %0.4f\n\n\n',Jpval)
i=1;
betaVar = zeros(25,4);
for j=1:5
    for k=1:5
        a = alpha(i);
        b = beta(i,:);
        offset = (K+1)*(i-1)+1:(K+1)*(i);
        variances = diag(vcv(offset,offset))';
        % Lazy concatenation
        betaVar(i,:) = variances;
        s = sqrt(variances);
        c = [a b];
        t = c./s;
        fprintf('Size: %d, Value:%d Alpha Beta(VWM) Beta(SMB) Beta(HML)\n',j,k)
        fprintf('Coefficients: %10.4f %10.4f %10.4f %10.4f\n',c);
        fprintf('Std Err. %10.4f %10.4f %10.4f %10.4f\n',s);
        fprintf('T-stat %10.4f %10.4f %10.4f %10.4f\n\n',t);
        i = i + 1;
    end
end
22.3 Estimating the Risk Premia using GMM
The final numeric example estimates the same problem, only using GMM rather than 2-stage regression.
The GMM objective takes the parameters, portfolio returns, factor returns and the weighting matrix and
computes the moments, average moments and the objective value. The moments used can be described
as
\[
\left(r_{it} - \beta_i f_t\right) f_t \quad \forall i = 1, \ldots, N
\]
and
\[
r_{it} - \beta_i \lambda \quad \forall i = 1, \ldots, N.
\]
% Signature inferred from the calls to gmm_objective later in this example
function [J,moments] = gmm_objective(params,pRets,fRets,Winv)
N = size(pRets,2);
[T,K] = size(fRets);
beta = params(1:N*K);
lam = params(N*K+1:end);
beta = reshape(beta,N,K);
lam = reshape(lam,K,1);
betalam = beta*lam;
expectedRet = fRets*beta';
e = pRets - expectedRet;
instr = repmat(fRets,1,N);
moments1 = kron(e,ones(1,K));
moments1 = moments1 .* instr;
moments2 = bsxfun(@minus,pRets,betalam');
moments = [moments1 moments2];
avgMoment = mean(moments);
J = T * avgMoment*Winv*avgMoment';
The final function needed is the Jacobian of the moment conditions. Mathematically it is simple to
express the Jacobian using ⊗ (Kronecker product). This code is so literal that it is simple to reverse engineer
the mathematical formulas used to implement this estimator.
\[
\hat{G} = \begin{bmatrix} I_N \otimes \Sigma_F & 0 \\ I_N \otimes -\lambda & -\beta \end{bmatrix}
\]
% Signature inferred from the call to gmm_G later in this example
function G = gmm_G(params,pRets,fRets)
N = size(pRets,2);
[T,K] = size(fRets);
beta = params(1:N*K);
lam = params(N*K+1:end);
beta = reshape(beta,N,K);
lam = reshape(lam,K,1);
G = zeros(N*K+K,N*K+N);
ffp = fRets'*fRets/T;
G(1:N*K,1:N*K)=kron(eye(N),ffp);
G(1:(N*K),(N*K)+1:end) = kron(eye(N),-lam);
G((N*K)+1:end,(N*K)+1:end) = -beta';
The data import step is virtually identical to that in the previous example – although it shows some
alternative functions to accomplish the same tasks. Note that only portfolios in odd-numbered columns
are selected in order to speed up the GMM optimization.
data = csvread('famafrench.csv',1);
dates = data(:,1);
factors = data(:,2:4);
riskfree = data(:,5);
portfolios = data(:,6:end);
N = size(portfolios,2);
portfolios = portfolios(:,1:2:N);
[T,N] = size(portfolios);
excessRet = bsxfun(@minus,portfolios,riskfree);
K = size(factors,2);
Starting values are important in any optimization problem. The GMM problem is closely related to
Fama-MacBeth regression, and so it is sensible to use the output from an FMB regression.
augFactors = [ones(T,1) factors];
alphaBeta = augFactors\excessRet;
betas = alphaBeta(2:4,:);
avgReturn = mean(excessRet)';
riskPremia = betas'\avgReturn;
The GMM objective can be minimized using an identity matrix as the covariance of the moment con-
ditions along with the starting values computed using a Fama-MacBeth regression.
startingVals = [betas(:);riskPremia];
Winv = eye(N*(K+1));
options = optimset('fminunc');
options.Display = 'iter';
options.LargeScale = 'off';
step1opt = fminunc(@gmm_objective,startingVals,options,excessRet,factors,Winv);
Once the initial estimates have been computed, these can be used to estimate the covariance of the
moment conditions, which is then used to estimate the optimal weighting matrix.
[J,moments]= gmm_objective(step1opt, excessRet, factors, Winv);
S = cov(moments);
Winv2 = inv(S);
options.MaxFunEvals = 10000;
step2opt = fminunc(@gmm_objective,step1opt,options,excessRet,factors,Winv2);
The final block computes the asymptotic covariance of the parameters using the usual effi-
cient GMM covariance estimator, assuming that the moments are a martingale.
[J,moments] = gmm_objective(step2opt, excessRet, factors, Winv2);
G = gmm_G(step2opt, excessRet, factors);
S = cov(moments);
vcv = inv(G*inv(S)*G')/T;
22.4 Outputting LaTeX
Automatically outputting results to LaTeX or another format can eliminate export errors and avoid tedious
work. This example shows how two of the tables in the previous Fama-MacBeth example can be exported
to a LaTeX document, and how, if desired, the document can be compiled to a PDF. The first code block con-
tains code to clear the workspace, clear the window (clc) and to set a flag indicating whether the MATLAB
code should compile the LaTeX file.
clear all
clc
fclose('all');
% Flag to compile output tables
compileLatex = true;
The next code block loads the mat file created using the output from the Fama-MacBeth example.
% Load variables
load('Fama-MacBeth_results.mat')
The document will be stored in a cell array. The first few lines contain the required header for a LaTeX
document, including some packages used to improve table display and to select a custom font. Most of
this code uses lazy concatenation, that is, appending lines to an existing variable. While this is generally
a bad practice from a performance perspective, concatenation is a useful technique in situations where
performance is not important. The concatenation in the cell array is implemented using latex{end+1}
which tells MATLAB to place the new information one position after the last element.
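A minimal sketch of the end+1 idiom for building a cell array of lines. The three lines appended here are placeholders chosen for illustration and are not the header used in this example.

```matlab
% Start from an empty cell array and append one line at a time
latex = {};
latex{end+1} = '\documentclass{article}';
latex{end+1} = '\begin{document}';
latex{end+1} = '\end{document}';

% end+1 always refers to the position just past the current last element
nLines = numel(latex);
```

Each assignment grows the array by one element, which is convenient but reallocates memory, so it is best reserved for short lists such as this one.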
Table 1 will be stored in its own cell array, and then concatenated onto the main LaTeX code. Building this
table requires string manipulation using num2str and sprintf.
% Table 1
table1 = cell(2,1);
table1{1} = '\begin{center}';
table1{2} = '\begin{tabular}{lrrr} \toprule';
% Header
colNames = {'VWM$^e$','SMB','HML'};
header = '';
for cName=colNames
    header = [header ' & ' cName{:}];
end
% Store the header as the third entry so that table 2 can reuse table1(1:3)
table1{end+1} = [header ' \\'];
% Main row
row = '';
for i=1:length(annualizedRP)
    row =[row ' & $\underset{{(' num2str(arpSE(i),'%0.3f') ...
        ')}}{{' num2str(annualizedRP(i),'%0.3f') '}}$'];
end
table1{end+1}=row;
% Blank row
row = '\\';
table1{end+1} = row;
% J-stat row
row = sprintf('J-stat: $\\underset{{(%0.3f)}}{{%0.1f}}$ \\\\',Jpval,J);
table1{end+1}=row;
Table 2 is more complex and uses loops to iterate over the rows of the arrays containing the βs and their
standard errors.
% Format information for table 2
sizes = {'S','2','3','4','B'};
values = {'L','2','3','4','H'};
% Table 2 has the same header as table 1, copy with a slice
table2 = table1(1:3);
m = 1;
for i=1:5
    for j=1:5
        row = sprintf('Size: %s, Value: %s',sizes{i},values{j});
        b = beta(m,:);
        s = sqrt(betaVar(m,2:end));
        for k=1:length(b)
            row = [row sprintf(' & $\\underset{{(%0.3f)}}{{%1.3f}}$',s(k),b(k))];
        end
        row = [row ' \\ '];
        table2{end+1}=row;
        m = m + 1;
        if j==5 && i~=5
            table2{end+1}= '\cmidrule{2-4}';
        end
    end
end
The penultimate block finishes the document, and uses fprintf to write the lines to the LaTeX file.
fprintf does not break lines, so the new line character (\n) is added to each. Note that fprintf treats text
starting with a backslash (\) in its format string as an escape sequence, and so it is necessary to escape the
backslashes in the LaTeX. This means that \ in the output LaTeX must be \\ prior to being written.
% Finish document
latex{end+1}= '\end{document}';
% Write to file
fid = fopen('latex.tex','wt');
for i=1:length(latex)
    temp = latex{i};
    % Escape slashes if needed
    slashes = strfind(temp,'\');
    if ~isempty(slashes)
        % Add a second row holding an extra \ under every \, then read the
        % array column-by-column and drop the char(0) padding
        temp = [temp; repmat(char(0),1,length(temp))];
        temp(2,slashes) = '\';
        temp = temp(temp~=char(0))';
    end
    fprintf(fid,[temp '\n']);
end
fclose(fid);
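A hedged alternative sketch: the escaping can be done with strrep, or avoided altogether by passing the line as an argument to a %s format, since only the format string is scanned for escape sequences. The variable texLine is a placeholder, and sprintf is used here in place of fprintf so that nothing is written to disk.

```matlab
% A line containing LaTeX backslashes (placeholder text)
texLine = '\begin{tabular}{lrrr} \toprule';

% Option 1: double every backslash, then use the text as the format string
escaped = strrep(texLine, '\', '\\');
viaFormat = sprintf([escaped '\n']);

% Option 2: keep the text out of the format string entirely
viaArgument = sprintf('%s\n', texLine);

% Both approaches produce the same characters
same = strcmp(viaFormat, viaArgument);
```

The second form is also safe when a line contains a percent sign, which would otherwise need to be doubled in a format string.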
Finally, if the flag is set, system is used to compile the LATEX. This assumes that pdflatex is on the system
path.
% Compile if needed
if compileLatex
    exitStatus = system('pdflatex latex.tex');
end
Chapter 23
Parallel MATLAB
To be completed
Chapter 24
Quick Function Reference
This list contains a brief summary of the functions most useful in the MFE course. It only scratches the sur-
face of what MATLAB offers. There are approximately 100 functions listed here; MATLAB and the Statistics
Toolbox combined contain more than 1400.
abs
Returns the absolute value of the elements of a vector or matrix. If used on complex data, returns the
complex modulus.
diff
Returns the difference between adjacent elements of a vector. If the original vector has length T, the
vector returned has length T − 1. If used on a matrix, returns a matrix of differences of each column. The
matrix returned has one less row than the original matrix.
exp
log
Returns the natural logarithm of the elements of a vector or matrix. Returns complex values for negative
elements.
log10
Returns the logarithm base 10 of the elements of a vector or matrix. Returns complex values for negative
elements.
max
Returns the maximum of a vector. If used on a matrix, returns a row vector containing the maximum of
each column.
mean
Returns the arithmetic mean of a vector. If used on a matrix, returns a row vector containing the mean of
each column.
min
Returns the minimum of a vector. If used on a matrix, returns a row vector containing the minimum of
each column.
mod
Returns the remainder of a division operation where the elements of a vector or matrix are divided by a
scalar or conformable vector or matrix.
roots
sqrt
sign
Returns the sign, defined as x /| x | and 0 if x = 0, of the elements of a vector or matrix. Operates element-
by-element on vectors or matrices.
sum
Returns the sum of the elements of a vector. If used on a matrix, operates column-by-column.
24.2 Rounding
ceil
floor
round
24.3 Statistics
Computes the correlation of a matrix. If a matrix x is N by M , returns the M by M correlation treating the
columns of x as realizations from separate random variables.
cov
kurtosis
Computes the kurtosis of a vector. If used on a matrix, a row vector containing the kurtosis of each column
is returned.
median
Returns the median of a vector. If used on a matrix, a row vector containing the median of each column
is returned.
prctile
Computes the percentiles of a vector. If used on a matrix, a row vector containing the percentiles of each
column is returned.
regress
Estimates a classic linear regression. Does not compute White heteroskedasticity-robust standard errors.
quantile
Computes the quantiles of a vector. If used on a matrix, a row vector containing the quantiles of each
column is returned.
skewness
Computes the skewness of a vector. If used on a matrix, a row vector containing the skewness of each
column is returned.
std
Computes the standard deviation of a vector. If used on a matrix, a row vector containing the standard
deviation of each column is returned.
var
Computes the variance of a vector. If used on a matrix, a row vector containing the variance of each col-
umn is returned.
DISTcdf
Returns the cumulative distribution function values for a given DIST, where DIST takes one of many
forms such as t (tcdf), norm (normcdf), or gam (gamcdf). Inputs vary by distribution.
DISTinv
Returns the inverse cumulative distribution value for a given DIST, where DIST takes one of many forms
such as t (tinv), norm (norminv), or gam (gaminv). Inputs vary by distribution.
DISTpdf
Returns the probability density function values for a given DIST, where DIST takes one of many forms
such as t (tpdf), norm (normpdf), or gam (gampdf). Inputs vary by distribution.
DISTrnd
Produces pseudo-random numbers for a given DIST, where DIST takes one of many forms such as t
(trnd), norm (normrnd), or gam (gamrnd). Inputs vary by distribution.
Note: DIST functions are available for the following distributions: Beta, Binomial, $\chi^2$, Exponential, Ex-
treme Value, F, Gamma, Generalized Extreme Value, Generalized Pareto, Geometric, Hypergeometric,
Lognormal, Negative Binomial, Noncentral F, Noncentral t, Noncentral $\chi^2$, Normal, Poisson, Rayleigh,
t, Uniform, Discrete Uniform, Weibull.
rand
Uniform pseudo-random number generator. One of three core random number generators that are used
to produce pseudo-random numbers from other distributions.
randg
Standard gamma pseudo-random number generator. One of three core random number generators that
are used to produce pseudo-random numbers from other distributions.
randn
Standard normal pseudo-random number generator. One of three core random number generators that
are used to produce pseudo-random numbers from other distributions.
random
Generic pseudo-random number generator. Can generate random numbers for the following distribu-
tions:
Beta, Binomial, $\chi^2$, Exponential, Extreme Value, F, Gamma, Generalized Extreme Value, Generalized
Pareto, Geometric, Hypergeometric, Lognormal, Negative Binomial, Noncentral F, Noncentral t, Non-
central $\chi^2$, Normal, Poisson, Rayleigh, t, Uniform, Discrete Uniform, Weibull.
24.5 Logical
all
Returns logical true (1) if all elements of a vector are logical true. If used on a matrix, returns a row vector
containing logical true if all elements of each column are logical true.
any
Returns logical true (1) if any elements of a vector are logical true. If used on a matrix, returns a row vector
containing logical true if any elements of each column are logical true.
find
Returns the indices of the elements of a vector or matrix which satisfy a logical condition.
ischar
isfinite
Returns logical true if the argument is finite. Operates element-by-element on vectors or matrices.
isinf
Returns logical true if the argument is infinite. Operates element-by-element on vectors or matrices.
isnan
Returns logical true if the argument is not a number (NaN). Operates element-by-element on vectors or
matrices.
isreal
logical
ans
ans is a special variable that contains the value of the last unassigned operation.
eps
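eps is the numerical precision of MATLAB: the distance from 1.0 to the next larger double-precision number. Numbers whose relative difference is less than eps/2 cannot be distinguished in double precision.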
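A short illustration of what this means in practice (the variable names are arbitrary):

```matlab
% Adding eps to 1 produces the next representable number
a = 1 + eps;
isDistinct = (a ~= 1);

% Adding half of eps is lost to rounding
b = 1 + eps/2;
isSame = (b == 1);

% The spacing between doubles grows with magnitude; eps(x) returns it at x
spacingAt1000 = eps(1000);
```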
Inf
NaN
NaN represents not-a-number. It occurs as a result of performing an operation which produces an indefi-
nite result, such as Inf/Inf.
pi
eye
linspace
logspace
z=logspace(L,U,N) returns a 1 by N vector of points logarithmically spaced between $10^L$ and $10^U$ (in-
clusive).
ones
toeplitz
zeros
24.8 Vector and Matrix Functions
chol
det
diag
Returns the elements along the diagonal of a square matrix. If the input to diag is a vector, returns a matrix
with the elements of the vector along the diagonal.
eig
inv
kron
trace
tril
triu
cumprod
Computes the cumulative product of a vector. y = cumprod(x) computes $y_i = \prod_{j=1}^{i} x_j$. If used on a
matrix, operates column-by-column.
cumsum
Computes the cumulative sum of a vector. y = cumsum(x) computes $y_i = \sum_{j=1}^{i} x_j$. If used on a matrix,
operates column-by-column.
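A brief sketch of both functions on a small vector (the input values are arbitrary):

```matlab
x = [1 2 3 4];

% Running totals: each element is the sum of all elements up to that position
cs = cumsum(x);

% Running products: each element is the product of all elements up to that position
cp = cumprod(x);

% On a matrix both functions work down each column
M = [1 2; 3 4];
csM = cumsum(M);
```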
cat
Concatenates two matrices along some dimension. If x and y are conformable matrices, cat(1,x,y) is the
same as [x; y] and cat(2,x,y) is the same as [x y].
length
numel
Returns the number of elements in a matrix. If the matrix is 2D with dimensions N and M , numel returns
NM.
repmat
reshape
Reshapes a matrix to have a different size. The product of the dimensions must be the same before and
after, hence the number of elements cannot change.
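A small sketch of reshape, which fills the new array column by column (the input is arbitrary):

```matlab
x = 1:6;

% 2 by 3: elements are placed down the first column, then the second, and so on
A = reshape(x,2,3);

% An empty dimension lets MATLAB infer the size from the number of elements
B = reshape(x,3,[]);
```

This column-major ordering matters whenever a parameter vector is reshaped into a matrix, as in the GMM example in chapter 22.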
size
Returns the dimension of a matrix. Dimension 1 is the number of rows and dimension 2 is the number of
columns.
24.10 Set Functions
intersect
Returns the intersection of two vectors. Can be used with optional ’rows’ argument and same-sized ma-
trices to produce an intersection of the rows of the two matrices.
setdiff
Returns the difference between the elements of two vectors. Can be used with optional ’rows’ argument
and same-sized matrices to produce a matrix containing difference of the rows of the two matrices.
sort
Produces a sorted vector from smallest to largest. If used on a matrix, operates column-by-column.
sortrows
Sorts the rows of a matrix using lexicographic ordering (similar to alphabetizing words).
union
Returns the union of two vectors. Can be used with optional 'rows' argument and same-sized matrices
to produce a union of the rows of the two matrices.
unique
Returns the unique elements of a vector. Can be used with optional ’rows’ argument on a matrix to select
the set of unique rows.
case
Command which can be evaluated to logical true or false in a switch. . . case. . . otherwise flow control
block.
else
Command that is the default in if . . . elseif. . . else flow control blocks. If none of the if or elseif state-
ments evaluate to logical true, the else path is followed.
elseif
Command that is used to continue a if . . . elseif. . . else flow control block. Should be immediately
followed by a statement that can be evaluated to logical true or false.
end
Command indicating the end of a flow control block. Both if . . . elseif. . . else and switch . . . case. . .
otherwise must be terminated with an end. Also ends loops.
if
Command that is used to begin a if . . . elseif. . . else flow control block. Should be immediately followed
by a statement that can be evaluated to logical true or false.
switch
Command signaling the beginning of a switch . . . case. . . otherwise flow control block. Switch should be
followed by a variable to be used by case.
24.12 Looping
continue
Forces a loop to proceed to the next iteration while bypassing any code occurring after the continue state-
ment.
break
Prematurely breaks out of a loop before all iterations have completed.
end
All loop blocks must be terminated by an end command. Also ends flow control blocks.
for
One of two types of loops. for loops iterate over a predefined vector unless prematurely ended by break.
while
One of two types of loops. while loops continue until some logical condition is evaluated to logical false
(0) unless prematurely ended by a break command.
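A minimal sketch showing how continue and break behave differently inside a while loop (the loop body is arbitrary):

```matlab
total = 0;
i = 0;
while true
    i = i + 1;
    if mod(i,2) == 0
        continue      % skip even numbers and move to the next iteration
    end
    if i > 7
        break         % leave the loop entirely
    end
    total = total + i;   % reached only for odd i up to 7
end
```

continue only skips the rest of the current iteration, while break is what actually terminates the loop.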
24.13 Optimization
fminbnd
Function minimization with bounds. Find the minimum of a function that exists between L and U .
fmincon
Constrained function minimization using a gradient based search. Constraints can be linear or non-linear
and equality or inequality.
fminsearch
fminunc
optimget
optimset
24.14 Graphics
axis
Sets or gets the current axis limits of the active figure. Can also be used to tighten limits using the command
axis tight.
bar
bar3
colormap
contour
Produces a contour plot of the levels of z data against vectors of x and y data.
errorbar
Produces a plot of x data against y data with error bars (confidence sets) around each point.
figure
Opens a new figure window. When used with a number, for example figure(XX), opens a window with
label Figure XX where XX is some integer. If a window with label Figure XX is already open, that figure
is set as the active figure and any subsequent plot commands will operate on Figure XX.
gcf
get
Gets a list of properties from a graphics handle or the value of a property if used with an optional second
argument.
hist
Produces a histogram of data. Can also be used to compute bin centers and height.
legend
mesh
Produces a 3-D mesh plot of a matrix of z data against vectors of x and y data.
pie
plot
plot3
scatter
set
shading
subplot
Command that allows for multiple plots to be graphed on the same figure. Used in conjunction with other
plotting commands, such as subplot(2,1,1); plot(x,y); subplot(2,1,2); plot(y,x);
surf
Produces a 3-D surface plot of a matrix of z data against vectors of x and y data.
title
xlabel
ylabel
zlabel
clock
Returns the current date and time as a 1 by 6 numeric vector of the form [YEAR MONTH DATE HOUR MIN SEC].
date
datenum
datestr
datetick
datevec
Parses date numbers and date strings and returns date vectors of the form [YEAR MONTH DATE HOUR MIN SEC].
etime
Can be used to compute the elapsed time between two readings from clock.
now
tic
Begins a tic-toc timing loop. Useful for determining the amount of time required to run a section of code.
toc
x2mdate
char
double
num2str
str2double
str2num
strcat
Horizontally concatenates two or more strings. Equivalent to [string1 string2] for strings with the same
number of rows.
strcmp
strcmpi
strfind
strmatch
strncmp
strncmpi
strvcat
Vertically concatenates two or more strings. If the strings have different numbers of columns, right pads
the shorter string with blanks.
cos
Computes the cosine of a scalar, vector or matrix. Operates element-by-element on vectors or matrices.
sin
Computes the sine of a scalar, vector or matrix. Operates element-by-element on vectors or matrices.
cd
Change directory. When used with a directory, changes the working directory to that directory. When
called as cd .., changes the working directory to its parent. If the desired directory has a space, use the
function version cd('c:\dir with space\dir2\dir3').
delete
Deletes a file from the present working directory. Warning: This command is dangerous; files deleted are
permanently gone and not in the Recycle Bin.
dir
mkdir
pwd
rmdir
Removes a child directory in the present working directory. Child directory must be empty.
24.19 MATLAB Specific
clc
clear
Clears variables from memory. clear and clear all remove all variables from memory, while clear var1
var2 . . . removes only those variables listed.
clf
close
Closes figure windows. Can be used to close all figure windows by calling close all.
doc
When used as doc function, opens the help browser to the documentation of function. When used alone
(doc) opens the help browser.
edit
Launches the built-in editor. If called using edit filename, opens the editor with filename.m or, if file-
name.m does not exist on the MATLAB path, creates the file in the current directory.
format
Changes how numbers are represented in the command window. format long shows all decimal places
while format short only shows up to 5. format short is the default.
help
Displays inline help for calling a function (help function). Also can be used to list the functions in a toolbox
(help toolbox) or to list toolboxes (help).
helpbrowser
Opens the integrated help system for MATLAB at the last viewed page.
helpdesk
Opens the integrated help system for MATLAB at the home page.
keyboard
Allows functions to be interrupted for debugging. After verifying function operation, use return to con-
tinue running.
profile
Built-in MATLAB profiler. Reports code dependencies, timing of executed code and provides tips for im-
proving the performance of m-files. Has four important variants:
• profile report opens the profiling report which contains statistics on the performance of code exe-
cuted since profile on was called. Does not stop the profiler.
• profile viewer turns the profiler off and opens the profiling report which contains statistics on the
performance of code executed since profile on was called.
realmax
Returns the largest number MATLAB is capable of representing. Larger numbers are Inf.
realmin
Returns the smallest positive normalized number MATLAB is capable of representing. Positive numbers closer to 0 lose precision and eventually underflow to 0.
which
When used in combination with a function name, returns full path to function. Useful if there may be
multiple functions with same name on the MATLAB path.
whos
Returns a list of all variables in memory along with a description of type and information on size and
memory requirements.
24.20 Input/Output
csvread
csvwrite
fclose
fgetl
Reads the current file until an end-of-line character is encountered, returning a string representing the
line without the end-of-line character.
fopen
Opens a file for low level reading (using e.g. fgetl) or writing (using e.g. fprintf).
fprintf
load
Loads the contents of a MATLAB data file (.mat) into the current workspace. Can also be used to load
simple text files.
save
Saves variables to a MATLAB data file (.mat). Can also be used to save tab delimited text files. Can be
combined with -ascii -double to produce a tab delimited text file.
textread
textscan
Reads formatted text. Can read into cell arrays and from specific points in a file.
xlsfinfo
xlsread
Reads variables in .xls files. All data should be numeric, although it does contain methods which allow for
text to be read.
xlswrite
Bollerslev, T. & Wooldridge, J. M. (n.d.), ‘Quasi-maximum likelihood estimation and inference in dynamic models with time-
varying covariances’, 11(2), 143–172.
Jagannathan, R., Skoulakis, G. & Wang, Z. (n.d.), The analysis of the cross section of security returns, in Y. Aït-Sahalia & L. P.
Hansen, eds, ‘Handbook of financial econometrics’, Vol. 2, Elsevier B.V., pp. 73–134.