Climate Data Analysis Tools (CDAT)
Version 3.3
11/1/02
Legal Notice
Permission to use, copy, modify, and distribute this software for any
purpose without fee is hereby granted, provided that this entire notice is
included in all copies of any software which is or includes a copy or
modification of this software and in all copies of the supporting
documentation for such software.
DISCLAIMER
CHAPTER 1 CDAT
Introduction
Downloading CDAT
Installing CDAT
System Requirements
How to use this guide
1.1 Introduction
Climate Data Analysis Tools (CDAT) is a software
infrastructure that uses an object-oriented scripting language to link
together separate software subsystems and packages, thus forming an
integrated environment for solving model diagnosis problems. The
power of the system comes from Python and its ability to seamlessly
interconnect software. Python provides a general purpose and full-
featured scripting language with a variety of user interfaces including
command-line interaction, stand-alone scripts (applications) and
graphical user interfaces (GUI). The CDAT subsystems, implemented
as modules, provide access to and management of gridded data
(Climate Data Management System or CDMS); large-array
numerical operations (Numerical Python); and visualization
(Visualization and Control System or VCS).
Having gone through the tutorial, the next step is to become aware of
CDAT’s features. Chapter 3 entitled "What’s in CDAT?" addresses
this. To illustrate the features of CDAT, we have tried to address some
of the tasks that a climate scientist may wish to accomplish.
Therefore, the organization of tasks is broken up along the lines of
File I/O, creating databases and accessing data from them, data
extraction, altering variables and metadata, regridding data, spatial
and temporal averaging, statistics, and visualization. In addition,
sample scripts that show how to make use of existing Fortran or C
code, and how to interface to specialized packages such as
spherepack and EOFs, give the reader a flavor of what is already
possible with CDAT and of how easy it is to build on previous
efforts. The final chapter in this guide explains how you can
contribute to CDAT.
CDAT uses the Python language as a glue to link together the various tools. Python has
one of the fastest growing user bases among programming languages today and is extensively
documented online and in publications. It must be stressed that it is not essential to have any
knowledge of Python before being able to use CDAT tools. The Graphical User Interface
(VCDAT) can be used with no knowledge of Python and is the first topic introduced in this
chapter. This is highly recommended for beginners. For the more adventurous beginner who
wishes to learn the use of scripting capabilities, a brief background of the Python language
and syntax is provided in the next section.
Online video tutorials are available through the CDAT home page at
http://cdat.sf.net. These short streaming video tutorials are a
convenient way to see VCDAT in action. VCDAT was designed to be
used from left to right and top to bottom, and it has online help
balloons to assist the user in navigating through the interface. That is,
if you are unsure of what to do at any given time, move the mouse
over the region in question and let it rest; a help balloon pops up
with information to assist you.
VCDAT allows the user to enter command line instructions and logs
most button click instructions in a script file for later reference.
Although it is not required to learn CDAT scripting in order to use
VCDAT, the interface can be used as a CDAT script-learning tool by
translating every button press and keystroke into a script file. All
scripts include comment lines to assist the user. The script file can be
modified, saved, and executed by the user, which helps the user
learn how to read and write CDAT scripts. This facility also allows
the non-interactive repetition of common tasks.
Figure: The VCDAT main window, with the regions Main Menu, Select Variable, Graphics, Dimensions, Defined Variables, Function Icons, Operational Icons, and Variable Information.
user can still send the unseen plot to the printer or to a graphics
output file. See "Main Menu" -> "Save Plot As" and "Main Menu"
-> "Print Plot On" for sections on file output and printer selections.
Also, see the VCS documentation for details on the
PCMDI_GRAPHICS directory and the HARD_COPY file.
• Options: The "Options" menu displays additional GUIs for
controlling the outcome of the plot. These plot options are:
"Continents Types", "Page Orientation", "Overlay", "Isoline
Labels", "Annotation", "Set VCS Canvas Geometry", "Set Min
Max Values", "Set Graphics Method Attributes", "Number of Plots
on VCS Canvas", "Set Plot Map Projection", "Editors",
"Animate", "Clear VCS Canvas", and "Close VCS Canvas".
• Define: After the "Variable" has been selected, the user can select
the "Define" button to transfer the variable from a file or remote
database to memory. The selected variable will be visible in the
"Defined Variables" window below.
The "Define" button is also used to save a defined variable under a
new name and can be used to overwrite an existing defined
variable. That is, in the "Defined Variable List" window, select a
defined variable, and then select the "Define" button. A pop-up
window appears with simple instructions for both actions.
slider changes the last selected node value. The values below the
second slider bar represent the first selected node value and the
last selected node value.
• Dimension Functions: Each dimension can have a function
operate on it exclusively. Select the choice menu button for a
detailed description of each function operation. The last two
functions, "awt" and "gtm", are slightly different. "awt" allows the
user to replace the weights before applying the weighted average
function. The new weights must be located in the "Defined
Variables" window before using this function. The geometric
mean "gtm" is computed as gtm(x) = exp(mean(log(x))).
Note that everything associated with defined variables has the same
background color as the "Define" button located in the "Graphics"
section.
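The "gtm" formula can be checked outside VCDAT. The following is a minimal plain-Python sketch, assuming all input values are positive; the function name gtm is borrowed from the menu entry for readability and is not a CDAT API.

```python
import math

def gtm(values):
    """Geometric mean computed as exp(mean(log(x))).

    Illustrative restatement of the VCDAT "gtm" formula, not CDAT code.
    Requires a non-empty sequence of strictly positive numbers.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("gtm needs a non-empty sequence of positive values")
    logs = [math.log(v) for v in values]      # work in the log domain
    return math.exp(sum(logs) / len(logs))    # exp of the arithmetic mean of logs

print(gtm([1.0, 4.0, 16.0]))
```

Working with logarithms avoids forming the product of many values, which could overflow or underflow for long series. For [1, 4, 16] the result is 4 up to rounding, since (1 x 4 x 16)^(1/3) = 4.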
• Operational Icons: The first column of icons (from top to bottom)
allows the user to: edit the variable's attributes, save the variable to
The output of the program looks less than ideally formatted. To make
it look better, we can make use of format strings. For example:
>>> fout = open("output.txt", "w")   # hypothetical output file name
>>> fout.write("hello\n")
# or equivalently
>>> print >>fout, "hello"
>>> fout.close()
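A minimal sketch of the format-string idea mentioned above, using Python's % operator on the powers-of-ten example from the loops section; the function name format_powers is our own. Field widths such as %6d right-align each number so the columns line up.

```python
def format_powers(n):
    """Return one aligned text line per power of ten, using %-style formats."""
    lines = []
    for i in range(n):
        # %d inserts an integer; %6d right-aligns it in a 6-character field
        lines.append("10 raised to %d is %6d" % (i, 10 ** i))
    return lines

for line in format_powers(3):
    print(line)
```

The same strings can be written to a file with fout.write(line + "\n").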
Note that you can mix items of any type in a list. You can also have
lists nested inside lists.
>>> my_other_list = ["a", "b", [1,2,3], "d"]
Lists are indexed by integers starting with zero. To get the first
item in mylist:
>>> firstitem = mylist[0]
# returns the item "a"
# To set an item:
>>> mylist[2] = "x"
# Changes the item in the index position 2.
>>> print mylist
['a', 1.0, 'x', 4]
Similarly, individual items can be removed from the list using the
remove() method:
>>> mylist.remove(1.0)
>>> print mylist
['a', 'x', 4, 'nextitem']
2.2.5 Loops
We saw the simple loop using the while statement in the previous
examples. Other looping constructs such as the for statement are
available to the user. It is important to note that statements within the
loop are indented. An example of its usage is:
>>> for i in range(3):
...     print "10 raised to", i, "is", 10**i
...
# will produce the result
10 raised to 0 is 1
10 raised to 1 is 10
10 raised to 2 is 100
One can also loop through lists using the for statement:
>>> mylist = ['a', 'b', 3]
>>> for item in mylist:
...     print item
...
# produces the output
a
b
3
2.2.6 Dictionaries
and you can check dictionary membership by using the has_key()
method:
>>> if a.has_key("model_of"):
...     print a["model_of"]
... else:
...     print "Unknown model of"
...
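The dictionary a used above is defined earlier in the tutorial. The following self-contained sketch uses a hypothetical metadata dictionary (its keys and values are made up) to show the same membership test written with the in operator, which works in Python 2.2 and later and is the only form in Python 3, where has_key() no longer exists. It also shows get(), which combines the test and the lookup.

```python
# Hypothetical metadata dictionary; keys and values are examples only.
a = {"model_of": "atmosphere", "resolution": "T42"}

def describe(meta, key):
    """Return meta[key], or a fallback message when the key is absent."""
    if key in meta:              # same test as meta.has_key(key) in Python 2
        return meta[key]
    return "Unknown " + key

print(describe(a, "model_of"))
print(describe(a, "institution"))
# get() performs the membership test and the lookup in one call
print(a.get("institution", "Unknown institution"))
```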
2.2.7 Functions
Note once again that the statements inside the function are indented
after the “def” statement. To invoke the function we do the following:
>>> addvalue = myadd(2, 5)
A function definition can also set default values for its input
parameters:
def myaddsub(a, b, base=10):
    c = a*base + b
    d = a*base - b
    return c, d
Then, when you need base to take a value other than the default of
10, you can pass it explicitly:
>>> addvalue, subvalue = myaddsub(2, 5, base=2)
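A runnable version of the myaddsub example with the values worked out, showing the default being used when base is omitted and overridden when it is passed by keyword.

```python
def myaddsub(a, b, base=10):
    """Return (a*base + b, a*base - b); base defaults to 10 when omitted."""
    c = a * base + b
    d = a * base - b
    return c, d

addvalue, subvalue = myaddsub(2, 5)             # base=10: (25, 15)
addvalue2, subvalue2 = myaddsub(2, 5, base=2)   # base=2:  (9, -1)
# Keyword arguments may also be given in any order:
same = myaddsub(b=5, base=2, a=2)               # also (9, -1)
print((addvalue, subvalue))
print((addvalue2, subvalue2))
```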
2.2.8 Modules
To keep your programs manageable as they grow in size, you
may want to break them up into several files. Python allows you to
put multiple function definitions into a file and use them as a module
that can be imported into other scripts and programs. These files must
have a .py extension. For example:
# file my_function.py
def minmax(a, b):
    if a <= b:
        lo, hi = a, b
    else:
        lo, hi = b, a
    return lo, hi
To use the above module in other programs, you would use the
import statement.
>>> import my_function
>>> x,y = my_function.minmax(25, 6.3)
2.4.1 getting_started_tutorial.py
This tutorial is the first one to study. The tutorial consists of
three examples:
This example shows how to generate data masks, apply masks, and
average using area weights.
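The idea behind masking and area-weighted averaging can be sketched in plain Python. This is our own illustration, not the tutorial's code: missing points are marked with None, and each latitude gets a cos(latitude) weight, a common approximation of grid-cell area on a regular grid (cdutil derives its weights from the axis bounds instead). Masked points drop out of both the numerator and the denominator, so the remaining weights are renormalized.

```python
import math

MISSING = None  # marker for a missing (masked) value in this sketch

def mask_where(values, predicate):
    """Return a copy of values with entries satisfying predicate set to MISSING."""
    return [MISSING if (v is not MISSING and predicate(v)) else v for v in values]

def area_weighted_mean(values, lats_deg):
    """cos(latitude)-weighted mean of the non-missing values."""
    num = 0.0
    den = 0.0
    for v, lat in zip(values, lats_deg):
        if v is MISSING:
            continue                          # masked point: skip value and weight
        w = math.cos(math.radians(lat))
        num += w * v
        den += w
    if den == 0.0:
        raise ValueError("all values are missing")
    return num / den

values = [10.0, 20.0]
lats = [0.0, 60.0]                            # weights 1.0 and 0.5
print(area_weighted_mean(values, lats))
print(area_weighted_mean(mask_where(values, lambda v: v > 15.0), lats))
```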
2.4.2 times_tutorial.py
This tutorial demonstrates the uses of the time averaging
functions. Basic examples cover topics such as computing the
December-February seasonal means, computing the climatology,
anomalies (departures) from climatology, construction of seasons not
already defined, customization, and use of powerful criteria to
specify minimum temporal coverage and data distribution.
2.4.3 statistics_tutorial.py
This tutorial covers some of the basic statistical functions such
as rms, correlation, mean absolute difference and their usage with
climate data.
2.4.4 vcs_tutorial.py
This tutorial guides you through some basic plotting functions
and features to visualize the data and produce presentation quality
output. This is by no means an exhaustive demonstration of features;
only a very basic set of capabilities is addressed. Specific examples
cover animations, creating and altering graphics methods, creating
and altering display templates, producing output files for printing and
displaying and changing colormaps etc.
2.4.5 xmgrace_tutorial.py
In this tutorial, the interface to the XmGrace utility is
demonstrated by showing simple plot generation and customization.
XmGrace is a plotting tool developed independently of CDAT and has
a wide user base. To download and install XmGrace, see the Grace
home page at http://plasma-gate.weizmann.ac.il/Grace
pydoc -g starts an HTTP server and also pops up a little window for
controlling it.
>>> binaryio.binclose(iunit)
For computing tasks, variables can be used much like arrays. The
common arithmetic functions are defined for variables, as well as I/O
and slicing operations. One advantage of using CDMS variables for
computation is that the associated domain and metadata information
is carried along with the computation. The benefit of this approach is
that, for example, if we average over time, and then plot the result,
the plotting routines can still be aware that the other dimensions
represent latitude and longitude and draw the continental outlines, do
projections correctly, etc. If the result were merely the averaged data,
that interpretation of it would have been lost. This frequently results
in much simpler scripts than otherwise would be the case.
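The point about metadata surviving a computation can be illustrated with a toy class. LabeledArray below is hypothetical and far simpler than a CDMS variable: it holds 2D data plus a list of (axis name, coordinates) pairs, and averaging over one named axis returns a new object that still knows what the remaining axis is.

```python
class LabeledArray(object):
    """Toy stand-in for a CDMS-style variable: data plus named axes.

    Hypothetical illustration only; real CDMS variables carry much more.
    """
    def __init__(self, data, axes):
        self.data = data   # nested lists for 2D data, a flat list for 1D
        self.axes = axes   # list of (axis_name, coordinate_values)

    def axis_names(self):
        return [name for name, _ in self.axes]

    def mean_over(self, axis_name):
        """Average 2D data over the named axis, keeping the other axis."""
        if len(self.axes) != 2:
            raise ValueError("this sketch only handles 2D data")
        k = self.axis_names().index(axis_name)
        # After this step, each entry of rows runs along the averaged axis.
        rows = self.data if k == 1 else [list(col) for col in zip(*self.data)]
        means = [sum(r) / float(len(r)) for r in rows]
        return LabeledArray(means, [self.axes[1 - k]])

field = LabeledArray([[1.0, 3.0], [5.0, 7.0]],        # indexed [time][lat]
                     [("time", [0.0, 1.0]), ("lat", [-45.0, 45.0])])
time_mean = field.mean_over("time")
print((time_mean.axes[0][0], time_mean.data))       # the lat axis survives
```

A plotting routine handed time_mean can still tell that its values lie along latitude, which is the benefit described above.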
Figure: A variable pictured as a Numeric (NumPy) array plus metadata.
If you want to control the type of the data, you can supply a typecode,
usually in the form of one of the abstract constants supplied in
Numeric for this purpose. For example,
>>> import Numeric
>>> y = Numeric.array([1,2,3], Numeric.Float)
3.1.2.2 Numeric to MA
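Figure: Conversions among Transient Variables (MV), Masked Arrays (MA), and Numeric Arrays, each with its own set of functions (MV Functions, MA Functions, Numeric Functions). MV.array and MA.array create the masked types; MV.filled and MA.filled convert back to Numeric arrays.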
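The "filled" conversions go from the masked types back to plain arrays. The idea can be sketched in plain Python with a list of values plus a parallel list of mask flags. This representation and the function below are our own simplification, not how MA stores data.

```python
def filled(values, mask, fill_value):
    """Return a plain list with masked entries replaced by fill_value.

    values and mask are parallel lists; mask[i] is True where values[i]
    is missing. The result carries no mask at all.
    """
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [fill_value if masked else v for v, masked in zip(values, mask)]

data = [1.0, 2.0, 3.0]
mask = [False, True, False]
print(filled(data, mask, 1.0e20))   # [1.0, 1e+20, 3.0]
```

Because the result no longer records which points were missing, the fill value should be one that cannot be confused with real data, which is why a very large number is used here.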
Two time types are available: relative time and component time.
Relative time is time relative to a fixed base time. It consists of:
• a units string, of the form "units since basetime", and
• a floating-point value
For example, the time "28.0 days since 1996-1-1" has value=28.0,
and units="days since 1996-1-1". Component time consists of the
integer fields year, month, day, hour, minute, and the floating-point
field second. A sample component time is 1996-2-28 12:10:30.0.
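Converting between the two time types is arithmetic on a base date. The sketch below uses Python's standard datetime module (standard Gregorian calendar only) as a stand-in for CDAT's cdtime module, which also handles other calendars. It accepts only fixed-length units and date-only base times such as "1996-1-1"; units of variable length such as months are deliberately left out. The function names are our own.

```python
from datetime import datetime, timedelta

# Seconds per unit for the fixed-length units this sketch understands.
SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0,
                    "hours": 3600.0, "days": 86400.0}

def parse_units(units):
    """Split a string like 'days since 1996-1-1' into (seconds per unit, base)."""
    unit, since, base = units.split(None, 2)
    if since != "since" or unit not in SECONDS_PER_UNIT:
        raise ValueError("unsupported units string: %r" % units)
    year, month, day = (int(part) for part in base.split("-"))
    return SECONDS_PER_UNIT[unit], datetime(year, month, day)

def rel_to_comp(value, units):
    """Relative time (value, units) -> component time as a datetime."""
    scale, base = parse_units(units)
    return base + timedelta(seconds=value * scale)

def comp_to_rel(comp, units):
    """Component time (a datetime) -> floating-point value in the given units."""
    scale, base = parse_units(units)
    return (comp - base).total_seconds() / scale

print(rel_to_comp(28.0, "days since 1996-1-1"))      # 1996-01-29 00:00:00
print(comp_to_rel(datetime(1996, 2, 28, 12, 10, 30),
                  "days since 1996-1-1"))            # roughly 58.5073
```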
NH | NorthernHemisphere
SH | SouthernHemisphere
Example:
>>> import cdms
>>> from cdutil import region
>>> f = cdms.open('data_file_name')
>>> t_northern_hemisphere_only = f('t', region.NH)
3.1.8 Databases
Examples:
>>> import cdms, cdutil
>>> f = cdms.open('data_file_name')
>>> result = cdutil.averager(f('var_name'), axis='1')
# extracts the variable 'var_name' from f
# and averages over the dimension whose position is 1.
# Since no other options are specified,
# defaults apply, i.e. weights='generate' (same as
# weights='weighted') and returned=0.
# Some ways of using the averager are shown below.
#
# A quick zonal mean calculation would be:
>>> V_zonal_ave = cdutil.averager(V, axis='x')
# In the above case, default weights option of
Note: When averaging data with missing values, extra care needs to
be taken. It is recommended that you use the default
weights='generate' option. This uses cdutil.area_weights(V) to get
the correct weights to pass to the averager.
However, the area_weights function requires that the axis bounds are
stored or can be calculated. If such weights are not stored with the
axis specifications (or you want to use weights from another
source), the combinewts option can produce the same results. In
short, the following two are equivalent:
>>> from cdutil import averager, area_weights
>>> xavg_1 = averager(X, axis='xy', weights=area_weights(X))
>>> xavg_2 = averager(X, axis='xy',
...                   weights=['weighted', 'weighted', 'weighted'],
...                   combinewts=1)
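Why missing values call for care can be seen in a small plain-Python sketch of our own (not cdutil code), assuming a spherical earth. Latitude-band weights are sin(north) - sin(south), the relative area of a band; longitude weights are band widths; their outer product gives 2D cell weights. Without missing data, averaging over latitude and then longitude equals a single 2D weighted average. With a missing cell the two differ, because the sequential version renormalizes inside each column.

```python
import math

def lat_weights(lat_bounds):
    """Relative area of each latitude band on a sphere: sin(north) - sin(south)."""
    return [math.sin(math.radians(north)) - math.sin(math.radians(south))
            for south, north in lat_bounds]

def lon_weights(lon_bounds):
    """Relative width of each longitude band, in degrees."""
    return [east - west for west, east in lon_bounds]

def combine(wlat, wlon):
    """Outer product: 2D cell weights [lat][lon] from the 1D axis weights."""
    return [[a * b for b in wlon] for a in wlat]

def weighted_mean(values, weights):
    """Mean of non-missing values (None = missing), renormalizing the weights.

    Raises ZeroDivisionError if every value is missing.
    """
    num = den = 0.0
    for v, w in zip(values, weights):
        if v is not None:
            num += v * w
            den += w
    return num / den

def mean_2d(field, w2d):
    """Average over both axes at once using 2D cell weights."""
    values = [v for row in field for v in row]
    weights = [w for row in w2d for w in row]
    return weighted_mean(values, weights)

def mean_lat_then_lon(field, wlat, wlon):
    """Average over latitude first, then average those results over longitude."""
    column_means = [weighted_mean([row[j] for row in field], wlat)
                    for j in range(len(wlon))]
    return weighted_mean(column_means, wlon)

wlat = lat_weights([(-30.0, 30.0), (30.0, 90.0)])   # about [1.0, 0.5]
wlon = lon_weights([(0.0, 180.0), (180.0, 360.0)])  # [180.0, 180.0]
full = [[1.0, 2.0], [3.0, 4.0]]                     # rows: lat, columns: lon
gappy = [[1.0, 2.0], [3.0, None]]
print((mean_2d(full, combine(wlat, wlon)), mean_lat_then_lon(full, wlat, wlon)))
print((mean_2d(gappy, combine(wlat, wlon)), mean_lat_then_lon(gappy, wlat, wlon)))
```

For the complete field both results are 13/6; for the field with a missing cell the 2D average is 1.8 while the sequential one is 11/6.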
The following example will help you see the averager() function in
context
In the above example, we averaged over the latitude axis first (using
generated weights) and then over the longitude axis (using equal
weights). The weights can be "equal", "generate" (which generates
weights from the grid information contained in the variable), or any
array of numbers the user wishes to apply.
Now that we have our climatology over the desired period, we can
compute anomalies over the full period relative to that climatology.
>>> djfdep2 = cdutil.DJF.departures(s, ref=djfclim)
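The departures call subtracts a seasonal climatology from each season's mean. A plain-Python sketch of the same idea for December-February, assuming monthly values stored in a dictionary keyed by (year, month). Assigning each December to the following year's winter and skipping incomplete seasons are our own conventions here; cdutil's coverage criteria are more flexible.

```python
from collections import defaultdict

def djf_means(monthly):
    """Map each winter (labelled by the year of its Jan/Feb) to its DJF mean.

    monthly: dict {(year, month): value}. December is counted with the
    following January and February; winters missing any month are skipped.
    """
    groups = defaultdict(list)
    for (year, month), value in monthly.items():
        if month == 12:
            groups[year + 1].append(value)
        elif month in (1, 2):
            groups[year].append(value)
    return dict((yr, sum(vals) / float(len(vals)))
                for yr, vals in groups.items() if len(vals) == 3)

def climatology(season_means, years):
    """Average of the seasonal means over a reference set of years."""
    vals = [season_means[yr] for yr in years]
    return sum(vals) / float(len(vals))

def departures(season_means, ref):
    """Anomaly of each season relative to the reference climatology."""
    return dict((yr, m - ref) for yr, m in season_means.items())

monthly = {(2000, 12): 1.0, (2001, 1): 2.0, (2001, 2): 3.0,
           (2001, 12): 3.0, (2002, 1): 4.0, (2002, 2): 5.0,
           (2002, 12): 9.0}                   # winter 2003 is incomplete
means = djf_means(monthly)
djfclim = climatology(means, [2001, 2002])
print(departures(means, ref=djfclim))
```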
In the case of computing an annual mean, having data only in the
January and December months gives a regular centroid of 0, which
suggests the data are evenly distributed, yet the resulting annual
mean for the year is biased toward winter. In this situation, you
should use a cyclical centroid, where the circular nature of the year
is recognised and the centroid is calculated accordingly. Here are
some examples of typical usage:
Example 1
Example 2
Example 3
Example 4
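The difference between a linear and a circular view of the calendar can be illustrated in plain Python. Both measures below are our own illustrative definitions, not cdutil's centroid formulas. The linear centroid places January at -1 and December at +1 and averages the positions; the circular measure places the months around a circle and returns the length of the mean unit vector (0 when data are spread evenly around the year, near 1 when they cluster in one part of it, even across the December/January boundary).

```python
import math

def linear_centroid(months):
    """Mean position of the months with data, scaled so Jan = -1 and Dec = +1."""
    positions = [2.0 * (m - 1) / 11.0 - 1.0 for m in months]
    return sum(positions) / len(positions)

def cyclical_concentration(months):
    """Length of the mean unit vector with the 12 months placed on a circle."""
    angles = [2.0 * math.pi * (m - 1) / 12.0 for m in months]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)

winter_only = [1, 12]
all_months = list(range(1, 13))
print((linear_centroid(winter_only), cyclical_concentration(winter_only)))
print((linear_centroid(all_months), cyclical_concentration(all_months)))
```

The linear centroid is 0 in both cases and cannot tell them apart, while the circular measure is close to 1 for January plus December and close to 0 for a full year.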
where "Values" and "Errors" are tuples containing the answers for the
slope and the intercept. You can unpack them as
slope, intercept = Values and slope_error, intercept_error = Errors, i.e.
>>> from genutil.statistics import linearregression
>>> (slope, intercept), (slope_error, intercept_error) = \
...     linearregression(y, error=1)
To get only the slope and its non-adjusted standard error, do the
following:
>>> slope, slope_error = linearregression(y, error=1,
...                                       nointercept=1)
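For reference, the textbook ordinary-least-squares formulas behind a slope, an intercept, and their standard errors can be written in plain Python. This is a sketch with our own function name; it takes x explicitly and returns results in the same ((slope, intercept), (slope_error, intercept_error)) shape as above, but it does not reproduce genutil's other error options.

```python
import math

def linear_regression(x, y):
    """Ordinary least squares fit of y = slope*x + intercept.

    Returns ((slope, intercept), (slope_error, intercept_error)) using the
    usual standard errors with n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired points")
    xbar = sum(x) / float(n)
    ybar = sum(y) / float(n)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # residual variance estimate
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    slope_error = math.sqrt(s2 / sxx)
    intercept_error = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return (slope, intercept), (slope_error, intercept_error)

(slope, intercept), (slope_error, intercept_error) = \
    linear_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 5.0])
print((slope, intercept))
print((slope_error, intercept_error))
```

For this data the slope and the intercept are both 1.1 up to rounding.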
useful tool for other scientific applications as well. VCS allows wide-
ranging changes to be made to the data display, provides for
presentation hardcopy output, and includes a means for recovery of a
previous display.
Tables for manipulating these primary objects are stored in VCS for
later recall and possible use. In addition, detailed specification of the
primary objects' attributes is provided by eight "secondary objects"
(or "secondary elements"):
1. colormap: specification of combinations of 256 available colors
2. fillarea: style, style index, and color index
3. format: specifications for converting numbers to display strings
4. line: line type, width and color index
5. list: a sequence of pairs of numerical and character values
6. marker: marker type, size, and color index
7. texttable: text font type, character spacing, expansion and color index
8. textorientation: character height, angle, path, and horizontal/vertical
alignment
The plot will display the data "my_data" using default settings.
However, the user can control every aspect of the plot’s appearance
individually. The first of those is the "Graphics Method" described in
the next section.
with more on the way. Each graphics method has its own unique set
of attributes (or members) and functions. They also have a set of core
attributes that are common to all graphics methods. The descriptions
of the current set of graphics methods are as follows:
• boxfill - The boxfill graphics method draws color grid cells to
represent the data on the VCS Canvas.
• isofill - The isofill graphics method fills the area between selected
isolevels (levels of constant value) of a two-dimensional array
with a user-specified color.
• isoline - The isoline graphics method draws lines of constant value
at specified levels in order to graphically represent a two-
dimensional array. It also labels the values of these isolines on the
VCS Canvas.
• outfill - The outfill graphics method fills a set of integer values in
any data array. Its primary purpose is to display continents by
filling their area as defined by a surface type array that indicates
land, ocean, and sea-ice points.
• outline - The Outline graphics method outlines a set of integer
values in any data array. Its primary purpose is to display
continental outlines as defined by a surface type array that
indicates land, ocean, and sea-ice points.
• scatter - The scatter graphics method displays a scatter plot of two
4-dimensional data arrays, e.g. A(x,y,z,t) and B(x,y,z,t).
• vector - The Vector graphics method displays a vector plot of a 2D
vector field. Vectors are located at the coordinate locations and
point in the direction of the data vector field. Vector magnitudes
are the product of data vector field lengths and a scaling factor.
• xvsy - The XvsY graphics method displays a line plot from two
1D data arrays, that is X(t) and Y(t), where ’t’ represents the 1D
coordinate values.
• xyvsy - The Xyvsy graphics method displays a line plot from a 1D
data array, i.e. a plot of X(y) where ’y’ represents the 1D
coordinate values.
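The vector method's rule, that plotted magnitudes are data vector lengths times a scaling factor, can be sketched for one grid point. The function name and the choice to report direction in degrees counterclockwise from the +x axis are our own; how VCS chooses the scaling factor and draws arrowheads is not modeled.

```python
import math

def arrow(u, v, scale):
    """Return (plotted_length, direction_deg) for vector components (u, v).

    plotted_length = scale * sqrt(u**2 + v**2); the direction is measured
    counterclockwise from the +x axis, in degrees.
    """
    magnitude = math.hypot(u, v)
    return scale * magnitude, math.degrees(math.atan2(v, u))

print(arrow(3.0, 4.0, 0.5))   # length 2.5, pointing about 53 degrees above +x
```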
Say, for example, you would like to plot the data object "my_data"
using the "isofill" graphics method (instead of the default "boxfill"
method). You would type:
>>> import vcs
>>> v = vcs.init()
>>> v.isofill(my_data)
The user can control any aspect of the isofill method to get the precise
appearance on the plot. To alter the isofill method for your plot, you
must first "create" an isofill object. The create functions allow the
user to create VCS objects that can be modified directly to produce
the desired results. Since the VCS "default" objects cannot be
modified, it is best to either create a new VCS object or get an
existing one. When a VCS object is created, it is stored in an internal
table for later use and/or recall.
>>> my_new_isofill = v.createisofill('newisofillname')
If there is an existing isofill method you have created and would like
to use (or alter), type:
>>> my_old_isofill = v.getisofill('existing_isofillobjectname')
You can explicitly set each of the attributes. For example, to change
the levels to be used in plotting:
>>> my_old_isofill.levels = [2., 3., 4., 6.]
or
>>> print v.plot.__doc__
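Isofill fills the bands between consecutive isolevels, so a levels list such as [2., 3., 4., 6.] defines three bands, each filled with its own color. A plain-Python sketch of assigning a data value to a band; the half-open convention (lower edge included, upper edge excluded) and returning None outside the range are our assumptions, not documented VCS behaviour.

```python
from bisect import bisect_right

def level_index(value, levels):
    """Index i of the band levels[i] <= value < levels[i+1], or None if outside.

    levels must be sorted in increasing order, like [2., 3., 4., 6.].
    """
    if value < levels[0] or value >= levels[-1]:
        return None
    return bisect_right(levels, value) - 1

levels = [2., 3., 4., 6.]
bands = [level_index(v, levels) for v in (1.5, 2.0, 3.7, 5.9, 6.0)]
print(bands)   # [None, 0, 1, 2, None]
```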
3.3.2.3 Templates
A picture template determines the location of each picture
segment, the space to be allocated to it, and related properties.
Templates can be created and altered just like the graphics
methods seen in the previous section; the syntax is the same
for all objects in VCS so as to avoid confusion. The general
philosophy behind templates is: "You should be able to specify the
behaviour of every picture segment (text, data, line, etc.) precisely
according to your needs."
• units : The data units, for example "Degrees C". You can set the
x,y location, priority and text object (more on text objects later).
• dataname : The name of the variable.
• source : The data source description.
• function : If computed data, the user can give the algebraic
equation.
• file : The location of the file.
• crdate : The date of creation of plot. You can control x,y location,
priority, and text object.
• crtime : The time of creation of plot. You can control x,y location,
priority, and text object.
• mean, max, min : The values are computed automatically from
the data you are plotting and you can set the x,y location, text
object, priority and display format.
• legend : You can set the legend bar location and dimensions (x1,
x2, y1, y2), the line type object, text object and of course priority.
• comment1, comment2, comment3, comment4 : Four text
comments can be drawn. For each of these you can set priority,
location (x, y), and text object. The comment text itself must be set
on the data. Say you are plotting "tdata" using the x.plot(tdata)
command; then you need to have done:
>>> tdata.comment1 = 'Your specified comment1'
• line1, line2, line3, line4 : Four lines can be drawn. For each of
these lines you can set priority, start location (x1, y1), end location
(x2, y2) and line object.
• box1, box2, box3, box4 : Four boxes can be drawn. Same as line
except the x1, x2, y1, y2 settings refer to corners of the box.
• xname, yname, zname, tname : These are the names of the x, y, z,
and t axes. In the case shown here the plot is a latitude x longitude
plot, so you would set the xname and yname values; for a time x
longitude plot you would set the xname and tname values. You can set
the x,y location for the name, the priority and text object.
• xunits, yunits, zunits, tunits : These are the respective axis units,
for which you can set the x,y location, text object, and priority.
• xvalue, yvalue, zvalue, tvalue :
• xlabel1, xlabel2 : The x axis labels (bottom and top of your plot)
can be independently set. You can specify the y location (x is
determined by the data), priority and the text object.
• ylabel1, ylabel2 : Same as above except for the left and right side
of your plot. Here you can only specify the x location of the label.
• xtic1, xtic2 : The major tic marks on the bottom and top. You can
control the priority, y1 and y2 (in effect specifying the length),
and line object (line style, thickness, arrows, and so on; more on
the line object later).
• xmintic1, xmintic2 : The minor tic mark specifications. Otherwise
exactly the same as above.
• ytic1, ytic2 : The major tic marks on the left and right. You can
control the priority, x1 and x2 (in effect specifying the length),
and line object.
• ymintic1, ymintic2 : The minor tic mark specifications. Otherwise
exactly the same as above.
3.3.2.4 Animation
VCS allows the user to animate the contents of the VCS
Canvas. The animate.gui() function pops up the animation GUI, which
lets the user control all aspects of the animation.
>>> v.plot(array, 'default', 'isofill', 'quick')
>>> v.animate.gui()
3.3.2.5 Output
Before attempting to print your plots, make sure that gplot is
built and installed on your system. VCS graphics can be output
to files of various formats or directly to printers with PostScript
capability. The available graphics file formats are PostScript,
encapsulated PostScript (EPS), GIF, CGM, and raster. To print
directly to a printer, optionally specifying portrait orientation:
>>> v.printer('printer_name', 'p')
3.4.6.1 Pyfort
Pyfort is a tool for connecting Fortran (Fortran90) routines to
Python (www.python.org). Pyfort translates an input file that
describes the Fortran functions and subroutines you wish to access
from Python into a C language source file defining a Python module.
Fortran was changed significantly by the introduction of the Fortran
90 standard. We will use the phrase "modern Fortran" to indicate
versions of Fortran from Fortran 90 onwards. Pyfort’s input uses a
syntax that is a subset of the modern Fortran syntax for declaring
routines and their arguments. The current release does not yet support
modern Fortran’s "explicit-interface" routines. However, the tool was
designed with this in mind for a future release. Pyfort can in most
cases also build and install the extension you create.
they solve. Pearu Petersen has developed a tool that generates the C/
API modules containing wrapper functions of Fortran routines. This
tool is called F2PY (Fortran to Python Interface Generator). It is
written entirely in Python and can be called from the
command line as f2py. F2PY is released under the terms of the GNU
LGPL. The F2PY package and documentation can be downloaded
from http://cens.ioc.ee/projects/f2py2e/
3.4.8 ort
Reads data from an Oort file.
3.4.9 trends
Computes variance estimates that take autocorrelation into
account.
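One common way to account for autocorrelation, not necessarily the method the trends package uses, is to replace the sample size n by an effective size n_eff = n(1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation; this assumes the series behaves like a first-order autoregressive process. A plain-Python sketch with function names of our own choosing:

```python
def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a series about its mean."""
    n = len(x)
    mean = sum(x) / float(n)
    d = [xi - mean for xi in x]
    denom = sum(di * di for di in d)
    if denom == 0.0:
        raise ValueError("series is constant")
    return sum(d[t] * d[t + 1] for t in range(n - 1)) / denom

def adjusted_variance_of_mean(x):
    """Variance of the sample mean using an AR(1) effective sample size.

    Returns (variance_of_mean, n_eff).
    """
    n = len(x)
    mean = sum(x) / float(n)
    s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)   # sample variance
    r1 = lag1_autocorrelation(x)
    if not -1.0 < r1 < 1.0:
        raise ValueError("lag-1 autocorrelation must lie strictly in (-1, 1)")
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    return s2 / n_eff, n_eff

series = [1.0, 2.0, 3.0, 4.0]
print(lag1_autocorrelation(series))
print(adjusted_variance_of_mean(series))
```

With positive autocorrelation n_eff is smaller than n, so the adjusted variance of the mean is larger than the naive s2/n.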
3.4.10 pyclimate
This package, also Python-based, is written and maintained at
the Universidad del País Vasco in Bilbao, Spain. It provides useful
statistical functions for climate applications.
CDAT is a collaboration, and you can be part of it. You don’t need permission
or PCMDI’s approval, and you don’t need PCMDI’s programmers to add your algorithms.
Here are descriptions of how to contribute packages, and of the packages contributed to
the "contrib" section of the CDAT source.
If you have the source distribution, use the README files in the
subdirectories of the contrib directory for full documentation.
Alternatively, type
% pydoc -w <name_of_package>