Mathematica 9
DATA MANIPULATION
For use with Wolfram Mathematica 7.0 and later.
For the latest updates and corrections to this manual, visit reference.wolfram.com.
For information on additional copies of this documentation, visit the Customer Service website at www.wolfram.com/services/customerservice or email Customer Service.
Comments on this manual are welcome.
Printed in the United States of America.
2008 Wolfram Research, Inc. All rights reserved. No part of this document may be reproduced or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the copyright holder. Wolfram Research is the holder of the copyright to the Wolfram Mathematica software system ("Software") described in this document, including without limitation such aspects of the system as its code, structure, sequence, organization, look and feel, programming language, and compilation of command names. Use of the Software unless pursuant to the terms of a license granted by Wolfram Research or as otherwise authorized by law is an infringement of the copyright. Wolfram Research, Inc. and Wolfram Media, Inc. ("Wolfram") make no representations, express, statutory, or implied, with respect to the Software (or any aspect thereof), including, without limitation, any implied warranties of merchantability, interoperability, or fitness for a particular purpose, all of which are expressly disclaimed. Wolfram does not warrant that the functions of the Software will meet your requirements or that the operation of the Software will be uninterrupted or error free. As such, Wolfram does not recommend the use of the software described in this document for applications in which errors or omissions could threaten life, injury or significant loss. Mathematica, MathLink, and MathSource are registered trademarks of Wolfram Research, Inc. J/Link, MathLM, .NET/Link, and webMathematica are trademarks of Wolfram Research, Inc. Windows is a registered trademark of Microsoft Corporation in the United States and other countries. Macintosh is a registered trademark of Apple Computer, Inc. All other trademarks used herein are the property of their respective owners. Mathematica is not associated with Mathematica Policy Research, Inc.
Contents
Files, Streams, and External Operations
Reading and Writing Mathematica Files
External Programs
Streams and Low-Level Input and Output
Naming and Finding Files
Files for Packages
Manipulating Files and Directories
Reading Textual Data
Searching Files
Searching and Reading Strings
Binary Files
Generating C and Fortran Expressions
Splicing Mathematica Output into External Files
Image Processing
Image Creation and Representation
Basic Image Manipulation
Image Processing by Point Operations
Image Processing by Area Operations
<< file               read in a file of Mathematica input, and return the last expression in the file
FilePrint["file"]     display the contents of a file
expr >> file          write an expression to a file
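This expands (x + y)^3 and writes the result to the file tmp.
In[1]:= Expand[(x + y)^3] >> tmp
Here are the contents of tmp. They can be used directly as input for Mathematica.
In[2]:= FilePrint["tmp"]
x^3 + 3*x^2*y + 3*x*y^2 + y^3
This reads in tmp, evaluating the Mathematica input it contains.
In[3]:= << tmp
Out[3]= x^3 + 3 x^2 y + 3 x y^2 + y^3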
This displays the contents of the file ExampleData/factors.
In[1]:= FilePrint["ExampleData/factors"]
(* Factors of x^20 - 1 *)
(-1 + x)*(1 + x)*(1 + x^2)*(1 - x + x^2 - x^3 + x^4)*
 (1 + x + x^2 + x^3 + x^4)*(1 - x^2 + x^4 - x^6 + x^8)
This reads in the file, and returns the last expression in it.
In[2]:= << ExampleData/factors
Out[2]= (-1 + x) (1 + x) (1 + x^2) (1 - x + x^2 - x^3 + x^4) (1 + x + x^2 + x^3 + x^4) (1 - x^2 + x^4 - x^6 + x^8)
If Mathematica cannot find the file you ask it to read, it prints a message, then returns the symbol $Failed.
In[19]:= << faxors
Get::noopen : Cannot open faxors.
Out[19]= $Failed
When you read in a file with << file, Mathematica returns the last expression it evaluates in the file. You can avoid getting any visible result from reading a file by ending the last expression in the file with a semicolon, or by explicitly adding Null after that expression.
If Mathematica encounters a syntax error while reading a file, it reports the error, skips the remainder of the file, then returns $Failed. If the syntax error occurs in the middle of a package that uses BeginPackage and other context manipulation functions, then Mathematica tries to restore the context to what it was before the package was read.
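As a minimal sketch of the semicolon rule above, the following writes a two-line file and reads it back with Get, the function form of <<. The file name quiet.m and the symbols alpha and beta are made up for this illustration; the file is placed in $TemporaryDirectory so that it does not clutter your working directory.

```mathematica
(* Hypothetical scratch file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "quiet.m"}];

(* Two lines of Mathematica input; the last one ends with a semicolon *)
Export[file, "alpha = 5;\nbeta = alpha^2;", "Text"];

(* Get evaluates everything in the file, but the value returned is Null *)
result = Get[file];

(* The assignments made inside the file still took effect *)
{result, beta}   (* {Null, 25} *)
```

Because the last expression in the file ends with a semicolon, nothing visible is returned, yet alpha and beta are defined afterward.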
When you use expr >>> file, Mathematica appends each new expression you give to the end of your file. If you use expr >> file, however, then Mathematica instead wipes out anything that was in the file before, and then puts expr into the file.
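This writes an expression to the file tmp.
In[4]:= Factor[x^6 - 1] >> tmp
Here are the contents of the file.
In[5]:= FilePrint["tmp"]
(-1 + x)*(1 + x)*(1 - x + x^2)*(1 + x + x^2)
This appends another expression to the same file.
In[6]:= Factor[x^8 - 1] >>> tmp
Both expressions are now in the file.
In[7]:= FilePrint["tmp"]
(-1 + x)*(1 + x)*(1 - x + x^2)*(1 + x + x^2)
(-1 + x)*(1 + x)*(1 + x^2)*(1 + x^4)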
If you are familiar with command-line operating systems, you will recognize the Mathematica redirection operators >>, >>> and << as being analogous to the command-line operators >, >> and <.
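The same overwrite-versus-append behavior can be checked with the function forms Put and PutAppend, which correspond to >> and >>>. This is a sketch assuming a scratch file named putdemo.m in $TemporaryDirectory (a name chosen only for illustration). The function forms are used here because in expr >> name the right-hand side is taken literally as a file name, so a variable holding a computed path cannot be used there.

```mathematica
file = FileNameJoin[{$TemporaryDirectory, "putdemo.m"}];

Put[Factor[x^6 - 1], file];        (* like expr >> file: replaces the contents *)
PutAppend[Factor[x^8 - 1], file];  (* like expr >>> file: adds to the end *)
both = ReadList[file];             (* both expressions, read back in order *)

Put[Factor[x^2 - 1], file];        (* overwrites the two earlier expressions *)
after = ReadList[file];

{Length[both], Length[after]}      (* {2, 1} *)
```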
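If you want to store an expression in output format rather than input format, you can wrap OutputForm around it before writing it to the file.
In[8]:= OutputForm[Factor[x^6 - 1]] >> tmp
The expression in tmp is now in output format.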
In[9]:= FilePrint["tmp"]
                           2            2
(-1 + x) (1 + x) (1 - x + x ) (1 + x + x )
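Save["file", symbol]                    save definitions for a variable or function in a file
Save["file", "form"]                    save definitions for all symbols whose names match the string pattern form
Save["file", "context`"]                save definitions for all symbols in the specified context
Save["file", {object1, object2, ...}]   save definitions for several objects
Saving definitions in plain text files.
This assigns a value to the symbol a.
In[51]:= a = 2 - x^2
Out[51]= 2 - x^2
You can use Save to write the definition of a to a file.
In[52]:= Save["afile", a]
Here is the definition of a that was saved in the file.
In[53]:= FilePrint["afile"]
a = 2 - x^2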
This defines a function f which depends on the symbol a previously defined.
In[54]:=
This saves the complete definition of f in a file.
In[55]:= Save["ffile", f]
The file contains not only the definition of f itself, but also the definition of the symbol a on which f depends.
In[56]:= FilePrint["ffile"]
This clears the definitions of f and a.
In[57]:= Clear[f, a]
You can reinstate the definitions you saved simply by reading in the file ffile.
In[58]:= << ffile
Out[58]= 2 - x^2
The function Save makes use of the output forms Definition and FullDefinition, which print as definitions of Mathematica symbols. In some cases, you may find it convenient to use these output forms directly.
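The output form Definition[f] prints as the sequence of definitions that have been made for f.
In[59]:= Definition[f]
FullDefinition[f] prints as the sequence of definitions for f and for all the objects on which f depends.
In[60]:= FullDefinition[f]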
When you define a new object in Mathematica, your definition will often depend on other objects that you defined before. If you are going to be able to reconstruct the definition of your new object in a subsequent Mathematica session, it is important that you store not only its own definition, but also the definitions of other objects on which it depends. The function Save looks through the definitions of the objects you ask it to save, and automatically also saves all definitions of other objects on which it can see that these depend. However, in order to avoid saving a large amount of unnecessary material, Save never includes definitions for symbols that have the attribute Protected. It assumes that the definitions for these symbols are also built in. Nevertheless, with such definitions taken care of, it should always be the case that reading the output generated by Save back into a new Mathematica session will set up the definitions of your objects exactly as you had them before.
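Here is a small sketch of that dependency tracking, using the hypothetical symbols scaled and rate and a scratch file in $TemporaryDirectory. Only scaled is passed to Save; rate is written out as well because the definition of scaled refers to it.

```mathematica
(* Hypothetical definitions: scaled depends on the auxiliary symbol rate *)
rate = 3;
scaled[t_] := rate t;

file = FileNameJoin[{$TemporaryDirectory, "scaled.m"}];
If[FileExistsQ[file], DeleteFile[file]];  (* Save appends to an existing file *)
Save[file, scaled];                       (* writes the definitions of scaled and rate *)

Clear[scaled, rate];                      (* forget both definitions *)
Get[file];                                (* reinstate them from the file *)
scaled[2]                                 (* 6 *)
```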
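Encode["source", "dest", MachineID -> "ID"]   create an encoded file which can only be read on a computer with the specified $MachineID
This writes an expression to the file tmp.
In[61]:= Factor[x^2 - 1] >> tmp
This writes an encoded version of the file tmp to the file tmp.x.
In[62]:= Encode["tmp", "tmp.x"]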
Here are the contents of the encoded file. The only recognizable part is the special Mathematica comment at the beginning.
In[63]:= FilePrint["tmp.x"]
(*!1N!*)mcm _QZ9tcI1cfre*Wo8:) P
Even though the file is encoded, you can still read it into Mathematica using the << operator.
In[64]:= << tmp.x
Out[64]= (-1 + x) (1 + x)
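A self-contained version of the encoding round trip, sketched with scratch files plain.m and plain.x in $TemporaryDirectory (names chosen only for this example):

```mathematica
src = FileNameJoin[{$TemporaryDirectory, "plain.m"}];
enc = FileNameJoin[{$TemporaryDirectory, "plain.x"}];

Put[Factor[x^2 - 1], src];               (* ordinary plain-text input file *)
If[FileExistsQ[enc], DeleteFile[enc]];   (* start from a clean target *)
Encode[src, enc];                        (* write the encoded copy *)

encoded = Get[enc]   (* (-1 + x) (1 + x), read in despite the encoding *)
```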
DumpSave["file.mx", symbol]       save definitions for a symbol in internal Mathematica format
DumpSave["file.mx", "context`"]   save definitions for all symbols in a context
If you have to read in very large or complicated definitions, you will often find it more efficient to store these definitions in internal Mathematica format, rather than as text. You can do this using DumpSave.
This saves the definition for f in internal Mathematica format.
In[22]:= DumpSave["ffile.mx", f]
Out[22]= {f}
You can still use << to read in the definitions.
In[23]:= << ffile.mx
<< recognizes when a file contains definitions in internal Mathematica format, and operates accordingly. One subtlety is that the internal Mathematica format differs from one computer system to another. As a result, .mx files created on one computer cannot typically be read on another.
If you use DumpSave["package`", ...], then Mathematica will write out definitions to a file with a name like package.mx/system/package.mx, where system identifies your type of computer system.
This creates a file with a name that reflects the name of the computer system being used.
In[24]:= DumpSave["gffile`", f]
Out[24]= {f}
<< automatically picks out the file with the appropriate name for your computer system.
In[25]:= << gffile`
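The following sketch saves a hypothetical function h in internal format, clears it, and restores it with Get. Since the .mx file is specific to the type of computer system that wrote it, the sketch also displays $SystemID, which identifies the current system type.

```mathematica
(* Hypothetical function to be stored in internal (.mx) format *)
h[n_] := n^2 + 1;

mx = FileNameJoin[{$TemporaryDirectory, "hfile.mx"}];
DumpSave[mx, h];     (* returns {h} *)

Clear[h];
Get[mx];             (* Get recognizes the internal format *)
h[3]                 (* 10 *)

$SystemID            (* identifies the type of computer system in use *)
```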
External Programs
On most computer systems, you can execute external programs or commands from within Mathematica. Often you will want to take expressions you have generated in Mathematica, and send them to an external program, or take results from external programs, and read them into Mathematica. Mathematica supports two basic forms of communication with external programs: structured and unstructured.
Structured communication     use MathLink to exchange expressions with MathLink-compatible external programs
Unstructured communication   use file reading and writing operations to exchange ordinary text
The idea of structured communication is to exchange complete Mathematica expressions with external programs which are specially set up to handle such objects. The basis for structured communication is the MathLink system, discussed in "MathLink and External Program Communication". Unstructured communication consists of sending and receiving ordinary text from external programs. The basic idea is to treat an external program very much like a file, and to support the same kinds of reading and writing operations.
<< file                        read in a file
<< "!command"                  run an external command, and read in the output it produces
expr >> "!command"             feed the textual form of expr to an external command
ReadList["!command", Number]   run an external command, and read in a list of the numbers it produces
In general, wherever you might use an ordinary file name, Mathematica allows you instead to give a pipe, written as an external command, prefaced by an exclamation point. When you use the pipe, Mathematica will execute the external command, and send or receive text from it.
This sends the result from FactorInteger to the external program lpr. On many Unix systems, this program generates a printout.
In[1]:= FactorInteger[2^31 - 1] >> !lpr
This executes the external command echo $TERM, then reads the result as Mathematica input.
In[2]:= << "!echo $TERM"
Out[2]= xterm
With a text-based interface, putting ! at the beginning of a line causes the remainder of the line to be executed as an external command. squares is an external program which prints numbers and their squares.
In[1]:= !squares 4
1 1
2 4
3 9
4 16
This runs the external command squares 4, then reads numbers from the output it produces.
In[3]:= Out[3]=
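Because squares is a local example program, here is a comparable sketch that uses the standard echo command instead. It assumes a Unix-like system on which pipes are available, and contrasts reading the command's output as Mathematica input with reading it as a list of numbers.

```mathematica
(* Get reads the output as Mathematica input:
   the text "1 2 3" is parsed as the product 1*2*3 *)
asInput = Get["!echo 1 2 3"]                   (* 6 *)

(* ReadList reads the same output as separate numbers *)
asNumbers = ReadList["!echo 1 2 3", Number]    (* {1, 2, 3} *)
```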
One point to notice is that you can get away with dropping the double quotes around the name of a pipe on the right-hand side of << or >> if the name does not contain any spaces or other special characters.
Pipes in Mathematica provide a very general mechanism for unstructured communication with external programs. On many computer systems, Mathematica pipes are implemented using pipe mechanisms in the underlying operating system; in some cases, however, other interprocess communication mechanisms are used. One restriction of unstructured communication in Mathematica is that a given pipe can only be used for input or for output, and not for both at the same time. In order to do genuine two-way communication, you need to use MathLink. Even with unstructured communication, you can nevertheless set up somewhat more complicated arrangements by using "temporary files". The basic idea is to write data to a file, then to read it as needed.
OpenWrite[]   open a new file with a unique name in the default area for temporary files on your computer system
Opening a "temporary file".
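A minimal sketch of the temporary-file idea: write data to a file opened with OpenWrite[], close it, and read it back. Close returns the name of the file, which is how the sketch learns the unique name that was chosen.

```mathematica
stream = OpenWrite[];          (* new file with a unique name *)
Write[stream, {1, 2, 3}];      (* store some data in it *)
tmpName = Close[stream];       (* Close returns the file's name *)

data = Get[tmpName]            (* {1, 2, 3} *)
DeleteFile[tmpName];           (* tidy up when done *)
```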
Particularly when you work with temporary files, you may find it useful to be able to execute external commands which do not explicitly send or receive data from Mathematica. You can do this using the Mathematica function Run.
Run["command", arg1, ...]   run an external command from within Mathematica
Running external commands without input or output.
This executes the external Unix command date. The returned value is an "exit code" from the operating system.
In[4]:= Run["date"]
Out[4]= 0
Note that when you use Run, you must not preface commands with exclamation points. Run simply takes the textual forms of the arguments you specify, then joins them together with spaces in between, and executes the resulting string as an external shell command.
It is important to realize that Run never "captures" any of the output from an external command. As a result, where this output goes is purely determined by your operating system. Similarly, Run does not supply input to external commands. This means that the commands can get input through any mechanism provided by your operating system. Sometimes external commands may be able to access the same input and output streams that are used by Mathematica itself. In some cases, this may be what you want. But particularly if you are using Mathematica with a front end, this can cause considerable trouble.
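Since Run does not capture output, one workaround is to let the shell redirect the output into a file and then read that file. This sketch assumes a Unix-like system with an echo command; the file name runout.txt is made up for the example, and the path is quoted in case it contains spaces.

```mathematica
out = FileNameJoin[{$TemporaryDirectory, "runout.txt"}];

(* The shell redirection, not Run, is what saves the output *)
code = Run["echo hello > \"" <> out <> "\""];

code                     (* 0: the command succeeded *)
ReadList[out, String]    (* {"hello"} *)
```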
RunThrough["command", expr]   run command, using expr as input, and reading the output back into Mathematica
As discussed above, << and >> cannot be used to both send and receive data from an external program at the same time. Nevertheless, by using temporary files, you can effectively both send and receive data from an external program while still using unstructured communication. The function RunThrough writes the text of an expression to a temporary file, then feeds this file as input to an external program, and captures the output as input to Mathematica. Note that in RunThrough, like Run, you should not preface the names of external commands with exclamation points.
This feeds the expression 789 to the external program cat, which in this case simply echoes the text of the expression. The output from cat is then read back into Mathematica.
In[5]:= RunThrough["cat", 789]
Out[5]= 789
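To make the mechanism concrete, here is a rough reimplementation of the idea behind RunThrough. The helper runThroughSketch is hypothetical and is not how RunThrough is actually implemented; it assumes a Unix-like shell that understands < and > redirection.

```mathematica
(* Hypothetical helper that mimics the temporary-file mechanism *)
runThroughSketch[cmd_String, expr_] :=
  Module[{in, inName, outName, result},
    in = OpenWrite[];                 (* temporary input file *)
    Write[in, expr];                  (* input form of expr *)
    inName = Close[in];
    outName = Close[OpenWrite[]];     (* empty temporary output file *)
    Run[cmd <> " < \"" <> inName <> "\" > \"" <> outName <> "\""];
    result = Get[outName];            (* read the output back in *)
    DeleteFile[{inName, outName}];
    result]

runThroughSketch["cat", 789]          (* 789 *)
```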
SystemOpen["target"]   open the specified file, URL or other target with the associated program on your computer system
Opening files with external programs.
This opens the URL using your system's preferred web browser.
In[6]:= SystemOpen["http://www.wolfram.com"]
SystemOpen uses settings in your operating system to determine how to open a URI or file. When opening files, it typically uses the same program that would be used if you double-clicked the file's icon.
When you open a file or a pipe, Mathematica creates a "stream object" that specifies the open stream associated with the file or pipe. In general, the stream object contains the name of the file or the external command used in a pipe, together with a unique number. The reason that the stream object needs to include a unique number is that in general you can have several streams connected to the same file or external program at the same time. For example, you may start several different instances of the same external program, each connected to a different stream. Nevertheless, when you have opened a stream, you can still refer to it using a simple file name or external command name so long as there is only one stream associated with this object.
This opens an output stream to the file tmp.
In[1]:= stmp = OpenWrite["tmp"]
This writes a sequence of expressions to the file.
In[2]:= Write[stmp, a, b, c]
Since you only have one stream associated with file tmp, you can refer to it simply by giving the name of the file.
In[3]:= Write["tmp", x]
This closes the stream.
In[4]:= Close[stmp]
Out[4]= tmp
Here are the contents of the file.
In[5]:= FilePrint["tmp"]
abc
x
OpenWrite["file"]                      open an output stream to a file, wiping out the previous contents of the file
OpenWrite[]                            open an output stream to a new temporary file
OpenAppend["file"]                     open an output stream to a file, appending to what was already in the file
OpenWrite["!command"]                  open an output stream to an external command
Write[stream, expr1, expr2, ...]       write a sequence of expressions to a stream, ending the output with a newline (line feed)
WriteString[stream, str1, str2, ...]   write a sequence of character strings to a stream, with no extra newlines
Close[stream]                          tell Mathematica that you are finished with a stream
Low-level output functions.
When you call Write[stream, expr], it writes an expression to the specified stream. The default is to write the expression in Mathematica input form. If you call Write with a sequence of expressions, it will write these expressions one after another to the stream. In general, it leaves no space between the successive expressions. However, when it has finished writing all the expressions, Write always ends its output with a newline.
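This sketch, using a scratch file lowlevel.txt in $TemporaryDirectory (a name chosen only for illustration), shows the difference in spacing and newlines between Write and WriteString:

```mathematica
file = FileNameJoin[{$TemporaryDirectory, "lowlevel.txt"}];
s = OpenWrite[file];

Write[s, 1, 2, 3];                        (* "123" followed by a newline *)
WriteString[s, "no newline", " added"];   (* strings exactly as given *)
Close[s];

ReadList[file, String]    (* {"123", "no newline added"} *)
```

Since Write puts no space between 1, 2 and 3, reading the first line back as Mathematica input would give the single number 123.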
This reopens the file tmp.
In[6]:= stmp = OpenWrite["tmp"]
This writes a sequence of expressions to the file, then closes the file.
In[7]:=
Out[7]= tmp
All the expressions are written in input form. The expressions from a single Write are put on the same line.
In[8]:= FilePrint["tmp"]
In[9]:= stmp = OpenWrite["tmp"]
In[10]:= WriteString[stmp, "Arbitrary output.\n", "More output."]
This writes another string, then closes the stream.
In[11]:=
Out[11]= tmp
Here are the contents of the file. The strings were written exactly as specified, including only the newlines that were explicitly given.
In[12]:= FilePrint["tmp"]
An important feature of the functions Write and WriteString is that they allow you to write output not just to a single stream, but also to a list of streams.
In using Mathematica, it is often convenient to define a channel which consists of a list of streams. You can then simply tell Mathematica to write to the channel, and have it automatically write the same object to several streams.
In a standard interactive Mathematica session, there are several output channels that are usually defined. These specify where particular kinds of output should be sent. Thus, for example, $Output specifies where standard output should go, while $Messages specifies where messages should go. The function Print then works essentially by calling Write with the $Output channel. Message works in the same way by calling Write with the $Messages channel. "The Main Loop" lists the channels used in a typical Mathematica session.
Note that when you run Mathematica through MathLink, a different approach is usually used. All output is typically written to a single MathLink link, but each piece of output appears in a packet which indicates what type it is.
In most cases, the names of files or external commands that you use in Mathematica correspond exactly with those used by your computer's operating system. On some systems, however, Mathematica supports various streams with special names.
"stdout"   standard output
"stderr"   standard error
The special stream "stdout" allows you to give output to the standard output provided by the operating system. Note however that you can use this stream only with simple text-based interfaces to Mathematica. If your interaction with Mathematica is more complicated, then this stream will not work, and trying to use it may cause considerable trouble.
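A channel is simply a list of streams, so you can build one yourself; $Output is itself such a list. In this sketch (scratch file names chosen only for illustration), a single WriteString call sends the same text to two files.

```mathematica
f1 = FileNameJoin[{$TemporaryDirectory, "channel1.txt"}];
f2 = FileNameJoin[{$TemporaryDirectory, "channel2.txt"}];

(* A channel: a plain list of open output streams *)
channel = {OpenWrite[f1], OpenWrite[f2]};
WriteString[channel, "same text\n"];   (* written to both streams *)
Close /@ channel;

{ReadList[f1, String], ReadList[f2, String]}   (* {{"same text"}, {"same text"}} *)
```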
option name
FormatType          the default output format to use
PageWidth           the width of the page in characters
NumberMarks         whether to include ` marks in approximate numbers
CharacterEncoding   encoding to be used for special characters
You can associate a number of options with output streams. You can specify these options when you first open a stream using OpenWrite or OpenAppend.
This opens a stream, specifying that the default output format used should be OutputForm .
In[13]:=
Out[14]= tmp
In[15]:= FilePrint["tmp"]
2 x + y
2 z
Note that you can always override the output format specified for a particular stream by wrapping a particular expression you write to the stream with an explicit Mathematica format directive, such as OutputForm or TeXForm.
The option PageWidth gives the width of the page available for textual output from Mathematica. All lines of output are broken so that they fit in this width. If you do not want any lines to be broken, you can set PageWidth -> Infinity. Usually, however, you will want to set PageWidth to the value appropriate for your particular output device. On many systems, you will have to run an external program to find out what this value is. Using SetOptions, you can make the default rule for PageWidth be, for example, PageWidth :> << "!devicewidth", so that an external program is run automatically to find the value of the option.
This opens a stream, specifying that the page width is 20 characters.
In[16]:=
Out[17]= tmp
The lines in the expression written out are all broken so as to be at most 20 characters long.
In[18]:= FilePrint["tmp"]
stmp = OpenWrite["tmp"]
Options shows the options you have set for the open stream.
In[21]:= Options[stmp]
TotalWidth , TotalHeight , CharacterEncoding Automatic, NumberMarks $NumberMarks<
In[22]:= Close[stmp]
Out[22]= tmp
Options[$Output]   find the options set for all streams in the channel $Output
At every point in your session, Mathematica maintains a list Streams[] of all the input and output streams that are currently open, together with their options. In some cases, you may find it useful to look at this list directly. Mathematica will not, however, allow you to modify the list, except indirectly through OpenRead and so on.
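As a sketch of stream options and of Streams[], the following opens a stream on a made-up scratch file with PageWidth and FormatType settings, reads the settings back with Options, and checks that the stream appears in Streams[] only while it is open.

```mathematica
file = FileNameJoin[{$TemporaryDirectory, "narrow.txt"}];
s = OpenWrite[file, PageWidth -> 20, FormatType -> OutputForm];

settings = {PageWidth, FormatType} /. Options[s]   (* {20, OutputForm} *)
openBefore = MemberQ[Streams[], s]                  (* True: s is open *)

Close[s];
openAfter = MemberQ[Streams[], s]                   (* False: s has been closed *)
```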
At any given time, however, you have a current working directory, and you can refer to files or other directories by specifying where they are relative to this directory. Typically you can refer to files or directories that are actually in this directory simply by giving their names, with no directory information.
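Directory[]           your current working directory
SetDirectory["dir"]   set your current working directory
ResetDirectory[]      revert to your previous working directory
This gives your current working directory.
In[1]:= Directory[]
Out[1]= /users/sw
This sets your current working directory to be the Examples subdirectory.
In[2]:= SetDirectory["Examples"]
Out[2]= /users/sw/Examples
Now your current working directory is different.
In[3]:= Directory[]
Out[3]= /users/sw/Examples
This reverts to your previous working directory.
In[4]:= ResetDirectory[]
Out[4]= /users/sw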
When you call SetDirectory, you can give any directory name that is recognized by your operating system. Thus, for example, on Unix-based systems, you can specify a directory one level up in the directory hierarchy using the notation .., and you can specify your "home" directory as ~. Whenever you go to a new directory using SetDirectory, Mathematica always remembers what the previous directory was. You can return to this previous directory using ResetDirectory. In general, Mathematica maintains a stack of directories, given by DirectoryStack[]. Every time you call SetDirectory, it adds a new directory to the stack, and every time you call ResetDirectory it removes a directory from the stack.
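The directory stack can be observed directly with DirectoryStack[]. This sketch moves two levels, using $TemporaryDirectory and its parent purely as convenient example locations, then unwinds the stack with ResetDirectory:

```mathematica
start = Directory[];
depth0 = Length[DirectoryStack[]];

SetDirectory[$TemporaryDirectory];   (* pushes the starting directory *)
SetDirectory[ParentDirectory[]];     (* one level up, like .. on Unix *)
depth2 = Length[DirectoryStack[]];

ResetDirectory[];                    (* back to $TemporaryDirectory *)
ResetDirectory[];                    (* back to the starting directory *)
{depth2 - depth0, Directory[] === start}   (* {2, True} *)
```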
ParentDirectory[]        the parent of your current working directory
$InitialDirectory        the initial directory when Mathematica was started
$HomeDirectory           your home directory, if this is defined
$BaseDirectory           the base directory for systemwide files to be loaded by Mathematica
$UserBaseDirectory       the base directory for user-specific files to be loaded by Mathematica
$InstallationDirectory   the top-level directory in which your Mathematica installation resides
Finding a File
Whenever you ask for a particular file, Mathematica in general goes through several steps to try and find the file you want. The first step is to use whatever standard mechanisms exist in your operating system or shell.
Mathematica scans the full name you give for a file, and looks to see whether it contains any of the "metacharacters" *, $, ~, ?, [, ", and '. If it finds such characters, then it passes the full name to your operating system or shell for interpretation. This means that if you are using a Unix-based system, then constructions like name* and $VAR will be expanded at this point. In general, Mathematica takes whatever was returned by your operating system or shell, and treats this as the full file name.
For output files, this is the end of the processing that Mathematica does. If Mathematica cannot find a unique file with the name you specified, then it will proceed to create the file.
If you are trying to get input from a file, however, then there is another round of processing that Mathematica does. What happens is that Mathematica looks at the value of the Path option for the function you are using to determine the names of directories relative to which it should search for the file. The default setting for the Path option is the global variable $Path.
$Path
Search path for files.
In general, the global variable $Path is defined to be a list of strings, with each string representing a directory. Every time you ask for an input file, what Mathematica effectively does is temporarily to make each of these directories in turn your current working directory, and then from that directory to try and find the file you have requested.
Here is a typical setting for $Path. The current directory (.) and your home directory (~) are listed first.
In[5]:= $Path
Out[5]= {., ~, /users/math/bin, /users/math/Packages}
FindFile["name"]      find the file with the specified name that would be loaded by Get and related functions
FileExistsQ["name"]   determine whether the file exists
FindFile searches all directories in $Path and returns the absolute name of the file that would be loaded by Get, Needs, and other functions. FileExistsQ tests whether the file with the given name exists.
In[5]:= FindFile["init.m"]
Out[5]= "C:\\Documents and Settings\\sw\\Application Data\\Mathematica\\Kernel\\init.m"
FindFile applied to a package name returns the absolute name of the init.m file from that package.
In[5]:= FindFile["Combinatorica`"]
Out[5]= "C:\\Program Files\\Wolfram Research\\Mathematica\\7.0\\AddOns\\Packages\\Combinatorica\\Kernel\\init.m"
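To see how $Path affects FindFile and Get, this sketch creates a hypothetical directory pathdemo containing a file answer.m, and adds the directory to $Path only inside a Block. It assumes that no other file named answer.m is found earlier on the search path, for example in your current directory.

```mathematica
(* Hypothetical directory and file, created only for this illustration *)
dir = FileNameJoin[{$TemporaryDirectory, "pathdemo"}];
If[! DirectoryQ[dir], CreateDirectory[dir]];
Put[42, FileNameJoin[{dir, "answer.m"}]];

(* Add dir to the search path only for the duration of the Block *)
Block[{$Path = Append[$Path, dir]},
  found = FindFile["answer.m"];   (* absolute name, located via $Path *)
  value = Get["answer.m"]         (* Get also searches $Path *)
];

{FileExistsQ[found], value}       (* {True, 42} *)
```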
FileNames[forms, $Path, Infinity]   give all files whose names match forms in any subdirectory of the directories in $Path
Getting lists of files in particular directories.
FileNames returns a list of strings corresponding to file names. When it returns a file that is not in your current directory, it gives the name of the file relative to the current directory. Note that all names are given in the format appropriate for the particular computer system on which they were generated.
Here is a list of all files in the current working directory whose names end with .m.
In[6]:= FileNames["*.m"]
Out[6]= {alpha.m, control.m, signals.m, test.m}
This lists files whose names start with a in the current directory, and in subdirectories with names that start with P.
In[7]:= Out[7]=
The file name form you give to FileNames can use any of Mathematica's string pattern objects, typically combined with the ~~ operator.
This gives a list of all files in your current working directory whose names match the form Test*.m.
In[3]:= FileNames["Test*.m"]
Out[3]= {Test1.m, Test2.m, TestFinal.m}
This lists only those files with names of the form Testd.m, where d is a sequence of one or more digits.
In[3]:= Out[3]=
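Since the directory listing above belongs to a particular machine, here is a self-contained sketch that recreates the three Test files in a scratch directory (the directory name filenamesdemo is made up) and applies both a wildcard form and a string pattern built with ~~:

```mathematica
dir = FileNameJoin[{$TemporaryDirectory, "filenamesdemo"}];
If[DirectoryQ[dir], DeleteDirectory[dir, DeleteContents -> True]];
CreateDirectory[dir];

(* Recreate the three files from the listing above; contents are irrelevant *)
Scan[Put[0, FileNameJoin[{dir, #}]] &, {"Test1.m", "Test2.m", "TestFinal.m"}];

wild = Sort[FileNameTake /@ FileNames["Test*.m", {dir}]];
digits = Sort[FileNameTake /@ FileNames["Test" ~~ DigitCharacter .. ~~ ".m", {dir}]];

{wild, digits}
(* {{"Test1.m", "Test2.m", "TestFinal.m"}, {"Test1.m", "Test2.m"}} *)
```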
Composing a Filename
DirectoryName["file"]                       extract the directory name from a file name
ToFileName["directory", "name"]             assemble a full file name from a directory name and a file name
ParentDirectory["directory"]                give the parent of a directory
ToFileName[{"dir1", "dir2", ...}, "name"]   assemble a full file name from a hierarchy of directory names
You should realize that different computer systems may give file names in different ways. Thus, for example, Windows systems typically give names in the form dir:\dir\dir\name and Unix systems give names in the form dir/dir/name. The function ToFileName assembles file names in the appropriate way for the particular computer system you are using.
This gives the directory portion of the file name.
In[8]:= DirectoryName["Packages/Math/test.m"]
Out[8]= Packages/Math/
This constructs the full name of another file in the same directory as test.m.
In[9]:= ToFileName[%, "abc.m"]
Out[9]= Packages/Math/abc.m
FileNameSplit["name"]       split the file name into a list of directory and file names
FileNameJoin[{dir1, ...}]   combine a list of directory and file names into the file name
FileNameTake["name", ...]   extract part of the file name
FileNameDrop["name", ...]   drop parts of the file name
FileNameDepth["name"]       get the number of path elements in the file name
$PathnameSeparator          path name separator used in your operating system
Manipulating file names.
Functions like FileNameSplit and FileNameJoin provide additional operations on file names. They respect the file name separator used by your operating system and will split the file name appropriately. FileNameJoin will by default use $PathnameSeparator to produce the name in a canonical form suitable for your operating system.
If you want to set up a collection of related files, it is often convenient to be able to refer to one file when you are reading another one. The global variable $Input gives the name of the file from which input is currently being taken. Using DirectoryName and ToFileName, you can then conveniently specify the names of other related files.
$Input   the name of the file or stream from which input is currently being taken
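A brief sketch of the file name functions FileNameSplit, FileNameDepth, FileNameTake and FileNameDrop, applied to an illustrative name built with FileNameJoin so that the separator matches your operating system:

```mathematica
(* Build the name with FileNameJoin so the separator suits your system *)
name = FileNameJoin[{"Packages", "Math", "test.m"}];

FileNameSplit[name]       (* {"Packages", "Math", "test.m"} *)
FileNameDepth[name]       (* 3 *)
FileNameTake[name]        (* "test.m", the last path element *)
FileNameDrop[name, -1]    (* the same as FileNameJoin[{"Packages", "Math"}] *)
$PathnameSeparator        (* "/" on Unix, "\\" on Windows *)
```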
One issue in handling files in Mathematica is that the form of file and directory names varies between computer systems. This means for example that names of files which contain standard Mathematica packages may be quite different on different systems. Through a sequence of conventions, it is however possible to read in a standard Mathematica package with the same command on all systems. The way this works is that each package defines a so-called Mathematica context, of the form name`name`. On each system, all files are named in correspondence with the contexts they define. Then when you use the command << name`name` Mathematica automatically translates the context name into the file name appropriate for your particular computer system.
file.m     Mathematica expression file in plain text format
file.nb    Mathematica notebook file
file.mx    Mathematica definitions in DumpSave format
If you use a notebook interface to Mathematica, then the Mathematica front end allows you to save complete notebooks, including not only Mathematica input and output, but also text, graphics and other material. It is conventional to give Mathematica notebook files names that end in .nb, and most versions of Mathematica enforce this convention.
ParentDirectory[]     the parent of your current working directory
$InitialDirectory     the initial directory when Mathematica was started
You can use FileBaseName and FileExtension to extract the base name of a file and its extension.
When you open a notebook in the Mathematica front end, Mathematica immediately displays the contents of the notebook, but it does not normally send any of these contents to the kernel for evaluation until you explicitly request this. Within a Mathematica notebook, however, you can use the Cell menu in the front end to identify certain cells as initialization cells; the contents of these cells are then evaluated automatically whenever you open the notebook.
The I in the cell bracket indicates that the second cell is an initialization cell that will be evaluated whenever the notebook is opened.
It is sometimes convenient to maintain Mathematica material both in a notebook which contains explanatory text, and in a package which contains only raw Mathematica definitions. You can do this by putting the Mathematica definitions into initialization cells in the notebook. Every time you save the notebook, the front end will then allow you to save an associated .m file which contains only the raw Mathematica definitions.
<<context`
Using contexts to specify files.
This reads in one of the standard packages that come with Mathematica.
In[1]:=
<< VectorAnalysis`
name.mx                      file in DumpSave format
name.mx/$SystemID/name.mx    file in DumpSave format for your computer system
name.m                       file in Mathematica source format
name/init.m                  initialization file for a particular directory
dir/…                        files in other directories specified by $Path
Mathematica is set up so that << name` will automatically try to load the appropriate version of a file. It will first try to load a name.mx file that is optimized for your particular computer system. If it finds no such file, then it will try to load a name.m file containing ordinary system-independent Mathematica input.
If name is a directory, then Mathematica will try to load the initialization file init.m in that directory. The purpose of the init.m file is to provide a convenient way to set up Mathematica packages that involve many separate files. The idea is to allow you to give just the command << name`, but then to load init.m to initialize the whole package, reading in whatever other files are necessary.
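CopyFile["file1", "file2"]      copy file1 to file2
RenameFile["file1", "file2"]    give file1 the name file2
DeleteFile["file"]              delete a file
FileByteCount["file"]           give the number of bytes in a file
FileDate["file"]                give the modification date for a file
SetFileDate["file"]             set the modification date for a file to be the current date
FileType["file"]                give the type of a file as File, Directory or None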
Different operating systems have different commands for manipulating files. Mathematica provides a simple set of file manipulation functions, intended to work in the same way under all operating systems. Notice that CopyFile and RenameFile give the final file the same modification date as the original one. FileDate returns modification dates in the {year, month, day, hour, minute, second} format used by DateList.
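Here is a minimal sketch of these functions working together. It assumes you have write access to $TemporaryDirectory. The file names dm-src.bin and dm-dst.bin are hypothetical. BinaryWrite, described under Binary Files, is used here only to create a file of known size.

```mathematica
(* Hypothetical scratch files in the system temporary directory *)
src = FileNameJoin[{$TemporaryDirectory, "dm-src.bin"}];
dst = FileNameJoin[{$TemporaryDirectory, "dm-dst.bin"}];

(* Remove leftovers from an earlier run, since CopyFile does not overwrite *)
Scan[If[FileType[#] =!= None, DeleteFile[#]] &, {src, dst}];

(* Write four raw bytes, then close the stream that BinaryWrite opened *)
BinaryWrite[src, {1, 2, 3, 4}];
Close[src];

CopyFile[src, dst];
info = {FileType[dst], FileByteCount[dst]}    (* {File, 4} *)

(* Modification date in DateList form: {year, month, day, hour, minute, second} *)
FileDate[dst]

Scan[DeleteFile, {src, dst}];
FileType[dst]                                 (* None *)
```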
CreateDirectory["name"]                            create a new directory
DeleteDirectory["name"]                            delete an empty directory
DeleteDirectory["name", DeleteContents -> True]    delete a directory and all files and directories it contains
CopyDirectory["name1", "name2"]                    copy a directory and all the files it contains
Functions for manipulating directories.
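The following sketch shows why DeleteContents -> True is needed once a directory is not empty. It assumes write access to $TemporaryDirectory and uses a hypothetical directory name, dm-demo. Put and Get serve only to place a small file inside the directory and read it back.

```mathematica
(* Hypothetical scratch directory and a name for its copy *)
dir = FileNameJoin[{$TemporaryDirectory, "dm-demo"}];
copy = dir <> "-copy";
Scan[If[FileType[#] === Directory, DeleteDirectory[#, DeleteContents -> True]] &, {dir, copy}];

CreateDirectory[dir];
(* Put one file inside, so the directory is no longer empty *)
Put[{1, 2, 3}, FileNameJoin[{dir, "data.m"}]];

(* Copy the whole directory and read the file back from the copy *)
CopyDirectory[dir, copy];
back = Get[FileNameJoin[{copy, "data.m"}]]    (* {1, 2, 3} *)

(* DeleteDirectory[dir] on its own would fail here, because dir is not empty *)
Scan[DeleteDirectory[#, DeleteContents -> True] &, {dir, copy}];
```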
FilePrint["ExampleData/numbers"]
11.1 44.4
22.2 55.5
33.3 66.6
This reads all the numbers in the file, and returns a list of them.
In[2]:=
ReadList["ExampleData/numbers", Number]
ReadList["file", {Number, Number}]    read numbers from a file, putting each successive pair into a separate list
ReadList can handle numbers given in Fortran-like "E" notation. Thus, for example, ReadList will read 2.5E+5 as 2.5 × 10^5. Note that ReadList can handle numbers with any number of digits of precision.
Here is a file containing numbers in Fortran-like "E" notation.
In[5]:=
FilePrint["ExampleData/bignum"]
4.5E-5 2.5E2
7.8E4 -8.9
ReadList["ExampleData/bignum", Number]
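You can check the values ReadList produces from this kind of input without the ExampleData file by reading from a string instead. This minimal sketch uses StringToStream, which is described later in this document, with the same four numbers typed in by hand.

```mathematica
(* An input stream over a string instead of a file *)
str = StringToStream["4.5E-5 2.5E2\n7.8E4 -8.9"];

(* Number accepts Fortran-like E notation *)
nums = ReadList[str, Number]    (* the four values 4.5*10^-5, 250., 78000., -8.9 *)

Close[str];
```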
ReadList can read not only numbers, but also a variety of other types of object. Each type of object is specified by a symbol such as Number.
Here is a file containing text.
In[7]:=
FilePrint["ExampleData/strings"]
ReadList["ExampleData/strings", Character]
Out[8]= {"H", "e", "r", "e", " ", "i", "s", " ", "t", "e", "x", "t", ".", " ", "\n", "A", "n", "d", " ", "m", "o", "r", "e", " ", "t", "e", "x", "t", ".", "\n"}
Here are the integer codes corresponding to each of the bytes in the file.
In[9]:=
ReadList["ExampleData/strings", Byte]
Out[9]= {72, 101, 114, 101, 32, 105, 115, 32, 116, 101, 120, 116, 46, 32, 10, 65, 110, 100, 32, 109, 111, 114, 101, 32, 116, 101, 120, 116, 46, 10}
This puts the data from each line in the file into a separate list.
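In[10]:=
ReadList["ExampleData/strings", Byte, RecordLists -> True]
Out[10]= {{72, 101, 114, 101, 32, 105, 115, 32, 116, 101, 120, 116, 46, 32}, {65, 110, 100, 32, 109, 111, 114, 101, 32, 116, 101, 120, 116, 46}}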
Byte                single byte of data, returned as an integer
Character           single character, returned as a one-character string
Real                approximate number in Fortran-like notation
Number              exact or approximate number in Fortran-like notation
Word                sequence of characters delimited by word separators
Record              sequence of characters delimited by record separators
String              string terminated by a newline
Expression          complete Mathematica expression
Hold[Expression]    complete Mathematica expression, returned inside Hold
Types of objects to read.
ReadList["ExampleData/strings", Word]
ReadList allows you to read words from a file. It considers a word to be any sequence of characters delimited by word separators. You can set the option WordSeparators to specify the strings you want to treat as word separators. The default is to include spaces and tabs, but not to include, for example, standard punctuation characters. Note that in all cases successive words can be separated by any number of word separators. These separators are never taken to be part of the actual words returned by ReadList.
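Here is a minimal sketch of WordSeparators. It reads from a string stream (StringToStream is covered later in this document), and the input text is made up for the example. Because the setting replaces the default separators, the space has to be listed explicitly.

```mathematica
str = StringToStream["red,green;;blue  cyan"];

(* Commas, semicolons and spaces all act as word separators;
   runs of adjacent separators count as a single break *)
words = ReadList[str, Word, WordSeparators -> {",", ";", " "}]
(* gives the strings "red", "green", "blue", "cyan" *)

Close[str];
```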
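option name         default value
RecordLists         False                   whether to make a separate list for the objects in each record
RecordSeparators    {"\r\n", "\n", "\r"}    separators for records
WordSeparators      {" ", "\t"}             separators for words
NullRecords         False                   whether to keep zero-length records
NullWords           False                   whether to keep zero-length words
TokenWords          {}                      words to take as tokens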
This reads the text in the file strings as a sequence of words, using the letter e and . as word separators.
In[12]:=
Out[12]= {H, r,
Mathematica considers any data file to consist of a sequence of records. By default, each line is considered to be a separate record. In general, you can set the option RecordSeparators to give a list of separators for records. Note that words can never cross record separators. As with word separators, any number of record separators can exist between successive records, and these separators are not considered to be part of the records themselves.
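This minimal sketch sets RecordSeparators on a made-up string stream. It then rewinds the stream and uses the RecordLists option from the table above to group words by record. Because words never cross record separators, each record's words stay together.

```mathematica
str = StringToStream["a b|c d e|f"];

(* Treat "|" rather than newlines as the record separator *)
recs = ReadList[str, Record, RecordSeparators -> {"|"}]
(* gives "a b", "c d e", "f" *)

(* Rewind, then group the words of each record into a sublist *)
SetStreamPosition[str, 0];
grouped = ReadList[str, Word, RecordSeparators -> {"|"}, RecordLists -> True]
(* gives {{"a", "b"}, {"c", "d", "e"}, {"f"}} *)

Close[str];
```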
By default, each line of the file is considered to be a record.
In[13]:=
FilePrint["ExampleData/sentences"]
FilePrint["ExampleData/source"]
This gives a list of the parts of the file that lie between (: and :) separators.
In[19]:=
By choosing appropriate separators, you can pick out specific parts of files.
In[20]:=
ReadList["ExampleData/source", Record, RecordSeparators -> {{"(: function ", "["}, {" :)", "]"}}]
Mathematica usually allows any number of appropriate separators to appear between successive records or words. Sometimes, however, when several separators are present, you may want to assume that a null record or null word appears between each pair of adjacent separators. You can do this by setting the options NullRecords -> True or NullWords -> True.
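Here is a minimal sketch of the difference NullWords makes. It reads a made-up string stream with colons as the only word separators, in the style of the words file that follows.

```mathematica
str = StringToStream["first:second::fourth"];

(* By default the two adjacent colons act as one separator *)
plain = ReadList[str, Word, WordSeparators -> {":"}]
(* gives "first", "second", "fourth" *)

(* With NullWords -> True, an empty word "" appears between the adjacent colons *)
SetStreamPosition[str, 0];
withNull = ReadList[str, Word, WordSeparators -> {":"}, NullWords -> True]
(* gives "first", "second", "", "fourth" *)

Close[str];
```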
Here is a file containing words separated by colons.
In[21]:=
FilePrint["ExampleData/words"]
first:second::fourth:::seventh
Here the repeated colons are treated as single separators.
In[22]:=
In most cases, you want words to be delimited by separators which are not themselves considered as words. Sometimes, however, it is convenient to allow words to be delimited by special token words, which are themselves words. You can give a list of such token words as a setting for the option TokenWords.
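Here is a minimal sketch of TokenWords on a short, made-up string stream in the style of the language file below. The token words split the text and are also returned in the result.

```mathematica
str = StringToStream["22*a*b+56*c"];

(* "+" and "*" split the text and are also returned as words themselves *)
tokens = ReadList[str, Word, TokenWords -> {"+", "*"}]
(* gives "22", "*", "a", "*", "b", "+", "56", "*", "c" *)

Close[str];
```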
Here is some text.
In[24]:=
FilePrint["ExampleData/language"]
22*a*b+56*c+13*a*d
This reads the text, using the specified token words to delimit words in the text.
In[25]:=
You can use ReadList to read Mathematica expressions from files. In general, each expression must end with a newline, although a single expression may go on for several lines.
Here is a file containing text that can be used as Mathematica input.
In[26]:=
FilePrint["ExampleData/exprs"]
x + y + z 2^8
This reads the text in exprs as Mathematica expressions.
In[27]:=
ReadList["ExampleData/exprs", Expression]
Out[27]= {x + y + z, 256}
ReadList["ExampleData/exprs", Hold[Expression]]
ReadList can insert the objects it reads into any Mathematica expression. The second argument to ReadList can consist of any expression containing symbols such as Number and Word specifying objects to read. Thus, for example, ReadList["file", {Number, Number}] inserts successive pairs of numbers that it reads into lists. Similarly, ReadList["file", Hold[Expression]] puts expressions that it reads inside Hold. If ReadList reaches the end of your file before it has finished reading a particular set of objects you have asked for, then it inserts the special symbol EndOfFile in place of the objects it has not yet read.
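Here is a minimal sketch of the EndOfFile padding, using a made-up string stream that holds an odd number of values. The last pair can only be half filled.

```mathematica
str = StringToStream["1 2 3 4 5"];

(* Read the numbers two at a time; the last pair runs past the end *)
pairs = ReadList[str, {Number, Number}]
(* gives {{1, 2}, {3, 4}, {5, EndOfFile}} *)

Close[str];
```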
Here is a file of numbers.
In[29]:=
FilePrint["ExampleData/numbers"]
11.1 44.4
22.2 55.5
33.3 66.6
The symbol EndOfFile appears in place of numbers that were needed after the end of the file was reached.
In[30]:=
ReadList["!command", type]    execute a command, and read its output
ReadList[stream, type]        read any input stream
This executes the Unix command date, and reads its output as a string.
In[31]:=
ReadList["!date", String]
Out[31]= {Thu Mar 31 19:20:36 CST 2005}
OpenRead["file"]         open a file for reading
OpenRead["!command"]     open a pipe for reading
Read[stream, type]       read an object of the specified type from a stream
Skip[stream, type]       skip over an object of the specified type in an input stream
Skip[stream, type, n]    skip over n objects of the specified type in an input stream
Close[stream]            close an input stream
Functions for reading from input streams.
ReadList allows you to read all the data in a particular file or input stream. Sometimes, however, you want to get data a piece at a time, perhaps doing tests to find out what kind of data to expect next.
When you read individual pieces of data from a file, Mathematica always remembers the current point that you are at in the file. When you call OpenRead, Mathematica sets up an input stream from a file, and makes your current point the beginning of the file. Every time you read an object from the file using Read, Mathematica sets your current point to be just after the object you have read. Using Skip, you can advance the current point past a sequence of objects without actually reading the objects.
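Here is a minimal sketch of Read and Skip advancing the current point. It uses a made-up string stream instead of a file; StringToStream is described later in this document.

```mathematica
str = StringToStream["10 20 30 40"];

first = Read[str, Number]     (* 10 *)
Skip[str, Number, 2];         (* move past 20 and 30 without returning them *)
last = Read[str, Number]      (* 40 *)
after = Read[str, Number]     (* EndOfFile, since nothing is left *)

Close[str];
```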
Here is a file of numbers.
In[32]:=
FilePrint["ExampleData/numbers"]
11.1 44.4
22.2 55.5
33.3 66.6
snum = OpenRead["ExampleData/numbers"]
Read[snum, Number]
Out[34]= 11.1
In[37]:=
ReadList[snum, Number]
Close[snum]
Out[38]= ExampleData/numbers
You can use the options WordSeparators and RecordSeparators in Read and Skip just as you do in ReadList. Note that if you try to read past the end of the file, Read returns the symbol EndOfFile.
Searching Files
FindList["file", "text"]                   get a list of all the lines in the file that contain the specified text
FindList["file", "text", n]                get a list of the first n lines that contain the specified text
FindList["file", {"text1", "text2", …}]    get lines that contain any of the texti
Finding lines that contain specified text.
Here is a file containing some text.
In[1]:=
FilePrint["ExampleData/textfile"]
Here is the first line of text.
And the second.
And the third. Here is the end.
This returns a list of all the lines in the file containing the text is.
In[2]:=
FindList["ExampleData/textfile", "is"]
Out[2]= {Here is the first line of text., And the third. Here is the end.}
FindList["ExampleData/textfile", "fourth"]
Out[3]= {}
By default, FindList scans successive lines of a file, and returns those lines which contain the text you specify. In general, however, you can get FindList to scan successive records, and return complete records which contain specified text. As in ReadList, the option RecordSeparators allows you to tell Mathematica what strings you want to consider as record separators. Note that by giving a pair of lists as the setting for RecordSeparators, you can specify different left and right separators. By doing this, you can make FindList search only for text which is between specific pairs of separators.
This finds all sentences ending with a period which contain And.
In[4]:=
Out[4]= {
option name         default value
RecordSeparators    {"\r\n", "\n", "\r"}    separators for records
AnchoredSearch      False                   whether to require the text searched for to be at the beginning of a record
WordSeparators      {" ", "\t"}             separators for words
WordSearch          False                   whether to require that the text searched for appear as a word
IgnoreCase          False                   whether to treat lowercase and uppercase letters as equivalent
This finds only the occurrence of Here which is at the beginning of a line in the file.
In[5]:=
In general, FindList finds text that appears anywhere inside a record. By setting the option WordSearch -> True, however, you can tell FindList to require that the text it is looking for appears as a separate word in the record. The option WordSeparators specifies the list of separators for words.
The text th does appear in the file, but not as a word. As a result, FindList returns an empty list.
In[6]:=
Out[6]= {}
Out[7]= {And the third. Here is the end., And the third. Here is the end.}
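You can try the IgnoreCase and WordSearch options on a string stream. This minimal sketch uses made-up text rather than the textfile example. Because FindList reads to the end of the stream, the sketch rewinds the stream with SetStreamPosition before the second search.

```mathematica
str = StringToStream["alpha\nBeta\nbeta gamma\nalphabet"];

(* Case-insensitive: matches both "Beta" and "beta gamma" *)
ci = FindList[str, "beta", IgnoreCase -> True]

(* Rewind, then require "alpha" to appear as a separate word,
   which excludes the line "alphabet" *)
SetStreamPosition[str, 0];
ws = FindList[str, "alpha", WordSearch -> True]

Close[str];
```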
It is often useful to call FindList on lists of files generated by functions such as FileNames.
Finding text in the output from an external program. This runs the external Unix command date in a text-based interface.
In[8]:=
! date
OpenRead["file"]        open a file for reading
OpenRead["!command"]    open a pipe for reading
Find[stream, text]      find the next occurrence of text
Close[stream]           close an input stream
Finding successive occurrences of text.
FindList works by making one pass through a particular file, looking for occurrences of the text you specify. Sometimes, however, you may want to search incrementally for successive occurrences of a piece of text. You can do this using Find. In order to use Find, you first have to open an input stream explicitly using OpenRead. Then, every time you call Find on this stream, it searches for the text you specify and makes the current point in the file be just after the record it finds. As a result, you can call Find several times to find successive pieces of text.
This opens an input stream for textfile.
In[10]:=
stext = OpenRead["ExampleData/textfile"]
Find[stext, "And"]
Calling Find again gives you the next line containing And.
In[12]:=
Find[stext, "And"]
Close[stext]
Out[13]= ExampleData/textfile
Once you have an input stream, you can mix calls to Find, Skip and Read. If you ever call FindList or ReadList, Mathematica will immediately read to the end of the input stream.
This opens the input stream.
In[14]:=
stext = OpenRead["ExampleData/textfile"]
This finds the first line which contains second, and leaves the current point in the file at the beginning of the next line.
In[15]:=
Find[stext, "second"]
Read can then read the word that appears at the beginning of the line.
In[16]:=
Read[stext, Word]
Out[16]= And
Skip[stext, Word, 3]
Mathematica finds is in the remaining text, and prints the entire record as output.
In[18]:=
Find[stext, "is"]
Close[stext]
Out[19]= ExampleData/textfile
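The same mix of Find and Read works on any input stream. This minimal sketch uses a made-up string stream in place of textfile. Find returns the whole record that matches, and the following Read picks up at the start of the next record.

```mathematica
str = StringToStream["alpha\nbeta one\ngamma delta"];

(* Find returns the whole record (here, a line) that contains the text *)
hit = Find[str, "beta"]        (* "beta one" *)

(* The current point is now at the start of the next line *)
next = Read[str, Word]         (* "gamma" *)

Close[str];
```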
StreamPosition[stream]                 find the position of the current point in an open stream
SetStreamPosition[stream, n]           set the position of the current point
SetStreamPosition[stream, 0]           set the current point to the beginning of a stream
SetStreamPosition[stream, Infinity]    set the current point to the end of a stream
Finding and setting the current point in a stream.
Functions like Read , Skip and Find usually operate on streams in an entirely sequential fashion. Each time one of the functions is called, the current point in the stream moves on. Sometimes, you may need to know where the current point in a stream is, and be able to reset it. On most computer systems, StreamPosition returns the position of the current point as an integer giving the number of bytes from the beginning of the stream.
This opens the stream.
In[20]:=
stext = OpenRead["ExampleData/textfile"]
When you first open the file, the current point is at the beginning, and StreamPosition returns 0.
In[21]:=
StreamPosition[stext]
Out[21]= 0
Read[stext, Record]
StreamPosition[stext]
Out[23]= 31
SetStreamPosition[stext, 5]
Out[24]= 5
Read[stext, Record]
Close[stext]
Out[26]= ExampleData/textfile
Read[str, Word]
Out[2]= A
ReadList[str, Word]
Close[str]
Out[4]= String
Input streams associated with strings work just like those with files. At any given time, there is a current position in the stream, which advances when you use functions like Read. The function StreamPosition[stream] gives the current position as the number of characters from the beginning of the string. You can explicitly set the current position using SetStreamPosition[stream, n].
Here is an input stream associated with a string.
In[5]:=
The current position is initially 0 characters from the beginning of the string.
In[6]:=
StreamPosition[str]
Out[6]= 0
Read[str, Number]
Out[7]= 123
The current position is now 3 characters from the beginning of the string.
In[8]:=
StreamPosition[str]
Out[8]= 3
This sets the current position to be 1 character from the beginning of the string.
In[9]:=
SetStreamPosition[str, 1]
Out[9]= 1
If you now read a number from the string, you get the 23 part of 123.
In[10]:=
Read[str, Number]
Out[10]= 23
SetStreamPosition[str, Infinity]
Out[11]= 11
If you now try to read from the stream, you will always get EndOfFile.
In[12]:=
Read[str, Number]
Out[12]= EndOfFile
Close[str]
Out[13]= String
Particularly when you are processing large volumes of textual data, it is common to read fairly long strings into Mathematica, then to use StringToStream to allow further processing of these strings within Mathematica. Once you have created an input stream using StringToStream, you can read and search the string using any of the functions discussed for files.
This puts the whole contents of textfile into a string.
In[14]:=
str = StringToStream[s]
This gives the lines of text in the string that contain is.
In[16]:=
FindList[str, "is"]
Out[16]= {Here is the first line of text., And the third. Here is the end.}
This resets the current position back to the beginning of the string.
In[17]:=
SetStreamPosition[str, 0]
Out[17]= 0
This finds the first occurrence of the in the string, and leaves the current point just after it.
In[18]:=
Out[18]= the
Read[str, Word]
Out[19]= first
Close[str]
Out[20]= String
Binary Files
Functions like Read and Write handle ordinary printable text. But in dealing with external data files or devices it is sometimes necessary to go to a lower level, and work directly with raw binary data. You can do this using BinaryRead and BinaryWrite.
BinaryRead[stream]                        read one byte
BinaryRead[stream, type]                  read an object of the specified type
BinaryRead[stream, {type1, type2, …}]     read a list of objects
BinaryWrite[stream, b]                    write one byte
BinaryWrite[stream, {b1, b2, …}]          write a sequence of bytes
BinaryWrite[stream, "string"]             write the characters in a string
BinaryWrite[stream, x, type]              write an object of the specified type
BinaryWrite[stream, {x1, x2, …}, type]    write a sequence of objects
"Byte"                  8-bit unsigned integer
"Character8"            8-bit character
"Character16"           16-bit character
"Complex64"             IEEE single-precision complex number
"Complex128"            IEEE double-precision complex number
"Complex256"            IEEE quad-precision complex number
"Integer8"              8-bit signed integer
"Integer16"             16-bit signed integer
"Integer32"             32-bit signed integer
"Integer64"             64-bit signed integer
"Integer128"            128-bit signed integer
"Real32"                IEEE single-precision real number
"Real64"                IEEE double-precision real number
"Real128"               IEEE quad-precision real number
"TerminatedString"      null-terminated string of 8-bit characters
"UnsignedInteger8"      8-bit unsigned integer
"UnsignedInteger16"     16-bit unsigned integer
"UnsignedInteger32"     32-bit unsigned integer
"UnsignedInteger64"     64-bit unsigned integer
"UnsignedInteger128"    128-bit unsigned integer
Types supported in BinaryRead and BinaryWrite.
This writes a sequence of bytes to a file.
In[1]:=
Out[1]= tmp
BinaryWrite automatically opens a stream for the file. This closes it.
In[2]:=
Close["tmp"];
This reads the first byte from the file, returning it as an integer.
In[3]:=
BinaryRead["tmp"]
Out[3]= 97
BinaryRead["tmp", "Character8"]
Out[4]= b
BinaryRead["tmp", "Integer32"]
Out[5]= EndOfFile
Like Read and Write, BinaryRead and BinaryWrite work with streams. But if you give a file name, they automatically open the specified file as a stream. To create a stream directly, you can use OpenRead or OpenWrite. On some computer systems, the option setting BinaryFormat -> True is required for any stream used with BinaryRead and BinaryWrite, in order to prevent possible corruption from such issues as newline translation.
In using Mathematica you are normally completely insulated from the raw representation of data inside your computer. But with BinaryRead and BinaryWrite this is no longer so. One of the subtleties that then arises is that different computers may take the bytes that make up numbers to be in different orders, as specified by their setting for $ByteOrdering.
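Here is a minimal sketch of the byte-order issue. It assumes write access to $TemporaryDirectory, and the file name dm-int32.bin is hypothetical. It sets the byte order explicitly with the ByteOrdering option (-1 for least significant byte first, 1 for most significant byte first), so the result does not depend on the $ByteOrdering of your machine.

```mathematica
f = FileNameJoin[{$TemporaryDirectory, "dm-int32.bin"}];

(* Write the 32-bit integer 1, least significant byte first *)
BinaryWrite[f, 1, "Integer32", ByteOrdering -> -1];
Close[f];

bytes = BinaryReadList[f, "Byte"]    (* {1, 0, 0, 0} *)

(* Reading the same bytes most significant byte first gives 2^24 *)
swapped = BinaryRead[f, "Integer32", ByteOrdering -> 1]    (* 16777216 *)
Close[f];

DeleteFile[f];
```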
This writes a 32-bit integer to a file.
In[6]:=
Out[6]= tmp2
Close["tmp2"];
This reads the integer back, but assumes an opposite byte ordering.
In[8]:=
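BinaryReadList["file"]              read all the bytes in a file
BinaryReadList["file", type]        read all the data, treating it as objects of a certain type
BinaryReadList["file", types, n]    read only the first n objects of the specified types
Reading complete binary files.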
Out[9]= tmp3
BinaryReadList["tmp3", "Byte"]
Out[10]= {0, 0, 0, 0, 0, 0, 0, 224, 89, 187, 237, 66, 115, 107, 1, 64}
BinaryReadList["tmp3", "Real32"]
This treats the data as pairs containing a byte and a 32-bit real.
In[12]:=
Out[12]= {{0, 0.}, {0, -0.00332451}, {237, 4.32454 × 10^-38}, {64, EndOfFile}}
BinaryRead and BinaryWrite allow complete flexibility in reading and writing raw binary data. But in many practical applications one instead wants to work only with particular predefined formats. You can do this using Import and Export. In addition to many complex formats, Import and Export support files containing sequences of identical data elements, of the same types as in BinaryRead and BinaryWrite. They also support the "Bit" format, consisting of individual binary bits, represented as 0 or 1.
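Here is a minimal sketch of one of these element formats. It assumes write access to $TemporaryDirectory, and the file name dm-data.bin is hypothetical. Export and Import use the same element type, so no BinaryRead or BinaryWrite calls are needed.

```mathematica
f = FileNameJoin[{$TemporaryDirectory, "dm-data.bin"}];

(* Store the list as raw 16-bit signed integers, two bytes per element *)
Export[f, {1, -2, 300}, "Integer16"];
size = FileByteCount[f]              (* 6 *)

(* Importing with the same element type recovers the list *)
back = Import[f, "Integer16"]        (* {1, -2, 300} *)

DeleteFile[f];
```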
Expand[(1 + x + y)^2]
1 + 2 x + x^2 + 2 y + 2 x y + y^2
FortranForm[%]
Out[2]//FortranForm= 1 + 2*x + x**2 + 2*y + 2*x*y + y**2
Here is the same expression in C form. Macros for objects like Power are defined in the C header file mdefs.h that comes with most versions of Mathematica.
In[3]:=
CForm[%]
Out[3]//CForm= 1 + 2*x + Power(x,2) + 2*y + 2*x*y + Power(y,2)
You should realize that there are many differences between Mathematica and C or Fortran. As a result, expressions you translate may not work exactly the same as they do in Mathematica. In addition, there are so many differences in programming constructs that no attempt is made to translate these automatically.
Compile[x, expr]    compile an expression into efficient internal code
A way to compile Mathematica expressions.
One of the common motivations for converting Mathematica expressions into C or Fortran is to try to make them faster to evaluate numerically. But the single most important reason that C and Fortran can potentially be more efficient than Mathematica is that in these languages one always specifies up front what type each variable one uses will be: integer, real number, array, and so on. The Mathematica function Compile makes such assumptions within Mathematica, and generates highly efficient internal code. Usually this code runs little, if at all, slower than custom C or Fortran.
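This small sketch contrasts the two approaches for one hypothetical formula, x^2 + Sin[x]. Compile produces a function that runs inside Mathematica, while CForm only produces C source text. The compiled version assumes its argument is a machine real number.

```mathematica
(* Compile assumes x will be a machine real number *)
cf = Compile[{x}, x^2 + Sin[x]];
val = cf[1.0]            (* numerically equal to 1.0^2 + Sin[1.0] *)

(* The same formula as C text, using the Power macro from mdefs.h *)
c = ToString[CForm[x^2 + Sin[x]]]     (* "Power(x,2) + Sin(x)" *)
```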
Splice["file.mx"]              splice Mathematica output into an external file named file.mx, putting the results in the file file.x
Splice["infile", "outfile"]    splice Mathematica output into infile, sending the output to outfile
The basic idea is to set up the definitions you need in a particular Mathematica session, then run Splice to use the definitions you have made to produce the appropriate output to insert into the external files.
#include "mdefs.h"
double f(x)
double x;
{
  double y;
  y = <* Integrate[Sin[x]^5, x] *> ;
  return(2*y - 1) ;
}
A simple C program containing a Mathematica formula.
#include "mdefs.h"
double f(x)
double x;
{
  double y;
  y = -5*Cos(x)/8 + 5*Cos(3*x)/48 - Cos(5*x)/80 ;
  return(2*y - 1) ;
}
The C program after processing with Splice.
Import["file", "Table"]          import a table of data from a file
Export["file", list, "Table"]    export list to a file as a table of data
Out[1]= out.dat
FilePrint["out.dat"]
5.7 4.3
-1.2 7.8
Import["out.dat", "Table"]
{{5.7, 4.3}, {-1.2, 7.8}}
Import["file", "Table"] will handle many kinds of tabular data, automatically deducing the details of the format whenever possible. Export["file", list, "Table"] writes out data separated by spaces, with numbers given in C or Fortran-like form, as in 2.3E5 and so on.
Import["name.ext"]          import data assuming a format deduced from the file name
Export["name.ext", expr]    export data in a format deduced from the file name
Importing and exporting general data.
"CSV", "TSV", "XLS"
"HarwellBoeing", "MAT", "MTX"
"DIF", "FITS", "HDF5", "MPS", "SDTS", etc.
Import and Export can handle not only tabular data, but also data corresponding to graphics, sounds, expressions and even whole documents. Import and Export can often deduce the appropriate format for data simply by looking at the extension of the file name for the file in which the data is being stored. "Exporting Graphics and Sounds" and "Importing and Exporting Files" discuss in more detail how Import and Export work. Note that you can also use Import and Export to manipulate raw files of binary data.
This imports a graphic in JPEG format.
In[4]:=
Import["ExampleData/turtle.jpg"]
Out[4]=
$ImportFormats    import formats supported on your system
$ExportFormats    export formats supported on your system
Importing and exporting lists and tables of data.
This exports a list of data to the file out1.
In[1]:=
Out[1]= out1
FilePrint["out1"]
Import["out1", "List"]
If you want to use data purely within Mathematica, then the best way to keep it in a file is usually as a complete Mathematica expression, with all its structure preserved, as discussed in "Reading and Writing Mathematica Files: Files and Streams". But if you want to exchange data with other programs, it is often more convenient to have the data in a simple list or table format.
This exports a two-dimensional array of data.
In[4]:=
Out[4]= out2.dat
FilePrint["out2.dat"]
5.6e12 3 5
7.2e12
Import["out2.dat", "Table"]
If you have a file in which each line consists of a single number, then you can use Import["file", "List"] to import the contents of the file as a list of numbers. If each line consists of a sequence of numbers separated by tabs or spaces, then Import["file", "Table"] will yield a list of lists of numbers. If the file contains items that are not numbers, then these are returned as Mathematica strings.
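Here is a minimal sketch of how numbers and non-numbers come back from "List" and "Table". It assumes write access to $TemporaryDirectory, and the file name dm-list.dat is hypothetical. The file is written as plain text so that its contents are known exactly.

```mathematica
f = FileNameJoin[{$TemporaryDirectory, "dm-list.dat"}];

(* Plain text with one item per line; the last item is not a number *)
Export[f, "1.5\n2\nnone", "Text"];

list = Import[f, "List"]      (* numbers stay numbers: {1.5, 2, "none"} *)
table = Import[f, "Table"]    (* one row per line: {{1.5}, {2}, {"none"}} *)

DeleteFile[f];
```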
Out[7]= out3.dat
FilePrint["out3.dat"]
first second
3.4 7.8
Import["out3.dat", "Table"]
InputForm[%]
Import["file", "List"]                treat each line as a separate numerical or other data item
Import["file", "Table"]               treat each element on each line as a separate numerical or other data item
Import["file", "String"]              treat the whole file as a single character string
Import["file", "Text"]                treat the whole file as a single string of text
Import["file", {"Text", "Lines"}]     treat each line as a string of text
Import["file", {"Text", "Words"}]     treat each separated word as a string of text
Importing files in different formats.
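Here is a minimal sketch of the text-oriented settings from the table above. It assumes write access to $TemporaryDirectory and uses a hypothetical file name, dm-text.txt. The text has no punctuation, so how words are split around punctuation does not come into play.

```mathematica
f = FileNameJoin[{$TemporaryDirectory, "dm-text.txt"}];
Export[f, "alpha beta\ngamma", "Text"];

whole = Import[f, "Text"]                (* the whole file as one string *)
lines = Import[f, {"Text", "Lines"}]     (* {"alpha beta", "gamma"} *)
words = Import[f, {"Text", "Words"}]     (* {"alpha", "beta", "gamma"} *)

DeleteFile[f];
```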
Out[11]= out4.txt
FilePrint["out4.txt"]
export a sequence of graphics for an animation generate a string representation of exported graphics
"EPS"      Encapsulated PostScript (.eps)
"PDF"      Adobe Acrobat portable document format (.pdf)
"SVG"      Scalable Vector Graphics (.svg)
"PICT"     Macintosh PICT
"WMF"      Windows metafile format (.wmf)

"TIFF"     TIFF (.tif, .tiff)
"GIF"      GIF and animated GIF (.gif)
"JPEG"     JPEG (.jpg, .jpeg)
"PNG"      PNG format (.png)
"BMP"      Microsoft bitmap format (.bmp)
"PCX"      PCX format (.pcx)
"XBM"      X window system bitmap (.xbm)
"PBM"      portable bitmap format (.pbm)
"PPM"      portable pixmap format (.ppm)
"PGM"      portable graymap format (.pgm)
"PNM"      portable anymap format (.pnm)
"DICOM"    DICOM medical imaging format (.dcm, .dic)
"AVI"      Audio Video Interleave format (.avi)
Typical graphics formats supported by Mathematica. Formats in the first group are resolution independent.
This generates a plot.
Export["sinplot.eps", %]
Out[2]= sinplot.eps
When you export a graphic for use outside of Mathematica, you usually have to specify the absolute size at which it should be rendered. You can do this using the ImageSize option to Export. ImageSize -> x makes the graphic x printer's points wide; since there are 72 printer's points to an inch, ImageSize -> 72 xi makes it xi inches wide. The default is to produce an image that is four inches wide.
ImageSize -> {x, y} scales the graphic so that it fits in an x×y region.
absolute image size in printer's points
how the image is oriented in the file
resolution in dpi for the image
Within Mathematica, graphics are manipulated in a way that is completely independent of the resolution of the computer screen or other output device on which the graphics will eventually be rendered. Many programs and devices accept graphics in resolution-independent formats such as Encapsulated PostScript (EPS). But some require that the graphics be converted to rasters or bitmaps with a specific resolution. The ImageResolution option for Export allows you to determine what resolution in dots per inch (dpi) should be used. The lower you set this resolution, the lower the quality of the image you will get, but also the less memory the image will take to store. For screen display, typical resolutions are 72 dpi and above; for printers, 300 dpi and above.
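The arithmetic linking ImageSize and ImageResolution can be sketched in Python, assuming the exported raster is sized so that its physical width equals the ImageSize in printer's points (72 per inch). The byte count is for an uncompressed 8-bit RGB raster; real file sizes depend on the format's compression.

```python
POINTS_PER_INCH = 72  # a printer's point is 1/72 inch


def raster_size(width_points, height_points, dpi):
    """Pixel dimensions of a graphic of the given size (in points) rendered at dpi."""
    def to_pixels(points):
        return round(points / POINTS_PER_INCH * dpi)
    return to_pixels(width_points), to_pixels(height_points)


def uncompressed_bytes(width_px, height_px, channels=3, bytes_per_channel=1):
    """Memory for an uncompressed raster (8-bit RGB by default)."""
    return width_px * height_px * channels * bytes_per_channel


# A 4 x 3 inch graphic (288 x 216 points) at a screen and a print resolution.
for dpi in (72, 300):
    w, h = raster_size(288, 216, dpi)
    print(dpi, (w, h), uncompressed_bytes(w, h))
```

Going from 72 to 300 dpi multiplies the pixel count, and hence the memory, by roughly (300/72)^2, about 17.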
"DXF"      AutoCAD drawing interchange format (.dxf)
"STL"      STL stereolithography format (.stl)
TeXForm[expr]
Mathematica output for TeX.
(x + y)^2/Sqrt[x y]
Out[1]= (x + y)^2/√(x y)
TeXForm[%]
Out[2]//TeXForm= \frac{(x+y)^2}{\sqrt{x y}}
ToExpression["input", TeXForm]
Converting TeX strings to Mathematica.
This converts a TeX string to Mathematica. Note the double backslashes needed in the string.
In addition to being able to convert individual expressions to TeX, Mathematica also provides capabilities for translating complete notebooks. These capabilities can usually be accessed from the File > Save As menu in the notebook front end.
Export has many options applying to HTML export that allow you to specify how notebooks should be converted for web browsers with different capabilities.
print expr in MathML form
use StandardForm rather than traditional mathematical notation
interpret a string of MathML as Mathematica input
MathMLForm[x^2/z]
Out[1]//MathMLForm= <math>
 <mfrac>
  <msup>
   <mi>x</mi>
   <mn>2</mn>
  </msup>
  <mi>z</mi>
 </mfrac>
</math>
If you paste MathML into a Mathematica notebook, Mathematica will automatically try to convert it to Mathematica input. You can copy an expression from a notebook as MathML using the Copy As menu in the notebook front end.
export in XML format
import from XML
import data from a string of XML
Somewhat like Mathematica expressions, XML is a general format for representing data. Mathematica automatically converts certain types of expressions to and from specific types of XML. MathML is one example; another is SVG for graphics. If you ask Mathematica to import a generic piece of XML, it will produce a SymbolicXML expression. Each XML element of the form <elem attr='val'>data</elem> is translated to a Mathematica SymbolicXML expression of the form XMLElement["elem", {"attr" -> "val"}, {data}].
Once you have imported a piece of XML as SymbolicXML, you can use Mathematica's powerful symbolic programming capabilities to manipulate the expression you get. You can then use Export to export the result in XML form.
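A rough Python analogue of this XML round trip uses the standard library's ElementTree as a stand-in for Mathematica's importer. The ("tag", {attributes}, [children]) tuple mirrors the XMLElement form; dropping whitespace-only text and quoting attributes with single quotes are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_symbolic(elem):
    """Turn an element into ("tag", {attrs}, [children]), like XMLElement[...]."""
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(to_symbolic(child))
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, dict(elem.attrib), children)


def from_symbolic(node):
    """Serialize the nested tuple back to an XML string."""
    if isinstance(node, str):
        return escape(node)
    tag, attrs, children = node
    parts = []
    for name, value in attrs.items():
        safe = escape(value, {"'": "&apos;"})  # attribute values use single quotes
        parts.append(f" {name}='{safe}'")
    attr_text = "".join(parts)
    inner = "".join(from_symbolic(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


source = "<a><b bb='1'><c>xx</c></b><b bb='2'><c>xx</c></b></a>"
tree = to_symbolic(ET.fromstring(source))
print(tree)
print(from_symbolic(tree))
```

Any manipulation of the nested tuples between the two calls plays the role of the symbolic processing step.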
This generates a SymbolicXML expression, with an XMLElement representing the <a> element in the XML string.
ExportString[%, "XML"]
Out[5]= <a> <b bb='1'> <c>xx</c> </b> <b bb='2'> <c>xx</c> </b> </a>
Import["http://url", ...]    import a file from any accessible URL
Import["ftp://url", ...]     import a file from an FTP server
Importing data from web sources.
Import["http://reference.wolfram.com/mathematica/ExampleData/ocelot.jpg"]
Image Processing
Mathematica now provides built-in support for both programmatic and interactive image processing, fully integrated with Mathematica's powerful mathematical and algorithmic capabilities. You can create and import images, manipulate them with built-in functions, apply linear and nonlinear filters to them, and visualize them in any number of ways.
The simplest method for creating an image object is to wrap Image around a matrix of real values ranging from 0 to 1.
Here is a one-channel image created from a matrix of numbers.
You can also copy and paste or drag and drop an image from other applications. You can use Import to obtain an image from a file on the local file system or any accessible remote location.
i = Import["ExampleData/ocelot.jpg"]
give the number of channels present in the data for image
give the type of values used for each pixel element in image
give True if image has the form of a valid Image object and False otherwise
give the list of default options assigned to a symbol
the array of pixel values in image
ImageDimensions[i]
Options[i, ColorSpace]
The image's array of pixel values can be easily extracted using the function ImageData. By default, the function returns real values, but you can ask for a specific type using the optional "type" argument.
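The relationship between real and byte pixel values can be sketched in Python, assuming a byte value is obtained by clipping a real value to [0, 1], scaling it by 255, and rounding; Mathematica's exact rounding rule is not described here. The name image_data and its kind argument are illustrative only.

```python
def to_byte(value):
    """Map a real channel value in [0, 1] to an integer 0..255 (clipped, then rounded)."""
    clipped = min(max(value, 0.0), 1.0)
    return int(round(clipped * 255))


def to_real(byte):
    """Map an integer 0..255 back to the real range [0, 1]."""
    return byte / 255


def image_data(matrix, kind="Real"):
    """Return pixel values as reals (the default) or as bytes."""
    if kind == "Real":
        return [[float(v) for v in row] for row in matrix]
    if kind == "Byte":
        return [[to_byte(v) for v in row] for row in matrix]
    raise ValueError(f"unsupported type: {kind}")


fragment = [[0.0, 0.25], [0.5, 1.0]]
print(image_data(fragment, "Byte"))
```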
This returns a fragment of the image as a matrix of real values scaled to the range 0 to 1.
In the case of multichannel images, the raw pixel data is represented by a 3D array arranged in one of two possible ways, as determined by the option Interleaving.
This imports a color image.
In[1]:= i = Import["ExampleData/lena.tif"]
With the default setting Interleaving -> True, the data is organized as a 2D array of lists of color values, a triplet in the common case of images in RGB color space.
This shows the default data organization.
The option setting Interleaving -> False can be used to store and retrieve the raw data as a list of matrices, one for each of the color channels.
Here is a fragment of the example image arranged as a list of channel matrices.
A multichannel image can be split into a list of single-channel images and, conversely, a multichannel image can be created from any number of single-channel images.
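The two layouts selected by Interleaving, and the channel split just described, amount to a transposition of indices. A Python sketch with images as nested lists indexed (row, column, channel); the function names are hypothetical.

```python
def deinterleave(pixels):
    """rows x cols x channels (interleaved) -> channels x rows x cols (one matrix per channel)."""
    channels = len(pixels[0][0])
    return [[[pixel[c] for pixel in row] for row in pixels] for c in range(channels)]


def interleave(planes):
    """channels x rows x cols -> rows x cols x channels (a list of color values per pixel)."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[[plane[r][k] for plane in planes] for k in range(cols)] for r in range(rows)]


rgb = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
       [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]
red, green, blue = deinterleave(rgb)   # three single-channel "images"
print(red)
print(interleave([red, green, blue]) == rgb)
```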
This splits the example RGB color image into three grayscale images.
In[2]:= ColorSeparate[i]
ImageTake[i, 50]
ImageCrop conveniently complements ImageTake. Instead of specifying exactly which rows or columns to extract, you specify the desired dimensions of the resulting image, that is, the number of columns and rows to be retained. By default, the cropping is centered, so equal numbers of rows (and of columns) are removed from opposite edges of the image.
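Centered cropping on a list-of-rows image can be sketched in Python. When the number of rows or columns to remove is odd, this sketch removes the extra one from the bottom or right; which side Mathematica favors is an assumption not covered by the text.

```python
def center_crop(image, width, height):
    """Keep a width x height block from the middle of a row-major image (list of rows)."""
    rows, cols = len(image), len(image[0])
    if width > cols or height > rows:
        raise ValueError("crop size exceeds image size")
    top = (rows - height) // 2    # rows removed above; an odd leftover goes below
    left = (cols - width) // 2    # columns removed on the left; an odd leftover goes right
    return [row[left:left + width] for row in image[top:top + height]]


image = [[4 * r + c for c in range(4)] for r in range(4)]
print(center_crop(image, 2, 2))
```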
Here a 100×100 pixel region is extracted from the center of the example image.
While ImageCrop is primarily used to reduce the dimensions of the source image, it is also frequently desirable to pad an image to increase its dimensions. ImagePad supports all the most common padding methods.
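The padding rules compared below can be sketched on a single row in Python. The method names echo the settings passed to ImagePad, but their precise semantics here are assumptions: "Fixed" repeats the edge pixel, "Reflected" mirrors about the edge pixel without repeating it, "Periodic" wraps around to the start of the row, and "Constant" stands in for a constant fill value such as 0.

```python
def pad_right(row, n, method, value=0):
    """Return row extended by n samples on the right, using one of four padding rules."""
    length = len(row)
    if method == "Constant":
        extra = [value] * n
    elif method == "Fixed":
        extra = [row[-1]] * n                           # repeat the edge sample
    elif method == "Periodic":
        extra = [row[k % length] for k in range(n)]     # wrap around to the start
    elif method == "Reflected":
        if length == 1:
            extra = [row[0]] * n
        else:
            period = 2 * (length - 1)                   # mirror without repeating the edge
            extra = []
            for k in range(1, n + 1):
                m = (length - 1 + k) % period
                extra.append(row[m] if m < length else row[period - m])
    else:
        raise ValueError(f"unknown method: {method}")
    return row + extra


def pad_image_right(image, n, method):
    """Pad every row of a row-major image on its right edge."""
    return [pad_right(list(row), n, method) for row in image]


for method in ("Constant", "Reflected", "Fixed", "Periodic"):
    print(method, pad_right([1, 2, 3, 4], 3, method))
```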
This shows four different padding methods applied to the right edge of the example image.
In[33]:= Grid@Partition[ImagePad[i, {{0, 50}, {0, 0}}, #] & /@ {0, "Reflected", "Fixed", "Periodic"}, 2]
It is frequently necessary to change the dimensions of an image by resampling or to reposition it in some manner. Functions that perform these basic geometric tasks are readily available.
give a resized version of image that is w pixels wide
give a thumbnail version of image
rotate image counterclockwise by 90°
reverse image by top-bottom mirror reflection
Here, ImageResize is used to enlarge and to reduce the size of the original image.
ImageRotate is another common spatial operation. It rotates the positions of all pixels counterclockwise about a pivot point at the center of the image.
This rotates the example image by 30 degrees.
In[39]:= ImageRotate[i, Pi/6]
Several useful image processing tasks require nothing more than simple arithmetic operations between two images or an image and a constant. For example, you can change brightness by multiplying an image by a constant factor or by adding (subtracting) a constant to (from) an image. More interestingly, the difference of two images can be used to detect change and the product of two images can be used to hide or highlight regions in an image in a process called masking. For this purpose, three basic arithmetic functions are available.
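The brightness, change-detection, and masking operations just described reduce to pixelwise arithmetic. In this Python sketch, images are lists of rows of reals in [0, 1]; clipping results to that range is an assumption, not a statement about how the built-in functions treat out-of-range values.

```python
def clip(v):
    """Keep a channel value inside [0, 1]."""
    return min(max(v, 0.0), 1.0)


def image_add(image, x):
    """Add x to every value (brighten for x > 0, darken for x < 0)."""
    return [[clip(v + x) for v in row] for row in image]


def image_multiply(image, x):
    """Multiply every value by x (scale the brightness)."""
    return [[clip(v * x) for v in row] for row in image]


def image_difference(a, b):
    """Absolute pixelwise difference; nonzero values mark changes between a and b."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply_mask(image, mask):
    """Pixelwise product with a 0/1 mask: keeps masked-in regions, blacks out the rest."""
    return [[v * m for v, m in zip(ri, rm)] for ri, rm in zip(image, mask)]


frame1 = [[0.2, 0.2], [0.2, 0.2]]
frame2 = [[0.2, 0.7], [0.2, 0.2]]
print(image_difference(frame2, frame1))   # nonzero only where something changed
```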
add an amount x to each channel value in image
subtract a constant amount x from each channel value in image
multiply each channel value in image by a factor x
Contrast Modification
Contrast-modifying point operations frequently encountered in image processing include negation (grayscale or color), gamma correction (a power-law transformation), and linear or nonlinear contrast stretching.
give a lighter version of an image
give a darker version of an image
give the negative of image, in which all colors have been negated
adjust the levels in image, rescaling them to cover the range 0 to 1
apply f to the list of channel values for each pixel in image
One of the simplest examples of a point transformation is negation. For a grayscale image $f$, the transformation is defined by $g(i, j) = 1 - f(i, j)$ and is applied to every pixel in the source image. For multichannel images, the same transformation is applied to each color value of every pixel.
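The negation formula can be sketched in Python, applied to every channel value of every pixel. Representing single-channel pixels as plain numbers and multichannel pixels as lists is an assumption of the sketch.

```python
def negate(image):
    """g(i, j) = 1 - f(i, j), applied to every value of every pixel."""
    def negate_value(v):
        if isinstance(v, (list, tuple)):      # multichannel pixel: negate each channel
            return [1.0 - c for c in v]
        return 1.0 - v                        # single-channel pixel
    return [[negate_value(v) for v in row] for row in image]


print(negate([[0.0, 0.25]]))
print(negate([[[1.0, 0.0, 0.5]]]))
```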
This shows the original example image and its digital negative.
The function ImageAdjust can be used to perform most of the commonly needed contrast stretching and power-law transformations, while ImageApply enables you to realize any desired point transformation whatsoever.
This increases contrast using linear scaling.
In[37]:= ImageAdjust[i, 1.5]
As an example of a nonlinear contrast stretching operation, consider the following transformation, called sigma scaling. Assuming the default range of 0 to 1, the transformation is defined by

$$g(i,j) = \frac{1}{1 + e^{-\left(f(i,j) - \mu\right)/\sigma}}$$

where μ is the level at the center of the stretch and σ controls its width: the smaller σ, the steeper the transition. As a function of a single value x, the same transformation can be written

f[x_, μ_, σ_] := 1/(1 + Exp[-(x - μ)/σ])
Here are several plots of the transformation for different values of the variance parameter.
In[12]:= GraphicsRow[Plot[f[x, 0.5, #], {x, 0, 1}, PlotRange -> {0, 1}, Ticks -> False, ImageSize -> Tiny] & /@ {0.15, 0.1, 0.05, 0.01}]
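The sigma-scaling transformation can be evaluated directly. This Python sketch implements the logistic formula with μ = 0.5 and the four σ values used in the plots; the overflow guard is an implementation detail of the sketch, not part of the transformation.

```python
import math


def sigma_scale(x, mu=0.5, sigma=0.1):
    """Logistic contrast stretch: 1 / (1 + exp(-(x - mu) / sigma))."""
    z = -(x - mu) / sigma
    if z > 700:                       # math.exp would overflow; the result is ~0 anyway
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def sigma_scale_image(image, mu=0.5, sigma=0.1):
    """Apply sigma scaling to every pixel of a single-channel image."""
    return [[sigma_scale(v, mu, sigma) for v in row] for row in image]


for sigma in (0.15, 0.1, 0.05, 0.01):
    print(sigma, [round(sigma_scale(x, 0.5, sigma), 3) for x in (0.3, 0.5, 0.7)])
```

As σ shrinks, values below μ are pushed toward 0 and values above it toward 1, so the curve approaches a hard threshold at μ.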
Image binarization is the operation of converting a multilevel image into a binary image. In a binary image, each pixel value is represented by a single binary digit. In its simplest form, binarization, also called thresholding, is a point-based operation that assigns the value 0 or 1 to each pixel of an image based on a comparison with some global threshold value t:

$$g(i,j) = \begin{cases} 1, & \text{if } f(i,j) \ge t \\ 0, & \text{if } f(i,j) < t \end{cases}$$
Thresholding is an attractive early processing step because it leads to a significant reduction in data storage and results in binary images that are simpler to analyze. Binary images permit the use of powerful morphological operators for shape- and structure-based analysis of image content. Binarization is also a form of image segmentation, as it divides an image into distinct regions.
create a binary image from image
give an approximation to image that uses only n distinct colors
Color images are first converted to grayscale prior to thresholding. If the threshold value is not explicitly given, an optimal value is calculated using one of several well-known methods.
Here is the default binarization based on Otsu's method for optimal threshold selection.
In[2]:= Binarize[i]
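A minimal Python sketch of global thresholding with an Otsu-style threshold on 8-bit gray levels: it picks the level t that maximizes the between-class variance of the pixels below t and those at or above t, then applies the thresholding rule defined above. It illustrates the method only; Binarize's exact binning and tie-breaking are not reproduced.

```python
def otsu_threshold(values, levels=256):
    """Return the level t (pixels >= t become 1) that maximizes between-class variance."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels below t
    sum0 = 0    # sum of the gray levels below t
    for t in range(1, levels):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (total_sum - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2   # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(image, t):
    """g(i, j) = 1 if f(i, j) >= t else 0."""
    return [[1 if v >= t else 0 for v in row] for row in image]


image = [[10, 12, 200], [198, 11, 205]]
t = otsu_threshold([v for row in image for v in row])
print(t, binarize(image, t))
```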
Here ImageApply is used to return a color image in which each individual channel is binarized, resulting in a maximum of 8 distinct colors.
Color Conversion
Four color spaces are currently supported: RGB (red, green, and blue), CMYK (cyan, magenta, yellow, and black), HSB (hue, saturation, and brightness), and grayscale. RGB is the color representation most frequently used in practice: the three so-called primary colors are combined (added) in various proportions to produce a composite, full-color image. The RGB color model is universally used in color monitors, video recorders, and cameras. Also, the human visual system is tuned to perceive color as a variable combination of these primary colors. Adding two primary colors in equal amounts produces a secondary color of light: cyan (C), magenta (M), or yellow (Y). These are the primary pigment colors used in the printing industry, hence the relevance of the CMY color model. For image processing applications it is often useful to separate the color information from the luminance. The HSB (hue, saturation, brightness) model has this property. Hue represents the dominant color as seen by an observer, saturation refers to the amount of dilution of the color with white light, and brightness defines the average luminance. The luminance component may, therefore, be processed independently of the image's color information.
convert color specifications in expr to refer to the color space represented by colspace
This shows the conversion results from an RGB source to the remaining supported color spaces.
Out[39]= Image[{{{0., 1., 1., 0.}, {1., 0., 1., 0.}, {1., 1., 0., 0.}}}, "Real", ColorSpace -> "CMYK", Interleaving -> True]
Image[{{{0., 1., 1.}, {0.3333333333333333, 1., 1.}, {0.6666666666666666, 1., 1.}}}, "Real", ColorSpace -> "HSB", Interleaving -> True]
Image[{{0.299, 0.587, 0.114}}, "Real", ColorSpace -> "Grayscale", Interleaving -> None]
Note that the RGB -> Grayscale transformation uses the weighting coefficients recommended for U.S. broadcast TV (NTSC) and later incorporated into the CCIR 601 standard for digital video.
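The conversions shown above can be checked for the three RGB primaries in Python. The grayscale weights are the NTSC values quoted in the text, and the HSB conversion uses the standard library's colorsys (HSB and HSV name the same model). The naive CMYK formula, with black = 1 - max(r, g, b), is an assumption that reproduces the CMYK output above but is not necessarily what Mathematica uses in general.

```python
import colorsys


def rgb_to_gray(r, g, b):
    """NTSC / CCIR 601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def rgb_to_cmyk(r, g, b):
    """Naive conversion: black is 1 - max(r, g, b), the rest is split into C, M, Y."""
    k = 1.0 - max(r, g, b)
    if k >= 1.0:                       # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    return ((1.0 - r - k) / (1.0 - k),
            (1.0 - g - k) / (1.0 - k),
            (1.0 - b - k) / (1.0 - k),
            k)


def rgb_to_hsb(r, g, b):
    """HSB is the model that Python's colorsys calls HSV."""
    return colorsys.rgb_to_hsv(r, g, b)


for rgb in ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)):
    print(rgb, rgb_to_cmyk(*rgb), rgb_to_hsb(*rgb), rgb_to_gray(*rgb))
```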
Image Histogram
An important concept common to many image enhancement operations is that of a histogram, which is simply a count (or relative frequency, if normalized) of the gray levels in the image. Analysis of the histogram gives useful information about image contrast. Image histograms are important in many areas of image processing, most notably compression, segmentation, and thresholding.
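Computing a histogram is just counting. A Python sketch for an 8-bit grayscale image stored as lists of rows; the normalize flag returns relative frequencies instead of counts.

```python
def histogram(image, levels=256, normalize=False):
    """Count how often each gray level 0..levels-1 occurs; optionally return frequencies."""
    counts = [0] * levels
    for row in image:
        for v in row:
            counts[v] += 1
    if normalize:
        total = sum(counts)
        return [c / total for c in counts]
    return counts


print(histogram([[0, 1], [1, 3]], levels=4))
print(histogram([[0, 1], [1, 3]], levels=4, normalize=True))
```

A histogram crowded into a narrow band of levels signals low contrast, which is what the stretching operations discussed earlier try to correct.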
plot a histogram of the pixel levels for each channel in image
$$g[i,j] = T\left(\begin{matrix} f[i-1,j-1] & f[i,j-1] & f[i+1,j-1] \\ f[i-1,j] & f[i,j] & f[i+1,j] \\ f[i-1,j+1] & f[i,j+1] & f[i+1,j+1] \end{matrix}\right)$$

where g is the output image resulting from applying the transformation T to the 3×3 neighborhood centered on each pixel of the input image f. Note that the size and shape of the neighborhood are generally determined by the needs of the application. Examples of region-based image processing operations include noise reduction, edge detection, edge sharpening, image enhancement, segmentation, and more.
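The neighborhood equation can be sketched in Python as a function that hands each pixel's (2r+1)×(2r+1) window to a transformation T. Clamping coordinates at the image border is an assumption; the sample T computes the range (maximum minus minimum) of the window.

```python
def neighborhood_filter(image, transform, r=1):
    """g[i, j] = T(values in the (2r+1) x (2r+1) window centered on pixel (i, j)).

    Here i indexes rows and j columns; coordinates outside the image are clamped
    to the nearest edge pixel.
    """
    rows, cols = len(image), len(image[0])

    def at(i, j):
        return image[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return [[transform([at(i + di, j + dj)
                        for di in range(-r, r + 1)
                        for dj in range(-r, r + 1)])
             for j in range(cols)]
            for i in range(rows)]


def value_range(window):
    """Example T: maximum minus minimum of the neighborhood."""
    return max(window) - min(window)


image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 5, 5],
         [0, 0, 5, 5]]
print(neighborhood_filter(image, value_range))
```

Pixels whose window straddles the bright block get a large range, so this particular T already behaves as a crude edge detector.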
The more general (but slower) ImageFilter function can be used when traditional linear filtering is not possible and the desired operation is not implemented by any of the built-in filtering functions.
This calculates the maximum range of values within a small neighborhood of each pixel.
A large number of linear and nonlinear operators are available as built-in functions. Here is a partial listing.
Blur[image]                    give a blurred version of image
Sharpen[image]                 give a sharpened version of image
MeanFilter[image, r]           replace every value by the mean value in its range r
GaussianFilter[image, r]       convolve with a Gaussian kernel of pixel radius r
MedianFilter[image, r]         replace every value by the median in its range r
MinFilter[image, r]            replace every value by the minimum in its range r
CommonestFilter[image, r]      replace each pixel with the most common pixel value in its range r
One of the most common applications of linear filtering in image processing is the approximation of discrete derivatives and, consequently, edge detection. The well-known methods of Prewitt, Sobel, and Canny are all essentially based on computing two orthogonal derivatives at each point of an image and, from them, the gradient magnitude.
Here are the two Sobel filters.
In[6]:= sobelY = {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}}/4.; sobelX = {{1, 0, -1}, {2, 0, -2}, {1, 0, -1}}/4.;
This returns the edges of a grayscale image using Sobel filters.
In[7]:= Image[Sqrt[ImageData[ImageConvolve[image, sobelX]]^2 + ImageData[ImageConvolve[image, sobelY]]^2]]
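The Sobel computation can be mirrored in Python with the same two kernels scaled by 1/4. This sketch slides each 3×3 kernel over the image as a correlation rather than a convolution; for these kernels the two differ only by a sign, which does not affect the gradient magnitude. Clamping at the border is an assumption.

```python
import math

# The two Sobel kernels from the text; filter3 divides by 4.
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]


def filter3(image, kernel, scale=4.0):
    """Slide a 3 x 3 kernel over the image (correlation), clamping at the edges."""
    rows, cols = len(image), len(image[0])

    def at(i, j):
        return image[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return [[sum(kernel[a][b] * at(i + a - 1, j + b - 1)
                 for a in range(3) for b in range(3)) / scale
             for j in range(cols)]
            for i in range(rows)]


def sobel_edges(image):
    """Gradient magnitude sqrt(gx^2 + gy^2) at every pixel."""
    gx = filter3(image, SOBEL_X)
    gy = filter3(image, SOBEL_Y)
    return [[math.sqrt(x * x + y * y) for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]   # a vertical edge
print(sobel_edges(step)[1])
```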
As a second example, consider the task of removing impulsive noise, called salt noise because of its visual appearance, from an image. This is a classic example contrasting the different outcomes of a linear moving-average filter and a nonlinear moving-median filter.
This creates a small image with impulsive noise.
In[13]:= Image[ReplacePart[ArrayPad[ConstantArray[160, {20, 20}], 15, 60], 255, RandomInteger[{1, 50}, {100, 2}]], "Byte"]
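The contrast between the two filters can be reproduced in Python on a tiny patch. The window reducer is either an arithmetic mean or statistics.median; clamping at the border is an assumption of the sketch.

```python
from statistics import median


def window_filter(image, reduce, r=1):
    """Replace each pixel by reduce(values in its (2r+1) x (2r+1) window), clamping at edges."""
    rows, cols = len(image), len(image[0])

    def at(i, j):
        return image[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return [[reduce([at(i + di, j + dj)
                     for di in range(-r, r + 1) for dj in range(-r, r + 1)])
             for j in range(cols)]
            for i in range(rows)]


def mean(values):
    return sum(values) / len(values)


# A flat gray (160) patch with a single "salt" pixel (255) in the middle.
noisy = [[160] * 5 for _ in range(5)]
noisy[2][2] = 255

print(window_filter(noisy, mean)[2])     # the impulse is smeared into its neighbors
print(window_filter(noisy, median)[2])   # the impulse is removed
```

A single outlier shifts every mean it takes part in, but it can never be the middle value of a window that contains mostly background pixels.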
Morphological Processing
Mathematical morphology provides an approach to the processing of digital images that is based on the spatial structure of objects in a scene. Unlike the linear and nonlinear operators discussed so far, binary morphological operators modify the shape of pixel groupings instead of their amplitude. However, like those operators, they may be implemented using convolution-like algorithms, with the fundamental operations of addition and multiplication replaced by logical OR and AND.
give the dilation with respect to a range r square
give the erosion with respect to a range r square
This shows the dilation (left) and erosion (right) of the example image (center) using a 5×5 uniform structuring element.
The definitions of binary morphology extend naturally to the domain of grayscale images, with Boolean AND and OR becoming pointwise minimum and maximum operators, respectively. For a uniform, zero-valued structuring element, the dilation of an image f reduces to the following simple form:

$$g[i,j] = \max\left\{\begin{matrix} f[i-1,j-1] & f[i,j-1] & f[i+1,j-1] \\ f[i-1,j] & f[i,j] & f[i+1,j] \\ f[i-1,j+1] & f[i,j+1] & f[i+1,j+1] \end{matrix}\right\}$$
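Grayscale dilation and erosion with a flat square structuring element reduce to a moving maximum and minimum, as in the formula above; on 0/1 images they coincide with the logical OR and AND of binary morphology. In this Python sketch, neighbors outside the image are simply ignored, which is one of several possible boundary conventions.

```python
def _window(image, i, j, r):
    """Values in the (2r+1) x (2r+1) square around (i, j), ignoring positions outside the image."""
    rows, cols = len(image), len(image[0])
    return [image[a][b]
            for a in range(max(i - r, 0), min(i + r + 1, rows))
            for b in range(max(j - r, 0), min(j + r + 1, cols))]


def dilate(image, r=1):
    """Flat square dilation: pointwise maximum (logical OR for 0/1 images)."""
    return [[max(_window(image, i, j, r)) for j in range(len(image[0]))]
            for i in range(len(image))]


def erode(image, r=1):
    """Flat square erosion: pointwise minimum (logical AND for 0/1 images)."""
    return [[min(_window(image, i, j, r)) for j in range(len(image[0]))]
            for i in range(len(image))]


gray = [[0.1, 0.9, 0.1, 0.1]]
print(dilate(gray), erode(gray))

binary = [[0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 1, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]
print(dilate(binary))
```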
This shows the grayscale dilation (left) and erosion (right) of the example image (center) using a 5×5 uniform structuring element.
These operators can be combined, using a single structuring element or a list of such elements, to perform many useful image processing tasks, including thinning, thickening, edge and corner detection, and background normalization.
This uses dilation and erosion to detect edges in a grayscale image.
give the fixed point of the geodesic dilation of the image marker constrained by the image mask
give the fixed point of the geodesic erosion of the image marker constrained by the image mask
give the distance transform of image, in which the value of each pixel is replaced by its distance to the nearest background pixel
MorphologicalComponents[image]    give an array in which each pixel of image is replaced by an integer index representing the connected foreground image component in which the pixel lies
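Connected-component labeling, as described for MorphologicalComponents, can be sketched with a breadth-first flood fill. This Python version uses 4-connectivity and numbers components in scan order; Mathematica's default connectivity and numbering are not stated here, so treat both as assumptions.

```python
from collections import deque

# 4-connectivity (up, down, left, right); 8-connectivity would add the diagonals.
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(binary):
    """Replace each foreground (nonzero) pixel by the index of its connected component."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if binary[i][j] and not labels[i][j]:
                count += 1                      # start a new component here
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:                    # breadth-first flood fill
                    a, b = queue.popleft()
                    for da, db in NEIGHBORS:
                        na, nb = a + da, b + db
                        if (0 <= na < rows and 0 <= nb < cols
                                and binary[na][nb] and not labels[na][nb]):
                            labels[na][nb] = count
                            queue.append((na, nb))
    return labels


print(label_components([[1, 1, 0, 0],
                        [0, 0, 0, 1],
                        [1, 0, 0, 1]]))
```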
An important category of morphological algorithms, called morphological reconstruction, is based on repeated application of dilation (or erosion) to a marker image, with the result of each step constrained by a second image, the mask. The process ends when a fixed point is reached. Interestingly, many image processing tasks have a natural formulation in terms of reconstruction. Peak and valley detection, hole filling, region flooding, and hysteresis thresholding are just a few examples. The latter, also known as double thresholding, is an integral part of the widely used Canny edge detector. Pixels falling below the low threshold are rejected, pixels above the high threshold are accepted, and pixels in the intermediate range are accepted only if they are "connected" to pixels above the high threshold. Connectivity may be established using a variety of algorithms, but reconstruction gives an effective and very simple solution.
Here are the low, high, and double threshold images, respectively.
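Hysteresis thresholding via morphological reconstruction can be sketched in Python: pixels at or above the high threshold form the marker, pixels at or above the low threshold form the mask, and geodesic dilation (dilate, then AND with the mask) is repeated until a fixed point is reached. The 3×3 (8-connected) dilation and the sample thresholds are assumptions of the sketch.

```python
def dilate3(binary):
    """Binary dilation with a 3 x 3 square (8-connected neighbors)."""
    rows, cols = len(binary), len(binary[0])
    return [[int(any(binary[a][b]
                     for a in range(max(i - 1, 0), min(i + 2, rows))
                     for b in range(max(j - 1, 0), min(j + 2, cols))))
             for j in range(cols)]
            for i in range(rows)]


def reconstruct(marker, mask):
    """Repeat geodesic dilation until nothing changes (the fixed point)."""
    current = [[m & k for m, k in zip(rm, rk)] for rm, rk in zip(marker, mask)]
    while True:
        grown = [[d & k for d, k in zip(rd, rk)]
                 for rd, rk in zip(dilate3(current), mask)]
        if grown == current:
            return current
        current = grown


def hysteresis_threshold(image, low, high):
    """Keep pixels >= low only if they are connected to some pixel >= high."""
    mask = [[int(v >= low) for v in row] for row in image]
    marker = [[int(v >= high) for v in row] for row in image]
    return reconstruct(marker, mask)


image = [[0.9, 0.5, 0.5, 0.1, 0.5],
         [0.1, 0.1, 0.6, 0.1, 0.6]]
print(hysteresis_threshold(image, low=0.4, high=0.8))
```

The intermediate pixels on the right pass the low threshold but are separated from every strong pixel by a column below it, so reconstruction rejects them.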