ROOT CSC11-Handout PDF

ROOT is an open source data analysis framework started in 1995 at CERN. It provides tools for data storage, analysis, and visualization. Key features include an efficient data storage and query system, statistical analysis algorithms, and scientific visualization capabilities. ROOT is used widely in particle physics and related fields for processing large datasets.

ROOT

Bertrand Bellenot, Axel Naumann


CERN
WHAT IS ROOT?

CSC11 • ROOT 2
What's ROOT?

CSC11 • ROOT 3
ROOT: An Open Source Project
• Started in 1995
• 7 full time developers at CERN, plus Fermilab
• Large number of part-time developers: let users
participate
• Available (incl. source) under GNU LGPL

CSC11 • ROOT 4
ROOT in a Nutshell
Framework for large scale data handling
Provides, among others,
– an efficient data storage, access and query system
(PetaBytes)
– advanced statistical analysis algorithms (multi dimensional
histogramming, fitting, minimization and cluster finding)
– scientific visualization: 2D and 3D graphics, Postscript, PDF,
LaTeX
– geometrical modeller
– PROOF parallel query engine

CSC11 • ROOT 5
Graphics

CSC11 • ROOT 6
Histogramming
• A histogram is just occurrence counting, i.e. how often
each value appears
• Example: {1,3,2,6,2,3,4,3,4,3,5}
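The counting described above can be sketched in plain C++ without ROOT. This is only an illustration: the function name CountOccurrences is hypothetical, and a real ROOT histogram (e.g. TH1F) would use fixed-width bins instead of one bin per value.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Count how often each value occurs: a histogram with one bin per integer.
std::map<int, int> CountOccurrences(const std::vector<int>& values)
{
    std::map<int, int> counts;
    for (int v : values)
        ++counts[v];   // operator[] creates a missing bin with count 0
    return counts;
}
```

Calling CountOccurrences with the example values gives a count of 4 for the value 3, which is the tallest bin.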

CSC11 • ROOT 7
Histogramming
• How is a Real Histogram Made?
Let's consider the age distribution of the CSC
participants in 2008:

Binning:
Grouping ages of
participants in several
categories (bins)

CSC11 • ROOT 8
Histogramming
Table of Ages
(binned)

Shows distribution of ages, total number of entries (57 participants)
and average: 27 years 10 months 6 days…

CSC11 • ROOT 9
Histograms
Analysis result: often a histogram

Menu:
View / Editor

CSC11 • ROOT 10
Fitting
Analysis result: often a fit of a histogram

CSC11 • ROOT 11
Fit Panel
To fit a histogram:
right click histogram,
"Fit Panel"

Straightforward interface
for fitting!

CSC11 • ROOT 12
2D/3D
We have seen 1D histograms, but there are
also histograms in more dimensions.

2D Histogram 3D Histogram

CSC11 • ROOT 13
OpenGL
OpenGL can be used to render 2D & 3D histograms,
functions, parametric equations, and to visualize 3D
objects (geometry)

CSC11 • ROOT 14
Geometry
• Describes complex detector geometries
• Allows visualization of these detector geometries with
e.g. OpenGL
• Optimized particle transport in complex geometries
• Working in correlation with simulation packages such
as GEANT3, GEANT4 and FLUKA

CSC11 • ROOT 15
Geometry

CSC11 • ROOT 16
EVE (Event Visualization Environment)
• Event: Collection of data from a detector
(hits, tracks, …)
Use EVE to:
• Visualize these physics objects together with detector
geometry (OpenGL)
• Visually interact with the data, e.g. select a particular
track and retrieve its physical properties

CSC11 • ROOT 17
EVE

CSC11 • ROOT 18
Math

CSC11 • ROOT 19
Math Example:

RANDOM NUMBERS

CSC11 • ROOT 20
Pseudo-Random Numbers
• Needed e.g. to simulate nature: will particle interact?
• Trivial example for random number generator function:
last digit of n = n + 7,
say start with 0:
0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0, 7, 4, 1, 8,
• Properties:
– identical frequency of all numbers 0..9
– “looks” random, but short period:
0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0, 7, 4, 1,
– numbers not independent!
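The trivial generator above is small enough to write out. In this sketch the function names are hypothetical, and the seed is assumed to be a single digit 0..9.

```cpp
#include <cassert>
#include <vector>

// Toy generator from the slide: next value = last digit of (n + 7).
int NextDigit(int n) { return (n + 7) % 10; }

// First `count` numbers produced after the seed.
std::vector<int> DigitSequence(int seed, int count)
{
    std::vector<int> out;
    int n = seed;
    for (int i = 0; i < count; ++i) {
        n = NextDigit(n);
        out.push_back(n);
    }
    return out;
}

// Period: number of steps until the generator returns to its seed.
// The seed must be a digit 0..9, otherwise the loop never terminates.
int Period(int seed)
{
    int n = NextDigit(seed);
    int steps = 1;
    while (n != seed) {
        n = NextDigit(n);
        ++steps;
    }
    return steps;
}
```

Every digit appears exactly once per cycle, but each number fully determines the next one, which is why the numbers are not independent.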

CSC11 • ROOT 21
Random Number Generator
• Solution: more complex function
– Mersenne Twister (TRandom3) is recommended
TRandom3 myRnd; myRnd.Uniform();
generates a random number x with 0 < x <= 1
– period about 10^6000, fast!
• Flat probability distribution function good for dice,
usually not for physics:
– measurement uncertainty: gaussian
– particle lifetime: N(t) = N0 exp(-t/τ) i.e. exponential
– energy loss of particles in matter: landau
CSC11 • ROOT 22
Naïve Random Distribution
• Want to “sample” distribution
y = 2x for 0 < x < 1
• Acceptance-rejection method
• Generate random x* and y* point:
if y* <= 1 – x^2, return x* as random
number, else generate new x*, y*.
• Problem: waste of CPU, especially if function
evaluation costs a lot of CPU time
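A minimal sketch of acceptance-rejection, assuming the target density is the f(x) = 2x named at the top of the slide. The acceptance test is therefore y* <= 2x*, with y* drawn from [0, 2). The standard library's std::mt19937 stands in for TRandom3, and the function names are hypothetical.

```cpp
#include <cassert>
#include <random>

// Acceptance-rejection sampling of the density f(x) = 2x on (0, 1).
// Draw a point (x*, y*) uniformly in the box [0,1) x [0,2) and accept x*
// if the point lies under the curve, i.e. y* <= f(x*).
double SampleLinear(std::mt19937& rng)
{
    std::uniform_real_distribution<double> uniX(0.0, 1.0);
    std::uniform_real_distribution<double> uniY(0.0, 2.0);  // 2 = maximum of f
    while (true) {
        const double x = uniX(rng);
        const double y = uniY(rng);
        if (y <= 2.0 * x)
            return x;   // accepted
        // rejected: both random numbers are wasted, try again
    }
}

// Average of n samples; for f(x) = 2x the exact mean is 2/3.
double MeanOfSamples(unsigned seed, int n)
{
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += SampleLinear(rng);
    return sum / n;
}
```

The box has area 2 and the curve encloses area 1, so on average half of all generated points are rejected. That is the CPU waste mentioned above.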

CSC11 • ROOT 23
Random Distribution From Inverse
• Integral g(x) of distribution f(x) (probability density
function “PDF”)
• and inverse i(y) of g(x)
• random number 0 < y* < 1
• return i(y*)
• i(y*) is distributed like f(x)
[Figure: plots of f(x), g(x) and i(y)]
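As a worked case, assume f(x) = 2x on (0, 1), which is an illustrative choice and not stated on this slide. Then g(x) = x^2 and i(y) = sqrt(y), so every uniform random number yields exactly one sample. The sketch uses std::mt19937 in place of TRandom3 and hypothetical function names.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Inverse method for f(x) = 2x on (0, 1):
//   g(x) = x^2 (integral of f), i(y) = sqrt(y) (inverse of g).
double SampleLinearByInverse(std::mt19937& rng)
{
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const double y = uni(rng);   // random number y*
    return std::sqrt(y);         // i(y*) is distributed like f(x)
}

// Average of n samples; the exact mean of f(x) = 2x is 2/3.
double MeanByInverse(unsigned seed, int n)
{
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += SampleLinearByInverse(rng);
    return sum / n;
}
```

Unlike acceptance-rejection, no random number is thrown away, but the inverse must be known in closed form.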
CSC11 • ROOT 24
Smart Random Distribution
• Problem with inverse: must be known!
• Can combine rejection on f() and inverse of a() and b()
with a(x) <= f(x) <= b(x) to reduce sampling overhead

• ROOT implements fast generators for random
numbers distributed like Gauss, exp, Landau,
Poisson…

CSC11 • ROOT 25
Interlude: HELP!
ROOT is a framework – only as good as its
documentation.

http://root.cern.ch
• User's Guide (it has your answers!)
• Reference Guide
What is TRandom?
What functions does it have?

CSC11 • ROOT 26
LET'S FIRE UP ROOT!

CSC11 • ROOT 27
Setting Up ROOT
Before starting ROOT:
setup environment variables $PATH,
$LD_LIBRARY_PATH

(ba)sh:
$ source /PathToRoot/bin/thisroot.sh

(t)csh:
$ source /PathToRoot/bin/thisroot.csh

CSC11 • ROOT 28
Starting Up ROOT
ROOT is prompt-based
$ root
root [0] _

Prompt speaks C++


root [0] gROOT->GetVersion();
(const char* 0x5ef7e8)"5.27/04"

CSC11 • ROOT 29
ROOT As Pocket Calculator
Calculations:
root [0] sqrt(42)
(const double)6.48074069840786038e+00
root [1] double val = 0.17;
root [2] sin(val)
(const double)1.69182349066996029e-01

Uses C++ Interpreter CINT

CSC11 • ROOT 30
Running Code
To run function mycode() in file mycode.C:
root [0] .x mycode.C

Equivalent: load file and run function:


root [0] .L mycode.C
root [1] mycode()

Quit:
root [0] .q
All of CINT's commands (help):
root [0] .h

CSC11 • ROOT 31
ROOT Prompt
Why C++ and not a scripting language?!
You'll write your code in C++, too. Support for python,
ruby,… exists.

Why a prompt instead of a GUI?


ROOT is a programming framework, not an office
suite. Use GUIs where needed.

CSC11 • ROOT 32
Running Code
Macro: file that is interpreted by CINT (.x)

int mymacro(int value)
{
   int ret = 42;
   ret += value;
   return ret;
}

Execute with .x mymacro.C(42)

CSC11 • ROOT 33
Compiling Code: ACLiC
Load code as shared lib, much faster:
.x mymacro.C+(42)

Uses the system's compiler, takes seconds


Subsequent .x mymacro.C+(42) calls check for changes and
only rebuild if needed
Exactly as fast as e.g. Makefile based stand-alone binary!
CINT knows types, functions in the file, e.g. call
mymacro(43)

CSC11 • ROOT 34
Compiled versus Interpreted
Why compile?
Faster execution, CINT has limitations, validate code.

Why interpret?
Faster Edit → Run → Check result → Edit cycles
("rapid prototyping").
Scripting is sometimes just easier.

Are Makefiles dead?


Yes! ACLiC is even platform independent!

CSC11 • ROOT 35
A LITTLE C++

CSC11 • ROOT 36
A Little C++
Hopefully many of you know – but some don't.
• Object, constructor, assignment
• Pointers
• Scope, destructor
• Stack vs. heap
• Inheritance, virtual functions

If you use C++ you have to understand these concepts!

CSC11 • ROOT 37
Objects, Constructors, =
Look at this code:
TNamed myObject("name", "title");
TNamed mySecond;
mySecond = myObject;
cout << mySecond.GetName() << endl;

CSC11 • ROOT 38
Objects, Constructors, =
Look at this code:
TNamed myObject("name", "title");
TNamed mySecond;
mySecond = myObject;
cout << mySecond.GetName() << endl;

Creating objects:
1. Constructor TNamed::TNamed(const char*,
const char*)
2. Default constructor TNamed::TNamed()

CSC11 • ROOT 39
Objects, Constructors, =
Look at this code:
TNamed myObject("name", "title");
TNamed mySecond;
mySecond = myObject;
cout << mySecond.GetName() << endl;

3. Assignment:
mySecond: TNamed with fName "", fTitle ""
myObject: TNamed with fName "name", fTitle "title"
CSC11 • ROOT 40
Objects, Constructors, =
Look at this code:
TNamed myObject("name", "title");
TNamed mySecond;
mySecond = myObject;
cout << mySecond.GetName() << endl;

3. Assignment: creating a twin
mySecond (TNamed: fName "", fTitle "") = myObject (TNamed: fName "name", fTitle "title")
CSC11 • ROOT 41
Objects, Constructors, =
Look at this code:
TNamed myObject("name", "title");
TNamed mySecond;
mySecond = myObject;
cout << mySecond.GetName() << endl;

4. New content
mySecond: TNamed with fName "name", fTitle "title"
output: "name"
CSC11 • ROOT 42
Pointers
Modified code:
TNamed myObject("name", "title");
TNamed* pMySecond = 0;
pMySecond = &myObject;
cout << pMySecond->GetName() << endl;

Pointer declared with "*", initialize to 0

CSC11 • ROOT 43
Pointers
Modified code:
TNamed myObject("name", "title");
TNamed* pMySecond = 0;
pMySecond = &myObject;
cout << pMySecond->GetName() << endl;

"&" gets address:

pMySecond: [address] of myObject
myObject: TNamed with fName "name", fTitle "title"
CSC11 • ROOT 44
Pointers
Modified code:
TNamed myObject("name", "title");
TNamed* pMySecond = 0;
pMySecond = &myObject;
cout << pMySecond->GetName() << endl;

Assignment: point to myObject; no copy

pMySecond ([address]) = & myObject (TNamed: fName "name", fTitle "title")
CSC11 • ROOT 45
Pointers
Modified code:
TNamed myObject("name", "title");
TNamed* pMySecond = 0;
pMySecond = &myObject;
cout << pMySecond->GetName() << endl;

Access members of value pointed to by "->"

CSC11 • ROOT 46
Pointers
Changes propagated:
TNamed myObject("name", "title");
TNamed* pMySecond = 0;
pMySecond = &myObject;
pMySecond->SetName("newname");
cout << myObject.GetName() << endl;

Pointer forwards to object


Name of object changed – prints "newname"!

CSC11 • ROOT 47
Object vs. Pointer
Compare object:
TNamed myObject("name", "title");
TNamed mySecond = myObject;
cout << mySecond.GetName() << endl;

to pointer:
TNamed myObject("name", "title");
TNamed* pMySecond = &myObject;
cout << pMySecond->GetName() << endl;

CSC11 • ROOT 48
Object vs. Pointer: Parameters
Calling functions: object parameter obj gets copied for
function call!
void funcO(TNamed obj);
TNamed myObject;
funcO(myObject);

Pointer parameter: only address passed, no copy
void funcP(TNamed* ptr);
TNamed myObject;
funcP(&myObject);

CSC11 • ROOT 49
Object vs. Pointer: Parameters
Functions changing parameter: funcO can only access
copy! caller not changed!
void funcO(TNamed obj){
obj.SetName("nope");
}
funcO(caller);

Using pointers (or references) funcP can change caller
void funcP(TNamed* ptr){
ptr->SetName("yes");
}
funcP(&caller);
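The difference can be checked without ROOT. In this sketch the struct Named is a hypothetical stand-in for TNamed, and funcR is added to show the reference variant mentioned above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TNamed, so the example needs no ROOT.
struct Named {
    std::string fName;
    void SetName(const std::string& n) { fName = n; }
    const std::string& GetName() const { return fName; }
};

void funcO(Named obj)  { obj.SetName("nope"); }   // works on a copy
void funcP(Named* ptr) { ptr->SetName("yes"); }   // works on the caller's object
void funcR(Named& ref) { ref.SetName("ref"); }    // reference: same effect as pointer
```

After funcO(caller) the caller's name is unchanged. After funcP(&caller) it is "yes", and after funcR(caller) it is "ref".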
CSC11 • ROOT 50
Scope
Scope: range of visibility and C++ "life".
Birth: constructor, death: destructor
{ // birth: TNamed() called
TNamed n;
} // death: ~TNamed() called

Variables are valid / visible only in scopes:

int a = 42;
{ int a = 0; }
cout << a << endl;

CSC11 • ROOT 51
Scope
Functions are scopes:
void func(){ TNamed obj; }

func();
cout << obj << endl; // obj UNKNOWN!

Must not return pointers to local variables!
TNamed* func(){
TNamed obj;
return &obj; // BAD!
}

CSC11 • ROOT 52
Stack vs. Heap
So far only stack:
TNamed myObj("n","t");

Fast, but often < 10MB. Only survive in scope.

Heap: slower, GBs (RAM + swap), creation and
destruction managed by user:
TNamed* pMyObj = new TNamed("n","t");
delete pMyObj; // or memory leak!

CSC11 • ROOT 53
Stack vs. Heap: Functions
Can return heap objects without copying:
TNamed* CreateNamed(){
// user must delete returned obj!
TNamed* ptr = new TNamed("n","t");
return ptr; }

ptr gone – but TNamed object still on the heap, address
returned!
TNamed* pMyObj = CreateNamed();
cout << pMyObj->GetName() << endl;
delete pMyObj; // or memory leak!

CSC11 • ROOT 54
Inheritance
Classes "of same kind" can re-use functionality
E.g. plate and bowl are both dishes:
class TPlate: public TDish {...};
class TBowl: public TDish {...};

Can implement common functions in TDish:
class TDish {
public:
void Wash();
};

TPlate *a = new TPlate();
a->Wash();

CSC11 • ROOT 55
Inheritance: The Base
Use TPlate, TBowl as dishes:
assign pointer of derived to pointer of base "every
plate is a dish"
TDish *a = new TPlate();
TDish *b = new TBowl();

But not every dish is a plate, i.e. the inverse doesn't work.
And a bowl is totally not a plate!

TPlate* p = new TDish(); // NO!
TPlate* q = new TBowl(); // NO!

CSC11 • ROOT 56
Virtual Functions
Often derived classes behave differently:
class TDish { ...
virtual bool ForSoup() const;
};
class TPlate: public TDish { ...
bool ForSoup() const { return false; }
};
class TBowl: public TDish { ...
bool ForSoup() const { return true; }
};

CSC11 • ROOT 57
Pure Virtual Functions
But TDish cannot know! Mark as "not implemented"
class TDish { ...
virtual bool ForSoup() const = 0;
};

Only for virtual functions.


Cannot create object of TDish anymore (one function is
missing!)

CSC11 • ROOT 58
Calling Virtual Functions
Call to virtual functions evaluated at runtime:
void FillWithSoup(TDish* dish) {
if (dish->ForSoup())
dish->SetFull();
}

Works for any type as expected:


TDish* a = new TPlate();
TDish* b = new TBowl();
FillWithSoup(a); // will not be full
FillWithSoup(b); // is now full
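The dish example can be made self-contained as follows. This is a sketch: the class names drop the T prefix to mark them as illustrative, and SetFull() and IsFull() are assumed members that the slides use but never define.

```cpp
#include <cassert>

class Dish {
public:
    virtual ~Dish() {}
    virtual bool ForSoup() const = 0;   // pure virtual: Dish cannot know
    void SetFull() { fFull = true; }
    bool IsFull() const { return fFull; }
private:
    bool fFull = false;
};

class Plate : public Dish {
public:
    bool ForSoup() const override { return false; }
};

class Bowl : public Dish {
public:
    bool ForSoup() const override { return true; }
};

// The call to ForSoup() is resolved at runtime from the object's real type.
void FillWithSoup(Dish* dish)
{
    if (dish->ForSoup())
        dish->SetFull();
}
```

Passing a Plate leaves it empty, passing a Bowl fills it, although FillWithSoup only ever sees a Dish pointer.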

CSC11 • ROOT 59
Virtual vs. Non-Virtual
So what happens if non-virtual?
class TDish { ...
bool ForSoup() const {return false;}
};

Will now always call TDish::ForSoup(), i.e. false


void FillWithSoup(TDish* dish) {
if (dish->ForSoup())
dish->SetFull();
}

CSC11 • ROOT 60
Congrats!

You have earned yourself the CSC ROOT


C++ Diploma.
From now on you may use C++ without
feeling lost!

CSC11 • ROOT 61
Summary
We know:
• why and how to start ROOT
• C++ basics
• that you run your code with ".x"
• can call functions in libraries
• can (mis-) use ROOT as a pocket calculator!

Lots for you to discover during next two lectures and


especially the exercises!

CSC11 • ROOT 62
Streaming, Reflection, TFile,
Schema Evolution

SAVING DATA

CSC11 • ROOT 63
Saving Objects
Cannot do in C++:
TNamed* o = new TNamed("name","title");
std::write("file.bin", "obj1", o);
TNamed* p =
std::read("file.bin", "obj1");
p->GetName();

E.g. LHC experiments use C++ to manage data


Need to write C++ objects and read them back
std::cout not an option: 15 PetaBytes / year of processed
data (i.e. data that will be read)

CSC11 • ROOT 64
Saving Objects – Saving Types
What's needed?
TNamed* o = new TNamed("name", "title");
std::write("file.bin", "obj1", o);

Store data members of TNamed; need to know:


1) type of object
2) data members for the type
3) where data members are in memory
4) read their values from memory, write to disk

CSC11 • ROOT 65
Serialization
Store data members of TNamed: serialization
1) type of object: runtime-type-information RTTI
2) data members for the type: reflection
3) where data members are in memory: introspection
4) read their values from memory, write to disk: raw I/O

Complex task, and C++ is not your friend.

CSC11 • ROOT 66
Reflection
Need type description (aka reflection)
1. types, sizes, members
class TMyClass {
float fFloat;
Long64_t fLong;
};
TMyClass is a class.
Members:
– "fFloat", type float, size 4 bytes
– "fLong", type Long64_t, size 8 bytes

CSC11 • ROOT 67
Platform Data Types
Fundamental data types (int, long,…):
size is platform dependent

Store "long" on 64bit platform, writing 8 bytes:
00, 00, 00, 00, 00, 00, 00, 42
Read on 32bit platform, "long" only 4 bytes:
00, 00, 00, 00

Data loss, data corruption!
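The byte listing above can be reproduced with fixed-width integer types. The sketch assumes the most significant byte is written first, matching the order shown on the slide. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-width types have the same size on every platform.
static_assert(sizeof(std::int64_t) == 8, "int64_t is always 8 bytes");
static_assert(sizeof(std::int32_t) == 4, "int32_t is always 4 bytes");

// Write a 64-bit value as 8 bytes, most significant byte first.
void WriteBigEndian64(std::uint64_t value, unsigned char out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(value >> (8 * (7 - i)));
}

// Read the first `nbytes` (at most 8) bytes of the buffer as one number.
std::uint64_t ReadBigEndian(const unsigned char* in, int nbytes)
{
    std::uint64_t value = 0;
    for (int i = 0; i < nbytes; ++i)
        value = (value << 8) | in[i];
    return value;
}
```

Writing 0x42 and reading back all 8 bytes returns 0x42. A reader that believes the type is only 4 bytes wide sees 0 and leaves the remaining bytes to be misread as the next member.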

CSC11 • ROOT 68
ROOT Basic Data Types
Solution: ROOT typedefs

Signed       Unsigned     sizeof [bytes]
Char_t       UChar_t      1
Short_t      UShort_t     2
Int_t        UInt_t       4
Long64_t     ULong64_t    8
Double32_t                float on disk, double in RAM

CSC11 • ROOT 69
Reflection
Need type description (platform dependent)
1. types, sizes, members
2. offsets in memory
class TMyClass {
float fFloat;
Long64_t fLong;
};
Memory layout of TMyClass (byte offsets from the object's address):
– 0 to 3: fFloat ("fFloat" is at offset 0)
– 4 to 7: PADDING
– 8 to 15: fLong ("fLong" is at offset 8)

CSC11 • ROOT 70
I/O Using Reflection
members → memory → disk
[Figure: memory layout of TMyClass with fFloat at offset 0, padding, and fLong at offset 8]

CSC11 • ROOT 71
C++ Is Not Java
Lesson: need reflection!
Where from?

Java: get data members with
Class.forName("MyClass").getFields()

C++: get data members with
– oops. Not part of C++.

[Warning sign: CAREFUL, THIS LANGUAGE HAS NO BRAIN. USE YOUR OWN]

CSC11 • ROOT 72
ROOT And Reflection
Simply use ACLiC:
.L MyCode.cxx+

Creates library with reflection data ("dictionary") of all


types in MyCode.cxx!

Dictionary needed for interpreter, too


ROOT has dictionary for all its types

CSC11 • ROOT 73
Back To Saving Objects
Given a TFile:
TFile* f = TFile::Open("file.root","RECREATE");

Write an object deriving from TObject:


object->Write("optionalName")
"optionalName" or TObject::GetName()

Write any object (with dictionary):


f->WriteObject(object, "name");

CSC11 • ROOT 74
TFile
ROOT stores objects in TFiles:
TFile* f = TFile::Open("file.root", "NEW");
TFile behaves like file system:
f->mkdir("dir");
TFile has a current directory:
f->cd("dir");
TFile compresses data ("zip"):
f->GetCompressionFactor()
2.6

CSC11 • ROOT 75
"Where Is My Histogram?"
TFile owns histograms, graphs, trees
(due to historical reasons):
TFile* f = TFile::Open("myfile.root");
TH1F* h = new TH1F("h","h",10,0.,1.);
TNamed* o = new TNamed("name", "title");
o->Write();
delete f;
h automatically deleted: owned by file.
o still there even if saving o to file!
unique names!
TFile acts like a scope for hists, graphs, trees!
CSC11 • ROOT 76
Risks With I/O
Physicists can loop a lot:
For each particle collision
For each particle created
For each detector module
Do something.
Physicists can lose a lot:
Run for hours…
Crash.
Everything lost.

CSC11 • ROOT 77
Name Cycles
Create snapshots regularly:
MyObject;1
MyObject;2

MyObject;5427
MyObject

Write() does not replace but appends!
But see documentation of TObject::Write()

CSC11 • ROOT 78
The "I" Of I/O
Reading is simple:
TFile* f = TFile::Open("myfile.root");
TH1F* h = 0;
f->GetObject("h", h);
h->Draw();
delete f;

Remember:
TFile owns histograms!
file gone, histogram gone!

CSC11 • ROOT 79
Ownership And TFiles
Separate TFile and histograms:
TFile* f = TFile::Open("myfile.root");
TH1F* h = 0;
TH1::AddDirectory(kFALSE);
f->GetObject("h", h);
h->Draw();
delete f;

… and h will stay around.


Put in rootlogon.C in current directory to be executed
when root starts

CSC11 • ROOT 80
Changing Class – The Problem
Things change:
class TMyClass {
float fFloat;
Long64_t fLong;
};

CSC11 • ROOT 81
Changing Class – The Problem
Things change:
class TMyClass {
double fFloat;
Long64_t fLong;
};

Inconsistent reflection data, mismatch in memory, on


disk
Objects written with old version cannot be read
Need to store reflection with data to detect!

CSC11 • ROOT 82
Schema Evolution
Simple rules to convert disk to memory layout
1. skip removed members
file.root: Long64_t fLong; → RAM: ignore
file.root: float fFloat; → RAM: float fFloat;

2. default-initialize added members
file.root: (absent) → RAM: Long64_t fLong; set by TMyClass(): fLong(0)
file.root: float fFloat; → RAM: float fFloat;

3. convert members where possible
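The three rules can be illustrated with a toy reader. This is only a sketch under strong assumptions: the on-disk record is modelled as a map from member name to value, the class MyClassV2 and its member fCount are hypothetical, and ROOT's real mechanism works on streamer information rather than on a map.

```cpp
#include <cassert>
#include <map>
#include <string>

// In-memory layout of the current class version (hypothetical example).
struct MyClassV2 {
    double fFloat = 0.0;   // was a float on disk: converted on read
    int    fCount = 0;     // added member: not on disk, keeps its default
    // fLong was removed: its on-disk value is skipped
};

// Toy on-disk record: member name -> stored value.
typedef std::map<std::string, double> DiskRecord;

MyClassV2 ReadWithEvolution(const DiskRecord& onDisk)
{
    MyClassV2 obj;                                  // rule 2: defaults first
    DiskRecord::const_iterator it = onDisk.find("fFloat");
    if (it != onDisk.end())
        obj.fFloat = it->second;                    // rule 3: convert float to double
    it = onDisk.find("fCount");
    if (it != onDisk.end())
        obj.fCount = static_cast<int>(it->second);
    // rule 1: any other on-disk member (e.g. "fLong") is ignored
    return obj;
}
```

Reading a record that contains fFloat and fLong yields an object with the stored fFloat and a default fCount. The stale fLong value is dropped.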


CSC11 • ROOT 83
Class Version
ClassDef() macro makes I/O faster, needed when deriving
from TObject
Can have multiple class versions in same file
Use version number to identify layout:
class TMyClass: public TObject {
public:
TMyClass(): fLong(0), fFloat(0.) {}
virtual ~TMyClass() {}
...
ClassDef(TMyClass,1); // example class
};

CSC11 • ROOT 84
Reading Files
Files store reflection and data: need no library!


CSC11 • ROOT 85
ROOT I/O Candy
• Nice viewer for TFile: new TBrowser
• Can even open
TFile::Open("http://cern.ch/file.root") including
read-what-you-need!
• Combine contents of TFiles with $ROOTSYS/bin/hadd

CSC11 • ROOT 86
Summary
Big picture:
• you know ROOT files – for petabytes of data
• you learned that reflection is key for I/O
• you learned what schema evolution is

Small picture:
• you can write your own data to files
• you can read it back
• you can change the definition of your classes

CSC11 • ROOT 87
ROOT COLLECTION CLASSES
Collection Classes
ROOT collections are polymorphic containers: they hold pointers
to TObject, so:
• Can only hold objects that inherit from TObject
• Return pointers to TObject, that have to be cast back
to the correct subclass

void DrawHist(TObjArray *vect, int at)
{
   TH1F *hist = (TH1F*)vect->At(at);
   if (hist) hist->Draw();
}

CSC11 • ROOT 89
TClonesArray
Array of objects of the same class ("clones")
Designed for repetitive data analysis tasks:
same type of objects
created and deleted
many times.
No comparable class in STL!

[Figure: the internal data structure of a TClonesArray]
CSC11 • ROOT 90
TClonesArray
Standard array:
while (next_event()) {
   for (int i=0;i<N;++i)
      a[i] = new TTrack(x,y,z);
   do_something(a);
   a.clear();
};

TClonesArray:
while (next_event()) {
   for (int i=0;i<N;++i)
      new(a[i]) TTrack(x,y,z);
   do_something(a);
   a.Delete();
};

[Figure: blocks of TTrack objects in memory for each of the two cases]

CSC11 • ROOT 91
Traditional Arrays
Very large number of new and delete calls in large loops
like this (Nevents x Ntracks times new/delete),
with Nevents = 100000 and Ntracks = 10000:
TObjArray a(10000);
while (TEvent *ev = (TEvent *)next()) {
   for (int i = 0; i < ev->Ntracks; ++i) {
      a[i] = new TTrack(x,y,z,...);
      ...
   }
   a.Delete();
}

CSC11 • ROOT 92
Use of TClonesArray
Better use a TClonesArray, which reduces the
number of new/delete calls to only Ntracks
(Nevents = 100000, Ntracks = 10000):
TClonesArray a("TTrack", 10000);
while (TEvent *ev = (TEvent *)next()) {
   for (int i = 0; i < ev->Ntracks; ++i) {
      new(a[i]) TTrack(x,y,z,...);
      ...
   }
   a.Delete();
}
• Pair of new / delete calls cost about 4 μs
• Allocating / freeing memory Nevents*Ntracks = 10^9 times
costs about 1 hour!
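The quoted cost follows from 10^5 events x 10^4 tracks = 10^9 new/delete pairs at about 4 μs each, i.e. about 4000 s. The idea behind new(a[i]) TTrack(...) is placement new into memory that is allocated once and reused for every event. The sketch below shows that idea in plain C++. Track and TrackPool are hypothetical names, and this is not how TClonesArray is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Track {                      // hypothetical stand-in for TTrack
    double x, y, z;
    Track(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
};

// Fixed-capacity pool: memory is allocated once; objects are constructed
// into it with placement new and destroyed without freeing the memory.
class TrackPool {
public:
    explicit TrackPool(std::size_t capacity)
        : fStorage(static_cast<unsigned char*>(::operator new(capacity * sizeof(Track)))),
          fCapacity(capacity), fSize(0) {}
    ~TrackPool() { Clear(); ::operator delete(fStorage); }

    // Construct a Track in the next free slot; returns 0 if the pool is full.
    Track* Add(double x, double y, double z)
    {
        if (fSize == fCapacity) return 0;
        void* slot = fStorage + fSize * sizeof(Track);
        ++fSize;
        return new (slot) Track(x, y, z);   // placement new: no allocation
    }

    // End of event: run destructors, keep the memory for the next event.
    void Clear()
    {
        for (std::size_t i = 0; i < fSize; ++i)
            reinterpret_cast<Track*>(fStorage + i * sizeof(Track))->~Track();
        fSize = 0;
    }

    std::size_t Size() const { return fSize; }

private:
    TrackPool(const TrackPool&);             // not copyable
    TrackPool& operator=(const TrackPool&);
    unsigned char* fStorage;
    std::size_t fCapacity;
    std::size_t fSize;
};
```

After Clear(), the next Add() returns the same address as in the previous event, so the per-track allocation cost disappears from the event loop.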
CSC11 • ROOT 93
ROOT TREES
Trees
From: simple data types (e.g. Excel tables)
To: complex data types (e.g. database tables)
[Figure: tree of an Event with nodes Header, Type, Particles, Pt, Charge, Energy, Track, Vertex, Position, …]

CSC11 • ROOT 95
Why Trees ?
• Extremely efficient write once, read many ("WORM")
• Designed to store >10^9 (HEP events) with same data
structure
• Trees allow fast direct and random access to any entry
(sequential access is the best)
• Optimized for network access (read-ahead)

CSC11 • ROOT 96
Why Trees ?
object.Write() convenient for simple objects like
histograms, inappropriate for saving collections of
events containing complex objects
• Reading a collection: read all elements (all events)
• With trees: only one element in memory, or even only
a part of it (less I/O)
• Trees buffered to disk (TFile); I/O is integral part of
TTree concept

CSC11 • ROOT 97
Tree Access
• Databases have row wise access
– Can only access the full object (e.g. full event)
• ROOT trees have column wise access
– Direct access to any event, any branch or any leaf even in the
case of variable length structures
– Designed to access only a subset of the object attributes
(e.g. only particles’ energy)
– Makes same members consecutive, e.g. for object with
position in X, Y, Z, and energy E, all X are consecutive, then
come Y, then Z, then E. A lot higher zip efficiency!
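The row-wise versus column-wise layout can be sketched with plain containers. Hit, HitColumns and the function names are hypothetical. A TTree additionally buffers and compresses each column, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Hit { float x, y, z, e; };   // row-wise: one object per entry

struct HitColumns {                 // column-wise: one array per member
    std::vector<float> x, y, z, e;
};

// Reorganize rows into columns: all X are consecutive, then Y, Z and E.
HitColumns ToColumns(const std::vector<Hit>& rows)
{
    HitColumns cols;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        cols.x.push_back(rows[i].x);
        cols.y.push_back(rows[i].y);
        cols.z.push_back(rows[i].z);
        cols.e.push_back(rows[i].e);
    }
    return cols;
}

// Reads only the energy column; x, y and z are never touched.
double TotalEnergy(const HitColumns& cols)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < cols.e.size(); ++i)
        sum += cols.e[i];
    return sum;
}
```

An analysis that needs only the energies reads one quarter of the data, and values of the same kind sit next to each other, which tends to compress better.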

CSC11 • ROOT 98
Branch Creation from Class

Class members:
float a;
int b;
double c[5];
int N;
float* x; //[N]
float* y; //[N]
Class1 c1;
Class2 c2; //!
Class3 *c3;
std::vector<T> v1;
std::vector<S*> v2;
TClonesArray *tc;
[Figure: branches created from these members: a, b, c, N, x, y, c1, c3, v1, v2, tc, with sub-branches such as Class1::a, Class1::b, T::a, T::b, T::c, S::a, S::b, X::a, X::b, X::c, X::d]
CSC11 • ROOT 99
ObjectWise/MemberWise Streaming
3 modes to stream an object with members a, b, c, d:
– object-wise: a1b1c1d1a2b2c2d2…anbncndn
– member-wise: a1a2..anb1b2..bnc1c2..cnd1d2..dn
– one buffer per member: a1a2…an, b1b2…bn, c1c2…cn, d1d2…dn
Member-wise streaming of collections is the default since 5.27.
Member-wise gives better compression.
CSC11 • ROOT 100
Building ROOT Trees
Overview of
– Trees
– Branches
5 steps to build a TTree

CSC11 • ROOT 101


Tree structure

CSC11 • ROOT 102


Tree structure
• Branches: directories
• Leaves: data containers
• Can read a subset of all branches – speeds up
considerably the data analysis processes
• Branches of the same TTree can be written to
separate files

CSC11 • ROOT 103


Memory ↔ Tree
• Each Node is a branch in the Tree
[Figure: T.Fill() stores the current values in memory as entry 0 of tree T]
CSC11 • ROOT 104
Memory ↔ Tree
• Each Node is a branch in the Tree
[Figure: T.GetEntry(6) loads entry 6 of tree T, which holds entries 0 to 6, into memory]
CSC11 • ROOT 105
Five Steps to Build a Tree
Steps:
1. Create a TFile
2. Create a TTree
3. Add TBranch to the TTree
4. Fill the tree
5. Write the file

CSC11 • ROOT 106


Example macro
void WriteTree()
{
Event *myEvent = new Event();
TFile f("AFile.root", "RECREATE");
TTree *t = new TTree("myTree","A Tree");
t->Branch("EventBranch", &myEvent);
for (int e=0;e<100000;++e) {
myEvent->Generate(); // hypothetical
t->Fill();
}
t->Write();
}

CSC11 • ROOT 107


Step 1: Create a TFile Object
Trees can be huge → need file for swapping filled entries

TFile *hfile = TFile::Open("AFile.root",
                           "RECREATE");

CSC11 • ROOT 108


Step 2: Create a TTree Object

The TTree constructor:


– Tree name (e.g. "myTree")
– Tree title

TTree *myTree = new TTree("myTree","A Tree");

CSC11 • ROOT 109


Step 3: Adding a Branch
• Branch name
• Address of pointer
to the object

Event *myEvent = new Event();


myTree->Branch("eBranch", &myEvent);

CSC11 • ROOT 110


Step 4: Fill the Tree
• Create a for loop
• Assign values to the object contained in each branch
• TTree::Fill() creates a new entry in the tree: snapshot
of values of branches’ objects

for (int e=0;e<100000;++e) {


myEvent->Generate(e); // fill event
myTree->Fill(); // fill the tree
}
CSC11 • ROOT 111
Step 5: Write Tree To File

myTree->Write();

CSC11 • ROOT 112


Reading a TTree
• Looking at a tree
• How to read a tree
• Friends and chains

CSC11 • ROOT 113


Example macro
void ReadTree() {
TFile f("AFile.root");
TTree *T = (TTree*)f.Get("myTree");
Event *myE = 0; TBranch* brE = 0;
T->SetBranchAddress("EventBranch", &myE, &brE);
T->SetCacheSize(10000000);
T->AddBranchToCache("EventBranch");
Long64_t nbent = T->GetEntries();
for (Long64_t e = 0;e < nbent; ++e) {
brE->GetEntry(e);
myE->Analyze();
}
}
Data pointers (e.g. myE) MUST be set to 0
CSC11 • ROOT 114
How to Read a TTree
Example:

1. Open the TFile
TFile f("AFile.root")

2. Get the TTree
TTree *myTree = 0;
f.GetObject("myTree", myTree)

CSC11 • ROOT 115


How to Read a TTree
3. Create a variable pointing to the data
root [] Event *myEvent = 0;
4. Associate a branch with the variable:
root [] myTree->SetBranchAddress("eBranch", &myEvent);
5. Read one entry in the TTree
root [] myTree->GetEntry(0)
root [] myEvent->GetTracks()->First()->Dump()
==> Dumping object at: 0x0763aad0, name=Track, class=Track
fPx 0.651241 X component of the momentum
fPy 1.02466 Y component of the momentum
fPz 1.2141 Z component of the momentum
[...]

CSC11 • ROOT 116


Branch Access Selection
• Use TTree::SetBranchStatus() or TBranch::GetEntry()
to select branches to be read
• Speed up considerably the reading phase
TClonesArray* myMuons = 0;
// disable all branches
myTree->SetBranchStatus("*", 0);
// re-enable the "muon" branches
myTree->SetBranchStatus("muon*", 1);
myTree->SetBranchAddress("muon", &myMuons);
// now read (access) only the "muon" branches
myTree->GetEntry(0);

CSC11 • ROOT 117


Looking at the Tree
TTree::Print() shows the data layout

root [] TFile f("AFile.root")


root [] myTree->Print();
******************************************************************************
*Tree :myTree : A ROOT tree *
*Entries : 10 : Total = 867935 bytes File Size = 390138 *
* : : Tree compression factor = 2.72 *
******************************************************************************
*Branch :eBranch *
*Entries : 10 : BranchElement (see below) *
*............................................................................*
*Br 0 :fUniqueID : *
*Entries : 10 : Total Size= 698 bytes One basket in memory *
*Baskets : 0 : Basket Size= 64000 bytes Compression= 1.00 *
*............................................................................*

CSC11 • ROOT 118


Looking at the Tree
TTree::Scan("leaf:leaf:….") shows the values
root [] myTree->Scan("fNseg:fNtrack"); > scan.txt

root [] myTree->Scan("fEvtHdr.fDate:fNtrack:fPx:fPy","",
"colsize=13 precision=3 col=13:7::15.10");

******************************************************************************
* Row * Instance * fEvtHdr.fDate * fNtrack * fPx * fPy *
******************************************************************************
* 0 * 0 * 960312 * 594 * 2.07 * 1.459911346 *
* 0 * 1 * 960312 * 594 * 0.903 * -0.4093382061 *
* 0 * 2 * 960312 * 594 * 0.696 * 0.3913401663 *
* 0 * 3 * 960312 * 594 * -0.638 * 1.244356871 *
* 0 * 4 * 960312 * 594 * -0.556 * -0.7361358404 *
* 0 * 5 * 960312 * 594 * -1.57 * -0.3049036264 *
* 0 * 6 * 960312 * 594 * 0.0425 * -1.006743073 *
* 0 * 7 * 960312 * 594 * -0.6 * -1.895804524 *

CSC11 • ROOT 119


TTree Selection Syntax
Prints the first 8 variables of the tree:
MyTree->Scan();

Prints all the variables of the tree:


MyTree->Scan("*");

Prints the values of var1, var2 and var3.


MyTree->Scan("var1:var2:var3");
A selection can be applied in the second argument:
MyTree->Scan("var1:var2:var3", "var1>0");

Prints the values of var1, var2 and var3 for the entries
where var1 is greater than 0
Use the same syntax for TTree::Draw()
CSC11 • ROOT 120
Looking at the Tree
TTree::Show(entry_number) shows values for one entry

root [] myTree->Show(0);
======> EVENT:0
eBranch = NULL
fUniqueID = 0
fBits = 50331648
[...]
fNtrack = 594
fNseg = 5964
[...]
fEvtHdr.fRun = 200
[...]
fTracks.fPx = 2.066806, 0.903484, 0.695610,-0.637773,…
fTracks.fPy = 1.459911, -0.409338, 0.391340, 1.244357,…

CSC11 • ROOT 121


TChain: the Forest
• Collection of TTrees: list of ROOT files containing the
same tree
• Same semantics as TTree
As an example, assume we have three files called
file1.root, file2.root, file3.root. Each contains tree
called "T". Create a chain:
TChain chain("T"); // argument: tree name
chain.Add("file1.root");
chain.Add("file2.root");
chain.Add("file3.root");
Now we can use the TChain like a TTree!
CSC11 • ROOT 122
TChain
[Figure: trees T(1), T(2), T(3) from file1.root, file2.root, file3.root: chain files together]

CSC11 • ROOT 123


Data Volume & Organisation
• A TFile typically contains 1 TTree
• A TChain is a collection of TTrees and/or TChains
[Figure: data volume scale from 100MB to 1PB with the matching number of files (1, 1, 5, 50, 500, 5000, 50000): TTree at the small end, TChain at the large end]

CSC11 • ROOT 124


Tree Friends
• Trees are designed to be read only
• Often, people want to add branches to existing
trees and write their data into it
• Using tree friends is the solution:
– Create a new file holding the new tree
– Create a new Tree holding the branches for the user
data
– Fill the tree/branches with user data
– Add this new file/tree as friend of the original tree

CSC11 • ROOT 125


Tree Friends
[Figure: tree with its friend trees tree_1 and tree_2; branch names shown: n, x, a, b, c, i, j, o, p, k, l, x, q, r]
TFile f1("tree.root");
tree.AddFriend("tree_1", "tree1.root");
tree.AddFriend("tree_2", "tree2.root");
tree.Draw("x:a", "k<c");
tree.Draw("x:tree_2.x");
CSC11 • ROOT 126
Splitting

[Figure: the same tree with split level = 0 and with split level = 99]

CSC11 • ROOT 127
Splitting
• Creates one branch per member – recursively
• Allows browsing objects that are stored in trees, even
without their library
• Fine grained branches allow fine-grained I/O - read
only members that are needed
• Supports STL containers too, even vector<T*>!

CSC11 • ROOT 128


Splitting
Setting the split level (default = 99)

Split level = 0 Split level = 99

tree->Branch("EvBr", &event, 64000, 0 );


Performance Considerations
A split branch is:
• Faster to read – if you only want a subset of data
members
• Slower to write due to the large number of branches



Summary: Trees
• TTree is one of the most powerful collections available
for HEP
• Extremely efficient for a huge number of data sets with
identical layout
• Very easy to look at a TTree: use TBrowser!
• Write once, read many (WORM) is ideal for experiments'
data; use friends to extend
• Branches allow granular access; use splitting to create
a branch for each member, even through collections



Selectors, Analysis, PROOF

ANALYZING TREES



Recap
TTree efficient storage and access
for huge amounts of structured data
Allows selective access of data
TTree knows its layout

Almost all HEP analyses based on TTree



TTree Data Access
TSelector: generic "TTree based analysis"
Derive from it ("TMySelector")
ROOT invokes TSelector's functions
Used e.g. by tree->Process(TSelector*,…), PROOF
The functions called are virtual, thus TMySelector's
functions are called.



TTree Data Access
E.g. tree->Process("MySelector.C+")

Init(tree) → SlaveBegin() → Process(i) → next entry?
yes: back to Process(i) with the next entry
no: Terminate()
TSelector
Steps of ROOT using a TSelector:
1. setup TMySelector::Init(TTree *tree)
fChain = tree; fChain->SetBranchAddress()
2. start TMySelector::SlaveBegin()
create histograms
3. run TMySelector::Process(Long64_t)
fChain->GetTree()->GetEntry(entry);
analyze data, fill histograms,…
4. end TMySelector::Terminate()
fit histograms, write them to files,…
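The four steps rely on virtual dispatch: the framework only knows the base class, and the user's overrides get called. The following is a minimal plain C++ sketch of that pattern. ToySelector and RunSelector are hypothetical stand-ins, not ROOT's TSelector or TTree::Process, and the "tree" is just a vector of numbers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy base class mimicking the hook names on the slide; not ROOT's TSelector.
class ToySelector {
public:
    virtual ~ToySelector() = default;
    virtual void Init(const std::vector<double>* data) = 0;  // 1. setup
    virtual void SlaveBegin() = 0;                            // 2. start
    virtual bool Process(long long entry) = 0;                // 3. run
    virtual void Terminate() = 0;                             // 4. end
};

// Hypothetical driver playing the role of tree->Process(selector).
void RunSelector(ToySelector& sel, const std::vector<double>& data) {
    sel.Init(&data);
    sel.SlaveBegin();
    for (long long i = 0; i < static_cast<long long>(data.size()); ++i)
        sel.Process(i);  // one call per entry
    sel.Terminate();
}

// User analysis: count entries above a threshold and log the call order.
class MySelector : public ToySelector {
public:
    void Init(const std::vector<double>* data) override {
        fData = data;
        fLog += "I";
    }
    void SlaveBegin() override {
        fCount = 0;  // "create histograms" would go here
        fLog += "B";
    }
    bool Process(long long entry) override {
        if ((*fData)[static_cast<std::size_t>(entry)] > 1.0) ++fCount;
        fLog += "P";
        return true;
    }
    void Terminate() override { fLog += "T"; }

    long long fCount = 0;
    std::string fLog;  // records which hooks ran, in order

private:
    const std::vector<double>* fData = nullptr;
};
```

RunSelector never mentions MySelector, yet MySelector's functions run: that is all "derive from it" means here.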



Analysis
TSelector gives the structure of analyses

Content of data analysis:


science by itself
covered by Ivica Puljak



Example for a common ROOT analysis ingredient

FITTING



Fitting
Sampling "known" distribution
Influenced by statistics
Missing data

Not monotonic!



Fit
Combine our knowledge with statistics / data by
fitting a distribution:

1. Find appropriate function with parameters
2. Fit function to distribution



Fitting: The Math
Fitting = finding parameters such that
|f(x) – hist(x)|
minimal for all points x [or any similar measure]

Histogram with errors:


|f(x) – hist(x)| / err(x)
or similar
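One common choice for the "or similar" measure is the chi-square: the sum over bins of the squared, error-weighted differences. The sketch below computes it for values already evaluated per bin. The function name Chi2 and the decision to skip bins with a non-positive error are assumptions of this example, not a statement about what a particular fitter does internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sum over bins of ((f(x) - hist(x)) / err(x))^2.
// model[i]: function value in bin i, hist[i]: bin content, err[i]: bin error.
// Bins with a non-positive error are skipped (an assumption of this sketch).
double Chi2(const std::vector<double>& model,
            const std::vector<double>& hist,
            const std::vector<double>& err) {
    const std::size_t n = std::min({model.size(), hist.size(), err.size()});
    double chi2 = 0.;
    for (std::size_t i = 0; i < n; ++i) {
        if (err[i] <= 0.) continue;
        const double pull = (model[i] - hist[i]) / err[i];
        chi2 += pull * pull;
    }
    return chi2;
}
```

A fit then means varying the function's parameters until this number is as small as possible; bins with large errors pull less on the result than bins with small errors.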



Fitting: The Function
Finding the proper function involves:
• behavioral analysis:
starts at 0, goes to constant, monotonic,…
• physics interpretation:
"E proportional to sin^2(phi)"
• having a good knowledge of typical functions (see
TMath)
• finding a good compromise between generalization
("constant") and precision ("polynomial 900th degree")



Fitting: Parameters
Let's take "erf": erf(x)/2.+0.5
Free parameters:
[0]: x @ center of the slope
[1]: ½ width of the slope
[2]: maximum efficiency
Define fit function:
TF1* f = new TF1("myfit",
"(TMath::Erf((x-[0])/[1])/2.+0.5)*[2]",
0., 100.);



Fitting: Parameter Init
A must!
Sensible values:
f->SetParameter(0, 35.);
f->SetParameter(1, 10.);
f->SetParameter(2, 1.);
Fitting Result
Result of hist->Fit(f); is printed, or use

f->GetParameter(0)

[0]: 34.9
[1]: 12.1
[2]: 0.98

which means:
(TMath::Erf((x-34.9)/12.1)/2.+0.5)*0.98

Get efficiency at pT=42GeV: f->Eval(42.)
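What f->Eval(42.) computes can be reproduced with the standard library alone. This sketch writes the fitted formula as a plain function; the name Efficiency and its parameter names are made up for the example, while the numbers 34.9, 12.1 and 0.98 are the fit results quoted above.

```cpp
#include <cmath>

// (erf((x - center) / halfWidth) / 2 + 0.5) * plateau,
// i.e. the fit function with [0]=center, [1]=halfWidth, [2]=plateau.
double Efficiency(double x, double center, double halfWidth, double plateau) {
    return (std::erf((x - center) / halfWidth) / 2. + 0.5) * plateau;
}

// Convenience wrapper using the fitted parameters from the slide.
double FittedEfficiency(double pT) {
    return Efficiency(pT, 34.9, 12.1, 0.98);
}
```

At the center of the slope (x = 34.9) the function gives half the plateau, 0.49; far above the slope it approaches the plateau of 0.98. FittedEfficiency(42.) lies between the two.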


Fitting: Recap

You now know


• why large samples are relevant
• what fitting is, how it works, when to do it, and how it's
done with ROOT.



Bleeding Edge Physics
with
Bleeding Edge Computing

INTERACTIVE DATA ANALYSIS WITH PROOF

Parallel Analysis: PROOF
Some numbers (from the ALICE experiment)
• 1.5 PB (1.5 × 10^15 bytes) of raw data per year
• 360 TB of ESD+AOD* per year (20% of raw)
• One pass at 15 MB/s will take 9 months!

Parallelism is the only way out!

* ESD: Event Summary Data AOD: Analysis Object Data
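The "9 months" figure can be checked with a few lines of arithmetic. The sketch assumes that the pass runs over the 360 TB of ESD+AOD, that 1 TB = 10^12 bytes and 1 MB = 10^6 bytes, and that a month has 30 days; the function names are made up for the example. The parallel estimate is the ideal case with perfect scaling and no overhead.

```cpp
// Time for one sequential pass over `bytes` at `bytesPerSecond`.
double SecondsForOnePass(double bytes, double bytesPerSecond) {
    return bytes / bytesPerSecond;
}

// Convert seconds to months, assuming 30-day months.
double SecondsToMonths(double seconds) {
    return seconds / (86400. * 30.);
}

// Ideal time with nWorkers reading in parallel (perfect scaling assumed).
double IdealParallelSeconds(double bytes, double bytesPerSecond, int nWorkers) {
    return SecondsForOnePass(bytes, bytesPerSecond) / nWorkers;
}
```

For 360e12 bytes at 15e6 bytes/s this gives 2.4e7 seconds, a little over 9 months; with 100 ideal workers the same pass would take under 3 days.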


CAF Usage Statistics



PROOF
Huge amounts of events, hundreds of CPUs
Split the job into N events / CPU!
PROOF for TSelector based analysis:
• start analysis locally ("client"),
• PROOF distributes data and code,
• lets CPUs ("workers") run the analysis,
• collects and combines (merges) data,
• shows analysis results locally



Interactive!
• Start analysis
• Watch status while running
• Forgot to create a histogram?
– Interrupt the process
– Modify the selector
– Re-start the analysis
• More dynamic than a batch system



PROOF
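[Figure: PROOF architecture. The client sends commands and scripts to the MASTER of the PROOF farm, which drives the workers with access to storage; a list of output objects (histograms, …) comes back to the client.]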
Scheduling
• Decides where to run which job (or part of a job)
• E.g. simple batch system
• Can autonomously split jobs into parts (“packets”)
• Involves
– resource management (CPU, I/O, memory)
– data locality
– priorities (jobs / users)
– and whatever other criteria are deemed relevant
• Often optimizes the jobs’ distribution towards an overall
goal: maximum CPU utilization (Grid), minimum time
to result (PROOF)
Packetizer Role and Goals
• Distributes units of work (“packets”) to workers
• Grid’s packet: >=1 file
• Result arrives when last resource has processed last
file:
$t = t_{init} + \max_{jobs}(R_i \cdot N_i^{files}) + t_{final}$
$t_{init}$, $t_{final}$: time to initialize / finalize the jobs
$R_i$: processing rate of job $i$
$N_i^{files}$: number of files for job $i$
• Result:
– slowest job defines running time
– large tail in CPU utilization versus time
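The formula is easy to evaluate for a static split. In the sketch below, for the product with the file count to come out as a time, the processing rate of job i is taken as the time needed per file; that reading, the struct Job and the example numbers are assumptions of this sketch.

```cpp
#include <algorithm>
#include <vector>

// One statically assigned job: a fixed number of files at a fixed speed.
struct Job {
    double timePerFile;  // assumed reading of the job's rate: seconds per file
    int nFiles;          // number of files assigned up front
};

// t = tInit + max over jobs (timePerFile * nFiles) + tFinal
double StaticCompletionTime(double tInit, const std::vector<Job>& jobs,
                            double tFinal) {
    double slowest = 0.;
    for (const Job& j : jobs)
        slowest = std::max(slowest, j.timePerFile * j.nFiles);
    return tInit + slowest + tFinal;
}
```

Take four jobs with 6 files each, three needing 1 s per file and one needing 3 s per file, with 1 s each for initialization and finalization. The fast jobs are done after 6 s of processing, but the result arrives only after 1 + 18 + 1 = 20 s: the slowest job defines the running time while the other CPUs sit idle.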
Static…
• Example: 24 files on 4 worker nodes, one underperforming

The slowest worker node sets the processing time



PROOF’s Dynamic Packetizer
• PROOF packetizer’s goal: results as early as possible
• All workers should finish at the same time:
$t = t_{init} + \max_{jobs}(R_i \cdot N_i^{files}) + t_{final}$
ideally, $R_i \cdot N_i^{files}$ equal for all jobs
• Cannot reliably predict performance ($R_i$) of workers
– job interaction, e.g. number of jobs accessing the same disk
– CPU versus I/O duty of jobs
• Instead: update prediction based on real-time past
performance while running
• Pull architecture: workers ask for new packets
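The pull idea can be simulated in a few lines: whichever worker becomes free first asks for the next packet. This is a deliberately simplified sketch and not PROOF's packetizer: packets have a fixed size, each worker has a constant (hypothetical) time per packet, and there is no performance prediction or data locality.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct DynamicResult {
    double finishTime;                  // time at which the last packet is done
    std::vector<int> packetsPerWorker;  // how many packets each worker pulled
};

// timePerPacket[w]: seconds worker w needs for one packet (assumed constant).
DynamicResult SimulatePull(const std::vector<double>& timePerPacket,
                           int nPackets) {
    DynamicResult r{0., std::vector<int>(timePerPacket.size(), 0)};
    if (timePerPacket.empty()) return r;

    // Min-heap of (time at which the worker becomes free, worker index).
    using Slot = std::pair<double, std::size_t>;
    std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> freeAt;
    for (std::size_t w = 0; w < timePerPacket.size(); ++w)
        freeAt.push({0., w});

    for (int p = 0; p < nPackets; ++p) {
        const Slot s = freeAt.top();  // earliest free worker pulls a packet
        freeAt.pop();
        const double done = s.first + timePerPacket[s.second];
        ++r.packetsPerWorker[s.second];
        if (done > r.finishTime) r.finishTime = done;
        freeAt.push({done, s.second});
    }
    return r;
}
```

With 24 packets and four workers needing 1, 1, 1 and 3 s per packet, the three fast workers end up with 7 packets each and the slow one with 3, and the last packet finishes at 9 s. A fixed split of 6 packets per worker would keep the slow worker busy for 18 s.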
Dynamic Packet Distribution

The slowest worker node gets less work to do: the processing
time is less affected by its underperformance



PROOF Packetizer Live



Creating a session
To create a PROOF session from the ROOT
prompt, just type:

TProof::Open("master")

where "master" is the hostname of the master
machine on the PROOF cluster



PROOF Lite

[Figure: PROOF Lite. Client commands and scripts go to a multi-core desktop/laptop; a list of output objects (histograms, …) comes back.]



What is PROOF Lite?
• PROOF optimized for single many-core machines
• Zero configuration setup
– No config files and no daemons
• Like PROOF it can exploit fast disks, SSDs, lots of
RAM, fast networks and fast CPUs
• If your code works on PROOF, then it works on PROOF
Lite and vice versa



Creating a session
To create a PROOF Lite session from the
ROOT prompt, just type:

TProof::Open("")

Then you can use your multicore computer as
a PROOF cluster!



PROOF Analysis
• Example of local TChain analysis
// Create a chain of trees
root[0] TChain *c = new TChain("myTree");
root[1] c->Add("http://www.any.where/file1.root");
root[2] c->Add("http://www.any.where/file2.root");

// MySelector is a TSelector
root[3] c->Process("MySelector.C+");



PROOF Analysis
• Same example with PROOF
// Create a chain of trees
root[0] TChain *c = new TChain("myTree");
root[1] c->Add("http://www.any.where/file1.root");
root[2] c->Add("http://www.any.where/file2.root");

// Start PROOF and tell the chain to use it
root[3] TProof::Open("");
root[4] c->SetProof();

// Process goes via PROOF
root[5] c->Process("MySelector.C+");



TSelector & PROOF
• Begin() called on the client only
• SlaveBegin() called on each worker: create histograms
• SlaveTerminate() rarely used; post-processing
of partial results before they are sent to the
master and merged
• Terminate() runs on the client: save results, display
histograms, …



PROOF Analysis
[Figure: PROOF analysis flow. Selector (client): Begin(), then Terminate() for the final analysis (fitting, saving, …). Selector (worker): SlaveBegin() (create histos, define output list), Process() in a loop over the events 1, 2, …, n, …, last of the chain (preselection; if OK, analysis), SlaveTerminate() (post-processing); results go to the output list. Each event is shown as a tree of branches and leaves.]



Output List (result of the query)
• Each worker has a partial output list
• Objects have to be added to the list in
TSelector::SlaveBegin() e.g.:
fHist = new TH1F("h1", "h1", 100, -3., 3.);
fOutput->Add(fHist);
• At the end of processing the output list gets sent to the
master
• The Master merges objects and returns them to the
client. Merging is e.g. "Add()" for histograms,
appending for lists and trees
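Merging by "Add()" means bin-wise addition of the partial histograms. The sketch below shows this with a toy fixed-binning histogram; ToyHist and MergePartials are hypothetical helpers written for this example and are unrelated to ROOT's TH1 beyond the idea.

```cpp
#include <cstddef>
#include <vector>

// Toy 1D histogram with equal-width bins on [lo, hi).
struct ToyHist {
    double lo, hi;
    std::vector<double> bins;

    ToyHist(int n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0.) {}

    void Fill(double x) {
        if (x < lo || x >= hi) return;  // no under/overflow bins in this toy
        std::size_t i =
            static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding
        bins[i] += 1.;
    }

    // Bin-wise addition; refuses histograms with a different layout.
    bool Add(const ToyHist& other) {
        if (other.bins.size() != bins.size() || other.lo != lo ||
            other.hi != hi)
            return false;
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
        return true;
    }
};

// What the master does with the workers' partial results.
// Precondition: `partials` is not empty.
ToyHist MergePartials(const std::vector<ToyHist>& partials) {
    ToyHist total = partials.front();
    for (std::size_t i = 1; i < partials.size(); ++i) total.Add(partials[i]);
    return total;
}
```

Because counting is additive, the merged histogram is identical to the one a single process would have filled from all events, which is why each worker can fill its own copy independently.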



Example
void MySelector::SlaveBegin(TTree *tree) {
   // create histogram and add it to the output list
   fHist = new TH1F("MyHist","MyHist",40,0.13,0.17);
   GetOutputList()->Add(fHist);
}

Bool_t MySelector::Process(Long64_t entry) {
   my_branch->GetEntry(entry); // read branch
   fHist->Fill(my_data); // fill the histogram
   return kTRUE;
}

void MySelector::Terminate() {
   fHist->Draw(); // display histogram
}



Results

At the end of Process(), the output list is accessible via
gProof->GetOutputList()

// Get the output list
root[0] TList *output = gProof->GetOutputList();
// Retrieve 2D histogram "h2"
root[1] TH2F *h2 = (TH2F*)output->FindObject("h2");
// Display the histogram
root[2] h2->Draw();



PROOF GUI Session
Starting a PROOF GUI session is trivial:

TProof::Open()

Opens GUI:



PROOF GUI Session – Results
Results accessible via TSessionViewer, too:



PROOF Documentation
Documentation available online at
http://root.cern.ch/drupal/content/proof
But of course you need a little cluster of CPUs

Like your multi-core game console!



Summary
You've learned:
• analyzing a TTree can be easy and efficient
• an integral part of physics is counting
• ROOT provides histogramming and fitting
• > 1 CPU: use PROOF!

Looking forward to hearing from you:
• as a user (help! bug! suggestion!)
• and as a developer!
