Introduction To VSAM Files

This document provides an overview of VSAM files and how they are structured and organized. It explains that VSAM files offer faster searching and retrieval of data compared to sequential files by organizing records in key-sequenced order and using an index. The document also describes the different types of VSAM files and how records are grouped and stored in control intervals within a KSDS file.

Uploaded by

jac012100
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to VSAM Files

Q. What is VSAM? Never heard of it…!


VSAM stands for Virtual Storage Access Method. In Windows, a user's data is stored in
files. In a text file, on Windows or Linux, the data consists of lines/records one
after the other. Such files are called Sequential Datasets (Files). VSAM is a newer,
improved way of storing data. VSAM overcomes some of the limitations of conventional
file systems like Sequential Files.
Q. Hold on for a sec... What does a Sequential File look like?
In the early days of computers, all data was stored in Sequential Files (Physical
Sequential Datasets). Data was stored in the form of records, one after the other.
Suppose we wanted to store information about all the employees in our
organization. Below is a picture of what a Sequential File (PS Dataset)
looks like:

As you can see, each record represents the data of a single, individual employee.
This way, there would be thousands of records that make EMPDAT Sequential File.
Q. So, it's pretty cool the way a Sequential File stores data. But how do you get it
back? How do you search for a particular employee?
Well, that's the tough part, because sequential datasets work more or less like a
cassette tape. Yup, an audio cassette tape. The songs recorded on the cassette tape
are analogous to records in a Sequential File. If you want to play a particular
song, you have to start from the beginning of the tape and wind forward through
it till you reach the desired song. You can't jump directly to a song and play
it.

Along the same lines, when you want to search for a particular record, say Employee
no. 04, you have to travel through the list of records, one by one, till you
reach the desired record. The longer the Sequential File, the longer it
takes to access the record. You just don't know where the record lies in such
a huge sequential file, because the records are not kept in any particular order.
So searching for data records, i.e. retrieval of data from a sequential file,
can take a very long time.
Q. I get it.. as far as Searching goes – Sequential Files are not very efficient.
What’s VSAM got to offer?
You can use a more structured and organised way of storing this data called VSAM.
Though the abbreviation is a little geeky, VSAM files are superior in comparison to
ordinary sequential files. Searching and retrieving data from a VSAM File is very
fast.

Apart from this, there are many other advantages that VSAM has to offer :
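- Free space in a VSAM File is not wasted; it is reclaimed automatically.
- VSAM Files are device independent: a program refers to a record by its key or by
its relative position in the file, not by a physical disk location, so the file can
be moved to a different type of disk device without changing the programs that use
it. VSAM itself is an access method of IBM mainframe operating systems such as MVS;
to read the data on another platform, for example Windows on an Intel machine, it
must first be exported to a portable format such as a sequential file.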
Q. What are the types of VSAM files?
VSAM files are of 3 main types -
1) Entry Sequenced Data Set (ESDS)
2) Key Sequenced Data Set (KSDS)
3) Relative Record Data Set (RRDS)

A VSAM file is also called a Cluster. Hence the names ESDS Cluster, KSDS Cluster
and RRDS Cluster are used interchangeably with ESDS File, KSDS File and RRDS File.
Q. What’s a Key Sequenced File(KSDS)? Can you explain in brief?
- Concept of Key : In a KSDS file, every record is identified by a unique
identification key. Every single, individual employee will have a distinct and
unique key value. This key could be his Employee Identification No, since it is
unique for each employee. No two employees can have the same key value.

- How data is stored in a KSDS File : When you first create a KSDS file, it is
empty, so you need to populate (load) it with real data. Generally we do a
sequential load, which means the data must be supplied in increasing (ascending)
order of the key. This is because a KSDS file stores all the data records in
increasing (ascending) order of the key.

- KSDS File Structure : A KSDS file contains two parts :

1) Data Part – Stores the file records (actual data).
2) Index Part – Keeps track of the location of the records in the data part.

Given below is a rough sketch which will give you a big picture of what a KSDS File
looks like. Of course, the details are explained at length further ahead.
- For Dummies - Concept of Memory Address :
A KSDS file has 2 parts: the Index Component and the Data Component. The Data
Component contains the data records. Every record is stored in one storage location,
and every storage location houses one record. Think of the houses on a street: each
house has a residential address by which it can be reached, and if you know the
address, you can go straight to the house. In the same way, each storage location
has a unique address. If you know the address of a storage location, you can
access the record stored there directly, in much less time.

- For Dummies – Comparing a Book's Index with a KSDS Index ; How search performance
improves with the help of Index Component :
Imagine you didn't have an index in a book and you wanted to find a keyword.
You would have to read through the entire book, page by page, till you
came across the word you were looking for. The index simplifies this activity.
Basically, a book index has two columns: one is the keyword, and the other is the
page number where this keyword is located in the text. Every page has a page
number. Let's say you want to search for the term Mainframe Computers. You look up
this keyword in the index. This is easy, because the index is sorted in alphabetical
order of the keyword. You jump to the section 'M' and look up the term; the
index points to Page No. 373. You jump straight to page 373 and start reading about
Mainframe Computers.

Just as every page has a page no., every record in a KSDS Data file has an address.
The KSDS Index file has an entry for every key-value(key-field). For example,
employees 1, 2 and 3 each would have an entry in the index. The index also stores
the memory address(offset) of this Employee record in the KSDS Data file.

Just as a book index is sorted alphabetically on the keyword, the KSDS index file is
sorted in increasing order of the key-field. Let's assume Employee ID is the key-field.
So, how does it work? Let's say you wanted to find the name of Employee No. 0004.
Simple: you look up the row with Key-value=0004 in the KSDS Index
file. This is easy, because the index is already sorted on the key-field,
Employee ID. There you find the address of the Storage Location (House) in the KSDS
Data file where Employee ID 0004 lives. This is location no. 600.
Since you know the address, you can now jump directly to address 600 and
access the name of the Employee. This is far quicker than reading every record.

The gist of this concept is that the KSDS Index file stores key-values and pointers
(memory addresses) to the corresponding records in the Data file. This way, searching
is faster and easier.
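
The difference between scanning and an index lookup can be sketched in a few lines of Python. This is only a toy model of the simplified picture above, not how VSAM is implemented: the employee names, the 200-byte record length and the function names are all made up for illustration. With 200-byte records loaded in key order, Employee 0004 happens to land at offset 600, matching the example.

```python
RECORD_LEN = 200  # assumed fixed record length in bytes (illustrative only)
KEY_LEN = 4       # assumed width of the Employee ID key field


def build_file(employees):
    """Load (employee_id, name) pairs in ascending key order.

    Returns the data part (bytes) and the index part (key -> offset).
    """
    data = bytearray()
    index = {}
    for emp_id, name in sorted(employees):
        index[emp_id] = len(data)  # address where this record starts
        record = f"{emp_id:0{KEY_LEN}d}{name}".ljust(RECORD_LEN)
        data.extend(record.encode("ascii"))
    return bytes(data), index


def sequential_search(data, emp_id):
    """Scan record by record from the start; returns (name, records_read)."""
    reads = 0
    for offset in range(0, len(data), RECORD_LEN):
        reads += 1
        record = data[offset:offset + RECORD_LEN].decode("ascii")
        if int(record[:KEY_LEN]) == emp_id:
            return record[KEY_LEN:].rstrip(), reads
    return None, reads


def indexed_search(data, index, emp_id):
    """Look the key up in the index, then jump straight to the record."""
    offset = index.get(emp_id)
    if offset is None:
        return None, 0
    record = data[offset:offset + RECORD_LEN].decode("ascii")
    return record[KEY_LEN:].rstrip(), 1


# Hypothetical employees, invented for this sketch.
employees = [(1, "Asha"), (2, "Bruno"), (3, "Chen"), (4, "Dmitri"), (5, "Esi")]
data, index = build_file(employees)
print(index[4])                         # offset of Employee 0004
print(sequential_search(data, 4))       # reads four records to get there
print(indexed_search(data, index, 4))   # reads only one record
```

The sequential search reads every record in front of the one it wants, while the indexed search reads exactly one record no matter how long the file is.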

The process of building an Index on a key-field for Data Records is called
Indexing (or simply building an INDEX).

Let me caution you that the diagram above is a very crude, preliminary picture of
the KSDS Index file. Don't go by it. In reality, the KSDS Index file has an
inverted-tree structure. In Computer Science, we call such a tree a B+ Tree. If you
are curious to know what a B+ Tree is, and how the KSDS Index file really looks,
read on. If you feel you've absorbed a lot, you can call it a day!
Q. How are records organized in a KSDS Data file?
A KSDS file stores the logical records of a file in fixed-length blocks called Control
Intervals (CI). In a KSDS Data file, a Control Interval holds several logical
records. The logical records within each Control Interval are always kept sorted by
key-field.

A KSDS File could have thousands of Control Intervals. Within a Control Interval,
records can be of any size or length; we do not distinguish in particular between
fixed-length and variable-length records. However, as a rule, all Control Intervals
in a KSDS file are exactly equal in size (length).

When a new KSDS file is created, you can specify the size of the Control Intervals
in the file. By default, the Control Intervals (CI) in a KSDS File assume a size of
4K (4096) bytes. However, the size of Control Intervals in KSDS Files can lie in the
range 512 bytes <= Control Interval Size <= 32K.

When you create a new KSDS File, the control intervals in it are empty. As you load
data into the KSDS file, the Control Intervals are populated with information.

What follows should give you a picture of what Control Intervals look like in
storage.

Control Interval (Very idealistic – Simplified)


Assume that, Control Intervals are 4096 bytes long. A logical record(Employee
record) spans 1024 bytes. Then,

No. of records per CI = 4096/1024 = 4 records/CI

Thus, in this example, the Control Interval is completely full(no room for new
records).
Control Interval often contains some empty/free space(Close to real model) :
Assume that Control Intervals are 4096 bytes long. The first logical record = 1000
bytes, the second logical record = 500 bytes, the third logical record = 1300
bytes.

Logical Record 1 + Logical Record 2 + Logical Record 3
= 1000 + 500 + 1300 = 2800 bytes.

Thus, the remaining space = 4096 - 2800 = 1296 bytes is left free. This free space
can be used to accommodate a new logical record. Thus, Control Intervals may also
have free space.
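
The arithmetic of both simplified models can be checked with a short Python sketch. It uses the record lengths from the sum above (1000, 500 and 1300 bytes) and, like those models, ignores any control information stored in the Control Interval. The function names are made up for illustration.

```python
CI_SIZE = 4096  # Control Interval size used in the examples above


def records_per_ci(ci_size, record_len):
    """Idealised model: how many equal-length records fill one CI."""
    return ci_size // record_len


def free_space(ci_size, record_lengths):
    """Bytes left over in a CI after storing records of the given lengths."""
    used = sum(record_lengths)
    if used > ci_size:
        raise ValueError("records do not fit in one Control Interval")
    return ci_size - used


print(records_per_ci(CI_SIZE, 1024))          # 4
print(free_space(CI_SIZE, [1000, 500, 1300])) # 1296
```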

New logical records can be added to a Control Interval, by using the free-space in
the Control Interval(CI).

VSAM Control Interval

Control Interval showing addition of record with key 30

Let's look at the recipe followed by VSAM, to add a new logical record to a KSDS
File.

1. VSAM goes through a full-index search to locate the Control Interval(CI) in the
KSDS Data file, in which the new record must be placed.
(This search is exactly the same as that used to randomly retrieve a record).
2. After the index search locates the Control Interval(CI), that Control
Interval(CI) is loaded into memory. VSAM then searches through the logical records
in the Control Interval to determine, where the new record should go.(Recall, that
KSDS file stores all data records in increasing order of the key).
3. The new record is then inserted into the Control Interval(CI), in key sequence,
re-arranging the other records, as necessary.
4. The updated Control Interval(CI) is now written back to its original location on
the Disk.
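
The four steps above can be sketched in Python. This is a simplified model under stated assumptions, not VSAM's actual code: the index is reduced to a flat list holding the highest key of each Control Interval, a CI is a plain list of (key, data) pairs, and `capacity` is a made-up record limit standing in for the free-space check. What happens when the CI is already full is not modelled here.

```python
import bisect


def find_ci(high_keys, key):
    """Step 1: index search. Pick the first CI whose highest key >= key.

    Assumption for this sketch: a key above every high key goes to the last CI.
    """
    i = bisect.bisect_left(high_keys, key)
    return min(i, len(high_keys) - 1)


def insert_record(cis, high_keys, key, data, capacity=4):
    """Insert (key, data) in key sequence; returns the CI number used."""
    ci_no = find_ci(high_keys, key)

    # Step 2: "load" the CI into memory and find where the record belongs.
    ci = list(cis[ci_no])
    keys = [k for k, _ in ci]
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        raise KeyError(f"duplicate key {key}")
    if len(ci) >= capacity:
        raise OverflowError("CI is full (not handled in this sketch)")

    # Step 3: insert in key sequence; later records shift along.
    ci.insert(pos, (key, data))

    # Step 4: "write" the updated CI back to its original location.
    cis[ci_no] = ci
    high_keys[ci_no] = max(high_keys[ci_no], key)
    return ci_no


# Two hypothetical CIs; the record with key 30 belongs in the first one.
cis = [[(10, "A"), (20, "B"), (40, "C")], [(50, "D"), (60, "E")]]
high_keys = [40, 60]
print(insert_record(cis, high_keys, 30, "NEW"))
print([k for k, _ in cis[0]])
```

After the call, the first CI holds keys 10, 20, 30 and 40 in that order, and the second CI is untouched.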

Control Interval also contains extra Information(Real Model) :

VSAM treats all logical records as if they were variable-length (even if they
are fixed-length). VSAM keeps track of the length of the logical records in a Control
Interval by using a special Record Definition Field (RDF) at the end of each Control
Interval. This field, which holds the length information for a logical
record, is 3 bytes long.

Moreover, VSAM also keeps track of the amount of free space and its location
within a Control Interval. This meta-information is stored in a special Control
Interval Definition Field (CIDF) at the end of each Control Interval. This
field, which holds the [amount, location] of the free space for a Control Interval,
is 4 bytes long.
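
Once the control fields are counted, the earlier idealised numbers change slightly. The sketch below assumes one 3-byte RDF per logical record and one 4-byte CIDF per Control Interval, as described above; the function names are made up, and real VSAM may use fewer RDFs when neighbouring records have the same length, so treat the results as an approximation.

```python
CIDF_LEN = 4  # one Control Interval Definition Field per CI
RDF_LEN = 3   # one Record Definition Field per record (assumed here)


def records_that_fit(ci_size, record_len):
    """How many equal-length records fit once control fields are counted."""
    return (ci_size - CIDF_LEN) // (record_len + RDF_LEN)


def free_space_with_control_info(ci_size, record_lengths):
    """Free bytes left after the records, their RDFs and the CIDF."""
    control = CIDF_LEN + RDF_LEN * len(record_lengths)
    return ci_size - sum(record_lengths) - control


print(records_that_fit(4096, 1024))                            # 3
print(free_space_with_control_info(4096, [1000, 500, 1300]))   # 1283
```

So a 4096-byte CI that looked exactly full with four 1024-byte records in the idealised model really holds only three of them, because the control fields need a few bytes of their own.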
Control Area(CA) :
A Control Area (CA) is a group of related Control Intervals.

KSDS Files are organised as Control Areas (CA), which in turn contain hundreds of
fixed-length Control Intervals (CI) filled with logical records, free space and
control information.
Q. Can you show me a picture of what a KSDS Data file looks like?
A KSDS Data file is a collection of Control Intervals and Control Areas. A CI
normally holds several logical records. At the end of each CI, control information
is stored. Between the logical records and the control information there is free
space, where new records can be added.

Q. What does a KSDS Index file look like?

The KSDS Index file is organised in two parts: the Index Set and the Sequence Set.
The lowest level of index entries is called the Sequence Set. There is one sequence
set record for each control area in the KSDS Data file. The sequence set record for
a control area contains an entry for each control interval in that control area. The
entry for a control interval stores (i) the highest key of the logical records in
that CI and (ii) the physical disk address of (pointer to) that CI.

The CI entries within a sequence set record are kept in increasing (ascending) order
of the key. This allows the control intervals within a control area to be retrieved
in key sequence during sequential processing, irrespective of whether the actual
CIs are in key sequence within the CA.

As I just said, the sequence set record for a CA contains an entry for each control
interval in that control area. In order to facilitate random processing, each CI
entry has (i) the highest key of the CI and (ii) a vertical pointer to the Control
Interval. The vertical pointer can be followed to retrieve any or all of the records
within that CI.

In addition to vertical pointers to each CI, each sequence set record also contains
a horizontal pointer to the next sequence set record in key sequence. The
horizontal pointers are followed during sequential processing. After all the records
in a control area have been read, the horizontal pointer is followed to move to the
next sequence set record, which points to the successive control area.
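
Sequential processing through the sequence set can be sketched as follows. This is an illustrative model only: the class name, the field names and the tiny data set are made up, and a "pointer" is just a CI number or a Python reference. Note that the CIs are deliberately stored out of key order to show that the sequence set entries, not the physical order, decide the reading order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SequenceSetRecord:
    # (highest key in CI, CI number) pairs, kept in ascending key order.
    # The CI number plays the role of the vertical pointer.
    entries: List[Tuple[int, int]]
    # Horizontal pointer to the sequence set record of the next control area.
    next_record: Optional["SequenceSetRecord"] = None


def read_sequentially(first, control_intervals):
    """Yield every logical record in key sequence."""
    record = first
    while record is not None:
        for _high_key, ci_no in record.entries:    # follow vertical pointers
            yield from control_intervals[ci_no]
        record = record.next_record                # follow horizontal pointer


# Hypothetical data: CI 0 and CI 1 are physically out of key order.
control_intervals = {
    0: [(30, "C"), (40, "D")],
    1: [(10, "A"), (20, "B")],
    2: [(50, "E"), (60, "F")],
}
second = SequenceSetRecord(entries=[(60, 2)])
first = SequenceSetRecord(entries=[(20, 1), (40, 0)], next_record=second)

print([key for key, _ in read_sequentially(first, control_intervals)])
```

The keys come out as 10, 20, 30, 40, 50, 60 even though CI 1 is read before CI 0.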

The Index Set is organised as a tree or hierarchical structure. There is one and
only one index set record at the top of the tree (that is, at the root). Index
searching during random processing begins at this root index set record.

The root and all the other index set records consist of several entries. Each entry
consists of the highest key of the next lower-level index set record, and a pointer
to said index set record. The individual entries within an index set record are kept
in key sequence.

During random processing, the logical record that you want to access must first be
looked up in the Index. This process proceeds as follows :
1. The root index set record is input, and the first entry greater than or equal to
the key of the desired record is located. Associated with this key value is a
downward pointer to the next lower-level index set record.
2. The next lower-level index set record is input, and the first entry >= key of
desired record is located. Associated with this key value is a downward pointer to
the next lower-level index set record.
3. This process continues until you reach a sequence set record. At this point, the
first CI entry >= key of the desired record is located. Associated with this key
value is a downward pointer to the control interval.
4. The indicated control interval (CI) is input, and is searched for the desired
logical record. If the record is not in this CI, it is not in the file (and the
COBOL program is notified of a record-not-found condition).
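
The lookup steps above can be sketched as a small tree walk in Python. This is a toy model under stated assumptions: one class stands in for both index set and sequence set records, a control interval is a plain list of (key, data) pairs, and the names and sample employees are made up. Returning `None` plays the role of the record-not-found condition.

```python
import bisect


class IndexRecord:
    """An index set or sequence set record.

    entries: (highest key, pointer) pairs in key sequence. The pointer is
    another IndexRecord, or a control interval for a sequence set record.
    """

    def __init__(self, entries):
        self.entries = entries


def first_entry_ge(entries, key):
    """Return the pointer of the first entry whose key >= the wanted key."""
    high_keys = [high_key for high_key, _ in entries]
    i = bisect.bisect_left(high_keys, key)
    return entries[i][1] if i < len(entries) else None


def lookup(root, key):
    """Random retrieval: walk down from the root to a CI, then search it."""
    node = root
    while isinstance(node, IndexRecord):      # steps 1 to 3
        node = first_entry_ge(node.entries, key)
        if node is None:                      # key is above every key on file
            return None
    for record_key, data in node:             # step 4: search the CI
        if record_key == key:
            return data
    return None                               # record not found


# Hypothetical file: three CIs, two sequence set records, one root.
ci_a = [(1, "Asha"), (2, "Bruno")]
ci_b = [(3, "Chen"), (4, "Dmitri")]
ci_c = [(6, "Esi"), (8, "Farid")]
seq_1 = IndexRecord([(2, ci_a), (4, ci_b)])
seq_2 = IndexRecord([(8, ci_c)])
root = IndexRecord([(4, seq_1), (8, seq_2)])

print(lookup(root, 4))   # found in ci_b
print(lookup(root, 5))   # leads to ci_c, which has no key 5
```

Each lookup reads one record per index level plus one control interval, so the cost grows with the height of the tree rather than with the number of records in the file.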
