PL Data Types (1)

Uploaded by

kudzaicpemhiwa

Programming Languages

University of Zimbabwe

 Facilitator: G. Mhlanga
 +263775190786
 29/10/2024
Data Types
 Most programming languages require the
programmer to declare the data type of every data
object, and most database systems require the user
to specify the type of each data field.
 Available data types vary from one
programming language to another, and from
one database application to another, but the
following usually exist in one form or another:
 integer: In more common parlance, whole number; a number that
has no fractional part.
 floating-point: A number with a decimal point. For example, 3 is
an integer, but 3.5 is a floating-point number.
 character (text): Readable text
Purpose of Types in a Programming Language
 Types provide implicit context for many operations, so that the programmer does not have to specify that context explicitly
 For example, in C the expression a + b
 uses integer addition if a and b are of integer type
 uses floating-point addition if a and b are of double type
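A minimal C sketch of this implicit context follows. The function names int_sum and double_sum are hypothetical, chosen only for illustration; the point is that the same + symbol is translated to a different machine operation depending on the declared operand types.

```c
/* The same '+' symbol; the operand types select the machine operation. */
int int_sum(int a, int b)
{
    return a + b;       /* integer addition */
}

double double_sum(double a, double b)
{
    return a + b;       /* floating-point addition */
}
```

Neither function names the kind of addition it wants; the declarations of a and b carry that information.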
Type Systems
A type system consists of:
 A mechanism to define types and associate
them with certain language constructs
 A set of rules for type equivalence, type
compatibility and type inference
 Type equivalence rules determine when the types of two values are the same
 Type compatibility rules determine when a value of a given type can be used in a given context
 Type information allows the language to limit the
set of acceptable values to those that provide a
particular subroutine interface
Type Checking
 The process of ensuring that a program obeys the language’s type compatibility rules
 A violation of the rules is known as a type clash
 A language is said to be strongly typed if it prohibits, in a way that the language implementation can enforce, the application of any operation to any object that is not intended to support that operation
 A language is said to be statically typed if it is strongly typed and type checking can be performed at compile time
 Ex: Ada is strongly typed and for the most part statically typed
 Ex: Pascal can do most of its type checking at compile time, though the language is not quite strongly typed: untagged variant records are its only loophole
 Polymorphism allows a single body of code to work with objects of
multiple types. It may or may not imply the need for run-time type
checking
 Because the types of objects can be thought of as implied
(unspecified) parameters, dynamic typing is said to support
implicit parametric polymorphism
The Meaning of “Type”
 Type can be thought of from three points of view:
Denotational Point of View
 A type is simply a set of values
Constructive Point of View
 A type is either one of a small collection of built-in types (integer,
character, Boolean, real, etc; also called primitive or predefined
types), or a composite type created by applying a constructor (record,
set, array, etc) to one or more simpler types
Abstraction-based Point of View
 A type is an interface consisting of a set of operations with well-
defined and mutually consistent semantics

For most programmers, types usually reflect a mixture of these viewpoints. In denotational semantics, one of the leading ways to formalize the meaning of programs, a set of values is known as a domain.
Classification of Types
 Most languages provide built in types similar to
those supported in hardware by most processors:
integers, characters, Boolean, and real (floating
point) numbers.
 Booleans are typically implemented as single byte
quantities with 1 representing true and 0
representing false.
 Characters have traditionally been implemented as
one byte quantities as well, typically using the ASCII
encoding.
 More recent languages use a two byte
representation designed to accommodate the
Unicode character set.
Classification of Types
Numeric Types
 C and Fortran distinguish between different lengths of integers
and real numbers.
 Differences in precision across language implementations lead
to a lack of portability: programs that run correctly on one
system may produce run-time errors or erroneous results on
another.
 A few languages, including C, C++, C# and Modula-2, provide both signed and unsigned integers.
 Fortran, C99 and Common Lisp provide a built-in complex type, usually implemented as a pair of floating-point numbers that represent the real and imaginary Cartesian coordinates.
 Ada supports fixed-point types, which are represented internally by integers.
 Integers, Booleans and characters are examples of discrete types.
 Discrete, rational, real, and complex types together constitute the scalar types.
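The difference between signed and unsigned integers, and the dependence of ranges on the implementation, can be sketched in C as follows. The helper names are hypothetical; the limits come from the standard header <limits.h>, so the answers may differ from one system to another, which is exactly the portability concern raised above.

```c
#include <limits.h>

/* Unsigned arithmetic is defined to wrap around modulo 2^N. */
unsigned int next_unsigned(unsigned int x)
{
    return x + 1u;
}

/* Reports whether v can be stored in a signed char on this implementation. */
int fits_in_signed_char(long v)
{
    return v >= SCHAR_MIN && v <= SCHAR_MAX;
}

/* Reports whether v can be stored in an unsigned char on this implementation. */
int fits_in_unsigned_char(long v)
{
    return v >= 0 && v <= UCHAR_MAX;
}
```

For example, next_unsigned(UINT_MAX) yields 0, and a value one larger than SCHAR_MAX fits in an unsigned char but not in a signed char.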
Classification of Types
Enumeration Types
 Enumerations were introduced by Wirth in the design of Pascal. They facilitate the creation of readable programs, and allow the compiler to catch certain kinds of programming errors.
 An enumeration type consists of a set of named elements. In Pascal one can write:
type weekday = (sun, mon, tue, wed, thu, fri, sat);
 The values of an enumeration type are ordered, so comparisons are generally valid (mon < tue).
 There is usually a mechanism to determine the predecessor or successor of an enumeration value (in Pascal, tomorrow := succ(today)).
 Values of an enumeration type are typically represented by small integers, usually a consecutive range of small integers starting at zero. In many languages these ordinal values are semantically significant.
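The same ideas can be sketched in C, whose enum constants are also small integers starting at zero. The function names succ_day and comes_before are hypothetical; C has no built-in succ, and wrapping from the last value to the first is a choice made for this sketch (Pascal's succ does not wrap).

```c
enum weekday { SUN, MON, TUE, WED, THU, FRI, SAT };

/* Successor of a weekday. Wrapping from SAT to SUN is a choice made
   for this sketch; Pascal's succ does not wrap. */
enum weekday succ_day(enum weekday d)
{
    return (enum weekday)(((int)d + 1) % 7);
}

/* Enumeration values are ordered by their ordinal values. */
int comes_before(enum weekday a, enum weekday b)
{
    return a < b;
}
```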
Classification of Types
Subrange Types
 Like enumerations, subranges were first introduced in Pascal.
A subrange is a type whose values compose a contiguous
subset of the values of some discrete base type.
 In Pascal subranges look like this:
type test_score = 0..100;
     workday = mon..fri;
 In Ada one would write:
type test_score is new integer range 0..100;
subtype workday is weekday range mon..fri;
 The range ... portion of the definition in Ada is called a type constraint.
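C has no subrange types, so the constraint has to be checked by hand. The sketch below, with the hypothetical helper set_test_score, imitates the check that a language with subranges would apply when a value is assigned to a test_score.

```c
#include <stdbool.h>

typedef int test_score;   /* intended range: 0..100 */

/* Stores v into *dst only if it lies in 0..100; returns false otherwise.
   This plays the role of the run-time check a subrange type implies. */
bool set_test_score(test_score *dst, int v)
{
    if (v < 0 || v > 100)
        return false;
    *dst = v;
    return true;
}
```

In Pascal or Ada the declaration alone expresses this constraint; here the typedef documents the intent but enforces nothing.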
Classification of Types
Composite Types
 Nonscalar types are usually called composite, or constructed types. They are
generally created by applying a type constructor to one or more simpler types.
 Common composite types include records, variant records, arrays, sets,
pointers, lists, and files.
 Records: a record consists of a collection of fields, each of which belongs to a simpler type.
 Variant records: a variant record differs from a normal record in that only one of its fields is valid at any given time.
 Arrays: the most commonly used composite types. An array can be thought of as a function that maps members of an index type to members of a component type.
 Sets: a set type is the mathematical powerset of its base type, which must often be discrete.
 Pointers: a pointer value is a reference to an object of the pointer's base type. Pointers are most often used to implement recursive data types.
 Lists: lists contain a sequence of elements, but there is no notion of mapping or indexing.
 A list is defined recursively as either an empty list or a pair consisting of a head element and a reference to a sublist.
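As an illustration of a variant record, here is a small C sketch that combines a struct with a union and an explicit tag. The shape type and the shape_area function are hypothetical examples, not taken from the slides.

```c
enum shape_kind { CIRCLE, RECTANGLE };

struct shape {
    enum shape_kind kind;          /* tag: says which variant is valid */
    union {
        double radius;             /* valid when kind == CIRCLE */
        struct {
            double width;
            double height;
        } rect;                    /* valid when kind == RECTANGLE */
    } u;
};

double shape_area(const struct shape *s)
{
    switch (s->kind) {
    case CIRCLE:
        return 3.141592653589793 * s->u.radius * s->u.radius;
    case RECTANGLE:
        return s->u.rect.width * s->u.rect.height;
    }
    return 0.0;   /* not reached for valid tags */
}
```

The compiler does not check that the field being read matches the tag; keeping them consistent is the programmer's job.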
Type Conversions and Casts
 In a language with static typing, there are many contexts in
which values of a specific type are expected.
In the statement a := expression
we expect the right-hand side to have the same type as a.
In the expression a + b
The overloaded + symbol designates either integer or
floating-point addition.
We expect either that a and b will both be integers, or that they
will both be reals.
In a call to a subroutine, foo(arg1, arg2, . . . , argN)
 We expect the types of the arguments to match those of the
formal parameters.
 If the programmer wishes to use a value of one type in a context that expects another, he or she will need to specify an explicit type conversion (type cast).
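A short C sketch of an explicit conversion follows. Both function names are hypothetical. The two bodies differ only in the cast, which changes the division from integer to floating-point.

```c
/* Without a cast, both operands are int and the division truncates. */
int int_average(int sum, int count)
{
    return sum / count;
}

/* The cast converts sum to double; count is then converted as well,
   so the division is carried out in floating point. */
double real_average(int sum, int count)
{
    return (double)sum / count;
}
```

With sum = 7 and count = 2, int_average gives 3 and real_average gives 3.5.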
Type Conversions and Casts
There are three principal cases:
1. The types would be considered structurally equivalent, but the
language uses name equivalence.
2. The types have different sets of values, but the intersecting values are represented in the same way.
3. The types have different low-level representations, but we can
define some sort of correspondence among their values.
Nonconverting Type Casts
In systems programs, one sometimes needs to change the type of a value without changing the underlying implementation, that is, to interpret the bits of a value of one type as if they were a value of another type.
● A change of type that does not alter the underlying bits is
called a nonconverting type cast, or sometimes a type pun.
● Cast is the term used for conversions in languages like C
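The contrast between a converting and a nonconverting cast can be sketched in C as below. The function names are hypothetical, and the sketch assumes that float occupies 32 bits in the IEEE 754 single-precision format, which is common but not guaranteed by the C standard. Copying the bytes with memcpy is one well-defined way to reinterpret them.

```c
#include <stdint.h>
#include <string.h>

/* Converting cast: produces the integer value, discarding any fraction. */
int32_t float_to_int(float f)
{
    return (int32_t)f;
}

/* Nonconverting cast: copies the bytes unchanged into an integer.
   Assumes float occupies 32 bits. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under the stated assumption, float_to_int(1.0f) is 1, while float_bits(1.0f) is 0x3F800000, the bit pattern that represents 1.0.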
Type Compatibility
Most languages do not require equivalence of types in every
context.
A value’s type must be compatible with that of the context in
which it appears.
In an assignment statement, the type of the right hand side must
be compatible with that of the left-hand side.
In a subroutine call, the types of any arguments passed into the
subroutine must be compatible with the types of the
corresponding formal parameters.
● The definition of type compatibility varies greatly from
language to language.
● An Ada type S is compatible with an expected type T if and
only if:
1. S and T are equivalent,
2. one is a subtype of the other, or
3. both are arrays, with the same numbers and types of
elements in each dimension
Type Compatibility
Coercion
Whenever a language allows a value of one type to be used in a context that
expects another, the language implementation must perform an automatic,
implicit conversion to the expected type.
This conversion is called a type coercion.
● A coercion may require run-time code to perform a dynamic semantic check, or
to convert between low-level representations.
● Ada coercions need the former, never the latter:
d : weekday;
k : workday;
type calendar_column is new weekday;
c : calendar_column;
...
k := d; -- run-time check required
d := k; -- no check required; every workday is a weekday
c := d; -- static semantic error;
        -- weekdays and calendar_columns are not compatible
To perform this third assignment in Ada we would have to use an explicit conversion:
c := calendar_column(d);
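By contrast with Ada, C coercions routinely convert between low-level representations. The sketch below uses hypothetical function names to show two implicit conversions that C performs without any cast.

```c
/* n is implicitly converted (coerced) from int to double before the
   division, which requires a change of low-level representation. */
double half(int n)
{
    return n / 2.0;
}

/* Assigning a double to an int is also implicit in C; the fractional
   part is discarded. */
int whole_part(double x)
{
    int k = x;
    return k;
}
```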
Fortran 90 allows arrays and records to be intermixed if their types have the same shape.
Type Compatibility
Coercion
Two arrays have the same shape if they have the same number of dimensions and each dimension has the same size.
Field names do not matter, nor do the actual high and
low bounds of array dimensions.
● C allows arrays and pointers to be intermixed in many cases.
● C++ provides an extremely rich, programmer-
extensible set of coercion rules.
The programmer can define coercion operations to
convert values of the new type to and from existing
types.
Type Compatibility
Overloading and Coercion
An overloaded name can refer to more than one
object; the ambiguity must be resolved by context.
Consider the addition of numeric quantities.
In the expression a + b, + may refer to either the
integer or the floating-point addition operation.
In a language without coercion, a and b must either
both be integer or both be real.
● Ada formalizes the notion of “constant type” for
numeric quantities: an integer constant is said to have
type universal_integer.
● A floating-point constant is said to have type
universal_real.
Universal Reference Types
To facilitate the writing of general-purpose container
(collection) objects that hold references to other
objects, several languages provide a universal
reference type.
● In C and C++, this type is called void *.
● In Clu it is called any; in Modula-2, address; in Modula-3, refany; in Java, Object; in C#, object.
● Arbitrary l-values can be assigned into an object of
universal reference type, with no concern about type safety.
● Here we need to include in the representation of each
object a tag that indicates its type.
● This approach is common in object-oriented languages, which
generally need it for dynamic method binding.
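A minimal C sketch of a universal reference paired with a type tag is given below. The names value_tag, any_ref and get_int are hypothetical. As a simplification, the tag is stored next to the reference rather than inside the referenced object.

```c
#include <stddef.h>

enum value_tag { TAG_INT, TAG_DOUBLE };

struct any_ref {
    enum value_tag tag;   /* records the type of the referenced object */
    void *ref;            /* universal reference: can point to any object */
};

/* Returns 1 and stores the value in *out if r refers to an int;
   returns 0 without touching *out otherwise. */
int get_int(struct any_ref r, int *out)
{
    if (r.tag != TAG_INT || r.ref == NULL)
        return 0;
    *out = *(int *)r.ref;
    return 1;
}
```

Storing an address in ref is always accepted; the tag check is what makes converting back to int * safe.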
Type Inference: Type checking ensures that the components of
an expression have appropriate types.
The result of an arithmetic operator usually has the same type as the operands. The result of a function call has the type declared in the function's header.
Records(Structures) and Variants(Unions):
Record types allow related data of heterogeneous
types to be stored and manipulated together.
Some languages, like Algol 68, C, C++ and Common Lisp, use the term structure instead of record.
Fortran 90 simply calls its records “types”.
Structures in C++ are defined as a special form of
class.
C# uses a reference model for variables of class
types, and a value model for variables of struct
types.
Records(Structures) and Variants(Unions):
Syntax and Operations:-
In C a simple record might be defined as follows.
struct element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    _Bool metallic;
};
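To show how such a record is created and its fields accessed, here is a small sketch. The function make_copper is hypothetical; bool from <stdbool.h> is the same type as _Bool, and the field values are the usual figures for copper.

```c
#include <stdbool.h>

struct element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;        /* bool from <stdbool.h> is the same type as _Bool */
};

/* Builds the record for copper. Note that name holds exactly two
   characters and is not a null-terminated string. */
struct element make_copper(void)
{
    struct element e = { {'C', 'u'}, 29, 63.546, true };
    return e;
}
```

Fields are selected with the dot operator, for example make_copper().atomic_number.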
Arrays
 Arrays are the most common and important
composite data types. Arrays are usually
homogeneous.
 They can be thought of as a mapping from an index
type to a component or element type.
 Some languages allow non-discrete index types.
 The resulting associative arrays must generally be
implemented with hash tables or search trees.
 Associative arrays also resemble the dictionary or
map types supported by the standard libraries of
many object-oriented languages.
Arrays
Syntax and Operations
 Most languages refer to an element of an array by appending a subscript, delimited by parentheses or square brackets, to the name of the array.
 In Fortran and Ada, one says A(3);
 in Pascal and C, one says A[3].
 Since parentheses are generally used to delimit the
arguments to a subroutine call, square bracket
subscript notation has the advantage of
distinguishing between the two.
Arrays
Declarations
One declares an array by appending subscript notation to the syntax
that would be used to declare a scalar.
In C: char upper[26];
In Fortran: character, dimension (1:26) :: upper
character (26) upper ! shorthand notation
In C the lower bound of an index range is always zero; the indices of an n-element array are 0 .. n-1.
In Fortran the lower bound of the index range is one by default.
Most languages make it easy to declare multidimensional arrays:
mat : array (1..10, 1..10) of real; -- Ada
real, dimension (10,10) :: mat ! Fortran
In Ada, mat1 : array (1..10, 1..10) of real;
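The C side of these declarations can be sketched as follows, with zero-based indexing throughout. The function names and the constant N are hypothetical, and fill_upper assumes the letters A to Z are contiguous in the character set, as they are in ASCII.

```c
#define N 3

/* Fills upper[0..25] with 'A'..'Z'. Assumes the letters are contiguous
   in the execution character set, as they are in ASCII. */
void fill_upper(char upper[26])
{
    for (int i = 0; i < 26; i++)
        upper[i] = (char)('A' + i);
}

/* Sums the diagonal of an N-by-N matrix stored as a two-dimensional array. */
double trace(double mat[N][N])
{
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += mat[i][i];
    return sum;
}
```

The valid indices of upper are 0 through 25; upper[26] would be out of bounds.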
String Representations in Programming Languages
 In many languages, a string is simply an array of characters.
 In other languages, strings have special status, with
operations that are not available for arrays of other sorts.
 It is easier to provide special features for strings than for
arrays in general because strings are one-dimensional.
 Manipulation of variable-length strings is fundamental to a
huge number of computer applications.
 Powerful string facilities are found in various scripting
languages such as Perl, Python and Ruby.
 Lisp, Icon, Java, C# allow the length of a string-valued variable
to change over its lifetime, requiring that space be allocated
by a block or chain of blocks in the heap.
 Many languages, including C and its descendants, distinguish
between literal characters and literal strings. Other languages
(e.g., Pascal) make no distinction: a character is just a string of
length one
String Representations in Programming Languages
 Most languages also provide escape sequences that allow
nonprinting characters and quote marks to appear inside of
strings.
 An arbitrary character can be represented by a backslash
followed by
(a) 1 to 3 octal (base-8) digits,
(b) an x and one or more hexadecimal (base-16) digits,
(c) a u and exactly four hexadecimal digits, or
(d) a U and exactly eight hexadecimal digits.
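A brief C sketch of forms (a) and (b), plus the escapes for quote marks and newlines, is shown below. The function names are hypothetical, and the character codes assume an ASCII-based system.

```c
#include <string.h>

/* "\x41" is a hexadecimal escape and "\102" an octal escape; on an
   ASCII-based system they denote 'A' and 'B'. */
const char *escaped_ab(void)
{
    return "\x41\102";
}

/* \" places a quote mark inside the string; \n is a newline. */
size_t quoted_length(void)
{
    return strlen("say \"hi\"\n");
}
```

Each escape sequence contributes a single character to the string, so quoted_length counts the two quote marks and the newline as one character each.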
 The variable is to be implemented as a contiguous array of
characters in the current stack frame.
● Pascal and Ada support a few string operations, including
assignment and comparison for lexicographic ordering.
● Given the declaration char *s, the statement s = "abc" makes s
point to the constant "abc" in static storage.
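The difference between a pointer to a string constant and a character array in the stack frame can be sketched in C as follows. The function names are hypothetical.

```c
#include <string.h>

/* s points at the literal in static storage; no characters are copied. */
size_t literal_length(void)
{
    const char *s = "abc";
    return strlen(s);
}

/* buf is an array in the current stack frame, initialized with a copy
   of the literal, including its terminating '\0'. */
size_t array_bytes(void)
{
    char buf[] = "abc";
    return sizeof buf;
}

/* Because buf is the function's own copy, it may be modified. */
char edit_copy(void)
{
    char buf[] = "abc";
    buf[0] = 'A';
    return buf[0];
}
```

Modifying the characters through a pointer to the literal itself would be undefined behavior in C, which is why s is declared const here.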
 A string variable is a reference to a string.
 Assigning a new value to such a variable makes it refer to a different
String Representations in Programming Languages
Sets
A set is an unordered collection of an arbitrary number of distinct values
of a common type.
The type from which elements of a set are drawn is known as the base
or universe type.
Introduced by Pascal.
Pascal supports sets of any discrete type, and provides union,
intersection, and difference operations:
var A, B, C : set of char;
    D, E : set of weekday;
...
A := B + C; (* union *)
A := B * C; (* intersection *)
A := B - C; (* difference *)
There are many ways to implement sets, including arrays, hash tables, and various forms of trees.
The most common implementation employs a bit vector whose length (in bits) is the number of distinct values of the base type.
Operations on bit-vector sets can make use of fast logical instructions on most machines.
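A bit-vector set can be sketched in C as below, for a base type of the integers 0 to 31. The type name small_set and the function names are hypothetical. Each set operation maps to a single logical operation on the whole word.

```c
#include <stdint.h>

typedef uint32_t small_set;   /* set over the base type 0..31, one bit per value */

/* Singleton set {v}; v must be in 0..31. */
small_set set_of(unsigned v)                         { return (small_set)1 << v; }
small_set set_union(small_set a, small_set b)        { return a | b; }
small_set set_intersection(small_set a, small_set b) { return a & b; }
small_set set_difference(small_set a, small_set b)   { return a & ~b; }
int set_contains(small_set s, unsigned v)            { return (int)((s >> v) & 1u); }
```

These correspond to the Pascal operators +, * and - on sets.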
Reference Types
A recursive type is one whose objects may contain
one or more references to other objects of the type.
● Recursive types are used to build a wide variety
of “linked” data structures, including lists and trees.
● In languages that use a reference model of variables, it is easy for a record of type foo to include a reference to another record of type foo: every variable is a reference anyway.
● In languages that use a value model of variables, recursive types require the notion of a pointer: a variable (or field) whose value is a reference to some object.
● Automatic storage reclamation (garbage
collection) dramatically simplifies the programmer’s
task, but imposes certain run-time costs.
Reference Types
Syntax and Operations
 Operations on pointers include allocation and
deallocation of objects in the heap, dereferencing of
pointers to access the objects to which they point,
and assignment of one pointer into another.
● In C, Pascal, or Ada, which employ a value model, the assignment A := B puts the value of B into A.
● If we want B to refer to an object and we want A := B to make A refer to the object to which B refers, then A and B must be pointers.
● The assignment A := B in Java places the value of
B into A if A and B are of built-in type; it makes A
refer to the object to which B refers if A and B are of
user-defined type.
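The two behaviors of assignment can be contrasted in C, which uses a value model, by comparing struct assignment with pointer assignment. The point type and both function names are hypothetical.

```c
struct point { int x, y; };

/* Value model: a = b copies the record, so changing a leaves b intact. */
int copy_then_modify(void)
{
    struct point b = { 1, 2 };
    struct point a = b;
    a.x = 99;
    return b.x;
}

/* With pointers, a = b makes a refer to the same object as b,
   so a change made through a is visible through b. */
int alias_then_modify(void)
{
    struct point obj = { 1, 2 };
    struct point *b = &obj;
    struct point *a = b;
    a->x = 99;
    return b->x;
}
```

The first function returns 1 and the second returns 99.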
Reference Types
Reference Model
 In Lisp, which uses a reference model of variables but is not statically typed, a tree could be specified textually as (#\R (#\X () ()) (#\Y (#\Z () ()) (#\W () ()))).

[Fig: Implementation of a tree in Lisp. A diagonal slash through a box indicates a null pointer. The C and A tags serve to distinguish the two kinds of memory blocks: cons cells and blocks containing atoms.]
Reference Types
Reference Model
● When writing in a functional style, one often finds a
need for types that are mutually recursive.
● In a compiler, for example, it is likely that symbol
table records and syntax tree nodes will need to refer
to each other.
● A syntax tree node that represents a subroutine call
will need to refer to the symbol table record that
represents the subroutine.
● The symbol table record, in turn, will need to refer to the syntax tree node at the root of the subtree that represents the subroutine’s code.
Reference Types
Value Model
In Pascal tree data types would be declared as follows:
type chr_tree_ptr = ^chr_tree;
     chr_tree = record
         left, right : chr_tree_ptr;
         val : char
     end;
In C:
struct chr_tree {
    struct chr_tree *left, *right;
    char val;
};
Reference Types

Fig: Typical implementation of a tree in a language with explicit pointers. As in the Lisp figure, a diagonal slash through a box indicates a null pointer.
In Ada:
my_ptr := new chr_tree;
In C:
my_ptr = malloc(sizeof(struct chr_tree));
C’s malloc is defined as a library function, not a built-in part of the language.
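Allocation and explicit deallocation of such a tree can be sketched in C as follows. The helper functions new_node, tree_size and free_tree are hypothetical; only malloc and free come from the standard library.

```c
#include <stdlib.h>

struct chr_tree {
    struct chr_tree *left, *right;
    char val;
};

/* Allocates a node; malloc returns void *, which converts implicitly
   to struct chr_tree *. Returns NULL if allocation fails. */
struct chr_tree *new_node(char val, struct chr_tree *left, struct chr_tree *right)
{
    struct chr_tree *n = malloc(sizeof *n);
    if (n != NULL) {
        n->left = left;
        n->right = right;
        n->val = val;
    }
    return n;
}

/* Counts the nodes reachable from t. */
int tree_size(const struct chr_tree *t)
{
    return t == NULL ? 0 : 1 + tree_size(t->left) + tree_size(t->right);
}

/* Explicit deallocation: frees every node reachable from t. */
void free_tree(struct chr_tree *t)
{
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

A tree with root R, left child X, and right child Y whose children are Z and W can be built with nested calls to new_node, with NULL marking each empty subtree.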
Reference Types
The programmer must specify the size of the allocated
object explicitly, and while the return value (of type
void*) can be assigned into any pointer, the
assignment is not type-safe.
● C++, Java, and C# replace malloc with a built-in,
type-safe new:
my_ptr = new chr_tree( arg list );
● The C++/Java/C# new will automatically call any
user-specified constructor (initialization) function,
passing the specified argument list.
● To access the object referred to by a pointer, most
languages use an explicit dereferencing operator.
● In Pascal and Modula this operator takes the form of a postfix “up-arrow”: my_ptr^.val := 'X';
Reference Types
Pointers and Arrays in C
Pointers and arrays are closely linked in C.
Consider the following declarations:
int n;
int *a; /* pointer to integer */
int b[10]; /* array of 10 integers */
Now all of the following are valid:
1. a = b; /* make a point to the initial element of b */
2. n = a[3];
3. n = *(a+3); /* equivalent to previous line */
● In most contexts, an unsubscripted array name in C
is automatically converted to a pointer to the array’s
first element as shown here in line 1.
Reference Types
● Line 3 illustrates pointer arithmetic: given a pointer to an
element of an array, the addition of an integer k produces a
pointer to the element k positions later in the array.
● C allows pointers to be subtracted from one another or
compared for ordering, provided that they refer to elements of
the same array.
A declaration must allow the compiler to determine the size of
the elements of an array or, equivalently, the size of the objects
referred to by a pointer.
Neither int a[ ][ ] nor int (*a)[ ] is a valid variable or parameter
declaration: neither provides the compiler with the size
information it needs to generate code for a + i or a[i].
● The built-in sizeof operator returns the size in bytes of an
object or type.
● When given a pointer as argument it returns the size of the
pointer itself.
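These rules can be checked with a small C sketch. All function names are hypothetical. The last two functions show that sizeof distinguishes an array from a pointer even though the two are interchangeable in most expressions.

```c
#include <stddef.h>

/* a[k] and *(a + k) denote the same element. */
int element_by_subscript(const int *a, int k)  { return a[k]; }
int element_by_arithmetic(const int *a, int k) { return *(a + k); }

/* Subtracting two pointers into the same array gives the number of
   elements between them, not the number of bytes. */
ptrdiff_t elements_between(const int *first, const int *last)
{
    return last - first;
}

/* sizeof applied to an array yields the size of the whole array. */
size_t size_of_array_of_10(void)
{
    int b[10];
    return sizeof b;
}

/* sizeof applied to a pointer yields the size of the pointer itself. */
size_t size_of_pointer(void)
{
    int *a = NULL;
    return sizeof a;
}
```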
Garbage Collection
 Explicit reclamation of heap objects is a serious burden on the
programmer and a major source of bugs.
 The code required to keep track of object lifetimes makes programs
more difficult to design, implement and maintain.
 Automatic garbage collection has become popular for imperative
languages as well.
 It tends to be slower than manual reclamation.
 Reference counts: The simplest garbage collection technique simply
places a counter in each object that keeps track of the number of
pointers that refer to the object.
When the object is created, this reference count is set to 1.
● When one pointer is assigned into another, the run-time system decrements the reference count of the object formerly referred to by the assignment's left-hand side.
● It increments the count of the object referred to by the right-hand side.
● When a reference count reaches zero, its object can be reclaimed.
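The bookkeeping described above can be sketched by hand in C. This is an illustration under simplifying assumptions, not how any particular run-time system is implemented: the names rc_obj, rc_new and rc_assign are hypothetical, the objects contain no pointers of their own, and the counter rc_reclaimed exists only so the effect can be observed.

```c
#include <stdlib.h>

struct rc_obj {
    int refcount;   /* number of pointers that refer to this object */
    int payload;
};

int rc_reclaimed = 0;   /* counts reclaimed objects, for demonstration only */

/* Creates an object with one reference, held by the caller. */
struct rc_obj *rc_new(int payload)
{
    struct rc_obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        o->payload = payload;
    }
    return o;
}

/* Performs *lhs = rhs while maintaining the counts. The right-hand side
   is incremented first so that assigning a pointer to itself is safe. */
void rc_assign(struct rc_obj **lhs, struct rc_obj *rhs)
{
    if (rhs != NULL)
        rhs->refcount++;
    if (*lhs != NULL && --(*lhs)->refcount == 0) {
        free(*lhs);
        rc_reclaimed++;
    }
    *lhs = rhs;
}
```

In a language with built-in reference counting, the compiler or run-time system would perform the work of rc_assign on every pointer assignment.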
● To prevent the collector from following garbage addresses, each
Garbage Collection
 Type descriptors are simply a table that lists the offsets within the
type at which pointers can be found, together with the addresses of
descriptors for the types of the objects referred to by those pointers.
 For a tagged variant record type, the descriptor is a bit more complicated.
 It must contain a list of values (or ranges) for the tag, together with a table for the corresponding variant.
 For untagged variant records, reference counts work only if the
language is strongly typed.
 Tracing Collection: A better definition of a “useful” object is one
that can be reached by following a chain of valid pointers starting
from something that has a name.
 The blocks in the bottom half of the figure are useless, even though their reference counts are nonzero.
 Tracing collectors work by recursively exploring the heap, starting
from external pointers, to determine what is useful.
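The marking step of a tracing collector can be sketched in C as below. The node layout and the mark function are hypothetical; a real collector would find the pointers inside each object from type descriptors rather than from fixed fields, and would follow marking with a phase that reclaims the unmarked blocks.

```c
#include <stddef.h>

struct node {
    int marked;                 /* set when the node is found reachable */
    struct node *left, *right;
};

/* Mark phase: recursively marks everything reachable from n.
   Already-marked nodes are skipped, so cycles terminate. */
void mark(struct node *n)
{
    if (n == NULL || n->marked)
        return;
    n->marked = 1;
    mark(n->left);
    mark(n->right);
}
```

Two nodes that point only at each other keep nonzero reference counts, but if no external pointer leads to them they are never marked, so a tracing collector can identify them as garbage.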
Lists
 A list is defined recursively as either the empty list or a pair
consisting of an object and another list.
 Lists are ideally suited to programming in functional and logic
languages, which do most of their work via recursion and
higher order functions.
 In Lisp, a program is a list, and can extend itself at run time by constructing a list and executing it.
 Lists can also be used in imperative programs.
 Clu provides a built-in type constructor for lists, and a list class
is easy to write in most object-oriented languages.
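The recursive definition translates directly into an imperative language. In the C sketch below, the cell type and both functions are hypothetical; NULL plays the role of the empty list, and each function follows the two cases of the definition.

```c
#include <stddef.h>

/* A list is either NULL (the empty list) or a cell holding a head
   element and a pointer to the rest of the list. */
struct cell {
    int head;
    struct cell *tail;
};

int list_length(const struct cell *l)
{
    return l == NULL ? 0 : 1 + list_length(l->tail);
}

int list_sum(const struct cell *l)
{
    return l == NULL ? 0 : l->head + list_sum(l->tail);
}
```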
Lists
 Lists in ML and Lisp
 Lists in ML are homogeneous: every element of the list must
have the same type.
 Lisp lists are heterogeneous: any object may be placed in a list, so long as it is never used in an inconsistent fashion.
● An ML list is usually a chain of blocks, each of which contains an element
and a pointer to the next block.
● A Lisp list is a chain of cons cells, each of which contains two pointers, one
to the element and one to the next cons cell.
● An ML list is enclosed in square brackets, with elements separated by commas: [a, b, c, d]. A Lisp list is enclosed in parentheses, with elements separated by white space: (a b c d).
● Lisp systems provide a more general, dotted list notation that captures both
proper and improper lists.
● A dotted list is either an atom (possibly null) or a pair consisting of two
dotted lists separated by a period and enclosed in parentheses.
● The list (a . (b . (c . d))) is improper; its final cons cell contains a pointer to d in the second position, where a pointer to a list is normally required.
● Because programs are lists in Lisp, Lisp must distinguish between lists that are to be evaluated and lists that are to be left “as is,” as structures.
Files
Files and Input/Output
We can distinguish between interactive I/O and I/O with files.
Input/output facilities allow a program to communicate with the
outside world. Interactive I/O generally implies communication
with human users or physical devices, which work in parallel with
the running program.
● Files may be further categorized into those that are temporary and
those that are persistent.
● Temporary files exist for the duration of a single program run; their
purpose is to store information that is too large to fit in the memory
available to the program.
● Persistent files allow a program to read data that existed before the
program began running, and to write data that will continue to exist
after the program has ended.
● Some languages provide built in file data types and special syntactic
constructs for I/O. The principal advantage of language integration is the
ability to employ non-subroutine call syntax, and to perform operations
that may not otherwise be available to library routines.
● A purely library-based approach to I/O, may keep a substantial amount
