
Language Design of JAVA

The document discusses design considerations for an introductory programming language. It aims to have syntax that is easy to understand while drawing from Java where appropriate. The language should include basic data types like boolean, char, int, and string to solve introductory problems. Control structures like conditionals, switch statements, and while loops are also discussed to have syntax similar to Java. Arrays and user-defined types will function similarly but be declared simply without memory allocation.

Language Design

The main concerns in the design of the language are:

- It should have a syntax which is easy to understand, remember and write.
- The syntax of Java should be used where it seems suitably simple, so that the learner can see how to write the same programs in Java when they move on to learning it.
- Enough suitable keywords and structures should be included to give the language sufficient functionality to solve all of the problems which might be set as an introductory programming exercise.

Data Types
It makes sense to share the concrete data types of Java so that no type
conversion has to be done at the translation or compilation stage. The basic
types supplied by Java are:

boolean
char
byte
short
int
long
float
double

Are all of these needed? Probably not for writing introductory programs. The difference between a float and a double, or between a short and a long, is unlikely to be something that an introductory programmer needs to understand.
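For a sense of the detail that would otherwise need explaining, the small Java program below prints the range of each integer type and shows the different precision of float and double. It relies only on the standard MIN_VALUE and MAX_VALUE constants of the wrapper classes; the class name NumericTypes is purely illustrative.

```java
// Illustrative only: the distinctions an introductory programmer would face
// if all of Java's numeric types were exposed.
class NumericTypes {
    public static void main(String[] args) {
        // Integer types differ only in range.
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);

        // Floating point types differ in precision: roughly 7 significant
        // decimal digits for float against 15 to 16 for double.
        System.out.println("float:  " + (1.0f / 3.0f));
        System.out.println("double: " + (1.0 / 3.0));
    }
}
```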
It would be useful to have a string type. Strings in Java are not basic types
but objects. This can probably be made transparent to the introductory
programmer.
A more suitable set of basic types might be:

boolean
char
int
real
string

The term real has been used instead of double. A floating point data type is necessary, but having more than one, at different precisions, would be overcomplicated. A double has greater range and precision than a float, so if only one of the two types is to be included it makes sense to use a double. It would be confusing to call this type float, as it would be translated to a Java double rather than a Java float. However, it does not seem sensible to call it double if no other floating point precision is available; this would only lead to the question "What is a single?". Calling the type real seems a good solution: an introductory programmer is much more likely to be familiar with the concept of a real number than with a double precision floating point number.
Contentious point
Should string have a capital letter? In the introductory language a string is
presented as a basic type. None of the other basic types have capital letters.
In Java, String is a class, not a basic type, and classes start with capital
letters (String). If string is left uncapitalised then it may be confusing when
moving to Java. If it is capitalised, it will be incongruous, and require a
complex explanation as to why it is that way. At the current time, I am of the
opinion that string should not be capitalised.
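A translator following these choices needs a table from the introductory language's basic type names to Java's. The sketch below is hypothetical (the class TypeMapper and the method toJava are not part of the original design): it maps real to double and string to String, and passes user-defined type names such as Point through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: mapping Kenya basic type names to Java type names.
class TypeMapper {
    private static final Map<String, String> KENYA_TO_JAVA = new HashMap<>();

    static {
        KENYA_TO_JAVA.put("boolean", "boolean");
        KENYA_TO_JAVA.put("char", "char");
        KENYA_TO_JAVA.put("int", "int");
        KENYA_TO_JAVA.put("real", "double");   // the single floating point type
        KENYA_TO_JAVA.put("string", "String"); // a basic type in Kenya, a class in Java
    }

    // Returns the Java name for a Kenya basic type; any other name
    // (assumed to be a user-defined type) is returned unchanged.
    static String toJava(String kenyaType) {
        return KENYA_TO_JAVA.getOrDefault(kenyaType, kenyaType);
    }

    public static void main(String[] args) {
        System.out.println("real r;   -> " + toJava("real") + " r;");
        System.out.println("string s; -> " + toJava("string") + " s;");
    }
}
```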

Variables and Constants


In Java variables are declared like this:

int a;
int b = 100;

This seems good and simple, and will be used in the introductory language.
The use of the semi-colon as a statement separator will be considered later.
In Java constants are declared like this:
{ public } static final int a = 100;

This seems complicated and difficult to explain. In the introductory language a const keyword will be provided, which will map to static final in the translation to Java.
const int a = 100
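Assuming the translator wraps the whole program in a generated class (here the hypothetical ConstExample), the Kenya declaration above might come out as the Java below. The static final modifier applies because the constant becomes a class member; a constant local to a method would simply be final in Java.

```java
// Possible Java output for the Kenya line:  const int a = 100
// ConstExample is a hypothetical name for the class the translator generates.
class ConstExample {
    static final int a = 100; // const -> static final

    public static void main(String[] args) {
        System.out.println(a + 1);
        // a = 101; would be rejected by the compiler, since a is final
    }
}
```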

Records and User Defined Types


Records (structures containing a set of other types) e.g. a Point containing an
x and a y co-ordinate, both of a basic type, are useful. In Java this would be a
class.
A large part of programming and software engineering is about finding a
good abstraction model. Writing a program to fit this model is helped a lot by
the provision for user defined Abstract Data Types. These are again
implemented using classes in Java.
A point class in Java:
class Point
{
int x;
int y;
}

In Java, the contents of the record (object) are accessed in the following way,
assuming p is a Point:
p.x = 4;
p.y = 3;

This method of defining a class (using a class keyword), and the use of the
dot operator to access member variables, is fairly simple and widely used
across a variety of languages, so it seems a good idea to keep these for the
teaching language. The use of curly brackets to delimit the class definition
seems an acceptable approach. Taking this approach here suggests using
curly brackets for delimiting all blocks of statements (loops ...) as they are
used in Java.

In C or C++ there is provision to provide a new name for a type. For instance
the name "age" could be assigned as a synonym for "int". This is done using
a typedef. Although this helps with modelling a problem, there is no provision
for using typedefs in Java. It would be difficult to write code in Java which a
typedef would map to, so unfortunately it will not be included in the
introductory language at the moment.

Declaring Variables of User Defined Types


In Java, objects (instances of classes) are created using the new operator, in
the following way.
Point p = new Point();
p.x = 2;

This creates an object on the heap, allocating memory dynamically at runtime. In C++ objects can be created on the heap in the same way, or they can be created on the stack in the following way:
Point p;
p.x = 2;

This uses the same syntax as declaring a variable of a basic type. As the
introductory programmer should not need to know the difference between
creating an object on the heap and creating it on the stack, it seems sensible
to use the C++ stack creation syntax for the creation of variables of all
types. The differences between basic and non basic types are thus made as
transparent as possible to the programmer, who can use them in the same
way. As objects cannot be created on the stack in Java, a declaration such as
Point p would have to be converted to Point p = new Point() by the translator.
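The rewrite described above can be sketched as a small string transformation. This is only an illustration under stated assumptions: DeclarationTranslator and translate are hypothetical names, type names are assumed to have already been mapped to their Java equivalents, and any type that is not basic is assumed to be a user-defined class with a no-argument constructor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: turn a Kenya declaration "Type name" into Java.
class DeclarationTranslator {
    // Java names of the basic types, after type mapping.
    private static final Set<String> BASIC_TYPES =
        new HashSet<>(Arrays.asList("boolean", "char", "int", "double", "String"));

    static String translate(String type, String name) {
        if (BASIC_TYPES.contains(type)) {
            return type + " " + name + ";"; // basic types need no allocation
        }
        // Everything else is treated as a class, so the object is created on the heap.
        return type + " " + name + " = new " + type + "();";
    }

    public static void main(String[] args) {
        System.out.println(translate("int", "count")); // int count;
        System.out.println(translate("Point", "p"));   // Point p = new Point();
    }
}
```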

Arrays
Arrays are fairly fundamental data structures. They are usually referenced by
an identifier and an index. For instance in C or Java, a number (or expression)
is used in square brackets after the identifier, e.g.
numbers[4] = 12;

This seems a fairly straightforward notation. In Turing a similar syntax is used, but with round brackets instead of square ones. To preserve consistency with Java, square brackets will be used.
In Java, arrays are objects, and need to be created using the new operator, e.g.
int numbers[] = new int[12];
String names[];

names = new String[12];

In the introductory language, arrays will be declared without this, in a C style:
int numbers[12];

and converted to the correct Java by the translator.
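A similarly hedged sketch for arrays: the hypothetical ArrayTranslator below uses a regular expression to recognise a C-style declaration with a literal size and rewrites it into the Java form used above. A real translator would work from a parse tree rather than a regular expression, and would also need to handle sizes given as expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rewrite "int numbers[12];" as "int numbers[] = new int[12];".
class ArrayTranslator {
    // Groups: 1 = element type, 2 = array name, 3 = literal size.
    private static final Pattern ARRAY_DECL =
        Pattern.compile("\\s*(\\w+)\\s+(\\w+)\\s*\\[\\s*(\\d+)\\s*\\]\\s*;\\s*");

    static String translate(String kenya) {
        Matcher m = ARRAY_DECL.matcher(kenya);
        if (!m.matches()) {
            return kenya; // not a simple array declaration: leave it unchanged
        }
        String type = m.group(1), name = m.group(2), size = m.group(3);
        return type + " " + name + "[] = new " + type + "[" + size + "];";
    }

    public static void main(String[] args) {
        System.out.println(translate("int numbers[12];"));
    }
}
```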

Statement Separators
To determine where one statement ends and another begins, some sort of
delimiter is needed. It would be possible to use a newline for this, putting
each statement on a different line. However, it is quite often desirable to
change the layout of the program code to make it easier to read and show its
structure more clearly. Using newlines to define new statements makes this
difficult. Adding white space may cause a working program not to function.
An alternative, as used in C, C++ and Java, is to use a semicolon (or some
other special character) to separate statements. This allows arbitrary
newlines to be inserted without affecting the function of the program,
although it does mean that an extra character has to be added after each
statement, and this does not look as tidy as it would without. The fact that
Java uses a semicolon as its statement separator swings the decision to use
a semicolon in the teaching language also. Many compiler errors in Java, and even more in C/C++, are caused by the programmer forgetting a semicolon at the end of a line. If the programmer gets into the habit of putting semicolons in right from the start, such errors may be reduced.

Conditionals
Conditionals are a fundamental part of any programming language. The most
useful and generic construct is if .. then .. else. The Java syntax for this
seems straightforward enough to include in a language for teaching, so the
following syntax will be used:
if ( condition - a boolean expression )
{
statements ...
}
else
{
statements ...
}

Case

Case is a useful construct to prevent programmers having to write:


if ( a == 1 ) { statements }
if ( a == 2 ) { statements }
if ( a == 3 ) { statements }
if ( a == 4 ) { statements }

In Java the case construct is called switch and has the following syntax:
switch ( a )
{
case 1 :
text = "first case";
break;
case 2 :
text = "second case";
break;
default :
text = "no other cases match";
break;
}

If a does not match any of the cases, the default case is selected. The break
statements are used to stop execution at the end of each case and jump to
the closing curly bracket. If the break statement was not included at the end
of case 1, and case 1 was matched, after case 1's statements had been
executed, case 2's statements would also be executed, and so on until a
break statement was reached. Some programmers find it an annoyance to
have to include a break at the end of each of their cases, but it does allow for
the possibility of leaving them out on purpose in order to let the execution
drop through to the next case. In order not to prevent programmers from
using this technique if they want to, it will be left to the programmer to put in
the break statements rather than having the translator put them in
automatically.
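To illustrate why the breaks are left to the programmer, the example below deliberately omits them so that each case drops through to the next and the points accumulate. The method bonusPoints and its scoring scheme are invented for illustration.

```java
// Illustrative only: intentional fall-through in a Java switch.
class BonusExample {
    // Each level earns its own points plus those of every lower level,
    // because cases 3 and 2 deliberately have no break.
    static int bonusPoints(int level) {
        int points = 0;
        switch (level) {
            case 3:
                points += 100; // no break: drops through to case 2
            case 2:
                points += 10;  // no break: drops through to case 1
            case 1:
                points += 1;
                break;         // stop here
            default:
                break;         // any other level earns nothing
        }
        return points;
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 3; level++) {
            System.out.println(level + " -> " + bonusPoints(level));
        }
    }
}
```

If the translator inserted a break at the end of every case automatically, this style of code could not be written.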

Loops
There are four common types of loops: while loops, repeat .. until loops,
generalised loops and for loops. The first two are quite similar. With while the
test for exiting the loop is done at the top (so the body of the loop may not
be executed) and with repeat .. until the test is done at the end, so the body
is always executed at least once. The Turing programming language provides
a generalised loop in which the programmer can put the exit condition at the
top or the bottom (or anywhere in the middle!) to determine the way in
which the loop works. Most repeat .. until loops can be rewritten as while
loops, so to aid consistency only a while will be provided.
while ( condition - boolean expression )
{
statements
}
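As an illustration of rewriting a repeat .. until loop using only while, the hypothetical methods below count the decimal digits of a non-negative number, a task whose loop body must run at least once (0 has one digit). One common rewriting runs the body once before the loop and then repeats it while the exit condition is false; Java's own do .. while form is shown for comparison.

```java
// Illustrative sketch: the same "at least once" loop written two ways.
class RepeatAsWhile {
    // repeat .. until (n == 0), expressed with a plain while loop:
    // the body is executed once up front, then repeated while not finished.
    static int digitsWithWhile(int n) {
        int digits = 1;
        n = n / 10;
        while (n != 0) {
            digits = digits + 1;
            n = n / 10;
        }
        return digits;
    }

    // Java's built-in post-test loop, for comparison.
    static int digitsWithDoWhile(int n) {
        int digits = 0;
        do {
            digits = digits + 1;
            n = n / 10;
        } while (n != 0);
        return digits;
    }
}
```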

The other sort of loop to be considered is the for loop. This gives a number of iterations using an index variable, for example for i from 1 to 10, or for i from 10 down to 1. The syntax for this in Java is:
for ( int i = 1 ; i <= 10 ; i++ )
{
statements
}

This provides maximum flexibility from a single construct: more sophisticated behaviour (e.g. incrementing in different steps, or having complex loop termination conditions) needs no additional syntax.
Turing takes a slightly less sophisticated approach, using:
for i : 0 .. 9
put i
end for

The Java approach provides much more flexibility, but the Turing syntax is far
simpler. It is a difficult decision which of these is the more important
consideration. I would argue for simplicity, consistent with all of the features
of this new language, but at the same time it would be easy to make the
language too restrictive, therefore not allowing more sophisticated programs
to be written. While the language should be kept simple, it is also important
that a large number of problems can be solved and techniques applied using
it. My proposed syntax for the Kenya for loop is as follows:
for i = 0 to 9
{
print i;
}

or
for decreasing i = 9 to 0 step 2
{
print i;
}

The second case gives a loop counter which is decremented by 2 at each iteration of the loop, so i takes the values 9, 7, 5, 3 and 1.
It may be possible to allow both this format and the Java style format of the
loop in order to let the programmer (or the teacher) choose which they feel is
more useful or applicable in a certain case.
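One plausible Java translation of the two proposed loops is sketched below, assuming both bounds are inclusive. To make the result easy to check, the sketch collects the values of i in a list instead of printing them; KenyaForLoops and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical translations of the two proposed Kenya for loops.
class KenyaForLoops {
    // for i = 0 to 9 { print i; }
    static List<Integer> ascending() {
        List<Integer> seen = new ArrayList<>();
        for (int i = 0; i <= 9; i++) {
            seen.add(i); // stands in for: print i;
        }
        return seen;
    }

    // for decreasing i = 9 to 0 step 2 { print i; }
    static List<Integer> decreasingStep2() {
        List<Integer> seen = new ArrayList<>();
        for (int i = 9; i >= 0; i -= 2) {
            seen.add(i); // stands in for: print i;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(ascending());
        System.out.println(decreasingStep2());
    }
}
```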

Procedures and Functions

Java calls procedures and functions "methods". Procedures are just functions
which do not return a value (they return "void"). All methods are members of
classes. Unless a method is declared static, it can only be called if an object of
the class of which it is a member has been created. As the introductory language will not, at the current time, support object-oriented programming, all methods should be members of the class in which main is defined, and be declared static with package access, so that they can be called from any part of the program.
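The shape of Java this implies is sketched below. The class name KenyaProgram and the methods square and greet are illustrative only: a function becomes a static method returning a value, a procedure becomes a static void method, and both omit an access modifier so that they have package access.

```java
// Illustrative layout of translated code: everything lives in the class holding main.
class KenyaProgram {
    // A function: returns a value.
    static int square(int x) {
        return x * x;
    }

    // A procedure: returns nothing (void).
    static void greet(String name) {
        System.out.println("Hello, " + name);
    }

    public static void main(String[] args) {
        greet("world");
        System.out.println(square(7));
    }
}
```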

Input and Output


Textual output in Java, on the console at least, is most easily achieved using the library methods System.out.print() and System.out.println(), which print a piece of text (println() also adds a newline at the end). To hide the library, the teaching language will provide print and println, which will translate to calls of System.out.print() and System.out.println() respectively in the Java.
Doing console input in Java is not simple. The cleanest way found to read, say, an integer from the keyboard into an integer variable is something like the following:
try {
java.io.BufferedReader stdin = new java.io.BufferedReader(new
java.io.InputStreamReader(System.in));
String line = stdin.readLine();
int i = Integer.parseInt(line);
}
catch ( java.io.IOException e ){ System.out.println(e); }
catch ( NumberFormatException e ){ System.out.println(e); }

At the moment I think that the best way to deal with input is to provide
functions called things like readInt() and readString(). If the user includes these
in their program, a function wrapping code similar to the above will be
included in the Java source code, and called at the relevant point. Another
option would be to have a generic read() function which would translate to
different Java functions depending on the type of the variable to which the
result of the read() is being assigned. This is more complex to implement.
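A possible version of the wrapper code for readInt() and readString() is sketched below. Everything here is an assumption rather than part of the original design: the class name KenyaInput, the choice to return 0 or an empty string after an error (the snippet above only prints the exception), and the overloads taking a BufferedReader, which exist so the helpers can be tested without a keyboard. Unlike the snippet above, one reader is shared between calls, so input buffered by one call is not lost before the next.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helpers the translator could include when a program uses them.
class KenyaInput {
    // A single shared reader for the keyboard.
    private static final BufferedReader STDIN =
        new BufferedReader(new InputStreamReader(System.in));

    static String readString() { return readString(STDIN); }

    static int readInt() { return readInt(STDIN); }

    // Reads one line; returns "" at end of input or after an I/O error.
    static String readString(BufferedReader in) {
        try {
            String line = in.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            System.out.println(e);
            return "";
        }
    }

    // Reads one line and parses it as an int; returns 0 if it is not a number.
    static int readInt(BufferedReader in) {
        String line = readString(in);
        try {
            return Integer.parseInt(line.trim());
        } catch (NumberFormatException e) {
            System.out.println(e);
            return 0;
        }
    }
}
```

A call such as int n = readInt(); in the introductory language would then translate to a call of KenyaInput.readInt().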

Operators
The following operators will be provided:

==
!=
<
>
<=
>=
and
or
not
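Since Java spells the logical operators &&, || and !, the translator would have to rewrite and, or and not, while the comparison operators pass through unchanged. The hypothetical OperatorMapper below shows the idea on space-separated tokens only; a real translator would rewrite a parsed expression, which also sidesteps precedence differences (in Java, ! binds more tightly than ==).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: translate Kenya word operators into Java symbols.
class OperatorMapper {
    private static final Map<String, String> WORD_OPS = new HashMap<>();

    static {
        WORD_OPS.put("and", "&&");
        WORD_OPS.put("or", "||");
        WORD_OPS.put("not", "!");
    }

    // Assumes tokens are separated by whitespace; comparison operators
    // and identifiers are copied through unchanged.
    static String translate(String kenyaExpr) {
        StringBuilder out = new StringBuilder();
        for (String token : kenyaExpr.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(WORD_OPS.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("a < b and not done"));
    }
}
```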

Generics
Generics[14] are a sophisticated concept. Java is an object-oriented language
and (almost) everything is an object (i.e. it extends the class Object). When
programs deal with large numbers of objects, they tend to hold them in
various kinds of containers (like Vectors, HashMaps etc). Any sort of object
can be put into a container and retrieved later. However, if you put in, say, a Dog, where Dog is a class that you or someone else has defined, and try to get it out again, you get back an Object. This is because containers hold
Objects. They do not remember the more explicit type of each Object put
into the container, and so they can only give an Object back. It is up to the
programmer to remember the type of the objects they put into the container,
and convert them back to this type using a cast.
In Java this looks like:
import java.util.Vector;

Vector v = new Vector();
Dog d = new Dog();
v.add(d);

Dog e;
e = (Dog)v.elementAt(0);

The cast is the bracketed (Dog) after the assignment operator on the last line.
This coerces the Object which comes out of the vector to a Dog, so that it
can be assigned to e.
It is somewhat annoying for the programmer to have to remember the type
of the objects in a container and cast them whenever they are extracted. A
solution to this problem would be to have a class called DogVector which
only contained Dogs. We could then be sure that any object extracted from a
DogVector would be a Dog and therefore no cast would be necessary.
However, there will be other types of objects that programmers will want to
store in vectors as well as Dogs (in fact anything that is an Object) and using
this approach a different class would have to be written for each container
for each type of object to be contained. Every time a programmer defined a
new type they would have to define a new set of containers to put them in.
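For comparison, a type-specific container of the kind described above might look like the sketch below. DogVector wraps a raw Vector, so callers never need a cast, but the cast still has to be written once inside the class, and the whole class would have to be repeated for every other element type. The names are illustrative.

```java
import java.util.Vector;

class Dog { }

// Illustrative hand-written container that only accepts and returns Dogs.
class DogVector {
    private final Vector dogs = new Vector(); // raw type, as in pre-generics Java

    void add(Dog d) {
        dogs.add(d); // modern compilers report this as an unchecked call
    }

    Dog elementAt(int index) {
        return (Dog) dogs.elementAt(index); // the cast is hidden inside the class
    }

    int size() {
        return dogs.size();
    }
}
```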
Generics offer a solution to this problem by providing the possibility of
having containers that are parameterised by type. That is, we can say we
want a Vector < A > . This means we want a Vector, but that everything it
contains will be of type A (where A could be Dog, Date, String ... ). The
parameterised container then deals with any type coercion necessary.
C++ offers generics in the form of templates. At the moment[15] Java does
not have generics, but compilers are available which will compile a superset
of Java, including parameterised types. GJ (Generic Java)[16] is such a
compiler. It would not be difficult to produce GJ code as the translation from
Kenya rather than Java (GJ is a superset of Java, so the code would be pure
Java if generics were not used in the Kenya code). This would allow
programmers to use the feature, removing much of the need for casting, one
of the less elegant features of Java.
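Java 5 later adopted generics based on GJ, so GJ-style code like the sketch below now compiles with a standard Java compiler. It repeats the earlier Dog example with a Vector<Dog>; the class GenericVectorExample is illustrative.

```java
import java.util.Vector;

class Dog { }

// The earlier Dog example rewritten with a parameterised Vector.
class GenericVectorExample {
    static Dog firstDog() {
        Vector<Dog> v = new Vector<Dog>(); // a Vector that only holds Dogs
        Dog d = new Dog();
        v.add(d);

        Dog e = v.elementAt(0);            // no (Dog) cast needed
        return e;
    }
}
```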
The use of parameterised types is quite an advanced concept, and it is
questionable whether they should be included in a teaching language.
However, I think that they should be included as the novice can choose not
to use them. When they do come to work with containers, the concept of the
parameterised type can be explained just as easily as the need to cast
objects when they are extracted from containers.
