Language Design of JAVA
Data Types
It makes sense to share the concrete data types of Java so that no type
conversion has to be done at the translation or compilation stage. The basic
types supplied by Java are:
boolean
char
byte
short
int
long
float
double
Are all of these needed? It is probably not necessary to have all of these
when writing introductory programs. The difference between a float and a
double,
boolean
char
int
real
string
The term real has been used instead of double. It is necessary to have a
floating point data type, but overcomplicated to have more than one at
different precisions. A double offers greater range and precision than a float,
so it makes sense to use a double if only one of the two types is going to be
included. It would be
confusing to use the word float to represent this, as it would get translated to
a Java double rather than a Java float. However, it does not seem sensible to
call it double if there are no other floating point precisions available. This will
only lead to the question "What is a single?". Calling the type real seems a
good solution. An introductory programmer is much more likely to be familiar
with the concept of a real number than a double precision floating point
number.
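As a rough illustration of why double is the more useful of the two precisions, the following Java sketch compares them. The class and method names are made up for this example and are not part of the proposed language.

```java
// Illustrative sketch only: RealPrecision is a hypothetical name.
class RealPrecision {
    // The float nearest to 0.1 is not the same value as the double nearest
    // to 0.1, so widening the float to double exposes its rounding error.
    static boolean floatMatchesDouble() {
        float f = 0.1f;
        double d = 0.1;
        return (double) f == d;
    }

    public static void main(String[] args) {
        float f = 1.0f / 3.0f;   // roughly 7 significant decimal digits
        double d = 1.0 / 3.0;    // roughly 15 to 16 significant decimal digits
        System.out.println(f);
        System.out.println(d);
        System.out.println(floatMatchesDouble());
    }
}
```

A real in the teaching language would correspond to the double variable here.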
Contentious point
Should string have a capital letter? In the introductory language a string is
presented as a basic type. None of the other basic types have capital letters.
In Java, String is a class, not a basic type, and class names conventionally
start with a capital letter. If string is left uncapitalised then it may be confusing when
moving to Java. If it is capitalised, it will be incongruous, and require a
complex explanation as to why it is that way. At the current time, I am of the
opinion that string should not be capitalised.
int a;
int b = 100;
This seems good and simple, and will be used in the introductory language.
The use of the semicolon as a statement separator will be considered later.
In Java constants are declared like this:
{ public } static final int a = 100;
In Java, the contents of the record (object) are accessed in the following way,
assuming p is a Point:
p.x = 4;
p.y = 3;
This method of defining a class (using a class keyword), and the use of the
dot operator to access member variables, is fairly simple and widely used
across a variety of languages, so it seems a good idea to keep these for the
teaching language. The use of curly brackets to delimit the class definition
seems an acceptable approach. Taking this approach here suggests using
curly brackets for delimiting all blocks of statements (loops ...) as they are
used in Java.
In C or C++ there is provision to provide a new name for a type. For instance
the name "age" could be assigned as a synonym for "int". This is done using
a typedef. Although this helps with modelling a problem, there is no provision
for using typedefs in Java. It would be difficult to write code in Java which a
typedef would map to, so unfortunately it will not be included in the
introductory language at the moment.
This uses the same syntax as declaring a variable of a basic type. As the
introductory programmer should not need to know the difference between
creating an object on the heap and creating it on the stack, it seems sensible
to use the C++ stack creation syntax for the creation of variables of all
types. The differences between basic and non-basic types are thus made as
transparent as possible to the programmer, who can use them in the same
way. As objects cannot be created on the stack in Java, a declaration such as
Point p would have to be converted to Point p = new Point() by the translator.
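The translation described above might look as follows in the generated Java. This is a sketch under assumptions: the fields of Point and the names PointDemo and makePoint are chosen for illustration, and the real translator output may differ.

```java
// Hypothetical record type with two integer fields.
class Point {
    int x;
    int y;
}

class PointDemo {
    static Point makePoint() {
        // Teaching-language source (assumed):  Point p;
        // Java that the translator might emit for it:
        Point p = new Point();
        p.x = 4;
        p.y = 3;
        return p;
    }

    public static void main(String[] args) {
        Point p = makePoint();
        System.out.println(p.x + ", " + p.y);
    }
}
```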
Arrays
Arrays are fairly fundamental data structures. They are usually referenced by
an identifier and an index. For instance in C or Java, a number (or expression)
is used in square brackets after the identifier, e.g.
numbers[4] = 12;
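For context, a complete Java fragment using this indexing syntax could look like the sketch below. The array size and the names ArrayDemo and fill are arbitrary choices for the example.

```java
class ArrayDemo {
    static int[] fill() {
        int[] numbers = new int[10];   // ten elements, indexed 0 to 9, all initially 0
        numbers[4] = 12;               // index given as a number
        int i = 2;
        numbers[i * 2 + 1] = 7;        // index given as an expression (element 5)
        return numbers;
    }

    public static void main(String[] args) {
        int[] numbers = fill();
        System.out.println(numbers[4] + " " + numbers[5]);
    }
}
```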
Statement Separators
To determine where one statement ends and another begins, some sort of
delimiter is needed. It would be possible to use a newline for this, putting
each statement on a different line. However, it is quite often desirable to
change the layout of the program code to make it easier to read and show its
structure more clearly. Using newlines to define new statements makes this
difficult, because adding a line break could stop a working program from functioning.
An alternative, as used in C, C++ and Java, is to use a semicolon (or some
other special character) to separate statements. This allows arbitrary
newlines to be inserted without affecting the function of the program,
although it does mean that an extra character has to be added after each
statement, and the result looks less tidy than it would without. The fact that
Java uses a semicolon as its statement separator swings the decision towards
using a semicolon in the teaching language as well. Many compiler errors in
Java, and even more in C/C++, are caused by the programmer forgetting a
semicolon at the end of a statement. If the programmer gets into the habit of
putting semicolons in right from the start, such errors may become less
frequent.
Conditionals
Conditionals are a fundamental part of any programming language. The most
useful and generic construct is if .. then .. else. The Java syntax for this
seems straightforward enough to include in a language for teaching, so the
following syntax will be used:
if ( condition - a boolean expression )
{
statements ...
}
else
{
statements ...
}
Case
( a == 1 ) { statements }
( a == 2 ) { statements }
( a == 3 ) { statements }
( a == 4 ) { statements }
In Java the case construct is called switch and has the following syntax:
switch ( a )
{
case 1 :
text = "first case";
break;
case 2 :
text = "second case";
break;
default :
text = "no other cases match";
break;
}
If a does not match any of the cases, the default case is selected. The break
statements are used to stop execution at the end of each case and jump to
the closing curly bracket. If the break statement was not included at the end
of case 1, and case 1 was matched, after case 1's statements had been
executed, case 2's statements would also be executed, and so on until a
break statement was reached. Some programmers find it an annoyance to
have to include a break at the end of each of their cases, but it does allow for
the possibility of leaving them out on purpose in order to let the execution
drop through to the next case. In order not to prevent programmers from
using this technique if they want to, it will be left to the programmer to put in
the break statements rather than having the translator put them in
automatically.
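The effect of omitting a break can be seen in a small Java sketch. The class and method names are invented for the example, and the strings are built up by appending so that the fall-through is visible in the result.

```java
class SwitchDemo {
    // Every case ends with break, so exactly one branch runs.
    static String withBreaks(int a) {
        String text = "";
        switch (a) {
            case 1:
                text = "first case";
                break;
            case 2:
                text = "second case";
                break;
            default:
                text = "no other cases match";
                break;
        }
        return text;
    }

    // The break after case 1 is left out on purpose.
    static String withFallThrough(int a) {
        String text = "";
        switch (a) {
            case 1:
                text += "first case, ";
                // no break here: execution drops through into case 2
            case 2:
                text += "second case";
                break;
            default:
                text += "no other cases match";
                break;
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(withBreaks(1));
        System.out.println(withFallThrough(1));
    }
}
```

With a equal to 1, the second method runs the statements of both case 1 and case 2 before reaching a break.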
Loops
There are four common types of loops: while loops, repeat .. until loops,
generalised loops and for loops. The first two are quite similar. With while the
test for exiting the loop is done at the top (so the body of the loop may not
be executed) and with repeat .. until the test is done at the end, so the body
is always executed at least once. The Turing programming language provides
a generalised loop in which the programmer can put the exit condition at the
top or the bottom (or anywhere in the middle!) to determine the way in
which the loop works. Most repeat .. until loops can be rewritten as while
loops, so to aid consistency only a while will be provided.
while ( condition - boolean expression )
{
statements
}
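To illustrate how a loop with its test at the end can be rewritten with the test at the top, here is one possible Java sketch. It uses Java's do .. while as the stand-in for repeat .. until, and the halving example is invented for illustration.

```java
class LoopDemo {
    // Test at the end: the body always runs at least once.
    static int halvingsTestAtEnd(int n) {
        int count = 0;
        do {
            n = n / 2;
            count++;
        } while (n > 0);
        return count;
    }

    // Same behaviour with only a while loop: the body is written
    // once before the loop, then repeated while the test holds.
    static int halvingsWhileOnly(int n) {
        int count = 0;
        n = n / 2;
        count++;
        while (n > 0) {
            n = n / 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(halvingsTestAtEnd(8));
        System.out.println(halvingsWhileOnly(8));
    }
}
```

The cost of the rewrite is that the loop body appears twice, which is the price paid for having a single loop construct.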
The other sort of loop to be considered is the for loop. This gives a number of
iterations using an index variable, for example for i from 1 to 10, or for i from
10 to 1. The syntax for this in Java is:
for ( i = 1 ; i <= 10 ; i++ )
{
statements
}
The Java approach provides much more flexibility, but the Turing syntax is far
simpler. It is a difficult decision which of these is the more important
consideration. I would argue for simplicity, consistent with all of the features
of this new language, but at the same time it would be easy to make the
language too restrictive, therefore not allowing more sophisticated programs
to be written. While the language should be kept simple, it is also important
that a large number of problems can be solved and techniques applied using
it. My proposed syntax for the Kenya for loop is as follows:
for i = 0 to 9
{
print i;
}
or
for decreasing i = 9 to 0 step 2
{
print i;
}
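One way the two proposed forms could be translated into Java is sketched below. It assumes that both bounds are inclusive and that step gives the size of the decrement. Neither assumption is confirmed by the text above, and the names are illustrative.

```java
class ForTranslation {
    // Kenya (proposed):  for i = 0 to 9 { print i; }
    static String increasing() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i <= 9; i++) {
            out.append(i).append(' ');
        }
        return out.toString().trim();
    }

    // Kenya (proposed):  for decreasing i = 9 to 0 step 2 { print i; }
    // Assumption: "step 2" means the index goes down by 2 each time.
    static String decreasing() {
        StringBuilder out = new StringBuilder();
        for (int i = 9; i >= 0; i -= 2) {
            out.append(i).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(increasing());
        System.out.println(decreasing());
    }
}
```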
Java calls procedures and functions "methods". Procedures are just functions
which do not return a value (they return "void"). All methods are members of
classes. Unless a method is declared static, it can only be called if an object of
the class of which it is a member has been created. As the introductory
language does not currently support object-oriented programming, all
methods should be members of the class in which main is defined, and be
declared static with package access, so that they can be called from any part
of the program.
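A generated Java class following this scheme might look like the sketch below. The class name KenyaProgram and the two methods are hypothetical.

```java
class KenyaProgram {
    // A function: static, package access (no modifier), returns a value.
    static int square(int n) {
        return n * n;
    }

    // A procedure: the same, but with return type void.
    static void printSquare(int n) {
        System.out.println(square(n));
    }

    public static void main(String[] args) {
        // No object of KenyaProgram is created; static methods are called directly.
        printSquare(5);
    }
}
```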
At the moment I think that the best way to deal with input is to provide
functions with names such as readInt() and readString(). If the user includes
these in their program, a function wrapping code similar to the above will be
included in the Java source code, and called at the relevant point. Another
option would be to have a generic read() function which would translate to
different Java functions depending on the type of the variable to which the
result of the read() is being assigned. This is more complex to implement.
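A minimal sketch of what such wrapper functions could look like in the generated Java follows. It is an assumption rather than the actual translator output: the reader is passed in as a parameter so that the example can be run on fixed text, and I/O errors are turned into unchecked exceptions to keep the calling code simple.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class InputDemo {
    // Reads one line and converts it to an int.
    static int readInt(BufferedReader in) {
        return Integer.parseInt(readString(in).trim());
    }

    // Reads one line as a string.
    static String readString(BufferedReader in) {
        try {
            String line = in.readLine();
            if (line == null) {
                throw new IllegalStateException("end of input");
            }
            return line;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Fixed text stands in for the keyboard; a real program would wrap
        // System.in in an InputStreamReader instead.
        BufferedReader in = new BufferedReader(new StringReader("42\nhello\n"));
        int n = readInt(in);
        String s = readString(in);
        System.out.println(n + " " + s);
    }
}
```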
Operators
The following operators will be provided:
==
!=
<
>
<=
>=
and
or
not
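The comparison operators are the same as in Java. The word operators would presumably be translated to Java's symbolic forms (and to &&, or to ||, not to !). The sketch below shows that assumed mapping, using a made-up condition.

```java
class OperatorDemo {
    // Teaching-language condition (assumed syntax):
    //     (a >= 1 and a <= 10) or not strict
    // Java equivalent under the assumed mapping:
    static boolean accept(int a, boolean strict) {
        return (a >= 1 && a <= 10) || !strict;
    }

    public static void main(String[] args) {
        System.out.println(accept(5, true));
        System.out.println(accept(11, true));
        System.out.println(accept(11, false));
    }
}
```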
Generics
Generics[14] are a sophisticated concept. Java is an object-oriented language
and (almost) everything is an object (i.e. it extends the class Object). When
programs deal with large numbers of objects, they tend to hold them in
various kinds of containers (such as Vectors and HashMaps). Any sort of object
can be put into a container and retrieved later. However, if you put in
say a Dog, where Dog is a class that you or someone else has defined, and
try to get it out again, you get out an Object. This is because containers hold
Objects. They do not remember the more explicit type of each Object put
into the container, and so they can only give an Object back. It is up to the
programmer to remember the type of the objects they put into the container,
and convert them back to this type using a cast.
In Java this looks like:
Vector v = new Vector();
Dog d = new Dog();
v.add(d);
Dog e;
e = (Dog)v.elementAt(0);
The cast is the bracketed (Dog) after the assignment operator on the last line.
This coerces the Object which comes out of the vector to a Dog, so that it
can be assigned to e.
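Because the container does not remember the element type, a wrong cast is only detected when the program runs. The following sketch shows this, using a second hypothetical class Cat alongside Dog.

```java
import java.util.Vector;

class Dog {}
class Cat {}

class CastDemo {
    // Returns true if casting a Cat taken from the vector to Dog fails at run time.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean wrongCastFails() {
        Vector v = new Vector();
        v.add(new Cat());                  // the vector accepts any Object
        try {
            Dog d = (Dog) v.elementAt(0);  // compiles, but the element is a Cat
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrongCastFails());
    }
}
```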
It is somewhat annoying for the programmer to have to remember the type
of the objects in a container and cast them whenever they are extracted. A
solution to this problem would be to have a class called DogVector which
only contained Dogs. We could then be sure that any object extracted from a
DogVector would be a Dog and therefore no cast would be necessary.
However, there will be other types of objects that programmers will want to
store in vectors as well as Dogs (in fact anything that is an Object) and using
this approach a different class would have to be written for each container
for each type of object to be contained. Every time a programmer defined a
new type they would have to define a new set of containers to put them in.
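A hand-written DogVector of the kind described might look like this sketch. The name field on Dog and the choice of methods are assumptions made for the example.

```java
import java.util.Vector;

class Dog {
    String name;
    Dog(String name) { this.name = name; }
}

// A container that only accepts and returns Dogs. The cast is
// hidden inside the class, so callers never need to write it.
class DogVector {
    private final Vector contents = new Vector();

    @SuppressWarnings("unchecked")
    void add(Dog d) {
        contents.add(d);
    }

    Dog elementAt(int index) {
        return (Dog) contents.elementAt(index);
    }

    int size() {
        return contents.size();
    }
}
```

An equivalent class would have to be written again for Date, String and every other element type, which is the duplication described above.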
Generics offer a solution to this problem by providing the possibility of
having containers that are parameterised by type. That is, we can say we
want a Vector<A>. This means we want a Vector, but that everything it
contains will be of type A (where A could be Dog, Date, String ...). The
parameterised container then deals with any type coercion necessary.
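With a parameterised container the earlier vector example needs no cast. The sketch below is written in the parameterised-type syntax accepted by current Java compilers. The Dog class and its name field are hypothetical.

```java
import java.util.Vector;

class Dog {
    String name;
    Dog(String name) { this.name = name; }
}

class GenericDemo {
    static Dog firstDog() {
        Vector<Dog> v = new Vector<Dog>();  // a Vector whose elements are all Dogs
        v.add(new Dog("Rex"));
        Dog e = v.elementAt(0);             // no cast: the element type is known
        return e;
    }

    public static void main(String[] args) {
        System.out.println(firstDog().name);
    }
}
```

Attempting to add anything other than a Dog to v would be rejected by the compiler rather than failing at run time.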
C++ offers generics in the form of templates. At the moment[15] Java does
not have generics, but compilers are available which will compile a superset
of Java, including parameterised types. GJ (Generic Java)[16] is such a
compiler. It would not be difficult to produce GJ code as the translation from
Kenya rather than Java (GJ is a superset of Java, so the code would be pure
Java if generics were not used in the Kenya code). This would allow
programmers to use the feature, removing much of the need for casting, one
of the less elegant features of Java.
The use of parameterised types is quite an advanced concept, and it is
questionable whether they should be included in a teaching language.
However, I think that they should be included as the novice can choose not
to use them. When they do come to work with containers, the concept of the
parameterised type can be explained just as easily as the need to cast
objects when they are extracted from containers.