LX Compared to Other Languages
Christophe de Dinechin
Version 1.5 (updated 2002/01/23 16:15:22)
For many programmers, seeing a language in action and a few simple examples is the best way to get a general idea of what it can do. Below are some of the interesting characteristics of LX, roughly from the simplest to the most advanced, which should give you some idea of the language. Each feature contains a short description, an example, and an indication of why this is useful or necessary. This list is only intended to show where LX is most likely to differ from other common languages such as C++. It is not a specification.
Nor do we discuss in depth what really sets LX apart: the numerous ways to extend the language.
Most programming languages use keywords or special characters (such as '{' and '}') to delimit blocks. This makes it possible for the indentation to no longer reflect the actual structure of the program, as in the following C++ code:
    if (A == B);
        printf("A==B\n");
LX, on the other hand, uses indentation to indicate the structure of blocks. Indentation must consist of either spaces only or tabs only. Mixing the two in the same source file is not allowed.
    if A = B then
        WriteLn "A=B"
Indentation in LX remains very flexible. It does not prevent grouping multiple statements on a line, nor does it prevent splitting a long line in the middle. LX has terminating characters and keywords (';' and 'end'), but they are optional when indentation allows them to be deduced.
    -- Single line compound statement
    if N = 0 then return 1; else return N * fact(N-1); end

    -- Multi-line expressions and optional end
    while ptr <> nil and
          ptr.size > 0 and
          ptr.array[ptr.size] <> 0 and
          ptr.array[ptr.size-1] <> 0 and
          ptr.array[0] <> ptr.array[ptr.size] loop
        Write "A=", A, "B=", B
        ptr := ptr.next
    end -- 'end' is optional here, deducible from indentation

Why? Code is read far more often than it is written. Indentation is the easiest way for the human brain to visualize program structure. So forcing indentation to be kept in sync with program structure helps maintenance.
No keyword or character is required to open or close a block, since indentation is used to delimit the block:
    procedure TableOfFactorials() is
        with integer N, I; real Fact
        WriteLn "--- Table of factorials ---"
        for N in 1..100 loop
            Fact := 1.0
            for I in 2..N loop Fact *= I
            WriteLn "Factorial of ", N, " is ", Fact
        WriteLn "--- End -------------------"

The semi-colon ';' is required to separate statements or declarations only if they are on the same line. Otherwise, a line break can act as an instruction end. Instructions or blocks can span multiple lines, or be on the same line separated with semi-colons (see the for loops above). An instruction can be split between lines using indentation. Invoking a procedure, such as WriteLn, does not require parentheses around the arguments.
Why? The punctuation is unnecessary, and causes additional visual clutter. In addition, on many non-US keyboard layouts, typical punctuation characters such as { or } are difficult to type.
To most non-programmers, JohnSmith, john_smith and JOHN_SMITH all denote the same person. As programmers, we unlearn this identity to make them different. When looking up a name, LX ignores case and separating underscore '_' characters (two consecutive underscores are not allowed).
    -- There is only one 'file' and one 'open_file' below
    function OpenFile(string Name, Mode) return file
    FILE F := open_file("foo.dat", "r")
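The lookup rule can be pictured as comparing names after a normalization step. The following C++ sketch is only an illustration of that idea: NormalizeName and SameName are hypothetical helpers, not part of any LX implementation, and the sketch assumes plain ASCII identifiers.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical normalization: drop every '_' and fold letters to lower case.
std::string NormalizeName(const std::string &name)
{
    std::string key;
    for (char c : name) {
        if (c != '_')
            key += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return key;
}

// Two spellings denote the same entity when their normalized forms are equal.
bool SameName(const std::string &a, const std::string &b)
{
    return NormalizeName(a) == NormalizeName(b);
}

// Usage example: the three spellings from the text compare equal.
inline void NameLookupExample()
{
    assert(SameName("JohnSmith", "john_smith"));
    assert(SameName("john_smith", "JOHN_SMITH"));
    assert(!SameName("OpenFile", "open_files"));
}
```

The sketch does not reject two consecutive underscores, which the text says LX disallows.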
On the other hand, name overloading allows you to reuse the same name for different entities when the name will not be used in the same context.
    type rectangle
    function Rectangle(integer X, Y, Width, Height) return rectangle
    rectangle Rectangle := Rectangle(10, 10, 20, 20)
In addition, LX, like any Mozart-based language, requires a renderer, which can be used to present code to programmers with their preferred style.
Why? Style preferences cause religious reactions from many otherwise reasonable programmers. Style insensitivity allows reuse of libraries while maintaining a consistent style in your own code.
Cons: Entities that would be distinguished by case in the real world can no longer be distinguished this way. For instance, V for volume and v for speed in physics. Distinguishing such names by case is a common and reasonable practice.
As processor architectures move toward ever increasing bit sizes (32, then 64 bits), large numbers become difficult to decipher.
    long long million = 1000000;
    long long billion = 1000000000;
    double pi = 3.1415926535897932384626433;
LX allows you to use a separating underscore '_' between digits to make them easier to read:
    integer million := 1_000_000
    integer billion := 1_000_000_000
    real pi := 3.14159_26535_89793_23846_26433;
Why? Large constants are getting more common with larger processor word sizes.
On today's architectures, the most used bases are 10, 2 and 16. Octal is rarely used. Yet C has a short notation for base 8, a longer notation for base 16, and no notation at all for base 2. Worse, these notations do not apply to floating-point numbers, making the encoding of numbers such as MAX_FLT awkward. Last, a few mathematical algorithms are best written with bases such as 3 or 7.
LX offers a general notation for based numbers, that allows any base between 2 and 36, and applies to floating-point numbers as well.
    integer Large := 16#FFFF_FFFF
    integer Twenty_Seven := 3#1000
    real MAX_FLT is 2#1.1111_1111_1111_1111_1111_1111#E127
Why? This solution is general and easy to read. It doesn't create syntactic ambiguities as other notations would. LX parses 0x0 as 0 x 0 (infix x operator), and 03H-1 as 03 H -1.
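To make the notation concrete, here is a minimal C++ sketch of how the integer form of such literals could be evaluated. ParseBasedInteger is a hypothetical helper written for this illustration, not the LX scanner. It handles only integers (no fractional part or exponent), and it does not check for overflow or for doubled underscores.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: evaluates "16#FFFF_FFFF", "3#1000" or "1_000_000".
unsigned long long ParseBasedInteger(const std::string &text)
{
    unsigned base = 10;
    std::string digits = text;

    // Split "base#digits" at the first '#', if there is one.
    const std::string::size_type hash = text.find('#');
    if (hash != std::string::npos) {
        const unsigned long long parsedBase =
            ParseBasedInteger(text.substr(0, hash));   // the base is decimal
        if (parsedBase < 2 || parsedBase > 36)
            throw std::invalid_argument("base must be between 2 and 36");
        base = static_cast<unsigned>(parsedBase);
        digits = text.substr(hash + 1);
    }
    if (digits.empty())
        throw std::invalid_argument("missing digits");

    unsigned long long value = 0;
    for (char c : digits) {
        if (c == '_')
            continue;                                   // separators carry no value
        const unsigned char uc = static_cast<unsigned char>(c);
        unsigned digit;
        if (std::isdigit(uc))
            digit = static_cast<unsigned>(uc - '0');
        else if (std::isalpha(uc))
            digit = static_cast<unsigned>(std::toupper(uc) - 'A') + 10;
        else
            throw std::invalid_argument("unexpected character");
        if (digit >= base)
            throw std::invalid_argument("digit out of range for base");
        value = value * base + digit;
    }
    return value;
}

// Usage example with the literals shown in the text.
inline void BasedNumberExample()
{
    assert(ParseBasedInteger("16#FFFF_FFFF") == 4294967295ULL);
    assert(ParseBasedInteger("3#1000") == 27ULL);
    assert(ParseBasedInteger("1_000_000") == 1000000ULL);
}
```

The letter-to-digit mapping assumes an ASCII-compatible character set.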
Several languages (C++, Ada) offer the possibility to overload arithmetic operators such as '+' or '/'. The languages I know of limit themselves to binary operators. In some cases, combining operations results in improved performance (vector and multimedia operations) or precision (matrix operations). LX offers a general notation for defining such operator combinations:
    function Multiply_Add(vector A, B, C) return vector written A*B+C
    vector M, N, O
    vector P := M * N + O
LX also allows this notation to be used for the parameters to generic types, in which case the use of multiple operators is even more useful.
    generic [type item; ordered Low, High]
    type array written array[Low..High] of item
    type vector is array[1..100] of real
    array V[1..10] of integer

Some of the arguments of binary operators need not be parameters of the function being defined:

    function IsIdentity(matrix M) return boolean written M = 1
    function IsZero(matrix M) return boolean written M = 0
A written form can even be used to enable implicit conversions:
    function Real(integer I) return real written I
    real R := 1 -- Implicit call to Real(1)

Why? The operator overloading syntax is easier to read than in most alternatives (consider how you indicate if operator++ is prefix or postfix in C++). It is also more general. When applied to generic types, it allows libraries to define a syntax such as array[A..B] of T. When applied to objects, it allows optimizations that the compiler cannot know about to be specified by the programmer. It also enables convenient mathematical notations (0 < X < 1).
In the above array example, of is not an operator symbol, but a named infix operator. LX allows you to use arbitrary names for infix operators in expressions.
    function And(integer X, Y) return integer written X and Y
    integer Bits := Value and Mask

    function Rectangle (integer X, Y, Width, Height) return rectangle
        written Width by Height at X row Y
    rectangle R := 10 by 20 at 30 row 40
This technique allows one to use an infix function call notation where appropriate, as in Objective-C or Smalltalk. Such notations are often (but not always) more readable.
Why? In addition to basic operators such as A and B, it also can be used for convenience notations such as A between 0 and 1.
Cons: Someone will soon realize that, with the proper declarations, the following becomes legal LX:
if you read this then you are an idiot
Bertrand Meyer's Eiffel language formalized the idea of programming by contract. LX allows you to define preconditions, postconditions and internal assertions in your objects. The compiler may either verify these predicates, or use them for better optimizations.
    function Factorial(integer N) return real
        assume N >= 0
        ensure out Result >= in N

    procedure Display() is
        with integer I
        for I in 1..100 loop
            with real F := Factorial(I)
            assert F > 0.0
            WriteLn "F = ", F
Note: the keyword for preconditions may change to 'require', following a suggestion that 'assume' makes the responsibilities of the caller and callee less clear.
Why? Programming by contract improves the maintainability and robustness of code, and it can improve optimizations too.
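For comparison, a rough C++ analogue of the Factorial contract can be written with the standard assert macro. This is only a sketch: the checks run at run time (and vanish when NDEBUG is defined), whereas the text above allows an LX compiler to verify the predicates or exploit them for optimization.

```cpp
#include <cassert>

// Factorial with its contract spelled out as run-time assertions.
double Factorial(int n)
{
    assert(n >= 0);                  // precondition:  'assume N >= 0'

    double result = 1.0;
    for (int i = 2; i <= n; ++i)
        result *= i;

    assert(result >= n);             // postcondition: 'ensure out Result >= in N'
    return result;
}
```

A caller can then state its own expectations in the same way, for example `assert(Factorial(5) > 0.0);`.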
C and C++ functions only have input arguments passed by value. Other kinds of argument passing conventions have to be simulated using pointers. As the underlying architecture changes, the compiler cannot take advantage of wider registers.
In the following code, assuming short is 16 bits, it makes sense to pass a pointer when the size of pointers is 16 bits.
    struct Rect { short Top, Left, Bottom, Right; };
    void CopyRect(const Rect *from, Rect *to);

When the pointer is 32 bits, on many RISC architectures, copying the struct from registers would be faster than accessing it from memory. On a 64-bit architecture where the whole structure fits in a single register, having to use pointers and memory accesses becomes more expensive than passing the struct directly in a register, but the compiler is not allowed to do that.
In LX, the programmer specifies if the parameters are used for input, output or input/output to the function or procedure. As a result, the compiler is free to select the best compromise for passing the arguments, depending on the target architecture.

    type rect is record with
        integer16 Top, Left, Bottom, Right
    procedure CopyRect(in rect From; out rect To)
On a 64-bit architecture, one register will be passed as input, and one as output, with no memory access. C will mandate two 64-bit pointers as input, with at least one memory load and one store inside the function. On a 16-bit architecture, LX will probably use the same approach as C, passing two pointers. Note that since an LX compiler may use different methods for passing arguments, input arguments are read-only (in C and C++, the local copy in the function can be modified.)
Output arguments are constructed in the procedure, and deleting them is the responsibility of the caller, except if the callee exits with an exception. A program is invalid if there is a way for a procedure to exit without having initialized all of its output parameters. A compiler is free to check this at compile time or at run time. In general, if output arguments have complex constructors, they should be initialized before anything else is done.
Why? Specifying the intent improves readability and maintainability, and gives the compiler the information it needs to make the right optimization.
Some functions require very large parameter lists. Calls to these functions tend to become difficult to read. In C++, the problem is often worked around by passing objects that encapsulate parameters, but in some cases this is not practical.

    control_register DiskController = control_register(
        0xEFFF42D0, 32, 16, 0xFFFFFFFF, true, true);
LX allows the call site to specify the name of the arguments, in which case the order of the arguments may be different than the order of the parameters.
    control_register DiskController := control_register(
        Address: 16#EFFF_42D0,
        Register_Size: 32,
        Memory_Access_Size: 16,
        Bit_Mask: 16#FFFF_FFFF,
        Read_Enable: true,
        Write_Enable: true)
C and C++ pass the result of functions by copy. Just like passing all incoming arguments by value, this can be inefficient. Some compilers perform an optimization known as the Named Return Value (NRV) optimization, but the ability to perform this optimization is very dependent on the coding style in the function. And many compilers just don't do this optimization at all.
In LX, the return value of a function is named result. Actually, there is practically no difference between the two following declarations:
    function F(integer N) return integer
    procedure F(out integer result; integer N)
In many cases, this allows a more compact coding of the function:
    function Factorial(integer N) return real is
        with integer I
        Result := 1.0
        for I in 2..N loop
            Result *= I
There is, naturally, also a return statement, which can be useful for early termination of a function:
    function Factorial(integer N) return real is
        if N = 0 then return 1.0
        return N * Factorial(N-1)

If the returned data type has a constructor, then it is not possible for the function to return an uninitialized value. From that point of view, result is considered like any output parameter.
At least one very frequent operation requires a variable number of arguments: displaying the results of a program. So far, no language has found a really elegant solution:
WriteLn('X=', X, 'Y=', Y);
printf("X=%d, Y=%d\n", X, Y);
cout << "X=" << X << ", Y=" << Y << endl;
System.out.println("X=" + X + ", Y=" + Y);
LX allows you to define a WriteLn procedure yourself, that behaves exactly like the Pascal WriteLn yet can be defined in a library and written in LX. This is achieved using the others keyword, which stands for any number of arguments. A procedure or function with others parameters is generic, and the compiler will use it to generate functions with the appropriate number of arguments.
    generic type writeable if
        with writeable W
        WriteIt W

    procedure WriteLn(others) is
        Write others
        Write NewLine

    procedure Write(writeable W; others) is
        WriteIt W
        Write others

    procedure WriteIt(integer I)
    procedure WriteIt(real R)
    procedure WriteIt(character C)
Note that this definition also makes use of a generic type named writeable. This type indicates that the writeable type in the first Write definition can be replaced by any type for which it is possible to call WriteIt. With this definition, the LX call looks similar to the corresponding Pascal call. Formatting is achieved using the format operator.
    WriteLn "X=", X, ", Y=", Y
    WriteLn X format "X=###.###", Y format "Y=###.###"
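The closest C++ counterpart to others is probably a template parameter pack. The sketch below mirrors the structure of the LX Write and WriteLn definitions using C++11 variadic templates. It is an approximation written for this comparison: the stream insertion operator stands in for WriteIt, and FormatLine is a hypothetical helper added only so that the result can be inspected.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Base case: no argument left, nothing to write.
inline void Write(std::ostream &) {}

// Recursive case: write the first argument, then the remaining ones.
// The pack 'others' plays the role of the LX 'others' keyword.
template <typename First, typename... Others>
void Write(std::ostream &out, const First &first, const Others &...others)
{
    out << first;              // stands in for 'WriteIt W'
    Write(out, others...);     // stands in for 'Write others'
}

template <typename... Args>
void WriteLn(std::ostream &out, const Args &...args)
{
    Write(out, args...);
    out << '\n';               // stands in for 'Write NewLine'
}

// Hypothetical helper: captures one WriteLn call in a string.
template <typename... Args>
std::string FormatLine(const Args &...args)
{
    std::ostringstream buffer;
    WriteLn(buffer, args...);
    return buffer.str();
}

// Usage example.
inline void WriteLnExample()
{
    assert(FormatLine("X=", 1, ", Y=", 2.5) == "X=1, Y=2.5\n");
}
```

As in the LX version, the compiler generates one function per distinct argument list, and any type with a suitable insertion operator is accepted.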
The same technique can be used to define a Max function taking an arbitrary number of arguments of any type with a "less-than" operator:
    generic type ordered if
        with ordered X, Y
        with boolean B := X < Y

    function Max(ordered X) return ordered is
        return X

    function Max(ordered X; others) return ordered is
        result := Max(others)
        if result < X then
            result := X

    real A, B, C, D
    real E := Max(A, B, C, D)
    integer I, J, K := Max(I, J)
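The Max definition translates almost line for line into C++11 variadic templates, which may help readers coming from C++. This sketch assumes that all arguments have the same type and that the type provides operator<, which corresponds to the ordered constraint in the LX code.

```cpp
#include <cassert>

// One argument: the maximum of a single value is the value itself.
template <typename T>
T Max(const T &x)
{
    return x;
}

// Several arguments: compute the maximum of the rest, then compare it
// with the first argument, exactly as in the LX version.
template <typename T, typename... Others>
T Max(const T &x, const Others &...others)
{
    T result = Max(others...);
    if (result < x)
        result = x;
    return result;
}

// Usage example.
inline void MaxExample()
{
    assert(Max(1.0, 4.0, 2.0, 3.0) == 4.0);
    assert(Max(3, 7) == 7);
}
```

Unlike the LX generic type, nothing in this C++ code names the requirement on T. A type without operator< is rejected only when the template is instantiated.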
Last, the same technique is also used to define generic types with variable numbers of arguments:
    generic [type item; ordered Low, High; others]
    type array written array[Low..High, others] of item is
        array[Low..High] of array[others] of item

    array Matrix[1..5, 1..5] of real
LX has no special data type using keywords, such as int in C and C++. All data types obey the same rules. Predefined data types such as integer are simply defined by the compiler implicitly. They are actually extracted from the LX.BUILT_IN module. Therefore, the integer name can be used like any other name, for instance to define a conversion function from an arbitrary precision big_integer type.
    type big_integer;
    function Integer(big_integer Z) return integer

When a type is used in a declaration, only one word of the type is on the left of the declared name. The remainder of the type is on the right of the declared name.

    type Proc is procedure(integer N)
    procedure P(integer N)

    type Rec is record with integer N
    record R with integer N

    type Vector5 is array[1..5] of integer
    array V5[1..5] of integer
LX builds data types by aggregation, in a way similar to single inheritance in C++. Additional data fields are added to existing types. A predefined record empty data type is used to create data records.
    type point is record with
        real X, Y
    type point_3D is point with
        real Z
C and C++ have union types. The syntax is not very practical since it allows only one declaration for each union member. To get an integer followed by either a real or two shorts or four chars, one has to write:
    struct VariantStruct {
        int I;
        union {
            float F;
            struct { short S1, S2; } Shorts;
            struct { char C1, C2, C3, C4; } Chars;
        } U;
    };
    VariantStruct S;
    char C4 = S.U.Chars.C4;

LX uses tagged records to allow the declaration of such data structures, where multiple declarations can be placed under each of the conditions.

    type variant_record is record with
        integer I
        when true:
            real F
        when true:
            short S1, S2
        when true:
            char C1, C2, C3, C4;
        when true:
            char exponent
            char mantissa1, mantissa2, mantissa3;
        when true:
            bitmask bits[32]

    variant_record S
    char C4 := S.C4
Another form of variant records can be declared in LX and not easily in C or C++: records in which a data member declaration involves previous data members. For instance:
    type TwoStrings is record with
        unsigned Size1
        array String1[1..Size1] of character
        unsigned Size2
        array String2[1..Size2] of character

In C or C++, it is not possible to really control the access to fields in a variant record. LX makes this possible by adding conditions in the tags of a variant record. For instance, in the VariantStruct example above, it is possible to control when each of the fields is accessed depending on the value of I:

    type variant_record is record with
        integer I
        when I >= 0:
            real F
        when I in 3..27 :
            short S1, S2
        when true:
            char C1, C2, C3, C4
The guard conditions are used as assertions when accessing each of the fields. The conditions need not be mutually exclusive.
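To obtain a comparable effect in C++, the guards have to be written by hand in accessor functions. The sketch below is one possible emulation, not a description of how an LX compiler implements variant records. The class name VariantRecord, the Require helper and the choice of throwing std::logic_error on a failed guard are all assumptions made for the illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hand-written emulation of the guarded variant_record shown above.
class VariantRecord {
public:
    explicit VariantRecord(int i) : I(i) { U.F = 0.0f; }

    int I;   // the tag that the guards are evaluated against

    // Each accessor checks the guard of its field before granting access.
    float &F()  { Require(I >= 0);            return U.F; }
    short &S1() { Require(I >= 3 && I <= 27); return U.Shorts.S1; }
    short &S2() { Require(I >= 3 && I <= 27); return U.Shorts.S2; }
    char  &C1() { return U.Chars.C1; }        // 'when true': always accessible

private:
    static void Require(bool guard)
    {
        if (!guard)
            throw std::logic_error("field not accessible for this value of I");
    }

    union {
        float F;
        struct { short S1, S2; } Shorts;
        struct { char C1, C2, C3, C4; } Chars;
    } U;
};

// Usage example: 5 satisfies both guards, -1 satisfies neither.
inline void GuardedVariantExample()
{
    VariantRecord v(5);
    v.F() = 1.5f;
    assert(v.F() == 1.5f);

    VariantRecord w(-1);
    bool rejected = false;
    try { w.F(); } catch (const std::logic_error &) { rejected = true; }
    assert(rejected);
}
```

The amount of boilerplate needed for three fields gives an idea of what the tag conditions save.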
In C and C++, the interface of a type is its implementation. In other words, the following data type necessarily contains 4 floating point values.
struct complex { float Re, Im, Rho, Theta; };
LX allows you to define the interface of a type independently of its actual implementation. For instance, it is possible to provide read-only Rho and Theta fields in a complex type.
    -- Interface of the type
    type complex with
        real Re, Im
        out real Rho, Theta

    -- Implementation of the type (normally hidden to the user)
    type complex is record with
        real Re, Im

    -- Implementation of the "missing" data fields (properties)
    function Rho(complex Z) return real written Z.Rho is
        return Sqrt(Z.Re^2 + Z.Im^2)
    function Theta(complex Z) return real written Z.Theta is
        return Atan2(Z.Im, Z.Re)

In C++, inheritance also implies a common internal representation. Therefore, it is not possible to create a Big_Integer class that would behave like an Integer class if it doesn't also share its data members.
In LX, the property of "behaving like" (or "being a") is called logical inheritance, and is independent of the underlying representation. One way of achieving logical inheritance is through data inheritance, in which case base-slicing will occur as in C++:
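The usual C++ approximation of such read-only computed fields is a pair of const member functions. The sketch below assumes a class named Complex with public Re and Im members, chosen to mirror the LX record. The visible difference is that the caller must write z.Rho() with parentheses, so the distinction between stored and computed fields leaks into the interface.

```cpp
#include <cassert>
#include <cmath>

class Complex {
public:
    Complex(double re, double im) : Re(re), Im(im) {}

    // Stored fields, readable and writable.
    double Re, Im;

    // Derived, read-only "fields": computed on demand, never stored.
    double Rho() const   { return std::hypot(Re, Im); }
    double Theta() const { return std::atan2(Im, Re); }
};

// Usage example: the 3-4-5 triangle gives a modulus of 5.
inline void ComplexPropertyExample()
{
    const Complex z(3.0, 4.0);
    assert(std::fabs(z.Rho() - 5.0) < 1e-12);
}
```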
    type shape
    type rectangle like shape is shape with
        integer X, Y, W, H
But the data types can also be totally unrelated. In that last case, conversions must be provided between the derived and base types.
    type integer
    type big_integer like integer is record with
        unsigned Size
        array Bytes[1..size] of byte
    function Integer(big_integer I) return integer
Logical inheritance indicates that the derived type can be used as input argument to any function that would take the base object. It is also the basis for dynamic function selection in the case of dynamic dispatch, the LX replacement for C++ virtual functions.
Multiple logical inheritance is allowed. Multiple conversion functions must be provided in that case, for each of the base classes. C++ style multiple inheritance by aggregation can trivially be implemented using records, but more sophisticated schemes using more complex conversion functions can also be used.
Constructors and destructors are special functions used to create and destroy objects. C++ makes heavy use of constructors and destructors, mostly because of its very weak memory management model inherited from C. While they are less useful in LX, constructors and destructors are available.
A constructor is a function with the same name as the type it returns. In any scope where a constructor is present, any object of the type must be constructed using a constructor. A function taking no argument is called the default constructor, and can be used for object declarations with no initialization. If there is a constructor, but no default constructor, default initialization of objects is not valid.
    type complex
    function Complex(real Re, Im) return complex -- constructor
    procedure P() is
        with complex Z := Complex(1.0, 3.0)
        with complex I -- Error: not initialized
Contrary to C++, constructors can be defined anywhere and involve other constructors. For instance, the following is legal in LX (notice the use of a nested function to create a default constructor):
    type complex
    function Complex(real Re, Im) return complex -- constructor
    procedure P() is
        with function Complex() return complex is
            return Complex(0.0, 1.0)
        with Complex I -- OK: Uses the local default constructor
Constructors that take a single argument of a different type are called conversion constructors. In the case of logical inheritance, the constructor converting to the base class is invoked implicitly by the compiler.
As indicated previously, LX functions define a parameter named result, which needs to be initialized when there are constructors. If there is a constructor but no default constructor, the very first statement of the function must be a return statement or initialize result using an assignment.
    function I() return complex is
        return Complex(0.0, 1.0)
    function J() return complex is
        result := Complex(0.0, -1.0)
    function K() return complex is
        -- Error: no default constructor, first statement not an init
        Write "Hello"
        return Complex (3.5, 7.2)

If there are multiple output arguments, the first N statements of the procedure or function must initialize each of the output arguments in their declaration order. The exceptions to this rule are constructors declared in the same scope as the type they construct, or other constructors if none is declared in the type scope.
Destructors in LX are procedures called Delete and taking one input argument of the given type. They are invoked implicitly by the compiler when any variable of the given type exits a given scope.
    type vector
    function vector(unsigned Size) return vector
    procedure delete(vector V)
    procedure P() is
        with vector V := vector(270)
        -- 'delete V' implicitly called at P exit

As with constructors, destructors can be defined locally. In that case, though, the last action of a local destructor is an implicit call to the global destructor it hides.

    type vector
    function vector(unsigned Size) return vector
    procedure delete(vector V)
    procedure P() is
        with procedure delete(vector V) is
            -- ... some specific code
            -- the global delete V is invoked here
        with vector V := vector(3)
        -- ... other code
        -- The local delete V (and in turn the global one) is invoked
        -- on exit from P() here
Contrary to C and C++, pointers are not built-in entities. Just like arrays, they are constructed generic types. The LX library offers multiple pointer variants, corresponding to different pointer usages, and allowing the compiler to make reasonable optimization assumptions.
In general, all these pointer types are made largely unnecessary in application code, because of LX dynamic objects, which should be used for building complex data structures.
Pointer types that may be offered by the LX library include:
    integer I := 3, J := 4
    reference R to integer := I  -- Create the reference
    reference Q to integer := J
    R := 5
    Write "I=", I                -- Will write 5, not 3
    R := Q                       -- Assigns R (and I) to 4, the value of J
    R := 7                       -- Modifies I, not J
    Rename J, R                  -- Set the reference R to point to J
    R := 8                       -- Modify J through R
    Write "J=", J                -- Will write 8

    array A[1..5] of integer
    address Ptr of integer := address(A[0])
    Ptr += 3                     -- With some luck, points to A[3] :-)
    *Ptr := 5
    Ptr := raw_alloc(1000)
    free Ptr
Memory accessed by pointers is allocated by the alloc function, which takes an object argument used to initialize the memory. Memory is freed by the free procedure, which also resets the pointer to null. free can apply to a null pointer and does nothing in this case. The destructor of pointers doesn't free the memory.
    procedure P() is
        with complex I
        pointer P to complex      -- Initialized with NULL
        pointer Q to complex
        P := alloc(I)             -- Allocate and initialize
        Q := P                    -- Two pointers to same object
        P.Re := 78.25             -- Implicit dereference with '.'
        *P := I + I               -- Replace pointed object
        free P                    -- Freeing memory (and set P to null)
        P := pointer(address(I))  -- OK: explicitly taking address
        P := pointer(I)           -- Error: can't point to "stack"
        P := P + 3                -- Error: no arithmetic

    procedure Swap(big_blob A, B) is
        with auto_ptr P to big_blob := alloc(B)
        B := A
        A := *P
        -- The allocated memory is freed here automatically

    type person is access to record with
        string name, first_name
        person father, mother

    function person(string name, first_name;
                    person father := NULL, mother := NULL) return person is
        result := new(person)
        result.name := name
        result.first_name := first_name
        result.father := father
        result.mother := mother

    function CreateJohnDoe() return person is
        return person("Doe", "John",
                      person("Doe", "Jack"),
                      person("Duh", "Jane"))
Accesses are the building blocks for creating dynamic objects.
C and C++ offer rudimentary modular programming through the #include preprocessor directive. This puts a lot of burden on the programmer, in particular having to write include guards at the beginning and end of each include file. It has several other significant drawbacks, such as the cost for the compiler to re-parse the include files over and over. For a language as complex as C++, the build-time cost is very high.
LX, like many other languages, has a real notion of modular programming. In LX, modules are declared using the same syntax as records. Modules are nothing more than constant records. Modules are typically declared using the module data type, which is a constant empty type.
A module interface therefore looks like the following:
    module COMPLEX with
        type complex with
            real Re, Im;
        constant complex I
        function Complex () return complex
        function Complex (real Re, Im) return complex
        function Add(complex Z1, Z2) return complex written Z1+Z2
        function Sub(complex Z1, Z2) return complex written Z1-Z2
        -- ... and more
The implementation of the module can be defined in a different file, as follows:
    module COMPLEX body is with
        type complex is record with
            real Re, Im
        constant complex I is Complex(0.0, 1.0)
        function Complex() return complex is
            return Complex(0.0)
        function Complex(real Re, Im) return complex is
            result.Re := Re
            result.Im := Im
        function Add body is
            result.Re := Z1.Re + Z2.Re
            result.Im := Z1.Im + Z2.Im
        -- ... and so on
The import statement is used to import declarations from another source file, without the cost of re-parsing them. The import statement is typically used to import modules. It can also be used to give a local short name to a module name.
    import IO = LX.TEXT_IO
    procedure WriteHello() is
        IO.WriteLn "Hello"
In the module implementation above, the COMPLEX module and the Add function have their argument lists replaced with the body keyword. When the type key and the name are enough to select a unique declaration, the body keyword can be used to replace all the additional parameters of this declaration.
The COMPLEX module implementation above is therefore a short version of:
    module COMPLEX with
        type complex with
            real Re, Im;
        constant complex I
        function Complex () return complex
        function Complex (real Re, Im) return complex
        function Add(complex Z1, Z2) return complex written Z1+Z2
        function Sub(complex Z1, Z2) return complex written Z1-Z2
        -- ... and more
    is with
        type complex is record with
            real Re, Im
        constant complex I is Complex(0.0, 1.0)
        function Complex() return complex is
            return Complex(0.0)
        function Complex(real Re, Im) return complex is
            result.Re := Re
            result.Im := Im
        function Add (complex Z1, Z2) return complex written Z1+Z2 is
            result.Re := Z1.Re + Z2.Re
            result.Im := Z1.Im + Z2.Im
        -- ... and so on
The body keyword can be used for procedures, functions, types, enumerations and record types (including modules).
A good way to refer to imported entities is to prefix them with a module name, which can be a short-cut if one has been given in the import statement.
    import IO = LX.TEXT_IO
    procedure SayHello() is
        IO.WriteLn "Hello, world..."
Oftentimes, it is practical to be able to use members of a module or record directly. The using statement serves that purpose:
    import IO = LX.TEXT_IO
    procedure SayHello() is
        using IO
        WriteLn "Hello World..."
        WriteLn "I feel like talking today"
        WriteLn "What do you think?"
The same technique can be used to access members of a record directly:
    function Module(vector V of (vector of complex)) return real is
        with integer I, J
        result := 0.0
        for I in 1..Size(V) loop
            for J in 1..Size(V[I]) loop
                using V[I][J]
                result += Re*Re + Im*Im -- Found in V[I][J]
        result := Sqrt(result)
In many cases, the compiler will be able to optimize accesses to data members when this technique is used. Typically, the compiler might be able to compute the address of V[I][J] only once in the above loop.
It is also worth noting that whenever a qualified name is used to access a function or procedure, that same qualified name is implicitly used for the function or procedure arguments. Also, expression reduction applies to the content of imported modules:

    import CX = COMPLEX
    function Test(CX.complex Z) return CX.complex is
        result := Z + CX.complex(1.5, 3.7) -- '+' declared in COMPLEX
        result := CX.Sub(result, I)        -- I found in CX implicitly

In C, C++ or Ada, the compiler has to know the complete description of a data type presented in an interface. Therefore, these languages require you to expose so-called "private" parts of the interface in the interface files. These are in fact anything but private, since they are exposed to all users of the module.

    struct Complex {
        Complex(double, double);
        double Re();
        double Im();
    private:
        double _Re, _Im;
    };
LX, on the other hand, relies on the normal visibility rules to identify what operations are allowed on a type and what operations are not. To achieve that goal, LX allows slightly more operations with types for which the definition is not available. Also, LX offers type interfaces to control the exposure of data fields.
    -- Module interface
    type complex
    function Complex(real Re, Im) return complex
    function Re(complex Z) return real
    function Im(complex Z) return real

    -- Operations allowed to the module user: the type is opaque
    complex I := complex(0.0, 1.0)
    real Zero := Re(I)
The for loop in LX is controlled by an iterator object. Iterator objects offer operations such as starting the loop, testing if this is the last iteration, and advancing to the next iteration. Iterator objects are typically created using iterator expressions, such as I in 1..5. The definition for such an integer iterator can be given as follows:
    type integer_iterator is record with
        reference Counter to integer
        integer Low, High

    function integer_iterator(reference Counter to integer;
                              integer Low, High) return integer_iterator
        written Counter in Low..High is
        result.Counter := reference(Counter)
        result.Low := Low
        result.High := High

    procedure Start(in out integer_iterator I) is
        I.Counter := I.Low
    function More(integer_iterator I) return boolean is
        return I.Counter <= I.High
    procedure Next(in out integer_iterator I) is
        I.Counter += 1

    procedure Test_Integer_Iterator() is
        with integer I
        for I in 1..5 loop
            WriteLn "I=", I

LX offers numerous standard iterators, for instance to iterate over arrays, lists, vectors and other containers.

    procedure Write(array A) is
        with array.item I
        for I in A loop
            Write I
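The iterator protocol described above (Start, More, Next) can be sketched in C++ as follows. This is an illustration of the protocol only: the way an LX compiler actually expands a for loop is not specified here, and ForLoop and Collect are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <vector>

// Counterpart of integer_iterator: it drives a counter owned by the caller.
struct IntegerIterator {
    int &Counter;
    int Low, High;
};

inline void Start(IntegerIterator &it)      { it.Counter = it.Low; }
inline bool More(const IntegerIterator &it) { return it.Counter <= it.High; }
inline void Next(IntegerIterator &it)       { it.Counter += 1; }

// One plausible expansion of 'for <iterator> loop <body>'.
template <typename Body>
void ForLoop(IntegerIterator it, Body body)
{
    for (Start(it); More(it); Next(it))
        body();
}

// Hypothetical helper: records the values taken by the counter.
inline std::vector<int> Collect(int low, int high)
{
    int i = 0;
    std::vector<int> seen;
    ForLoop(IntegerIterator{i, low, high}, [&] { seen.push_back(i); });
    return seen;
}

// Usage example: 'for I in 1..5' visits 1, 2, 3, 4, 5.
inline void IteratorExample()
{
    assert((Collect(1, 5) == std::vector<int>{1, 2, 3, 4, 5}));
}
```

Because the loop only relies on the three operations, any type providing them could drive the same loop, which is the point of the LX design.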
C and C++ have a switch statement to select among various alternatives. This statement cannot be used for non-integral values, in particular objects. LX equivalent statement is the case statement, but it can be used for many more data types.
    procedure Analyze_This(integer X) is
        case X is
            when 1: WriteLn "X=1"
            when 2..5: WriteLn "X is small"
            when others: WriteLn "I have no idea"
        WriteLn "Detailed analysis complete"
The case statement can be applied to any type, provided a suitable Index function is defined. The Index function returns the one-based index of its first argument among all other arguments, or a value which is not within 1..Number_of_arguments-1 otherwise. Index functions are defined by default for numeric types, strings and characters.
    type point is record with
        integer X, Y
    type rectangle is record with
        integer X, Y, W, H

    function Index(point P) return integer is
        return 1 -- Not found
    function Index(point P; point Test; others) return integer is
        if P = Test then return 1; else return Index(P, others) + 1
    function Index(point P; rectangle Test; others) return integer is
        if P in Test then return 1; else return Index(P, others) + 1

    procedure TestPoint(point P; rectangle R1, R2; point Q) is
        case P is
            when R1: WriteLn "P is in R1"
            when R2: WriteLn "P is in R2"
            when Q: WriteLn "P = Q"
            when others: WriteLn "P is somewhere, who knows?"
C and C++ have numerous specialized keywords, such as register, volatile or inline, which are implementation hints to the compiler. Unfortunately, numerous situations are not covered by standard keywords. As a result, some implementations of C and C++ had to add their own keywords, such as __far, __export or __thread. C and C++ also have pragmas, but their use is awkward, due to the preprocessor-style syntax. Ada also has pragmas, but they require a lengthy notation which makes them difficult to use.
LX uses pragmas for all such keywords, and many more. Because pragmas are used often, the LX notation for them is also much more convenient: pragmas are simply placed between curly braces. Pragmas can significantly change the implementation, but they normally do not change whether the program compiles.
    {inline} function Get_Amount(account A) return amount is
        return A.Amount

    {address 16#EFFF_4DB0} record Control_Register with
        boolean Enable   {offset 0} {bit 0}
        boolean Reset    {offset 0} {bit 3}
        boolean Error    {offset 0} {bit 7}
        integer Count    {offset 1} {bit 0} {bitsize 3}
        boolean Overflow {offset 1} {bit 4}

    {lazy} function And(boolean A, B) return boolean written A and B is
        if not(A) then return false
        return B

    {C "_memcpy"} function memcpy(address Target, Source; unsigned Size) return address
Every compiler can define its own pragmas, but some pragmas are standardized by the language, including:
Last, new pragmas can be added by the user to extend the compiler, using reflection.
Several languages such as Ada, C++ and Eiffel provide exceptions as a means to deal with error conditions. Exceptions can be used to signal these conditions to calling functions. Exception handlers are used to deal with the exceptions.
Ada and C++ exceptions are signals that are typically handled in one place, and stop there. An exception handler can rethrow the exception, but this takes a slight effort.
Eiffel and LX use a slightly different model, where exceptions signal an anomalous condition that persists until explicitly cleared. Exception handlers perform cleanup before propagating the exception, rather than just intercepting it. LX also offers the possibility to retry the block that caused the exception.
    exception SERIAL_OVERRUN

    procedure Copy_Serial_Data(serial_port Input, Output) is
        with integer Retries := 0
        try
            loop
                with byte B := Read(Input)
                exit if B = EOT
        catch SERIAL_OVERRUN:
            Retries += 1
            if Retries < 3 then retry
            Reset Input
            Reset Output
        catch others:
            Reset Input
            Reset Output
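The retry-then-propagate behaviour can be approximated in a language without a retry keyword by wrapping the block in a loop. The Python sketch below assumes that reading: `SerialOverrun`, `copy_with_retry` and its parameters are hypothetical names, and the explicit re-raise stands in for the LX rule that the condition persists after the handler has cleaned up.

```python
class SerialOverrun(Exception):
    """Hypothetical stand-in for the LX SERIAL_OVERRUN exception."""


def copy_with_retry(read_all, reset, max_attempts=3):
    """Run read_all(); on SerialOverrun retry it, else clean up and re-raise."""
    retries = 0
    while True:
        try:
            return read_all()
        except SerialOverrun:
            retries += 1
            if retries < max_attempts:
                continue  # plays the role of the LX 'retry'
            reset()
            raise  # the condition persists after cleanup
        except Exception:
            reset()  # cleanup for every other error, then propagate
            raise
```

As in the LX example, the block runs at most three times: two failures are retried silently, and the third triggers the cleanup and propagates to the caller.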
More and more applications are very dynamic in nature. Some types are practically always accessed indirectly (using pointers or similar entities) rather than referenced directly. In an object-oriented design, the types being accessed often form a deep inheritance hierarchy. LX facilitates the manipulation of such types by allowing the use of pointers to become implicit. Using such types becomes very similar to the way types are used in Java.
A dynamic object is implemented by deriving the implementation from the object data type.
    type person is object with
        string name, first_name
        person father, mother

    function person(string name, first_name; person father := NULL, mother := NULL) return person is
        result.name := name
        result.first_name := first_name
        result.father := father
        result.mother := mother

    function CreateJohnDoe() return person is
        return person("Doe", "John", person("Doe", "Jack"), person("Duh", "Joan"))
C++ offers dynamic dispatch, that is, the invocation of different methods based on the actual type of an object. The mechanism in C++ is called virtual functions. The argument on which the dispatch is done uses a special syntax: it is not passed within the parenthesized arguments to the function, but before a dot '.' or arrow '->' operator, and becomes the implicit 'this' in the virtual function. Virtual functions must be declared within the class declaration for the first argument. Last, dynamic dispatch is invoked through pointers or references, which are allowed to point to objects of derived classes: pointers and references are polymorphic.
    struct Shape
    {
        virtual float Surface();
        ...
    };
    struct Rectangle: Shape
    {
        Rectangle(double, double, double, double);
        virtual float Surface();
        ...
    };
    Shape *shape = new Rectangle(1.3, 4.5, 7.0, 9.8);
    float surface = shape->Surface();
This technique creates a subtle problem: you can't do dynamic dispatch if the functionality was not initially part of the base class. If the user of Shape needs a virtual Draw functionality, the only place where it can be added is class Shape. Adding this functionality is possible only if you own the class (that is, you can't add it if Shape was part of a third-party library). Even if you can add the functionality, it may break dependent code, for instance code that used the name Draw already. This problem is known as the weak base class problem, and can be very significant in large scale software engineering with C++.
    type shape
    type rectangle like shape
    function Rectangle(real Top, Left, Bottom, Right) return rectangle
    function Surface(any shape S) return real
    function Surface(any rectangle S) return real

    any shape S := rectangle(1.0, 3.14, 2.718, 9.00)
    real Surface := Surface(S)

    -- Create a polymorphic type
    type polymorphic_shape is any shape
A polymorphic object can be converted to a non-polymorphic type or to a polymorphic type of a base or derived type using the as operator. The operator raises an exception if the dynamic type doesn't match the conversion being made (this never happens for conversion to a base type). Conversion to a base dynamic type may result in a truncation. The as operator is also allowed on pointer and access types, where it returns null rather than raising an exception if the conversion is invalid.
    procedure Draw(rectangle R)
    function BoundingBox(any shape S) return rectangle

    procedure DrawBoundingBox(polymorphic_shape S) is
        if S isA rectangle then
            Draw S as rectangle -- Raises exception if S not a rectangle
        else
            Draw BoundingBox(S)
Dynamic dispatch follows the rules of logical inheritance, and ignores data inheritance. In LX, data inheritance is a practical tool for implementing types, but is not in general exposed in the user-visible interface of the type.
Expression reduction can be used on expressions that result in dynamic dispatch. This can be used in particular to mimic the syntax used by other languages, whenever that notation is natural or easier to read.
    function Surface(any shape S) return real written S.Surface()
    function Offset(any shape S; point P) return any shape written S + P
In C++, dynamic dispatch can only occur on one argument at a time. LX has no such restriction. Note that dispatching through multiple arguments has a significant runtime cost. In that case, the procedure or function to call is selected by the same rules that overloading applies when the types are known at compile time.
    function Intersect(any shape S1, S2) return any shape written S1 inter S2
    function Intersect(any rectangle R1, R2) return any rectangle

    any shape S1 := rectangle(1, 2, 3, 4)
    any shape S2 := rectangle(3, 4, 5, 6)
    any shape S3 := shape()
    any shape S4 := S1 inter S2 -- Invokes (rectangle, rectangle)
    any shape S5 := S2 inter S3 -- Invokes (shape, shape)
It is important to stress again that while multi-way dynamic dispatch is possible, it is either slower than single-way dynamic dispatch (it does not execute in constant time), or requires vast amounts of memory, or both.
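One way to picture the cost is a lookup table keyed by the dynamic types of both arguments. The Python sketch below is only an illustration under that assumption and says nothing about how an LX compiler actually implements multi-way dispatch: `register`, `intersect` and `_intersect_impls` are hypothetical names, and ties are resolved in favour of the first argument, which is a simplification.

```python
from itertools import product


class Shape:
    pass


class Rectangle(Shape):
    pass


_intersect_impls = {}  # maps (type, type) to an implementation


def register(t1, t2):
    """Decorator recording an implementation for the argument types (t1, t2)."""
    def wrap(fn):
        _intersect_impls[(t1, t2)] = fn
        return fn
    return wrap


def intersect(s1, s2):
    """Pick the implementation from the dynamic types of both arguments."""
    # Walk both class hierarchies from most to least specific. The search
    # grows with the depth of each hierarchy, unlike a single virtual call.
    for t1, t2 in product(type(s1).__mro__, type(s2).__mro__):
        impl = _intersect_impls.get((t1, t2))
        if impl is not None:
            return impl(s1, s2)
    raise TypeError("no applicable implementation")


@register(Shape, Shape)
def _(s1, s2):
    return "(shape, shape)"


@register(Rectangle, Rectangle)
def _(r1, r2):
    return "(rectangle, rectangle)"
```

The search can be replaced by a precomputed table with one entry per pair of types, which restores constant-time lookup at the price of memory that grows with the square of the number of types.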
In many object-oriented development models, invoking a method on an object is called "sending a message", and the object is said to "receive the message". Some languages, such as Smalltalk or Objective-C, can forward unknown messages to a "delegate". This can be used to implement small objects, known as "proxies", that filter some messages and send the rest to their delegate.
Although LX does not have a dedicated mechanism for forwarding and delegation, the characteristics of its logical inheritance mechanism make it easy to implement them. The proxy inherits from the types it wants to forward to, and has a conversion to the base types that converts to its delegates. For instance, to create a grayed_shape proxy that intercepts drawing of any shape, while letting the other shape operations go through, one can write:
    type shape
    procedure Draw(any shape S)
    function Surface(any shape S) return real

    type grayed_shape like shape is record with
        any shape Delegate

    function grayed_shape(any shape S) return grayed_shape is
        result.Delegate := S

    procedure Draw(any grayed_shape P) is
        Set_Gray_Mode
        Draw P.Delegate
        Reset_Gray_Mode

    -- Conversion to base, implicitly called
    function shape(grayed_shape P) return any shape is
        return P.Delegate

    procedure Test() is
        with rectangle R := rectangle(1, 2, 3, 4)
        with grayed_shape G := grayed_shape(R)
        Draw R -- Draws the rectangle as is
        Draw G -- Draws the rectangle grayed out
        with real S1 := Surface(R) -- Compute rectangle surface
        with real S2 := Surface(G) -- Compute rectangle surface using delegate
        WriteLn "S1=", S1, " S2=", S2
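For comparison, here is roughly what the same proxy looks like in a language with message forwarding. The Python sketch relies on `__getattr__`, a dynamic lookup hook, rather than on an implicit conversion to the base type as LX does, so it illustrates the pattern and not the LX mechanism. The class layout, the `log` parameter and the surface formula are assumptions made for the example.

```python
class Rectangle:
    def __init__(self, top, left, bottom, right):
        self.top, self.left, self.bottom, self.right = top, left, bottom, right

    def draw(self, log):
        log.append("rectangle")

    def surface(self):
        return abs(self.bottom - self.top) * abs(self.right - self.left)


class GrayedShape:
    """Proxy that intercepts draw() and forwards everything else."""

    def __init__(self, delegate):
        self._delegate = delegate

    def draw(self, log):
        log.append("gray on")
        self._delegate.draw(log)
        log.append("gray off")

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself.
        return getattr(self._delegate, name)


r = Rectangle(1, 2, 3, 4)
g = GrayedShape(r)
log = []
g.draw(log)                      # intercepted by the proxy
print(log, g.surface())          # surface() is forwarded to the rectangle
```

The forwarding here is resolved at run time on every call, whereas the LX version is expressed through declarations that the compiler can check.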
In C++, it is possible to define template functions, for instance generic algorithms that apply to a variety of types. However, template parameters have to be repeated over and over, although for a family of algorithms they often tend to be identical.
    template <class T> T min(T a, T b)
    {
        if (a < b) return a; else return b;
    }
    template <class T> T max(T a, T b)
    {
        if (a > b) return a; else return b;
    }
LX introduces the idea of true generic types. Generic types, parameterized or not, can be used directly in parameters or return types as well as in the definition of type interfaces or in generic argument lists. In all cases, they implicitly make the corresponding declaration generic, with the corresponding parameter. This can make generic code significantly smaller.
    generic type ordered
    function Min(ordered A, B) return ordered is
        if A < B then return A; else return B
    function Max(ordered A, B) return ordered is
        if A > B then return A; else return B
Within a single declaration, every occurrence of a given generic type denotes the same type. The two instances of ordered in the Min function above always correspond to the same type.
The same is also true for parameterized generic types. An instance of the name of such a type without the corresponding parameters makes the declaration generic on all the parameters of the type. In that case, the parameters of the type can be referred to using a dotted notation, as if they were fields of the generic type.
    generic [type item] type list
    function First(list L) return list.item
    function Last(list L) return list.item
Generic types facilitate the declaration of generic algorithms. But they also can make them more precise. Generic types can be constrained using the if keyword to specify an interface that the generic type must follow. The constraint is generally a small piece of code that is supposed to compile for any value of the generic type. Generic instantiation will fail otherwise.
    generic type ordered if
        -- Indicate that 'ordered' requires a boolean "less-than"
        with ordered A, B
        with boolean C := A < B

    function Min(ordered A, B) return ordered

    integer X, Y, Z := Min(X, Y) -- OK
    record T, U, V := Min(T, U) -- Error: no less-than for records
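The idea of a constraint as "a small piece of code that must work for the type" can be imitated in Python by actually trying the code. The sketch below is a loose analogy only: LX checks the constraint at instantiation time in the compiler, while this check happens at run time, and the names `satisfies_ordered`, `minimum` and `Record` are hypothetical.

```python
def satisfies_ordered(a, b):
    """Constraint check: does 'a < b' yield a boolean for these values?"""
    try:
        return isinstance(a < b, bool)
    except TypeError:
        return False


def minimum(a, b):
    """Analogue of Min, refusing arguments that fail the constraint."""
    if not satisfies_ordered(a, b):
        raise TypeError("constraint 'ordered' not satisfied")
    return a if a < b else b


class Record:
    """A type with no less-than operation, like the LX record above."""


print(minimum(3, 5))        # integers satisfy the constraint
print(minimum("b", "a"))    # so do strings
try:
    minimum(Record(), Record())
except TypeError as error:
    print("rejected:", error)
```

Stating the constraint next to the generic type gives a single place where a failed instantiation is reported, instead of an error deep inside the body of Min.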
A generic parameter can 'derive from' a generic type, indicating that it inherits its constraints.
    generic [type row like ordered; type column like ordered] type map

    map M[integer, character] -- OK, constraints satisfied
    map N[integer, record] -- Error: constraints not satisfied
C++ offers template specialization and partial specialization. Specialization is used to define special cases for template instantiation.
    template <class T> class vector;        // Template
    template <class T> class vector<T *>;   // Partial specialization
    template <> class vector<bool>;         // Specialization
In addition to such "structural" specializations, LX also supports specializations based on predicates, which enables specializations that would require intermediate 'facet' helper classes in C++. Predicated generic specialization makes the intent much clearer.
    generic [type T] type vector
    generic [type T] type vector for vector[pointer to T] -- structural
    generic type vector for vector[boolean] -- structural
    generic [type T] type vector when size(T) = size(integer) -- predicated
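The difference between structural and predicated specialization can be shown with a small selection table. The Python sketch below is an analogy under stated assumptions, not a description of LX's resolution rules: `specialize` and `select_vector` are hypothetical, candidates are simply tried in registration order, and `ctypes` types stand in for LX types so that a size predicate has something to measure.

```python
import ctypes

_specializations = []  # (predicate, description) pairs, tried in order


def specialize(predicate, description):
    """Register an implementation that applies when predicate(type) holds."""
    _specializations.append((predicate, description))


def select_vector(item_type):
    """Return the first specialization whose predicate accepts item_type."""
    for predicate, description in _specializations:
        if predicate(item_type):
            return description
    return "generic vector"


# Structural: matches one specific type, like vector[boolean].
specialize(lambda t: t is ctypes.c_bool, "bit-packed vector")

# Predicated: matches any type with a given property, like
# 'when size(T) = size(integer)'.
specialize(
    lambda t: ctypes.sizeof(t) == ctypes.sizeof(ctypes.c_int),
    "word-sized vector",
)

for t in (ctypes.c_bool, ctypes.c_int, ctypes.c_double):
    print(t.__name__, "->", select_vector(t))
```

A predicate selects on a property of the type rather than on its shape, so one declaration covers every type of the right size without a helper class per type.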
The combination of expression reduction and type-safe variable argument lists makes it possible to define generic types that behave much like the built-in types of languages such as Pascal.
    array A[1..5] of integer
    array B[1..3, 'A'..'C'] of real
    A[1] := 3
    B[2, 'C'] := 2.5
LX has a standard library that is inspired by the C++ Standard Library for the containers and algorithms part, and by the Java library for graphics and networking. LX has its own libraries for input/output and mathematical operations. Last, LX offers access to the C library when there is one on the system.
The LX library tries to resemble these other libraries whenever possible. However, it sometimes offers a different interface to take advantage of unique LX features.
    import IO = LX.Text_IO
    import ALGO = LX.Algorithms
    import LIST = LX.List

    procedure Example() is
        -- This function reads strings from its standard input and
        -- sorts them on the output
        with string S; LIST.list L of string
        for S in IO.StdIn loop
            L += S
        ALGO.Sort L
        for S in L loop
            WriteLn S
Reflection is the ability of a program to access its own representation. It can dramatically simplify operations that are defined on complex types as a composition of operations on their simpler subtypes: cloning, encoding data to transmit over the network, disk storage. Generic programming (templates) is not generally well suited for that purpose. Reflection also makes "active libraries" possible, that is, libraries that direct the compiler.
Java offers an elementary form of reflection, giving access to a description of classes in a program. However, it is not possible to modify the program by altering the reflective representation. Such "read-only" access to reflective data is often called introspection rather than reflection. Several other languages, most notably LISP, have given programs the ability to modify themselves.
LX takes another approach to reflection. Whenever the compiler encounters a pragma, it invokes a compiler extension (typically, a shared library) and hands it a complete, standardized representation of the declaration or instruction being compiled, as well as the context in which it is being compiled. The compiler extension can then decide what to do with the code it has received. This makes LX the first language that mandates user-extensible compilers.
For instance, one may declare a {clonable} pragma that automatically generates a "Clone" function from type definitions.
    {clonable} type tree is record with
        access Left to tree
        access Right to tree
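A rough feel for what such a pragma would generate can be had from a Python class decorator that derives a clone operation from the list of fields. This is an analogy only: Python inspects the class at run time, whereas the LX extension would receive the declaration during compilation. The decorator `clonable` and the `Tree` class are hypothetical, and recursing into any member that has its own `clone` method is an assumption about what a generated Clone should do.

```python
from dataclasses import dataclass, fields
from typing import Optional


def clonable(cls):
    """Hypothetical analogue of {clonable}: derive clone() from the field list."""
    def clone(self):
        values = {}
        for f in fields(self):
            value = getattr(self, f.name)
            # Recurse into members that are themselves clonable.
            values[f.name] = value.clone() if hasattr(value, "clone") else value
        return cls(**values)

    cls.clone = clone
    return cls


@clonable
@dataclass
class Tree:
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


original = Tree(Tree(), Tree(Tree()))
copy = original.clone()
print(copy == original, copy is original)  # prints True False
```

The clone function is written once, against the description of the type, and applies to any type that carries the marker.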
For a more in-depth discussion of reflection, see the Mozart page. Mozart is the basis for LX reflection.