The design of the E programming language
Transcription
The design of the E programming language
The Design of the E Programming JOEL E. RICHARDSON, University MICHAEL Language J. CAREY, and DANIEL T. SCHUH of Wisconsin E is an extension of C++ designed for writing software systems to support persistent applications. Originally designed as a language for implementing database systems, E has evolved into a general persistent programming language E was the first C++ extension to support transparent persistence, the first C++ implementation to support generic classes, and remains the only C++ extension to provide general-purpose lterators, In addition to its contributions to the C + + programming domain, work on E has made several contributions to the field of persmtent languages in general, including several distinct implementations of persistence. Thm paper describes the main features of E and shows through examples how E addresses many of the problems that arise in building persistent systems. Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language [Programming Languages] structs and Features; H.2.3 [Database Management]: Languages—database tlons—obJect-oriented languages, C++; D.3.3 Classifica- Language Con( persistent ) la n - guuges General Additional Terms: Design, Languages Key Words and Phrases. Extensible database systems, persistent object management 1. MOTIVATION In the mid-1980’s, several database research groups, responding to the needs of a variety of emerging applications, began to explore “extensible database systems” [9, 15, 16, 44, 49]. Although different groups have different notions of what “extensible” means, a common desire is to support a high degree of flexibility for customizing the database system to the user’s application. Such flexibility is mostly lacking in today’s commercial systems, and attempts to provide it through added software layers experienced severe performance penalties [Care85]. These experiences prompted researchers to explore system architectures which could be extended easily and without loss of perfor- This research was partially supported by the Defense Advanced Research Projects Agency under contract NOO014-88-K-0303, by the National Science Foundation under grant IRI-8657323, by Fellowships from IBM and the University of Wisconsin, by DEC through its Incentives for Excellence program, and by grant from GTE Laboratories, Texas Instruments, and Apple Computer. Authors’ addresses: J. E. Richardson, IBM Almaden Research Center (K55/801), 650 Harry Road, San Jose, CA 95120. M J. Carey and D. T, Schuh, Computer Sciences Department, University of Wisconsin, 1210 West Dayton Street, Madison, WI 53706. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is ~ven that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. @ 1993 ACM 0164–0925/93/0700-0494 $01.50 ACM Tmns.actmns OnPmgnmnmng Languages and Systems, Vol 15, No 3, July 1993, Pages 494–534 The Design of the E Programming Language . 495 mance. For example, in an extensible DBMS, it should be easy to augment the collection of “base” types with new user-defined types.1 The EXODUS project at the University of Wisconsin has been exploring a toolkit approach to extensibility. The software tools simplify the construction of customized database systems as well as the extension of these systems once they have been built. The first component of the EXODUS toolkit is the EXODUS Storage Manager [14]. It provides basic support for objects, files, and transactions. Next is the E programming language and its compiler. This paper describes the E language design; various approaches to the implementation of E are described in Richardson and Carey [41], Schuh et al. [48] and Shekita and Zwilling [51]. The third major EXODUS component is the Optimizer Generator [21], which allows a database implementor (DBI) to produce a customized query optimizer from a set of rules describing a given query algebra. E was originally intended as the language in which to write database system code, that is, abstract data types (eg., time), access methods (eg., grid files), and operator methods (eg., hash join) were all to be written in E. E was also intended as a target language for schema and query compilation, with user-defined schemas being translated into E types and user queries into E procedures [12, 39]. The difficulty in building a DBMS in a conventional systems programming language (eg., C) derives from several factors. First, the DBI must write code whose primary task is to manipulate shared data on secondary storage. A significant portion of the total system code is therefore devoted to interacting with the storage layer, eg., calling the buffer manager to read a record. Second, the DBI must write code for operators and access methods without knowing a priori the data types on which they might operate. For example, the DBI cannot know that some user will eventually want to build an index, keyed on area, over a set of polygons. Finally, the DBI must build a query processor that converts queries posed by end users into a form that the system can execute. This translation is greatly simplified if the basic operators can be written in a composable manner. These and other influences led to the design of the E language. While the original goals are still clearly evident, E has evolved into a general purpose language for coding persistent applications. E is an extension of C + + [56, 18] providing generator classes, iterators and persistent objects. C + + provided a good starting point with its class-based data abstraction features and its expanding popularity as a systems programming language. Parameterized types in the form of generator (or generic) classes were added for their utility both in defining database container types, such as sets and indices, as well as in expressing generic operators, such as select and join. Iterators were added as a useful programming construct in general, and as a mechanism for structuring database queries in particular. Both generators and iterators 1 Commercial ADT-INGRES Ingres now supports user-defined [55] and POSTGRES [44]. ACM TransactIons on Programming abstract data types along the lines pioneered by Languages and Systems, Vol. 15, No. 3, July 1993. 496 . J. E. Richardson et al. were inspired by CLU [32]. Persistence—the ability of a language object to survive from one program run to the next—was added because it is an essential attribute of database objects. In addition, by describing the database or object base in terms of persistent variables, one may then manipulate it directly using expressions within the E language. Two factors influenced the way in which E’s extensions were added to C + +. The first was the desire to maintain upward compatibility, that is, that E should be a strict superset of C + +. Originally, we were forced to make a small compromise in order to support generator classes: in E, nested class definitions follow nested scope rules, while nested class definitions were 1.2 and 2.0 of C + + [56]. Now that exported to the global scope in versions C + + The also supports second subset nesting important of class factor of E be compilable C + + compiler. tence to the object, we (db) The class context for ln Avalon\C time, to is block. is several first based on in of any to contributions be a to the + + to extend [26, 17], C++ inheritance objects Persistence a persis- declared Avalon/C the persistent by added fully. and were C++ produced we type more E persistence access pinning of a special point the to be a property whose this a problem. that that which persistence objects same + +, and in longer desired than way languages. the is no We of E represent programming recoverable, this efficient the 5 explains at approximately persistence. no less of allowing only implementation persistent designed instead Section and code [18], efficiency. influenced persistence type. design of into primarily language: allow database field This scopes concerned with from must be E is based on the within the the persis- storage class, and the 1/0 required to access a persistent object is tent The storage class approach has since appeared in at least two transparent. more recent E was language also the implementation E designs, first predated remains the Persistent extension only to add the C + + Modula-3 generic C + + [27] classes template extension to and design provide ObjectStore to C + +; its [57]. [31]. design In and addition, a general-purpose iterator construct. This remainder basics ing sections C ++. other the of this of C + + classes, then These tree redefines types the and 3-5, persistent briefly current E. to indicate ACM Transactions modifies several Since E of C + + 7. Section of the how the presentation the on Programmmg parts tree. the shortcomings was have first An uses language of the the current and several is E adds to found in to help extend tree a part in are of a Sections design of these and database and\or implemented, other discussed by summarizing included application the the follow- generators design presentation Appendix that 5 describes to make E The features them Section designed appeared; database similar 4 describes example of 8 concludes language. and Section 2 reviews example. extensions to iterators Section tree language binary and binary features as a generic of Section E’s keys. 6 discusses status main duplicate extensions in three compare 3 discusses the as follows: a simple Section example implementation is organized to support Following Section the also persistence database. paper present sections languages. binary paper introducing at described the end by Languages and Systems, Vol 15, No 3, July 1993 the of the Atkinson The Design of the E Programming and Buneman the examples 2. C++ The C [4] in original design with several been it should compiled abstraction of features, highlight the with v.2.O has to this version. C + + and which and be noted 497 that all of run. by [56], order present in which extended function multiple calls, inheritance, to avoid examples E, eg., v.1.2 type-checked In of C + + added of C + + added our were interaction features extension overloading, C + + ported review an operator features. been our tance) in E. Finally, have of E was classes, other recently will be expressed paper . REVIEW [30] tion, can this Language unnecessary will include version and inheritance (especially, see Section 5.3.3. complica- only 1.2. Where and E has the data necessary, multiple we inheri- 2.1 Classes A central its concept C++ includes as operations well the abstraction class are not and a starting In for always CLU method reference to a member was an or Smalltalk data instance. members [20], our a C + + and capabilities for as the It is up (data abstraction and class Unlike of instances. motivations a type, of the to the function) that selection C + + of C + + as of E. objects member invoked. members, Member function, to that this, which Within class methods). the is bound data called (or within class parameter, x of the are functions instance; of the an implicit the [32] which The main design called on which 2.2 public. of the member through in defines instance on representation explicitly to a specific to a data is realized are A class of any performed representation are applied reference be the declare the parlance, operations may hide one of a class. representation provided to were point C++ class that which provide notion the necessarily of a class private classes is the both mechanisms does designer are in definition any instance. a member function, is equivalent to this ~ unqualified The is a pointer and functions binding to the object an unqualified X. Inheritance Another reason supports subtyping. that we chose Given C++ a class as a A, we may starting define point a class for E B that is that it is a subtype of A as follows: class{...}; class B: public A{...}; A is called the representation this context without B. B may reimplement base class, of A as well specifies this keyword, declare the and as As that public public the data member functions derived member members members additional ACM Transactions B, class. functions. of A are of A would members and defined on Programming in The also become member B inherits both public keyword public private functions, members in of B; members and the of it may A. Languages and Systems, Vol. 15, No, 3, July 1993. 498 . One the J. E. Richardson et al. of the late key of reimplemented an actually refer a type At by Although the even we show do not of database several and been in both completely specific in solved; we Figures la new in discuss this single- is binding requiring late binding is This task of types Not issues E extends C + + multiple-inheritance, semantics the Late without paper, and objects. the of code object virtual. to be implementation. binding of the C + +, f x may B) is called. In A is of runtime, the or is type invocation At subtypes code. persistent defining (A off with f in an A. defers type existing both and the is binding examples including adapting contains implementation of) languages a method type actual functions types challenges, tence the to be extended member the program B. Late point, declaring programming suppose (static) of type recompilation mechanisms, is, a x whose appropriate subtyping realm That suppose that hierarchy to (or achieved and to an instance and changes calls. pointer runtime. determined, allows B, object f until of object-oriented method in through for contributions binding all of further and these in to has presented type persis- problems Section have 6. 2.3 An Example The example in simple binary to the address stores a floating pointers to classes: one tree itself. point to the which node and tree in that In and a pointer key. subtrees. to the The nodes in the tree in search is unbalanced, and we its “wrapper” simple while limit the a very value example, and both is a simple for a key each uses one and operations on the of defines the (i.e., are encapsulates the with a pair which insert showing node along representation that still tree entity implementation (i.e., example definition is to map indexed operations the tree this is recursive, its class C++ having the class a complete of the right defines is operation key and lb basic nodes recursive the major tree nodes. features, to inserting searching. la gives the of each node in floating and entity left The Figure tion its and The point to keep tree and of an The order index. nodes) methods). In tree point pointers The key value to the left interface search methods member instances; the node. constructor will created. If the 2 In C++, a void* ACM Transactions of the follows and right introduces the and functions binaryTreeNode created to tree class. The insert, a constructor binaryTreeNode guarantees be invoked may legally takes Each 2 to the indexed (leftChild of member and The representa- node a (entPtr), rightChild). declarations that following for the its constructor that if class class. initializes a class arguments, declaration. Constructors form has whenever they all the The the fields a constructor, an instance must function initialize class of a newly then of the be supplied point to any type of object. on Programming contains entity to binaryTreeNode comprises the as one named binaryTreeNode. These automatically constructor heading. interface as well elaborated C + + class subtrees a set are is binaryTreeNode. class the (node Key), a pointer public keyword public definition the Languages and Systems, Vol. 15, No. 3, July 1993 that class with is the The Design of the E Programming Language . 499 class bmaryTreeNode { float void bmaryTreeNode bmaryTreeNode nodeKey; *entPtr; ●leftChild; +rightChild, public” binaryTreeNode( float, void, ); void. search( float ); void Inserl( bmaryTreeNode* ); /. constructor./ ); bmaryTreeNode:’blnaryTreeNode( void+ insertPfr ) { float inserfKey, nodeKey = mserfKey; entPtr = msertf%, leftChdd = nghtChdd = NULL; 1 void . binary TreeNode. search( float searchKey ) { if( searchKey == nodeKey ) return entpfr; else if( searchKey < nodeKey ) if( leftChild == NULL ) return NULL; else return leftChild+search( searchKey ), else if( rightChlld == NULL ) return NULL, else return rlghtChlld+search( searchKey ), ) void blnaryTreeNode msert( bmaryTreeNode, newNode ) if( newNode+nodeKey == this+ nodeKey ){ return, /. no duplicates allowed./ else if( newNode+nodeKey < thls+nodeKey ) if( leftChdd == NULL ) leftChdd = newNode, else leftCh[ld+nsert( newNode ); else if( rlghtChild == NULL ) rlghtChlld = newNode; else rlghtChlld+nsert( newNode ), ) Fig. 1. object’s declaration. binaryTreeNode For (a) Class definition example, The sively is searches not function We member to be The the and a pointer that routine declaration this either null to a new node has searches for the node been is which on Programming key node’s entity subtree not position and its or exist, is to be inserted adds the the and the the recurkey member into key a null with insert The with and key pointer does returned. initialized proper a zero the node’s If that pointer the with compares returns subtree. the ACM Transactions initialized search appropriate found, takes assume values. instance function searchKe y, and argument, the tree nodes. aNode(O.0, NULL); aNode is a binaryTreeNode pointer. in for binary tree. pointer node as a Languages and Systems, Vol. 15, No. 3, July 1993. 500 J. E. Richardson . et al. class binaryTree { binaryTreeNode+ root; public ●/ /, constructor binaryTreeo, void . search( float ), void insert( float, void ), ● }1 bmaryTree .binaryTreeo root = NULL: ( ) void . blnaryTree search( float searchKey ) { if( root == NULL ) return NULL; else return root+ search( searchKey ), msert( float msertKey, void msertPtr ) ( void bmaryTree binaryTreeNode, newNode = new bmaryTreeNode( if( root == NULL ) root = newNode, else root+ lnsert( newNode ), ● Fig. 1. new leaf. Note remedy this Figure that lb gives class used to start the is pointer to a is NULL, contains the simply trees rejected; the of to provided arguments. the otherwise, the next section will we root node recursively. creating a which instance If the tree pass the first of a class new to the and new the root, operator heap. Again, we immediately and the if it function a constructor, node mainly this pointer, The on having new is binaryTree initializes root allocated the it of a insert member The is empty, node the dynamically. been and constructor check node has class, representation binaryTree The class. As we said earlier, node The tree, an we the the a node creating around node. search search binaryTree of the eg., in a search. root To example are root; are wrapper recursion, a pointer we keys definition a thin we an returns since the really to NULL. is not duplicate for binary ); shortcoming. this pointer (b) Class definition msertKey, insertplr have becomes insert proceeds recursively. 3. ITERATORS We now consider language [32]. rating the both elegant first its production and two cooperating ACM TransactIons of two many E extensions practical. the agents, on Programmmg that contributions, of a sequence highly as it generalizes iterator, prises the Among of values This iteration an iterator were CLU from control found in function inspired by the CLU that sepa- demonstrated the use of those abstraction for loops. (i-function) is An values called iterator and Languages and Systems, Vol. 15, No. 3, July 1993 is an com- an iterate The Design of the E Programming loop (i-loop), The i-loop stream that work is a client of values. together of the The to produce i-function, i-function produces value one-at-a-time to the client when an yields a value, i-function requests can another value, be viewed within the the loop. form of an process it views that state from resumes execution. of coroutine, one source each a normal is preserved. may of a result function, When Thus, that 501 of values. as the yielding by a return local . a sequence simply stream Unlike its i-function as a limited context and which Language the an i-loop i-function be invoked only i-loop. 3.1 Iterators in E We originally query included processing, as a general like and take parameters the comprising the and Consider the the one makes a second each yield processes the tion will also terminate only one pass, it declares following i-loop foo, and An also value An return i-function any i-function looks the type. may may The code invoke other the yield bottom, suspends its return.) the an invocation activates the receive i-functions f and the this may shows many. one or more which to values. and iterator by arguments client example have it i-func- terminates, a statement yielded first average. the (An followed by actual the while loop may iterate, the element, Although followed supplies to it Then terminates. i-function keyword parentheses, for greater average. than next the also are is invoked, the is larger the When i-function that execution requests general, comprises in that a normal in compute element point. the to bigEle- i-function array bigElements order client of the integer the For forms i-function, example, 9, where the yield off, one associated the are int with x Figure 2 types );foo y=g( are two );int z=f( )){... simultaneous } activations Z. maybe a main After of the initializing terminates, it is and the After still the control flows A, body to the that control prints next of an uses enters to the bigElements on Programming context i-loop returns loop if the an array control active; ACM Transactions within containing When sequence. if only invoked program is activated, bigElements also the a variable i-function i-function function precedes of purpose When in each when Each there with shows iterator. an usefulness respectively: that one values unsorted elements. array executing iterate(mt x=f( Note statements. database of their an iterator yield 2. The of an the statement, body. Figure of the after loop loop structuring iterator yield arbitrary; bigElements invocations and may is yielding the by the and of all out” iterate i-function in through point, yield in convinced keyword contain elements element; “falls An the resume control the and body example pass may utility Syntactically, that type their became be recursive. average makes At body any i-function may ments is to yield than except function of E for quickly construct. function, type, in we programming a normal iterators iterators although loop, that i-loop. big Elements the the loop, nextEl holds value, control and the returns has terminated, then statement program. in the the the first to loop Languages and Systems, Vol. 15, No 3, July 1993. 502 J. E. Richardson et al. . iterator mt blgElements( mt . array, float sum=OO, float ave = 0.0, mt size ) { if(slze > O) { /+ first compute the average,/ for( mt I = O; t< size, I++) sum += array[ I ], ave = sum / size, ) /. now yield the blg elements Fig. 2. A simple iterator ./ for( mt I = O, I < .9ze, I++ ) if( array[ I ] > ave ) yield array[ I ], example. ) mamo { mt A[1O], /. Irutiallze A./ /. Now find blg elements ./ iterate( mt nextEl = bigElements( printf(’’”l~d “, nextEl), A, 10 ) ) } Flow of Control 3.2 In the above defined. the example, That is, i-function ber is loop this providing the default are still ing be applied to an 3.2.1 tive in stream produce any client In i-loop with the several order variable on this this the terminates to with termiremain- determine which that i-function O otherwise. all others the the empty, 1 if the the order when while of function, and case, loop; terminated program returns terminated, In of the associated the the theme, capabilities. resumption a built-in empty(v) v has loop have the allow E providefi variable; top the controls concurrently. variables by to flow num- until i-function at the iteration, The iterates variations The is implicitly each value. loop control i-functions loop loop in may activa- we will see shortly. Aduance. certain The cases. of values the for i-functions of the next a single general unaffected loop the i-function; more all terminated, associated example obtain addition, If some values i-functions. have of the to be activated resumes are an iterate top implementation-dependent. the i-functions tion with terminated. active, the E provides i-functions is the to In example. i-functions active by through at order terminate. of control have and in programmer resumption i-functions nated to multiple flow of control entry, determined simple E allows of is decides in flow loop resumed of iterations i-function the at ACM TransactIons flow an by merging input of other default Consider streams i-functions. on Programmmg two using The of control iterator sorted input iterators, default described that flow above is supposed streams. so the of control is too to yield Naturally, merge i-function is inappropriate Languages and Systems, Vol 15, No 3, July 1993 restrica sorted we wish is to also for this a The Design of the E Programming task. If we simply Language 503 . try iterate(int vail = streaml ( ); int va12= stream2( )){... } then we will have to buffer produced by iterate(int then march one The of the repeat Clearly, streams in number streams). the more advance example, the arbitrary of If we entire inner flexible was and (up the to loop the nesting, ie., for each element part to body entire may sequence } i-loop control statement lock-step, values try vail = streaml ( )) iterate(int va12= stream2( )){... we will outer. down an considered by the is needed. introduced in meet this need. As an consider advance va12; where this loops above. statement The associated va12 with has its separated the va12, new list advance and va12, terminated, The default how receive i-function an flow and explicitly advance will not advance pass then i-functions We first yield the then i-function from iteration. The to of an that the if are it came; terminates maybe i-loop, two either val 1 i-function other stream. i-function yield i-loop, variables, the active we any iteration. loop see of If i-functions from one which loop the element active, body enters and the each empty function the test a comma- with any for control advance are if advanced are activated, values. the ie., and statement, order). through When iterate two activation associated out, statement simply will the activation example. initial of the i-function ); after the advance advance may have form, i-functions stream2, on that the carried only if so, we both of either implementation-dependent are of control context is to resume on a given merge their and If the the until smaller other when it value i-function both i-functions loop can terminated. 3.2.2 Iheczk. which The i-function loop has until all example to of all active still been i-functions i-functions A given i-function sequence, other executed with only if suspended at built structure some that ACM TransactIons it For ie., causes control to release such the example, deallocate on Programming the client may termination loop. need This decide though, a client immediate example, To handle must i-functions, Alternatively, this that So far, explicit terminates point. which the client iteration. require clause. client yield for tasks. optional the with the given by normally, It may, an how any terminated. sometimes bookkeeping syntax on determined have associated may however. shows to resume break out of an i-loop; decide perform merge activation termination iterates yield the advance the the is exhausted. tion general the stream 1 and has its (in are implement i-functions, have In is executed those the case, stream2( this resumed 3 shows to in resumptions then Figure used is default advanced, statement value. statement no within of the of variables; variables then appears effect cases, clause z-loop suppose before is over the heap we have extended a statement while that the an terminating, termina- space which i-function i-function and or to the is is has sup- Languages and Systems, Vol. 15, No 3, July 1993. 504 . J. E. Richardson et al. iterator Fig, 3. Using the advance mt mergeo { iterate( mt vail = streaml (); int va12 = stream20 ) if( empty (vail) ) yield va12; else if( empty (vai2) ) yield vail, else if( vail < va12 ){ yield vail, advance vail, } else{ yield va12, advance va12; statement. } } } pose the p points variable example, if the client to the breaks cleanUp routine defined) root after will of the the be called structure. i-function before Then has the in the yielded i-function following x, the (user- terminates: yield x: cleanup; In the absence of this clause, no further i-function code is executed before termination. 3.3 A Recursive As a final the example, previous only the that it search no to tree to all of the in we If the search Finally, the At the point in levels top the that tree the 5. To the a point stack the we key are is less than has then we picks is a stack tree. This situation has of the immediately the Transactions inserting asked tree, up key the show keys, on Programming the the execution direction from search the results to the insert it we must be have a sequence key rewritten of pointers is greater than of searching current each result the entry in the right key, level the the the node’s to so a match, since we show routine finds value, yields If the return code values of active then above. current node tree in for Figure all 5 (in first Each of Figure 5, On that the any with the is labeled 4. Languages and Systems, Vol 15. No. 3, July 1993 a Assume key value activations arrow is to right order). iterator yield. (The activation At corresponding entries of recursive of the one-by-one. i-functions 5, 3, 8, 5,2, stack of growth.) the the is illustrated to search we before number same yield it we terminated. there left Now, or equal client by subtree. yielding the built left from brevity, the if yield again equal, search amended which keys. simply implementation For instead, the loop, client line matching subtree, level, into with tree keys. have entry; client indicates current with node, keys we 4 as an iterator left subtree of the sample ACM the entries binary duplicate that entry search the if after new Figure current the handles a duplicate many entities the subtree. the in it Assume rejects find search 4 modifies so that routine. longer inserts prepared key Figure section recursively the Iterator Example next at to the with its The Design of the E Programming I iterator void, binaryTreeNode Language float searchKey :search( . 505 before all ) 2( if( searchKey <= nodeKey ) 3“ 4 5 6 7 8 9 { if( leftChild I= NULL) iterate( void. p = leffChlld-search( yield p, searchKey )) if( searchKey == nodeKey ) yield entPtr; 10 11 } elseif( 12 13 14 15 16 rlghtChlldl= NULL) iterate( void p = rightChlld+search( yield p, searchKey )) ● } Fig.4. Recursive search iterator. ,/--—-.. c’ $ 13 Fig. 5. Operation ofrecursive search iterator. 6 c1 10 We note that duplicate tion of all search any the entries active such may been do clauses contain would be on the to this Although not activation choose yielded; i-functions. iterator newest client have break event the yield termination executed out of triggers the statements clauses in the (since as described loop a cascading binary none above, termina- are beginning tree needed), with the stack. 3.4 Iterators in Other Languages Iterators have ously appeared with CLU, (i-functions) and results To [50]. form Alphard next. The C++, one defines and whose construct standard of not a class for provide invoking along iterators, with ACM Transactions that loop state. would While simple on Programming two first for the clean coding invoke init and (and advocated) state solution not the [56]. of the state define is possible an init and functions: the does over defined iterators initialize C + + generators iterating programmer maintain to Contemporanedefine for possible support one a fairly a few loop the is special to provided solution members the languages. a first class) include advance and Alphard, first) any data many programmer in similar functions and for loop for (or A whose member a value a C++ a next. does a for loop both to in the a generator (similar invoke which return write forms allowed provided execution repeatedly in various Alphard then in That is, iteration and one to a special using the conventions. Languages and Systems, Vol. 15, No. 3, July 1993. 506 J. E. Richardson et al. . The Alphard tages iterated over, the target one for data points can as the state as well eral as the signed in several [47], mer to write via predicates general part yields [60], it types the allows both as of the multiple next the CLU, e.g., funccontrol which can [43], and sets), include the program- only implicitly DBPL, offers iterating over and de- a restricted Examples be specified for Sev- languages allow another have [46]. offer query. i-functions, arrays, Owl are languages i-functions preservation iterators Trellis\ applications, These Rigel since Such of a database [1]. implicit lists, the resume iterators, (DBPLs), results but type has in and semantics. database of objects. that required kind invocations, on the iterator to be each is made between save their since O + + i-loops, sets (such logic for programmer depend CLU-like of of writing and form) disadvan- of object iterator. with languages task that over the languages Plain design; container is other arbitrary an explicitly of the occur state the of iterator Pascal/R state programming to simplify form data then (or iterator coding two type the state complicate must problems control database Second, significantly and appeared iteration. of an iteration has each class because state the for a separate the in iterators E. First, is required saving programmer of these data This writing and define involved the to CLU usually explicitly of tion3, Neither approach like desired. types object yield must access responsible and C + + to languages of iterative of and relative a more built-in programmer-provided i-functions. In comparing systems E programmer for support high-level, in hand, parallel cate iterators parallel Owl a query to statement quired is for i-functions just permits one to unique such to permits found be obtained clean up cleanup after clause the Finally, and both the clause per yield each explicit does tuples In not E, the parallel resumption and E termination i-function in form of join predi- a means an i-loop. a whole, are advance i-functions provide of as other found iterators E’s of the i-func- the a restricted computations. Rigel all On satisfying a iteration provide Instead, something be as a target not Trellis/Owl. support explicit premature for to under low-level DBPLs. processed. both E i-functions, stream-merging allowing cleanup CLU intended be to be useful provide Thus, individual and was must intended in many O + + the processing. one and general in of E flow E must as in permitting support E was invocation .4 Rigel that control queries. programmer-defined, invocation, in compilation, query-oriented Trellis\ provided since query implementing E are recall where for E supports or languages, Moreover, database sufficient CLU these language, control. language tions to programming whereas refor Rigel E point. 3 Shaw et al. [50] mentions the difficulty of writing certain kinds of iterators such as those to compute recurrence relations. As a simple example, consider generating the Fibonaccl numbers, The first two numbers are computed differently from the rest, so the next function must know whether it is returning the first, second, or nth ( n > 2) number, In E, we would simply write an i-function having three yield points. 4 This flexibility is not without cost, however, The more restricted semantics of CLU iterators allow for an efficient stack-oriented implementation [5], while z-function activations in E must be allocated on the heap. ACM TransactIons on Programmmg Languages and Systems, Vol. 15, No. 3, July 1993 The Design of the E Programming 4. GENERATOR As mentioned much written the code without eventually on attribute of is fixed, in must programmer is responsible untyped buffer be tasks be confused with to handle for One implicit. 507 of the and on record types, goals by of In the set of Another length the interpreting to an elegant as providing the and for was E can of addition, make generators CLU basic type offset and will methods to extend. calculations original a few the routine. be code is that is difficult were inspired generators) access is that must the of approach to each DBI that knowledge this different offset a DBMS switching system coding We Alphard by explicitly the as of objects has with the facing such operators problem passed pages. mechanical types basic therefore order information specific essentially obvious problems system DBMS The types, and of the flexible traditional in.” One is that, problem A these at hand. types and of the “wired any one a large knowledge types operate basic introduction, for manipulate. attribute . CLASSES in of the Language such [32] (not to to the solution problem. is a parameterized A generator or more stack[T], unknown elements. entity 4.1 being Classes E introduces of generic member the name 4.1.1 ate type a stack type of the key and stack of the that the name. since the element Instantiation. assume we of frames by: in only type In class by the type of the regular types for The the type member notable functions, 6. We class has themselves the omit is that of showing T the in have definition shall feature names as data or as a eg., are specified parameters example, class names, a generator parameters Figure classes.5 A generaformal is used form square the form a bounded the stack wherever is needed. order supplying have formal For is shown class like Syntactically, declarations. functions, parameters; or return class of generator form generator definitions. the class in the of class the except following a specific example, within class, (skeletal) types number as argument other brackets any freely types, for a regular of have be used member the in E parameterized tor class may basis types. indexed. Generator may (formal) given any element In the case of our binary two type parameters: which, introducing ie., one that is defined in terms of one The classic example is the generic type type T, defines the type of a stack of T tree, we will make it a generic class by type, defined to use a generic actual class, arguments we must to the class frame; we can then instanti- first generator. define the For type of class frameStack: stack [frame]; 5 We note that generics have recently been added to C++ (v.3.O and beyond) in the form templates. E generator classes and C + + templates will be compared in Section 4.3.2. ACM Transactions on Programming of Langaages and Systems, Vol. 15, No. 3, July 1993. 508 J, E. Richardson . et al class stack[ class T { } ] f1 Fig. 6. A generic mt T top! stk[ 1001, // top-of-stack Index // the elements stacko, popo:” push(T), sEmptyo, II constructor T votd mt public stack class declaration. 1 Given this definition, can now declare we use frameStack and instances. For example: frameStack S1 ; f; frame // S1 IS a frameStack //f is a frame II push f onto S1 S1 push(f), Attempting error to an empty instantiate stack ber class functions binary ters, with however, One means member with the generic may tree one We of which we must for be flagged as a type and Furthermore, be invoked on objects can is the be able be made key this only type may intStack out” classes generator, type. For two type two key is to constrain a key values the type to to mem- example, paramebe to determine key used these parameter for mem- be introducing order parame- having can to be may the by In to programmer signatures the specified be used of the generic type. the “fleshing within to compare of accomplishing However, by is type define declarations; names parameter any example, types class. class a class then a class. function same If example, could, on instantiating the the generator. stack int is not really body ber Parameters. as in the though functions instantiate on Class constraints even specify but body, the [lnt], also onto S1 will a frame anything time. Constraints 4.1.2 ter push at compile with instance useful, ordering. type: binaryTreeNode class [ class keyType {public: int compare(keyType class entityType{ } *); }, 1 1...}; With this a public an integer. declaration, member an Within the actual class compare function search routine, may that be bound takes we can to keyType only a keyType pointer now compare keys Int cmpVal = search Key.compare(&nodeKey); if(cmpVal < O) {... } else if(cmpVal = = O) {... } else {... } ACM TransactIons on Programmmg Languages and Systems, Vol. 15, No. 3, July 1993 and if it has returns as follows: The Design of the E Programming Of course, compare the there is an additional function ordering within the 4.1.3 be less of the type two requirement than, keys. system, equal Such the semantic integer than zero constraints . 509 returned by the corresponding cannot to be expressed however. Parameters. ~urzetzon that to, or greater Language One shortcoming of the above approach is names of class parameters are formal names, the names of any member functions included as constraints are actual names. In the above, any class may instantiate keyType provided that it has a example member function whose name is literally compare. While this may be useful it may be too restrictive. For example, we may in some contexts, in others a preexisting class that has a comparison routine, but the routine’s have name may not be compare. We may also have a class defining several different comparison routines corresponding to different criteria for ordering instances. To overcome this problem, E allows member functions of a parameter class to be named as separate, formal parameters to the generic. For that, while example, the we could redefine our generic node class as follows: class binaryTreeNode [ class keyType{ }, class entityType{ }, int keyType: :compare(keyType * ) 1 {...}; The keyType parameter member function that name of the specific If the may programmer even may takes a member wishes, be overloaded be instantiated keyType pointer function the must formal operators. or (For with any be supplied actual our class returns and at instantiation member examples, having an integer; functions we have some the time. (or chosen both) to pass compare routine, but we could have instead passed a pair of overloaded operators, < and = = .) To illustrate these points, assume that we have a class dataPoint for to build an index over recording experimental observations and that we wish number taken from the experimental such points. The key is to be a complex complex is defined as follows: data, where a single class complex{ I* representation . . . * / public: int cmplmag(complex*); // compare imaginary int cmpReal(complex *);// compare real parts }; We may imaginary instantiate then parts Despite type in which the keys are ordered by their as follows: class complexNode: approach a node parts the flexibility above still binaryTreeNode[ provided falls ACM Transactions short by in complex, dataPoint, complex: :cmplmag.1; member some on Programming cases. function One class remaining parameters, problem the is that Languages and Systems, Vol. 15, No. 3, July 1993. 510 J. E. Richardson . it is not possible to directly type as (eg., float), methods. instantiate such Another et al. types are is that problem a binary not tree classes it is with and do a fundamental not a relatively have key comparison C+ + common coding to define comparison routines (and other symmetric binary operarather than as member functions of the tors) for a class as friend functions6 class. Thus, while the actual key type might be a class, the comparison routine might not be a member function. To handle these cases, E also allows normal functions to be generic class parameters, as the following example illustrates: practice class binaryTreeNode [ class keyType{ }, class entityType{ }, int compare(keyType *, keyType ~) 1 1...}; Here, any any type (including (nonmember) compare. instantiate tree a fundamental function This is the type) may a matching having solution we will keyType, instantiate signature use in may our be and to used ongoing binary example. Constant 4.1.4 ports is and, within used in useful Parameters. a constant the value. generator must defining instantiation. array For maximum number we may of class may may be be used be a compile of elements class stack[class it members example, kind parameter class, an instantiation in last The The time whose define is a class parameter of freely any as a constant. size a generic const. This depends stack that E sup- fundamental on type The value is particularly the class particular such that the parameter: }, int STKMAX] T{ { Int top; T stk[STKMAX] ; };’”” We may class 4.2 then define intStack: a class stack[lnt, for stacks of one hundred integers as follows: 100]; Nested Instantiation We have shown how binaryTreeNode to define as a generator class. In order to binaryTree, as a generic, we must arrange for binaryTreeNode to be instantiated automatically whenever a user’s program binaryTree. instantiates E allows a new type to be defined within the scope of a class, along the define lines the wrapper of C + + 2.1 class, [18]. In E, creating types in this 6 A class can declare any function to be a friend. Such a function it is permitted to access the class’s representation. ACM Transactions on Programming way includes is not a member defining a of the class, but Languages and Systems, Vol. 15, No. 3, July 1993. The Design of the E Programming Language . 511 type by instantiating a generic. Furthermore, within the context of a generator class GA, we may instantiate another generator GB by supplying GAs parameters to GB, then any instantiation of GA with actual parameters causes a nested instantiation of GB. Thus, we can make the binaryTree class a generator as shown in Figure 7. Within the context of binaryTree, a new class btn is instantiated from binaryTreeNode by passing along the parameters in the next supplied to bi naryTree. We will complete this class definition section. 4.3 Discussion and Comparison with Related Work In E, a generic class is not a true type, and no objects can be declared of such a class. This is in contrast to languages like ML [36] and Fun [10], whose type systems support truly generic objects, eg., a generic identity function in which the return type depends on the type of the actual parameter. E follows the more common practice of defining a generic to be a “mold” for creating new types, and instantiation of a type is defined to be (essentially) equivalent to macro expansion of the generic class definition. Note, however, that this does not imply a macro expansion implementation. (In fact, E follows CLU [5] in compiling generic code that is shared by all instantiations.) Other languages that define generics similarly include CLU [32], Ada [28], Trellis\ Owl [45], and Eiffel [34]. Another language that supports generics of this sort is the most recent version of C + +, ie., C + + v.3.O [18]. A generic class or function in C + + v.3.O is called a template. Unlike an E generator class, a template class cannot specify constraints on the instantiating type. However, the member functions of a template class can invoke methods of the (unknown) actual parameter type; this design implies that instantiating a template with a type that does not supply the required methods is an error that may not be detected until link time [18]. In E, such an error would be caught at compile time. Unlike C++ templates, E does not directly provide generic functions. The same effect can be achieved in E, however, by defining the function as a static member of a generic class.7 In C++ v.3.0, a class created from a template can serve as a base class for further subtyping. Furthermore, a template class T1 may inherit from a contemplate class C or from another template class T2. In the former case, any class instantiated from T1 becomes a subclass of C. In the latter case, any class instantiated from T1 becomes a subclass of one instantiated (automatically) from T2 with the same parameters. For example, in C + + v.3 .0, we can define Set < T > to inherit from Collection < T > , so that for a specific type T’, Set < T’ > is a subtype of Collection < T’ >. Trellis\OwI [45] and Eiffel [35] also support this kind of definition. 7 If a member function of a C++ class is declared static, then it exists independently object of the class and may be invoked directly by using the scope resolution operator example, C: :f ( ); is a valid invocation of static member function f of class C. ACM TransactIons on Programming of any (::). For Languages and Systems, Vol. 15, No. 3, July 1993 512 . J. E. Richardson et al. class bmaryTree ( class class int 1{ keyType{ }, entityType{], compare( keyType,, ClaSsbtn : bmaryTreeNode[ keyType. ) keyType, entltyType, compare ]; btn *root; public: bmaryTreeo; entltyType. search( keyType ); void insert( keyType, entltyType+ ), }; Fig. 7. Ageneric binary tree class, Similar to C ++, E allows a class instantiated from a class generator to serve as a base class for further subtyping. E also allows a generator class to inherit from other types or generics, thus supporting the incremental definition of generic types. Unfortunately, E does not define any subtype relationships between types instantiated from such classes. From a language design standpoint, this is certainly a flaw; fortunately, it has not proved to be a problem in practice. On the other hand. E’s support for constraints on class parameters has been very useful. Finally, E’s implementation of generics in terms of shared, generic code has allowed us to write large, separately-compiled generic modules (eg., a B + tree index). Such large packages would have been impractical with a macroexpanding implementation. Generator classes were designed and implemented prior to the appearance of the C + + template design [57]. Thus, E diverged from C + + in this respect. Later, while templates were still only a design, we were content to continue. Now that templates are finally available, however, we have had to reconsider. Despite the relative strengths of E’s generator design and the possibility of correcting its flaws, we currently plan to drop generator classes in favor of templates when we move to version 3.0. Clearly, it would be a mistake to support both generators and templates, as the result would likely be impossible to understand or to maintain. Equally clearly, the C + + community is going to become familiar with templates and will be (justifiably) reluctant to program in E if they are required to switch to generators. Given these realities, adopting C + + templates in future releases of E seems like the best choice. 5. DB TYPES AND PERSISTENCE In the discussion so far, we have described langaage extensions in E that allow the programmer to process sequences of values and to define parameterized types. Both features are important for database-style programming. However, the data objects available to the program thus far are still volatile objects whose lifetimes are bounded by a program run. We now introduce the features of E that allow a program to create and use persistent objects and thus to describe a database or object base, together with its operations, strictly within the language. ACM Transactions on Programmmg Languages and Systems, Vol 15, No. 3, July 1993 The Design of the E Programming 5.1 Language . 513 Database Types E mirrors the existing C++ types and type constructors with corresponding database types (db types) and type constructors. Any type definable in C++ can be analogously defined as a db type. Db types are used to describe the types of objects in the database, ie., the database schema. However, not every db type object is necessarily part of a database; db type objects may also be allocated on the stack or in the heap. We will shortly convert the binary tree class into a db type. Let us informally define a db type to be: (1) one of the fundamental db types: dbshort, dbint, dblong, dbfloat, dbdouble, dbchar, and dbvoid. Fundamental db types are fully interchangeable with their nondb counterparts, eg., it is legal to multiply an int and a dbshort, assign a dbint to a float, etc. (2) a dbclass (or dbstruct, or dbunion). Every be of a db type. The argument and return be either db or nondb types. data member of a dbclass must types of member functions may (3) a pointer to a db type object. The usual kinds of pointer arithmetic are legal on db pointers, and casting is allowed between one db pointer type and another. It is not possible to convert a db pointer into a normal (nondb) pointer, nor into any nonpointer type (eg., int). (4) an array of db type objects. As in C or C + +, an array name is equivalent to a pointer to its first element. (6) a reference to a db type object. 5.2 The persistent Storage Class Having db types allows the E programmer to define the types of objects in the database. The persistent storage class provides the basis for populating the database. If the declaration of a db type variable specifies that its storage class is persistent, then that variable survives across all runs of the program and across crashes. A simple example is a program that counts the number of times it has ever been run: persistent dbint count= O; main( ){printf(’’This program has been run %d times.”, count++ ); } Here, the integer count is a persistent variable whose initial value is set to O. Each time the program runs, it prints the current value of count and then increments it. Note that there are no explicit calls to read or write count, and there are no references to any external files; I\O is implicit in the program. The great convenience of language support for persistence is that it allows the programmer to concentrate on the algorithm at hand rather than on the details of moving data between disk and main memory [3]. In the first implementation of persistence, the E compiler interacted with the runtime system to reserve a storage location for the object; this address was compiled into the code as a constant [41]. The current implementation uses a more flexible scheme that defers binding to a storage location until ACM Transactions on Programming Languages and Systems, Vol 15, No. 3, July 1993. 514 J. E. Richardson . runtime. in The each DUS Storage with the store file into store either object because has substantial been the the naming that original we for at that run, first discuss in scheme there Section every current or because this implementation, shall of no time While EXO- interacts has first time. names the address object the in it object an is running the to current If variable locations starts program. it is created the storage the over concerning translating E program program deleted,8 for obtain in improvement problems an to named a map corresponding When variable address, keeps their Manager. persistent persistent the persistent E source et al. is a are still 6. Collections 5.3 E provides the built-in generator db class collection[T] to allow the dynamic creation and deletion of objects. For a specific type T’, a collection [T’ ] may of an object within contain objects of type T’ or any subtype of T’. The lifetime a collection is program be persistent. specific db type, that with is Creating in define by object first object. of another heap-like instantiating with built-in a also a objects depending class. It collections, the if will instantiate As or persistent, member in a Collection. order to take programs on should be capable collection of class In control to create C++ v.2.0, of storage objects in one may allocation. overload E uses a collection. As this an example, is defined as a dbclass and that it has a constructor that E code string containing the person’s name. The following person that takes a character defines to particular, that must a collection be volatile a data type, Objects to allow a type type, and may be in then d bvoid. new operator suppose any collection; programmer the declaring possible of argument mechanism may the class, collection it of collection, before also objects the 5.3.1 the and lifetime a persistent generic a given it containing any the in of collection declaration, noted by an object Like type of any the bounded creates describing creates two collections people of persons, within declares an instance of that the ‘collection: dbclass person{...}; dbclass City: collectlon[person]; City Madison, person *PI = new(Madison)person( ’’Jane”); person * p2 = new(Madlson, pl)person(’’Toby”); As shown above, The arguments. newly allocated a collection, subtype overloaded argument object; as long the as the type type of entity since both the are manifest. In new operation specifies argument of the condition created the first can specitied in the the type of the this example, takes be any for the collection. collection we collection expression new The and are one or two additional containing the creating object that is the compiler type same can instances object ACM Transactmns on Programming is not defined within the language. Languages and Systems, Vol 15, No 3, July 1993 to as or a this being of person of persons, but if, for example, student were a subtype we could also create student instances within this collection. variables the verify of the a collection 8 Deletion of named persistent separate utlhty for this task. for evaluates in of person, We provide a The Design of the E Programming Language . 515 Note that a collection in E is similar to a typed heap in that objects are allocated and deallocated in them, rather than inserted or removed. Since an object can exist in only one collection, some tasks can be slightly awkward. For example, to simulate the object’s appearing in another collection C, we would have to declare C as a collection of pointers. However, this design is in keeping with the purpose of E: to be a low-level language supporting the implementation of higher-level data models. For example, a class extent in a higher-level data model could be implemented as a collection of objects, while sets could be implemented as collections of pointers. 5.3.2 Physical Clustering. When objects are stored on disk, their locations relative to one another can have a significant impact on overall performance. Generally speaking, objects that will be used together should be stored together if possible. The second new argument, which is optional, allows the programmer to communicate physical clustering hints to the storage layer. The second new in the example above requests that the new person (Toby) be created near the object referenced by pl (Jane).g In general, the second argument to new may be any pointer-valued expression, and the referenced object need not be of the same type nor in the same collection as the newly created object. It is up to the implementation of the underlying storage layer to determine what “near” means, and at worst, the hint will be ignored. In the current implementation of the EXODUS Storage Manager, the search for a nearby location begins on the same disk page if the objects are allocated in the same collection, and on the same disk cylinder otherwise [14]. Collections. The collection generator class has an iterator 5.3.3 Scanning member function, scan( ), for scanning all of the elements in a collection. This iterator returns a sequence of pointers to the objects in the collection. The following example processes all of the people in Madison: iterate(person * p = Madison. scan( )){... } Note that even though a collection of T may contain objects of a subtype of T, a scan always returns T pointers. For example, the preceding scan always yields a person*, although some instances in the collection might be of type student. The introduction of multiple inheritance in C + + v.2.O posed some challenges in the implementation of collection scans. The challenges stem from the fact that an E collection is implemented as an EXODUS Storage Manager file, which only records the OIDS of the objects that it contains. A file scan thus produces a pointer to the beginning of each object. However, this is not necessarily the correct pointer to return to the E program. For example, suppose class C has supertypes A and B, and we create a C object in 9 In E v.1.2, in and near clauses were added as new syntax extensions. We have elected to drop this syntax in favor of the new operator overloading capability introduced in C++ v.2.O. Under the old E syntax, the second new example would have read: p2 = in(Madlson)near(pl ACM Transactions on Programmmg )new person(”Toby”): Languages and Systems, Vol. 15, No 3, July 1993. 516 . J, E. Richardson et al. a collection[Bl. Now suppose that we scan the collection, obtaining a B pointer to each object in turn. When the scan encounters the C object, the returned pointer must be adjusted to refer to the “B part” of the object. Mechanisms to accomplish this and to handle virtual base classes have been implemented.l” Note that the existing C++ mechanism for moving up the type hierarchy (in which the compiler inserts code to adjust the pointer) does not apply in this situation, as it is not known until runtime that a particular object returned by the scan is actually of type C. The usual C + + delete operator 5.3.4 Destroying Objects and Collections. may be used to remove an object from a collection. For example, we can delete Toby (in the previous example) with: delete p2; If the object’s type has a destructor, the destructor will be called first and then the object will be destroyed. If a collection is destroyed, the objects that it contains are also destroyed, If the collection contains objects of a class having a destructor, then the destructor will be invoked on each object before the collection is destroyed. ConcepAssume that we wish to delete Madison, which is a collection[person]. tually, this process involves the following steps: iterate(person * p = Madisonscan( /* now destroy the empty collection )){delete p; } ... * / For performance reasons, however, our implementation does not actually destroy the objects individually. Rather, the destructor calls reinitialize each object, and then the entire collection is destroyed en masse. The semantics of persistent object destruction parallel those of volatile object destruction and consequently inherit all of the same problems. In particular, dangling references are possible. Since the storage layer never reuses object ids, however, we prevent the worst effect of dangling references, ie., the overwriting of random data. Other problems associated with explicit deletion, such as creation of garbage, are not addressed by our design. C + + relies on destructors to ensure proper cleanup when an object is deleted. In designing E, we elected to extend the existing semantics to persistent objects rather than attempting to define implicit deletion semantics for C ++. 5.4 The Binary Tree Example Revisited Let us now (finally) Unlike the previous reimplement incremental our binary tree example as a db type. examples, here we reproduce the entire 10The offset of the B part M stored in the object header (provided by the EXODUS Storage Manager), and the scan lterator adjusts the pointer by this amount before returning it to the client, If B is a virtual base class, then the object header contains the offset of the virtual base pointer within the object, m which case the scan iterator retrieves the pointer out of the object itself and returns it. In either case, the appropriate B part must be unambiguous or else an error that attempts to create a C will be reported by the compiler when it processes the new statement object in the collectlon[B]. ACM Transactions on Programming Languages and Systems, Vol. 15, No. 3, July 1993 The Design of the E Programming Language . 517 implementation for comparison with the original C + + version. The node class shown in Figure 8a has changed from the C + + version of Figure la in the following ways: The insert routine accepts duplicates, and the search routine is an iterator (as developed in Section 3). The key and entity types are type parameters, and the key comparison routine is a function parameter (as developed in Section 4). The class itself is a dbclass (as developed in this section). Figure 8b shows the binary tree class. In order to define this class, we must first instantiate two new classes which we will then use. The class btn is ie,, binaryTreeNode instantiated with the same parameters as for binaryTree, this is a nested instantiation as described in Section 4. Next, btnSet is instantiated as a type of collection containing btn nodes. The binary tree itself is now represented as allNodes, a collection containing the nodes, and root, a pointer to the root node. On an insert, the new node is allocated in the tree’s collection. Other changes to the binary tree class parallel those made for the node class, ie., the use of type parameters, and the definition of search as an iterator. Finally, Figure 8C shows an example using a persistent binary tree index. This program builds an index over students keyed on grade point average (gpa). Since the students must persist, we first define school as a collection of students, and we declare a persistent instance, UWmadison, of this type. We then define a comparison routine for floating point numbers, and we use this routine, along with the types student and dbfloat, to instantiate a specific index type. Next we declare a persistent index, gpal ndex. Finally, the main program shows examples of creating a new student and adding the corresponding index entry and of iterating over all students with a given gpa. 5.5 Implementing a Disk-Based Index The binary tree example developed in this paper is clearly hinting at the implementation of “real” database index structures, eg., B + trees, in which each node contains many keys. In defining such structures, an essential constraint is that each index node must fit on one disk page and must make maximal use of the space on that page. If we define the node type as a generic class, then the number of keys that will fit on a page varies with the specific key type. One approach is to define the generator with a constant parameter, as we did in the stack example of Section 4. However, this approach forces the user of the class to compute the maximal number of keys for each instantiation. An easier approach is to make use of the fact that within a generator, the expression sizeof(T), where T is a type parameter, is treated as a constant and may be used in declaring array bounds. For example, assume that PAGESIZE is a constant giving the size of a disk page in bytes. In Figure 9, we have outlined the definition of a simplified generic class describing leaf nodes in a B + tree; each node is to contain an array of key-pointer pairs where the number of array elements is the maximum that will fit on one page. Like the binary tree example, this class is parameterized by the key and entity types and by the key comparison routine. For convenience, we have defined an ACM TransactIons on Programming Languages and Systems, Vol. 15, No 3, July 1993. 518 s J E Richardson et al. dbclass bmaryTreeNode [ dbclass keyType{ }, dbclass int eni!tvTvDe{ ,., . },. compare( keyType”, key Type’ 1 1{ keyType entltyType bmaryTreeNode blnaryTreeNode node Key, .entpfr, ,leff Chdd, .rlghtChdd, public blnaryTreeNode( key Type, entltyType *), iterator entltyType’ search( key TyPe), void msert( bmaryTree Node + ) /*constructor’/ ). bmaryTreeNode, blnaryTreeNode[ keyType insertKey, entltyType ‘ Inserfptr ) { nodeKey= msertKey, entPtr= mserfptr, leftChlld = rlghtChlld = NULL } iterator entltyType ”blnaryTreeNode search( key Type searchKey){ mt cmp = compare( &searchKey, &nodeKey ), if( cmp <= O ){ if( leftChild 1= NULL ) herate( enttyType . p = leftChlld+earch( searchKey )) yield p, if( cmp == O ) yield entptr, } else if( rlghtChlld 1= NULL ) iterate( entlfyType yield . p = rlghtChlld-search( searchKey )) p, } void bmaryTreeNode msert( blnaryTreeNode’ newNode ) ( Int cmp = compare( &(newNode+nodeKey), &nodeKey ) if( cmp <= O ) if( leftChlld == NULL ) leftChdd = newNode, else IeftChlld+mserf( newNode ) else if( rlghtChild == NULL ) rlqhtChlld = newNode, else rlghtChild+mserf( Fig t3. (a) The binary newNode ), tree nude clam. auxiliary type, kpp, for key-pointer pairs; the tree node is an array of these it is structures. Note that since kpp is defined in terms of class parameters, also a generic type, and it is implicitly instantiated with each instantiation of BTreeLeaf. We then define two macros for convenience. The amount of usable space on a page is the size of the page minus any overhead for control information; in this simple example, the only control data is an integer giving the current number of entries in the array. Finally, the maximum number of ACM TransactIonson ProgrammingLanguagesand Systems,Vol. 15,No.3, July 1993. The Design of the E Programming dbclass I Language . 519 bmaryTree dbclass dbclass int keyType{ ), entKyType{ ], Compare( keyType., keyType~ ) 1 ( dbclass dbclass btn binary TreeNode[ keyType, enttyType, btnSet collectlon[ btn ]: btnSet bt n compare ]; ailNodes; ●root, public bmaryTreeo, 1, constructor.1 iterator entity Type. search( keyType ), ), void Insert( keyType, enttyType ● ) bmaryTree blnaryTreeo { root = NULL, ) iterator entlyType . blnayTree’ search( keyType searchKey ) f if( root == NULL ) return, else iterate( enMyType yield p, . p = root+ search( searchKey )) J void blnaryTree Insefl( keyType mseflKey, enfityType ● insertPtr ) 1 btn . newNode = new( allNodes ) btn( InsertKey, msertPtr ), if( root == NULL ) root = newNode, else root+ nsert( newNode Fig. 8. ), (b) The binary tree class. array entries is the amount of available space divided by the size of an entry, The data member, kpPairs, is then defined to be an array ie., by sizeof(kpp). whose dimension is this maximum. 5.6 Comparison with Other Persistence Mechanisms 5.6.1 Reachability. The style of persistence provided by E might be termed “allocation-based persistence. That is, an object is persistent only if it is created as such, either by being declared a persistent variable or by being created within a persistent collection. This approach is quite different from reachability-based persistence, in which an object persists if it is reachable from one or more distinguished roots. A number of persistent languages and ACMTransactionsonProgrammmgLanguagesandSystems,Vol. 15,No 3, July 1993. 520 . J. E. Richardson et al. dbclass student{ }; dbclass school collection[ student], persistent school UWmadlson, int compare( dblloat. x, dbfloat ● y) float cmp = (.x - ●y), if( cmp <0 ) return -1, else if( cmp == O ) return O, else return 1, } dbclass gpalndexType binaryTree[ persistent gpalndexType gpalndex, mamo dbfloat, student, compare ], { student ● s, s = new( UWmadlson ) student( gpalndex msert( s+gpa, s), iterate( ), student , s = gpalndex search( 3,0 )) ) Fig. 8. OODBMSS (c) Example using a persistent to date have adopted the latter binary approach, tree eg., PS-Algol [3], Galileo [2], GemStone [33], Zeitgeist [19], and PGraphite [59], Reachability is perhaps a more convenient mechanism for the programmer, who no longer needs to worry about persistent objects containing dangling references, e.g., to a reclaimed volatile object. l[t also allows greater flexibility in that a program can decide whether an object should become persistent after it already exists. In E, a volatile object can be “made” persistent only by copying its value into a persistent object. We elected not to base E’s persistence on reachability for several reasons. As mentioned earlier, we first envisioned E as being a target language for the compilation of higher level data models. We felt that in such a language, the persistence of a given object should be an explicit property rather than an implicit side-effect of being part of a particular data structure. Perhaps a more compelling reason (for E) is that reachability fits most naturally into a garbage-collected language. That is, the reachability traversal for determining persistence requires the same information as that for garbage collection. Because E is an extension of C++, basing E’s persistence on reachability would have required that database types (at least) be quite different from their C + + counterparts, something that we wanted to avoid. For example, unions (or change the we would probably have had to disallow persistent since they do not carry enough information to determine semantics of union) which member is “active.” ACM Transactions on Programming Languages and Systems, Vol. 15, No 3, July 1993, The Design of the E Programming dbclass [ Language . 521 BTreeLeaf dbclass dbclass mt key Type{ ), entltyType{ }, compare( keyType., key Type+ ) 1{ /+ auxiliary deflnltlons./ kpp { dbstruct keyType enttyType. #define MAXSPACE #define MAXENTRIES keyVal, entPtr; (PAGESIZE (MAXSPACE slzeof(dbmt)) / slzeof(kpp)) /. data members./ nKeys, dblnt kpp kpPa!rs[ MAXENTRIES]; public ) Fig. 9. A generic DbClass for B + tree leaf nodes. 5.6.2 Collections and Class Extents. A central issue in the design of a database programming language is how (or whether) collections of persistent relation as a type constructor; objects are defined. Pascal\R [47] introduced tuples could be added or deleted under program control, although the relations themselves could only be named variables. One implication of this restriction is that nested relations were not allowed. Other DBPL’s, eg., Rigel [43] and Plain [60], took a similar approach with similar restrictions. PS-Algol [3], the first language providing fully general (orthogonal) persistence, made the runtime heap the basis for persistence, Any object reachable from a distinguished “database root” pointer would persist. However, the persistent heap had no notion of a collection of objects; such a collection would have to be coded explicitly as a persistent data structure. E takes an intermediate approach. Like the DBPL’s, a given collection stores a specific type of object, and there are facilities for processing all of the objects in a collection. Like PS-Algol, there are no restrictions on the type of object that maybe persistent (except that it must be a db type); for example, one may define collections of persistent heap, however; the collections. E does not provide an implicit object in E requires the specification of a dynamic creation of a persistent collection in which to create the new object. Another popular approach to persistence is to support class extents. An extent is the set of all instances of a class; when an instance is created, the system automatically places the object within the proper extent. Programs can then use class extents to form the basis of queries. Most systems that support extents also define inclusion semantics for subtypes, so the extent of a class includes the extents of all subclasses. A query over an extent may specify whether or not to include subclass extents. Examples of systems that support extents include Orion [8], 02 [7], PC LOS [37], and O + + [1]. ACM Transactions on Programming Langaages and Systems, Vol. 15, No, 3, July 1993. 522 . J. E. Richardson const MAXSTRING typedef dbstruct dbstruct dbchar et al. .16, Strlng[MAXSTRING], Part, // forward decl Use { Pan. Uses; Part. UsedIn, dbmt Quantity, Use. NextUses, NextUsedIn, Use * // // // // // the subparl the composite how many of next entry on next entry on part that uses it the subpart cnmposlte part’s uses chain subpart’s used-m chain )! dbstruct Part ( Name, // name of part UsedIn, // subpart-of chain String Use. Part( char., Use ), Match( char. ), costAndMass(dbmt&, dbint&); ● mt virtual void }. dbstruct basePart dbmt dbmt virtual void public Part{ Cost, // cost of base part Mass, // weight of base part basePart( char,, Use ., dblnt, dbmt ); costAndMass(dblnt&, dbint&), )! dbstruct composite Pan public Parl { Use. MadeFrom, dbmt Assembly Cost, dbmt MassIncrement, virtual void // hst of components // additional cost to assemble com~nents // additional mass to assemble components composite Parf( char., Use +, dbmt, dbint, Use .), costAndMass(dblnt&, dbmt&), )1 dbstruct , persistent ParfsDb ( dbclass dbclass dbclass bpCollType cpCollType useCollType bpCollType cpCollType useCollType bPartsFile, cPartsFde, Uses, Pari . f!nd Part( char. PaflsDb coliectlon[ collectlon[ collectlon[ baseParl ]; compos!tePart Use ], ]; // all base parts // all compcwte parls // all uses/used-m links ), Database, Fig. 10. Task l—Describe the database. Extents are a simple, convenient, and fairly natural way to add persistence an object-oriented !anguage, especially if one is focusing on database applications. In fact, extents maybe seen as an extension of relational DBMS semantics: just as a relation defines both a tuple type and all existing tuples, so does a class define an object type and all existing instances. However, we to ACM Transactions on Programming Languages and Systems, Vol. 15, No. 3, July 1993. The Design of the E Programming Language void Task20 { iterate( basePari. p = t)- ,abase.bPartsFile scano ) If( p+cost >= 1Lo ){ printf (’’name = “), strprint( p+Name ), printf(” cost= %d mass = %d”, psCost, . 523 psMass), 1 } Fig. 11. void Task3(char. Part. p; in!d=O, int m=O; 1 Task 2—Print out expensive parts. name) { P = Database .findPart( name), p+costAndMass( d, m); printf(’’name = O/.s,cost = O/~d,mass = %d”, name, d, m ), Fig. 12. Task 3—Find cost and mass of a part. void Task4( char, name, dbint cost, dbint masslnc, Use + usesLrst ) { /. Assume usesLM points to each subpart, but that subparts have not yet been recorded as being used in this new part. ./ composte Part , newPart, newParf = new( Database cPartsFile ) composite Pari( name, NULL, cost, mass lnc,usesLtst for( Use. up= usesL@: Up 1= NULL, up= up+ NextUses // this use IS due to the new composite part up+ Usedln = newPart; // insert use-record at head of subpart’s up+ NextUsedln = up+ Uses+ Usedln, up+ Uses-+ Usedln = up: used-in ); ) { chain \ Fig. 13. did not feel that tence with types tation language but not to force Task 4—Add new manufacturing step. extents were appropriate for E, since they associate persisrather than with instances. In a general-purpose implemenlike E, we wanted to be able to implement extents if desired, them in all cases. 6. DISCUSSION So far, we have presented the design of the E language and have shown examples of its use. Of course, the true test of a language’s practicality is in its implementation and actual use. We have built several E compilers, and we ACM Transactions on Programming Languages and Systems, Vol. 15, No. 3, July 1993. 524 J. E. Richardson . et al. have explored five distinct implementations of persistence [41, 42,48,51, 61]. Furthermore, we (and others) have used E to build a number of persistent applications. This experience has revealed the language’s weaknesses as well as its strengths, and it uncovered some interesting implementation challenges. In this section, we describe some of the more interesting issues. A detailed evaluation of E will appear in a forthcoming paper. 6.1 Language Design Issues The idea of defining a new storage class Persistent Objects. 6.1.1 Naming struck us as a very clean approach to extending C + + with a persistence mechanism. Just as an auto variable lives for one procedure invocation, and a static variable lives for the lifetime of a process, a persistent variable was to live across processes, as a kind of “super” static object. Under this definition, the name of a persistent variable is not merely a handle for some object in the persistent store, but it is actually a denotation of the object itself. In retrospect, this approach has drawbacks in the area of name space management. Since the names of all top-level persistent objects are variable names, the name space seen by an E program must obey C + + scope rules. The sufficiency of these rules for writing large nonpersistent applications is questionable, and they are even more restrictive for organizing a large database. For example, sharing can be hampered by the fact that a program cannot include two persistent variables (from two different E modules) that have the same name, just as clashes in transient global variable names are disallowed. Similarly, long term program evolution is hampered since it is not currently possible to change the name of a persistent E object once it has been created. Finally, writing general-purpose code can be impeded by the early binding of names to objects, although the use of procedures can largely alleviate this problem. While it is possible to circumvent most of these problems, doing so requires a fair amount of foresight and care when developing a large E program. An alternative would have been to separate the names of objects in the persistent store from the names used in the program text. For example, we could have chosen to limit persistent variable access to pointer traversals, requiring pointer variables to be bound to specific objects via runtime calls (similar to opening a file). Our feeling was that adding persistence as a storage class was more natural, somehow; moreover, it doesn’t prevent E programmers from simulating the latter approach by building and using a directory class to manage the name space of persistent objects dynamically. 6.1.2 Orthogonality. One of the more controversial design points was our decision to give E a “two-headed” type system, rather than simply to introduce persistence as an orthogonal property of all types. Orthogonality is, after all, often cited as a desirable feature of a persistent language [3, 4], and some users of E have complained about its dual approach [23]. The motivation for E’s use of db types stems both from philosophy and implementation concerns. First, E was originally conceived as a language in which to write database management systems. In such systems, there is a clear distinction between ACM Transactions on Programming Languages and Systems, Vol 15, No 3, July 1993 The Design of the E Programming Language . 525 those objects that persist and those that are volatile. For example, lock tables and transaction descriptors are definitely not persistent, while objects in the database definitely are. The “db” attribute of a type distinguishes between objects that may be persistent and those that are definitely volatile. We note that O++, which was designed more recently than E, makes a similar distinction [1]. The separation of normal types from db types also has a strong grounding in performance considerations. Those same system resources that are known to be volatile (eg., those mentioned in the previous paragraph) are often the ones that are accessed with the highest frequency. If every object reference might be a persistent reference, then every access must check that the needed object is in memory. 11 Even if the check costs only one boolean test, we might still significantly increase the cost of accessing these critical system resources. In E, accesses to nondb type objects suffer no loss of performance over the same accesses in C + -t. In addition to the cost of a pointer dereference, another factor in our decision to introduce db types was the representation of pointers. On a VAX, a pointer is 32 bits, giving an address space of approximately 4 GB. However, databases are already exceeding this limit and, in fact, are moving into the terabyte range. If persistence were orthogonal to all types, then we would be faced with several not very appealing alternatives. We could make all pointers 32 bits, and thus guarantee that E would already be obsolete for real world applications. Alternatively, we could make all pointers be “large enougW for our projected needs. In the current implementation, E uses the EXODUS Storage Manager [14] as its persistent store, so every E pointer involves an EXODUS object id (12 bytes) plus an offset (4 bytes). Thus, this approach would quadruple the space needed, and also the copying cost, for every pointer; neither would be desirable from the standpoint of achieving good program performance [40]. 6.2 Implementation Issues While the preceding sections discussed issues related to the language design, in this section we turn our attention to shortcomings in its implementation. As we shall see, these problems stem largely from the C/Unix model of creating and running programs, a model which we attempted to preserve in E. It has become all too apparent that this model is insufficient for supporting a persistent language and that an integrated programming environment is needed. 6.2.1 Type Persistence. tween compilation units In C and C++, type definitions are shared bevia textual inclusion of header files. There is no 11This is not necessarily true for all persistent language implementations. For example, it is possible to implement an object-faulting mechanism that relies on the machine’s paging hardware (eg., see Shekita and Zwilling [51] and Lamb et al. [3 l]). However, such mechanisms tend to limit either the space of addressable objects [51] or the space of concurrently addressable objects [31] to the size of the machine’s virtual address space; they can also have performance problems when programs operate on large (relative to physical memory) databases [51]. ACM Transactions on Programming Languages and Systems, Vol. 15, No. 3, July 1993. 526 s J. E. Richardson et al. notion of a type existing outside of any particular run of the compiler. A given type, T, used to compile one module may or may not be the same T used to compile another. This mechanism is acceptable (though not particularly desirable) for C and C + + programming. For E programs, however, it is critical that the definition of a type remain consistent across compilations. If an object created with type T is later manipulated by a program with a different definition of T, the database can be easily corrupted. If persistent objects were limited to simple C structures, then very careful programming could avoid this problem. For an object-oriented language with persistence, however, even careful programming is insufficient. In order to support late binding of method invocations, each object must carry enough information to identify its own type uniquely. In the face of persistence, the identity of a type must also persist. This issue of persistent type identity is fundamentally at odds with the existing model of writing C programs under Unix. A complete solution to these problems requires an integrated programming environment that provides, among other features, a type library in which a type has an identity independent of any particular compilation. It also requires answering some hard questions such as: How are types shared between programmers? If a type changes, is it still the same type? If so, what happens to existing objects of that type, and if not, how can old programs use the new type?12 These issues extend well beyond the scope of our original intentions in designing E. Thus, our implementation provides only a partial solution to the type identity problem, a solution based on computing a hash value from the class definition. An approximation to type identity is achieved in this way, without environment support, since the same class definition will hash to the same value in different compilations. When a persistent class instance is created, the class’s hash value is stored in the object, later identifying the object’s type during method dispatch. Of course, hash collisions are possible, though extremely unlikely; an E program that includes two classes with the same hash value simply terminates itself at startup time. While we considered several alternative designs, this one seemed to be the best initial compromise. While computing hash values provides an initial 6.2.2 Type Availability. solution to the type persistence problem, there is another related issue that it does not address. The problem is that it is possible for a program to encounter an object whose type was not known when the program was compiled. To see how this problem arises, let us first consider the implementation of C + +. Here, it is assumed that every object encountered by a program was created by that program. As a result, for every type of object encountered, there is at least one module of the program (ie., the one that created the object) that knows about its type. The implementation of virtual function dispatch can 12The latter questions relate to the problem of schema evolution, a well-known hard problem in database systems. Researchers in object-oriented database systems have offered several approaches to the problem [8, 38, 53], although none seems entirely satisfactory. ACM TransactIons on Programming Languages and Systems, Vol 15. No 3, July 1993 The Design of the E Programming Language . 527 then assume that both the method dispatch table (vtbl) and the method code itself are somewhere in the program’s address space. Unfortunately, these assumptions can break down in the presence of persistence. Suppose we have a persistent graph, G, whose nodes are of type T1. Suppose that we write a program PI that traverses G, invoking a virtual function on each node. In order to compile Pl, we need only include definitions for T1 and its base classes (if any). Now suppose we write another program P2 that defines T2 as a subtype of T1 and adds a T2 node to G. If we then run PI, the program will terminate with an error when it invokes the virtual function on the new node. Note that this is not a type error; T2 is a subtype of Tl, and the invocation is legal. The problem is that neither the vtbl for T2 nor any of T2’s methods are available to PI. As was the case with type persistence, the root of the problem lies not in the language design itself, but in the language’s implementation within the context of an environment that does not adequately support persistence. Providing such support would involve a significant amount of effort. Programs would have to become objects in the database, and an execution would involve incremental dynamic linking of method code. The environment would have to track dependencies between programs and the persistent objects they create. Tools would have to be provided to allow users to build, execute, debug, and evolve programs within the environment. This, in turn, would require us to define a model of programs and the programming process. While these are all very interesting and important problems, they fall far outside the scope of the EXODUS project. Thus, at least for the present, maintaining E programs under Unix can sometimes be awkward. The first implementations of E had only a primitive 6.2.3 Transactions. notion of transaction. Each program run constituted one transaction. Current support is somewhat better: library calls are available to begin, commit, or abort a transaction. These calls are supported by the current implementation of the EXODUS Storage Manager, which provides atomic, recoverable transactions. The persistent data touched by a transaction is locked in a two-phase manner, and recovery is provided via write-ahead logging. At present, a program is limited to executing a series of independent, flat transactions; in the future, we may provide nested transactions. We also expect to integrate E’s transaction support with the recently introduced facilities for handling exceptions in C + + [ 18]. 7. OTHER RELATED WORK Throughout this paper we have compared particular features in E with other languages having similar features. Thus, most of the important comparisons with other languages have already been made. In this section, we focus our attention on comparing E with several other languages that have also extended C + + with persistence. 7.1 Avalon / C++ Avalon/C + + [26, 17] is a language designed to support reliable distributed computing. This language utilizes the inheritance mechanism of C + + to ACM Transactions on Programming Languages and Systems, Vol. 15, No, 3, July 1993. 528 . J. E. Richardson et al. allow programmers to design data types having customized synchronization is then modeled as a set of objects and recovery properties. Persistence encapsulated by a server; a server may recover the state of its objects after a crash. E differs from this approach in that its main goal is to provide transparent persistence for structuring large object bases and transparent 1/0 for manipulating them. In E, persistent objects exist independently of any active process. to declare db types might have been a reasonable Using inheritance alternative for adding this feature into C ++. Instead of defining a type as a dbclass, one would define it as a class that inherits (directly or indirectly) class, db. There are two problems with such an from a special predefine approach, however. The first is that the fundamental C + + types (e.g., int) are not classes, second possible not to clear might 7.2 so we problem cannot define how end a class one up with would having their the that db analogs addition inherits define from terms disallow db and meaning such of inheritance. inheritance, both a consistent to simply in of multiple it nondb for such The would be classes. a class, It is and we cases. O++ Researchers guage, at O++, ming AT&T that features [1]. for constraints form of iterator variations single or higher-level the interesting requirements tially the refer “dual” persistent two contains a pointer requires that pointer be the given type with all glance, data the “dual” a pointer argument. handle pointers to persistent seem of this type as well modifier. if we E Now that TransactIons on Programming Languages and Systems, its shares many C++, e.g., the is are in In of the can type modifier refer may O + + consider class persist, requires to objects, then Vol. 15, No. 3, July the which then that a function function to “dual” flexibility, a one poten- essentially more their E, it might (not to be confused with E’s 13O++ also has a type modifier “persistent” class) which indicates that the pointer deftn ztely refers to a persistent object. ACM of a type that class the as nonpersistent extent object. defining while desire a two Despite O + + associates to afford as a db type, Again, and to indicate If objects the still to in extents; object. a pointer pointers provides subtypes. persistent Consider member. type from E one support also either its inherits pointer may O + + O++ equivalent. be defined takes of between db including over to a volatile O++, O++ essentially class In of a given (Thus, E a type; C + + provides querying all of a possibly object. at first are that of it DBPLs, features, pointer the declaration While methods for and a lanprogram- and queries comparison attribute object.13 pointers.) of a persistent the type problems defining most allow a a persistent “db” to with Like implemented systems-level extension extents, calculus-like of and and an class construct point for associates also application-oriented) of storing One is triggers. extents limitations possibility O + + and high-level maintains and looping designed both expressing the (more basic E, O + + for of this type Laboratories to blend Like However, integrity Bell seeks persistence. a define is that be E the which able to E requires persistent storage 1993. The Design of the E Programming that the pointer declaration refer have approach is that place, ie., O + + design to the the with a db “dual” type, while type modifier. “possibly the type’s is that the and nonpersistent difference, i.e., pointers. potential attribute However, isolates types the to the that requires A persistent” definition. syntax persistent O++ Language one need argument appear of in it one of the difference where E’s only advantage of the place the 529 advantage a potential essence . between really makes a 7.3 ObjectStore In the past oriented relevance most two years, database here of the is the provide also a system. ObjectStore having been such as CAD, ObjectStore shares E’s basic ing variables discussed all into earlier. pointers are mapping and to handle the design of class—a variants more (which ing, its features model relationships data, STATUS We have currently for persistent them target Manager from object are in an E v.2.1 the form objects), uses a hash id-based resident ACM Transactions pointers [61]. to memory manage driving to it object support sets Since fit in Object- provides class for in pointer collection of objects. DBPL, I\O goal C + + working associative working, E compilers caches virtual problem ObjectStore; arameterized sets an persistent virtual close declar- several library. inverse queries Other members and index- transactions. One into of in (A whose include compilers two as end-user its used [31]. a type-p in representing as cooperative These get by naming and are applications provides between and objects objects use type objects. copies for to is extended a persistent to type pointers, (GIS). ObjectStore within the memory desire E. by partitioning structures goals applica- obtained alleviating virtual of ObjectStore two v.2. 1 translator. Storage other possible intended and 8. CURRENT the ObjectStore collection advanced versioned cfront was E, ) is than same C + + are objects their systems with persistence: C + + data the information in to data-intensive is orthogonal as normal larger template—for O++ of persistence size as Like C + + (like approach Unlike library persistence essentially objects thereby auxiliary ObjectStore speeds to persistent on E’s E, and approach [31]. through with common allocating Unlike same in or databases, the memory.) Store improves Design particular interface complex, geographic object- Of only language with in features class named databases dereferencing main storage techniques C + + use and Object language-based and of storage class, ObjectStore class CASE, number of this collection. storage a allocation-based a persistent with CAE, has introduced model. persistence designed for data from the was intended have C + + offer integrated, as tions a which extended tightly database O ++, companies on system offerings, ObjectStore to startup based ObjectStore commercial interfaces, order several products objects table memory into Other on Programming both differ in to and virtual the in buffers locate memory on the they of cached swizzles E implementations based how the objects pointers, AT&T manage I\O EXODUS [48]; the converting addresses while have experimented their Languages and Systems, Vol. 15, No. 3, July 1993 530 J. E. Richardson . with compile-time storage layer system [51]. classes programs fast and us and EXODUS been (and at the University system Air Force The initial and tasks part. the (2) Print name, the (3) Compute Record between parts. production- to the [25], systems system to at the support University user at the support of Colorado database external a set present are of four languages. which [4]. and mass and here the of Wisconsin sites were the tasks We excerpts a given reported part to have evaluate coded from the and that code. is either a base part that more run The or a following: of all total manufacturing part follows That list step Every immediate subpart. its immediate subparts. in the may these classes part In on Programming objects P keeps addition, the cost composite database, that spirit are of those be a $100. is, how a new defined the described as subtypes many-to-many composite by Atkinson basePart or a compositePart. a Usedln list of other each than part. subparts. either maintains parts of a given in from is, a part of Use base cost is manufactured an ACM TransactIons we in mass implementation, linked and tasks cost, implementation two-way proposed programming E, four total a new Buneman of our object DBMS DBMS manager the to date. and relational University at E v.2.1. data integrated database project in sites database. the composite our in The Describe Our [4] database (1) a the Other DATABASE of database is a parts the are of of of E in [23]. Buneman example at and of two et al. object Spring design; relational object-oriented 29], environment A PARTS composite MOOSE an a nested an toolset [13, as they 35 external with other is well as a template. demonstration [22], [24], template remain a DBMS which is how versions a variety a small Technology earlier to over construct University experiences expressiveness to [52], State Hanson and example used development by will E under of E v.3.O C + + for to make compiler, (which redefined distributed including programming Atkinson In been program APPENDIX: and persistence) has Wisconsin the do well of E by late from of is currently g + + ftp version depart via and Gnu the sizes. implementation be appropriately being) EXCESS the this will supported of Wisconsin of CAPITL (4) will of EXTRA\ their be Institute University [54]. will at Wright ARCADIA recently E v.3.O prototypes, rule on to operating can of database anonymous The Mach on how on C + + v.3.O to be distributing class is based E via the implementation range handled). earlier, toolkit management the expect types be calls implementation is continuing of E based will is already generator has a wide an under each research distribute of E (iterators The E freely we generic features across optimizing explored mapping that and and also memory version to mentioned collection scheduling have shows robust software underway, that virtual experience This As we to E v,2. 1, a version enable EXODUS to and of applications, development. 1992. on Our addition will 42], based certain In approaches [41, persistence et al. part of usage parts keeps in which a Uses Languages and Systems, Vol. 15, No. 3, July 1993 Part. A relation P is list of The Design of the E Programming The database base parts, combined the search first routine declared parts, two for parts qualifying in the record. mass, is (the bodies of which and mass; a subparts, Finally, are then 4 adds assumes and mass task by creating that composite the to the Usedln and a new list a list part part of its we a Database is part’s cost use of virtual and functions returns the cost and mass. its and to the cost of database. its This part, routine then own mass composite The instance over of a composite new subparts. iterate each definition of the simply for simply cost name part have provides information its sums of its composite of each to incremental a new increments, due A base it is given could class Finally, desired calculation recursively own parts, the recursive shown). its name. collections: we This 531 class. printing part adds of parts.) of expensive interesting not . three (Alternatively, a given of this more composite Task routine 3, the somewhat and with reporting database, Task and records. collection a part instance 2, the a class containing usage a single looks Task and into as a persistent base as PartsDb, defined that To perform the is composite Language its cost completes adding this the instance subparts. ACKNOWLEDGMENTS We would like willingness to Zwilling aid in wrote We all role would also like David Finally, we improve the to would E test through their us in helpful the and Dan the an invaluable contributed referees for of the language. suggestions direction discussions their Mike was use his for particular, Lieuwen extensive for project In suite and Rowe pointing to thank presentation the Bober numerous EXODUS so patiently. Paul Larry for for like v.2.O of the pigs for bugs. thank Solomon DeWitt members of guinea programs v. 1.2 and Marvin of all) of the v. 1.2 compiler to both iterators, the numerous finding similarly to thank play regarding of C + +, and and constant comments that (most feedback. helped us to The language significantly. REFERENCES 1. AGRAWAL, R., AND GEHANI, and the data model. In N. H. ODE Proceedings (Object Database of the ACM-SIGMOD and Environment): Conference (Portland, OR, June, 1989). 2. ALBANO, A., CARDELLI, language. ACM 3. ATKINSON, L., AND ORSINI, R. Trans. Database M. P., BAILEY, Syst. Galileo: A Strongly-typed, 10, 2 (June P. J., CHMHOLM, ACM 5. ATKINSON, Comput. R., LISKOV, of the ACM National 19, 2 (June B., AND SCHEIFLER, Con ferenee 6. ATWOOD, T,, AND HANNA, of the 4th International Sept. Suru. S. Two Workshop conceptual K. J., COCKSHOTT, W. P., AND MORRISON, R. approach to persistent proflamming. Comput. J. 26, 4 (1983). 4. ATKINSON, M. P., AND BUNEMAN, O. P. Types and persistence languages. interactive 1985). in database of implementing CLU. An programming 1987). R. Aspects In Proceedings (1978). approaches on to adding Persistent persistence Object Systems to C + +. In Proceedings (Martha’s Vineyard, MA, 1990). 7. BANCILHON, PFEFFER, F., BARBEDETTE, G., BENZAKEN, P., RICHARD, P., AND VELEZ, ACM Transactions F. The on Programming V., DELOBEL, design and C., GAMERMAN, implementation S., LECLUSE, of 02, C., an object- Languages and Systems, Vol. 15, No. 3, July 1993 532 J, E. Richardson . oriented ented database Database 8. BANEHJEE, schema system. Systems J., KIM) evolutlon ence (San W., Francisco, model and 10. CARDELLI, phism. 11 L., Munster, KIM, H. CA, May, data J , m~ J., ACM P. AND DF,WITT, On D J. Myopoulos, Eds , Springer-Verlag, E. J.. Overview In Kaufman, 1990. Database and of the 18. ELLIS, an extensible and In On polymor- Knowledge Base M. Brodie Technologies, extensible Database Systems A data Conference Concepts, and DBMS, in (Pacific Grove, E. J., AND Proceedings CA, model and query language (Chicago, IL, June, 1988) E. J. Storage Databases, of 1986). for Management and Applications, for W. Kim and 1989. D. J., GRAEFE, G., HAIGHT, S. L. in Object-OrLented PROBE: The D, M., RICHARDSON, J, E., SCHUH, D. T., EXODUS Databases, Extensible S. Zdonik A Knowledge-Oriented Management: in avalon/C++. M., for 1988). abstraction, Systems. Integrating IEEE AND STROUSTRUP, B. The Annotated Project: Eds., Management Intelligence and An Morgan- System. Database In Technolo- 1986. of 21, 12 (Dec. Comput. DBMS and D. Maier, Database Artifwlal gies, M. Brodie and J. Myopoulos, Eds., Springer-Verlag, 17. DETL~FS, D., HERLIHY, M., AND WING, J. Inheritance properties data Database EXODUS SIGMOD AND VANDENBERG, Base of Confer- 1986 Addison-Wesley, SMITH, J. On Knowledge SZGMOD concepts D. J., RICHARDSON, J. E., AND SHEKITA, Readings 16. DA~AL, U., ND implementation 13, 3 (Sept. types, Intelligence of the ACM 15, CAREY, M. J., DEWITT, SHEWTiL on Object-Ori- 1985). In Object-OrLented Eds., and of the ACM Syst. D. J., AND VANDENBF,RG, S. L. m EXODUS F. Lochovsky, Workshop Semantics Implementation Database on Object-Oriented Proceedings 14. CAREY, M. J., DEWITT, Objects F. Proceedings understanding architecture Workshop In tlonal D. J., FRANK, D., GRAEFE, G,, RICHARDSON, J. E., SHEKITA, The 13. CAREY, M. J., DEWITT, EXODUS. T. E. Extensible Artificial M. the International H In Trans. 17, 4 (Dec. Integrating MURALIKKISHNA, Interns 1988). KORTH, AND WISE, Management: 12. CAREY, M. J., DEWITT, Sept. 1987). T. Y., SurL. of the 2nd FRG. databases, language. Cornpuf. M. Proceedings AND WEGNER, ACM CAREY, In (Bad in object-oriented 9. BATORY, D. S., LEUNG, data et al. synchromzatlon and recovery 1988). C++ Reference Manual. Addison-Wesley, 1990. 19. FORD, S , JOSEPH, J., LANGWORTHY, SPARACIN, D., THATTE, object-oriented Verlag, S., WELLS, programming LIVELY, son-Wesley, SIGMOD DEWITT, Conference HANSON, E. 23, HANSON, In D. (San An initial E., Advances S. G., PEREZ, ZEITGEIST: in Object-Ortented Smalltalk-80: E., PETERSON, Database Database The Language Conference (Phoenix, AZ, Oct. optimizer CA, May, on the design T., AND ROTH, persistent the ACM The EXODUS Francisco, report HARVEY, object-oriented M. 25. HEIMBIGNER, HERLIHY, PA, July, D. M., Proceedings 27. HOSKING, programming support S.vstems, R., for Springer- Vineyard, Rationale 29. IOANNIDIS, zts Implernentatlon, Addi- generator. In Proceedings of the ACM 1987). of ariel SIGMOD in language on Object-Oriented and Programming Record DBMS 18, 3 (Sept implementation a database toolkit. In Systems, Languages of the Triton nested 1989) using an Proceedings of and Applications 1991). Personal AND WING, of the 17th communication, J. Avalon: International design Oct. Language database 1991. support Symposium relational for reliable on Faalt-Tolerant distributed systems. Computing (Pittsburgh, In 1987). A., Proceedings 28. ICHBIAH, and Experiences 24. HARVEY, T. SCHNEPF, C.. AND ROTH, M. The system. SIGMOD Record 20, 3 (Sept. 1991). ACM PATHAK, 1983. 21. GRAEFE, G., mu 26 D., 1988. 20. GOLDBERG, A., AND ROBSON, D. 22 D., D., AND AGARWALA, AND Moss, of MA, the Sept. 4th J. E. B. Y., TransactIons compile-time Workshop on optimizations Persistent for Object persistence. Systems In (Martha’s 1990). J., BARNES, J., HELIARD, for the Towards International design AND Lmriy, J., KRIEG-BRUCKNER, of the Ada M. on Programmmg programming MOOSE: Languages Modehng B., ROUBINE, language. objects and Systems, Vol O., mm SZGPLAN in Notices a simulation 15, No 3, July WICHMANN, environment. 1993 B. 14, 6 (1979). In The Design of the E Programming Proceedings of IFIP 1989, llth World Computer Congress Language Francisco, (San . 533 CA, Aug. 1989), 821-826. 30. KERNIGHAN, 31. LAMB, B., AND RITCHIE, C. LANDIS, Commun. ACM G., D. The C Programming ORENSTEIN, Language. J., AND WEINREB, D. The 33. MAIER, D., STEIN, J., OTIS, A., AND PURDY, A. of the ACM Oriented Conference (Portland, and Applications 34. MEYER, B. Genericity Programming Object-Oriented A Proposal Functional 37. PAIZPKR, A. European Software (Martha’s Systems Compiled In Item Vineyard, of the ACM SIGMOD 45. Conference SCHAFFERT, reference C., Trellis/Owl. tems, In Database 48. International M. The Nov. tional Issues and implementafor Workshop Managing Applications C. data Trellis B., KILIAN, model. In object-based M., Conference (Portland, level and Updates in RIGEL. Proceedings MA, CHANG, Sept. 13th Language AND WILPOLT, C. Sept. constructs An introduction to Programming Sys- 1986). for data of type relation. ACM Trans. W., D. J. Persistence International in E revisited—ImpIementation Workshop on Persistent Object FREYTAG, J. C., in the Starburst on ObJect-Oriented International Systems LOHMAN, database Database Systems G., MCPHERSON, J., MOHAN, system. In Proceedings (Pacific Grove, CA, 1986). Cricket: Workshop A mapped, on Persistent persistent Object C., AND of the Interna- Defining and object store. In Proceedings Systems (Martha’s Vineyard, MA, 1990). 52. EXODUS IL, a 1990). 51. SH~KITAj E. J. AND ZWILLING, M. Sept. of the environment: on Object-Oriented Oregon, language of the 4th Extensibility Workshop 4th in ObJect In Proceed- Proceedings 50. SIIAW, M., WULF, W., AND LONDON, R. Abstraction and verification in Alphard: ACM 20, 8 (Aug. 1977). specifying iteration and generators. Commun. of the 1/0 on Persistent 1985. of the ACM Vineyard, SCHWARZ, P., Ph.D. dissertation, 2, 3 (1977). In PIRAHESH, H. language. 1987). SCHUH, D. T., CAREY, M. J,, AND DEWITT (Martha’s for Database System Imple(San Francisco, CA, Technique Views, POSTGRES England, high In Languages (1979). BULLIS, and Syst. experiences. 49. A New Proceedings Some DBMS. Systems, in the E language: of the 4th COOPER, T., Languages Object-Oriented of the Conference implementation Data Abstraction, DEC-TR-372, 47. SCHMIDT, J. W. SZGMOD Faulting C., COOPER, T., AND WILPOLT, 46. SCHAFFERT, Constructs of the ACM Conference (Brighton, manual. on Lwp MA, Sept. 1990). 44. ROWE, L., AND STONEBRAKER, VLDB Symposium In Proceedings 1988). Programming Programming Proceedings 43. ROWE, L., AND SCHOENS, K. ings 1984 FL, Oct. 1987). E: A persistent systems Madison, Aug. 1989. Language. 1988. of the in the GemStone on Object-Oriented on Object- Oregon, Sept. 1986). of CLOS Persistence. (Oslo, Norway, 41. RICHARDSON,J. E., AND CAREY, M. J. Persistence Exper. 19, 12 (Dec. 1989). tion. So@,o, Pratt. Persistent In Languages Conference Programming Class Modification Conference (Orlando, 42. RICHARDSON, J. E. DBMS. Systems, (Portland, Prentice-Hall, implementation 39. RICHARDSON,J. E., AND CAREY, M. J. mentation in EXODUS. In Proceedings May 1987). 40. RICHARDSON, J. E. Univ. of Wisconsin, in CLU. of an object-oriented of the ACM ML. In Proceedings York, Aug. 1984). on Object-Oriented 38. PENNY, D. J., AND STEIN, J. and Applications mechanisms Programming and Applications Construction. (New PCLOS: A Flexible of the ACM Development In Proceedings Languages for Standard Programmmg Conference Proceedings system. OR, 1986). Systems, 35. MEYER, B. Abstraction on Object-Oriented Versus Inheritance. 36. MILNER, R. and 1978. database 10 (Oct. 1991). 34, 32. LISKOV, B., SNYDER, A,, ATKINSON, R., AND SCHAFFERT, C. Commun. ACM 20, 8 (Aug. 1977). Proceedings Prentice-Hall, ObjectStore May, Software Demonstration. Presented at the ACM SZGMOD Conference (Chicago, 1988). 53. SKARRA, A., .AND ZDONIK, ACM S. B. Transactions Type evolution on Programming in an object-oriented Languages and Systems, database. In Research Vol. 15, No, 3, July 1993. 534 . J. E Richardson Directions in et al. Object-Or[ented Programmmg, B. Shriver and P. Wegner, Eds., MIT Press, 1987, 54. SOLOMON, M. Personal 55. STONEBRAKER, M. the 2nd communication. Inclusion International of new Conference 56. STROUSTRUP,13. The C++ 57. STROUSTRUP, B. Jan. types on Data Programmmg Parameterized types (Denverj CO. Oct. 1988). Programming 58. STROUSTROP,B. The C++ 1992. in relational database Engineering Language. for C++. systems. Addison In In Proceedings of CA, Feb. 1986). (Los Angeles, Wesley, proceedings 1986. of the USENIX Ed. Addison-Wesley, 1991. C++ Conference 59. TARR, P., WILEDEN, In Proceedings Vineyard, SIGMOD 61. WHITE, Received ACM Sept. A. 4th Data persistence TransactIons Workshop Management D. in as a Computer 1989; 2nd and Limiting PGraphite-style on Persistent Object Systems Persistence. (Martha’s Facihties of PLAIN. In Proceedings of the ACM (1979). S., AND DEWITT, February International Language, Extending 1990). The Conference supporting available of the MA, 60. WASSERMAN, J., AND CLARKE, L. Pointer the Sciences revised on Programmmg swizzling E programming Dept. December Languages in virtual memory: language. Tech. Rep., Univ. 1990 and January and Systems, An alternative Submitted of Wisconsin, 1992; for Madison, accepted Vol. 15, No, 3, July approach publication. March 1993 Feb. 1992 to (Also 1992.)