AllegroCache � Object Persistence in Lisp

Jans Aasman, John Foderaro

Franz Inc.

Introduction

The Common Lisp community doesn�t really have a common story when it comes to object persistence as an orthogonal feature to the language. We don�t even have a standard way to do object serialization or industrial-grade B trees as a native data structure.

Franz decided to explore a new object-oriented database design, AllegroCache, for Lisp application developers. Several people have pointed out to us that the majority of the IT industry is moving in the direction of Object Relational Mappings on top of relational databases, and that our object-oriented database might be the wrong thing at the wrong time.

While those concerns are not without merit, they overlook some critical developments in the IT industry especially regarding the Lisp community.

First of all, we see more and more of our customers trying to solve extremely complex problems with very large data sets. Some of these problems naturally fit the relational data model and their needs can be easily satisfied with our direct interfaces to relational databases like MySql and Oracle. However, some problems cannot easily fit into a relational data model. These are problems where the data is best described as a complex graph or pointer space, and where the graphs are many times larger than memory.� Furthermore, the data may be very heterogeneous and the data schema may need to evolve frequently.�� We believe that such situations will increasingly be more prevalent in the commercial sectors, and a better database choice is clearly needed.

Secondly, we believe that the Lisp community will not win the inter-language competition by addressing problems that other languages and relational databases can handle adequately.� Complex problems and complex data models are where we can carve out a niche with a modest amount of effort in an object-oriented database.

In this paper we will first describe AllegroCache from the perspective of the Lisp programmers.� We then discuss AllegroCache from the perspective of a modern database.� Next we will talk about the implementation of AllegroCache, particularly our experience with the Btrees AllegroCache uses. We will conclude with some applications and prototypes that have already benefited from this new object-oriented database.

From The Perspective of Lisp Programmer

The best way for a Lisp programmer to get a feel for what AllegroCache can do is to try out the demo program in the appendix.� (You can get an evaluation copy of AllegroCache by emailing us at [email protected].)� The following describes the key functionality of AllegroCache.

1. A new persistent-class metaclass

We added a new CLOS metaclass called �persistent-class�.� Instances of a CLOS class of persistent-class are automatically stored on disk whenever a �commit� is called.� New keywords introduced for object slots include :index.

(defclass person ()

� ((name :index :any)

(soc-security :index :any-unique)

(sex)

(dob)

(children)

(city))

�(:metaclass persistent-class))

2. Commit/Rollback

Once a program opens one (or more) database with

(open-file-database �database name� :if-does-not-exist :create :if-exists :supersede)

The Lisp session will be in a transaction until the database is closed.� It is up to the program to do a (commit) or a (rollback).� A commit will make all changes to the data permanent on disk.� A rollback on the other hand will clear all new objects and changes to objects since the last commit.

3. Unique object identifiers

Every instance created for a persistent-class will have an object identifier (oid) that is guaranteed to be unique and will never change during the lifetime of the database. Programs will have access to this identifier so that any object can be retrieved or deleted based on its oid.

4. Referential Integrity

In many cases, the application data model resembles a complex graph or pointer network, in which instances point to other instances.� When an object in this space is deleted, AllegroCache will silently and automatically turn all references to the deleted-object to nil.

5. Maps and Indexes

AllegroCache introduces the general concept of a persistent Map. A Map is a set of key-value pairs where the keys and values take almost any Lisp type.� Maps are always cached in memory and stored persistently on the disk.� A Map can be 1-1, many-1, 1-many, or many-many.� A Map where the value is the oid of a persistent object in AllegroCache is called an Index. When a class is defined, the program can specify which slots should have Indexes maintained for them.

6. Sets

In some cases the value of a slot can be a list of lisp objects. When the number of objects is low, either a list or a vector can be used to store the oid�s. When the number of objects becomes large, then it will be more efficient to store them in a set.

7. Redefining Classes

One of the unique features of AllegroCache is that you can redefine classes within your program on the fly.� When the class definitions are redefined at run-time, AllegroCache will lazily update existing instances to the new class definition whenever you access them.

8. Retrieving objects

There are basically two ways to retrieve objects from the database. If there is an index for the object, you can then use

(retrieve-from-index �person �name �Jans Aasman�).�

You can also loop through all object instances to retrieve the one desired.

(doclass (e �person)

(when (equal (name e) �Jans Aasman�)

�� (whatever e)))

9. Prolog as retrieval language

The methods described in (8) above are adequate for retrieving objects from disk or from the cache directly. However, when working with deeply nested structures, and when it becomes difficult to remember which slots were indexed, a more declarative way would be easier to express search or graph-matching algorithms.� AllegroCache provides Prolog as an alternative retrieval language, which appears to be the most stable and easiest way to formulate complex queries based on our experience.

The Database View

The previous section describes AllegroCache purely from a Lisp programmer�s perspective to enable persistent CLOS objects.�� In reality, AllegroCache is also a full-fledged, production database.�� As such, it entails all essential properties of a commercial database, similar to the typical Relational Database Management System (RDBMS)

1. Multi-user environments

AllegroCache 1.0 can run as a standalone database or as a server that serves multiple clients or processes.

In the standalone version the Btrees, database-manager and the application all reside in the same Lisp process.� In fact, the application can have more than one database open in the same Lisp process simultaneously.� Normally when working with one database, all AllegroCache database manipulation functions will apply to that database.� In the case of opening multiple databases in the same Lisp process, special wrapping of the database manipulation functions is required.

2. ACID

AllegroCache complies with the essential ACID requirements for a reliable database.� Specifically, a production-grade database management system must achieve four goals: Atomicity, Consistency, Isolation and Durability (ACID). Databases that fail to meet any of these four goals can not be considered reliable.

� Atomicity states that database modifications must follow an all or nothing rule. A transaction is said to be atomic, if one part of the transaction fails, the entire transaction fails.

� Consistency states that only valid data will be written to the database.� If a transaction violates the databases consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules.

� Isolation requires that multiple transactions occurring at the same time not affect each other�s execution.� For example, if one application issues a transaction against a database at the same time that another application issues a different transaction against the same database, both transactions should operate on the database in an isolated manner.� Note that the isolation property does not dictate which transaction will execute first, merely that they will not interfere with each other.

� Durability requires that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs to restore all committed transactions prior to any software or hardware failures.

3. Transactional Model

AllegroCache supports a full transactional model.� It can do long and short transactions, commits and rollbacks, and the database supports recovery to a consistent state should the machine fail during the commit.

AllegroCache can be viewed as a shared data store and a set of one or more clients each with a private data store.� The shared data store is only changed when a successful commit is done by one of the clients. The shared data store is thus advanced from one state to another continuously by the commit function. The state of the shared data store is denoted by a number called the transaction number which increments on every successful commit. When a client first starts up and connects to the shared data store, it is given the current transaction number.� All references that client makes to the shared data store will be with respect to that transaction number. Until the client does a commit or rollback the client sees the shared data store in the same state it was in when it first connected to the shared data store, even if other clients commit new data to it.� Therefore, the shared data store holds not only the current state of the database but also previous states as well.� This architecture allows a very robust transaction model suitable for any mission-critical application.

4. Concurrency

In general AllegroCache works best with the optimistic model for concurrency.� This means that the program checks for conflict only when committing the data.� AllegroCache users can implement more pessimistic advisory locks if so desired.

From C-based Btrees to Lisp-based Btrees

At the heart of every large database is a set of Btrees (actually they are B trees but we will call it simply Btrees in this paper).� The 1st prototype of AllegroCache used the Btrees from the Berkeley DB library (www.sleepycat.com).� Berkeley DB is a highly efficient C-based Btree system that is available on almost every platform. Initially we used the full Berkeley DB Transactional Data Store that provides for full ACID transactions and recovery.� However, with multiple clients, performance degrades rapidly.� Therefore, the AllegroCache prototype did all database management in Lisp, and used only the plain Berkeley DB Data Store. But even with that interface, AllegroCache encountered some technical issues.� In order to ensure isolation, the Berkeley DB code employs aggressive locking and uses low level OS locking primitives.� The result is that the Lisp application just froze when the process was waiting for access to data. This does not fit well with Lisp's own multithreading.

So Franz decided to have a completely Lisp-based Btree system, loosely taking after Knuth�s original algorithm. The Lisp-based Btree provides all the functionality AllegroCache needs, with no loss of speed and scalability (in fact, it runs faster than the C-based Btrees).� The Lisp Btree handles literally billions of key/value pairs with ease, and is incorporated into the current 2^nd Alpha release.

Besides resolving technical difficulties, a full Lisp implementation of the Btrees offers the following advantages:

1. More flexible caching of Btree pages. Lisp programmers now have full access to the main statistics of the Btree performance (cache hits, cache misses, access to overflow area�s, density of nodes and leaves, etc) so the size of the buffer for the Btree pages can be managed on the fly for best performance.� One could even imagine a rule-based caching system that dynamically adapts to demand.

2. Less copying and consing. With an external Btree such as Berkeley DB, AC has to copy data from C space into Lisp space.� With Lisp Btrees, AC can read the keys and values directly from the block buffers of the Btree code.� Therefore, it can do many more manipulations with keys and values without having to copy any data.

3. Easier comparison functions. Btrees keep keys in the leaves in sorted order. To do so, it requires comparison functions.� If you need a separate binary encoding scheme for keys in Berkeley DB, you had to write your own comparison functions in C.� With the Lisp Btrees, you can easily specify both the encoding schemes for the keys and the comparison functions.

4. Easier and better locking. Berkeley DB used C style locking, which doesn�t fit well with Lisp multithreading. With Lisp Btree, AC now has full control over its locking behavior.

5. Better cursor behavior. Looping over leaves in a Btree is done with cursors. In Berkeley DB you have to close the cursors explicitly or they would stay in memory forever. Worst, if you forget to close a cursor and you close the database, it would be impossible to open the database again sometimes.� With the Lisp Btrees, the cursors are automatically garbage collected when they are no longer needed.� AC also implements floating cursors. So even if other processes add new keys/values to, say the left in the leave chain, the cursor would stay at the same logical spot in the leaf.

6. Easier to distribute. Franz can now distribute AllegroCache on all the 20 supported platforms easily. With Berkeley DB, we had to create library distributions for every supported platform.� This is also advantageous for customers who build applications with AllegroCache.�� They too don�t have to worry about the deployment platform their client requires.

7. Another feather on the Lisp cap.� The final advantage is that we now have another demonstration of a real world product that is just as fast as its C counterpart.

Applications and Prototypes

So far, we have worked with a number of partners on applications and prototypes of AC.� For example, Pepite from Belgium runs its Common Lisp based data-mining package on AllegroCache, Tellme is experimenting with a personal directory service for mobile phones on AllegroCache. An Indian Lisp company, KIDO Software, is using AllegroCache for a fraud detection system over call detail records for a commercial telecomm company. The BioLingua project at Stanford is experimenting with AllegroCache to store its frame-based biology knowledge.

Appendix

The following example code shows the main properties of AllegroCache. (You can get an evaluation copy of AllegroCache by emailing us at [email protected].)� The following program will be included in the evaluation version.

; cd to the directory where you store this tutorial and evaluate every

; expression step by step (don't load the program in at once)

; :cd c:/franznetwork/lisp/allegrocache/v8/tutorials/

(load "exampleload.cl") ; load allegrocache and prolog and some utilities

(in-package :example)

(open-file-database "/s/ja/temp/test3" ; please adapt to your filesystem��

�� :if-does-not-exist :create

�� :if-exists :supersede)

; defclass* is a macro� that writes the full defclass expression..

(defclass* person ()

� ((id :index :any-unique)

�� (first-name :index :any)

�� (last-name :index :any)

�� (city :index :any)

�� birth-date

�� children

�� sex

�� relationship))

#| this is what the macro expands into

(defclass person ()

� ((id :index :any-unique :accessor id :initform nil :initarg :id)

�� (first-name :index :any :accessor first-name :initform nil :initarg :first-name)

�� (last-name :index :any :accessor last-name :initform nil :initarg :last-name)

�� (city :index :any :accessor city :initform nil :initarg :city)

�� (birth-date :accessor birth-date :initform nil :initarg :birth-date)

�� (children :accessor children :initform nil :initarg :children)

�� (sex :accessor sex :initform nil :initarg :sex)

�� (relationship :accessor relationship :initform nil :initarg :relationship))

� (:metaclass persistent-class))� ; note the metaclass specification

|#�

(defclass* relationship ()

� (p1 p2 start-date end-date relationship-type))

(defclass* city ()

� ((name :index :any)

�� inhabitants state))

; a macro that defines a nicer way to print the objects..

(defprinter person first-name last-name)

(defprinter city name)

(defprinter relationship p1 p2)

;=====

; here is a list that defines a family. So person 1 is Kathy

; Smith, she lives in Oakland, born in 1932-4-16, female and

; had a child, person #5 who is John Smith, etc...

(setf family '((1 Kathy Reagan Oakland 1932-4-16 female (5))

�� (2 Bob Smith Oakland 1934-7-10 male (5))

�� (3 Jane Becker Boston 1928-6-13 female (6 9 10))

�� (4� Mike Sears Boston 1926-5-12 male (6 9 10))

�� (5 John Smith Berkeley 1958-7-29 male (7 8))

�� (6 Debra Sears Berkeley 1956-12-18 female (7 8))

�� (7 Christopher Smith Berkeley 1992-9-1 male nil)

�� (8 Steve Smith Berkeley 1994-10-11 male nil)

�� (9 Ernestine Sears Chicago 1958-7-26 female nil)

�� (10 Johannes Sears Seattle 1962-1-1 male (11))

�� (11 Suzie Sears Seattle 2004-1-1 female nil)))

��

; so persons 1 and 2 from family are married.

(setf relationship '((1 1 2 married 1958-4-1)

�� (2 3 4 married 1955-5-1)

�� (3 5 6 married 1994-4-4)))

; and some information about the cities.

(setf cities '((1 Oakland 700000 California)

�� (2 Boston 500000 Massachusetts)

�� (3 Berkeley 200000 California)

�� (4 Chicago 2000000 Illinois)

�� (5 Seattle 600000 Washington)))

��

; now fill in the instances..

(dolist (e family)

� (make-instance 'person :id (nth 0 e)

�� :first-name (nth 1 e) :last-name (nth 2 e) :city (nth 3 e)

�� :birth-date (nth 4 e) :sex (nth 5 e) :children (nth 6 e)))

; the child slot contains numbers, this form will replace the numbers

; with the actual objects

�

(doclass (e 'person)

�� (setf (children e)

�� (mapcar #'(lambda (x) (retrieve 'person 'id x))

�� (children e))))

(dolist (r relationship)

� (make-instance 'relationship

�� :p1 (retrieve 'person 'id (nth 1 r))�

�� :p2 (retrieve 'person 'id (nth 2 r))

�� :relationship-type (nth 3 r) :start-date (nth 4 r)))

(dolist (c cities)

� (make-instance 'city :name (nth 1 c) :inhabitants (nth 2 c)

�� :state (nth 3 c)))

; do a commit, everything would work the same if you didn't do a

;� commit..

(commit); the following reads as follows:

; this demonstrates how you can change a class definition

(defclass* city ()

� ((name :index :any)

�� inhabitants state

�� mayor)) ; <-- add the mayor slot..

; create a new instance...��

(make-instance 'city :name �SanFrancisco :inhabitants 400000

� :mayor 'GavinNewsom)

(commit)

(mayor (retrieve 'city 'name �SanFrancisco)) ; will return GavinNewsom

(mayor (retrieve 'city 'name �Boston))�� ; will obviously return nil

; low level retrieve

(doclass (e 'person)

�� (print e))

(retrieve 'person 'first-name 'Bob) ; works if you have an index

; if you provide the :oid keyword you only get the oid back.

(setf oid (retrieve 'person 'first-name 'Bob :oid t))

(oid-to-object (find-class 'person) oid)

; prolog as one retrieval language

�you need to know a little bit of prolog to understand something like

�(<-- (father ?x ?y)

�� (db person ?x sex male children ?c)

�� (member ?y ?c))

��

?x is the father of ?y if

in the db there is an object ?x of class person that is male and has a

�list of children ?c and ?y is a member of ?c.

��

the result of running (?- (father ?x ?y)) would be

example(26): (?- (father ?x ?y))

?x = #<person Bob Smith>

?y = #<person John Smith>

?x = #<person Mike Sears>

?y = #<person Debra Sears>

�

(<-- (father ?x ?y)

�� (db person ?x sex male children ?c)

�� (member ?y ?c))

(<-- (mother ?x ?y)

�� (db person ?x sex female children ?c)

�� (member ?y ?c))

(<-- (parent ?x ?y)

�� (father ?x ?y))

(<- (parent ?x ?y)

�� (mother ?x ?y))

��

(<-- (grandparent ?x ?y)

�� (parent ?x ?z)

�� (parent ?z ?y))

(<-- (grandchild ?x ?y)

�� (grandparent ?y ?x))

(<-- (ancestor ?x ?y)

�� (parent ?x ?y))

(<-� (ancestor ?x ?y)

�� (parent ?x ?z)

�� (ancestor ?z ?y))

(<-- (descendent ?x ?y)

�� (ancestor ?y ?x))

(<-- (parent-child-in-different-state ?x ?y)��

�� (db person ?x city ?n1)

�� (parent ?x ?y)

�� (db city ?c1 name ?n1 state ?state1)��

�� (db person ?y city ?n2)

�� (db city ?c2 name ?n2 state ?state2)

�� (not (= ?state1 ?state2)))

(<-- (female ?x)

�� (db person ?x sex female))

(<-- (male ?x)

�� (db person ?x sex male))

(<-- (aunt ?x ?y)

�� (father ?z ?x)

�� (female ?x)

�� (father ?z ?w)

�� (not (= ?x ?w))

�� (parent ?w ?y))

(<-- (uncle ?x ?y)

�� (father ?z ?x)

�� (male ?x)

�� (father ?z ?w)

�� (not (= ?x ?w))

�� (parent ?w ?y))

(<-- (nephew ?x ?y)

�� (aunt ?y ?x)

�� (male ?x))

(<- (nephew ?x ?y)

�� (uncle ?y ?x)

�� (male ?x))

(<-- (niece ?x ?y)

�� (aunt ?y ?x)

�� (female ?x))

(<- (niece ?x ?y)

�� (uncle ?y ?x)

�� (female ?x))

; using names instead of instances..

(<-- (father2 ?x ?y)

�� (db person ?x1 first-name ?x sex male children ?c)

�� (member ?y1 ?c)

�� (db person ?y1 first-name ?y))

;; some fun metaprogramming...

(load "metarelations.cl")

(clear-relations)

(register-relations

�'father 'mother 'parent 'grandparent 'grandchild

�'ancestor 'descendent 'parent-child-in-different-state

�'aunt 'uncle 'nephew 'niece)

so now you can ask for all the relations that exist between ?x and ?y

(<-- (relation ?x ?y father) (father ?x ?y))

(<- (relation ?x ?y mother) (mother ?x ?y))

(<- (relation ?x ?y parent) (parent ?x ?y))

(<- (relation ?x ?y grandparent) (grandparent ?x ?y))

(<- (relation ?x ?y grandchild) (grandchild ?x ?y))

(<- (relation ?x ?y ancestor) (ancestor ?x ?y))

(<- (relation ?x ?y descendent) (descendent ?x ?y))

(<- (relation ?x ?y parent-child-in-different-state) (parent-child-in-different-state ?x ?y))

(<- (relation ?x ?y aunt) (aunt ?x ?y))

(<- (relation ?x ?y uncle) (uncle ?x ?y))

(<- (relation ?x ?y nephew) (nephew ?x ?y))

(<- (relation ?x ?y niece) (niece ?x ?y))

(<-- (person ?x)

�� (db person ?x))

; prologgers will know how to read this

; note how we start prolog within lisp and then within prolog call

; lisp again.

(defun show-all-relations ()

� (prolog (person ?x)

�� (person ?y)

�� (not (= ?x ?y))

�� (bagof ?r (relation ?x ?y ?r) ?bag)

�� (lisp (print (list ?x ?y ?bag)))))

(show-all-relations)

;; maps

(setf m1 (open-map "square" :if-does-not-exist :create

�� :if-exists :open))

(setf (map-value m1 'eurolisp2005) 'Amsterdam)

(map-value m1 'eurolisp2005)

(setf foo (list

�� 1.001

�� "a relatively short string"

�� '(1 2 3)

�� 'horse

�� #(1 2 3)

�� (retrieve 'person 'first-name 'John)

�� m1

�� ))

; everything that can be stored in allegrocache can be used as a key for a map. So

; in the following example we take every item in foo both as a key and as a value..

(dolist (e foo)

� (setf (map-value m1 e) e))

(dolist (e foo)

� (print (map-value m1 e)))

(map-map #'(lambda (k v)

�� (print (list k v)))

�� m1)

(commit)

;; sets

(setf set (make-instance 'ac-set))

(push set foo)

(dolist (e foo)

� (add-to-set set e))

(commit)

(doset (var set)

�� (print var))

(remove-from-set set '(1 2 3))

(doset (var set) (print var))

;a typical usages of sets as slots of objects..

(setf p1 (make-instance 'person

�� :first-name 'fritz :last-name 'kunze

�� :children (make-instance 'ac-set)))

(add-to-set (children p1)

�� (make-instance 'person :first-name 'michael

�� :last-name 'kunze))

(add-to-set (children p1)

�� (make-instance 'person :first-name 'lauren

�� :last-name 'kunze))

(doset (ch (children p1))

�� (print (first-name ch)))

(commit)

;(close-database *allegrocache*)

;;; frames

; the autor implemented a very simple example of how to implement a

; frame system on top of allegrocache. It is forbidden to use it for any

; serious work :-) it is just an example of how to do frames on top of objects.

(load "../frames/funframes.fasl")

(start-frame-system)

(defframe 'animal :isa 'living-thing)

(defframe 'mammal :isa 'animal :legs 4 :eyes 2)

(defframe 'dog :isa 'mammal)

(defframe 'fido :isa 'dog)

(fsv! 'fido :owner 'jans) ; add a new slot to fido

(fsv 'fido :legs) ; use the isa hierarchy to derive that fido has 4 legs.

;;; xml

; The function objectify-xml is a function that takes an arbitrary XML

; document, analyses it and turns it into CLOS objects that are stored

; in the database.

(objectify-xml :file #P"../xml-demo/sersamp2005.xml"

�� :persistent t)

; this shows all the classes that are built from the xml file

(dump-schema)

; this shows that class-definitions are first class objects in the

� database.

�

(doclass (e 'ac-class)

�� (print (ac-class-name e)))

; and this uses some of the classes read in.

(doclass (e 's::DateCreated)

�� (print (s::Year e)))