Direct memory access
Each record is stored as an array (N-tuple) of integers, configurable as either 32 or 64 bits wide. Each integer in the tuple either encodes a value directly or holds a pointer to it. Columns have no type: any encoded value can be stored in any field.
You can always get a direct pointer to a record, store it in a field of a record, or use it directly in your own program. A record pointer thus serves as an automatically assigned id that gives access to the record without any search.
To search for a record, either scan the chain of all records, scan a sublist or tree you have built yourself, or perform an index search on an indexed field.
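The record model above can be sketched in a few lines of C. This is an illustration only, not WhiteDB's actual layout or API: the names `arena_t`, `rec_create`, `rec_field` and `rec_find`, the two-cell record header and the fixed-size arena are all assumptions made for the example, and values are stored raw, without type tags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not WhiteDB's real format. */
typedef int64_t cell_t;                 /* one field (64-bit build assumed) */

enum { ARENA_CELLS = 1024 };

typedef struct {
    cell_t cells[ARENA_CELLS];          /* stands in for the shared segment */
    cell_t used;                        /* index of the next free cell */
    cell_t first, last;                 /* chain of all records, 0 = none */
} arena_t;

/* Offset 0 is reserved to mean "no record". */
static void arena_init(arena_t *a) { a->used = 1; a->first = 0; a->last = 0; }

/* A record is [next, nfields, field0, field1, ...].
   Returns the record's offset, or 0 if the arena is full. */
static cell_t rec_create(arena_t *a, cell_t n) {
    if (n < 0 || n > ARENA_CELLS - 2 - a->used) return 0;
    cell_t off = a->used;
    a->used += 2 + n;
    a->cells[off] = 0;                  /* no next record yet */
    a->cells[off + 1] = n;
    for (cell_t i = 0; i < n; i++) a->cells[off + 2 + i] = 0;
    if (a->last) a->cells[a->last] = off; else a->first = off;
    a->last = off;
    return off;
}

/* Direct access: the offset is the id, so no search is needed. */
static cell_t *rec_field(arena_t *a, cell_t rec, cell_t i) {
    assert(i >= 0 && i < a->cells[rec + 1]);
    return &a->cells[rec + 2 + i];
}

/* Full scan of the record chain: first record whose field `col`
   equals `value`, or 0 if there is none. */
static cell_t rec_find(arena_t *a, cell_t col, cell_t value) {
    for (cell_t r = a->first; r != 0; r = a->cells[r]) {
        if (col >= 0 && col < a->cells[r + 1] && a->cells[r + 2 + col] == value)
            return r;
    }
    return 0;
}
```

Storing the offset returned by `rec_create` in a field of another record links the two, and following that link later costs one array access, whereas `rec_find` has to walk the whole chain.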
Data encoding
The low bits of an integer in a record indicate the type of data. Anything that does not fit into the remaining bits is allocated separately, and the integer then holds a pointer to it.
The datatypes are null, record (pointer), integer, double, string, XML literal, URI, blob, char, date and time.
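As a rough illustration of low-bit tagging, the sketch below packs a small integer or an 8-byte-aligned offset into one 64-bit word. The three-bit tag width, the tag values and all function names are assumptions chosen for the example; WhiteDB's real bit layout is not given here and may well differ.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t enc_t;                  /* one encoded field */

/* Hypothetical tag layout: 3 low bits, so 8 possible tags. */
enum { TAG_BITS = 3, TAG_SCALE = 1 << TAG_BITS, TAG_MASK = TAG_SCALE - 1 };
enum { TAG_NULL = 0, TAG_SMALLINT = 1, TAG_RECORD = 2 };

static int enc_tag(enc_t e) { return (int)(e & TAG_MASK); }

/* Only values that leave room for the tag can be stored inline. */
static int fits_smallint(int64_t v) {
    return v >= INT64_MIN / TAG_SCALE && v <= INT64_MAX / TAG_SCALE;
}

/* Multiplication and division are used instead of shifts so that
   negative values stay well defined in C. */
static enc_t enc_smallint(int64_t v) {
    assert(fits_smallint(v));
    return v * TAG_SCALE + TAG_SMALLINT;
}

static int64_t dec_smallint(enc_t e) {
    assert(enc_tag(e) == TAG_SMALLINT);
    return (e - TAG_SMALLINT) / TAG_SCALE;
}

/* An 8-byte-aligned offset has three zero low bits, so the tag
   can simply be OR-ed in. */
static enc_t enc_record(int64_t offset) {
    assert(offset >= 0 && offset % TAG_SCALE == 0);
    return offset | TAG_RECORD;
}

static int64_t dec_record(enc_t e) {
    assert(enc_tag(e) == TAG_RECORD);
    return e & ~(enc_t)TAG_MASK;
}
```

A value for which `fits_smallint` returns 0 would, under this scheme, be stored in separately allocated memory and referenced through a tagged offset in the same way `enc_record` references a record.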
Long strings are stored uniquely: using the same string in many fields takes up no additional space and allows a fast string equality check.
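Unique storage of strings is commonly called interning, and a minimal version might look like the following. The pool structure, its sizes and the names `strpool_t`, `pool_init` and `pool_intern` are hypothetical, and the linear lookup is chosen only for brevity; a production implementation would typically use a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { POOL_BYTES = 4096, POOL_MAX = 128 };

typedef struct {
    char   bytes[POOL_BYTES];           /* string data, NUL-terminated */
    size_t used;                        /* next free byte */
    size_t offsets[POOL_MAX];           /* start of each unique string */
    size_t count;
} strpool_t;

/* Offset 0 is reserved to signal failure. */
static void pool_init(strpool_t *p) { p->used = 1; p->count = 0; }

/* Returns the offset of the single stored copy of `s`,
   or 0 if the pool has no room left. */
static size_t pool_intern(strpool_t *p, const char *s) {
    assert(s != NULL);
    for (size_t i = 0; i < p->count; i++) {
        if (strcmp(p->bytes + p->offsets[i], s) == 0)
            return p->offsets[i];       /* already stored: reuse it */
    }
    size_t len = strlen(s) + 1;
    if (p->count == POOL_MAX || len > POOL_BYTES - p->used) return 0;
    size_t off = p->used;
    memcpy(p->bytes + off, s, len);
    p->used += len;
    p->offsets[p->count++] = off;
    return off;
}
```

Because equal strings always map to the same offset, two fields hold the same string exactly when their offsets are equal, so the comparison is a single integer test rather than a byte-by-byte `strcmp`.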
A record pointer is a persistent offset of the record, usable as an automatic id of the record. Pointers allow fast traversal of complex data without search.
Allocation and garbage collection
Conventional malloc does not function in shared memory, since we have to use offsets instead of conventional pointers. Hence WhiteDB uses its own implementation of malloc for shared memory.
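To see why offsets are needed, consider the deliberately simplified allocator below, which hands out offsets into a fixed block and never frees anything. It is a sketch under stated assumptions, not WhiteDB's allocator: `segment_t`, `seg_init`, `seg_alloc` and `seg_ptr` are invented names, and the block is an ordinary struct standing in for a shared memory segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SEG_BYTES = 512 };

/* Stand-in for a shared memory segment. It contains no raw pointers,
   so it stays valid wherever it happens to be mapped. */
typedef struct {
    size_t        used;                 /* bytes handed out so far */
    unsigned char bytes[SEG_BYTES];
} segment_t;

/* Offset 0 is reserved to mean "allocation failed". */
static void seg_init(segment_t *s) { s->used = 8; }

/* Bump allocation: returns an offset, not a pointer. Sizes are rounded
   up to 8 bytes, which also keeps the low 3 bits of every offset zero. */
static size_t seg_alloc(segment_t *s, size_t nbytes) {
    size_t need = (nbytes + 7) & ~(size_t)7;
    if (nbytes == 0 || need < nbytes || need > SEG_BYTES - s->used) return 0;
    size_t off = s->used;
    s->used += need;
    return off;
}

/* Each process converts an offset to a pointer using its own base address. */
static void *seg_ptr(segment_t *s, size_t off) {
    assert(off < SEG_BYTES);
    return s->bytes + off;
}
```

If two processes map the same segment at different addresses, a raw pointer written by one is meaningless to the other, while an offset resolved through `seg_ptr` refers to the same bytes in both. Copying a `segment_t` to a second variable and resolving the same offset against the copy reproduces that situation within a single program.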
A record and a uniquely kept long string can be pointed to from several fields. Hence we use reference counting garbage collection embedded into our allocation algorithm when deleting records and long strings. Reference counting is incremental and does not cause long pauses.
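The idea can be illustrated with a toy heap in which every object carries a count of the fields that point to it. All names here (`heap_t`, `obj_new`, `obj_set_field`, `obj_unref`, `obj_free`) and the fixed table sizes are assumptions for the sketch; it does not reproduce how WhiteDB embeds counting in its allocator.

```c
#include <assert.h>

enum { MAX_OBJS = 16, MAX_FIELDS = 4, NO_REF = -1 };

typedef struct {
    int live;                           /* 1 while the slot is allocated */
    int refcount;                       /* fields elsewhere pointing here */
    int fields[MAX_FIELDS];             /* ids of referenced objects */
} obj_t;

typedef struct { obj_t objs[MAX_OBJS]; } heap_t;

static void heap_init(heap_t *h) {
    for (int i = 0; i < MAX_OBJS; i++) h->objs[i].live = 0;
}

/* Returns the id of a fresh object, or NO_REF if the heap is full. */
static int obj_new(heap_t *h) {
    for (int i = 0; i < MAX_OBJS; i++) {
        if (!h->objs[i].live) {
            h->objs[i].live = 1;
            h->objs[i].refcount = 0;
            for (int f = 0; f < MAX_FIELDS; f++) h->objs[i].fields[f] = NO_REF;
            return i;
        }
    }
    return NO_REF;
}

static void obj_free(heap_t *h, int id);

/* Drop one reference; the object is reclaimed as soon as none remain. */
static void obj_unref(heap_t *h, int id) {
    if (id == NO_REF) return;
    assert(h->objs[id].refcount > 0);
    if (--h->objs[id].refcount == 0) obj_free(h, id);
}

/* Free an object and release everything its fields pointed to. */
static void obj_free(heap_t *h, int id) {
    h->objs[id].live = 0;
    for (int f = 0; f < MAX_FIELDS; f++) {
        int target = h->objs[id].fields[f];
        h->objs[id].fields[f] = NO_REF;
        obj_unref(h, target);
    }
}

/* Point a field at `target`, counting the new reference first so that
   re-assigning the same target never frees it by accident. */
static void obj_set_field(heap_t *h, int id, int f, int target) {
    int old = h->objs[id].fields[f];
    if (target != NO_REF) h->objs[target].refcount++;
    h->objs[id].fields[f] = target;
    obj_unref(h, old);
}
```

In this sketch the cost of a deletion is proportional to the objects it actually releases, with no pass over the rest of the heap. Like any plain reference counting scheme, it would not reclaim objects that reference each other in a cycle.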