Recent DBMS research has established the superiority of a "column-store"
architecture for read-mostly database systems such as LucidDB.
In LucidDB, database tables are vertically partitioned and stored in a
highly compressed form. Vertical partitioning means that each page
on disk stores values from only one column rather than entire rows; as
a result, compression algorithms are much more effective because they
can operate on homogeneous value domains, often with only a few
distinct values. For example, a column storing the state component of
a US address has only 50 possible values, so each value can be stored
using only 6 bits instead of the 2-byte character strings used in a
traditional uncompressed representation.
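The 6-bit arithmetic above can be illustrated with a minimal sketch of dictionary encoding followed by fixed-width bit packing. This is not LucidDB's actual storage format; the function names (dictionary_encode, bits_needed, pack, unpack) and the sample data are hypothetical and chosen only for illustration.

```python
from math import ceil, log2

def dictionary_encode(values):
    """Map each distinct value to a small integer code."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]

def bits_needed(n_distinct):
    """Bits per code needed to distinguish n_distinct values (at least 1)."""
    return max(1, ceil(log2(n_distinct)))

def pack(codes, width):
    """Pack fixed-width codes back to back into a byte string."""
    packed = 0
    for i, code in enumerate(codes):
        packed |= code << (i * width)
    n_bytes = (len(codes) * width + 7) // 8
    return packed.to_bytes(n_bytes, "little")

def unpack(data, width, count):
    """Recover the list of codes from a packed byte string."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]

# Hypothetical sample column: eight state abbreviations, 2 bytes each when raw.
states = ["CA", "NY", "CA", "TX", "NY", "CA", "WA", "TX"]
dictionary, codes = dictionary_encode(states)
width = bits_needed(50)            # wide enough for any of the 50 states
packed = pack(codes, width)        # 8 values * 6 bits = 48 bits = 6 bytes
decoded = [dictionary[code] for code in unpack(packed, width, len(codes))]
```

In this sketch the eight values occupy 6 bytes rather than the 16 bytes of the raw two-character strings, not counting the dictionary itself, which is stored once per column.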
Vertical partitioning also means that a query that only accesses a
subset of the columns of the referenced tables can avoid reading the
other columns entirely. The net effect of vertical partitioning is
greatly improved performance due to reduced disk I/O and more
effective caching (data compression allows a greater logical dataset
size to fit into a given amount of physical memory). Compression also
allows disk storage to be used more effectively (e.g. for maintaining
more indexes).
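The column-skipping behavior described above can be sketched with a toy in-memory table that keeps each column in its own list and records which columns a query touches. The ColumnStore class, its methods, and the sample data are hypothetical and only illustrate the idea; they do not reflect LucidDB's implementation.

```python
class ColumnStore:
    """Toy vertically partitioned table: one list of values per column."""

    def __init__(self, columns):
        self.columns = columns      # column name -> list of values
        self.columns_read = []      # log of columns actually fetched

    def read_column(self, name):
        """Simulate fetching one column's pages from disk."""
        self.columns_read.append(name)
        return self.columns[name]

    def select(self, wanted, where_col, predicate):
        """Return the wanted columns for rows whose where_col passes predicate."""
        filter_values = self.read_column(where_col)
        row_ids = [i for i, v in enumerate(filter_values) if predicate(v)]
        fetched = {
            name: filter_values if name == where_col else self.read_column(name)
            for name in wanted
        }
        return [tuple(fetched[name][i] for name in wanted) for i in row_ids]

# Hypothetical four-column table; the query below needs only three of them.
table = ColumnStore({
    "id": [1, 2, 3, 4],
    "state": ["CA", "NY", "CA", "TX"],
    "amount": [10.0, 25.5, 7.25, 3.0],
    "notes": ["a", "b", "c", "d"],
})
rows = table.select(["id", "amount"], "state", lambda s: s == "CA")
```

After the query, table.columns_read lists only state, id, and amount; the notes column is never touched, whereas a row-oriented layout would have read it along with every row.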
The companion to column store is bitmap indexing, which has well-known
advantages for data warehousing. LucidDB's bitmap index
implementation takes advantage of column store features; for example,
bitmaps are built directly from the compressed row representation,
and are themselves stored compressed, reducing load time
significantly. At query time, they can be rapidly intersected to
identify the exact portion of the table that contributes to query
results. All access paths support asynchronous
I/O with intelligent prefetch for optimal use of disk bandwidth.
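Bitmap intersection at query time can be sketched as follows, using Python integers as uncompressed bitmaps in which bit i stands for row i. This is an illustrative assumption only: LucidDB stores its bitmaps compressed, and the helper names (build_bitmap_index, row_ids) and the sample columns are hypothetical.

```python
def build_bitmap_index(column):
    """Return {value: bitmap}, where bit i is set if row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def row_ids(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids

# Hypothetical columns of a five-row table.
state = ["CA", "NY", "CA", "TX", "CA"]
year = [2007, 2007, 2008, 2007, 2007]

state_index = build_bitmap_index(state)
year_index = build_bitmap_index(year)

# WHERE state = 'CA' AND year = 2007 becomes a single bitwise AND.
matching = state_index["CA"] & year_index[2007]
matching_rows = row_ids(matching)
```

Here the two predicates are combined without scanning the table itself: only rows 0 and 4 have both bits set, so only those rows need to be fetched.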
LucidDB is not suitable for use as a
transactional database. It is very fast at bulk-loading or
updating large amounts of data at once, but it is not intended to work
well for the single-row operations typical of transactional systems.
Best practice is to separate analytical systems from transactional
systems; LucidDB can be used as a data warehouse, data mart, or
operational data store in tandem with the traditional transactional
systems used as data sources.
More information on data storage and access in LucidDB is available.