This page serves as a performance tuning guide for LucidDB. Some of the detailed information is covered in other pages, which are linked from this page. The goal is to consolidate all the information and links into this page.
One of the keys to tuning performance is to monitor the current state of the system to identify potential performance bottlenecks and to assess where resources are being used. This will likely require a combination of monitoring the OS directly as well as viewing information provided by LucidDB.
Monitoring on Linux-based systems can be done using the OS commands, top and iostat. Top can tell you much CPU and memory is being used by the LucidDB process, while iostat can tell you the rate at which disk reads and writes are taking place. LucidDB integrates a graphical performance monitor as well.
To better understand the performance information provided by LucidDB, some background on the architecture of the buffer pool and I/O scheduler implementation in LucidDB might be helpful.
LucidDB implements a 2Q page victimization policy. Unlike an LRU page victimization policy, the 2Q policy is scan resistant. The scan resistant property prevents large sequential data scans from flooding the buffer pool, which would otherwise victimize pages that are more frequently referenced, e.g., index root pages. Ideally, you want to keep the index root pages cached and victimize the scan pages instead, even if the latter was more recently referenced. The LucidDB implementation also has a couple of optimizations not specified in the original 2Q algorithm, but those details are beyond the scope of this writeup.
Pages are populated in the buffer pool either when a page needs to be read, or when pages are pre-fetched. The best case scenario is all column store and index leaf pages that will need to be read are pre-fetched and remain in the buffer pool when the page is ready to be read. The worst case scenario is the page is pre-fetched but then victimized before it needs to be read, resulting not only in a wasted pre-fetch I/O, but also the I/O delay incurred by having to wait for the page to be put back into the buffer pool when it needs to be read.
To avoid flooding the buffer pool with pre-fetched pages, pre-fetch requests are rejected if there is no room in the buffer pool. In other words, victimization is never done to make room for a pre-fetch page, and therefore, pre-fetch requests are ignored in this case.
LucidDB also wants to avoid flooding the OS with pre-fetch I/O requests when it's busy with other I/O requests. When the number of asynchronous requests that LucidDB has submitted to the OS exceeds the OS's capacity, LucidDB queues the requests and resubmits (or retries) them after a short wait. Pre-fetch requests will also be put into this queue, so although they aren't completely rejected, they're not immediately processed, and therefore, are included in reject counts.
If pre-fetches are being rejected, LucidDB will automatically throttle back the pre-fetch rate by reducing the number of pages that are pre-fetched to avoid overloading the buffer pool and/or OS. Additional rejects will cause the rate to continuously throttle downward to the point where pre-fetches are completely disabled. When the rate has been throttled down, only after a certain number of successful pre-fetches have occurred, will the rate be throttled back up incrementally to its original value.
Note though that the I/O retry mechanism is only necessary when OS-level asynchronous I/O is used on Linux. This corresponds to setting the deviceSchedulerType system parameter to aioLinux, which is the default on Linux. See #I/O Scheduler for further details.
Normally, dirty pages in the buffer pool are not written to disk eagerly. The only exception is during data loads. Because the data is bulk loaded, once a column store or index page is filled, the page will no longer be written to by the current statement, so LucidDB asynchronously flushes it to disk. All other dirty pages are asynchronously written in one of three ways.
- When the buffer pool is full and space is required for a new page, a non-dirty page that is not currently in-use, is victimized to make room for the new page. To help clean up the buffer pool so more pages are available for future victimizations, while trying to locate a page for victimization, if dirty pages are encountered, a certain number of those are flushed to disk.
- LucidDB has a thread known as the lazy page writer running in the background. Every 100 ms, the thread awakens and examines the number of dirty pages in the buffer pool. If the number of dirty pages exceeds a certain high water mark, then a certain number of dirty pages are flushed. The next time the thread awakens, if that high water mark was previously hit, then dirty pages will continue to be flushed until the number of dirty pages reaches a low water mark.
- At the end of every statement, LucidDB performs a database checkpoint. Any remaining dirty pages that were modified by the statement are flushed to disk by the checkpoint. Ideally, you want the number of these pages to be minimal because the checkpoint has to wait until all of these pages have been flushed.
I/O requests are processed by a LucidDB device scheduler. There are different device scheduler implementations depending on the OS.
On Linux, there are two scheduler types -- threadPool and aioLinux. threadPool fakes asynchronous I/O by passing off the requests to a pool of threads, where each thread synchronously performs a subset of the I/O operations. It also uses buffered file access, meaning that there will be double buffering of all pages, once in the OS buffer pool and then a second time in the LucidDB buffer pool.
aioLinux uses libaio for asynchronous I/O, with O_DIRECT for unbuffered file access. The default is to to use libaio always (failing on startup if the .so is not installed).
On Windows, only ioCompletionPort is available. It uses asynchronous I/O via completion ports and unbuffered file access.
On Linux, because of the double buffering that occurs when using threadPool, there should be a performance disadvantage due to the extra memory copies. Moreover, if your LucidDB buffer pool is large, a smaller portion of physical memory is available for the OS buffer pool cache, which would further impact performance in the threadPool case. However, there are limitations with O_DIRECT on Linux that result in it performing poorly when writing out new pages in a file. (Updating an existing page is not an issue.) To avoid this, when using aioLinux, all data and temporary file space should be pre-allocated, rather than relying on LucidDB to automatically extend the file sizes.
LucidDB Performance Counters
LucidDB provides performance counters that allow you to monitor:
- The number of database checkpoints that have occurred.
- How much of space is actually being used in the database files.
- How much of the buffer pool is being used and how many of those pages are dirty.
- The number of requests made to access pages in the buffer pool and how many of those required I/O to read the page into the buffer pool.
- The number of times pages had to be victimized from the buffer pool to make room for other pages.
- The number of I/O requests that required retries.
- The number of pages read from disk into the buffer pool, how many of these were a result of pre-fetch requests, and how many pre-fetch requests were rejected.
- The number of pages written to disk and the manner in which those pages were flushed.
See LucidDbMemoryManagement for a discussion of the parameters available to tune the LucidDB buffer pool, in relation to how memory is used across all of LucidDB.
Two parameters are available to tune the rate of page pre-fetches -- prefetchPagesMax and prefetchThrottleRate. prefetchPagesMax controls the number of pages to pre-fetch in advance for each column store cluster and index scanned, as well as each temporary partition created when hash joins, hash aggregations, sorts, and internal buffers cannot be fully processed in memory.
prefetchThrottleRate comes into the picture when LucidDB throttles back the pre-fetch rate after a pre-fetch has been rejected. LucidDB throttles back the pre-fetch rate by temporarily reducing the number of pages to pre-fetch to how ever many pre-fetch pages are currently outstanding for the operation that resulted in the pre-fetch reject. prefetchThrottleRate indicates the number of successful pre-fetches that have to occur before the number of pages to be pre-fetched can increment back up by one page.
For example, if prefetchPagesMax is set to 12, a scan of a column store cluster has successfully pre-fetched 5 pages that have not yet been read, and it incurs a reject when trying to pre-fetch a 6th page, then the pre-fetch rate is throttled down to 5 pages. If prefetchThrottleRate is set to 10, after the 5 existing pre-fetched pages have been read and 10 new successful pre-fetches have occurred, then the pre-fetch rate will be throttled back up to 6 pages. After 60 more successful pre-fetches, the rate returns back to its original value of 12. Note that the throttling is localized to each individual operation that does pre-fetches, as opposed to across the entire system.
Monitoring the CachePagePrefetchesRejected and CachePagePrefetchesRejectedSinceInit performance counters will tell you if pre-fetch throttling has taken place. Ideally, these numbers should be zero or very low. Although LucidDB has the automatic throttling mechanism, it is generally better to set the pre-fetch rate to an optimal value to avoid the extra work incurred by the pre-fetch rejects.
If the pre-fetch rejects are occurring because the OS is overloaded, then prefetchPagesMax is set too high. You can determine if this is the case by monitoring the CachePageIoRetries and CachePageIoRetriesSinceInit performance counters. If these numbers are lower than the number of pre-fetch rejects, then the OS load may not be the only contributing factor. There may also be an issue with the LucidDB buffer pool. The buffer pool size (controlled by the cachePagesInit and cachePagesMax parameters) may be too small, or perhaps the cacheReservePercentage system parameter is set too low. It is best to experiment with these different parameter settings, as modifying one vs another will likely impact other parts of the system and may not result in an overall improvement.
If you are incurring no pre-fetch rejects, then that may indicate that you can increase your prefetchPagesMax setting. If so, you should experiment with larger settings to ensure that the larger setting doesn't result in rejects, and to see if it results in any performance gains.
Pre-allocating File Space (Linux Only)
As noted above, when using the aioLinux device scheduler type, you need to pre-allocate your file space for optimal performance.
LucidDB provides a utility for doing this, available in the bin directory of Linux distributions.
lucidDbAllocFile --append-pages=<number of pages> <filename>
The utility will append to the end of the file the number of pages specified. Each page is 32 KB. Note that the utility cannot be run while the LucidDB server is running.
You need to estimate the amount of space you expect your data to occupy and then pre-allocate your db.dat file accordingly. db.dat is located in the catalog directory of your installation. You should round up the number of pages to a multiple of the maximum of 2000 and the number specified by the databaseIncrementSize system parameter.
You should also pre-allocate space for the temp.dat file, specifying a number that is a multiple of the maximum of 3000 and the tempIncrementSize system parameter. You'll need to do an initial start up of LucidDB to create the file. Thereafter, the file should be in the catalog directory. The size of this file is more difficult to predict and depends on a number of factors, including your data size, the size of your LucidDB buffer pool, and the type of queries you will be executing. E.g., if you have configured a very large LucidDB buffer pool, you may not need this file to be very big, if all complex query processing can be done in memory, avoiding the need to partition any sorts or hashes to the temporary disk space in temp.dat. As a point of reference, running TPC-H on a 10 GB dataset with a 6 GB LucidDB buffer pool requires about 1.5 GB of temporary space. (This translates to 47000 additional LucidDB pages, assuming tempInitSize is set to 3000 pages and tempIncrementSize is 1000 pages.)
The performance counters noted in the Storage category section can help determine how much space is currently in use and therefore, if you are running short on space and need to pre-allocate additional space. If either DatabasePagesExtendedSinceInit or TempPagesExtendedSinceInit is non-zero, then that is an indication that you did not pre-allocate enough space in the corresponding file, and therefore LucidDB had to automatically extend the size of the file. This is when you will incur the performance hit noted earlier; you want to avoid this.