Mailing list users (luciddb-users) can "claim" their threads in the archives by registering with the same account they used to post to the email list. NOTE: List threads are updated once every 24hrs.
After a hiatus, I am back to playing with LucidDB, and now that I have the Pentaho PDI Streaming Loading working, I have some best practices questions. I like the Pentaho PDI Dimension lookup/update step (it saves a bunch of boilerplate stuff for loading or querying a dimension table), but I see why it might not be the best match for LucidDB. So who do you all deal with
Yeah, I would expect it to be slow. My understanding is that LucidDB (and columnar DBs in general) is fast for queries on a star schema, fast for bulk load, but slow for single row updates. And single row updates is how the PDI dimension step works. I would guess the best way might be to use the streaming loader's CUSTOM option and write some SQL (if that is even possible).
A related question is dimension look ups. Yesterday I did a 1.3 million row ETL and using the streaming loader I got the dimension tables loading in 2-3 minutes each. When I did the facts, I has a PDI DB Look-up setup to get the dimension keys, and that ETL took 2.5 hours. Is there a better way to do those as well? If it is considered best practice, I would be happy to use PDI to produce a CSV, and then use the Lucid build it ETL to bulk load and lookup dimensions.
Thus this discussion: what do people do and what is best practice?