Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Sign In Apply for Membership

Tagged

Mailing list users (luciddb-users) can "claim" their threads in the archives by registering with the same account they used to post to the email list. NOTE: List threads are updated once every 24hrs.
Slowly changing dimension
  • epfreed January 17
    Hi all, 

    After a hiatus, I am back to playing with LucidDB, and now that I have the Pentaho PDI Streaming Loading working, I have some best practices questions. I like the Pentaho PDI Dimension lookup/update step (it saves a bunch of boilerplate stuff for loading or querying a dimension table), but I see why it might not be the best match for LucidDB. So who do you all deal with 

    1) looking up dimensions on a load
    and/or
    2) loading a slowly changing dimension

    -Eric


  • haubuchon January 18
    I have the same question. Inital testing using PDI standard step takes 10X the time (compared to SQLServer).

    I would like not having to completely change all of my existing PDI.
  • epfreed January 18
    Yeah, I would expect it to be slow. My understanding is that LucidDB (and columnar DBs in general) is fast for queries on a star schema, fast for bulk load, but slow for single row updates. And single row updates is how the PDI dimension step works. I would guess the best way might be to use the streaming loader's CUSTOM option and write some SQL (if that is even possible). 

    A related question is dimension look ups. Yesterday I did a 1.3 million row ETL and using the streaming loader I got the dimension tables loading in 2-3 minutes each. When I did the facts, I has a PDI DB Look-up setup to get the dimension keys, and that ETL took 2.5 hours. Is there a better way to do those as well? If it is considered best practice, I would be happy to use PDI to produce a CSV, and then use the Lucid build it ETL to bulk load and lookup dimensions. 

    Thus this discussion: what do people do and what is best practice?