This is a research page for approaches to profiling Fennel stream graph execution.
The SystemProfiling page explains how to set up a traditional profiler to get raw low-level breakdowns. This can be useful for seeing where time is spent in the code, but for a complex execution plan with reuse of the same ExecStream class (e.g. a sorter) in multiple parts of the plan, it may be difficult to get a higher-level breakdown this way. Doing this requires improving the ExecStream infrastructure by adding explicit instrumentation.
Let's take an example from LucidDB:
create schema x; create table x.t(i int, j int); create index xi on x.t(i); create index xj on x.t(j); create index xij on x.t(i,j); create table x.s(i int, j int); merge into x.t tgt using x.s src on tgt.i=src.i when not matched then insert (i,j) values (src.i, src.j);
Using ExecStreamGraphVisualization for the MERGE statement above, notice that its execution plan is split into three distinct phases:
- Bottom part below the SegBufferExecStream: the implicit join used to implement the upsert matching logic
- Middle part with the two LcsClusterAppendExectStreams: this loads data into the two column-stores for columns i and j
- Top part with the LbmSplicerExecStreams: this updates the three indexes
(TODO: add result image here.)
For a real data load, getting the breakdown for the three phases would be very useful. In general, getting a breakdown like this could be difficult, since stream graph execution is pipelined; it would be necessary to instrument the scheduler to tally CPU time taken for each individual ExecStream instance's quantum. But in this specific case, an easier approach is available because the phases do not overlap, so we can rely on real time instead. The reason the phases do not overlap is the presence of blocking ExecStreams situated at chokepoints: BarrierExecStream and SegBufferExecStream.
So, we can collect the timestamps when the two SplitterExecStreams start producing data as the phase transition points. Combining these with the start and end timestamps for the overall execution, we can compute the durations for each phase.
The simplest instrumentation is to add Fennel-level tracing and then scrape the events from the trace log. Adding a new trace logger name such as "fennel.xo.timing" would make it easy to locate these (see FarragoTracing for existing trace logger names); this requires a bit of plumbing to create a corresponding TraceSource object and make it available to the ExecStreamScheduler and SplitterExecStream.
A more sophisticated instrumentation would add new performance counters queryable from the SQL level after the execution of each statement. However, the current global performance counter framework isn't really designed for per-statement collection, so it would need to be extended first.