FarragoParser
This page provides an overview of the Farrago SQL parsing framework.
Eigenbase SQL Object Model and Parser
Farrago parsing is based on the org.eigenbase library. Whereas Farrago itself is a full DBMS framework (including heavyweight aspects such as sessions, persistence, and metadata catalogs), the org.eigenbase library is a lightweight, standalone component suitable for embedding in applications such as client tools which can benefit from SQL-awareness.
The following packages in org.eigenbase are relevant to SQL parsing:
- org.eigenbase.sql: object model for the query and DML portions of the SQL language
- org.eigenbase.sql.parser: non-generated classes and data structures for the SQL parser, plus unit tests
- org.eigenbase.sql.parser.impl: classes generated by JavaCC to implement the SQL parser
- org.eigenbase.sql.fun: definitions for builtin functions and operators from SQL:2003
- org.eigenbase.sql.util: utility classes
The diagram below shows the object model instantiation after parsing a query such as select *,42 as the_answer from sales.depts as d where deptno = 10.
Each box represents an instance of an object implementing some class descended from org.eigenbase.sql.SqlNode. Some operators (like SqlSelect) are important enough to get their own classes, but in most cases a generic SqlCall is used. For example, in the diagram above, there is no "SqlEquals" class instantiation; instead, there's a SqlCall where the operator reference is set to the singleton org.eigenbase.sql.fun.SqlStdOperatorTable.equalsOperator.
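To make the "generic call plus shared operator" idea concrete, here is a minimal sketch of the predicate deptno = 10 as a tree. The types below (Node, Operator, Identifier, Literal, Call) are hypothetical stand-ins invented for illustration; they are not the org.eigenbase.sql classes and only mirror the shape described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy model; these are NOT the org.eigenbase.sql classes.
class SqlModelSketch {
    /** Base type for every node in the tree. */
    interface Node {
        String unparse();
    }

    /** One shared instance per kind of operation, referenced by calls. */
    static final class Operator {
        static final Operator EQUALS = new Operator("=");
        final String symbol;

        private Operator(String symbol) {
            this.symbol = symbol;
        }
    }

    static final class Identifier implements Node {
        final String name;

        Identifier(String name) {
            this.name = name;
        }

        public String unparse() {
            return name;
        }
    }

    static final class Literal implements Node {
        final Object value;

        Literal(Object value) {
            this.value = value;
        }

        public String unparse() {
            return String.valueOf(value);
        }
    }

    /** Generic call: an operator reference plus operands, no per-operator subclass. */
    static final class Call implements Node {
        final Operator operator;
        final List<Node> operands;

        Call(Operator operator, Node... operands) {
            this.operator = operator;
            this.operands = Arrays.asList(operands);
        }

        public String unparse() {
            // Infix rendering only; enough for binary operators in this sketch.
            return operands.stream()
                .map(Node::unparse)
                .collect(Collectors.joining(" " + operator.symbol + " "));
        }
    }

    /** Builds the tree for the predicate "deptno = 10". */
    static Node deptnoEquals10() {
        return new Call(Operator.EQUALS, new Identifier("DEPTNO"), new Literal(10));
    }

    public static void main(String[] args) {
        System.out.println(deptnoEquals10().unparse()); // prints "DEPTNO = 10"
    }
}
```

The point of the design is that adding a new operator means adding one more shared Operator instance rather than a new node class.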
The parser grammar is defined primarily in eigensrc://open/dev/farrago/src/org/eigenbase/sql/parser/CommonParser.jj. However, that file by itself is incomplete, as it is designed to be combined with extension grammars via concatenation. For the vanilla SQL parser with no extensions, it is concatenated with eigensrc://open/dev/farrago/src/org/eigenbase/sql/parser/Parser.jj to produce a well-formed grammar (written to farrago/src/org/eigenbase/sql/parser/impl/CombinedParser.jj) which is fed in as the real input to JavaCC. The grammar also provides some of the best examples of how to instantiate the SQL language object model.
From CombinedParser.jj, the build also generates and publishes a full hyperlinked BNF grammar via the jjdoc ant task. Unfortunately, it comes out in jumbled order; the main parsing entry point production is SqlStmtEof, which accepts a query or DML statement terminated by EOF.
Invoking the parser is very easy:
import org.eigenbase.sql.*;
import org.eigenbase.sql.parser.*;
...
SqlParser parser = new SqlParser(sqlString);
SqlNode node = parser.parseExpression();
This example uses the parseExpression entry point (for something like a+b); you can instead use parseQuery (expecting a SELECT or equivalent) or parseStmt (any query or DML). A constructor which takes a Reader instead of a String is also provided.
Once you have a SqlNode tree, you can traverse and manipulate it. org.eigenbase.sql.util provides a standard visitor pattern implementation, as well as a "shuttle" pattern for transforming or modifying trees. Here's some example code from the validator which instead walks and modifies a tree directly:
private void registerSubqueries(
    SqlValidatorScope parentScope,
    SqlNode node)
{
    if (node == null) {
        return;
    } else if (node.isA(SqlKind.Query)) {
        registerQuery(parentScope, null, node, null, false);
    } else if (node.isA(SqlKind.MultisetQueryConstructor)) {
        registerQuery(parentScope, null, node, null, false);
    } else if (node instanceof SqlCall) {
        validateNodeFeature(node);
        SqlCall call = (SqlCall) node;
        final SqlNode [] operands = call.getOperands();
        for (int i = 0; i < operands.length; i++) {
            registerOperandSubqueries(parentScope, call, i);
        }
    } else if (node instanceof SqlNodeList) {
        SqlNodeList list = (SqlNodeList) node;
        for (int i = 0, count = list.size(); i < count; i++) {
            SqlNode listNode = list.get(i);
            if (listNode.isA(SqlKind.Query)) {
                listNode =
                    SqlStdOperatorTable.scalarQueryOperator.createCall(
                        listNode.getParserPosition(),
                        listNode);
                list.set(i, listNode);
            }
            registerSubqueries(parentScope, listNode);
        }
    } else {
        // atomic node -- can be ignored
    }
}
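For contrast with the direct walk above, the "shuttle" idea can be sketched as a visitor whose methods return nodes, so that a transformation yields a rewritten copy instead of mutating the input. The types here (Node, Leaf, Call, Shuttle, UpperCaseShuttle) are hypothetical and invented for this illustration; they do not reproduce the org.eigenbase.sql.util API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy types; not the org.eigenbase.sql.util API.
class ShuttleSketch {
    interface Node {
        Node accept(Shuttle shuttle);

        String unparse();
    }

    static final class Leaf implements Node {
        final String text;

        Leaf(String text) {
            this.text = text;
        }

        public Node accept(Shuttle shuttle) {
            return shuttle.visitLeaf(this);
        }

        public String unparse() {
            return text;
        }
    }

    static final class Call implements Node {
        final String op;
        final List<Node> operands;

        Call(String op, List<Node> operands) {
            this.op = op;
            this.operands = operands;
        }

        public Node accept(Shuttle shuttle) {
            return shuttle.visitCall(this);
        }

        public String unparse() {
            List<String> parts = new ArrayList<>();
            for (Node operand : operands) {
                parts.add(operand.unparse());
            }
            return "(" + String.join(" " + op + " ", parts) + ")";
        }
    }

    /** Default shuttle: keeps leaves, rebuilds each call from its visited operands. */
    static class Shuttle {
        Node visitLeaf(Leaf leaf) {
            return leaf;
        }

        Node visitCall(Call call) {
            List<Node> rewritten = new ArrayList<>();
            for (Node operand : call.operands) {
                rewritten.add(operand.accept(this));
            }
            return new Call(call.op, rewritten);
        }
    }

    /** Example transformation: upper-cases every leaf; the input tree is not modified. */
    static final class UpperCaseShuttle extends Shuttle {
        @Override
        Node visitLeaf(Leaf leaf) {
            return new Leaf(leaf.text.toUpperCase(Locale.ROOT));
        }
    }

    /** Builds the tree for a + (b * c). */
    static Node sample() {
        Node product = new Call("*", Arrays.asList(new Leaf("b"), new Leaf("c")));
        return new Call("+", Arrays.asList(new Leaf("a"), product));
    }

    public static void main(String[] args) {
        Node original = sample();
        Node rewritten = original.accept(new UpperCaseShuttle());
        System.out.println(original.unparse());  // prints "(a + (b * c))"
        System.out.println(rewritten.unparse()); // prints "(A + (B * C))"
    }
}
```

A subclass overrides only the visit methods it cares about; the default visitCall takes care of recursing and rebuilding the rest of the tree.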
SqlNode.toString can be used to "unparse" a node back into text via pretty-printing (as described below). The toSqlString methods provide finer control over the SQL dialect and pretty printer implementation used. (SqlUtil.eigenbaseDialect produces something like SQL:2003 standard.)
The parser annotates each node it instantiates with SqlParserPos location information, which can be used for error reporting. Note that preserving positional correctness in the face of SQL construction or transformation can be challenging, so sometimes an unparse+reparse is the easiest solution.
Unit tests for the parser are in SqlParserTest. Positive tests work by parsing, unparsing, and then comparing the unparsed result to an expected string. Negative tests work by verifying that the parser throws a particular error, and that the positional information is where it is expected to be (using a shorthand of carets around the offending substring). Here's an example:
public void testNullIf()
{
    checkExp("nullif(v1,v2)",
        "NULLIF(`V1`, `V2`)");
    checkExpFails(
        "1 ^+^ nullif + 3",
        "(?s)Encountered \"\\+ nullif \\+\" at line 1, column 3.*");
}
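The caret shorthand in the negative test above can be read as "strip the two carets, and expect the error at the position they enclosed". Here is a minimal sketch of that decoding step, assuming single-line input and exactly two carets. CaretSketch, Marked, and strip are hypothetical names made up for this example; SqlParserTest has its own helper, which is not reproduced here.

```java
// Hypothetical helper for illustration; not the code used by SqlParserTest.
class CaretSketch {
    /** SQL text with carets removed, plus the 1-based columns they enclosed. */
    static final class Marked {
        final String sql;
        final int startColumn; // column of the first marked character
        final int endColumn;   // column of the last marked character (inclusive)

        Marked(String sql, int startColumn, int endColumn) {
            this.sql = sql;
            this.startColumn = startColumn;
            this.endColumn = endColumn;
        }
    }

    /** Decodes a single-line string containing exactly one pair of carets. */
    static Marked strip(String marked) {
        int first = marked.indexOf('^');
        int second = (first < 0) ? -1 : marked.indexOf('^', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("expected two carets in: " + marked);
        }
        String sql = marked.substring(0, first)
            + marked.substring(first + 1, second)
            + marked.substring(second + 1);
        // The first caret sits where the marked text starts, so its 0-based
        // index plus one is the 1-based start column in the stripped string.
        int startColumn = first + 1;
        // Removing the first caret shifts the second one left by one; the
        // character just before it is the last marked character.
        int endColumn = second - 1;
        return new Marked(sql, startColumn, endColumn);
    }

    public static void main(String[] args) {
        Marked m = strip("1 ^+^ nullif + 3");
        System.out.println(m.sql);         // prints "1 + nullif + 3"
        System.out.println(m.startColumn); // prints "3"
        System.out.println(m.endColumn);   // prints "3"
    }
}
```

For the test string above, the decoded start column of 3 agrees with the "at line 1, column 3" fragment in the expected error pattern.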
Eigenbase SQL Pretty Printer
Package org.eigenbase.sql.pretty implements an SQL pretty printer which works in terms of the SqlNode object model. The SqlPrettyWriter class provides a number of configurable options for controlling the way unparsing is performed. It also has a few bugs as of this writing.
Unit tests are in SqlPrettyWriterTest. Many of them work by diffing the pretty printer results against an expected string stored in farrago/src/org/eigenbase/sql/test/SqlPrettyWriterTest.ref.xml.
Farrago Parser Including DDL
The Farrago parser augments the Eigenbase parser with DDL parsing support. However, it does not currently supply a DDL object model; instead, it populates the catalog directly as it parses. Other than that, the general pattern followed is the same; net.sf.farrago.parser is the public package, whereas net.sf.farrago.parser.impl contains the classes generated by JavaCC.
The DDL parser grammar is defined in eigensrc://open/dev/farrago/src/net/sf/farrago/parser/CommonDdlParser.jj. This is combined with the Eigenbase CommonParser.jj plus (by default when there is no extension active) eigensrc://open/dev/farrago/src/net/sf/farrago/parser/DdlParser.jj to produce farrago/src/net/sf/farrago/parser/CombinedParser.jj.
A corresponding hyperlinked BNF grammar is also generated.
For an explanation of how Farrago uses the SQL parser for processing non-DDL statements, see the query processing overview (somewhat stale).
Parser Extensions
The Farrago extensibility overview explains why you might want to extend the parser with your own constructs in the context of creating a pluggable extension to Farrago. It also provides some examples of how this works. The corresponding random number generator example packages follow the usual JavaCC pattern:
eigensrc://open/dev/farrago/examples/rng/src/net/sf/farrago/rng/RngParser.jj shows how to extend the standard grammars. It is combined with CommonParser.jj from Eigenbase plus CommonDdlParser.jj from Farrago to produce the input to JavaCC, generating the actual parser used by the RNG extension.
TBD
- How parsers are plugged in to Farrago's session framework
- Interactions between parser, SQL validator, and DDL validator
- com.disruptivetech.farrago.sql.advise, a SQL autocomplete facility based on the validator
- How we maintain FarragoSqlReservedWords