FarragoMedMqlPlugin
Contents |
Introduction
This page describes an experimental Farrago SQL/MED plugin which allows Freebase (or any Metaweb server) to be queried via JDBC+SQL. The plugin converts relational algebra expressions into MQL queries and sends them to an mqlread web service, then converts the JSON results back into tuples for further processing inside of Farrago's own executor.
MQL is designed for queries over semistructured graph-oriented data, so in general, it should be used directly. However, an optional relational view may also be useful:
- for cases where query complexity requires SQL operators such as join, aggregation, union, and subqueries
- for federated SQL queries over a metaweb graph combined with tables or other SQL/MED data sources
- for cases where a front end wants to talk JDBC+SQL (intergalactic dataspeak)
This page walks through the setup and a few examples.
Set Up Eigenbase Release
The first step is to download and install a binary release of Eigenbase 0.8.0. Pick the right one for your platform from the Sourceforge download page. We'll use Linux in the examples on this page.
Make sure you have JAVA_HOME pointing to a 1.5 JDK or JRE.
Unpack the Eigenbase distribution, and then under the eigenbase-0.8.0/install directory, run install.sh (or install.bat for Windows).
A LucidDB 0.8.0 distribution should work also since it is derived from Eigenbase (the LucidDB SQL optimizer may give better results for complex queries).
Set Up The Plugin
Download and unpack the plugin from downloads.sourceforge.net.
Then, start sqllineEngine for running SQL commands interactively:
$ cd eigenbase-0.8.0/bin/ ~/eigenbase-0.8.0/bin$ ./sqllineEngine Connecting to jdbc:farrago: Connected to: Eigenbase Data Management System (version 0.8.0) Driver: FarragoJdbcDriver (version 0.8) Autocommit status: true Transaction isolation: TRANSACTION_REPEATABLE_READ sqlline version 1.0.5-eb by Marc Prud'hommeaux 0: jdbc:farrago:>
Edit the following script to match the paths for your site, and then paste it into sqlline to execute it:
create schema metaweb; create or replace foreign data wrapper mql_wrapper library '/path/to/farrago-mql-plugin/farrago-mql.jar' language java; create or replace server mql_server foreign data wrapper mql_wrapper; create or replace jar metaweb.mql_jar library 'file:/path/to/farrago-mql-plugin/farrago-mql.jar' options(0);
Then run this additional setup step (no path editing required):
create or replace function metaweb.mql_query( url varchar(4096), mql varchar(65535), row_type varchar(65535)) returns table( objects varchar(128)) language java parameter style system defined java dynamic_function no sql external name 'metaweb.mql_jar:net.sf.farrago.namespace.mql.MedMqlUdx.execute';
The plugin is now installed and ready for use.
Test The Plugin
The plugin is not yet capable of retrieving metadata, so we need to explicitly create a table definition corresponding to the Metaweb type of interest:
create or replace foreign table metaweb.artists(
"name" varchar(128), "id" varchar(128))
server mql_server
options (metaweb_type '/music/artist');
The columns of interest need to be explicitly referenced and assigned types; they can be a subset of all of the available attributes on the type. Note that the plugin is not yet capable of dealing with multivalued attributes (e.g. the tracks of an album), so omit those.
Now you can run queries against the foreign table:
0: jdbc:farrago:> select "name" from metaweb.artists where "id"='/en/gene_kelly'; +-------------+ | name | +-------------+ | Gene Kelly | +-------------+
0: jdbc:farrago:> select count(*) from metaweb.artists; +---------+ | EXPR$0 | +---------+ | 100 | +---------+
(Well, there are actually a lot more than 100, but by default MQL imposes a limit on the number of results returned!)
You can also combine local data with MQL data:
create table metaweb.artist_list(name varchar(128) not null primary key);
insert into metaweb.artist_list values ('Gene Kelly'), ('Depeche Mode'), ('ABBA');
select "id" from metaweb.artists, metaweb.artist_list where artists."name" = artist_list.name;
(There are some bugs with subqueries, so use a join instead.)
Tracing
You can see the MQL sent to the webservice by using EXPLAIN PLAN:
0: jdbc:farrago:> !set maxwidth 1000
0: jdbc:farrago:> !set outputformat csv
0: jdbc:farrago:> explain plan for select "id" from metaweb.artists where "name" = 'Gene Kelly';
'column0'
'FennelToIteratorConverter'
' FennelReshapeRel(projection=[[1]], outputRowType=[RecordType(VARCHAR(128) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary" id) NOT NULL])'
' IteratorToFennelConverter'
' FarragoJavaUdxRel(invocation=[CAST(MQL_QUERY(CAST('http://api.freebase.com/api/service/mqlread'):VARCHAR(4096) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary",
CAST('{"query":[{"type":"/music/artist","id":null,"name":"Gene Kelly"}]}'):VARCHAR(65535)...
You can also see the full MQL request/response by turning on trace net.sf.farrago.namespace.mql.level=FINE in trace/Trace.properties and then checking the output in trace/Trace.log after executing a query.
Currently, the optimization rules only support projections and simple conjunctive filters of the form COLUMN1 = literal1 AND COLUMN2 = literal2 AND .... Non-equality operators are not translated, and are instead executed by Farrago:
0: jdbc:farrago:> explain plan for select "id" from metaweb.artists where "name" like 'Gene%';
'column0'
'IterCalcRel(expr#0..1=[{inputs}], expr#2=['Gene%'], expr#3=[LIKE($t0, $t2)], id=[$t1], $condition=[$t3])'
' FarragoJavaUdxRel(invocation=[CAST(MQL_QUERY(CAST('http://api.freebase.com/api/service/mqlread'):VARCHAR(4096) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary",
CAST('{"query":[{"type":"/music/artist","id":null,"name":null}]}'):...
Server Options
By default, the URL http://api.freebase.com/api/service/mqlread is used for communicating with the mqlread service. You can override this with an option on the server definition, e.g. to switch to a sandbox:
create or replace server mql_server foreign data wrapper mql_wrapper options (url 'http://api.sandbox.freebase.com/api/service/mqlread');
Enhancements Needed
The plugin is currently just a proof of concept. For real use, a lot of enhancements would be needed:
- use a real JSON library; the current result set parsing makes a lot of assumptions about the formatting, meaning it will break if the metaweb service implementation changes anything
- fix some assumptions about field name translation (currently causing the query select "id" from metaweb.artists where "name" in (select * from metaweb.artist_list); to return no results)
- push down joins, sorts, and aggregations into MQL (not just projections and filters)
- support retrieval of large results via cursors instead of using the default 100-item limit
- add comprehensive filter translation support
- allow complex MQL queries to be specified in foreign table definitions (instead of just type extents)
- feed MQL's result count estimation up into the SQL optimizer
- implement the SQL/MED metadata retrieval interfaces so that relations can be inferred from types automatically
- when Farrago multiset and UDT support gets completed, allow mapping hierarchical result sets from MQL back into SQL
