Uniqueness constraints are defined during table creation, as part of the CREATE TABLE statement.
CREATE TABLE employee_dimension( emp_key INT GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, empno INT NOT NULL, ename VARCHAR(128) NOT NULL, dname VARCHAR(128) NOT NULL, job VARCHAR(128) NOT NULL, CONSTRAINT ed_empno_unique UNIQUE(empno) ) CREATE INDEX ed_names ON employee_dimension(ename, dname) ;
The two types of uniqueness constraints are (1) PRIMARY KEY constraints and (2) UNIQUE constraints. Uniqueness constraints may apply to a single column or to several columns. In general, the constraints guarantee that no two rows will have the same key value, but the two types of constraints differ slightly with regard to the handling of null values. While SQL:2003 describes a deferred constraint enforcement feature, LucidDB does not yet support this (constraints are always active).
- PRIMARY KEY constraints do not permit any key column to be null.
- UNIQUE constraints do permit key columns to be null (unless explicitly declared as NOT NULL). Unique constraints do not apply to rows where any key columns are null.
PRIMARY KEY null [not allowed] null, 23, foo [not allowed] UNIQUE null [allowed] null [allowed to repeat] null, 23, foo [allowed] null, 23, foo [allowed to repeat] null, 23, null [allowed]
A system-defined index is created automatically for each uniqueness constraint defined; this allows LucidDB to enforce constraints efficiently. For multi-column constraints, index column order matches constraint column order. Note that column order is not significant for constraint semantics, e.g. a constraint on (A,B) is logically equivalent to a constraint on (B,A). But column order is significant to the optimizer for the automatically-created indexes, so take this into account when declaring a constraint over multiple columns.
LucidDB does not support a CREATE UNIQUE INDEX command; use a constraint for this purpose instead.
Bulk Load and Upsert Semantics
Warning: LucidDB diverges from the SQL:2003 specification in this area
SQL:2003 specifies the following rules for predictable behavior:
|SQL:2003 Rule||LucidDB Enforcement|
|Constraints are evaluated only on the net effect of a DML statement (avoiding ambiguities due to intermediate states, e.g. update before insert vs. insert before update)||LucidDB always follows this rule|
|DML statements are ACID (either all requested modifications are performed without error, or the entire statement is rolled back)||LucidDB follows this rule by default, but allows for more permissive behavior (while still preserving constraint integrity) via the non-standard ALTER SESSION command (the "errorMax" parameter described in LucidDbSessionParameters controls this behavior, as described in LucidDbErrorHandling)|
|An upsert attempt in which multiple source rows match the same target row results in an error||LucidDB only follows this rule when there is a uniqueness constraint defined on the columns of the target table used in the ON clause of the MERGE statement; it treats violations the same as any other uniqueness violations. Forgetting to define a uniqueness constraint will lead to surprising results (multiplication of rows in the target table).|
The rules above are useful in transactional applications, but often too inflexible in data warehousing ETL scenarios. So, when LucidDB detects constraint violations and other errors during bulk load, it is capable of rejecting them and continuing ("keep-going" behavior) up to a user-specified error limit, rather than failing with a rollback ("fail-fast" behavior).
In keep-going mode, LucidDB uses the following rules:
- The SET clause of an UPDATE or MERGE WHEN MATCHED statement is not allowed to update any column which participates in at least one uniqueness constraint. Any such attempt will be rejected by the SQL validator. This eliminates many possible anomalies which would otherwise prevent LucidDB from efficiently enforcing uniqueness constraints during bulk load.
- During MERGE, when multiple source rows match the same target row, and a uniqueness constraint is defined on the match key, one of the source rows is arbitrarily chosen to replace the target row. All other source rows are rejected.
- During MERGE or INSERT, when multiple new rows (non-matching for MERGE) have the same new key value, one of the new rows is chosen arbitrarily and inserted. All other source rows with the same key value are rejected. (If the arbitrarily chosen new row happens to be rejected due to some other error, then all of them are rejected; no "standby" is chosen.) New rows which contain keys conflicting with existing rows are always rejected.
Note that in fail-fast mode, a MERGE or UPDATE statement is free to update columns which participate in uniqueness constraints, and the SQL:2003 "net effect" rule applies. This makes it possible to perform an operation such as SET K = K + 1 (without the net-effect rule, this can cause spurious failures, e.g. if existing values of K are 1,2,3).
Example (MERGE on primary key EMPID):
[ORIGINAL DATA] EMPID, NAME 1, Harry Osborn 2, Mary Jane [UPSERTS] EMPID, NAME 1, Peter Parker 1, John Jameson 2, Mary Parker 3, Drake Roberts 4, Anjelica Jones 4, Johnny Storm [HYPOTHETICAL SQL:2003 RESULT, IF RULES WERE CHANGED TO SUPPORT KEEP-GOING BEHAVIOR, ALL-OR-NOTHING PER KEY] EMPID, NAME 1, Harry Osborn 2, Mary Parker 3, Drake Roberts [Multiple updates on EMPID 1 are rejected] [Multiple inserts to EMPID 4 are rejected] [LUCIDDB RESULT WITH KEEP-GOING BEHAVIOR] EMPID, NAME 1, Peter Parker [LucidDB arbitrarily selects first update] 2, Mary Parker 3, Drake Roberts 4, Anjelica Jones [LucidDB arbitrarily selects first insert]
Example (MERGE on primary key EMPID, unique constraint on NAME):
[ORIGINAL DATA] EMPID, NAME, INFO 1, Harry Osborn, Wealthy teenager [UPSERTS] EMPID, NAME, INFO 1, Harry Osborn, President of Osborn Inc 2, Harry Osborn, Hobgoblin [LUCIDDB RESULT WITH KEEP-GOING BEHAVIOR] EMPID, NAME, INFO 1, Harry Osborn, President of Osborn Inc [new Hobgoblin row is rejected because value for NAME conflicts with existing key]
On insertion, LucidDB accepts at most one row per unique key. A complex pattern of constraint failures may lead to more rows being rejected than is strictly required (due to the "no standby" policy).
Example (primary key EMPID, unique constraints on NAME and ALIAS):
[ORIGINAL DATA] EMPID, NAME, ALIAS, INFO 5, Dr Otto Octavius, Doctor Octopus, Scientist [INSERTS] EMPID, NAME, ALIAS 6, Dr Otto Octavius, Doc Oct [fails NAME constraint due to conflict with existing EMPID 5] 7, Dr Octavius, Doc Oct [fails ALIAS constraint due to conflict with previous new row] 8, Otto, Doc Oct [fails ALIAS constraint due to conflict with previous new rows] [LUCIDDB RESULT] EMPID, NAME, ALIAS, INFO 5, Dr Otto Octavius, Doctor Octopus, Scientist
If instead the rows had been processed in the order 7, 6, 8, then row 7 could have been accepted.