The NuoDB Sequence Generator: GENERATED (ALWAYS/BY DEFAULT) AS IDENTITY and the UNIQUE constrant

{question}

NuoDB allows a field to be automatically generated using the clause GENERATED (ALWAYS/BY DEFAULT) AS IDENTITY. The documentation notes

You can use GENERATED ALWAYS AS IDENTITY to improve performance for inserts and updates. When you specify this clause the database itself is responsible for generating unique values for the column. This makes index operations against the column complete more quickly. In other words, strictly from the performance point of view, it is typically faster to perform inserts against a table whose primary key is GENERATED ALWAYS AS IDENTITY than one whose keys are provided by the client.

How does the use of ALWAYS/BY DEFAULT affect performance? How does the sequence generator interact with the UNIQUE constraint?

{question}

{answer}

Guaranteeing uniqueness can be complicated in a distributed database. The sequence generator simplifies this by allowing Transaction Engines (TEs) to reserve blocks of unique values. So if a UNIQUE field is GENERATED ALWAYS AS IDENTITY, a TE can assign a value without consulting the others.

If the field is UNIQUE and generated BY DEFAULT, the TE can't reserve blocks of values because it can't guarantee they haven't been used. This means that ensuring uniqueness involves checking with the other TEs. This is the performance difference between ALWAYS and BY DEFAULT.

If no UNIQUE constraint is specified for the field, there isn't a performance difference between ALWAYS and BY DEFAULT.

Consider the following use case:

There is a table EMPLOYEES where each employee (row) has a unique ID. Sometimes these rows are moved to an archive. When we move them back, we want to keep their initial ID value. Can this be achieved without incurring performance overhead?

The way blocks are reserved by the sequence generator follows three properties:

1. Sequences are non-repeating - This is the relevant rule for the example. If a row in EMPLOYEES is generated with empID=5 and is later deleted or moved to an archive, "5" will not be reused by the sequence generator and the row can safely be restored from archive.

2. Sequences are non-monotonically increasing across TEs - If rows are being inserted across multiple TEs the order can be something like 1, 101, 201, 2, 3, 102... While these do not increase with every INSERT they will increase over time as lower values are exhausted.

3. Sequences are monotonically increasing within a TE - Each TE goes through its chunk in order and the next chunk it reserves will be a higher range, so there might be something like 1,2,3, ... 100, 301, 302...

Because of the first property GENERATE BY DEFAULT can be used without the UNIQUE constraint as long as the application can guarantee that the manually specified values have already been generated and removed from the table.

{answer}

Have more questions? Submit a request

Comments