Idempotent database inserts: Getting it right
Idempotence is a critical concept in software systems, ensuring that a given operation can be performed multiple times without changing the system's state beyond the initial application. It's particularly important in database operations, where duplicate inserts can wreak havoc on data integrity and lead to subtle, hard-to-debug errors. When done right, idempotent inserts enable fault tolerance and consistency in the face of retries, failures, and distributed system complexities.
A common solution
A common solution to idempotent database inserts looks as follows, using PostgreSQL as the database of choice:
CREATE TABLE idempotent_table (
id SERIAL PRIMARY KEY,
data VARCHAR NOT NULL,
idempotency_key VARCHAR UNIQUE NOT NULL,
);
-- Insert a row
INSERT INTO idempotent_table (data, idempotency_key)
VALUES ('foo', 'action1')
ON CONFLICT (idempotency_key) DO NOTHING;
-- Retry of the operation above
INSERT INTO idempotent_table (data, idempotency_key)
VALUES ('foo', 'action1')
ON CONFLICT (idempotency_key) DO NOTHING;
We create a table and attach a unique non-nullable column idempotency_key
, when writing the record to the table
we check if the operation is colliding on the unique constraint of the idempotency_key
and in that case do nothing.
With this implementation, we increased the resilience of the system by allowing it to retry a given action. But let's look a bit deeper into edge cases and the behavior of the proposed solution.
The flaw
Let’s imagine the following. At first, a record is inserted in the database with idempotency key action1
:
INSERT INTO idempotent_table (data, idempotency_key)
VALUES ('foo', 'action1')
ON CONFLICT (idempotency_key) DO NOTHING;
And then shortly after a new record should be inserted, but due to an error the same idempotency_key
value is used.
INSERT INTO idempotent_table (data, idempotency_key)
VALUES ('bar', 'action1')
ON CONFLICT (idempotency_key) DO NOTHING;
The outcome of the two statements would be: Only the first insert statement would be saved in the database and the second insert will execute successfully but will not write a second row.
The flaw is, that we assume that the presence of an idempotency key in the database guarantees the correctness of the corresponding data. This oversimplification can lead to systems that quietly accumulate stale, incomplete, or even incorrect records under the radar. And as everyone knows: assuming makes an ass out of you and me.
How to fix it? - The improved solution
The solution to the problem is rather simple, just don't assume that the record you want to insert and the record existing in the database are the same, but rather validate that they. This can be achieved by building a utility function. The example will be using Python and SQLAlchemy, but this can be adapted to your tech stack.
async def idempotent_insert(con: AsyncConnection, insert_statement: Insert, idempotency_key_column: Column) -> Row:
# Perform insert
cursor = await con.execute(
insert_statement.on_conflict_do_nothing(index_elements=[idempotency_key_column])
)
# If a record has been inserted, it is a new record and stop here
if cursor.rowcount:
return
values_to_insert: dict = {k: v.value for k, v in insert_statement._values.items()}
# Retrieve the existing idempotency key record from the table
idempotency_key_value = values_to_insert.get(idempotency_key_column.name)
cursor = await con.execute(
select(insert_statement.table).where(idempotency_key_column == idempotency_key_value)
)
existing_record: dict = cursor.fetchone()._mapping
# Compare the record in the database with the record we wanted to insert,
# if column values are differentiating, they will populate the variable not_matching_columns with a set of tuples (colum, value)
if not_matching_columns := set(values_to_insert.items()) - set(
existing_record.items()
):
raise IdempotencyConflictError(
f"To be inserted record does not match with existing record on the same idempotency "
f"'{idempotency_key_value}'. Different columns: {', '.join(map(lambda x: x[0], not_matching_columns))}"
)
With this implementation, a second usage of the same idempotency key leads to a full validation of the record to be inserted is equal to the record existing in the database table:
async with db_connection_pool.begin() as con:
await idempotent_insert(
con,
Insert(idempotent_table).values(idempotency_key='action', data='foo'),
idempotent_table.c.idempotency_key
)
# will raise IdempotencyConflictError pointing out that the entries differ in colum data
await idempotent_insert(
con,
Insert(idempotent_table).values(idempotency_key='action', data='bar'),
idempotent_table.c.idempotency_key
)
Caveats
This solution is not a silver bullet and comes with a few caveats, that I need to point out.
Alternative solution: The issue can also be resolved by implementing a reconciliation, that reconciles that e.g. for a given process A a record for A in the right shape is existing in the database. However, I found that implementing a utility that checks the consistency right at runtime, is simpler and points out the problem faster than implementing a reconciliation.
The proposed solution also works best in insert-only tables. As soon as modifications as updates come into the picture, updated columns need to be excluded when comparing the inserting record to the existing record in the table.