Duplicate key value violates

Lately I’ve been seeing a lot of errors like these in PostgreSQL.
Do I need to pay attention to them? What should I do about them?

PostgreSQL 9.5

ERROR:  duplicate key value violates unique constraint "sentry_eventuser_project_id_1a96e3b719e55f9a_uniq"
DETAIL:  Key (project_id, hash)=(2, 36de309602983818b4c2b9b606ef4ff4) already exists.
STATEMENT:  INSERT INTO "sentry_eventuser" ("project_id", "hash", "ident", "email", "username", "ip_address", "date_added") VALUES (2, '36de309602983818b4c2b9b606ef4ff4', '40644', NULL, '5ba1ccc87asdasdasd11202420a000003', '10.0.0.10', '2017-02-11 07:51:09.136418+00:00') RETURNING "sentry_eventuser"."id"

ERROR:  duplicate key value violates unique constraint "sentry_organizationonboar_organization_id_47e98e05cae29cf3_uniq"
DETAIL:  Key (organization_id, task)=(1, 5) already exists.
STATEMENT:  INSERT INTO "sentry_organizationonboardingtask" ("organization_id", "user_id", "task", "status", "date_completed", "project_id", "data") VALUES (1, NULL, 5, 1, '2017-02-11 07:51:09.239742+00:00', 2, '{}') RETURNING "sentry_organizationonboardingtask"."id"

ERROR:  duplicate key value violates unique constraint "sentry_grouprelease_group_id_46ba6e430d088d04_uniq"
DETAIL:  Key (group_id, release_id, environment)=(1930, 262, ) already exists.
STATEMENT:  INSERT INTO "sentry_grouprelease" ("project_id", "group_id", "release_id", "environment", "first_seen", "last_seen") VALUES (2, 1930, 262, '', '2017-02-11 07:50:20+00:00', '2017-02-11 07:50:20+00:00') RETURNING "sentry_grouprelease"."id"

You can ignore these errors. We use this pattern a lot in our code: we blindly INSERT a row, let the database raise a duplicate-key error if the row already exists, catch that error, and move on.

So these logs are safe to ignore.
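
For context, the pattern looks roughly like this. This is a minimal sketch, not Sentry’s actual code, assuming a hypothetical Django model EventUser backing the sentry_eventuser table:

from django.db import IntegrityError, transaction

def record_event_user(project_id, hash_, **fields):
    # EventUser is a hypothetical model mirroring sentry_eventuser.
    try:
        # Run the INSERT in its own atomic block so the failed statement
        # does not abort the surrounding transaction.
        with transaction.atomic():
            EventUser.objects.create(project_id=project_id, hash=hash_, **fields)
    except IntegrityError:
        # The unique constraint fired: the row already exists. Postgres
        # logs an ERROR, but the application just moves on.
        pass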


Thank you.


I honestly cannot believe I just read this. Here I thought I had some corruption in my database, and it turns out this is expected behavior. IMO relying on catching errors is a TERRIBLE idea. This does not inspire confidence in this code.


I think you’re trolling, but I’ll bite anyways.

What we’re doing is more performant. We attempt the INSERT and, on conflict, ignore the error. If you have someone to complain to, complain to Postgres. Without this, we’d double the query count by doing a SELECT first just to check whether the row exists before INSERTing. There’s very little value in that when we can just blindly INSERT and be faster.
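
For comparison, the SELECT-first version would look something like this, with the same hypothetical EventUser model as above:

def record_event_user_select_first(project_id, hash_, **fields):
    # Two round trips to the database instead of one:
    if not EventUser.objects.filter(project_id=project_id, hash=hash_).exists():
        EventUser.objects.create(project_id=project_id, hash=hash_, **fields)

It’s also racy: two workers can both pass the .exists() check, and one INSERT still hits the unique constraint, so the try/except is needed anyway. The blind INSERT just skips the wasted round trip.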

Now, the reason we do this is kinda legacy. Postgres 9.5 introduced the ability to do INSERT … ON CONFLICT DO NOTHING, which is exactly the behavior we’re mimicking here, but done at the database level, which avoids the errors entirely.

Until recently, we were running Postgres 9.3 in production, so we couldn’t use this feature even if we wanted to. Now we run Postgres 9.6, where we can, but the Django ORM doesn’t support the syntax needed to construct a query like this.

We could support this by extending the ORM ourselves or by slapping down custom SQL everywhere to add ON CONFLICT DO NOTHING, as sketched below.
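
For illustration, the custom-SQL route on Postgres 9.5+ would look roughly like this, using the sentry_organizationonboardingtask insert from the logs above (a sketch, not what we actually ship):

from django.db import connection

def complete_onboarding_task(org_id, task, status, project_id):
    with connection.cursor() as cursor:
        # The conflict is resolved inside Postgres itself, so nothing
        # is written to the Postgres error log.
        cursor.execute(
            """
            INSERT INTO sentry_organizationonboardingtask
                (organization_id, user_id, task, status,
                 date_completed, project_id, data)
            VALUES (%s, NULL, %s, %s, now(), %s, '{}')
            ON CONFLICT (organization_id, task) DO NOTHING
            """,
            [org_id, task, status, project_id],
        )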

We have no plans to do this just yet, but I personally kinda want to. There’s just little incentive or motivation, so it’s hard to prioritize when the observable behavior won’t change.

I’ve been using Postgres since 7.x, and while I too have had to work around its deficiencies over the years, letting code error out and calling that a viable solution was never something I had to do. Sure, a SELECT before the INSERT adds some overhead, but it would have been the right thing to do at the time. IMNSHO.

Just saying, I did have db corruption yesterday, and Sentry was throwing errors because of it. Once I fixed it (I had to drop and recreate a column), the db kept complaining about duplicate keys, and I wasted a few hours trying to track down what I figured was more corruption, only to find out it was a feature!

So, in closing, I’m going to argue that anything that spams the logs with errors is going to lead someone to try to fix them.

“Right” is pretty subjective here. It’s not overhead we could have afforded then, and we still can’t afford it now, since it means too many extra queries. I’d argue the real problem is that Postgres yells so loudly about something that isn’t actually an error. Arguably it’s a warning- or info-level log.

We tune Postgres and just disable these log entries so we don’t see them. You’re free to do the same. Maybe at some point in the future we’ll change this behavior, but like I said, it’s hard to justify the time right now since it’s a big change with functionally no difference.
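
If you want to do the same, the standard knobs are in postgresql.conf. I can’t speak to every deployment, but something like this quiets these entries; note it’s a blunt instrument that hides all ERROR-level lines, not just duplicate-key ones:

# postgresql.conf
log_min_messages = fatal          # default is 'warning'; ERROR lines stop being logged
log_min_error_statement = panic   # stop echoing the offending STATEMENT line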