Extremely large initial download. Login takes 1+ minutes. What is this?

Ok, so looking at our code:

This is where we write these rows. We rely entirely on an integrity error being thrown here to prevent duplicates, so without the unique constraint on (organization_id, task) you end up with duplicate rows. I’m confused about how this constraint could be missing; the main options are:

  • Is there a chance someone removed this on your install?
  • If not, what version of Sentry did you start with, and what was your upgrade path to your current version? I want to make sure this isn’t affecting other on-premise users, and knowing your upgrade path would help us try to reproduce this ourselves.

What you want to do here is correct your data so that there’s only one row per task per org; then you can put the unique constraint back in place.

You can identify the duplicate rows for a given (organization_id, task) pair with this query (using organization_id = 1 and task = 5 as an example). I’ll leave it up to you to write the DELETE:

SELECT *
FROM sentry_organizationonboardingtask
WHERE organization_id = 1
  AND task = 5
  AND id != (
    SELECT MIN(id)
    FROM sentry_organizationonboardingtask
    WHERE organization_id = 1 AND task = 5
  );

You’ll need to repeat this for each (organization_id, task) pair in your system that has duplicates.
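If you have many affected pairs, you can also delete all duplicates in one pass instead of repeating the per-pair query. Here’s a minimal sketch of that approach, using SQLite as a stand-in for Postgres (the table and column names are taken from the post; the seed data is invented to demonstrate):

```python
import sqlite3

# Hypothetical local reproduction: the real table lives in Postgres, but the
# dedup logic is identical. Column names match the Sentry schema above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sentry_organizationonboardingtask (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER NOT NULL,
        task INTEGER NOT NULL
    )
""")
# Seed some duplicates: (1, 5) appears three times, (2, 3) twice.
conn.executemany(
    "INSERT INTO sentry_organizationonboardingtask (organization_id, task) VALUES (?, ?)",
    [(1, 5), (1, 5), (1, 5), (2, 3), (2, 3), (2, 7)],
)

# Delete every row that is not the lowest id for its (organization_id, task)
# pair -- this generalizes the per-pair SELECT above to all pairs at once.
conn.execute("""
    DELETE FROM sentry_organizationonboardingtask
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM sentry_organizationonboardingtask
        GROUP BY organization_id, task
    )
""")

remaining = conn.execute(
    "SELECT organization_id, task, COUNT(*) FROM sentry_organizationonboardingtask "
    "GROUP BY organization_id, task ORDER BY organization_id, task"
).fetchall()
print(remaining)  # every pair is now down to a single row
```

The same DELETE with the `NOT IN (SELECT MIN(id) … GROUP BY …)` subquery runs unchanged on Postgres; just take a backup first.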

Then this recreates the unique index:

CREATE UNIQUE INDEX sentry_organizationonboar_organization_id_47e98e05cae29cf3_uniq ON public.sentry_organizationonboardingtask USING btree (organization_id, task);

This will fail unless you’ve removed all duplicate rows first, but it’s important to get the index back in place; otherwise you’ll hit this issue again.

There’s a timing risk here: more duplicate rows can be written while you’re cleaning up. One way to avoid that is to stop ingestion on your Sentry instance while you fix the data and put the index back in place.
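Running the cleanup and the index creation together in one transaction also narrows that window (though it doesn’t replace pausing ingestion). A sketch of the full sequence, again using SQLite as a stand-in for Postgres with names from the post above:

```python
import sqlite3

# Stand-in for the Postgres instance; table, column, and index names come
# from the post above, the seed rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sentry_organizationonboardingtask (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER NOT NULL,
        task INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO sentry_organizationonboardingtask (organization_id, task) VALUES (?, ?)",
    [(1, 5), (1, 5), (2, 3)],
)

with conn:  # one transaction: dedupe, then recreate the unique index
    conn.execute("""
        DELETE FROM sentry_organizationonboardingtask
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM sentry_organizationonboardingtask
            GROUP BY organization_id, task
        )
    """)
    conn.execute("""
        CREATE UNIQUE INDEX sentry_organizationonboar_organization_id_47e98e05cae29cf3_uniq
        ON sentry_organizationonboardingtask (organization_id, task)
    """)

# With the index back in place, a duplicate insert fails with an integrity
# error, which is exactly what Sentry's write path relies on.
rejected = False
try:
    conn.execute(
        "INSERT INTO sentry_organizationonboardingtask (organization_id, task) VALUES (1, 5)"
    )
except sqlite3.IntegrityError:
    rejected = True
print("duplicate rejected:", rejected)
```

On Postgres the equivalent is wrapping the DELETE and the CREATE UNIQUE INDEX from above in a single BEGIN/COMMIT; the index creation will block or fail if conflicting writes sneak in, which is why pausing ingestion is still the safest route.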
