Ability to export issue events to CSV

Hey guys,

It would be great for devs to be able to export the events of an issue. That way, if something goes wrong, we can reproduce the events with the same data (stored with tags). :slight_smile:

Just a simple export button would be great. I tried to use the events API, but it has a limit of 100, so I changed the code a little bit. It still doesn’t return all events (I don’t know why, maybe some grouping is applied).

The code I changed in api/endpoints/group_events.py:

        limit = request.GET.get('limit')

        return self.paginate(
            request=request,
            queryset=events,
            order_by='-datetime',
            on_results=lambda x: serialize(x, request.user),
            paginator_cls=DateTimePaginator,
            default_per_page=limit,
        )
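For what it’s worth, instead of patching the server you can page through the results client-side: the events endpoint emits `Link` headers, which `requests` parses into `r.links`. A minimal sketch (the session is passed in, e.g. `requests.Session()`; Sentry marks the final page with a `results="false"` attribute on the `next` link, which is used here as the stop condition):

```python
def fetch_all_events(url, auth_token, session, max_pages=100):
    """Collect every page of events by following the API's Link headers.

    `session` is any object with a requests-style .get() method,
    e.g. requests.Session().
    """
    headers = {'Authorization': 'Bearer {}'.format(auth_token)}
    events = []
    for _ in range(max_pages):
        r = session.get(url, headers=headers)
        r.raise_for_status()
        events.extend(r.json())
        nxt = r.links.get('next')
        # Sentry flags the final page with results="false" on the
        # next link; stop there (or when no next link is present).
        if nxt is None or nxt.get('results') == 'false':
            break
        url = nxt['url']
    return events
```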


So we have something like this for tag exports (not exposed in the UI). I definitely don’t see why we shouldn’t do a similar feature for events.

I think we can safely offer “export the last 1,000 events in this issue” without having to make any drastic infrastructure changes.


I found the tag export only by looking into the source code. It could be mentioned somewhere, or a button in the UI would be great.

Another question: why does default_per_page not return all events, even if I set the limit to something like 100000000000? Events in DB: 2.6k, exported: 900.

That’s likely due to sampling. Sentry doesn’t store every single event, and the sampling gets more aggressive for issues that occur more frequently.

Any suggestion of fast way to export all events (I can hardcode something in code for one time)?

Sorry, to be clear: the sampling isn’t a limitation of the data being returned, it’s a limit on what’s stored. It’s how we keep Sentry’s storage cheap and scalable for customers:

https://docs.sentry.io/hosted/learn/rollups/

So if I understand correctly, I can export all values of one tag (~2.6k), but it’s not possible to rebuild events from tags?

Not with full accuracy.

When an event comes in we:

  • normalize it
  • increment counters and unique sets for issues / tags
  • possibly store the full JSON of the event (given the sampling logic)

If you do have an Event returned from the API you have 100% of the details.
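The ingest steps above can be sketched roughly like this (the names, data shapes, and the modulo sampling rule are illustrative only, not Sentry’s actual implementation):

```python
import json


def ingest_event(event, store, sample_rate):
    """Illustrative ingest pipeline: normalize, count, maybe keep full JSON."""
    # 1. normalize: coerce the payload into a canonical shape
    normalized = {k: event[k] for k in sorted(event)}

    # 2. increment counters / unique sets for the issue and its tags
    store['event_count'] += 1
    for tag in normalized.get('tags', []):
        store['tag_values'].setdefault(tag['key'], set()).add(tag['value'])

    # 3. possibly store the full JSON, depending on the sampling decision;
    #    counters stay exact, but only sampled events survive in full
    if store['event_count'] % sample_rate == 0:
        store['events'].append(json.dumps(normalized))
```

This is why counts and tag values stay accurate while the full event bodies do not: steps 1 and 2 run for every event, step 3 only for the sampled ones.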

Our goal in the future is to remove sampling, but to do that we need to build some new, non-trivial technology to ensure it is achievable without increasing the cost to our customers.


Great answer :slight_smile: Keep it up :slight_smile:

Any updates on this feature? Was looking into exactly this today. We’re looking for a way to do some analysis on Events contained within a single Issue via their tag data. For example, if a given Issue had thousands of Events (potentially millions in our case), we’d like to be able to easily query and see how many Events had tag X versus tag Y, or a given tag value, and so on.

It looks like we can do some manual UI searching via the query interface at https://sentry.io///issues//events/, but I think that UI is missing some basic result metadata to allow the analysis. The resulting event list doesn’t include page size, number of pages, etc. Exposing that type of metadata in the UI would cover the simple scenarios for us now.

For more complex scenarios, being able to export in a JSON/CSV format would allow much more complex analysis.
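For the tag-X-versus-tag-Y counting, once the events are fetched as JSON, the analysis itself is small. A sketch over each event’s `tags` list (the `key`/`value` field names match what the events endpoint returns):

```python
from collections import Counter


def count_tag_values(events, key):
    """Count how many events carry each value of a given tag key."""
    return Counter(
        tag['value']
        for event in events
        for tag in event.get('tags', [])
        if tag['key'] == key
    )
```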

Keep up the great work!

I may have spoken too soon - this may be the API endpoint we’re looking for:

https://docs.sentry.io/api/events/get-group-events/

However, exposing some of the result metadata in the UI would still be really helpful for quick analysis :slight_smile:


Here’s a Python script I wrote to convert the events of a group (issue) into a CSV file, using the endpoint you discovered. I’ve also noticed that for a group with a large event count not all values are returned, but this is probably good enough if you want to do some extra analysis.

Hope it helps!

import logging
import traceback
import csv

# Third party module, not installed with python. Needs to be installed with virtualenv + pip
import requests


def fetchUrl(url, authToken):
    '''Fetch a url and deal with authentication using an authToken'''

    try:
        s = requests.Session()
        r = s.get(url, headers={'Authorization': 'Bearer {}'.format(authToken)})
        r.raise_for_status()
        return True, r

    except requests.exceptions.RequestException:
        msg = traceback.format_exc()
        msg = 'Request error: {}'.format(msg)
        logging.fatal(msg)
        return False, None


def mkRow(event):
    # Here you can use the python debugger to inspect events
    # print(event.keys())
    # import pdb; pdb.set_trace()

    row = {}
    # user_id, ts, device, device_family, os, release, version

    userInfo = event['user']
    userId = userInfo['data'].get('userid', 'n/a')
    row['userid'] = userId

    # If a better way to print time is needed, use this 
    # format "%Y-%m-%d %H:%M:%S" for excel when printing time,
    # after parsing from dateCreated
    row['timestamp'] = event['dateCreated']

    # (Pdb) event['contexts']['device']
    '''
      {u'model_id': u'D101AP', 
       u'family': u'iPhone', 
       u'simulator': False, 
       u'network_operator': u'AT&T', 
       u'architecture': u'64-bit', 
       u'model': u'iPhone9,3', 
       u'type': u'device'}
    '''
    deviceInfo = event['contexts']['device']
    row['device_family'] = deviceInfo.get('family', 'n/a')
    row['device_model'] = deviceInfo.get('model', 'n/a')
    row['device_model_id'] = deviceInfo.get('model_id', 'n/a')

    # OS
    '''
    (Pdb) event['contexts']['os']
    {u'kernel_version': u'n/a', 
     u'version': u'10.3.2', 
     u'type': u'os', 
     u'name': u'iOS', 
     u'build': u'14F89'}
    '''
    row['os_name'] = event['contexts']['os'].get('name', 'n/a')
    row['os_version'] = event['contexts']['os'].get('version', 'n/a')

    # parse tags
    tags = {}
    for item in event['tags']:
        key = item['key']
        value = item['value']
        tags[key] = value

    # Some events may lack these tags, so fall back to 'n/a'
    row['release'] = tags.get('release', 'n/a')
    row['version'] = tags.get('version', 'n/a')

    return row


def processEvents(url, output_path):
    '''Process all events for an issue and do something fun with it'''

    authToken = 'XXXX_YOUR_AUTH_TOKEN_XXXX'

    # newline='' prevents blank rows in the CSV on Windows (Python 3)
    with open(output_path, 'w', newline='') as csvfile:
        fieldnames = [
            'userid', 
            'timestamp', 
            'device_family',
            'device_model',
            'device_model_id',
            'os_name',
            'os_version',
            'release', 
            'version'
        ]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()

        while True:
            print('fetching', url)
            ok, r = fetchUrl(url, authToken)
            if not ok:
                break

            events = r.json()
            print('retrieved {} events'.format(len(events)))

            if len(events) == 0:
                break

            # Write the current page before moving on; otherwise the
            # last page (the one with no 'next' link) would be dropped
            for event in events:
                writer.writerow(mkRow(event))

            hasNext = r.links.get('next')
            if hasNext is None:
                break

            url = hasNext['url']


if __name__ == '__main__':
    # Script input
    
    # 1. A url to the crash
    # https://getsentry.io/sentry/a_project_name/issues/5163217

    url = 'https://getsentry.io/api/0/issues/5163217/events/'

    # 2. The output csv file
    output_path = 'out.csv'

    processEvents(url, output_path)

Yeah, I’m noticing that it seems to cap out at 100 events. Thanks for the script!

Any update here?

We would like to export a query-parameter value for each of an issue’s events. These values are the test inputs for which the issue occurs, and having them would drastically help us debug the problem and validate a proposed fix at a bigger scale.