What's the point of scopes (and event_processors) if we have a configurable before_send function?

saurabhnanda · October 24, 2020, 6:23pm

The title says it all. If the before_event function is added to the “stack”, doesn’t it achieve everything that scopes and event_processors achieve, with fewer moving parts?

untitaker · October 24, 2020, 6:38pm

Yes, we could have users replace all uses of sentry_sdk.set_tag with registering a callback that modifies the event, but I don’t think people would appreciate that. They don’t often want to think about what our internal event structure looks like.

event_processors already live on the scope stack and are the same thing that you call before-event.

We do have some other use cases for storing data separately, particularly when we want to attach it to sessions. Sessions are not events, so calling the callback would not be possible (as we don’t have the event argument).
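To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not the sentry_sdk implementation: ToyScope, add_release_tag and session_attrs are hypothetical names chosen for illustration. It shows that setting a tag needs no knowledge of the event layout, that a callback has to edit the raw event, and that data stored on the scope can be read even when no event exists.

```python
# Toy model, not the sentry_sdk implementation: a scope holds plain data
# plus a list of event processors, and applies both to an outgoing event.
class ToyScope:
    def __init__(self):
        self.tags = {}
        self.event_processors = []

    def set_tag(self, key, value):
        self.tags[key] = value

    def add_event_processor(self, func):
        self.event_processors.append(func)

    def apply_to_event(self, event):
        # In this toy, stored data is merged first, then processors run.
        event.setdefault("tags", {}).update(self.tags)
        for processor in self.event_processors:
            event = processor(event)
            if event is None:  # a processor may drop the event
                return None
        return event


def add_release_tag(event):
    # The callback has to know that tags live under event["tags"].
    event.setdefault("tags", {})["release"] = "1.2.3"
    return event


scope = ToyScope()
scope.set_tag("customer", "acme")           # no knowledge of event layout needed
scope.add_event_processor(add_release_tag)  # callback edits the raw event

event = scope.apply_to_event({"message": "boom"})
session_attrs = dict(scope.tags)  # scope data is usable without any event
```

The last line is the session case: there is no event to pass to a callback, but the stored tags are still available.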

saurabhnanda · October 24, 2020, 7:04pm

Sorry, I meant, before_send

How do other SDKs deal with the following flow: sdk.set_tag('foo') => fork a new thread => capture_event? Is the hub’s default scope used now, or is the scope with tag=foo used?

Further, IIUC, at https://github.com/getsentry/sentry-python/blob/644bfa842bc31a020da1fc8dc53e070febacad9a/sentry_sdk/scope.py#L335-L406 some properties in the scope are added to the event’s existing properties, while some replace the existing properties. How does the user know? For example, if at one place in my code I call sdk.set_user(email: 'foo@bar.com') and at another place I call sdk.set_user(ip_address: 'a.b.c.d'), what will the final event contain?

saurabhnanda · October 24, 2020, 7:13pm

Is there a document which specifies what can be in the scope, or is the Python SDK the canonical definition? https://github.com/getsentry/sentry-python/blob/52830558bb535d7ff8e09b27703c99425262067f/sentry_sdk/scope.py#L77-L96

saurabhnanda · October 24, 2020, 7:18pm

I guess that document would be https://develop.sentry.dev/sdk/unified-api/#scope

untitaker · October 24, 2020, 7:20pm

Regarding set_user specifically, you always set the entire dictionary, so it overrides the user property entirely. This should be the case for all methods. For example, we don’t try to destructure arguments to set_extra. set_extra takes two arguments, a key and a value, precisely because it updates self._extra instead of replacing it (but then it replaces, rather than updates, the value stored under that key).
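The replace-versus-update behaviour described here can be sketched with a small stand-in class. This is a toy model under the assumptions stated in the post, not the SDK source; ToyScope is a hypothetical name.

```python
# Toy model of the replace-vs-update semantics described above; this is not
# the sentry_sdk source, and ToyScope is a hypothetical name.
class ToyScope:
    def __init__(self):
        self.user = None
        self.extra = {}

    def set_user(self, value):
        # The whole user dictionary is replaced on every call.
        self.user = value

    def set_extra(self, key, value):
        # The extra mapping is updated, but the value under `key` is replaced.
        self.extra[key] = value


scope = ToyScope()
scope.set_user({"email": "foo@bar.com"})
scope.set_user({"ip_address": "a.b.c.d"})  # the email from the first call is gone

scope.set_extra("cart", {"items": 1})
scope.set_extra("region", "eu")
scope.set_extra("cart", {"total": 10})     # "items" is gone, "region" survives
```

Under this model, the final event in the question would carry only the ip_address; to keep both fields, the caller would pass them in a single dictionary.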

If a new thread is forked, we fork the data from the main thread every time (we call this the main hub, you call it the default scope). This affects not only the scope data but also which DSN is used, so you can use a different DSN on a different thread.

We eventually want to change this behavior to fork from the spawning thread such that tag=foo is inherited in your example. The propagate_hub option within the ThreadingIntegration can actually be used to control this behavior to be either of the two behaviors you describe: https://docs.sentry.io/platforms/python/configuration/integrations/default-integrations/#threading
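The two behaviours can be sketched with thread-local storage in plain Python. This is a toy model only: MAIN_SCOPE, current_scope and spawn are hypothetical names, and the propagate flag merely plays the role that the post attributes to propagate_hub. The difference only shows up when the spawning thread is not the main thread, so the example sets the tag inside a worker thread.

```python
import copy
import threading

# Toy model only: MAIN_SCOPE stands in for the main hub's scope, and
# _local holds one scope per thread. None of these names come from sentry_sdk.
MAIN_SCOPE = {"tags": {}}
_local = threading.local()


def current_scope():
    if not hasattr(_local, "scope"):
        if threading.current_thread() is threading.main_thread():
            _local.scope = MAIN_SCOPE
        else:
            # A thread without a scope starts from a copy of the main scope.
            _local.scope = copy.deepcopy(MAIN_SCOPE)
    return _local.scope


def spawn(target, propagate):
    # Captured in the spawning thread, before the new thread exists.
    parent = current_scope()

    def run():
        source = parent if propagate else MAIN_SCOPE
        _local.scope = copy.deepcopy(source)
        target()

    thread = threading.Thread(target=run)
    thread.start()
    return thread


seen = {}


def child(name):
    seen[name] = dict(current_scope()["tags"])


def worker():
    current_scope()["tags"]["foo"] = "bar"  # set in a non-main thread
    spawn(lambda: child("from_main"), propagate=False).join()
    spawn(lambda: child("from_spawner"), propagate=True).join()


spawn(worker, propagate=False).join()
```

With propagate=False the child starts from the main scope and does not see the tag; with propagate=True it starts from a copy of the spawning thread's scope and does.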

saurabhnanda · October 25, 2020, 8:40am

Wrt functions like set_user, et al (as documented at https://develop.sentry.dev/sdk/unified-api/#scope ) is there any SDK which handles scopes in an immutable fashion? I’m having a hard time coming up with a sensible design in Haskell, given the following constraints:

Scope should be mutable, i.e. if I call setUser at one point in the code it should stay that way as long as the thread executes. This can be implemented using an IORef or MVar in Haskell.
Forking a new thread should establish a new scope – this new scope might be a copy of the existing scope, but it should NOT reference the same scope, i.e. changes to the scope in thread A should not impact the scope in thread B. The only way to do this in Haskell (that I know of) is to write a wrapper on top of the core forkIO call, and expect SDK users to use that. Or introduce a new clearScope (or unlinkScope) function, and expect SDK users to call it immediately after they fork a thread.

Has this problem been solved in any other SDK? Is it alright if mutable scopes are dropped, and the only way to use scopes is the following:

withScope $ \scope -> do
  -- scope will not be accessible outside this block

-- OR 

let scope = Scope { user = ..., tags= ... }
captureEvent scope evt

untitaker · October 25, 2020, 10:59am

I don’t know much about Haskell but there’s a secondary API in the Python SDK that takes out most of the thread-local magic, and is supposed to be used when the SDK doesn’t follow the control flow properly. Perhaps only that can be replicated in Haskell, which would then at least be a strict subset, not something else entirely:

from threading import Thread

from sentry_sdk import Client, Hub

hub = Hub(Client(dsn))  # dsn: your project's DSN string

# I guess in Haskell this would give you a new scope that you need to put into a forked hub?
# possibly reexport methods on hub the same way they're reexported as module functions in 
hub.scope.set_tag(a, b)

hub.capture_message(...)

forked_hub = Hub(hub)

def run():
    # this sets forked_hub to be Hub.current, such that sentry_sdk.set_tag does the right thing... probably not replicable, instead Haskell would call methods on forked_hub
    with forked_hub:
        pass # do some work with forked_hub here, that hub has the same tag a=b

t = Thread(target=run)
t.start()

If you manage to replicate that, at a later point you could figure out the thread-local storage situation in Haskell, if that is ever achievable, and implement it as a wrapper, potentially without breaking changes.

Has this problem been solved in any other SDK?

We don’t have immutable data structures like that in other SDKs. Closest is IMO Rust where we just acquire a lock or something. But that’s an implementation detail. Rust does allow for thread-local storage so there’s not a lot of changes compared to Python at all.

untitaker · October 25, 2020, 11:14am

The most interesting edge-cases to look at are probably the Go SDK (no working thread-local storage), mobile/browser SDKs (just very different requirements as to how to follow execution flow). I think you’ll find that with regards to how execution flow is actually followed every SDK just does its own thing once thread-local storage is no longer an option.

Twisted async is kinda similar in Python to the Go situation, but we just decided not to have first-class support for that (and have people use hubs directly as shown above).

saurabhnanda · October 25, 2020, 11:30am

IIUC the first SDK to be written was Python, and a lot of docs and guidelines are written with the Python implementation in mind, which may not directly translate to other languages. Is that right?

So, I’m approaching this from first principles and trying to come up with a sensible developer UX native to Haskell without compromising the SDK’s feature set. In this regard, what are the advantages/use-cases of having a mutable scope that can be modified from anywhere in the code?

Immutable, but nested, scopes allow one to write code that looks like the following:

withScope (\s -> addTags s "tname" "tval") $ do
  captureMessage "whatever"        -- tags: {"tname": "tval"} 
  withScope (\s -> addExtra s "eName" "eVal") $ do
     captureMessage "whatever"     -- tags: {"tname": "tval"}  AND extra: {"eName": "eVal"}
  captureMessage "whatever"        -- tags: {"tname": "tval"}

Mutable scopes allow one to write code that looks like the following:

catch handler action
where
  handler e = captureException e       -- tags: {"tname": "tval"}
  action = do
    setTags "tname" "tval"

I’m dog-fooding the WIP SDK and integrating it in my Haskell code-base that has a web-server and job-queue. All useful properties/context that I’d like to capture can be captured by immutable, but nested, scopes. What are some use-cases for mutable scopes in the context of a server-side language?

saurabhnanda · October 25, 2020, 11:39am

I’m reading the docs for sentry-go, and this is one of the things that I’m worried about if I introduce mutable scopes in Haskell:

Otherwise, data races can introduce subtle bugs to your programs, and the consequences vary from nothing apparent to unexpected crashes or, worse, accidentally mixing up data stored in the Scope.

And the way the Go SDK seems to handle this is by expecting the programmer to manually ensure that two threads aren’t referring to the same Scope:

The easiest way to handle this, is to create a new Hub for every goroutine you start, however this would require you to rebind the current Client and handle Scope yourself. That is why we provide a helper method called Clone. It takes care of creating a Hub, cloning existing Scope and reassigning it alongside Client to newly create instance.

untitaker · October 25, 2020, 11:39am

IIUC the first SDK to be written was Python, and a lot of docs and guidelines are written with the Python implementation in mind, which may not directly translate to other languages. Is that right?

We did start out with Python, yeah, but made adjustments as far as possible when we revised the design to work for other langs. No doubt there are still assumptions left though.

Some notes about your use of withScope and capture*:

If you e.g. look at how Django applications are instrumented with Sentry, each Django request handler runs within its own scope. That allows you to run sentry_sdk.set_tag once in the request handler and have every error happening as part of that request flow automatically annotated with that tag.

In Python and JS, there is often no need to ever call capture* for basic instrumentation, since we hook into so many global exception signals you mostly just initialize the SDK, and set tags/extra at the appropriate places to enrich events. But even that last part is optional.

The point I am trying to make is that creating scopes for the sole purpose of having a single capture* call in it is not the main use case. Rather, your entire business logic for handling an HTTP request or running a task from a task queue is wrapped in a scope.

I can’t tell you which of the two code snippets is better though, I don’t know Haskell well enough to understand the tradeoffs.

saurabhnanda · October 25, 2020, 11:49am

Right. So, if I were using the bare Python SDK in, say, a terminal app, which had the following flow…

# PS: Pardon my syntax -- been a long time since I wrote code in Python.

from threading import Thread

import sentry_sdk
from sentry_sdk import Hub

def thread1():
    Hub.current.scope.set_extra('thread_name', 'thread1')
    # long running thread where any errors are to be reported with thread_name=thread1

def thread2():
    Hub.current.scope.set_extra('thread_name', 'thread2')
    # long running thread where any errors are to be reported with thread_name=thread2

sentry_sdk.init()  # with no arguments, the DSN is read from the SENTRY_DSN environment variable
t1 = Thread(target=thread1)
t2 = Thread(target=thread2)
t1.start()
t2.start()

… who would be responsible for ensuring there are no race conditions between the scopes of thread1 and thread2? Will the current Python SDK automagically ensure that, or will the programmer have to do this?

untitaker · October 25, 2020, 11:53am

That works automatically. You get race conditions in async code (pre-asyncio) and when you mess with hubs manually and screw it up:

hub = Hub.current

def thread2():
    with hub:
        set_extra(...)

def thread1():
    with hub:
        set_extra(...)

This now refers to the same scope in both threads. But if you remove all lines that mention hub, OR replace “with hub:” with “with Hub(hub):”, you’re good again.
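The shared-versus-forked distinction can be reproduced without the SDK. The sketch below is a toy model: ToyScope, clone and run_threads are hypothetical names, with clone standing in for what the post describes Hub(hub) as doing. Two threads either write to one shared scope or each write to their own copy.

```python
import copy
import threading

# Toy stand-in for a hub's scope; not sentry_sdk code.
class ToyScope:
    def __init__(self):
        self.extra = {}

    def clone(self):
        return copy.deepcopy(self)


def run_threads(scope_for_thread):
    """Start two threads; each writes its name into the scope it is handed."""
    results = {}
    barrier = threading.Barrier(2)

    def worker(name):
        scope = scope_for_thread()
        scope.extra["thread_name"] = name
        barrier.wait()  # both writes have happened before anyone reads
        results[name] = scope.extra["thread_name"]

    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("thread1", "thread2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


shared = ToyScope()
shared_results = run_threads(lambda: shared)          # like "with hub:" in both threads
cloned_results = run_threads(lambda: shared.clone())  # like "with Hub(hub):"
```

In the shared case both threads read back the same name, so one of them reports the wrong thread_name; in the cloned case each thread reads back its own.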

untitaker · October 25, 2020, 11:59am

See also https://docs.sentry.io/platforms/python/troubleshooting/
