@bittner it’s almost all inside of Sentry itself. The SDKs do provide a way to override the behavior (‘checksum’, which is deprecated, and ‘fingerprint’).
Here’s more or less how it works:
Each interface provides a couple of hash abstractions. For example, here’s where a portion of an exception enters:
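To be clear, the following is not Sentry’s actual code, just a rough stdlib-only sketch of the idea as I’d summarize it: each interface contributes some hash components, and a ‘checksum’ or ‘fingerprint’ sent by the SDK overrides them. The event layout and the names `default_components` and `grouping_hash` are made up for illustration, and the handling of the `{{ default }}` placeholder is an assumption about how the default components get mixed into a custom fingerprint.

```python
import hashlib

# Placeholder that stands for "whatever the default grouping would have used".
DEFAULT_PLACEHOLDER = "{{default}}"


def default_components(event):
    """Hypothetical default: exception type plus module.function of each frame."""
    exc = event["exception"]
    frames = [f"{fr['module']}.{fr['function']}" for fr in exc["frames"]]
    return [exc["type"]] + frames


def grouping_hash(event):
    """Return the key used to aggregate events (illustrative only)."""
    # A client-supplied checksum (deprecated) short-circuits everything.
    if event.get("checksum"):
        return event["checksum"]

    # No fingerprint means "use the default components".
    fingerprint = event.get("fingerprint") or [DEFAULT_PLACEHOLDER]

    parts = []
    for item in fingerprint:
        if str(item).replace(" ", "") == DEFAULT_PLACEHOLDER:
            parts.extend(default_components(event))
        else:
            parts.append(str(item))

    # Tags are deliberately not part of the hash in this sketch.
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
```

In this sketch two events that differ only in their tags (say, browser) end up with the same hash, while adding the browser name to the fingerprint next to `{{ default }}` splits them into separate groups.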
To answer a few of your questions:
Does this mean if we need to filter or ignore the same exception, say, for different web browsers then this information is not possible to have; you get a single, aggregated logging item (event) in Sentry regardless of other, say, front-end related, details. Is this correct?
Our long-term goal is to aggregate by root cause. We don’t achieve this as well as we’d like, and there are efforts under way to improve our standard heuristics, as well as to apply some basic machine learning concepts to find similar-but-not-exact matches.
What happens with additional information (that causes aggregation not to happen) when events are merged manually in Sentry? Which information is thrown away (if any)?
At this stage I believe we say “no data is lost on merging”. That may not be 100% true yet, but there’s an effort under way to let you merge, split, rehash, etc. events (on the backend), which is needed to power some upcoming things.
Does this mean filtering or ignoring exceptions for specific tags, say, specific browsers, devices or OSes is not reliably possible (because some information may be thrown away or ignored to facilitate aggregation)?
Nope, works very well. Filtering/ignore behavior happens before aggregation, but after processing.
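If it helps to picture the ordering, here is a toy sketch of “process, then filter, then aggregate”. None of this is Sentry’s real pipeline: `process`, `should_drop`, `ingest` and the rule format are hypothetical, and grouping by exception type is only a stand-in for the real grouping logic.

```python
from collections import defaultdict


def process(event):
    """Stand-in for processing: normalize tag values to lowercase strings."""
    tags = {key: str(value).lower() for key, value in event.get("tags", {}).items()}
    return {**event, "tags": tags}


def should_drop(event, ignore_rules):
    """True if any (tag_key, tag_value) rule matches the processed event."""
    return any(event["tags"].get(key) == value for key, value in ignore_rules)


def ingest(events, ignore_rules):
    """Process each event, filter it, and only then aggregate what is left."""
    groups = defaultdict(list)
    for raw in events:
        event = process(raw)                 # 1. processing
        if should_drop(event, ignore_rules):
            continue                         # 2. filtering sees the full event
        groups[event["type"]].append(event)  # 3. aggregation comes last
    return dict(groups)
```

Because the filter runs on the fully processed event before anything is grouped, a rule on a tag such as browser still has that tag available, no matter how the surviving events are aggregated afterwards.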