Source map not in utf8 encoding

We are getting the following message in our events.

Source file was not 'utf8' encoding: https://d2te007166g1k5.cloudfront.net/assets/ray-424d6b63283b18d7f8e4.js.map
{
"url": "https://d2te007166g1k5.cloudfront.net/assets/ray-424d6b63283b18d7f8e4.js.map",
"value": "utf8"
}

This seems to prevent source maps from being used for our errors. I’m confused, because I believe that the file is encoded correctly.

What causes this error? How does Sentry check for utf8 compliance?

Cheers,

Lang

@langer8191 – this typically means you’re trying to upload a gzipped file. What command are you using to upload this?
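A gzip stream always begins with the bytes 0x1f 0x8b, and 0x8b can never start a UTF-8 sequence, so a compressed file fails any strict utf8 check. Below is a minimal Python sketch for checking a file before uploading it. The helper names are hypothetical and not part of Sentry or sentry-cli, and the source map content is a toy example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    source_map = b'{"version": 3, "sources": ["ray.js"], "mappings": ""}'
    compressed = gzip.compress(source_map)
    print(looks_gzipped(source_map), is_valid_utf8(source_map))  # False True
    print(looks_gzipped(compressed), is_valid_utf8(compressed))  # True False
```

If looks_gzipped returns True for a file you are about to upload, decompress it first (or upload the uncompressed build output).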

In answer to "What causes this error? How does Sentry check for utf8 compliance?", here’s the code.

We are having a similar issue with source maps scraped by Sentry.

Sentry reports that our source file was not utf8.

However, it is valid utf8:

  1. Uploading the map directly to Sentry with the API works fine. The problem only arises when it is scraped.

  2. Following the code that @benvinegar linked to, I loaded the file directly into a Python REPL and checked its contents: they are indeed six.binary_type and can be successfully decoded as utf8, so it’s unusual that the file is making its way into that code branch at all.

  3. Running iconv -f utf-8 against the file is successful.
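The checks in items 2 and 3 both amount to a strict UTF-8 decode of the raw bytes. Here is a small Python sketch that reproduces them outside Sentry and reports where decoding fails. The script and function names are hypothetical.

```python
import sys


def check_utf8_bytes(raw: bytes) -> str:
    """Strictly decode raw bytes as UTF-8, like `iconv -f utf-8`, and describe the result."""
    try:
        raw.decode("utf-8")  # strict by default: raises on the first invalid sequence
    except UnicodeDecodeError as exc:
        return f"invalid utf-8 at byte {exc.start}: {exc.reason}"
    return "valid utf-8"


def check_utf8_file(path: str) -> str:
    """Read the file as bytes (six.binary_type on Python 3) and check it."""
    with open(path, "rb") as f:
        return check_utf8_bytes(f.read())


if __name__ == "__main__":
    # usage: python check_utf8.py ray.js.map [more files...]
    for path in sys.argv[1:]:
        print(path, check_utf8_file(path))
```

If this reports "valid utf-8" but Sentry still complains, the problem is more likely in how the file is fetched or labeled than in its bytes.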

Is it possible Sentry is incorrectly raising a utf8 error when the problem is something else? Maybe the file isn’t downloaded completely or something?

We also pull this information out of the HTTP headers in the response. Can you share a link to one of these assets? If it’s on sentry.io, we can also help in support and check there.

Also, I checked the URL you first posted, and it 404s.

@matt, thanks for mentioning the headers.

Sentry stumbles on my files when the Content-Type header is as follows:

Content-Type: text/plain

Sentry does fine when I force the Content-Type header as follows:

Content-Type: text/plain; charset=utf-8

In either case, the file is valid utf-8.
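The two headers differ only in the charset parameter, so the different outcomes must come from how that parameter (or its absence) is interpreted. For illustration, this is how the parameter can be read with Python's standard-library email.message.Message; it is not necessarily how Sentry itself parses the header.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # None when there is no charset= parameter


print(charset_from_content_type("text/plain"))                 # None
print(charset_from_content_type("text/plain; charset=utf-8"))  # utf-8
```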

Is this the expected behavior?

We did this because I’m pretty sure that, without an explicit utf-8 charset, the file is handled as ASCII. Though I think it might be safe to be less strict, since ASCII is just a safe subset of utf-8.
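The subset relationship is easy to check in Python: decoding as utf-8 never breaks a pure-ASCII file, while decoding a utf-8 file that contains non-ASCII characters as ASCII fails. A quick sketch with a made-up sample string:

```python
# Every ASCII byte (0x00-0x7F) is encoded as the same single byte in UTF-8,
# so pure-ASCII content decodes identically with either codec.
all_ascii = bytes(range(128))
assert all_ascii.decode("ascii") == all_ascii.decode("utf-8")

# The reverse does not hold: valid UTF-8 containing non-ASCII characters
# is rejected by a strict ASCII decoder.
utf8_bytes = "naïve".encode("utf-8")
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii decode failed:", exc.reason)  # ordinal not in range(128)
print(utf8_bytes.decode("utf-8"))  # naïve
```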

@mitsuhiko is that correct?

Suggestion:

Decode it as utf8 (when not specified) and trap decode errors. This way, valid utf-8 will work even if it isn’t explicitly stated to be utf8.
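A minimal Python sketch of that suggestion, assuming the fetcher has the raw response body as bytes and the charset (if any) from the response headers. decode_source and SourceNotUtf8 are hypothetical names, not Sentry's code.

```python
from typing import Optional


class SourceNotUtf8(Exception):
    """Hypothetical error standing in for Sentry's 'not utf8' event error."""


def decode_source(body: bytes, declared_charset: Optional[str] = None) -> str:
    """Decode a fetched source file, defaulting to UTF-8 when no charset is given.

    Valid UTF-8 (and therefore plain ASCII) succeeds even without a declared
    charset; genuinely undecodable bytes are trapped and reported.
    """
    encoding = declared_charset or "utf-8"
    try:
        return body.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        # LookupError covers an unknown charset name coming from the header
        raise SourceNotUtf8(f"source is not valid {encoding}") from exc
```

The design choice is to let the bytes decide: only a real decode failure produces an error, rather than a missing or unexpected header value.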

We’re having the same issue, here is an example asset: https://s3.amazonaws.com/auctionex-website/assets/common.7aa9c65f2166f124023f.js.map

@naw’s suggestion seems like a sensible approach.

I’ll try and hack through this very soon. It shouldn’t be hard to implement.


I threw up a pull request with more information: https://github.com/getsentry/sentry/pull/4960

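This problem surfaces when your server sends back a text/* Content-Type without a charset. In that case we were getting an explicit ISO-8859-1 encoding value (the HTTP/1.1 default for text/* in RFC 2616), whereas we expected either None or utf-8. So the PR explicitly allows this charset and decodes the file as utf-8. Strictly speaking, ISO-8859-1 is not a subset of utf-8 (only ASCII is); the reason to accept it is that here the value is a default filled in for a missing charset, not one the server actually declared.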
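The failure mode described above can be reproduced with the standard library. This sketch assumes the HTTP client applies the RFC 2616 default for text/* responses (as Python's requests library does); reported_charset and source_encoding are hypothetical stand-ins for the client behavior and for the PR's change, not the PR's actual code.

```python
from email.message import Message
from typing import Optional


def reported_charset(content_type: str) -> Optional[str]:
    """Charset an HTTP client following RFC 2616 would report for a response.

    RFC 2616 says text/* with no charset parameter defaults to ISO-8859-1;
    for other media types a missing charset stays None.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset is None and msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return charset


def source_encoding(reported: Optional[str]) -> str:
    """Hypothetical fix: treat None or the ISO-8859-1 default as utf-8."""
    if reported is None or reported.lower() in ("iso-8859-1", "utf-8"):
        return "utf-8"
    return reported


print(reported_charset("text/plain"))                   # iso-8859-1
print(reported_charset("application/json"))             # None
print(source_encoding(reported_charset("text/plain")))  # utf-8

# ISO-8859-1 is not a subset of UTF-8: non-ASCII bytes decode differently.
print("é".encode("utf-8").decode("iso-8859-1"))         # Ã©
```

Because the client reports the same ISO-8859-1 value whether it was declared or inferred, a fetcher that only sees that value cannot tell the two apart, which is why the fix treats it as utf-8.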
