They’re stored on the filesystem - in 2019 there are infinite ways you can back up disks, or better yet you could just offload your disks to Amazon (or any other cloud provider).
Wow, thanks for that incredible insight - not condescending at all. I feel all warm and fuzzy inside now.
I am running Sentry on GKE and the disks are offloaded to persistent volumes in the cloud. As I’m sure you know, even in 2019, having guaranteed disks is hardly the same as having backups. Filesystems and individual files can still get corrupted because of software errors, etc.
If I look at GitLab, for instance, it comes with a backup script that backs up its database, files, object storage, etc. all in one neat restorable package. All I was asking is whether there is something similar available for Sentry, whether there is some sort of best practice, or perhaps whether I was missing anything.
I understand now there is not, so thank you for your answer.
A backup script would just generate a continuous stream of “integrate MY solution” requests. It would also be unhelpful for anyone who already handles backups through their existing config management tooling.
What would be an improvement is to fix the docs if they indeed do not mention all the places that need backing up, preferably in one place (and include things like “what will I lose if I restart Redis” there). Something like “Sentry State & Persistence” with a “Backups” sub-section.
I would not recommend using GKE volumes for Sentry data. The data needs to be shared between all running processes, so you should configure the GCS filestore backend instead.
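For reference, a minimal sketch of what that could look like in sentry.conf.py (or the equivalent keys in config.yml) - the bucket name is a placeholder, and you should double-check the exact option keys against the filestore docs for your Sentry version:

```python
# sentry.conf.py - sketch only; the bucket name is a placeholder and the option
# keys should be verified against the filestore docs for your Sentry version.
from sentry.conf.server import *  # standard on-premise config preamble

SENTRY_OPTIONS["filestore.backend"] = "gcs"
SENTRY_OPTIONS["filestore.options"] = {
    # A bucket every web/worker pod can reach; credentials come from the
    # environment (e.g. GOOGLE_APPLICATION_CREDENTIALS or GKE workload identity).
    "bucket_name": "my-sentry-filestore",
}
```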
I have filed an issue on that repo as well, as the chart has problems with read/write permissions across pods. It’s indeed not ideal, and I agree it should be GCS. I might fix that in the chart and send a merge request, but I’m waiting to see if it gets picked up by the maintainer.
I also agree that the documentation here is lacking; there is little to no mention of what actually needs backing up to fully restore a Sentry instance.
I know nothing about this Helm chart, but tell them that in any distributed setup Sentry needs some kind of central file storage - whether you NFS it yourself or just offload to S3 or GCS (which is what I’d prefer).
Then you can worry less about backups. On top of that, if you care about your other stateful datasets, I would not run them in Kubernetes at all, just so you have more control over backup strategies. Like, I would not run Postgres in the cluster; keep it on a VM or use CloudSQL and manage backup options there.
I guess my point is, in Sentry’s case, we generally find a generic “backup script” to be difficult since everyone’s situation is different. We also potentially have terabytes of data spread across multiple systems, and each system probably has a better backup strategy of its own. For example, if you’re running Postgres on a VM, disk snapshotting is great and much simpler than a SQL dump or something like that. Same with running systems like Kafka or ClickHouse in the future. It’s just extremely nontrivial to back up the world in a single command, and probably impractical.
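To make the snapshot example concrete, here’s a rough sketch of the kind of thing I mean - the project, zone, and disk names are placeholders, and it assumes your Postgres data sits on its own GCE persistent disk:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: snapshot the persistent disk backing a Postgres VM on GCE.

Project, zone, and disk names below are placeholders - adjust to your setup.
This just wraps `gcloud compute disks snapshot`, e.g. from a cron job.
"""
import subprocess
from datetime import datetime, timezone

PROJECT = "my-gcp-project"      # placeholder
ZONE = "europe-west1-b"         # placeholder
DISK = "sentry-postgres-data"   # placeholder: the disk attached to the Postgres VM

snapshot_name = f"{DISK}-{datetime.now(timezone.utc):%Y%m%d-%H%M%S}"

# Snapshots are crash-consistent; for stricter consistency, run this during
# low write load or checkpoint Postgres first.
subprocess.run(
    [
        "gcloud", "compute", "disks", "snapshot", DISK,
        "--project", PROJECT,
        "--zone", ZONE,
        "--snapshot-names", snapshot_name,
    ],
    check=True,
)
print(f"created snapshot {snapshot_name}")
```

Something similar applies to Kafka or ClickHouse volumes, which is why per-system strategies tend to beat a single backup script.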
I hope this adds some more clarification. If you need any help with configuring the filestore options, let me know. I don’t think we document them too well - honestly, maybe not at all for GCS.
It seems that way. Also, I did find the docs on GCS file storage; it’s just hard to configure without building a custom Helm chart. That pull request looks promising, so I’ll try to add some weight to it.
As for Postgres: we are already using the managed option Google provides.