ClickHouse sharding and replication are now set up across multiple nodes. I understand that after sharding I need to create a shared database and send queries to it, and that querying this database requires a cluster_name. When I looked at the settings on the Snuba side, there only seems to be an environment variable that can point to a single-node ClickHouse.
How do I use a clustered ClickHouse? Searching the repo further, I found the following multi-node cluster setup in the Snuba test code.
Is there an environment variable for cluster_name that I can use when querying ClickHouse from Snuba, or is there another way?
With the existing single-node setup (no cluster_name), tables are created only on the one ClickHouse node configured as the endpoint, and no sharding or replication happens on the other ClickHouse nodes.
Basically, I have been applying the environment variables available for the Snuba images from the link below.
An environment variable corresponding to cluster_name is needed so that Snuba can be used with a clustered ClickHouse.
Currently, when running install.sh, tables are created only on the ClickHouse node used as the endpoint, not on the other nodes.
I found the code that creates tables with a sharded/replicated engine during migration in the Sentry code, but I can't fully understand the instructions in its docstring:
"""
Provides the SQL for create table operations.
If the operation is being performed on a cluster marked as multi node, the
replicated versions of the tables will be created instead of the regular ones.
The shard and replica values will be taken from the macros configuration for
multi node clusters, so these need to be set prior to running the migration.
If unsharded=True is passed, data will be replicated on every shard of the cluster.
"""
How can I check and set the sharding and replication options, and then run the migration so that the tables are created with the replicated engine applied?
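To make my question concrete: I assume the "macros configuration" mentioned above refers to the {shard} and {replica} macros used by the Replicated* engines, so the migration would produce DDL roughly like the sketch below (the table name and ZooKeeper path are only illustrative, not what Snuba actually generates):

```sql
-- Illustrative only: the kind of replicated table a multi-node migration would create.
-- {shard} and {replica} are ClickHouse macros that must be defined on each node
-- (in the server configuration) before the migration runs.
CREATE TABLE example_local
(
    project_id UInt64,
    timestamp  DateTime,
    event_id   UUID
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/example_local', '{replica}')
ORDER BY (project_id, timestamp);
```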
Hi @seungjinlee. Unfortunately we only support single node ClickHouse installations out of the box currently. Some parts of the snuba codebase may refer to multi node clusters - this is because it's a feature we started to build out and are planning to support in the future. However this isn't on our immediate roadmap currently so I can't give you a timeframe for it right now.
If you need to run replicated or distributed tables, the only way to do so currently is to manually create all of the ClickHouse tables yourself (and keep them up to date each time you update Snuba) - you will not be able to use Snuba's migration system.
Oh, I see. Still, I managed to solve the problem by creating the tables separately, as you suggested.
Here's how I solved the problem.
Set the CLICKHOUSE_DATABASE environment variable on the Snuba side so that all of the Sentry schema and data created or migrated ends up in a specific database.
You can get the CREATE query for each table through Tabix (I recommend this method), or you can work from the table metadata in the storage attached to the host.
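Equivalently, the original DDL can be dumped straight from ClickHouse itself (the table name here is just an example):

```sql
-- Dump the existing single-node DDL so it can be adapted to the Replicated* engines.
SHOW CREATE TABLE groupedmessage_local;
```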
When creating the Replicated and Distributed tables, always add ON CLUSTER so that they are created on all shards and replicas at the same time (the sketches below include ON CLUSTER for this reason).
After creating a separate database, create the Replicated* and Distributed tables. Each engine in the default MergeTree family has a corresponding Replicated* engine, so convert them one by one: for example, ReplacingMergeTree becomes ReplicatedReplacingMergeTree, and so on. Plain Merge tables and MATERIALIZED VIEWs are kept as they are (at this point, those combined tables are based on the Distributed tables).
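Concretely, a conversion sketch with placeholder database, cluster, table, and column names (take the real column list and ORDER BY from the original CREATE query):

```sql
-- Placeholder names throughout. Create the new database on every node first.
CREATE DATABASE IF NOT EXISTS sentry_dist ON CLUSTER my_cluster;

-- Original single-node engine (from the dumped DDL):
--   ENGINE = ReplacingMergeTree(deleted) ORDER BY (project_id, id)
-- Replicated equivalent; {shard} and {replica} come from each node's macros config.
CREATE TABLE sentry_dist.groupedmessage_replicated ON CLUSTER my_cluster
(
    project_id UInt64,
    id         UInt64,
    status     UInt8,
    deleted    UInt8
)
ENGINE = ReplicatedReplacingMergeTree(
    '/clickhouse/tables/{shard}/groupedmessage_replicated',
    '{replica}',
    deleted
)
ORDER BY (project_id, id);
```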
For the replicated tables, wrap each one once more with a Distributed table for sharding. Each Distributed table points to a replicated table. In my case, the sharding key is the project_id.
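A sketch of that wrapper, using the same placeholder names; the Distributed table stores no data itself and just routes queries and inserts to the replicated table on each shard:

```sql
-- Same structure as the replicated table, but with the Distributed engine;
-- rows are sharded by project_id. Note the Distributed table carries the
-- original Sentry table name (see the next step).
CREATE TABLE sentry_dist.groupedmessage_local ON CLUSTER my_cluster
AS sentry_dist.groupedmessage_replicated
ENGINE = Distributed(my_cluster, sentry_dist, groupedmessage_replicated, project_id);
```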
After all of the above is finished, insert the Sentry data you received earlier into the Distributed tables and you can see that it gets resharded. (The Distributed table must have the same name as the original Sentry table.)
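The backfill itself is just an INSERT ... SELECT from the old single-node table into the Distributed table (database names are placeholders for the original Snuba database and the new one):

```sql
-- Pulling the previously migrated single-node data through the Distributed table
-- reshards the rows across the cluster by the sharding key (project_id here).
INSERT INTO sentry_dist.groupedmessage_local
SELECT * FROM sentry_single.groupedmessage_local;
```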
Finally, point the CLICKHOUSE_DATABASE environment variable that was changed on the Snuba side at the newly created database.
Check that the data is properly sharded and replicated.
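As a rough example of how this can be checked (placeholder names again, and not necessarily the exact queries I ran), the ClickHouse system tables and cluster table functions are handy:

```sql
-- Row counts per host across the whole cluster.
SELECT hostName() AS host, count() AS row_count
FROM clusterAllReplicas('my_cluster', sentry_dist.groupedmessage_replicated)
GROUP BY host;

-- Replication health of the replicated tables on the current node.
SELECT database, table, is_leader, absolute_delay
FROM system.replicas
WHERE database = 'sentry_dist';
```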
Of course, as you mentioned, the table schema changes when the version is upgraded, so I may have to repeat this work on every upgrade. However, at the production stage we determined that sharding was essential, so we went ahead with this work and confirmed that it runs without problems. Thank you for the answer!