forked from ClickHouse/ClickHouse
Open
Description
Describe the unexpected behaviour
The gin_post file of a single data part can grow extremely large, far beyond the size of the data it indexes.
How to reproduce
- Create the table http_log_local and insert sample data.
- Add the gin index.
- Materialize the index and wait for the mutation to finish.
- Inspect the total size of the data parts on the filesystem:
clickhouse@node-1-0-0:~/data/business/http_log_local$ du -sch * | sort -k 1 -h
4.0K detached
4.0K format_version.txt
76K e7efb3c67698badd53ab74b731772f76_10_10_0_21
112K 2ba8abac480f6c1100bffd944c4ecc48_42_65_12_77
128K 03a57d5e8fe0ec0fab9e1c68dc210ef0_0_24_7_35
132K 78e93e4cc3886fa2c9f2ef23967198e7_0_36_10_48
1.9M 26efcc325bb68f310ed15a6a9b2eff95_255_255_0_266
4.1M 1e11287921c2d5c214973c711c58a36f_10609_10609_0
3.5M 983a1a1dfef4543e6d83515d25983333_533_533_0_544
6.1M tmp_mut_300bbf56ad64c576b4d79a9172fcfe31_0_2401_24_5776
59M 0c924ec6272d27d57f728aaf94071b39_4245_4295_3_4306
172M 0c924ec6272d27d57f728aaf94071b39_490_4244_27_4306
206M 0c924ec6272d27d57f728aaf94071b39_194_489_8_4306
868M 983a1a1dfef4543e6d83515d25983333_287_532_10_544
1.2G 300bbf56ad64c576b4d79a9172fcfe31_5738_5765_2_5776
1.5G 983a1a1dfef4543e6d83515d25983333_5_286_6_544
3.6G 26efcc325bb68f310ed15a6a9b2eff95_44_254_4_266
5.2G 300bbf56ad64c576b4d79a9172fcfe31_5579_5737_5_5776
24G 300bbf56ad64c576b4d79a9172fcfe31_2402_5578_22_5776
157G 300bbf56ad64c576b4d79a9172fcfe31_0_2401_24_5775
190G total
Notice that there is one abnormally large part, 300bbf56ad64c576b4d79a9172fcfe31_0_2401_24_5775.
clickhouse@node-1-0-0:~/data/business/http_log_local/300bbf56ad64c576b4d79a9172fcfe31_0_2401_24_5775$ du -sch *
8.0K checksums.txt
4.0K columns.txt
4.0K count.txt
4.0K partition.dat
56K primary.idx
268K recordTimestamp.bin
24K recordTimestamp.mrk2
6.3M requestBody.bin
24K requestBody.mrk2
538M requestHead.bin
24K requestHead.mrk2
13M responseBody.bin
24K responseBody.mrk2
17M responseHead.bin
24K responseHead.mrk2
226M skp_idx_ginIndex.gin_dict
158G skp_idx_ginIndex.gin_post
24K skp_idx_ginIndex.gin_seg
4.0K skp_idx_ginIndex.gin_sid
12K skp_idx_ginIndex.idx
28K skp_idx_ginIndex.mrk2
40K tenant.bin
4.0K tenant.dict.bin
24K tenant.dict.mrk2
24K tenant.mrk2
270M uuId.bin
24K uuId.mrk2
160G total
The index file skp_idx_ginIndex.gin_post is disproportionately large. The data files the index is built from (the .bin files of the indexed columns) total about 571MB, yet the index file alone is 158GB.
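To put a number on the blow-up, here is a minimal Python sketch that converts the `du -h` sizes from the listing above into bytes and compares gin_post with the compressed source columns. The helper name `parse_du_size` and the choice of the four request/response columns as "the indexed data" are assumptions made for illustration (they are the columns concatenated into rowLog), and `du` rounds sizes, so the result is only approximate.

```python
# Units as printed by `du -h` (powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_size(text: str) -> float:
    """Convert a `du -h` size such as '6.3M' or '158G' to bytes (hypothetical helper)."""
    text = text.strip()
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


# Sizes copied from the part listing above.
# Assumption: these four columns are what rowLog (the indexed column) is built from.
indexed_columns = {
    "requestHead.bin": "538M",
    "responseHead.bin": "17M",
    "requestBody.bin": "6.3M",
    "responseBody.bin": "13M",
}
gin_post_size = "158G"  # skp_idx_ginIndex.gin_post

data_bytes = sum(parse_du_size(size) for size in indexed_columns.values())
post_bytes = parse_du_size(gin_post_size)
ratio = post_bytes / data_bytes

print(f"compressed source columns: {data_bytes / 1024**2:.1f} MiB")
print(f"gin_post is roughly {ratio:.0f}x the size of the source columns")
```

Note that the comparison is against compressed column files, so the ratio against the raw text would be smaller, but the postings file is still hundreds of times larger than the on-disk data it indexes.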
Which ClickHouse server version to use
Custom build of the ftsearch branch.
CREATE TABLE statements for all tables involved
CREATE TABLE http_log_local
(
-- irrelevant columns removed
`tenant` LowCardinality(String),
`recordTimestamp` Int64,
`uuId` String,
`requestHead` String,
`responseHead` String,
`requestBody` String,
`responseBody` String,
`rowLog` String DEFAULT concat(requestHead, '--', responseHead, '--', requestBody, '--', responseBody),
INDEX ginIndex rowLog TYPE gin(3) GRANULARITY 1
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/http_log_local', '{replica}')
PARTITION BY (tenant, toYYYYMMDD(toDate(recordTimestamp)))
PRIMARY KEY uuId
ORDER BY (uuId, recordTimestamp)
SETTINGS index_granularity = 8192