ClickHouse 18.12.13-2018-09-10版本新特性解析

ClickHouse的发版速度是众所周知的快
在最近,他们正式发出了18.12.13-2018-09-10版本
相关的CHANGELOG更是多的吓人
为了能够更好的使用新版特性,特做了详细的介绍

其中新特性部分,为人工翻译、校对,毕竟这部分内容是重点,后面为Google翻译

新特性列表(按照GitHub中CHANGELOG顺序)

支持Decimal

  • Added the DECIMAL(digits, scale) data type (Decimal32(scale), Decimal64(scale), Decimal128(scale)). To enable it, use the setting allow_experimental_decimal_type. #2846 #2970 #3008 #3047
1
2
3
4
5
6
7
8
9
10
SELECT *
FROM data_type_families
WHERE name LIKE '%De%'
┌─name───────┬─case_insensitive─┬─alias_to─┐
│ Decimal32 │ 1 │ │
│ Decimal64 │ 1 │ │
│ Decimal128 │ 1 │ │
│ Decimal │ 1 │ │
└────────────┴──────────────────┴──────────┘

新的WITH ROLLUP修饰符GROUP BY,替代语法 GROUP BY ROLLUP(…)

  • New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…)). #2948

JOIN查询会把*解析为字段

  • In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting - asterisk_left_columns_only to 1 on the user configuration level. Winter Zhang

JOIN支持table functions(remote/merge/numbers/url)

  • Added support for JOIN with table functions. Winter Zhang

终端支持tab自动补全

  • Autocomplete by pressing Tab in clickhouse-client. Sergey Shcherbin

终端支持ctrl c取消输入

  • Ctrl+C in clickhouse-client clears a query that was entered. #2877

可指定默认的JOIN行为

  • Added the join_default_strictness setting (values: “, ‘any’, ‘all’). This allows you to not specify ANY or ALL for JOIN. #2982

server log关联查询ID

  • Each line of the server log related to query processing shows the query ID. #2482
1
2
3
4
5
6
7
8
9
10
11
12
13
14
2018.09.15 23:01:12.934700 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: (from xx.xx.80.34:37066, user: user) select * from numbers(100)
2018.09.15 23:01:12.934984 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2018.09.15 23:01:12.935029 [ 86 ] <Trace> SystemLog (system.query_log): Flushing system log
2018.09.15 23:01:12.935036 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: Query pipeline:
Expression
Expression
Limit
Numbers
2018.09.15 23:01:12.935391 [ 277 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> ThreadStatus: Thread 277 exited
2018.09.15 23:01:12.935449 [ 87 ] <Trace> SystemLog (system.query_thread_log): Flushing system log
2018.09.15 23:01:12.935517 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Information> executeQuery: Read 100 rows, 800.00 B in 0.001 sec., 146631 rows/sec., 1.12 MiB/sec.
2018.09.15 23:01:12.935557 [ 275 ] <Debug> MemoryTracker: Peak memory usage (total): 1.00 MiB.
2018.09.15 23:01:12.935572 [ 275 ] <Information> TCPHandler: Processed in 0.001 sec.

终端可以直接print日志

  • Now you can get query execution logs in clickhouse-client (use the send_logs_level setting). With distributed query processing, logs are cascaded from all the servers. #2482
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
SELECT *
FROM system.settings
WHERE name = 'send_logs_level'
┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ none │ 0 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 rows in set. Elapsed: 0.002 sec.
xx.xx.xx.xx. :) set send_logs_level = 'trace';
SET send_logs_level = 'trace'
Ok.
0 rows in set. Elapsed: 0.001 sec.
xx.xx.xx.xx. :) select * from system.settings where name = 'send_logs_level' ;
SELECT *
FROM system.settings
WHERE name = 'send_logs_level'
[xx.xx.xx.xx.] 2018.09.15 23:30:51.158294 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: (from 127.0.0.1:59056, user: user) select * from system.settings where name = 'send_logs_level'
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159056 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159152 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: Query pipeline:
Expression
Expression
Filter
One
┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ trace │ 1 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159654 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 398 ] <Trace> ThreadStatus: Thread 398 exited
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159803 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Information> executeQuery: Read 178 rows, 25.15 KiB in 0.001 sec., 127014 rows/sec., 17.53 MiB/sec.
# 分布式查询的日志也会打到当前终端

记录setting行为到query_log

  • The system.query_log and system.processes (SHOW PROCESSLIST) tables now have information about all changed settings when you run a query (the nested structure of the Settings data). - Added the log_query_settings setting. #2482
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
SELECT *
FROM system.query_log
ORDER BY event_time DESC
LIMIT 1
Row 1:
──────
type: 1
event_date: 2018-09-15
event_time: 2018-09-15 23:42:17
query_start_time: 2018-09-15 23:42:17
query_duration_ms: 0
read_rows: 0
read_bytes: 0
written_rows: 0
written_bytes: 0
result_rows: 0
result_bytes: 0
memory_usage: 0
query: select * from system.metrics
exception:
stack_trace:
is_initial_query: 1
user: user
query_id: 0881d528-a79c-4bd7-8c0a-38ce270b95f1
address: ��
M�
port: 33384
initial_user: user
initial_query_id: 0881d528-a79c-4bd7-8c0a-38ce270b95f1
initial_address: ��
M�
initial_port: 33384
interface: 2
os_user:
client_hostname:
client_name:
client_revision: 0
client_version_major: 0
client_version_minor: 0
client_version_patch: 0
http_method: 1
http_user_agent: Go-http-client/1.1
quota_key:
revision: 54407
thread_numbers: []
ProfileEvents.Names: []
ProfileEvents.Values: []
Settings.Names: ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','readonly','max_memory_usage']
Settings.Values: ['48','0','64','random','1','1','32212254720']

记录线程数

  • The system.query_log and system.processes tables now show information about the number of threads that are participating in query execution (see the thread_numbers column). #2482
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT thread_numbers
FROM system.query_log
ORDER BY event_time DESC
LIMIT 10
┌─thread_numbers─┐
│ [] │
│ [88] │
│ [] │
│ [77,173] │
│ [] │
│ [] │
│ [77,174] │
│ [] │
│ [77,171] │
│ [] │
└────────────────┘
SELECT thread_numbers
FROM system.processes
LIMIT 10
┌─thread_numbers─┐
│ [77] │
└────────────────┘

增加进程统计信息

  • Added ProfileEvents counters that measure the time spent on reading and writing over the network and reading and writing to disk, the number of network errors, and the time spent - waiting when network bandwidth is limited. #2482

  • Added ProfileEventscounters that contain the system metrics from rusage (you can use them to get information about CPU usage in userspace and the kernel, page faults, and context -
    switches), as well as taskstats metrics (use these to obtain information about I/O wait time, CPU wait time, and the amount of data read and recorded, both with and without page c ache). -#2482

  • The ProfileEvents counters are applied globally and for each query, as well as for each query execution thread, which allows you to profile resource consumption by query in detail. #2482

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
SELECT *
FROM system.processes
LIMIT 10
Row 1:
──────
is_initial_query: 1
user: user
query_id: 7e65449f-8899-4c5f-8859-20eeae32d1b1
address: 127.0.0.1
port: 38610
initial_user: user
initial_query_id: 7e65449f-8899-4c5f-8859-20eeae32d1b1
initial_address: 127.0.0.1
initial_port: 38610
interface: 1
os_user: root
client_hostname: xx.xx.xx.xx.
client_name: ClickHouse client
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
client_revision: 54407
http_method: 0
http_user_agent:
quota_key:
elapsed: 0.000800632
is_cancelled: 0
read_rows: 0
read_bytes: 0
total_rows_approx: 0
written_rows: 0
written_bytes: 0
memory_usage: 880
peak_memory_usage: 880
query: select * from system.processes limit 10
thread_numbers: [181]
ProfileEvents.Names: ['Query','SelectQuery','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','ContextLock','RWLockAcquiredReadLocks']
ProfileEvents.Values: [1,1,36,1,10,1,57,3,1]
Settings.Names: ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','max_memory_usage']
Settings.Values: ['48','0','64','random','1','64424509440']

新增每个查询执行线程的信息

  • Added the system.query_thread_log table, which contains information about each query execution thread. Added the log_query_threads setting. #2482
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
SELECT *
FROM system.query_thread_log
ORDER BY event_time DESC
LIMIT 3
Row 1:
──────
event_date: 2018-09-15
event_time: 2018-09-15 22:29:17
query_start_time: 2018-09-15 22:29:17
query_duration_ms: 4
read_rows: 2178
read_bytes: 1446750
written_rows: 0
written_bytes: 0
memory_usage: 27136
peak_memory_usage: 44996168
thread_name: ParalInputsProc
thread_number: 252
os_thread_id: 19255
master_thread_number: 74
master_os_thread_id: 9227
query: select * from system.query_thread_log order by event_time desc limit 10
is_initial_query: 1
user: user
query_id: 55740e27-e796-4b0f-a9ff-530363f91d76
address: ��
port: 52456
initial_user: user
initial_query_id: 55740e27-e796-4b0f-a9ff-530363f91d76
initial_address: ��
initial_port: 52456
interface: 1
os_user: root
client_hostname: xx.xx.xx.xx.
client_name: ClickHouse client
client_revision: 54407
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
http_method: 0
http_user_agent:
quota_key:
revision: 54407
ProfileEvents.Names: ['FileOpen','ReadBufferFromFileDescriptorRead','ReadBufferFromFileDescriptorReadBytes','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','MarkCacheMisses','CreatedReadBufferOrdinary','DiskReadElapsedMicroseconds','ContextLock','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults']
ProfileEvents.Values: [80,118,231033,229113,46,1101082,80,42862276,40,40,1135,2,4328,890,2671,678]

新增 system.metrics and system.events

  • The system.metrics and system.events tables now have built-in documentation. #3016
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT *
FROM system.metrics
┌─metric───────────────────────────────────┬──────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query │ 1 │ Number of executing queries │
│ Merge │ 0 │ Number of executing background merges │
│ PartMutation │ 0 │ Number of mutations (ALTER DELETE/UPDATE) │
│ ReplicatedFetch │ 0 │ Number of data parts fetching from replica │
│ ReplicatedSend │ 0 │ Number of data parts sending to replicas │
│ ReplicatedChecks │ 0 │ Number of data parts checking for consistency
SELECT *
FROM system.events
LIMIT 10
┌─event───────────────────────────────────┬────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query │ 127 │ Number of queries started to be interpreted and maybe executed. Does not include queries that are failed to parse, that are rejected due to AST size limits; rejected due to quota limits or limits on number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. │
│ SelectQuery │ 124 │ Same as Query, but only for SELECT queries. │
│ FileOpen │ 54504 │ Number of files opened. │
│ Seek │ 47 │ Number of times the 'lseek' function was called.

新增arrayEnumerateDense函数

  • Added the arrayEnumerateDense function. Amos Bird
1
2
3
4
5
6
SELECT arrayEnumerateDense([(1, 2), (3, 4), (1, 2), (1, 2), (2, 3), (2, 3)])
┌─arrayEnumerateDense(array(tuple(1, 2), tuple(3, 4), tuple(1, 2), tuple(1, 2), tuple(2, 3), tuple(2, 3)))─┐
│ [1,2,1,1,3,3] │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

新增arrayCumSumNonNegativ/arrayDifference函数

  • Added the arrayCumSumNonNegative and arrayDifference functions. Aleksey Studnev

  • 似乎还没上,无法使用。

新增retention函数

  • Added the retention aggregate function. Sundy Li
1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT
uid,
retention(date = '2018-08-06', date = '2018-08-07', date = '2018-08-08') AS r
FROM retention_test
WHERE date IN ('2018-08-06', '2018-08-07', '2018-08-08')
GROUP BY uid
ORDER BY uid ASC
LIMIT 3
┌─uid─┬─r───────┐
│ 0 │ [1,1,1] │
│ 1 │ [1,1,1] │
│ 2 │ [1,1,1] │
└─────┴─────────┘

states函数可以使用加号云算法

  • Now you can add (merge) states of aggregate functions by using the plus operator, and multiply the states of aggregate functions by a nonnegative constant. #3062 #3034
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
CREATE TABLE add_aggregate
(
a UInt32,
b UInt32
)
ENGINE = Memory
INSERT INTO add_aggregate VALUES(1, 2);
INSERT INTO add_aggregate VALUES(3, 1);
SELECT minMerge(x)
FROM
(
SELECT minState(a) + minState(b) AS x
FROM add_aggregate
)
┌─minMerge(x)─┐
│ 1 │
└─────────────┘

虚拟列

  • Tables in the MergeTree family now have the virtual column _partition_id. #3089
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
SELECT _partition_id
FROM test.partition_id
ORDER BY _partition_id ASC
┌─_partition_id─┐
│ 197004 │
│ 197004 │
│ 197007 │
│ 197007 │
│ 197010 │
│ 197010 │
│ 201809 │
│ 201809 │
│ 201809 │
└───────────────┘
SELECT *
FROM test.partition_id
ORDER BY d ASC
┌──────────d─┬───x─┐
│ 1970-04-11 │ 1 │
│ 1970-04-11 │ 1 │
│ 1970-07-20 │ 2 │
│ 1970-07-20 │ 2 │
│ 1970-10-28 │ 3 │
│ 1970-10-28 │ 3 │
│ 2018-09-13 │ 100 │
│ 2018-09-14 │ 100 │
│ 2018-09-15 │ 100 │
└────────────┴─────┘

Bug修复:

  • 修复了Dictionary表的问题(抛出Size of offsets doesn't match size of columnUnknown compression method异常)。此错误出现在版本18.10.3中。#2913
  • 修复了合并CollapsingMergeTree表时如果其中一个数据部分为空(这些部分是在合并期间形成或者ALTER DELETE所有数据都已删除)的错误,并且该vertical算法用于合并。#3049
  • 在修正了比赛条件DROPTRUNCATE用于Memory与同时表SELECT,这可能导致服务器崩溃。此错误出现在1.1.54388版本中。#3038
  • 修复了在返回错误时插入Replicated表时数据丢失的可能性Session is expired(可以通过ReplicatedDataLoss度量检测到数据丢失)。版本1.1.54378中发生此错误。#2939 #2949 #2964
  • 修复了一段时间内的段错误JOIN ... ON#3000
  • 修复了WHERE表达式完全由限定列名称组成时的错误搜索列名称,例如WHERE table.column#2994
  • 修复了在执行分布式查询时发生的“未找到列”错误,如果从远程服务器请求包含带有子查询的IN表达式的单个列。#3087
  • 修复了Block structure mismatch in UNION stream: different number of columns分布式查询发生的错误,如果其中一个分片是本地分区而另一个分片不是,并且PREWHERE触发了移动优化。#2226 #3037 #3055 #3065 #3073 #3090 #3093
  • 修复了pointInPolygon非凸多边形的某些情况的函数。#2910
  • 修正了nan与整数比较时的错误结果。#3024
  • 修复了zlib-ng库中可能导致段错误的错误。#2854
  • 修复了插入带有AggregateFunction列的表时的内存泄漏,如果聚合函数的状态不简单(单独分配内存),以及单个插入请求是否导致多个小块。#3084
  • 修复了同时创建和删除相同BufferMergeTree表格时的竞争条件。
  • 修复了比较由某些非平凡类型组成的元组(例如元组)时出现段错误的可能性。#2989
  • 修复了运行某些ON CLUSTER查询时出现段错误的可能性。张冬
  • 修复了数组元素arrayDistinct函数中的错误Nullable#2845 #2937
  • enable_optimize_predicate_expression选项现在可以正确支持案例SELECT *张冬
  • 修复了重新初始化ZooKeeper会话时的段错误。#2917
  • 修复了使用ZooKeeper时潜在的阻塞问题。
  • 修复了在a中添加嵌套数据结构的错误代码SummingMergeTree
  • 在为聚合函数的状态分配内存时,正确地考虑了对齐,这使得在实现聚合函数的状态时可以使用需要对齐的操作。晨星-XC

安全修复:

  • 安全使用ODBC数据源。与ODBC驱动程序的交互使用单独的clickhouse-odbc-bridge进程。第三方ODBC驱动程序中的错误不再导致服务器稳定性或漏洞问题。#2828 #2879 #2886 #2893 #2921
  • 修复了catBoostPool表函数中文件路径的错误验证问题。#2894
  • 系统表(的内容tablesdatabasespartscolumnsparts_columnsmergesmutationsreplicas,和replication_queue)根据用户的配置访问数据库过滤(allow_databases)。张冬

向后不兼容的变化:

  • 在使用JOIN的请求中,星形字符扩展为所有表中的列列表,符合SQL标准。您可以通过asterisk_left_columns_only在用户配置级别设置为1 来恢复旧行为。

构建更改:

  • 现在大多数集成测试都可以通过提交来运行。
  • 代码样式检查也可以通过提交运行。
  • memcpy上CentOS7 / Fedora的建设时,实施正确选择。Etienne Champetier
  • 使用clang进行构建时,-Weverything除了常规警告外,还添加了一些警告-Wall-Wextra -Werror#2957
  • 调试构建使用jemalloc调试选项。
  • 用于与ZooKeeper交互的库的接口被声明为abstract。#2950

Reference

RPM-Download

其他

  • 目前官方增加了很多针对问题排查以及性能监控相关的变量,相应的监控程序也需要升级了。
  • 以clickhouse_exporter为例,它的主要工作原理,就是读取system库

热评文章