转自:https://blog.csdn.net/scutshuxue/article/details/6791157
在Postgresql/Greenplum数据库中,聚合函数有两种实现方式:HashAggregate与GroupAggregate。
我们现在通过一个最简单的sql来分析这两种聚合的区别以及其应用场景。
SELECT COUNT(1) FROM pg_class GROUP BY oid;
对于hash聚合来说,数据库会根据group by字段后面的值算出hash值,并根据前面使用的聚合函数在内存中维护对应的列表。如果select后面有两个聚合函数,那么在内存中就会维护两个对应的数据。同样的,有n个聚合函数就会维护n个同样的数组。对于hash算法来说,数组的长度肯定是大于group by的字段的distinct值的个数的,且跟这个值应该呈线性关系,group by后面的值越唯一,使用的内存也就越大。
执行计划如下:
aligputf8=# EXPLAIN SELECT COUNT(1) FROM pg_class GROUP BY oid; QUERY PLAN ---------------------------------------------------------------------- HashAggregate (cost=1721.40..2020.28 ROWS=23910 width=4) GROUP BY: oid -> Seq Scan ON pg_class (cost=0.00..1004.10 ROWS=143460 width=4) Settings: enable_seqscan=ON (4 ROWS)
对于普通聚合函数,使用group聚合,其原理是先将表中的数据按照group by的字段排序,这样子同一个group by的值就在一起,这样就只需要对排好序的数据进行一次全扫描,就可以得到聚合的结果了。
执行计划如下:
aligputf8=# SET enable_hashagg = off; SET aligputf8=# EXPLAIN SELECT COUNT(1) FROM pg_class GROUP BY oid; QUERY PLAN ---------------------------------------------------------------------------- GroupAggregate (cost=13291.66..14666.48 ROWS=23910 width=4) GROUP BY: oid -> Sort (cost=13291.66..13650.31 ROWS=143460 width=4) Sort KEY: oid -> Seq Scan ON pg_class (cost=0.00..1004.10 ROWS=143460 width=4) Settings: enable_hashagg=off; enable_seqscan=ON (6 ROWS)
从上面的两个执行计划的cost来说,GroupAggregate 由于需要排序,效率很差,消耗是HashAggregate的7倍。所以在GP里面,对于这种聚合函数的使用,采用的都是HashAggregate。
先建立一张测试表,并且往里面insert数据,通过每个字段的数据唯一性不一致,还有聚合函数的个数来观察HashAggregate与GroupAggregate在内存的消耗情况以及实际的计算时间的比较。
1.表结构如下:
CREATE TABLE test_group( id INTEGER ,col1 NUMERIC ,col2 NUMERIC ,col3 NUMERIC ,col4 NUMERIC ,col5 NUMERIC ,col6 NUMERIC ,col7 NUMERIC ,col8 NUMERIC ,col9 NUMERIC ,col11 VARCHAR(100) ,col12 VARCHAR(100) ,col13 VARCHAR(100) ,col14 VARCHAR(100) )distributed BY(id);
2.插入数据,通过random函数,实现每个字段数据的唯一性不一样
aligputf8=# INSERT INTO test_group aligputf8-# SELECT generate_series(1,100000), aligputf8-# (random()*200)::INT, aligputf8-# (random()*800)::INT, aligputf8-# (random()*1600)::INT, aligputf8-# (random()*3200)::INT, aligputf8-# (random()*6400)::INT, aligputf8-# (random()*12800)::INT, aligputf8-# (random()*40000)::INT, aligputf8-# (random()*100000)::INT, aligputf8-# (random()*1000000)::INT, aligputf8-# 'hello', aligputf8-# 'welcome', aligputf8-# 'haha', aligputf8-# 'chen'; INSERT 0 100000
表大小为:
aligputf8=# SELECT pg_size_pretty(pg_relation_size('test_group')); pg_size_pretty ---------------- 12 MB (1 ROW)
3.使用explain analyze来观察实际数据库消耗的内存差异:
以下是底层单个节点来计算的,避免了广播的时间跟内存消耗
aligputf8=# EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9) FROM test_group GROUP BY col5; QUERY PLAN ---------------------------------------------------------------------------------------------------- HashAggregate (cost=4186.96..5432.88 ROWS=38336 width=62) GROUP BY: col5 ROWS OUT: 6401 ROWS WITH 289 ms TO FIRST ROW, 295 ms TO END, START offset BY 0.143 ms. Executor memory: 2818K bytes. -> Seq Scan ON test_group (cost=0.00..1480.56 ROWS=108256 width=62) ROWS OUT: 100000 ROWS WITH 0.023 ms TO FIRST ROW, 48 ms TO END, START offset BY 0.218 ms. Slice statistics: (slice0) Executor memory: 2996K bytes. Settings: enable_seqscan=off Total runtime: 296.283 ms (10 ROWS)
aligputf8=# EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9) FROM test_group GROUP BY col5; QUERY PLAN ---------------------------------------------------------------------------------------------------------- GroupAggregate (cost=10532.97..14755.93 ROWS=38336 width=62) GROUP BY: col5 ROWS OUT: 6401 ROWS WITH 306 ms TO FIRST ROW, 585 ms TO END, START offset BY 0.092 ms. Executor memory: 8K bytes. -> Sort (cost=10532.97..10803.61 ROWS=108256 width=62) Sort KEY: col5 ROWS OUT: 100000 ROWS WITH 306 ms TO FIRST ROW, 342 ms TO END, START offset BY 0.093 ms. Executor memory: 19449K bytes. Work_mem used: 19449K bytes. -> Seq Scan ON test_group (cost=0.00..1480.56 ROWS=108256 width=62) ROWS OUT: 100000 ROWS WITH 0.021 ms TO FIRST ROW, 46 ms TO END, START offset BY 0.116 ms. Slice statistics: (slice0) Executor memory: 19623K bytes. Work_mem: 19449K bytes MAX. Settings: enable_hashagg=off; enable_seqscan=off Total runtime: 586.114 ms (15 ROWS)
通过这种方法,可以看出,消耗的内存跟实际执行时间的比例:
SQL:
EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9) FROM test_group GROUP BY id;
9个聚合函数 | |||||||||||
————– | ————— | —— | —— | —— | —— | —— | —— | —— | —— | —— | —— |
group by字段 | col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | id | |
HashAggregate | Executor memory | 554K | 786K | 1074K | 1715K | 2996K | 5469K | 13691K | 21312K | 29428K | 29476K |
时间(ms) | 266 | 272 | 275 | 281 | 296 | 323 | 357 | 359 | 352 | 340 | |
GroupAggregate | Executor memory | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19615K |
时间(ms) | 500 | 533 | 547 | 568 | 589 | 609 | 636 | 652 | 649 | 387 |
SQL:
27个聚合函数 | |||||||||||
————– | ————— | —— | —— | —— | —— | —— | —— | ——- | ——- | ——- | ——- |
group by字段 | col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | id | |
HashAggregate | Executor memory | 514K | 1299K | 2340K | 4405K | 8504K | 19687K | 69947K | 93859K | 106419K | 106876K |
时间(ms) | 504.91 | 511.03 | 523.36 | 559.85 | 616.94 | 937.73 | 1179.05 | 1395.56 | 1391.27 | 1391.14 | |
GroupAggregate | Executor memory | 19687K | 19687K | 19687K | 19687K | 19687K | 19687K | 19687K | 19687K | 19687K | 19687K |
时间(ms) | 759.58 | 782.56 | 802.4 | 838.07 | 880.38 | 939.52 | 1104.75 | 1256.92 | 1365.61 | 1142 |
EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9), MAX(col1),MAX(col2),MAX(col3),MAX(col4),MAX(col5),MAX(col6),MAX(col7),MAX(col8),MAX(col9), avg(col1),avg(col2),avg(col3),avg(col4),avg(col5),avg(col6),avg(col7),avg(col8),avg(col9) FROM test_group GROUP BY id;
可以看出,对于GroupAggregate来说,消耗的内存基本上是恒定的,无论group by哪个字段。当聚合函数较少的时候,速度也相对较慢,但是相对稳定。
HashAggregate在少数聚合函数是表现优异,但是很多聚合函数,性能跟消耗的内存差异很明显。尤其是受group by字段的唯一性很明显,字段count(district)值越大,hash聚合消耗的内存越多,性能下降剧烈。
所以在sql中有大量聚合函数,group by 的字段由相对比较唯一的时候,应该用GroupAggregate,而不能用HashAggregate。
在GP4.1中,之前出现过 有大量聚合函数,并且group by 的字段由相对比较唯一的SQL报错如下:
ERROR: Unexpected internal error: Segment process received signal SIGSEGV
这个sql其实应该就是占用内存太多,进程被操作系统发出信号干掉导致的报错。
查看执行计划,发现是HashAggregate搞得鬼。一般来说,数据库会根据统计信息来选择HashAggregate或者是GroupAggregate,但是有可能统计信息不够详细或者sql太复杂而选错执行计划。
一般遇到这种问题,有两张办法:
下次如果再遇到这种sql报错,建议采用这种方法改一下脚本试一下。
注:当work_mem不够内存使用时:
aligputf8=# EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9), MAX(col1),MAX(col2),MAX(col3),MAX(col4),MAX(col5),MAX(col6),MAX(col7),MAX(col8),MAX(col9), avg(col1),avg(col2),avg(col3),avg(col4),avg(col5),avg(col6),avg(col7),avg(col8),avg(col9) FROM test_group GROUP BY id; QUERY PLAN ---------------------------------------------------------------------------------------------------- HashAggregate (cost=15225.85..29783.06 ROWS=108256 width=66) GROUP BY: id ROWS OUT: 100000 ROWS WITH 722 ms TO FIRST ROW, 1367 ms TO END, START offset BY 0.125 ms. Executor memory: 32536K bytes. Work_mem used: 32001K bytes. Work_mem wanted: 106876K bytes TO lessen workfile I/O. 100000 groups total IN 32 batches; 1 overflows; 100000 spill groups. Hash chain LENGTH 1.8 avg, 20 MAX, USING 74100 OF 135168 buckets. -> Seq Scan ON test_group (cost=0.00..1480.56 ROWS=108256 width=66) ROWS OUT: 100000 ROWS WITH 0.016 ms TO FIRST ROW, 51 ms TO END, START offset BY 0.142 ms. Slice statistics: (slice0) * Executor memory: 32697K bytes. Work_mem: 32001K bytes MAX, 106876K bytes wanted. Settings: enable_groupagg=off; enable_hashagg=ON; enable_seqscan=off; work_mem=32000kB Total runtime: 1391.138 ms (14 ROWS)
当work_mem足够时:
QUERY PLAN ---------------------------------------------------------------------------------------------------- HashAggregate (cost=9058.48..17448.32 ROWS=108256 width=66) GROUP BY: id ROWS OUT: 100000 ROWS WITH 460 ms TO FIRST ROW, 1014 ms TO END, START offset BY 0.120 ms. Executor memory: 110093K bytes. -> Seq Scan ON test_group (cost=0.00..1480.56 ROWS=108256 width=66) ROWS OUT: 100000 ROWS WITH 0.019 ms TO FIRST ROW, 52 ms TO END, START offset BY 0.554 ms. Slice statistics: (slice0) Executor memory: 110271K bytes. Settings: enable_groupagg=off; enable_hashagg=ON; enable_seqscan=off; work_mem=320000kB Total runtime: 1038.209 ms (10 ROWS)
PostgreSQL v13 beta 1 开始支持 HashAggregate 溢出到磁盘。
PostgreSQL 13 can now use disk storage for hash aggregation (used as part of aggregate queries) with large aggregation sets.