转自:https://blog.csdn.net/scutshuxue/article/details/6791157

在Postgresql/Greenplum数据库中,聚合函数有两种实现方式:HashAggregate与GroupAggregate。

我们现在通过一个最简单的sql来分析这两种聚合的区别以及其应用场景。

snippet.sql
SELECT COUNT(1) FROM pg_class GROUP BY oid;

一、两种实现算法的比较:

HashAggregate

对于hash聚合来说,数据库会根据group by字段后面的值算出hash值,并根据前面使用的聚合函数在内存中维护对应的列表。如果select后面有两个聚合函数,那么在内存中就会维护两个对应的数据。同样的,有n个聚合函数就会维护n个同样的数组。对于hash算法来说,数组的长度肯定是大于group by的字段的distinct值的个数的,且跟这个值应该呈线性关系,group by后面的值越唯一,使用的内存也就越大。

执行计划如下:

snippet.sql
aligputf8=# EXPLAIN SELECT COUNT(1) FROM pg_class GROUP BY oid;
                              QUERY PLAN                              
----------------------------------------------------------------------
 HashAggregate  (cost=1721.40..2020.28 ROWS=23910 width=4)
   GROUP BY: oid
   ->  Seq Scan ON pg_class  (cost=0.00..1004.10 ROWS=143460 width=4)
 Settings:  enable_seqscan=ON
(4 ROWS)

GroupAggregate

对于普通聚合函数,使用group聚合,其原理是先将表中的数据按照group by的字段排序,这样子同一个group by的值就在一起,这样就只需要对排好序的数据进行一次全扫描,就可以得到聚合的结果了。

执行计划如下:

snippet.sql
aligputf8=# SET enable_hashagg = off;
SET
aligputf8=#  EXPLAIN SELECT COUNT(1) FROM pg_class GROUP BY oid;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 GroupAggregate  (cost=13291.66..14666.48 ROWS=23910 width=4)
   GROUP BY: oid
   ->  Sort  (cost=13291.66..13650.31 ROWS=143460 width=4)
         Sort KEY: oid
         ->  Seq Scan ON pg_class  (cost=0.00..1004.10 ROWS=143460 width=4)
 Settings:  enable_hashagg=off; enable_seqscan=ON
(6 ROWS)

从上面的两个执行计划的cost来说,GroupAggregate 由于需要排序,效率很差,消耗是HashAggregate的7倍。所以在GP里面,对于这种聚合函数的使用,采用的都是HashAggregate。

二、两种实现的内存消耗

先建立一张测试表,并且往里面insert数据,通过每个字段的数据唯一性不一致,还有聚合函数的个数来观察HashAggregate与GroupAggregate在内存的消耗情况以及实际的计算时间的比较。

1.表结构如下:

snippet.sql
CREATE TABLE test_group(
 id   INTEGER
,col1 NUMERIC
,col2 NUMERIC
,col3 NUMERIC
,col4 NUMERIC
,col5 NUMERIC
,col6 NUMERIC
,col7 NUMERIC
,col8 NUMERIC
,col9 NUMERIC
,col11 VARCHAR(100)
,col12 VARCHAR(100)
,col13 VARCHAR(100)
,col14 VARCHAR(100)
)distributed BY(id);

2.插入数据,通过random函数,实现每个字段数据的唯一性不一样

snippet.sql
aligputf8=# INSERT INTO test_group 
aligputf8-# SELECT generate_series(1,100000),
aligputf8-#        (random()*200)::INT,
aligputf8-#        (random()*800)::INT,
aligputf8-#        (random()*1600)::INT,
aligputf8-#        (random()*3200)::INT,
aligputf8-#        (random()*6400)::INT,
aligputf8-#        (random()*12800)::INT,
aligputf8-#        (random()*40000)::INT,
aligputf8-#        (random()*100000)::INT,
aligputf8-#        (random()*1000000)::INT,
aligputf8-#        'hello',
aligputf8-#        'welcome',
aligputf8-#        'haha',
aligputf8-#        'chen';
INSERT 0 100000

表大小为:

snippet.sql
aligputf8=# SELECT pg_size_pretty(pg_relation_size('test_group'));
 pg_size_pretty 
----------------
 12 MB
(1 ROW)

3.使用explain analyze来观察实际数据库消耗的内存差异:

以下是底层单个节点来计算的,避免了广播的时间跟内存消耗

HashAggregate

snippet.sql
aligputf8=#  EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9) FROM test_group GROUP BY col5;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=4186.96..5432.88 ROWS=38336 width=62)
   GROUP BY: col5
   ROWS OUT:  6401 ROWS WITH 289 ms TO FIRST ROW, 295 ms TO END, START offset BY 0.143 ms.
   Executor memory:  2818K bytes.
   ->  Seq Scan ON test_group  (cost=0.00..1480.56 ROWS=108256 width=62)
         ROWS OUT:  100000 ROWS WITH 0.023 ms TO FIRST ROW, 48 ms TO END, START offset BY 0.218 ms.
 Slice statistics:
   (slice0)    Executor memory: 2996K bytes.
 Settings:  enable_seqscan=off
 Total runtime: 296.283 ms
(10 ROWS)

GroupAggregate

snippet.sql
aligputf8=#  EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9) FROM test_group GROUP BY col5;
                                                QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10532.97..14755.93 ROWS=38336 width=62)
   GROUP BY: col5
   ROWS OUT:  6401 ROWS WITH 306 ms TO FIRST ROW, 585 ms TO END, START offset BY 0.092 ms.
   Executor memory:  8K bytes.
   ->  Sort  (cost=10532.97..10803.61 ROWS=108256 width=62)
         Sort KEY: col5
         ROWS OUT:  100000 ROWS WITH 306 ms TO FIRST ROW, 342 ms TO END, START offset BY 0.093 ms.
         Executor memory:  19449K bytes.
         Work_mem used:  19449K bytes.
         ->  Seq Scan ON test_group  (cost=0.00..1480.56 ROWS=108256 width=62)
               ROWS OUT:  100000 ROWS WITH 0.021 ms TO FIRST ROW, 46 ms TO END, START offset BY 0.116 ms.
 Slice statistics:
   (slice0)    Executor memory: 19623K bytes.  Work_mem: 19449K bytes MAX.
 Settings:  enable_hashagg=off; enable_seqscan=off
 Total runtime: 586.114 ms
(15 ROWS)

通过这种方法,可以看出,消耗的内存跟实际执行时间的比例:

SQL:

snippet.sql
EXPLAIN analyze 
SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9) 
FROM test_group 
GROUP BY id;
9个聚合函数
————– ————— —— —— —— —— —— —— —— —— —— ——
group by字段 col1 col2 col3 col4 col5 col6 col7 col8 col9 id
HashAggregate Executor memory 554K 786K 1074K 1715K 2996K 5469K 13691K 21312K 29428K 29476K
时间(ms) 266 272 275 281 296 323 357 359 352 340
GroupAggregate Executor memory 19623K 19623K 19623K 19623K 19623K 19623K 19623K 19623K 19623K 19615K
时间(ms) 500 533 547 568 589 609 636 652 649 387

SQL:

27个聚合函数
————– ————— —— —— —— —— —— —— ——- ——- ——- ——-
group by字段 col1 col2 col3 col4 col5 col6 col7 col8 col9 id
HashAggregate Executor memory 514K 1299K 2340K 4405K 8504K 19687K 69947K 93859K 106419K 106876K
时间(ms) 504.91 511.03 523.36 559.85 616.94 937.73 1179.05 1395.56 1391.27 1391.14
GroupAggregate Executor memory 19687K 19687K 19687K 19687K 19687K 19687K 19687K 19687K 19687K 19687K
时间(ms) 759.58 782.56 802.4 838.07 880.38 939.52 1104.75 1256.92 1365.61 1142
snippet.sql
EXPLAIN analyze 
SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9),
MAX(col1),MAX(col2),MAX(col3),MAX(col4),MAX(col5),MAX(col6),MAX(col7),MAX(col8),MAX(col9),
avg(col1),avg(col2),avg(col3),avg(col4),avg(col5),avg(col6),avg(col7),avg(col8),avg(col9) 
FROM test_group 
GROUP BY id;

可以看出,对于GroupAggregate来说,消耗的内存基本上是恒定的,无论group by哪个字段。当聚合函数较少的时候,速度也相对较慢,但是相对稳定。

HashAggregate在少数聚合函数是表现优异,但是很多聚合函数,性能跟消耗的内存差异很明显。尤其是受group by字段的唯一性很明显,字段count(district)值越大,hash聚合消耗的内存越多,性能下降剧烈。

所以在sql中有大量聚合函数,group by 的字段由相对比较唯一的时候,应该用GroupAggregate,而不能用HashAggregate。

三、在GP4.1出现的SQL报错

在GP4.1中,之前出现过 有大量聚合函数,并且group by 的字段由相对比较唯一的SQL报错如下:

ERROR: Unexpected internal error: Segment process received signal SIGSEGV

这个sql其实应该就是占用内存太多,进程被操作系统发出信号干掉导致的报错。

查看执行计划,发现是HashAggregate搞得鬼。一般来说,数据库会根据统计信息来选择HashAggregate或者是GroupAggregate,但是有可能统计信息不够详细或者sql太复杂而选错执行计划。

一般遇到这种问题,有两张办法:

  1. 拆分成多个sql来跑,减少HashAggregate使用的内存.
  2. 在跑sql之前,先执行enable_hashagg = off;将hash聚合参数关掉。强制不走HashAggregate,建议用这种。

下次如果再遇到这种sql报错,建议采用这种方法改一下脚本试一下。

注:当work_mem不够内存使用时:

snippet.sql
aligputf8=# EXPLAIN analyze SELECT SUM(col1),SUM(col2),SUM(col3),SUM(col4),SUM(col5),SUM(col6),SUM(col7),SUM(col8),SUM(col9),   
MAX(col1),MAX(col2),MAX(col3),MAX(col4),MAX(col5),MAX(col6),MAX(col7),MAX(col8),MAX(col9), 
avg(col1),avg(col2),avg(col3),avg(col4),avg(col5),avg(col6),avg(col7),avg(col8),avg(col9) FROM test_group GROUP BY id;
                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15225.85..29783.06 ROWS=108256 width=66)
   GROUP BY: id
   ROWS OUT:  100000 ROWS WITH 722 ms TO FIRST ROW, 1367 ms TO END, START offset BY 0.125 ms.
   Executor memory:  32536K bytes.
   Work_mem used:  32001K bytes.
   Work_mem wanted: 106876K bytes TO lessen workfile I/O.
   100000 groups total IN 32 batches; 1 overflows; 100000 spill groups.
   Hash chain LENGTH 1.8 avg, 20 MAX, USING 74100 OF 135168 buckets.
   ->  Seq Scan ON test_group  (cost=0.00..1480.56 ROWS=108256 width=66)
         ROWS OUT:  100000 ROWS WITH 0.016 ms TO FIRST ROW, 51 ms TO END, START offset BY 0.142 ms.
 Slice statistics:
   (slice0)  * Executor memory: 32697K bytes.  Work_mem: 32001K bytes MAX, 106876K bytes wanted.
 Settings:  enable_groupagg=off; enable_hashagg=ON; enable_seqscan=off; work_mem=32000kB
 Total runtime: 1391.138 ms
(14 ROWS)

当work_mem足够时:

snippet.sql
                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=9058.48..17448.32 ROWS=108256 width=66)
   GROUP BY: id
   ROWS OUT:  100000 ROWS WITH 460 ms TO FIRST ROW, 1014 ms TO END, START offset BY 0.120 ms.
   Executor memory:  110093K bytes.
   ->  Seq Scan ON test_group  (cost=0.00..1480.56 ROWS=108256 width=66)
         ROWS OUT:  100000 ROWS WITH 0.019 ms TO FIRST ROW, 52 ms TO END, START offset BY 0.554 ms.
 Slice statistics:
   (slice0)    Executor memory: 110271K bytes.
 Settings:  enable_groupagg=off; enable_hashagg=ON; enable_seqscan=off; work_mem=320000kB
 Total runtime: 1038.209 ms
 
(10 ROWS)

四、PS

PostgreSQL v13 beta 1 开始支持 HashAggregate 溢出到磁盘。

PostgreSQL 13 can now use disk storage for hash aggregation (used as part of aggregate queries) with large aggregation sets.

请参考:PostgreSQL 13 Beta 1 Released!