转自：[https://blog.csdn.net/scutshuxue/article/details/6791157](https://blog.csdn.net/scutshuxue/article/details/6791157)

在Postgresql/Greenplum数据库中，聚合函数有两种实现方式：HashAggregate与GroupAggregate。

我们现在通过一个最简单的sql来分析这两种聚合的区别以及其应用场景。

```sql
select count(1) from pg_class group by oid;
```

# 一、两种实现算法的比较：

## HashAggregate

对于hash聚合来说，数据库会根据group by字段后面的值算出hash值，并根据前面使用的聚合函数在内存中维护对应的列表。如果select后面有两个聚合函数，那么在内存中就会维护两个对应的数据。同样的，有n个聚合函数就会维护n个同样的数组。对于hash算法来说，数组的长度肯定是大于group by的字段的distinct值的个数的，且跟这个值应该呈线性关系，group by后面的值越唯一，使用的内存也就越大。

执行计划如下：

```sql
aligputf8=# explain select count(1) from pg_class group by oid;
                              QUERY PLAN                              
----------------------------------------------------------------------
 HashAggregate  (cost=1721.40..2020.28 rows=23910 width=4)
   Group By: oid
   ->  Seq Scan on pg_class  (cost=0.00..1004.10 rows=143460 width=4)
 Settings:  enable_seqscan=on
(4 rows)
```


## GroupAggregate

对于普通聚合函数，使用group聚合，其原理是先将表中的数据按照group by的字段排序，这样子同一个group by的值就在一起，这样就只需要对排好序的数据进行一次全扫描，就可以得到聚合的结果了。

执行计划如下：

```sql
aligputf8=# set enable_hashagg = off;
SET
aligputf8=#  explain select count(1) from pg_class group by oid;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 GroupAggregate  (cost=13291.66..14666.48 rows=23910 width=4)
   Group By: oid
   ->  Sort  (cost=13291.66..13650.31 rows=143460 width=4)
         Sort Key: oid
         ->  Seq Scan on pg_class  (cost=0.00..1004.10 rows=143460 width=4)
 Settings:  enable_hashagg=off; enable_seqscan=on
(6 rows)
```

从上面的两个执行计划的cost来说，GroupAggregate 由于需要排序，效率很差，消耗是HashAggregate的7倍。所以在GP里面，对于这种聚合函数的使用，采用的都是HashAggregate。

# 二、两种实现的内存消耗

先建立一张测试表，并且往里面insert数据，通过每个字段的数据唯一性不一致，还有聚合函数的个数来观察HashAggregate与GroupAggregate在内存的消耗情况以及实际的计算时间的比较。

1.表结构如下：

```sql
create table test_group(
 id   integer
,col1 numeric
,col2 numeric
,col3 numeric
,col4 numeric
,col5 numeric
,col6 numeric
,col7 numeric
,col8 numeric
,col9 numeric
,col11 varchar(100)
,col12 varchar(100)
,col13 varchar(100)
,col14 varchar(100)
)distributed by(id);
```

2.插入数据，通过random函数，实现每个字段数据的唯一性不一样

```sql
aligputf8=# insert into test_group 
aligputf8-# select generate_series(1,100000),
aligputf8-#        (random()*200)::int,
aligputf8-#        (random()*800)::int,
aligputf8-#        (random()*1600)::int,
aligputf8-#        (random()*3200)::int,
aligputf8-#        (random()*6400)::int,
aligputf8-#        (random()*12800)::int,
aligputf8-#        (random()*40000)::int,
aligputf8-#        (random()*100000)::int,
aligputf8-#        (random()*1000000)::int,
aligputf8-#        'hello',
aligputf8-#        'welcome',
aligputf8-#        'haha',
aligputf8-#        'chen';
INSERT 0 100000
```

 表大小为：

```sql
aligputf8=# select pg_size_pretty(pg_relation_size('test_group'));
 pg_size_pretty 
----------------
 12 MB
(1 row)
```

3.使用explain analyze来观察实际数据库消耗的内存差异：

以下是底层单个节点来计算的，避免了广播的时间跟内存消耗

## HashAggregate

```sql
aligputf8=#  explain analyze select sum(col1),sum(col2),sum(col3),sum(col4),sum(col5),sum(col6),sum(col7),sum(col8),sum(col9) from test_group group by col5;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=4186.96..5432.88 rows=38336 width=62)
   Group By: col5
   Rows out:  6401 rows with 289 ms to first row, 295 ms to end, start offset by 0.143 ms.
   Executor memory:  2818K bytes.
   ->  Seq Scan on test_group  (cost=0.00..1480.56 rows=108256 width=62)
         Rows out:  100000 rows with 0.023 ms to first row, 48 ms to end, start offset by 0.218 ms.
 Slice statistics:
   (slice0)    Executor memory: 2996K bytes.
 Settings:  enable_seqscan=off
 Total runtime: 296.283 ms
(10 rows)
```

##  GroupAggregate

```sql
aligputf8=#  explain analyze select sum(col1),sum(col2),sum(col3),sum(col4),sum(col5),sum(col6),sum(col7),sum(col8),sum(col9) from test_group group by col5;
                                                QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10532.97..14755.93 rows=38336 width=62)
   Group By: col5
   Rows out:  6401 rows with 306 ms to first row, 585 ms to end, start offset by 0.092 ms.
   Executor memory:  8K bytes.
   ->  Sort  (cost=10532.97..10803.61 rows=108256 width=62)
         Sort Key: col5
         Rows out:  100000 rows with 306 ms to first row, 342 ms to end, start offset by 0.093 ms.
         Executor memory:  19449K bytes.
         Work_mem used:  19449K bytes.
         ->  Seq Scan on test_group  (cost=0.00..1480.56 rows=108256 width=62)
               Rows out:  100000 rows with 0.021 ms to first row, 46 ms to end, start offset by 0.116 ms.
 Slice statistics:
   (slice0)    Executor memory: 19623K bytes.  Work_mem: 19449K bytes max.
 Settings:  enable_hashagg=off; enable_seqscan=off
 Total runtime: 586.114 ms
(15 rows)
```

通过这种方法，可以看出，消耗的内存跟实际执行时间的比例：

SQL:

```sql
explain analyze 
select sum(col1),sum(col2),sum(col3),sum(col4),sum(col5),sum(col6),sum(col7),sum(col8),sum(col9) 
from test_group 
group by id;
```

 
| 9个聚合函数    |                 |        |        |        |        |        |        |        |        |        |        |
| -------------- | --------------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
|                | group by字段    | col1   | col2   | col3   | col4   | col5   | col6   | col7   | col8   | col9   | id     |
| HashAggregate  | Executor memory | 554K   | 786K   | 1074K  | 1715K  | 2996K  | 5469K  | 13691K | 21312K | 29428K | 29476K |
|                | 时间（ms）      | 266    | 272    | 275    | 281    | 296    | 323    | 357    | 359    | 352    | 340    |
|                |                 |        |        |        |        |        |        |        |        |        |        |
| GroupAggregate | Executor memory | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19623K | 19615K |
|                | 时间（ms）      | 500    | 533    | 547    | 568    | 589    | 609    | 636    | 652    | 649    | 387    |

 SQL:

| 27个聚合函数   |                 |        |        |        |        |        |        |         |         |         |         |
| -------------- | --------------- | ------ | ------ | ------ | ------ | ------ | ------ | ------- | ------- | ------- | ------- |
|                | group by字段    | col1   | col2   | col3   | col4   | col5   | col6   | col7    | col8    | col9    | id      |
| HashAggregate  | Executor memory | 514K   | 1299K  | 2340K  | 4405K  | 8504K  | 19687K | 69947K  | 93859K  | 106419K | 106876K |
|                | 时间（ms）      | 504.91 | 511.03 | 523.36 | 559.85 | 616.94 | 937.73 | 1179.05 | 1395.56 | 1391.27 | 1391.14 |
|                |                 |        |        |        |        |        |        |         |         |         |         |
| GroupAggregate | Executor memory | 19687K | 19687K | 19687K | 19687K | 19687K | 19687K | 19687K  | 19687K  | 19687K  | 19687K  |
|                | 时间（ms）      | 759.58 | 782.56 | 802.4  | 838.07 | 880.38 | 939.52 | 1104.75 | 1256.92 | 1365.61 | 1142    |

```sql
explain analyze 
select sum(col1),sum(col2),sum(col3),sum(col4),sum(col5),sum(col6),sum(col7),sum(col8),sum(col9),
max(col1),max(col2),max(col3),max(col4),max(col5),max(col6),max(col7),max(col8),max(col9),
avg(col1),avg(col2),avg(col3),avg(col4),avg(col5),avg(col6),avg(col7),avg(col8),avg(col9) 
from test_group 
group by id;
```

可以看出，对于GroupAggregate来说，消耗的内存基本上是恒定的，无论group by哪个字段。当聚合函数较少的时候，速度也相对较慢，但是相对稳定。

HashAggregate在少数聚合函数是表现优异，但是很多聚合函数，性能跟消耗的内存差异很明显。尤其是受group by字段的唯一性很明显，字段count（district）值越大，hash聚合消耗的内存越多，性能下降剧烈。

所以在sql中有大量聚合函数，group by 的字段由相对比较唯一的时候，应该用GroupAggregate，而不能用HashAggregate。

# 三、在GP4.1出现的SQL报错

在GP4.1中，之前出现过 有大量聚合函数，并且group by 的字段由相对比较唯一的SQL报错如下：

ERROR: Unexpected internal error: Segment process received signal SIGSEGV 

这个sql其实应该就是占用内存太多，进程被操作系统发出信号干掉导致的报错。

查看执行计划，发现是HashAggregate搞得鬼。一般来说，数据库会根据统计信息来选择HashAggregate或者是GroupAggregate，但是有可能统计信息不够详细或者sql太复杂而选错执行计划。

一般遇到这种问题，有两张办法：

1. 拆分成多个sql来跑，减少HashAggregate使用的内存.
2. 在跑sql之前，先执行enable_hashagg = off;将hash聚合参数关掉。强制不走HashAggregate，建议用这种。

下次如果再遇到这种sql报错，建议采用这种方法改一下脚本试一下。

注：当work_mem不够内存使用时：

```sql
aligputf8=# explain analyze select sum(col1),sum(col2),sum(col3),sum(col4),sum(col5),sum(col6),sum(col7),sum(col8),sum(col9),   
max(col1),max(col2),max(col3),max(col4),max(col5),max(col6),max(col7),max(col8),max(col9), 
avg(col1),avg(col2),avg(col3),avg(col4),avg(col5),avg(col6),avg(col7),avg(col8),avg(col9) from test_group group by id;
                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15225.85..29783.06 rows=108256 width=66)
   Group By: id
   Rows out:  100000 rows with 722 ms to first row, 1367 ms to end, start offset by 0.125 ms.
   Executor memory:  32536K bytes.
   Work_mem used:  32001K bytes.
   Work_mem wanted: 106876K bytes to lessen workfile I/O.
   100000 groups total in 32 batches; 1 overflows; 100000 spill groups.
   Hash chain length 1.8 avg, 20 max, using 74100 of 135168 buckets.
   ->  Seq Scan on test_group  (cost=0.00..1480.56 rows=108256 width=66)
         Rows out:  100000 rows with 0.016 ms to first row, 51 ms to end, start offset by 0.142 ms.
 Slice statistics:
   (slice0)  * Executor memory: 32697K bytes.  Work_mem: 32001K bytes max, 106876K bytes wanted.
 Settings:  enable_groupagg=off; enable_hashagg=on; enable_seqscan=off; work_mem=32000kB
 Total runtime: 1391.138 ms
(14 rows)
```

当work_mem足够时：

```sql
                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=9058.48..17448.32 rows=108256 width=66)
   Group By: id
   Rows out:  100000 rows with 460 ms to first row, 1014 ms to end, start offset by 0.120 ms.
   Executor memory:  110093K bytes.
   ->  Seq Scan on test_group  (cost=0.00..1480.56 rows=108256 width=66)
         Rows out:  100000 rows with 0.019 ms to first row, 52 ms to end, start offset by 0.554 ms.
 Slice statistics:
   (slice0)    Executor memory: 110271K bytes.
 Settings:  enable_groupagg=off; enable_hashagg=on; enable_seqscan=off; work_mem=320000kB
 Total runtime: 1038.209 ms
 
(10 rows)
```

# 四、PS

PostgreSQL v13 beta 1 开始支持 HashAggregate 溢出到磁盘。

> PostgreSQL 13 can now use disk storage for hash aggregation (used as part of aggregate queries) with large aggregation sets.

请参考：[PostgreSQL 13 Beta 1 Released!](https://www.postgresql.org/about/news/postgresql-13-beta-1-released-2040/)