PostgreSQL版本:13.2

一、概要

PG 13 开始支持哈希聚集溢出功能

PostgreSQL v13 beta 1 开始支持 HashAggregate 溢出到磁盘。

PostgreSQL 13 can now use disk storage for hash aggregation (used as part of aggregate queries) with large aggregation sets.

请参考:PostgreSQL 13 Beta 1 Released!

二、验证

测试用例

TPC-H 1s数据库。先查询执行计划:

snippet.sql
tpch1s=# EXPLAIN SELECT SUM(l_quantity), l_partkey, l_suppkey FROM lineitem GROUP BY l_partkey, l_suppkey;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=410884.88..570431.61 ROWS=600139 width=40)
   GROUP KEY: l_partkey, l_suppkey
   ->  Gather MERGE  (cost=410884.88..550927.09 ROWS=1200278 width=40)
         Workers Planned: 2
         ->  Sort  (cost=409884.86..411385.20 ROWS=600139 width=40)
               Sort KEY: l_partkey, l_suppkey
               ->  Partial HashAggregate  (cost=303953.73..335875.20 ROWS=600139 width=40)
                     GROUP KEY: l_partkey, l_suppkey
                     Planned Partitions: 64
                     ->  Parallel Seq Scan ON lineitem  (cost=0.00..137508.81 ROWS=2500581 width=13)
(10 ROWS)

再实际执行,执行过程中,在数据目录中可以看到溢出的临时文件:

snippet.bash
[postgres@yz opt]$ ll pgdata/base/pgsql_tmp/
总用量 589440
-rw------- 1 postgres postgres 129245184 11 09:56 pgsql_tmp123907.2
-rw------- 1 postgres postgres  42131456 11 09:56 pgsql_tmp123907.3
-rw------- 1 postgres postgres 129228800 11 09:56 pgsql_tmp124902.0
……

三、调试

聚集过程中,进入溢出模式的堆栈:

snippet.gdb
(gdb) bt
#0  hash_agg_enter_spill_mode (aggstate=0x2b66a58) at nodeAgg.c:1892
#1  hash_agg_check_limits (aggstate=0x2b66a58) at nodeAgg.c:1879
#2  initialize_hash_entry (aggstate=aggstate@entry=0x2b66a58, entry=0x2bd4780, hashtable=0x2ba0918) at nodeAgg.c:2055
#3  0x000000000062d802 in lookup_hash_entries (aggstate=aggstate@entry=0x2b66a58) at nodeAgg.c:2129
#4  0x000000000062e8d3 in agg_fill_hash_table (aggstate=0x2b66a58) at nodeAgg.c:2563
#5  ExecAgg (pstate=0x2b66a58) at nodeAgg.c:2176
#6  0x0000000000647ec6 in ExecProcNode (node=0x2b66a58) at ../../../src/include/executor/executor.h:248
#7  ExecSort (pstate=0x2b66848) at nodeSort.c:108
#8  0x00000000006357aa in ExecProcNode (node=0x2b66848) at ../../../src/include/executor/executor.h:248
#9  gather_merge_readnext (gm_state=0x2b665a8, reader=<optimized out>, nowait=<optimized out>) at nodeGatherMerge.c:656
#10 0x0000000000635b07 in gather_merge_init (gm_state=0x2b665a8) at nodeGatherMerge.c:490
#11 gather_merge_getnext (gm_state=0x2b665a8) at nodeGatherMerge.c:558
#12 ExecGatherMerge (pstate=0x2b665a8) at nodeGatherMerge.c:267
#13 0x000000000062bb69 in ExecProcNode (node=0x2b665a8) at ../../../src/include/executor/executor.h:248
#14 fetch_input_tuple (aggstate=aggstate@entry=0x2b65f80) at nodeAgg.c:589
#15 0x000000000062e832 in agg_retrieve_direct (aggstate=0x2b65f80) at nodeAgg.c:2368
#16 ExecAgg (pstate=0x2b65f80) at nodeAgg.c:2183
#17 0x000000000061b4f2 in ExecProcNode (node=0x2b65f80) at ../../../src/include/executor/executor.h:248
#18 ExecutePlan (execute_once=<optimized out>, dest=0x2b57e58, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x2b65f80, estate=0x2b65d28) at execMain.c:1632
#19 standard_ExecutorRun (queryDesc=0x2abbf28, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:350
#20 0x00000000007716cb in PortalRunSelect (portal=portal@entry=0x2afc238, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x2b57e58) at pquery.c:912
#21 0x0000000000772948 in PortalRun (portal=portal@entry=0x2afc238, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x2b57e58, altdest=altdest@entry=0x2b57e58, qc=qc@entry=0x7ffd77fbe2a0) at pquery.c:756
#22 0x000000000076e68e in exec_simple_query (query_string=0x2a9a228 "select  sum(l_quantity), l_partkey, l_suppkey from  lineitem group by l_partkey, l_suppkey;") at postgres.c:1239
#23 0x000000000076fa07 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2ac4300, dbname=0x2a96dd8 "tpch1s", username=<optimized out>) at postgres.c:4329
#24 0x0000000000482149 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4526
#25 BackendStartup (port=0x2abc220) at postmaster.c:4210
#26 ServerLoop () at postmaster.c:1739
#27 0x00000000006fd103 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2a94d20) at postmaster.c:1412
#28 0x0000000000482d8e in main (argc=3, argv=0x2a94d20) at main.c:210

四、相关参数

work_mem

Sets the maximum memory to be used for query workspaces.
This much memory can be used by each internal sort operation and hash table before switching to temporary disk files.

内存敏感算子可用内存大小(KB)。

hash_mem_multiplier

Multiple of work_mem to use for hash tables.

work_mem可以作为hash table的内存比例。