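This article applies to PostgreSQL 14.
PostgreSQL has supported parallel query since version 9.6 [1]. Support in that release was still quite limited, and later releases have steadily improved it. PostgreSQL uses a multi-process model rather than a multi-threaded one, so a parallel query splits one query into tasks that run concurrently in several worker processes. Those processes need a way to share data, and PostgreSQL provides the dynamic shared memory (DSM) mechanism for this purpose.
The postmaster daemon sets up the shared memory information during its initialization phase. The code is concentrated in the CreateSharedMemoryAndSemaphores function; an excerpt: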
void
CreateSharedMemoryAndSemaphores(void)
{
    PGShmemHeader *shim = NULL;

    if (!IsUnderPostmaster)
    {
        Size size;

        size = 100000;
        size = add_size(size, PGSemaphoreShmemSize(numSemas));
        size = add_size(size, SpinlockSemaSize());
        size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
                                                 sizeof(ShmemIndexEnt)));
        size = add_size(size, dsm_estimate_size());
        ……

        /*
         * Create the shmem segment
         */
        seghdr = PGSharedMemoryCreate(size, &shim);
    }
    ……

    /* Initialize dynamic shared memory facilities. */
    if (!IsUnderPostmaster)
        dsm_postmaster_startup(shim);
}
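The excerpt builds the total segment size by chaining add_size calls on top of a fixed allowance of 100000 bytes. The point of going through a helper instead of a plain `+` is overflow detection. The following is a minimal sketch of that idea, not PostgreSQL's code: `checked_add_size` and `estimate_total_size` are hypothetical names, and the sketch reports overflow through a flag, whereas the real add_size raises an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for add_size(): adds two sizes and sets *overflow
 * if the sum wrapped around. Unsigned wraparound is well defined in C.
 */
static size_t
checked_add_size(size_t a, size_t b, bool *overflow)
{
    size_t result = a + b;

    if (result < a)
        *overflow = true;
    return result;
}

/*
 * Hypothetical estimator: starts from the same fixed 100000-byte allowance
 * as the excerpt and adds each subsystem's request on top of it.
 */
static size_t
estimate_total_size(const size_t *requests, size_t nrequests, bool *overflow)
{
    size_t size = 100000;

    *overflow = false;
    for (size_t i = 0; i < nrequests; i++)
        size = checked_add_size(size, requests[i], overflow);
    return size;
}
```

A caller would check the flag before passing the total on to the function that actually creates the segment.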
All DSM segments are tracked through dsm_control, whose data structures are as follows:
/* Shared-memory state for a dynamic shared memory segment. */
typedef struct dsm_control_item
{
    dsm_handle handle;
    uint32 refcnt;                  /* 2+ = active, 1 = moribund, 0 = gone */
    void *impl_private_pm_handle;   /* only needed on Windows */
    bool pinned;
} dsm_control_item;

/* Layout of the dynamic shared memory control segment. */
typedef struct dsm_control_header
{
    uint32 magic;
    uint32 nitems;
    uint32 maxitems;
    dsm_control_item item[FLEXIBLE_ARRAY_MEMBER];
} dsm_control_header;

static dsm_control_header *dsm_control;
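The refcnt comment above ("2+ = active, 1 = moribund, 0 = gone") implies a small state machine over the control array. The sketch below is one way to read it, under stated assumptions: all `toy_` names are hypothetical, a new segment starts at refcnt 2 (one reference for the segment's existence plus one for the creator's mapping), and the locking that a real shared control segment needs is omitted entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ITEMS 4     /* hypothetical fixed capacity */

typedef struct toy_item
{
    uint32_t handle;
    uint32_t refcnt;        /* 2+ = active, 1 = moribund, 0 = gone */
} toy_item;

typedef struct toy_control
{
    uint32_t nitems;
    toy_item item[TOY_MAX_ITEMS];
} toy_control;

/* Register a new segment: reuse a "gone" slot if any, else append. */
static bool
toy_create(toy_control *c, uint32_t handle)
{
    for (uint32_t i = 0; i < c->nitems; i++)
    {
        if (c->item[i].refcnt == 0)
        {
            c->item[i].handle = handle;
            c->item[i].refcnt = 2;
            return true;
        }
    }
    if (c->nitems >= TOY_MAX_ITEMS)
        return false;
    c->item[c->nitems].handle = handle;
    c->item[c->nitems].refcnt = 2;
    c->nitems++;
    return true;
}

/* Attach by handle: only active slots (refcnt > 1) may be attached to. */
static bool
toy_attach(toy_control *c, uint32_t handle)
{
    for (uint32_t i = 0; i < c->nitems; i++)
    {
        if (c->item[i].refcnt > 1 && c->item[i].handle == handle)
        {
            c->item[i].refcnt++;
            return true;
        }
    }
    return false;
}

/* Detach: when the last mapping goes away the slot becomes reusable. */
static void
toy_detach(toy_control *c, uint32_t handle)
{
    for (uint32_t i = 0; i < c->nitems; i++)
    {
        if (c->item[i].refcnt > 1 && c->item[i].handle == handle)
        {
            if (--c->item[i].refcnt == 1)
                c->item[i].refcnt = 0;  /* moribund: destroy, then gone */
            return;
        }
    }
}
```

In this sketch the moribund state lasts only for an instant; in a real implementation it would cover the window in which the underlying segment is being destroyed.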
A DSM segment may be divided into many chunks of memory. These chunks are located and managed through a table of contents (TOC).
typedef struct shm_toc_entry
{
    uint64 key;                 /* Arbitrary identifier */
    Size offset;                /* Offset, in bytes, from TOC start */
} shm_toc_entry;

struct shm_toc
{
    uint64 toc_magic;           /* Magic number identifying this TOC */
    slock_t toc_mutex;          /* Spinlock for mutual exclusion */
    Size toc_total_bytes;       /* Bytes managed by this TOC */
    Size toc_allocated_bytes;   /* Bytes allocated of those managed */
    uint32 toc_nentry;          /* Number of entries in TOC */
    shm_toc_entry toc_entry[FLEXIBLE_ARRAY_MEMBER];
};
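The function to look at is InitializeParallelDSM: when an operator needs to run in parallel, it first initializes a DSM segment for the parallel context.
InitializeParallelDSM creates and uses the DSM segment as follows. It begins by estimating the space the segment needs, using shm_toc_estimator: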
/*
 * Tools for estimating how large a chunk of shared memory will be needed
 * to store a TOC and its dependent objects. Note: we don't really support
 * large numbers of keys, but it's convenient to declare number_of_keys
 * as a Size anyway.
 */
typedef struct
{
    Size space_for_chunks;
    Size number_of_keys;
} shm_toc_estimator;
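The two fields of the estimator are enough to compute a segment size: a fixed header, one TOC entry per key, and the accumulated chunk space. The following is a minimal sketch of that arithmetic with hypothetical `toy_` names. The 32-byte alignment and the simplified header are assumptions made for illustration, and overflow checking is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ALIGN 32    /* assumed chunk alignment for this sketch */
#define TOY_ALIGN_UP(sz) \
    (((size_t) (sz) + TOY_ALIGN - 1) & ~((size_t) TOY_ALIGN - 1))

typedef struct toy_entry
{
    uint64_t key;
    size_t offset;
} toy_entry;

/* Simplified TOC header, used here only to size the fixed part. */
typedef struct toy_toc_header
{
    uint64_t magic;
    size_t total_bytes;
    size_t allocated_bytes;
    uint32_t nentry;
    toy_entry entry[];
} toy_toc_header;

typedef struct toy_estimator
{
    size_t space_for_chunks;
    size_t number_of_keys;
} toy_estimator;

/* Reserve room for one chunk, rounded up to the alignment. */
static void
toy_estimate_chunk(toy_estimator *e, size_t sz)
{
    e->space_for_chunks += TOY_ALIGN_UP(sz);
}

/* Reserve room for n TOC keys. */
static void
toy_estimate_keys(toy_estimator *e, size_t n)
{
    e->number_of_keys += n;
}

/* Total size: header + one entry per key + all chunk space, aligned. */
static size_t
toy_estimate(const toy_estimator *e)
{
    size_t sz = offsetof(toy_toc_header, entry);

    sz += e->number_of_keys * sizeof(toy_entry);
    sz += e->space_for_chunks;
    return TOY_ALIGN_UP(sz);
}
```

Each component that wants space in the segment would call the two reserve functions once per chunk during the estimation pass, and the final total would be the size requested when the segment is created.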
The TOC of a parallel query's DSM segment is stored in ParallelContext:
typedef struct ParallelContext
{
    shm_toc_estimator estimator;
    dsm_segment *seg;
    void *private_memory;
    shm_toc *toc;
    ……
} ParallelContext;
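ParallelContext::toc is in fact the start address of the shared memory that dsm_create maps internally with mmap, so the memory layout of shm_toc is the memory layout of the DSM segment itself. As the figure below shows, chunks are allocated from the end of the DSM segment toward the front, while TOC entries are allocated from the front toward the back.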
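The two-ended layout described above can be sketched in a few dozen lines. This is a toy model, not PostgreSQL's implementation: every `toy_` name is hypothetical, the alignment of 32 bytes is an assumption, and the spinlock that protects the real TOC is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ALIGN 32    /* assumed chunk alignment for this sketch */
#define TOY_ALIGN_UP(sz) \
    (((size_t) (sz) + TOY_ALIGN - 1) & ~((size_t) TOY_ALIGN - 1))

typedef struct toy_entry
{
    uint64_t key;
    size_t offset;              /* offset in bytes from the TOC start */
} toy_entry;

typedef struct toy_toc
{
    size_t total_bytes;         /* bytes managed by this TOC */
    size_t allocated_bytes;     /* bytes handed out as chunks */
    uint32_t nentry;            /* number of entries in use */
    toy_entry entry[];          /* grows forward from the header */
} toy_toc;

/* Place a TOC at the very start of a memory region. */
static toy_toc *
toy_toc_create(void *mem, size_t nbytes)
{
    toy_toc *toc = mem;

    toc->total_bytes = nbytes & ~((size_t) TOY_ALIGN - 1);
    toc->allocated_bytes = 0;
    toc->nentry = 0;
    return toc;
}

/* Bytes consumed so far: header, entries at the front, chunks at the back. */
static size_t
toy_toc_used(const toy_toc *toc)
{
    return offsetof(toy_toc, entry) + toc->nentry * sizeof(toy_entry)
        + toc->allocated_bytes;
}

/* Carve a chunk off the END of the region, moving toward the front. */
static void *
toy_toc_allocate(toy_toc *toc, size_t nbytes)
{
    if (nbytes > toc->total_bytes)
        return NULL;
    nbytes = TOY_ALIGN_UP(nbytes);
    if (toy_toc_used(toc) + nbytes > toc->total_bytes)
        return NULL;
    toc->allocated_bytes += nbytes;
    return (char *) toc + (toc->total_bytes - toc->allocated_bytes);
}

/* Record key -> chunk in the next entry slot, moving toward the back. */
static int
toy_toc_insert(toy_toc *toc, uint64_t key, void *chunk)
{
    if (toy_toc_used(toc) + sizeof(toy_entry) > toc->total_bytes)
        return 0;
    toc->entry[toc->nentry].key = key;
    toc->entry[toc->nentry].offset = (size_t) ((char *) chunk - (char *) toc);
    toc->nentry++;
    return 1;
}

/* Linear search by key; returns the chunk address or NULL. */
static void *
toy_toc_lookup(toy_toc *toc, uint64_t key)
{
    for (uint32_t i = 0; i < toc->nentry; i++)
    {
        if (toc->entry[i].key == key)
            return (char *) toc + toc->entry[i].offset;
    }
    return NULL;
}
```

Because entries store offsets rather than pointers, a process that maps the same region at a different address can still resolve a key to the right chunk, which is the property a multi-process design needs.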