Reposted from: https://developer.aliyun.com/article/7593
In Greenplum, a function called on its own is typically evaluated on the master and never reaches the segments.
Take the random() function as an example.
A plain SELECT random() does not need to be dispatched to the segment nodes, so the plan below has no Gather Motion step.
- snippet.sql
postgres=# EXPLAIN ANALYZE SELECT random();
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.01..0.02 rows=1 width=0)
   Rows out:  1 rows with 0.017 ms to end, start offset by 0.056 ms.
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
           Rows out:  1 rows with 0.004 ms to end of 2 scans, start offset by 0.059 ms.
 Slice statistics:
   (slice0)    Executor memory: 29K bytes.
   (slice1)    Executor memory: 29K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 0.074 ms
(11 rows)
What if we want this function to execute on the segments?
Call it through gp_dist_random('gp_id'); the argument to gp_dist_random is the name of any queryable table or view.
- snippet.sql
postgres=# EXPLAIN ANALYZE SELECT random() FROM gp_dist_random('gp_id');
                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..4.00 rows=240 width=0)
   Rows out:  240 rows at destination with 6.336 ms to first row, 59 ms to end, start offset by 4195 ms.
   ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
         Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.073 ms to first row, 0.075 ms to end, start offset by 4207 ms.
 Slice statistics:
   (slice0)    Executor memory: 471K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4279.445 ms
(10 rows)
gp_id contains exactly one row on every segment, so the SQL above invokes random() once per segment and returns all the results. My test environment has 240 segments, so the query returns 240 rows.
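As a quick illustration (a sketch, not from the original article), you can tag each result with the segment that produced it by selecting gp_segment_id, Greenplum's standard system column identifying a row's segment:
- snippet.sql
-- One random() value per segment; gp_segment_id shows which segment evaluated it.
SELECT gp_segment_id, random() FROM gp_dist_random('gp_id') ORDER BY 1;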
The definition of gp_id (below) explains that gp_dist_random can be used for administrative work:
for example, querying a database's size or a table's size is in fact computed exactly this way.
- snippet.c
/*-------------------------------------------------------------------------
 *
 * gp_id.h
 *      definition of the system "database identifier" relation (gp_dbid)
 *      along with the relation's initial contents.
 *
 * Copyright (c) 2009-2010, Greenplum inc
 *
 * NOTES
 *    Historically this table was used to supply every segment with its
 *    identification information.  However in the 4.0 release when the file
 *    replication feature was added it could no longer serve this purpose
 *    because it became a requirement for all tables to have the same physical
 *    contents on both the primary and mirror segments.  To resolve this the
 *    information is now passed to each segment on startup based on the
 *    gp_segment_configuration (stored on the master only), and each segment
 *    has a file in its datadirectory (gp_dbid) that uniquely identifies the
 *    segment.
 *
 *    The contents of the table are now irrelevant, with the exception that
 *    several tools began relying on this table for use as a method of remote
 *    function invocation via gp_dist_random('gp_id') due to the fact that this
 *    table was guaranteed of having exactly one row on every segment.  The
 *    contents of the row have no defined meaning, but this property is still
 *    relied upon.
 */
#ifndef _GP_ID_H_
#define _GP_ID_H_

#include "catalog/genbki.h"

/*
 * Defines for gp_id table
 */
#define GpIdRelationName "gp_id"

/* TIDYCAT_BEGINFAKEDEF

   CREATE TABLE gp_id
   with (shared=true, oid=false, relid=5001, content=SEGMENT_LOCAL)
   (
   gpname       name,
   numsegments  smallint,
   dbid         smallint,
   content      smallint
   );

   TIDYCAT_ENDFAKEDEF
*/
The GP functions for querying database size:
- snippet.sql
postgres=# \df+ pg_database_size
                                                                       List of functions
   Schema   |       Name       | Result data type | Argument data types |  Type  |  Data access   | Volatility |  Owner   | Language |      Source code      |                         Description
------------+------------------+------------------+---------------------+--------+----------------+------------+----------+----------+-----------------------+--------------------------------------------------------------
 pg_catalog | pg_database_size | bigint           | name                | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_name | Calculate total disk space usage for the specified database
 pg_catalog | pg_database_size | bigint           | oid                 | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_oid  | Calculate total disk space usage for the specified database
(2 rows)
The source code of pg_database_size_name is shown below.
Clearly, computing a database's size also relies on select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');
- snippet.c
Datum
pg_database_size_name(PG_FUNCTION_ARGS)
{
	int64		size = 0;
	Name		dbName = PG_GETARG_NAME(0);
	Oid			dbOid = get_database_oid(NameStr(*dbName));

	if (!OidIsValid(dbOid))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_DATABASE),
				 errmsg("database \"%s\" does not exist",
						NameStr(*dbName))));

	/* size of the database on the local (master) node */
	size = calculate_database_size(dbOid);

	if (Gp_role == GP_ROLE_DISPATCH)
	{
		StringInfoData buffer;

		initStringInfo(&buffer);

		/* on the master, dispatch the same calculation to every segment
		 * via gp_dist_random('gp_id') and add the segments' sum */
		appendStringInfo(&buffer,
						 "select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');",
						 NameStr(*dbName));

		size += get_size_from_segDBs(buffer.data);
	}

	PG_RETURN_INT64(size);
}
To verify, we can run that SQL directly: the result is almost identical to calling pg_database_size, differing only by the master-side calculate_database_size portion.
- snippet.sql
postgres=# SELECT sum(pg_database_size('postgres'))::int8 FROM gp_dist_random('gp_id');
      sum
----------------
 16006753522624
(1 row)

postgres=# SELECT pg_database_size('postgres');
 pg_database_size
------------------
 16006763924106
(1 row)
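Table sizes follow the same pattern. As a sketch (using the catalog table pg_class as the example relation, since it exists on every segment), the per-segment sizes can be summed by hand:
- snippet.sql
-- Sum a relation's size across all segments; this mirrors the dispatch
-- pattern shown in pg_database_size_name above.
SELECT sum(pg_relation_size('pg_class'))::int8 FROM gp_dist_random('gp_id');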
gp_dist_random('gp_id') essentially queries gp_id on every node,
and gp_dist_random('pg_authid') queries pg_authid on every node.
For example:
- snippet.sql
postgres=# SELECT * FROM gp_dist_random('gp_id');
  gpname   | numsegments | dbid | content
-----------+-------------+------+---------
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 ...
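A quick sanity check on the pg_authid case (a sketch): since pg_authid also exists on every segment, counting through gp_dist_random multiplies the role count by the segment count.
- snippet.sql
-- Returns (number of roles) x (number of segments) rows in total.
SELECT count(*) FROM gp_dist_random('pg_authid');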
If you don't want that many rows back, you can filter with LIMIT, but the query still executes on every segment, as the plan below shows:
- snippet.sql
postgres=# EXPLAIN ANALYZE SELECT random() FROM gp_dist_random('gp_id') LIMIT 1;
                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=0)
   Rows out:  1 rows with 5.865 ms to first row, 5.884 ms to end, start offset by 4212 ms.
   ->  Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..0.04 rows=1 width=0)
         Rows out:  1 rows at destination with 5.857 ms to end, start offset by 4212 ms.
         ->  Limit  (cost=0.00..0.02 rows=1 width=0)
               Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.062 ms to first row, 0.063 ms to end, start offset by 4228 ms.
               ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.060 ms to end, start offset by 4228 ms.
 Slice statistics:
   (slice0)    Executor memory: 463K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4288.007 ms
(14 rows)
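Because gp_dist_random('gp_id') amounts to remote function invocation on every segment, it is also handy for simple cluster-wide checks. For example (a sketch, not from the original article), verifying that all segments run the same server binary:
- snippet.sql
-- Runs version() once per segment; every returned row should be identical.
SELECT version() FROM gp_dist_random('gp_id');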