Reposted from: https://developer.aliyun.com/article/7593
In Greenplum, a function called on its own is typically evaluated on the master and never reaches the segments.
Take the random() function as an example.
A plain SELECT random() does not need to be dispatched to the segment nodes, so the plan below has no Gather Motion step.
- snippet.sql
postgres=# EXPLAIN ANALYZE SELECT random();
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.01..0.02 rows=1 width=0)
   Rows out:  1 rows with 0.017 ms to end, start offset by 0.056 ms.
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
           Rows out:  1 rows with 0.004 ms to end of 2 scans, start offset by 0.059 ms.
 Slice statistics:
   (slice0)    Executor memory: 29K bytes.
   (slice1)    Executor memory: 29K bytes.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 0.074 ms
(11 rows)
What if we want this function to execute on the segments?
Call it through gp_dist_random('gp_id'); the argument to gp_dist_random is the name of any queryable table or view.
- snippet.sql
postgres=# EXPLAIN ANALYZE SELECT random() FROM gp_dist_random('gp_id');
                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..4.00 rows=240 width=0)
   Rows out:  240 rows at destination with 6.336 ms to first row, 59 ms to end, start offset by 4195 ms.
   ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
         Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.073 ms to first row, 0.075 ms to end, start offset by 4207 ms.
 Slice statistics:
   (slice0)    Executor memory: 471K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4279.445 ms
(10 rows)
gp_id contains exactly one row on every segment, so the SQL above invokes random() once per segment and returns all the results. My test environment has 240 segments, so the query returns 240 rows.
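As a quick illustration (a sketch, not from the original article), you can tag each result with the segment that produced it by selecting gp_segment_id, Greenplum's standard system column identifying a row's segment:
- snippet.sql
-- One random() value per segment; gp_segment_id shows which segment evaluated it.
SELECT gp_segment_id, random() FROM gp_dist_random('gp_id') ORDER BY 1;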
The definition of gp_id (below) explains that gp_dist_random can be used for administrative work:
for example, querying a database's size or a table's size is in fact computed exactly this way.
- snippet.c
/*-------------------------------------------------------------------------
 *
 * gp_id.h
 *      definition of the system "database identifier" relation (gp_dbid)
 *      along with the relation's initial contents.
 *
 * Copyright (c) 2009-2010, Greenplum inc
 *
 * NOTES
 *    Historically this table was used to supply every segment with its
 *    identification information.  However in the 4.0 release when the file
 *    replication feature was added it could no longer serve this purpose
 *    because it became a requirement for all tables to have the same physical
 *    contents on both the primary and mirror segments.  To resolve this the
 *    information is now passed to each segment on startup based on the
 *    gp_segment_configuration (stored on the master only), and each segment
 *    has a file in its datadirectory (gp_dbid) that uniquely identifies the
 *    segment.
 *
 *    The contents of the table are now irrelevant, with the exception that
 *    several tools began relying on this table for use as a method of remote
 *    function invocation via gp_dist_random('gp_id') due to the fact that this
 *    table was guaranteed of having exactly one row on every segment.  The
 *    contents of the row have no defined meaning, but this property is still
 *    relied upon.
 */
#ifndef _GP_ID_H_
#define _GP_ID_H_

#include "catalog/genbki.h"

/*
 * Defines for gp_id table
 */
#define GpIdRelationName "gp_id"

/* TIDYCAT_BEGINFAKEDEF

   CREATE TABLE gp_id
   with (shared=true, oid=false, relid=5001, content=SEGMENT_LOCAL)
   (
   gpname       name,
   numsegments  smallint,
   dbid         smallint,
   content      smallint
   );

   TIDYCAT_ENDFAKEDEF
*/
The GP functions for querying database size:
- snippet.sql
postgres=# \df+ pg_database_size
                                                                       List of functions
   Schema   |       Name       | Result data type | Argument data types |  Type  |  Data access   | Volatility |  Owner   | Language |      Source code      |                         Description
------------+------------------+------------------+---------------------+--------+----------------+------------+----------+----------+-----------------------+--------------------------------------------------------------
 pg_catalog | pg_database_size | bigint           | name                | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_name | Calculate total disk space usage for the specified database
 pg_catalog | pg_database_size | bigint           | oid                 | normal | reads sql data | volatile   | dege.zzz | internal | pg_database_size_oid  | Calculate total disk space usage for the specified database
(2 rows)
The source code of pg_database_size_name is shown below.
Clearly, computing a database's size also relies on select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');
- snippet.c
Datum
pg_database_size_name(PG_FUNCTION_ARGS)
{
	int64		size = 0;
	Name		dbName = PG_GETARG_NAME(0);
	Oid			dbOid = get_database_oid(NameStr(*dbName));

	if (!OidIsValid(dbOid))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_DATABASE),
				 errmsg("database \"%s\" does not exist",
						NameStr(*dbName))));

	/* size of the database on the local (master) node */
	size = calculate_database_size(dbOid);

	if (Gp_role == GP_ROLE_DISPATCH)
	{
		StringInfoData buffer;

		initStringInfo(&buffer);

		/* on the master, dispatch the same calculation to every segment
		 * via gp_dist_random('gp_id') and add the segments' sum */
		appendStringInfo(&buffer,
						 "select sum(pg_database_size('%s'))::int8 from gp_dist_random('gp_id');",
						 NameStr(*dbName));

		size += get_size_from_segDBs(buffer.data);
	}

	PG_RETURN_INT64(size);
}
To verify, we can run that SQL directly: the result is almost identical to calling pg_database_size, differing only by the master-side calculate_database_size portion.
- snippet.sql
postgres=# SELECT sum(pg_database_size('postgres'))::int8 FROM gp_dist_random('gp_id');
      sum
----------------
 16006753522624
(1 row)

postgres=# SELECT pg_database_size('postgres');
 pg_database_size
------------------
 16006763924106
(1 row)
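Table sizes follow the same pattern. As a sketch (using the catalog table pg_class as the example relation, since it exists on every segment), the per-segment sizes can be summed by hand:
- snippet.sql
-- Sum a relation's size across all segments; this mirrors the dispatch
-- pattern shown in pg_database_size_name above.
SELECT sum(pg_relation_size('pg_class'))::int8 FROM gp_dist_random('gp_id');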
gp_dist_random('gp_id') essentially queries gp_id on every node,
and gp_dist_random('pg_authid') queries pg_authid on every node.
For example:
- snippet.sql
postgres=# SELECT * FROM gp_dist_random('gp_id');
  gpname   | numsegments | dbid | content
-----------+-------------+------+---------
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 Greenplum |          -1 |   -1 |      -1
 ...
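A quick sanity check on the pg_authid case (a sketch): since pg_authid also exists on every segment, counting through gp_dist_random multiplies the role count by the segment count.
- snippet.sql
-- Returns (number of roles) x (number of segments) rows in total.
SELECT count(*) FROM gp_dist_random('pg_authid');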
If you don't want that many rows back, you can filter with LIMIT, but the query still executes on every segment, as the plan below shows:
- snippet.sql
postgres=# EXPLAIN ANALYZE SELECT random() FROM gp_dist_random('gp_id') LIMIT 1;
                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=0)
   Rows out:  1 rows with 5.865 ms to first row, 5.884 ms to end, start offset by 4212 ms.
   ->  Gather Motion 240:1  (slice1; segments: 240)  (cost=0.00..0.04 rows=1 width=0)
         Rows out:  1 rows at destination with 5.857 ms to end, start offset by 4212 ms.
         ->  Limit  (cost=0.00..0.02 rows=1 width=0)
               Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.062 ms to first row, 0.063 ms to end, start offset by 4228 ms.
               ->  Seq Scan on gp_id  (cost=0.00..4.00 rows=1 width=0)
                     Rows out:  Avg 1.0 rows x 240 workers.  Max 1 rows (seg0) with 0.060 ms to end, start offset by 4228 ms.
 Slice statistics:
   (slice0)    Executor memory: 463K bytes.
   (slice1)    Executor memory: 163K bytes avg x 240 workers, 163K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 4288.007 ms
(14 rows)
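Because gp_dist_random('gp_id') amounts to remote function invocation on every segment, it is also handy for simple cluster-wide checks. For example (a sketch, not from the original article), verifying that all segments run the same server binary:
- snippet.sql
-- Runs version() once per segment; every returned row should be identical.
SELECT version() FROM gp_dist_random('gp_id');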