# 一、单机模式问题 ## 1. “select * from pg_resqueue_status; ” crash crash重现方法: ```bash [gpadmin@bogon ~]$ gpstart -a -m [gpadmin@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psqlp's -d postgres psql (9.4.24) Type "help" for help. postgres=# select * from pg_resqueue_status; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. ``` 原因: 单机模式下,Gp\_role == GP\_ROLE\_UTILITY,InitResScheduler()函数不为0,导致ResQueueHash没有初始化: ```c void ResManagerShmemInit(void) { if (IsResQueueEnabled() && Gp_role == GP_ROLE_DISPATCH) { InitResScheduler(); InitResPortalIncrementHash(); } else if (IsResGroupEnabled() && !IsUnderPostmaster) { ResGroupControlInit(); } } ``` 在查询pg\_resqueue\_status视图时,访问ResQueueHash空指针: ```c /** * This copies out the current state of resource queues. */ static void BuildQueueStatusContext(QueueStatusContext *fctx) { …… /* Initialize for a sequential scan of the resource queue hash. */ hash_seq_init(&status, ResQueueHash); num_calls = hash_get_num_entries(ResQueueHash); Assert(num_calls == ResScheduler->num_queues); ``` # 2. “gp_toolkit.gp_param_setting” crash crash重现方法: ```bash [gpadmin@bogon ~]$ gpstart -a -m [gpadmin@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psql -d postgres psql (9.4.24) Type "help" for help. postgres=# select * FROM gp_toolkit.gp_param_setting('max_connections'); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. ``` 原因:未知。 # 二、权限 ## 新用户无法登录 参阅:[Greenplum新建用户无法登录问题](../安装与调试/Greenplum新建用户无法登录问题) ## 新用户无法查询数据库 现象: ```sql [gpadmin@bogon ~]$ psql -d tpch1s -U fairyfar tpch1s=> select count(*) from region; ERROR: permission denied for relation region ``` 解决方法: ```sql # 使用管理员登录 [gpadmin@bogon ~]$ psql -d tpch1s tpch1s=# GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO fairyfar; ``` # 三、语法 ## 1. ERROR: failed to find conversion function from unknown to text 使用GPDB执行TPC-DS q14查询时,报该错误。 原因: 在select的语句的中包含常量列,但是没有指定常量列类型。 用例: ```sql create table t (a int); insert into t values (1); select distinct * from (select a, '2021-01-01' as st, '2021-01-02' as lt from t)x; ``` 解决方法: 明确常量的类型。 ```sql select distinct * from (select a, '2021-01-01'::text as st, '2021-01-02'::text as lt from t)x; -- 或者: select distinct * from (select a, cast('2021-01-01' as text) as st, cast('2021-01-02' as text) as lt from t)x; ``` 参考: [ERROR: failed to find conversion function from unknown to text](https://stackoverflow.com/questions/25192813/error-failed-to-find-conversion-function-from-unknown-to-text) # 四、未分类问题 ## 1. 移除角色superuser属性 NOSUPERUSER可以移除超级用户权限。 ```sql tpch1s=# \du List of roles Role name | Attributes | Member of -----------+----------------------------------------------------------------+----------- gpadmin | Superuser, Create role, Create DB, Ext http Table, Replication | {} yz | Superuser, Create role, Create DB, Replication | {gpadmin} tpch1s=# ALTER ROLE yz with NOSUPERUSER; tpch1s=# \du List of roles Role name | Attributes | Member of -----------+----------------------------------------------------------------+----------- gpadmin | Superuser, Create role, Create DB, Ext http Table, Replication | {} yz | Create role, Create DB, Replication | {gpadmin} ``` ## 2. ERROR: deadlock detected, locking against self 资源队列设置memory\_limit后,执行SQL,出现以下问题: ```sql tpch1s=> select * from nation; ERROR: deadlock detected, locking against self ``` 解决方法是,调小statement\_mem配置参数值,使: $$ statement\_mem < = 资源队列的memory\_limit $$ ```sql tpch1s=> set statement_mem='10MB'; ``` 参考:[Conflict with Resource Queue memory_limit and statement_mem results in "ERROR: deadlock detected, locking against self"](https://pvtl.force.com/s/article/Conflict-with-Resource-Queue-memorylimit-and-statementmem-results-in-ERROR-deadlock-detected-locking-against-self) ## 3. 资源队列不起作用 使用queue(资源队列)管理资源时,发现资源管理队列不起作用。以下视图,rsqcountvalue和rsqmemoryvalue始终为0: ```sql tpch1s=# select * from gp_toolkit.gp_resqueue_status; queueid | rsqname | rsqcountlimit | rsqcountvalue | rsqcostlimit | rsqcostvalue | rsqmemorylimit | rsqmemoryvalue | rsqwaiters | rsqholders ---------+------------+---------------+---------------+--------------+--------------+----------------+----------------+------------+------------ 6055 | pg_default | 20 | 0 | -1 | 0 | 1.04858e+07 | 0 | 0 | 0 ``` 经查,是因为“超级用户不受资源队列限制”: > Roles with the SUPERUSER attribute are exempt from resource queue limits. Superuser queries always run immediately regardless of limits imposed by their assigned resource queue. ## 4. 查询用户与资源组/资源队列之间的关系 用户对应的资源队列: ```sql SELECT rolname, rsqname FROM pg_roles, gp_toolkit.gp_resqueue_status WHERE pg_roles.rolresqueue=gp_toolkit.gp_resqueue_status.queueid; ``` 用户对应的资源组: ```sql SELECT rolname, rsgname FROM pg_roles, pg_resgroup WHERE pg_roles.rolresgroup=pg_resgroup.oid; ``` ## 5. 资源管理语法解析重复执行问题 同一个测试用例。在gp\_resource\_manager='queue'时,有以下查询与结果: ```sql postgres=# set standard_conforming_strings = off; SET postgres=# select 'a\\bcd' as f1; WARNING: nonstandard use of \\ in a string literal LINE 1: select 'a\\bcd' as f1; ^ HINT: Use the escape string syntax for backslashes, e.g., E'\\'. f1 ------- a\bcd (1 row) ``` 而当gp\_resource\_manager='group'时,有以下查询与结果: ```sql postgres=# set standard_conforming_strings = off; SET postgres=# select 'a\\bcd' as f1; WARNING: nonstandard use of \\ in a string literal LINE 1: select 'a\\bcd' as f1; ^ HINT: Use the escape string syntax for backslashes, e.g., E'\\'. WARNING: nonstandard use of \\ in a string literal LINE 1: select 'a\\bcd' as f1; ^ HINT: Use the escape string syntax for backslashes, e.g., E'\\'. f1 ------- a\bcd (1 row) ``` 即,资源组资源管理方式下,语法解析报了两遍warning。 gdb跟踪,资源组时,第一遍warning堆栈如下: ```gdb (gdb) bt #0 check_string_escape_warning (ychar=92 '\\', yyscanner=0x1a90220) at src/backend/parser/scan.l:1622 #1 0x0000000000ac5982 in core_yylex (yylval_param=0x7ffd7de70830, yylloc_param=0x7ffd7de7082c, yyscanner=0x1a90220) at src/backend/parser/scan.l:746 #2 0x0000000000ac46b1 in base_yylex (lvalp=0x7ffd7de70830, llocp=0x7ffd7de7082c, yyscanner=0x1a90220) at src/backend/parser/parser.c:101 #3 0x0000000000a4f3f9 in base_yyparse (yyscanner=0x1a90220) at src/backend/parser/gram.c:31959 #4 0x0000000000ac460b in raw_parser (str=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/parser/parser.c:53 #5 0x0000000000ba6fd9 in pg_parse_query (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/tcop/postgres.c:701 #6 0x0000000000daf4eb in shouldBypassQuery (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/utils/resgroup/resgroup.c:3671 #7 0x0000000000dad3cb in AssignResGroupOnMaster () at src/backend/utils/resgroup/resgroup.c:2597 #8 0x0000000000641b64 in StartTransaction () at src/backend/access/transam/xact.c:2648 #9 0x0000000000642dd5 in StartTransactionCommand () at src/backend/access/transam/xact.c:3659 #10 0x0000000000baa000 in start_xact_command () at src/backend/tcop/postgres.c:2722 #11 0x0000000000ba75c9 in exec_simple_query (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/tcop/postgres.c:1111 #12 0x0000000000bad50c in PostgresMain (argc=1, argv=0x1ae66f0, dbname=0x1ae6568 "postgres", username=0x1ae6548 "yz") at src/backend/tcop/postgres.c:5165 #13 0x0000000000af3b15 in BackendRun (port=0x1ab9b60) at src/backend/postmaster/postmaster.c:5196 #14 0x0000000000af32cb in BackendStartup (port=0x1ab9b60) at src/backend/postmaster/postmaster.c:4887 #15 0x0000000000aeede6 in ServerLoop (receive_encryption_key=false) at src/backend/postmaster/postmaster.c:2060 #16 0x0000000000aee5b6 in PostmasterMain (argc=9, argv=0x1a8a350) at src/backend/postmaster/postmaster.c:1668 #17 0x00000000009238d0 in main (argc=9, argv=0x1a8a350) at src/backend/main/main.c:228 ``` 关键在于资源组管理时,SET/RESET/SHOW 不受资源组限制,以下函数目的是语法分析是否是这三类语句,其中完整地执行了一遍postgresql的语法解析函数: ```cpp /* * Parse the query and check if this query should * bypass the management of resource group. * * Currently, only SET/RESET/SHOW command can be bypassed */ static bool shouldBypassQuery(const char *query_string) { …… parsetree_list = pg_parse_query(query_string); ``` 即,资源管理时,多执行了一遍语法解析。 ## 6. 修改hosname 当GPDB集群安装完成后,如果修改Linux主机名称,GPDB启动将失败,这是因为GPDB记录了在系统表中记录了hostname,Linux修改主机名后,GPDB系统表的信息没有更新。 应该按照以下方法修改主机名称: ```bash # 进入运维模式 [gpadmin-master@bogon ~]$ gpstart -m -a # 运行更新系统表 [gpadmin-master@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psql -d postgres postgres=# set allow_system_table_mods='dml'; # 更新系统表 postgres=# update gp_segment_configuration set hostname='new-hostname',address='new-hostname'; ``` 然后修改Linux主机名。