一、单机模式问题
1. “select * from pg_resqueue_status; ” crash
crash重现方法:
- snippet.bash
[gpadmin@bogon ~]$ gpstart -a -m [gpadmin@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psqlp's -d postgres psql (9.4.24) Type "help" for help. postgres=# select * from pg_resqueue_status; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed.
原因:
单机模式下,Gp_role == GP_ROLE_UTILITY,InitResScheduler()函数不为0,导致ResQueueHash没有初始化:
- snippet.c
void ResManagerShmemInit(void) { if (IsResQueueEnabled() && Gp_role == GP_ROLE_DISPATCH) { InitResScheduler(); InitResPortalIncrementHash(); } else if (IsResGroupEnabled() && !IsUnderPostmaster) { ResGroupControlInit(); } }
在查询pg_resqueue_status视图时,访问ResQueueHash空指针:
- snippet.c
/** * This copies out the current state of resource queues. */ static void BuildQueueStatusContext(QueueStatusContext *fctx) { …… /* Initialize for a sequential scan of the resource queue hash. */ hash_seq_init(&status, ResQueueHash); num_calls = hash_get_num_entries(ResQueueHash); Assert(num_calls == ResScheduler->num_queues);
2. “gp_toolkit.gp_param_setting” crash
crash重现方法:
- snippet.bash
[gpadmin@bogon ~]$ gpstart -a -m [gpadmin@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psql -d postgres psql (9.4.24) Type "help" for help. postgres=# select * FROM gp_toolkit.gp_param_setting('max_connections'); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed.
原因:未知。
二、权限
新用户无法登录
新用户无法查询数据库
现象:
- snippet.sql
[gpadmin@bogon ~]$ psql -d tpch1s -U fairyfar tpch1s=> SELECT COUNT(*) FROM region; ERROR: permission denied FOR relation region
解决方法:
- snippet.sql
# 使用管理员登录 [gpadmin@bogon ~]$ psql -d tpch1s tpch1s=# GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO fairyfar;
三、语法
1. ERROR: failed to find conversion function from unknown to text
使用GPDB执行TPC-DS q14查询时,报该错误。
原因:
在select的语句的中包含常量列,但是没有指定常量列类型。
用例:
- snippet.sql
CREATE TABLE t (a INT); INSERT INTO t VALUES (1); SELECT DISTINCT * FROM (SELECT a, '2021-01-01' AS st, '2021-01-02' AS lt FROM t)x;
解决方法:
明确常量的类型。
- snippet.sql
SELECT DISTINCT * FROM (SELECT a, '2021-01-01'::text AS st, '2021-01-02'::text AS lt FROM t)x; -- 或者: SELECT DISTINCT * FROM (SELECT a, CAST('2021-01-01' AS text) AS st, CAST('2021-01-02' AS text) AS lt FROM t)x;
参考:
ERROR: failed to find conversion function from unknown to text
四、未分类问题
1. 移除角色superuser属性
NOSUPERUSER可以移除超级用户权限。
- snippet.sql
tpch1s=# \du List OF roles ROLE name | Attributes | Member OF -----------+----------------------------------------------------------------+----------- gpadmin | Superuser, CREATE ROLE, CREATE DB, Ext http TABLE, Replication | {} yz | Superuser, CREATE ROLE, CREATE DB, Replication | {gpadmin} tpch1s=# ALTER ROLE yz WITH NOSUPERUSER; tpch1s=# \du List OF roles ROLE name | Attributes | Member OF -----------+----------------------------------------------------------------+----------- gpadmin | Superuser, CREATE ROLE, CREATE DB, Ext http TABLE, Replication | {} yz | CREATE ROLE, CREATE DB, Replication | {gpadmin}
2. ERROR: deadlock detected, locking against self
资源队列设置memory_limit后,执行SQL,出现以下问题:
- snippet.sql
tpch1s=> SELECT * FROM nation; ERROR: deadlock detected, locking against SELF
解决方法是,调小statement_mem配置参数值,使: $$ statement\_mem < = 资源队列的memory\_limit $$
- snippet.sql
tpch1s=> SET statement_mem='10MB';
3. 资源队列不起作用
使用queue(资源队列)管理资源时,发现资源管理队列不起作用。以下视图,rsqcountvalue和rsqmemoryvalue始终为0:
- snippet.sql
tpch1s=# SELECT * FROM gp_toolkit.gp_resqueue_status; queueid | rsqname | rsqcountlimit | rsqcountvalue | rsqcostlimit | rsqcostvalue | rsqmemorylimit | rsqmemoryvalue | rsqwaiters | rsqholders ---------+------------+---------------+---------------+--------------+--------------+----------------+----------------+------------+------------ 6055 | pg_default | 20 | 0 | -1 | 0 | 1.04858e+07 | 0 | 0 | 0
经查,是因为“超级用户不受资源队列限制”:
Roles with the SUPERUSER attribute are exempt from resource queue limits. Superuser queries always run immediately regardless of limits imposed by their assigned resource queue.
4. 查询用户与资源组/资源队列之间的关系
用户对应的资源队列:
- snippet.sql
SELECT rolname, rsqname FROM pg_roles, gp_toolkit.gp_resqueue_status WHERE pg_roles.rolresqueue=gp_toolkit.gp_resqueue_status.queueid;
用户对应的资源组:
- snippet.sql
SELECT rolname, rsgname FROM pg_roles, pg_resgroup WHERE pg_roles.rolresgroup=pg_resgroup.oid;
5. 资源管理语法解析重复执行问题
同一个测试用例。在gp_resource_manager='queue'时,有以下查询与结果:
- snippet.sql
postgres=# SET standard_conforming_strings = off; SET postgres=# SELECT 'a\\bcd' AS f1; WARNING: nonstandard USE OF \\ IN a string literal LINE 1: SELECT 'a\\bcd' AS f1; ^ HINT: USE the escape string syntax FOR backslashes, e.g., E'\\'. f1 ------- a\bcd (1 ROW)
而当gp_resource_manager='group'时,有以下查询与结果:
- snippet.sql
postgres=# SET standard_conforming_strings = off; SET postgres=# SELECT 'a\\bcd' AS f1; WARNING: nonstandard USE OF \\ IN a string literal LINE 1: SELECT 'a\\bcd' AS f1; ^ HINT: USE the escape string syntax FOR backslashes, e.g., E'\\'. WARNING: nonstandard USE OF \\ IN a string literal LINE 1: SELECT 'a\\bcd' AS f1; ^ HINT: USE the escape string syntax FOR backslashes, e.g., E'\\'. f1 ------- a\bcd (1 ROW)
即,资源组资源管理方式下,语法解析报了两遍warning。
gdb跟踪,资源组时,第一遍warning堆栈如下:
- snippet.gdb
(gdb) bt #0 check_string_escape_warning (ychar=92 '\\', yyscanner=0x1a90220) at src/backend/parser/scan.l:1622 #1 0x0000000000ac5982 in core_yylex (yylval_param=0x7ffd7de70830, yylloc_param=0x7ffd7de7082c, yyscanner=0x1a90220) at src/backend/parser/scan.l:746 #2 0x0000000000ac46b1 in base_yylex (lvalp=0x7ffd7de70830, llocp=0x7ffd7de7082c, yyscanner=0x1a90220) at src/backend/parser/parser.c:101 #3 0x0000000000a4f3f9 in base_yyparse (yyscanner=0x1a90220) at src/backend/parser/gram.c:31959 #4 0x0000000000ac460b in raw_parser (str=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/parser/parser.c:53 #5 0x0000000000ba6fd9 in pg_parse_query (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/tcop/postgres.c:701 #6 0x0000000000daf4eb in shouldBypassQuery (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/utils/resgroup/resgroup.c:3671 #7 0x0000000000dad3cb in AssignResGroupOnMaster () at src/backend/utils/resgroup/resgroup.c:2597 #8 0x0000000000641b64 in StartTransaction () at src/backend/access/transam/xact.c:2648 #9 0x0000000000642dd5 in StartTransactionCommand () at src/backend/access/transam/xact.c:3659 #10 0x0000000000baa000 in start_xact_command () at src/backend/tcop/postgres.c:2722 #11 0x0000000000ba75c9 in exec_simple_query (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/tcop/postgres.c:1111 #12 0x0000000000bad50c in PostgresMain (argc=1, argv=0x1ae66f0, dbname=0x1ae6568 "postgres", username=0x1ae6548 "yz") at src/backend/tcop/postgres.c:5165 #13 0x0000000000af3b15 in BackendRun (port=0x1ab9b60) at src/backend/postmaster/postmaster.c:5196 #14 0x0000000000af32cb in BackendStartup (port=0x1ab9b60) at src/backend/postmaster/postmaster.c:4887 #15 0x0000000000aeede6 in ServerLoop (receive_encryption_key=false) at src/backend/postmaster/postmaster.c:2060 #16 0x0000000000aee5b6 in PostmasterMain (argc=9, argv=0x1a8a350) at src/backend/postmaster/postmaster.c:1668 #17 0x00000000009238d0 in main (argc=9, argv=0x1a8a350) at src/backend/main/main.c:228
关键在于资源组管理时,SET/RESET/SHOW 不受资源组限制,以下函数目的是语法分析是否是这三类语句,其中完整地执行了一遍postgresql的语法解析函数:
- snippet.cpp
/* * Parse the query and check if this query should * bypass the management of resource group. * * Currently, only SET/RESET/SHOW command can be bypassed */ static bool shouldBypassQuery(const char *query_string) { …… parsetree_list = pg_parse_query(query_string);
即,资源管理时,多执行了一遍语法解析。
6. 修改hosname
当GPDB集群安装完成后,如果修改Linux主机名称,GPDB启动将失败,这是因为GPDB记录了在系统表中记录了hostname,Linux修改主机名后,GPDB系统表的信息没有更新。
应该按照以下方法修改主机名称:
- snippet.bash
# 进入运维模式 [gpadmin-master@bogon ~]$ gpstart -m -a # 运行更新系统表 [gpadmin-master@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psql -d postgres postgres=# set allow_system_table_mods='dml'; # 更新系统表 postgres=# update gp_segment_configuration set hostname='new-hostname',address='new-hostname';
然后修改Linux主机名。