FAIRYFAR-INTERNAL
 
  FAIRYFAR-INTERNAL  |  SITEMAP  |  ABOUT-ME  |  HOME  
您的足迹: Greenplum问题杂记
Greenplum问题杂记

一、单机模式问题

1. “select * from pg_resqueue_status; ” crash

crash重现方法:

snippet.bash
[gpadmin@bogon ~]$ gpstart -a -m
[gpadmin@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psqlp's -d postgres
psql (9.4.24)
Type "help" for help.
 
postgres=# select * from pg_resqueue_status;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

原因:

单机模式下,Gp_role == GP_ROLE_UTILITY,InitResScheduler()函数不为0,导致ResQueueHash没有初始化:

snippet.c
void
ResManagerShmemInit(void)
{
	if (IsResQueueEnabled() && Gp_role == GP_ROLE_DISPATCH)
	{
		InitResScheduler();
		InitResPortalIncrementHash();
	}
	else if (IsResGroupEnabled() && !IsUnderPostmaster)
	{
		ResGroupControlInit();
	}
}

在查询pg_resqueue_status视图时,访问ResQueueHash空指针:

snippet.c
/**
 * This copies out the current state of resource queues.
 */
static void
BuildQueueStatusContext(QueueStatusContext *fctx)
{
……
	/* Initialize for a sequential scan of the resource queue hash. */
	hash_seq_init(&status, ResQueueHash);
	num_calls = hash_get_num_entries(ResQueueHash);
	Assert(num_calls == ResScheduler->num_queues);

2. “gp_toolkit.gp_param_setting” crash

crash重现方法:

snippet.bash
[gpadmin@bogon ~]$ gpstart -a -m
[gpadmin@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psql -d postgres
psql (9.4.24)
Type "help" for help.
 
postgres=# select * FROM gp_toolkit.gp_param_setting('max_connections');
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

原因:未知。

二、权限

新用户无法登录

新用户无法查询数据库

现象:

snippet.sql
[gpadmin@bogon ~]$ psql -d tpch1s -U fairyfar
tpch1s=> SELECT COUNT(*) FROM region;
ERROR:  permission denied FOR relation region

解决方法:

snippet.sql
# 使用管理员登录
[gpadmin@bogon ~]$ psql -d tpch1s
tpch1s=# GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO fairyfar;

三、语法

1. ERROR: failed to find conversion function from unknown to text

使用GPDB执行TPC-DS q14查询时,报该错误。

原因:

在select的语句的中包含常量列,但是没有指定常量列类型。

用例:

snippet.sql
CREATE TABLE t (a INT);
INSERT INTO t VALUES (1);
SELECT DISTINCT * FROM (SELECT a, '2021-01-01' AS st, '2021-01-02' AS lt FROM t)x;

解决方法:

明确常量的类型。

snippet.sql
SELECT DISTINCT * FROM (SELECT a, '2021-01-01'::text AS st, '2021-01-02'::text AS lt FROM t)x;
-- 或者:
SELECT DISTINCT * FROM (SELECT a, CAST('2021-01-01' AS text) AS st, CAST('2021-01-02' AS text) AS lt FROM t)x;

参考:

ERROR: failed to find conversion function from unknown to text

四、未分类问题

1. 移除角色superuser属性

NOSUPERUSER可以移除超级用户权限。

snippet.sql
tpch1s=# \du
                                       List OF roles
 ROLE name |                              Attributes                        | Member OF 
-----------+----------------------------------------------------------------+-----------
 gpadmin   | Superuser, CREATE ROLE, CREATE DB, Ext http TABLE, Replication | {}
 yz        | Superuser, CREATE ROLE, CREATE DB, Replication                 | {gpadmin}
 
tpch1s=# ALTER ROLE yz WITH NOSUPERUSER;
tpch1s=# \du
                                       List OF roles
 ROLE name |                              Attributes                        | Member OF 
-----------+----------------------------------------------------------------+-----------
 gpadmin   | Superuser, CREATE ROLE, CREATE DB, Ext http TABLE, Replication | {}
 yz        | CREATE ROLE, CREATE DB, Replication                            | {gpadmin}

2. ERROR: deadlock detected, locking against self

资源队列设置memory_limit后,执行SQL,出现以下问题:

snippet.sql
tpch1s=>  SELECT * FROM nation;
ERROR:  deadlock detected, locking against SELF

解决方法是,调小statement_mem配置参数值,使: $$ statement\_mem < = 资源队列的memory\_limit $$

snippet.sql
tpch1s=> SET statement_mem='10MB';

参考:Conflict with Resource Queue memory_limit and statement_mem results in "ERROR: deadlock detected, locking against self"

3. 资源队列不起作用

使用queue(资源队列)管理资源时,发现资源管理队列不起作用。以下视图,rsqcountvalue和rsqmemoryvalue始终为0:

snippet.sql
tpch1s=#  SELECT * FROM gp_toolkit.gp_resqueue_status;
 queueid |  rsqname   | rsqcountlimit | rsqcountvalue | rsqcostlimit | rsqcostvalue | rsqmemorylimit | rsqmemoryvalue | rsqwaiters | rsqholders 
---------+------------+---------------+---------------+--------------+--------------+----------------+----------------+------------+------------
    6055 | pg_default |            20 |             0 |           -1 |            0 |    1.04858e+07 |              0 |          0 |          0

经查,是因为“超级用户不受资源队列限制”:

Roles with the SUPERUSER attribute are exempt from resource queue limits. Superuser queries always run immediately regardless of limits imposed by their assigned resource queue.

4. 查询用户与资源组/资源队列之间的关系

用户对应的资源队列:

snippet.sql
SELECT rolname, rsqname FROM pg_roles,
gp_toolkit.gp_resqueue_status
WHERE pg_roles.rolresqueue=gp_toolkit.gp_resqueue_status.queueid;

用户对应的资源组:

snippet.sql
SELECT rolname, rsgname FROM pg_roles, pg_resgroup
WHERE pg_roles.rolresgroup=pg_resgroup.oid;

5. 资源管理语法解析重复执行问题

同一个测试用例。在gp_resource_manager='queue'时,有以下查询与结果:

snippet.sql
postgres=# SET standard_conforming_strings = off;
SET
postgres=# SELECT 'a\\bcd' AS f1;
WARNING:  nonstandard USE OF \\ IN a string literal
LINE 1: SELECT 'a\\bcd' AS f1;
               ^
HINT:  USE the escape string syntax FOR backslashes, e.g., E'\\'.
  f1   
-------
 a\bcd
(1 ROW)

而当gp_resource_manager='group'时,有以下查询与结果:

snippet.sql
postgres=# SET standard_conforming_strings = off;
SET
postgres=# SELECT 'a\\bcd' AS f1;
WARNING:  nonstandard USE OF \\ IN a string literal
LINE 1: SELECT 'a\\bcd' AS f1;
               ^
HINT:  USE the escape string syntax FOR backslashes, e.g., E'\\'.
WARNING:  nonstandard USE OF \\ IN a string literal
LINE 1: SELECT 'a\\bcd' AS f1;
               ^
HINT:  USE the escape string syntax FOR backslashes, e.g., E'\\'.
  f1   
-------
 a\bcd
(1 ROW)

即,资源组资源管理方式下,语法解析报了两遍warning。

gdb跟踪,资源组时,第一遍warning堆栈如下:

snippet.gdb
(gdb) bt
#0  check_string_escape_warning (ychar=92 '\\', yyscanner=0x1a90220) at src/backend/parser/scan.l:1622
#1  0x0000000000ac5982 in core_yylex (yylval_param=0x7ffd7de70830, yylloc_param=0x7ffd7de7082c, yyscanner=0x1a90220) at src/backend/parser/scan.l:746
#2  0x0000000000ac46b1 in base_yylex (lvalp=0x7ffd7de70830, llocp=0x7ffd7de7082c, yyscanner=0x1a90220) at src/backend/parser/parser.c:101
#3  0x0000000000a4f3f9 in base_yyparse (yyscanner=0x1a90220) at src/backend/parser/gram.c:31959
#4  0x0000000000ac460b in raw_parser (str=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/parser/parser.c:53
#5  0x0000000000ba6fd9 in pg_parse_query (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/tcop/postgres.c:701
#6  0x0000000000daf4eb in shouldBypassQuery (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/utils/resgroup/resgroup.c:3671
#7  0x0000000000dad3cb in AssignResGroupOnMaster () at src/backend/utils/resgroup/resgroup.c:2597
#8  0x0000000000641b64 in StartTransaction () at src/backend/access/transam/xact.c:2648
#9  0x0000000000642dd5 in StartTransactionCommand () at src/backend/access/transam/xact.c:3659
#10 0x0000000000baa000 in start_xact_command () at src/backend/tcop/postgres.c:2722
#11 0x0000000000ba75c9 in exec_simple_query (query_string=0x1a8f9f0 "select 'a\\\\bcd' as f1;") at src/backend/tcop/postgres.c:1111
#12 0x0000000000bad50c in PostgresMain (argc=1, argv=0x1ae66f0, dbname=0x1ae6568 "postgres", username=0x1ae6548 "yz") at src/backend/tcop/postgres.c:5165
#13 0x0000000000af3b15 in BackendRun (port=0x1ab9b60) at src/backend/postmaster/postmaster.c:5196
#14 0x0000000000af32cb in BackendStartup (port=0x1ab9b60) at src/backend/postmaster/postmaster.c:4887
#15 0x0000000000aeede6 in ServerLoop (receive_encryption_key=false) at src/backend/postmaster/postmaster.c:2060
#16 0x0000000000aee5b6 in PostmasterMain (argc=9, argv=0x1a8a350) at src/backend/postmaster/postmaster.c:1668
#17 0x00000000009238d0 in main (argc=9, argv=0x1a8a350) at src/backend/main/main.c:228

关键在于资源组管理时,SET/RESET/SHOW 不受资源组限制,以下函数目的是语法分析是否是这三类语句,其中完整地执行了一遍postgresql的语法解析函数:

snippet.cpp
/*
 * Parse the query and check if this query should
 * bypass the management of resource group.
 *
 * Currently, only SET/RESET/SHOW command can be bypassed
 */
static bool
shouldBypassQuery(const char *query_string)
{
  ……
	parsetree_list = pg_parse_query(query_string);

即,资源管理时,多执行了一遍语法解析。

6. 修改hosname

当GPDB集群安装完成后,如果修改Linux主机名称,GPDB启动将失败,这是因为GPDB记录了在系统表中记录了hostname,Linux修改主机名后,GPDB系统表的信息没有更新。

应该按照以下方法修改主机名称:

snippet.bash
 # 进入运维模式
 [gpadmin-master@bogon ~]$ gpstart -m -a
 
 # 运行更新系统表
 [gpadmin-master@bogon ~]$ PGOPTIONS='-c gp_session_role=utility' psql -d postgres
 postgres=# set allow_system_table_mods='dml';
 
 # 更新系统表
 postgres=# update gp_segment_configuration set hostname='new-hostname',address='new-hostname';

然后修改Linux主机名。



打赏作者以资鼓励: