FAIRYFAR-INTERNAL
 
Pitfalls in Greenplum TPC-H Testing

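1. Test Environment

Item                          Value
----------------------------  ------------------------------------------
Greenplum Database version    6.9.0
Data volume                   TPC-H 1000s
Operating system              Red Hat 7.6
Data distribution             3 nodes, 32 segments per node, no mirrors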

2. Problems

2.1 Installation failure 1

During gpinitsystem, deleting the "gpsegcreate.sh_backout.*" files fails:

snippet.bash
20200530:16:22:25:2010621 gpinitsystem:test-4:gpadmin-[INFO]:-Deleting distributed backout files
/bin/rm: cannot remove ‘/tmp/gpsegcreate.sh_backout.2111154’: Operation not permitted
/bin/rm: cannot remove ‘/tmp/gpsegcreate.sh_backout.2111167’: Operation not permitted

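The cause is leftover /tmp/gpsegcreate.sh_backout.* files that belong to other users; the current user has no permission to delete them. Remove these files manually, then retry: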

snippet.bash
sudo rm -rf /tmp/gpsegcreate.sh_backout.*
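
Before removing anything with sudo, it can help to see which leftover files exist and who owns them. The following is a minimal sketch; the function name list_backout_files and its optional directory argument are hypothetical helpers, not part of Greenplum.

```shell
# List leftover gpinitsystem backout files together with their owners.
# The directory argument is an illustrative convenience; it defaults to /tmp.
list_backout_files() {
    local dir="${1:-/tmp}"
    # -maxdepth 1 keeps the search out of subdirectories;
    # ls -l prints one line per file, including the owning user.
    find "$dir" -maxdepth 1 -type f -name 'gpsegcreate.sh_backout.*' -exec ls -l {} +
}

# Usage: list_backout_files /tmp
```

If the listing shows files owned by a user other than gpadmin, that matches the "Operation not permitted" errors above.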

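2.2 Installation failure 2

The gpinitsystem stage fails for no obvious reason. The installation log only reports that some segments failed to initialize, with no other error: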

snippet.bash
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:------------------------------------------------
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Parallel process exit status
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:------------------------------------------------
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Total processes marked as completed           = 172
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Total processes marked as killed              = 0
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[WARN]:-Total processes marked as failed              = 20 <<<<<
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:------------------------------------------------
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[FATAL]:-Errors generated from parallel processes
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Dumped contents of status file to the log file
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Building composite backout file
20200530:18:38:37:3113385 gpinitsystem:test-4:gpadmin-[FATAL]:-Failures detected, see log file /home/gpadmin/gpAdminLogs/gpinitsystem_20200530.log for more detail Script Exiting!

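In this case, try rerunning gpinitsystem with the -m option. Note that in gpinitsystem, -m sets the maximum number of client connections allowed on the master, not the install-time parallelism (the number of segments created in parallel is set with -B). A lower connection limit reduces the resources each segment instance requests at startup, which may be what allows all 32 segments per node to come up. For example: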

snippet.bash
[gpadmin@test-4 ~]$ gpinitsystem -m 25 -c ~/gpinitsystem_config
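
To dig into the log that the FATAL message points at, the summary and problem lines can be extracted with standard tools. This is a sketch that assumes the summary lines shown above also appear in the log file; the function names are hypothetical.

```shell
# Print the number of failed parallel processes reported in a gpinitsystem log.
# Assumes the "Total processes marked as failed = N" line is present in the file.
failed_process_count() {
    local log_file="$1"
    sed -n 's/.*Total processes marked as failed *= *\([0-9][0-9]*\).*/\1/p' "$log_file" | tail -n 1
}

# Show only the WARN and FATAL lines of the log.
show_problem_lines() {
    local log_file="$1"
    grep -E '\[(WARN|FATAL)\]' "$log_file"
}

# Usage: failed_process_count /home/gpadmin/gpAdminLogs/gpinitsystem_20200530.log
```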

2.3 Insufficient disk space

A query fails during execution with an out-of-disk-space error:

ERROR: could not write XXX bytes to temporary file: No space left on device

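While executing a query, GP may temporarily spill intermediate operator results to disk. If the disk that holds GP's temporary files runs out of space, the query fails. For the test environment above, reserving 200 GB of free disk space per node is recommended.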
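
A quick pre-flight check of free space on each node can catch this before a long query run. The sketch below uses only POSIX df; the function name has_free_gb is hypothetical, and the path to check (wherever the segments write their temporary files on your hosts) is an assumption you need to fill in.

```shell
# Succeed if the filesystem holding PATH has at least MIN_GB gigabytes available.
has_free_gb() {
    local path="$1" min_gb="$2"
    local avail_kb
    # df -Pk prints a stable POSIX layout in 1 KB blocks; column 4 is "Available".
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# Usage (the data directory path is illustrative only):
# has_free_gb /data/primary 200 || echo "less than 200 GB free"
```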

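2.4 Tuning parameters

GP's default parameters are typical values sized for roughly 10 concurrent queries. On a large-memory machine running a single-stream analytical (OLAP) test such as TPC-H, these defaults are too small, so the related parameters should be raised.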

statement_mem

The maximum amount of memory that a single query statement in a transaction may use on one segment.

max_statement_mem

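The maximum amount of memory that a single query statement in a transaction may use across all segments of one node. It is the upper bound on the memory one query can use and acts as a safeguard for statement_mem.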
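
As a rough starting point for statement_mem, the Greenplum memory guidance suggests (gp_vmem_protect_limit * 0.9) / expected concurrent queries, where gp_vmem_protect_limit is the parameter described next. The sketch below assumes that formula; the function name is hypothetical, and the result should be checked against the documentation for your version.

```shell
# Estimate statement_mem in MB from gp_vmem_protect_limit (MB) and the
# expected number of concurrent queries, using integer arithmetic.
# Formula assumed from the Greenplum memory guidance: (limit * 0.9) / queries.
suggest_statement_mem_mb() {
    local vmem_limit_mb="$1" concurrent_queries="$2"
    echo $(( vmem_limit_mb * 9 / (10 * concurrent_queries) ))
}

# Usage: an 8192 MB limit shared by 40 concurrent queries.
# suggest_statement_mem_mb 8192 40
# Applying a value would then look like the following (value illustrative;
# max_statement_mem must be at least as large as statement_mem):
# gpconfig -c max_statement_mem -v 8192MB
# gpconfig -c statement_mem -v 7372MB
```

For a single-stream test, the concurrency argument is 1, which is why the default sized for about 10 concurrent queries leaves most memory unused.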

gp_vmem_protect_limit

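Some queries fail with the VM (virtual memory) error below. This parameter is the amount of VM that a single segment may use; it can be adjusted according to the host's total memory and the number of segments.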

snippet.bash
ERROR:  Out of memory  (seg3 slice4 10.10.0.4:6348 pid=3535218)
DETAIL:  VM protect failed to allocate 8388616 bytes from system, VM Protect 7335 MB available

The official documentation describes this parameter as: "the amount of memory (in number of MBs) that all postgres processes of an active segment instance can consume."

Change it as follows, then restart the GP cluster:

snippet.bash
gpconfig -c gp_vmem_protect_limit -v 8192
gpstop -r -a
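
To derive a value from total memory and segment count instead of guessing, the sizing formula from the Greenplum documentation can be scripted. The sketch below assumes that formula as commonly stated: gp_vmem = ((SWAP + RAM) - (7.5 GB + 0.05 * RAM)) / 1.7, with 1.17 as the divisor when RAM is 256 GB or more, then divided by the number of primary segments per host. The function name is hypothetical; verify the formula against the documentation for your version before applying the result.

```shell
# Suggest gp_vmem_protect_limit in MB for one host.
# Arguments: RAM in GB, swap in GB, number of primary segments on the host.
suggest_vmem_protect_limit_mb() {
    local ram_gb="$1" swap_gb="$2" primaries="$3"
    awk -v ram="$ram_gb" -v swap="$swap_gb" -v seg="$primaries" 'BEGIN {
        # Assumed divisor rule: 1.7 below 256 GB of RAM, 1.17 otherwise.
        divisor = (ram >= 256) ? 1.17 : 1.7
        gp_vmem_gb = ((swap + ram) - (7.5 + 0.05 * ram)) / divisor
        # Convert to MB, split across segments, round to the nearest MB.
        printf "%d\n", int(gp_vmem_gb * 1024 / seg + 0.5)
    }'
}

# Usage: 64 GB RAM, 64 GB swap, 8 primary segments on the host.
# suggest_vmem_protect_limit_mb 64 64 8
```

With 32 segments per node, as in the test environment above, the per-segment limit shrinks quickly, so the host's memory and swap sizes matter more than the default suggests.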
