1. Test Environment

| Item | Value |
| --- | --- |
| Greenplum Database version | 6.9.0 |
| Data volume | TPC-H 1000s |
| Operating system | Red Hat 7.6 |
| Data distribution | 3 nodes, 32 segments per node, no mirrors |
2. Issues
2.1 Installation failure 1
During gpinitsystem, deleting "gpsegcreate.sh_backout.*" fails:
- snippet.bash
20200530:16:22:25:2010621 gpinitsystem:test-4:gpadmin-[INFO]:-Deleting distributed backout files
/bin/rm: cannot remove ‘/tmp/gpsegcreate.sh_backout.2111154’: Operation not permitted
/bin/rm: cannot remove ‘/tmp/gpsegcreate.sh_backout.2111167’: Operation not permitted
The cause is leftover /tmp/gpsegcreate.sh_backout.* files owned by another user, which the current user has no permission to delete. Remove these files manually, then retry:
- snippet.bash
sudo rm -rf /tmp/gpsegcreate.sh_backout.*
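Before deleting with sudo, it can help to confirm which leftover files actually belong to another user. The following is a minimal sketch, not part of the original notes; the function name `list_foreign_backout_files` is hypothetical, and it assumes a `find` that supports `-maxdepth` (as on Red Hat).

```bash
#!/usr/bin/env bash
# List gpinitsystem backout files in a directory (default: /tmp)
# that are NOT owned by the current user, showing their owner.
list_foreign_backout_files() {
    local dir="${1:-/tmp}"
    # "! -user <uid>" keeps only files owned by someone else;
    # ls is not run at all when nothing matches.
    find "$dir" -maxdepth 1 -name 'gpsegcreate.sh_backout.*' \
        ! -user "$(id -u)" -exec ls -ld {} +
}

# Usage (run on every host in the cluster):
#   list_foreign_backout_files /tmp
```

If the function prints nothing, no foreign backout files remain on that host and the permission error should have a different cause.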
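2.2 Installation failure 2
gpinitsystem fails for no obvious reason. The installation log only reports that some segments failed to initialize, with no other errors: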
- snippet.bash
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:------------------------------------------------
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Parallel process exit status
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:------------------------------------------------
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Total processes marked as completed = 172
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Total processes marked as killed = 0
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[WARN]:-Total processes marked as failed = 20 <<<<<
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:------------------------------------------------
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[FATAL]:-Errors generated from parallel processes
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Dumped contents of status file to the log file
20200530:18:38:36:3113385 gpinitsystem:test-4:gpadmin-[INFO]:-Building composite backout file
20200530:18:38:37:3113385 gpinitsystem:test-4:gpadmin-[FATAL]:-Failures detected, see log file /home/gpadmin/gpAdminLogs/gpinitsystem_20200530.log for more detail
Script Exiting!
In this case, try rerunning gpinitsystem with the -m option. Note that -m (--max_connections) sets the maximum number of client connections allowed to the master, not the installation parallelism (which is controlled by -B); lowering it likely reduces the resources each segment instance has to reserve at startup. For example:
- snippet.bash
[gpadmin@test-4 ~]$ gpinitsystem -m 25 -c ~/gpinitsystem_config
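Since the console output only points at the log file, a quick filter for non-INFO entries can narrow down which step failed. This is a minimal sketch under the assumption that log lines use the `-[LEVEL]:-` format shown in the output above; the function name `show_init_failures` is hypothetical.

```bash
#!/usr/bin/env bash
# Print only WARN/ERROR/FATAL entries from a gpinitsystem log file.
show_init_failures() {
    local log="$1"
    # -e is required because the pattern itself starts with a dash.
    grep -E -e '-\[(WARN|ERROR|FATAL)\]:-' "$log"
}

# Usage, with the log path reported in the output above:
#   show_init_failures /home/gpadmin/gpAdminLogs/gpinitsystem_20200530.log
```

The filter only surfaces what gpinitsystem itself logged; details of an individual segment failure may still have to be read from the surrounding lines of the same log.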
2.3 Insufficient disk space
A query fails during execution with an out-of-disk-space error:
ERROR: could not write XXX bytes to temporary file: No space left on device
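During query execution, GP may spill the intermediate results of query operators to temporary files on disk. If the disk holding GP's temporary files runs out of space, the query fails. For the test environment above, keep about 200 GB of disk space free on each node.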
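The free-space recommendation can be checked before a test run. The sketch below is an illustration only: the function name `has_free_space_gb` and the directory `/data/primary` in the usage line are hypothetical, and the check must point at the filesystem that actually holds the segment temporary files.

```bash
#!/usr/bin/env bash
# Succeed (exit 0) if the filesystem containing DIR has at least NEED_GB free.
has_free_space_gb() {
    local dir="$1" need_gb="$2" avail_kb
    # df -P prints one line per filesystem; column 4 is available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Usage (hypothetical segment data directory, 200 GB as suggested above):
#   has_free_space_gb /data/primary 200 || echo "less than 200 GB free"
```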
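2.4 Tuning parameters
GP's default parameters are typical values chosen for roughly 10 concurrent queries. In a large-memory environment that runs one query at a time, as in this TPC-H test, the defaults are on the small side, so the following parameters are worth adjusting.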
statement_mem
The maximum amount of memory that a single query statement in a transaction can use on one segment.
max_statement_mem
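The maximum amount of memory that a single query statement in a transaction can use across all segments on one node. It is the upper limit on the memory one query may use and acts as a safeguard on statement_mem, which cannot be set higher than this value.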
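Because statement_mem is capped by max_statement_mem, the two are usually raised together. The sketch below only prints the commands instead of running them; the function name `print_statement_mem_cmds` and the values in the usage line are illustrative assumptions, not settings from the original test. It assumes `gpconfig -c ... -v ...` accepts values with an MB suffix and that both parameters take effect after a reload with `gpstop -u`; verify both against the documentation for your version.

```bash
#!/usr/bin/env bash
# Print (do not execute) the commands that would raise statement_mem
# and max_statement_mem. Both arguments are in MB.
print_statement_mem_cmds() {
    local stmt_mb="$1" max_mb="$2"
    if [ "$stmt_mb" -gt "$max_mb" ]; then
        echo "statement_mem (${stmt_mb}MB) must not exceed max_statement_mem (${max_mb}MB)" >&2
        return 1
    fi
    # Raise the cap first so the new statement_mem stays within it.
    echo "gpconfig -c max_statement_mem -v ${max_mb}MB"
    echo "gpconfig -c statement_mem -v ${stmt_mb}MB"
    # Reload the configuration without a full restart.
    echo "gpstop -u"
}

# Usage (illustrative values only):
#   print_statement_mem_cmds 2048 16384
```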
gp_vmem_protect_limit
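Some queries fail with the VM (virtual memory) error below. This parameter is the amount of VM that a single segment may use; set it according to the host's total memory and the number of segments on it.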
- snippet.bash
ERROR: Out of memory (seg3 slice4 10.10.0.4:6348 pid=3535218)
DETAIL: VM protect failed to allocate 8388616 bytes from system, VM Protect 7335 MB available
The official documentation describes this parameter as: "the amount of memory (in number of MBs) that all postgres processes of an active segment instance can consume."
Change it as follows, then restart the GP cluster:
- snippet.bash
gpconfig -c gp_vmem_protect_limit -v 8192
gpstop -r -a
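The value 8192 is specific to this environment. As a starting point for other hosts, the sketch below applies the sizing formula from the Greenplum memory configuration guidance as I recall it for hosts with less than 256 GB of RAM: gp_vmem = ((SWAP + RAM) - (7.5 GB + 0.05 * RAM)) / 1.7, divided by the number of primary segments that can be active on the host. Treat the formula and the function name `vmem_protect_limit_mb` as assumptions and check them against the documentation for your version.

```bash
#!/usr/bin/env bash
# Estimate gp_vmem_protect_limit (in MB) for one host.
# Arguments: RAM in GB, swap in GB, number of acting primary segments.
vmem_protect_limit_mb() {
    local ram_gb="$1" swap_gb="$2" primaries="$3"
    awk -v ram="$ram_gb" -v swap="$swap_gb" -v n="$primaries" 'BEGIN {
        # Memory available to Greenplum on the host, in GB (RAM < 256 GB case).
        gp_vmem_gb = ((swap + ram) - (7.5 + 0.05 * ram)) / 1.7
        # Per-segment limit in MB, truncated to an integer.
        printf "%d\n", gp_vmem_gb * 1024 / n
    }'
}

# Usage (illustrative host: 128 GB RAM, 32 GB swap, 32 primaries):
#   vmem_protect_limit_mb 128 32 32    # prints 2750
```

With 32 segments per node and no mirrors, as in this test environment, the divisor is 32; on clusters with mirrors it should also count the mirrors that may become active after a failover.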