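High-concurrency psql connections fail with "Resource temporarily unavailable"
Problem
During PostgreSQL concurrency testing, we found that when the number of concurrent psql clients is fairly large, errors like the following are reported: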
psql: error: could not connect to server: Resource temporarily unavailable
	Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5404"?
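A search of the PostgreSQL source code suggests that the "Resource temporarily unavailable" text is not an error message defined by PostgreSQL itself. It is the operating system's message for the EAGAIN error code, so presumably the operating system rejected the psql connection attempt.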
Reproduction
The test case is as follows:
First set PostgreSQL's max_connections parameter to 500 and restart the PostgreSQL service:
max_connections = 500
Write a test script (test.sh):
for ((i=1; i<300; i++)); do psql -p 5404 postgres -c "select pg_sleep(2);" & done
Running the script may produce errors like the following:
[yz@bogon ~]$ ./test.sh
psql: error: could not connect to server: Resource temporarily unavailable
	Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5404"?
psql:psql: error: could not connect to server: Resource temporarily unavailable
	Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5404"?
……
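To see how many clients actually fail, the reproduction can be wrapped so that the clients' stderr is collected and the failures are counted. The following is a minimal sketch, assuming the same port (5404) and database (postgres) as in the test above. The helper names run_clients and count_unavailable and the log path are hypothetical, not part of psql.

```shell
#!/bin/bash
# Count the log lines that contain the EAGAIN error text.
# This counts matching lines, so interleaved output from concurrent
# clients can make the result slightly approximate.
count_unavailable() {
    local log_file="$1"
    # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
    grep -c 'Resource temporarily unavailable' "$log_file" || true
}

# Start N concurrent psql clients and append their stderr to one log file.
run_clients() {
    local n="$1"
    local log_file="$2"
    local i=1
    : > "$log_file"            # truncate any previous log
    while [ "$i" -le "$n" ]; do
        psql -p 5404 postgres -c "select pg_sleep(2);" >/dev/null 2>>"$log_file" &
        i=$((i + 1))
    done
    wait                       # block until every background client has exited
}

# Usage (needs a running server on port 5404):
#   run_clients 299 /tmp/psql_errors.log
#   count_unavailable /tmp/psql_errors.log
```

Running it once before and once after a configuration change gives a rough comparison of the failure count.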
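Cause and solution
Investigation showed that the cause was a kernel net.core.somaxconn value that was too small (128 by default on this system).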
[yz@bogon ~]$ sysctl -a | grep somaxconn
net.core.somaxconn = 128
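The same check can be scripted before a test run, so that a backlog limit that is too small is flagged up front. This is a minimal sketch, assuming a Linux host where the value is readable at /proc/sys/net/core/somaxconn. The function name backlog_ok is hypothetical, and 299 is simply the number of clients the test loop starts. Requiring the limit to be at least the client count is a conservative rule of thumb, not an exact threshold, since the server drains the queue while clients are still connecting.

```shell
#!/bin/bash
# Succeeds (exit status 0) when the backlog limit is at least the planned
# number of simultaneous connection attempts.
backlog_ok() {
    local limit="$1"
    local clients="$2"
    [ "$limit" -ge "$clients" ]
}

# Read the live value; this is the same number `sysctl -n net.core.somaxconn` prints.
somaxconn_file=/proc/sys/net/core/somaxconn
if [ -r "$somaxconn_file" ]; then
    limit=$(cat "$somaxconn_file")
    if backlog_ok "$limit" 299; then
        echo "somaxconn=$limit: enough for 299 concurrent clients"
    else
        echo "somaxconn=$limit: smaller than 299 concurrent clients"
    fi
fi
```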
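Increase net.core.somaxconn and restart the PostgreSQL service. Note that the restart is mandatory: the kernel applies the limit when the server calls listen() at startup, so a socket that is already listening keeps its old backlog.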
[yz@bogon ~]$ sudo sysctl -w net.core.somaxconn=1024
net.core.somaxconn = 1024
[yz@bogon ~]$ sudo sysctl -a | grep somaxconn
net.core.somaxconn = 1024
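A value set with sysctl -w only changes the running kernel and is lost on reboot. One common way to persist it is a drop-in file under /etc/sysctl.d. The sketch below assumes such a directory exists on the host. The file name 99-somaxconn.conf and the helper write_somaxconn_conf are hypothetical choices.

```shell
#!/bin/bash
# Write a sysctl drop-in file so the setting survives a reboot.
# $1: value for net.core.somaxconn
# $2: target file (optional; the default name is a hypothetical choice)
write_somaxconn_conf() {
    local value="$1"
    local conf_file="${2:-/etc/sysctl.d/99-somaxconn.conf}"
    printf 'net.core.somaxconn = %s\n' "$value" > "$conf_file"
}

# Usage (as root), followed by a reload of all sysctl configuration files:
#   write_somaxconn_conf 1024
#   sysctl --system
```

PostgreSQL still has to be restarted afterwards for the new limit to apply to its listening socket.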
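The net.core.somaxconn kernel parameter controls the maximum length of a socket's listen queue. In the test above, all psql clients connect to the same socket and the concurrency is far higher than 128, so the queue is very likely to overflow. The following is quoted from the Huawei Cloud documentation: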
`net.core.somaxconn` indicates the maximum number of half-open connections that can be backlogged in a listening queue. The default value is 128. If the queue is overloaded, you need to increase the listening queue length.