openGauss查询超时不起作用
本文描述 openGauss 1.0.1 build 6942b898 版本中的一个bug。
验证
- snippet.sql
[yz@test openGauss]$ gsql tpch1s tpch1s=# SET statement_timeout=10; SET TIME: 2.079 ms tpch1s=# SHOW statement_timeout; statement_timeout ------------------- 10ms (1 ROW) TIME: 1.976 ms tpch1s=# \d ERROR: canceling statement due TO statement timeout tpch1s=# SELECT COUNT(*) FROM lineitem; COUNT --------- 6001215 (1 ROW) TIME: 6141.025 ms
可见,查询耗时6秒,仍然可以没有报超时。同样的用例在原生PostgreSQL上执行,查询超时会触发正常报错。
原因
查询执行时,对同一个定时器,先后多次设置了时间。导致后面的定时时间覆盖了前面的时间。
以下堆栈是同一个执行线程("gaussdb" )先后出现的3各堆栈。
(gdb) info threads Id Target Id Frame 1 Thread 0x7fbed0dd0480 (LWP 150411) "GaussMaster" 0x00007fbed1b45bed in poll () from /usr/lib64/libc.so.6 2 Thread 0x7fbed03ff700 (LWP 150541) "jemalloc_bg_thd" 0x00007fbed1e2b9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 3 Thread 0x7fbeab5ff700 (LWP 151475) "GaussMaster" 0x00007fbed1e2f381 in sigwait () from /usr/lib64/libpthread.so.0 …… 25 Thread 0x7fbe8d7f5700 (LWP 153766) "WlmArbiter" 0x00007fbed1b47953 in select () from /usr/lib64/libc.so.6 * 26 Thread 0x7fbe889f4700 (LWP 1243811) "gaussdb" gs_signal_settimer (interval=0x7fbe889d8760) at gs_signal.cpp:1017
首先,ReadCommand设置定时器(缺省600s),此为正常逻辑。
(gdb) bt #0 gs_signal_settimer (interval=0x7fbe889d88d0) at gs_signal.cpp:1017 #1 0x0000000001dedd33 in enable_session_sig_alarm (delayms=600000) at proc.cpp:2117 #2 0x0000000001710d30 in ReadCommand (inBuf=0x7fbe889d8a60) at postgres.cpp:655 #3 0x0000000001720b8f in PostgresMain (argc=1, argv=0x7fbe86ff32b0, dbname=0x7fbe86e9a590 "tpch1s", username=0x7fbe86e9a548 "yz") at postgres.cpp:7869 #4 0x000000000169b39f in BackendRun (port=0x7fbe889d8f30) at postmaster.cpp:7004 #5 0x00000000016a6c31 in GaussDbThreadMain<(knl_thread_role)1> (arg=0x7fbeb01021e0) at postmaster.cpp:10366 #6 0x00000000016a3d9a in InternalThreadFunc (args=0x7fbeb01021e0) at postmaster.cpp:10730 #7 0x0000000001fc6532 in ThreadStarterFunc (arg=0x7fbeb01021d0) at gs_thread.cpp:381 #8 0x00007fbed1e27e65 in start_thread () from /usr/lib64/libpthread.so.0 #9 0x00007fbed1b5088d in clone () from /usr/lib64/libc.so.6
然后,statement超时定时(上例为10ms),这个是我们期望的查询超时时间:
(gdb) bt #0 gs_signal_settimer (interval=0x7fbe889d8760) at gs_signal.cpp:1017 #1 0x0000000001ded71d in enable_sig_alarm (delayms=2, is_statement_timeout=true) at proc.cpp:2042 #2 0x000000000171a39f in start_xact_command () at postgres.cpp:4974 #3 0x0000000001713d32 in exec_simple_query (query_string=0x7fbe872b7060 "select count(*) from lineitem;", messageType=QUERY_MESSAGE, msg=0x7fbe889d8a60) at postgres.cpp:1966 #4 0x0000000001720f20 in PostgresMain (argc=1, argv=0x7fbe86ff32b0, dbname=0x7fbe86e9a590 "tpch1s", username=0x7fbe86e9a548 "yz") at postgres.cpp:8049 #5 0x000000000169b39f in BackendRun (port=0x7fbe889d8f30) at postmaster.cpp:7004 #6 0x00000000016a6c31 in GaussDbThreadMain<(knl_thread_role)1> (arg=0x7fbeb01021e0) at postmaster.cpp:10366 #7 0x00000000016a3d9a in InternalThreadFunc (args=0x7fbeb01021e0) at postmaster.cpp:10730 #8 0x0000000001fc6532 in ThreadStarterFunc (arg=0x7fbeb01021d0) at gs_thread.cpp:381 #9 0x00007fbed1e27e65 in start_thread () from /usr/lib64/libpthread.so.0 #10 0x00007fbed1b5088d in clone () from /usr/lib64/libc.so.6
但是,之后WLM又设置定时器(缺省30s),把之前的statement超时时间覆盖了:
(gdb) bt #0 gs_signal_settimer (interval=0x7fbe889d8510) at gs_signal.cpp:1017 #1 0x00000000012ea7ec in WLMSetTimer (start=669806818927860, stop=669806848927860) at statctl.cpp:3821 #2 0x00000000012f1119 in WLMSetDNodeInfoStatus (qid=0x7fbe86f28230, status=WLM_STATUS_RUNNING) at statctl.cpp:5531 #3 0x00000000012f2bc2 in WLMSetCollectInfoStatus (status=WLM_STATUS_RUNNING) at statctl.cpp:6041 #4 0x000000000132326e in dywlm_client_manager (queryDesc=0x7fbe87307060, isQueryDesc=true) at dywlm_client.cpp:2612 #5 0x000000000172a9e2 in PortalRun (portal=0x7fbe8730f060, count=9223372036854775807, is_top_level=true, dest=0x7fbe872b8a58, altdest=0x7fbe872b8a58, completion_tag=0x7fbe889d88d0 "") at pquery.cpp:1051 #6 0x00000000017154b0 in exec_simple_query (query_string=0x7fbe872b7060 "select count(*) from lineitem;", messageType=QUERY_MESSAGE, msg=0x7fbe889d8a60) at postgres.cpp:2447 #7 0x0000000001720f20 in PostgresMain (argc=1, argv=0x7fbe86ff32b0, dbname=0x7fbe86e9a590 "tpch1s", username=0x7fbe86e9a548 "yz") at postgres.cpp:8049 #8 0x000000000169b39f in BackendRun (port=0x7fbe889d8f30) at postmaster.cpp:7004 #9 0x00000000016a6c31 in GaussDbThreadMain<(knl_thread_role)1> (arg=0x7fbeb01021e0) at postmaster.cpp:10366 #10 0x00000000016a3d9a in InternalThreadFunc (args=0x7fbeb01021e0) at postmaster.cpp:10730 #11 0x0000000001fc6532 in ThreadStarterFunc (arg=0x7fbeb01021d0) at gs_thread.cpp:381 #12 0x00007fbed1e27e65 in start_thread () from /usr/lib64/libpthread.so.0 #13 0x00007fbed1b5088d in clone () from /usr/lib64/libc.so.6
这还可以解释另一种情况。当statement_timeout设置的足够小,例如1ms,那么在定时器被重新设置之前,定时器已经1ms超时,那么会正常触发查询超时报错。即,
- snippet.sql
tpch1s=# SET statement_timeout=1; SET TIME: 2.023 ms tpch1s=# SELECT COUNT(*) FROM lineitem; ERROR: canceling statement due TO statement timeout
打赏作者以资鼓励:
![]() | ![]() |