AsyncConnectionPool::CleanUpTimer segfaults #13

y3llowcake · 2021-06-02T20:23:09Z

We (Slack) have been seeing a very slow trickle of segfaults from code paths in the cleanup timer in our production environment. This issue is not new; it has been occurring for a while. I do not yet have a repro for these segfaults.

The squangle version we are running:

┌─(cy@zebu:~/sl/hhvm/third-party/squangle/src)
└─(29)% git log -n 1 --pretty=short  
commit 9b3d6adf34d4f1ec1c1713a54b9def947384b17b (HEAD)
Author: Jay Edgar <[email protected]>

    Update state_ inside mutex

We see two distinct stack traces. The first is more frequent and appears to crash inside a call to std::unordered_map::erase() made from ConnStorage::cleanupConnections():

0:string"raise at ../sysdeps/unix/sysv/linux/raise.c:51"
1:string"HPHP::bt_handler at /build/hhvm/hphp/runtime/base/crash-reporter.cpp:270"
2:string"std::_Hashtable<facebook::common::mysql_client::PoolKey, at /usr/include/c++/7/bits/hashtable.h:1627"
3:string"std::_Hashtable<facebook::common::mysql_client::PoolKey, at /usr/include/c++/7/bits/hashtable.h:1864"
4:string"std::_Hashtable<facebook::common::mysql_client::PoolKey, at /usr/include/c++/7/bits/hashtable.h:755"
5:string"std::unordered_map<facebook::common::mysql_client::PoolKey, at /usr/include/c++/7/bits/unordered_map.h:797"
6:string"facebook::common::mysql_client::AsyncConnectionPool::ConnStorage::cleanupConnections at /build/hhvm/third-party/squangle/squangle/mysql_client/AsyncConnectionPool.cpp:619"
7:string"facebook::common::mysql_client::AsyncConnectionPool::CleanUpTimer::timeoutExpired at /build/hhvm/third-party/squangle/squangle/mysql_client/AsyncConnectionPool.cpp:519"
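The code at AsyncConnectionPool.cpp:619 is not reproduced in this thread, so the cause of this first crash is not confirmed here. For reference, one well-known way to crash inside std::unordered_map::erase() is to keep using an iterator after erasing through it; another is to mutate the map from a second thread without holding the lock that guards it, which no iterator discipline can fix. The following is a minimal sketch of the safe erase-while-iterating idiom, assuming a simplified pool keyed by a string (PoolKey, IdleEntry, and cleanupIdle are hypothetical stand-ins, not squangle's types):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: squangle's real PoolKey and connection types are
// more complex; a string key and a timestamp are enough to show the idiom.
using Clock = std::chrono::steady_clock;
using PoolKey = std::string;

struct IdleEntry {
  Clock::time_point lastUsed;  // when the connection was last returned to the pool
};

// Removes every entry that has been idle for longer than `timeout` and
// returns how many were removed. Assigning the result of erase() back to
// `it` keeps the loop on a valid iterator; calling pool.erase(it) and then
// ++it would advance an invalidated iterator (undefined behavior).
std::size_t cleanupIdle(std::unordered_map<PoolKey, IdleEntry>& pool,
                        Clock::time_point now,
                        std::chrono::microseconds timeout) {
  std::size_t removed = 0;
  for (auto it = pool.begin(); it != pool.end();) {
    if (now - it->second.lastUsed > timeout) {
      it = pool.erase(it);  // erase() returns the iterator after the removed element
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```

This only covers single-threaded iterator misuse; if the map is shared across threads, every access still has to happen under the same mutex.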

The second one looks like use of an invalid map iterator in activeConnectionRemoved() ('ref_iter->second', where ref_iter probably points to end()?), reached while a pooled connection holder is destroyed during cleanup:

0:string"raise at ../sysdeps/unix/sysv/linux/raise.c:51"
1:string"HPHP::bt_handler at /build/hhvm/hphp/runtime/base/crash-reporter.cpp:270"
2:string"facebook::common::mysql_client::AsyncMysqlClient::activeConnectionRemoved at /build/hhvm/third-party/squangle/src/squangle/mysql_client/AsyncMysqlClient.h:357"
3:string"facebook::common::mysql_client::MysqlConnectionHolder::~MysqlConnectionHolder at /build/hhvm/third-party/squangle/squangle/mysql_client/Connection.cpp:63"
4:string"facebook::common::mysql_client::MysqlConnectionHolder::~MysqlConnectionHolder at /build/hhvm/third-party/squangle/squangle/mysql_client/Connection.cpp:64"
5:string"std::default_delete<facebook::common::mysql_client::MysqlPooledHolder>::operator() at /usr/include/c++/7/bits/unique_ptr.h:78"
6:string"std::unique_ptr<facebook::common::mysql_client::MysqlPooledHolder, at /usr/include/c++/7/bits/unique_ptr.h:263"
7:string"__gnu_cxx::new_allocator<std::_List_node<std::unique_ptr<facebook::common::mysql_client::MysqlPooledHolder, at /usr/include/c++/7/ext/new_allocator.h:140"
8:string"std::allocator_traits<std::allocator<std::_List_node<std::unique_ptr<facebook::common::mysql_client::MysqlPooledHolder, at /usr/include/c++/7/bits/alloc_traits.h:487"
9:string"std::__cxx11::list<std::unique_ptr<facebook::common::mysql_client::MysqlPooledHolder, at /usr/include/c++/7/bits/stl_list.h:1815"
10:string"facebook::common::mysql_client::AsyncConnectionPool::ConnStorage::cleanupConnections at /build/hhvm/third-party/squangle/squangle/mysql_client/AsyncConnectionPool.cpp:613"
11:string"facebook::common::mysql_client::AsyncConnectionPool::CleanUpTimer::timeoutExpired at /build/hhvm/third-party/squangle/squangle/mysql_client/AsyncConnectionPool.cpp:519"
12:string"folly::AsyncTimeout::libeventCallback at /build/hhvm/third-party/folly/src/folly/io/async/AsyncTimeout.cpp:171"
13:string"folly::EventBase::loopBody at /build/hhvm/third-party/folly/src/folly/io/async/EventBase.cpp:394"
14:string"folly::EventBase::loop at /build/hhvm/third-party/folly/src/folly/io/async/EventBase.cpp:312"
15:string"folly::EventBase::loopForever at /build/hhvm/third-party/folly/src/folly/io/async/EventBase.cpp:535"
16:string"facebook::common::mysql_client::AsyncMysqlClient::<lambda()>::operator() at /build/hhvm/third-party/squangle/squangle/mysql_client/AsyncMysqlClient.cpp:80"
17:string"std::__invoke_impl<void, at /usr/include/c++/7/bits/invoke.h:60"
18:string"std::__invoke<facebook::common::mysql_client::AsyncMysqlClient::init()::<lambda()> at /usr/include/c++/7/bits/invoke.h:95"
19:string"std::thread::_Invoker<std::tuple<facebook::common::mysql_client::AsyncMysqlClient::init()::<lambda()> at /usr/include/c++/7/thread:234"
20:string"std::thread::_Invoker<std::tuple<facebook::common::mysql_client::AsyncMysqlClient::init()::<lambda()> at /usr/include/c++/7/thread:243"
21:string"std::thread::_State_impl<std::thread::_Invoker<std::tuple<facebook::common::mysql_client::AsyncMysqlClient::init()::<lambda()> at /usr/include/c++/7/thread:186"
22:string"start_thread at pthread_create.c:463"
23:string"clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95"
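The suspected hazard in this second trace is dereferencing the result of a map lookup without first checking it against end(). The sketch below shows the guarded pattern on a hypothetical per-key counter of active connections (ActiveConnectionCounter is an illustrative name, not squangle's API); it returns false for an unknown key instead of touching end()->second. A missing key would still point to a bookkeeping mismatch elsewhere, such as a connection being reported as removed twice, so a guard like this only makes the symptom non-fatal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical per-key counter of active connections; the real
// AsyncMysqlClient bookkeeping is not reproduced here.
class ActiveConnectionCounter {
 public:
  void added(const std::string& key) { ++counts_[key]; }

  // Returns false instead of dereferencing end() when the key is unknown,
  // which is the kind of access suspected in the second stack trace.
  bool removed(const std::string& key) {
    auto it = counts_.find(key);
    if (it == counts_.end()) {
      return false;  // key never registered, or already fully removed
    }
    if (--it->second == 0) {
      counts_.erase(it);  // drop the entry once no connections remain
    }
    return true;
  }

  // Number of active connections currently recorded for `key`.
  std::size_t count(const std::string& key) const {
    auto it = counts_.find(key);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<std::string, std::size_t> counts_;
};
```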

Also noteworthy is our non-standard Hack pool configuration:

new AsyncMysqlConnectionPool(darray[
	'per_key_connection_limit' => 20,
	'idle_timeout_micros' => 4000000,
	'expiration_policy' => "IdleTime",
]);
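For context, 'idle_timeout_micros' => 4000000 is 4 seconds, and with the 'IdleTime' expiration policy the timeout is measured from a connection's last use rather than from when it was opened (this is our understanding of the option; the alternative policy is assumed here to be 'Age'). A short idle timeout likely means the cleanup timer expires and destroys pooled connections much more often than with longer settings, which exercises the code paths in both traces. The hypothetical model below (PooledConn, ExpirationPolicy, and isExpired are illustrative, not HHVM or squangle code) shows the difference between the two policies:

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical model of two expiration policies: IdleTime measures from the
// last time the connection was used, Age measures from when it was opened.
enum class ExpirationPolicy { IdleTime, Age };

struct PooledConn {
  Clock::time_point openedAt;
  Clock::time_point lastUsedAt;
};

// True when the connection has outlived `timeout` under the given policy.
bool isExpired(const PooledConn& conn, Clock::time_point now,
               ExpirationPolicy policy, std::chrono::microseconds timeout) {
  const auto start =
      policy == ExpirationPolicy::IdleTime ? conn.lastUsedAt : conn.openedAt;
  return now - start > timeout;
}

// 'idle_timeout_micros' => 4000000 corresponds to 4 seconds.
constexpr std::chrono::microseconds kIdleTimeout{4000000};
static_assert(kIdleTimeout == std::chrono::seconds(4), "4,000,000 us == 4 s");
```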


jupyung · 2021-06-03T21:19:47Z

Thanks for reporting the issue. Our team started to look at this issue. We will get back to you soon.

y3llowcake · 2021-09-10T16:51:10Z

Curious whether you have any intuition about what might be causing this. Based on other crashes, we think we may be victims of subtle memory-corruption bugs in HHVM, but given how consistently these same two stack traces recur, I am more inclined to think the bug is in squangle.

y3llowcake · 2022-06-01T19:26:19Z

I am curious if the following commit is related to this issue: 9737cfd

jupyung · 2022-06-01T19:38:24Z

Yes, the commit you mentioned was meant to fix a rare segfault in connection cleanup, which looks related to this issue.

fredemmott · 2022-09-13T19:01:37Z

Yep, cherry-picking that commit fixed this issue for Slack :)
