Expected behaviour
We would like to use version 6.0.2, as we used 5.3.7 without connection problems.
Actual behaviour
After about 10 hours of operation, we start getting these kinds of errors when hundreds of writes are performed in a foreach loop on a single backend server:
Redis server went away
Redis::exec(): Send of 2504 bytes failed with errno=32 Broken pipe
Redis::hMset(): Send of 1431 bytes failed with errno=32 Broken pipe
...
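For context, errno 32 (EPIPE, "Broken pipe") is what the kernel returns when a process writes to a socket whose peer has already closed the connection. A minimal Python sketch (not phpredis-specific, just an illustration of the errno) reproduces it with a local socket pair:

```python
import errno
import socket

# A connected pair of Unix-domain sockets stands in for the
# client/server connection (no real Redis needed).
client, server = socket.socketpair()

server.close()  # the peer goes away, like a dropped Redis connection

try:
    client.send(b"HMSET ...")  # write to the dead connection
except BrokenPipeError as e:
    # Same errno the phpredis errors above report.
    print(e.errno == errno.EPIPE)  # True
finally:
    client.close()
```

The point being that the error is raised on the *writing* side, so whatever closed the connection (server timeout, proxy, idle reaper) happened earlier and silently.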
We run about 30k jobs per day with Laravel Horizon, so we got hundreds of thousands of exceptions with 6.0.2 in just a few days.
Found workaround
Restarting our web servers temporarily resolves the problem.
We rolled back on Scalingo PaaS to phpredis 5.3.7 and everything is fine.
Current investigations
There are no errors in the Redis server logs.
We have checked the number of connections from our backend to the Redis cluster and we stay under 100, including cluster connections, which is far below the default limit of 10k connections.
The Redis server metrics are OK: no memory leak, no CPU spike, ...
This is not a network problem in the datacenter, given that everything works fine with 5.3.7.
I'm seeing this behaviour on
Infrastructure: Scalingo PaaS
OS: Linux app-sirenergies-one-off-8185 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Steps to reproduce, backtrace or example script
I have no idea how to reproduce this behaviour because it's not systematic. For example, when we send SMS to our customers at 8am and reboot the servers an hour before, at 7am, we don't have any problems. On the other hand, when we only rebooted them the day before at 8pm, they crash at 8am the next day, even though it's the same code, the same volume, ...
phpinfo on 5.3.7
phpinfo on 6.0.2
The Redis sentinel version has changed from 0.1 to 1.0
redis.session.early_refresh is new on 6.0.2
The default value of redis.session.lock_retries has changed from 10 to 100
The default value of redis.session.lock_wait_time has changed from 2000 to 20000
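If those new session-lock defaults turn out to matter, they can be pinned back to the 5.3.7 values explicitly. A php.ini sketch, assuming the ini names from the phpinfo comparison above (the early_refresh value is an assumption, since that setting did not exist before 6.0.x):

```ini
; Restore the phpredis 5.3.7 session-lock defaults
; (values taken from the phpinfo diff above)
redis.session.lock_retries = 10
redis.session.lock_wait_time = 2000
; New in 6.0.2; assumed off by default
redis.session.early_refresh = 0
```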
I've checked
There is no similar issue from other users
Issue isn't fixed in develop branch
The broken pipe (errno 32) error often happens when sending huge payloads, such as massive MULTI/EXEC or pipeline blocks, which hit kernel limits.
However, your situation appears slightly different. The trick is going to be reproducing the behavior.
Laravel Horizon is a long-running job queue, right? Maybe we can simulate similar activity with runners that execute the same Redis commands at the same general volume? Perhaps PhpRedis 6.0.2 has a bug where we continue to try to use a socket even after it has failed.
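To make the hypothesis concrete: a well-behaved client should discard a socket after a failed send and reconnect before retrying, rather than keep writing to the dead descriptor. A minimal Python sketch of that pattern (this is only an illustration of the idea; phpredis internals may work quite differently):

```python
import errno
import socket


class ReconnectingConn:
    """Toy client that never reuses a socket after a failed send."""

    def __init__(self, connect):
        self._connect = connect  # factory returning a fresh connected socket
        self._sock = connect()

    def send(self, payload: bytes) -> None:
        try:
            self._sock.send(payload)
        except OSError as e:
            if e.errno != errno.EPIPE:
                raise
            # The suspected 6.0.2 behaviour would be to keep using the
            # broken socket here. Instead: drop it, reconnect, retry once.
            self._sock.close()
            self._sock = self._connect()
            self._sock.send(payload)


# Demo with local socket pairs standing in for Redis connections.
servers = []

def connect():
    c, s = socket.socketpair()
    servers.append(s)
    return c

conn = ReconnectingConn(connect)
servers[0].close()        # simulate the peer dropping the connection
conn.send(b"HMSET ...")   # first send hits EPIPE, client reconnects
print(servers[1].recv(9)) # the retried payload arrives on the new socket
```

If 6.0.2 skips the reconnect step in some code path that 5.3.7 handled, that would match the symptom of errors piling up until the workers are restarted.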
Another option would be to run one of your jobs under something like rr. If you replicate the problem that way it would almost certainly identify exactly what's going wrong. The debugging would need to be on your side though because it would record everything including all of the payloads to and from Redis.
Yep, Laravel Horizon is a long-running job queue tool, like Sidekiq for example. A dead socket being reused is a plausible cause, but how do we test it? Complicated, I think.
Using rr is a very good idea. We need to check this with the Scalingo team:
what kind of VM they use, because it seems to me that rr does not work on all VM hosts
how to get debug logs, given that we run on ephemeral servers and the local disk is not accessible