Tests hang from dotnet test #2080

Closed
JamesNK opened this issue Jun 30, 2019 · 43 comments

@JamesNK
Member

JamesNK commented Jun 30, 2019

Steps to reproduce

Tests run on Travis CI via dotnet test now fail intermittently. The failures started after updating to a newer .NET Core SDK. It appears that the test tool version increased from 16.0.1 to 16.1.1 with the new SDK.

Source code: https://github.com/grpc/grpc-dotnet/commits/master

Expected behavior

Tests run and exit

Actual behavior

Tests hang and the build is terminated

Diagnostic logs

Failure: https://travis-ci.org/grpc/grpc-dotnet/builds/551562232?utm_source=github_status&utm_medium=notification
Microsoft (R) Test Execution Command Line Tool Version 16.1.1

Success: https://travis-ci.org/grpc/grpc-dotnet/builds/551058627?utm_source=github_status&utm_medium=notification
Microsoft (R) Test Execution Command Line Tool Version 16.0.1

@mayankbansal018
Contributor

@JamesNK, if this is an intermittent issue, can you please enable diagnostic logs for the test platform and share them with us? See https://github.com/Microsoft/vstest-docs/blob/master/docs/diagnose.md

@JamesNK
Member Author

JamesNK commented Jul 2, 2019

I can't easily get logs off the build server, but I can repro the freeze on my dev machine with the loop below. It always freezes within 5 minutes:

while($true) { dotnet test --diag:log.txt }

testlogs.zip
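
For anyone trying the same loop from a POSIX shell, here is a minimal sketch of a bounded variant that stops at the first run exceeding a time limit. It assumes GNU coreutils `timeout` is available; `repeat_until_hang` is a hypothetical helper name, not part of dotnet or vstest, and the run count and time limit are arbitrary.

```shell
# repeat_until_hang is a hypothetical helper (not part of dotnet or vstest).
# Usage: repeat_until_hang MAX_RUNS TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to MAX_RUNS times and stops at the first run that
# exceeds TIMEOUT_SECONDS. Assumes GNU coreutils `timeout`, which exits
# with status 124 when the time limit is hit.
repeat_until_hang() {
  max_runs=$1
  limit=$2
  shift 2
  run=1
  while [ "$run" -le "$max_runs" ]; do
    status=0
    timeout "$limit" "$@" > /dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      echo "run $run timed out after ${limit}s"
      return 1
    fi
    run=$((run + 1))
  done
  echo "no hang in $max_runs runs"
}
```

A possible invocation would be `repeat_until_hang 50 600 dotnet test --diag:log.txt`. Console output is discarded here on the assumption that the diagnostic log is what gets inspected afterwards.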

@JamesNK
Member Author

JamesNK commented Jul 4, 2019

This is an ongoing problem. I'd like to make some progress on fixing it.

Do you need any more information from me?

@mayankbansal018
Contributor

@JamesNK I went through the logs and didn't see any hang state. The only interesting thing I observed was that we seem to start 10 different testhost processes in sequence, but you have shared logs for 30 testhost processes. Can you please share how many test DLLs you are running?

@JamesNK
Member Author

JamesNK commented Jul 6, 2019

Because I'm doing it in a loop: while($true) { dotnet test --diag:log.txt }

In the logs I attached, it freezes after 3 runs.

Have you tried reproing it?

@mayankbansal018
Contributor

I'm looking into it. I've cloned the repo, and all I need to do is run dotnet test on the sln, right?

@mayankbansal018
Contributor

I tried it locally multiple times, but it did not repro for me.

@tasadar2

We are currently experiencing the same issue. We have been using mcr.microsoft.com/dotnet/core/sdk:2.2.204 to avoid the performance degradation issue, which is now resolved. However, with the latest mcr.microsoft.com/dotnet/core/sdk image, currently b4c25c26dc73f498073fcdb4aefe167793eb3a8c79effa76df768006b5c345b8, only a couple of test runs finish while the rest seem to hang.
As with the performance issue, it seems to be related to non-interactive hosts.

Situation

  • Our solution contains 10 test projects, each one has <IsTestProject>true</IsTestProject> set, and we are running dotnet test against the solution in a CircleCI environment.
  • When each test is run individually, everything works as intended.
  • When run locally, I see only 4 tests running at any time, and the local system has only 4 CPU cores.

Replication
I condensed our project to share an example. Each test project has a single test that sleeps for 5sec.
Repo: https://github.com/tasadar2/vstest-issue-2080
CircleCI: https://circleci.com/gh/tasadar2/vstest-issue-2080/3

It doesn't always replicate the issue, but often does.

@JamesNK
Member Author

JamesNK commented Jul 13, 2019

Running test projects individually has fixed this problem for us - grpc/grpc-dotnet@152255e
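
The referenced commit is not reproduced here. As a rough sketch of the general idea, the following runs a command once per project file instead of once for the whole solution. `run_each_project` is a hypothetical helper, and discovering projects by the `*.csproj` name is an assumption about the repository layout.

```shell
# run_each_project is a hypothetical helper (not taken from the commit).
# Usage: run_each_project ROOT_DIR COMMAND [ARGS...]
# Finds every *.csproj under ROOT_DIR (an assumed naming convention) and
# runs COMMAND once per project, stopping at the first failure.
run_each_project() {
  root=$1
  shift
  find "$root" -name '*.csproj' | sort | while IFS= read -r project; do
    # stdin is redirected so the command cannot consume the project list
    # that the loop is still reading.
    "$@" "$project" < /dev/null || exit 1
  done
}
```

An assumed invocation would be `run_each_project test dotnet test`, where `test` is the directory holding the test projects.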

@singhsarab
Contributor

@tasadar2 @JamesNK sdk:2.2.301 has a fix. Can you please check whether you are still hitting the issue?

@tasadar2

Still appears to be an issue.
image: mcr.microsoft.com/dotnet/core/sdk@sha256:a50e175acd618c3e90bc91dceb5194e6c3764c5b4d179390cef874a887476ba9
example: https://circleci.com/gh/tasadar2/vstest-issue-2080/7

@JamesNK
Member Author

JamesNK commented Jul 30, 2019

I've narrowed down my hanging issue. It is caused by something in how vstest writes to the console. If no tests fail, vstest completes without any problems. If a test fails, vstest hangs until the CI build times out (this is on Travis CI).

The workaround I am using is to write the test output to a text file, and then write the text file to the console. If I do that, it never hangs.
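
The exact script used is not shown in the thread. A minimal sketch of this kind of workaround, assuming a POSIX shell, could look like the following; `run_captured` and the log file name are hypothetical.

```shell
# run_captured is a hypothetical helper (the actual script is not shown
# in the thread).
# Usage: run_captured LOG_FILE COMMAND [ARGS...]
# Sends the command's stdout and stderr to LOG_FILE, prints the file once
# the command has finished, and returns the command's exit status so a
# failing test run still fails the build.
run_captured() {
  log_file=$1
  shift
  status=0
  "$@" > "$log_file" 2>&1 || status=$?
  cat "$log_file"
  return "$status"
}
```

An assumed invocation would be `run_captured test-output.txt dotnet test`. The trade-off is that no output appears until the run has finished.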

@singhsarab
Contributor

@JamesNK What dotnet SDK version are you using? Can you share the link to the CI?

@JamesNK
Member Author

JamesNK commented Jul 30, 2019

@tasadar2

Looks like this is still an issue on the recent 3.0.
image: mcr.microsoft.com/dotnet/core/sdk@sha256:3afea8958440231a77b3daea267951cc8ba9026fc1015bcbccc206d6f1d031f7
example: https://app.circleci.com/jobs/github/tasadar2/vstest-issue-2080/10

@singhsarab
Contributor

@tasadar2 Can you please try the --logger:console;noprogress=true argument and check whether the issue still reproduces for you?

@tasadar2

That arg produces the same results, https://app.circleci.com/jobs/github/tasadar2/vstest-issue-2080/11

@tasadar2

Though quoting the argument value seems to work: --logger:"console;noprogress=true"
https://app.circleci.com/jobs/github/tasadar2/vstest-issue-2080/14
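
A likely explanation for the difference, assuming the CI step runs in a POSIX-style shell such as bash (the thread does not confirm this): an unquoted `;` ends the command, so the `noprogress=true` part never reaches the test runner. The sketch below shows the parsing difference with `show_args`, a hypothetical stand-in for `dotnet test` that only prints the arguments it receives.

```shell
# show_args is a hypothetical stand-in for `dotnet test`; it prints each
# argument it receives in square brackets.
show_args() {
  printf '[%s]' "$@"
  echo
}

# Unquoted: the shell ends the command at ';' and then runs
# `noprogress=true` as a separate variable assignment.
unquoted=$(show_args test --logger:console;noprogress=true)

# Quoted: the whole logger specification arrives as one argument.
quoted=$(show_args test --logger:"console;noprogress=true")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

In the unquoted case the command only sees `--logger:console`, which would match the observation that the unquoted argument produced the same results as before.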

@NicolasDorier

NicolasDorier commented Oct 9, 2019

I had the same problem. Two workarounds worked:

Adding --logger:"console;noprogress=true", like @tasadar2.

However, I did not like it because I could not see the progress (--logger:"console" has the same issue), so I added < /dev/null instead:

dotnet test .... < /dev/null

This allows me to see the progress of the tests without hanging.
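
The thread does not explain why the redirect helps. Mechanically, `< /dev/null` gives the process a standard input that is not a terminal and that reports end-of-file on the first read, so nothing can block waiting for input. A small sketch of that behavior, using a hypothetical `describe_stdin` helper:

```shell
# describe_stdin is a hypothetical helper that reports what kind of
# standard input the current command was given.
describe_stdin() {
  if [ -t 0 ]; then
    echo "stdin is a terminal"
  elif IFS= read -r _line; then
    echo "stdin has data"
  else
    echo "stdin is at end-of-file"
  fi
}

# With the redirect, stdin is not a terminal and the first read hits
# end-of-file immediately.
result=$(describe_stdin < /dev/null)
echo "$result"
```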

image

@NicolasDorier

NicolasDorier commented Oct 9, 2019

Full logs of a successful build with progress feedback: https://circleci.com/gh/dgarage/NBXplorer/430. Before the hack, you can see things stalling: https://circleci.com/gh/dgarage/NBXplorer/409

Happening on mcr.microsoft.com/dotnet/core/sdk:3.0.100
xUnit.net VSTest Adapter v2.4.1 (64-bit .NET Core 3.0.0)

Sebazzz added a commit to Sebazzz/Return that referenced this issue Oct 19, 2019
enricosada pushed a commit to enricosada/Paket that referenced this issue Nov 27, 2019
@cvpoienaru cvpoienaru modified the milestones: 16.9.0, 16.10 Mar 1, 2021
@dhilst

dhilst commented Mar 19, 2021

I had this problem with a small VM (2 cores, 2 GB RAM); increasing it to 8 cores and 8 GB RAM made it work. I read somewhere that this was thread-related, which is how I came up with the idea of adding more cores. The < /dev/null workaround didn't help in my case.

@diegosasw

diegosasw commented Mar 29, 2021

Not sure if it's related. I'm also a GitLab user and found a very similar problem, so I suspect it's probably related to the SDK image, but in my case it happens with Microsoft (R) Test Execution Command Line Tool Version 16.9.1 and the workaround didn't work for me.
https://forum.gitlab.com/t/dotnet-test-hangs-in-gitlab-gitlab-runner-13-9-0-rc2/50977

I have also checked whether it has anything to do with async calls in case there is a deadlock, but I seem to be awaiting properly everywhere.

@diegosasw

diegosasw commented Mar 30, 2021

Out of curiosity, is anybody experiencing this only when using IHostedService? In my case it seems that's what is causing the tests to freeze within a Linux image, and disabling parallelization avoids the deadlock.

@tremblaysimon

@diegosasw, for me it's happening when using dotnet test on a solution file even if there is no IHostedService usage.

@victor-fialkin-deltatre

victor-fialkin-deltatre commented Dec 9, 2021

This helped us in case of IHostedService:
https://www.strathweb.com/2021/05/the-curious-case-of-asp-net-core-integration-test-deadlock/

@Evangelink Evangelink removed this from the 16.10 milestone Aug 3, 2022
@Evangelink Evangelink added the needs-triage This item should be discussed in the next triage meeting. label Aug 3, 2022
@gdoron

gdoron commented Aug 16, 2022

@Evangelink @davidfowl Even on our very powerful MacBook Pros, when we run one of our integration test projects that has hundreds of tests, many of which use WebApplicationFactory, we often get into deadlocks.

We set xunit to use an unlimited number of threads (-1), and we have overridden a bunch of xunit stuff to implement a semaphore that limits the number of concurrent tests.
And yet, we often get into deadlocks after the ~600 tests completed mark.
After verifying we have no sync-over-async, we found this beauty in the ASP.NET Core code itself...
image

https://github.com/dotnet/aspnetcore/blob/main/src/Hosting/TestHost/src/TestServer.cs#L101

Could this be the root cause of all our deadlocks that occur only in unit tests?

@gdoron

gdoron commented Aug 16, 2022

Finally! I managed to reproduce it in a small test project, where I can run dotnet-dump and attach to the process without everything crashing.

I found this fella in one of the threads' callstack.
image

And going up the callstack we can see it's inside (or trying to get inside and the debugger is misleading) a lock block of NLog.
image

@davidfowl

Can you file this issue on dotnet/aspnetcore?

@gdoron

gdoron commented Aug 17, 2022

Your request is my command: dotnet/aspnetcore#43353

@JBoothUA

Just in case this helps someone: at some point, the line marked with --> below started randomly hanging our tests.

_task = Task.Factory.StartNew(async () =>
{
await Task.Delay((int)cleanInterval.TotalMilliseconds, _cancellationTokenSource.Token);
--> while (!_cancellationTokenSource.Token.IsCancellationRequested)

@Evangelink Evangelink removed the needs-triage This item should be discussed in the next triage meeting. label Nov 30, 2022
@Evangelink
Member

@cvpoienaru Please investigate this issue.

@Piedone

Piedone commented Dec 14, 2022

Did anything happen in the investigation by chance?

@diegosasw

I am experiencing the same when using TestServer.
It works well locally, but it hangs when running in a GitLab runner.
I've noticed it freezes when trying to dispose TestServer, because disposing IWebHost hangs.
No exceptions are thrown.

I've tried stopping all the IHostedService instances in case that's the problem, but I'm still unable to dispose TestServer to see whether that solves it.

May I ask how you are troubleshooting this? The only workaround I have is to set

#if DEBUG
[assembly: CollectionBehavior(CollectionBehavior.CollectionPerClass, DisableTestParallelization = false)]
#else
[assembly: CollectionBehavior(CollectionBehavior.CollectionPerClass, DisableTestParallelization = true)]
#endif

in the test assembly, so that it runs sequentially in CI/CD pipeline.

@nohwnd
Member

nohwnd commented Jan 23, 2023

@cvpoienaru Please investigate this issue.

@psoladoye

This issue is still unresolved.

@nohwnd nohwnd assigned nohwnd and unassigned cvpoienaru Jul 9, 2024
@nohwnd
Member

nohwnd commented Jul 11, 2024

This issue is a mix of different problems, some related to other products, but I did not find a clear repro. If someone is still experiencing this problem and has a simple repro, please file a new issue.

@nohwnd nohwnd closed this as completed Jul 11, 2024