Tests fail on 1-core VM #37276

Closed
bmwiedemann opened this issue Aug 21, 2023 · 9 comments · Fixed by #37327

@bmwiedemann

Describe the bug, including details regarding any error messages, version, and platform.

While working on reproducible builds for openSUSE, I found that our apache-arrow-12.0.1 package failed tests when run on a 1-core VM.

Here is an extract from the log:

 /home/abuild/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/src/arrow/csv/reader_test.cc:190: Failure
 Expected equality of these values:
   -1
   row.number
     Which is: 961
 /home/abuild/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/src/arrow/csv/reader_test.cc:190: Failure
 Expected equality of these values:
   -1
   row.number
     Which is: 981
 /home/abuild/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/src/arrow/csv/reader_test.cc:190: Failure
 Expected equality of these values:
   -1
   row.number
     Which is: 1001
 [  FAILED  ] StreamingReaderTests.InvalidRowsSkippedAsync (2 ms)
 [ RUN      ] StreamingReaderTests.BytesRead
 [       OK ] StreamingReaderTests.BytesRead (0 ms)

 [==========] 253 tests from 44 test suites ran. (978 ms total)
 [  PASSED  ] 252 tests.  
 [  FAILED  ] 1 test, listed below:
 [  FAILED  ] StreamingReaderTests.InvalidRowsSkippedAsync

  1 FAILED TEST
 ~/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/build/src/arrow/csv

       Start 39: arrow-acero-plan-test
 39/78 Test #39: arrow-acero-plan-test ....................***Failed    1.78 sec
 Running arrow-acero-plan-test, redirecting output into /home/abuild/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/build/build/test-logs/arrow-acero-plan-test.txt (attempt 1/1)
 Running main() from /home/abuild/rpmbuild/BUILD/googletest-release-1.12.1/googletest/src/gtest_main.cc



 [ RUN      ] ExecPlanExecution.SegmentedAggregationWithMultiThreading
 /home/abuild/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/src/arrow/acero/plan_test.cc:1615: Failure
 Value of: _st.IsNotImplemented()
   Actual: false
 Expected: true
 Expected "DeclarationToExecBatches(std::move(plan))" to fail with NotImplemented, but got Invalid: Group-by aggregation with field "i32" as both key and segment key
 /home/abuild/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/src/arrow/acero/plan_test.cc:1615: Failure
 Value of: _st.ToString() 
 Expected: has substring "multi-threaded"
   Actual: "Invalid: Group-by aggregation with field "i32" as both key and segment key"
 [  FAILED  ] ExecPlanExecution.SegmentedAggregationWithMultiThreading (0 ms)


 WARNING: Logging before InitGoogleLogging() is written to STDERR
 W20390831 20:38:32.254076 25012 source_node.cc:76] An input buffer was poorly aligned.  This could lead to crashes or poor performance on some hardware.  Please ensure that all Acero sources generate aligned buffers, or change the unaligned buffer handling configuration to silence this warning.
 [       OK ] ExecPlanExecution.UnalignedInput (0 ms)
 [----------] 50 tests from ExecPlanExecution (1584 ms total)

 [----------] Global test environment tear-down
 [==========] 60 tests from 4 test suites ran. (1716 ms total)
 [  PASSED  ] 59 tests.   
 [  FAILED  ] 1 test, listed below:
 [  FAILED  ] ExecPlanExecution.SegmentedAggregationWithMultiThreading

  1 FAILED TEST
 ~/rpmbuild/BUILD/arrow-apache-arrow-12.0.1/cpp/build/src/arrow/acero

Steps to reproduce on Debian or openSUSE:

osc co openSUSE:Factory/apache-arrow && cd $_
osc build --vm-type=kvm --noservice -j1 standard

Component(s)

C++, Other

@pitrou
Member

pitrou commented Aug 22, 2023

cc @westonpace

@pitrou
Member

pitrou commented Aug 22, 2023

It seems OMP_THREAD_LIMIT=1 taskset -c 0 <test command> should be enough to reproduce locally without relying on the openSUSE build infrastructure.

@pitrou
Member

pitrou commented Aug 22, 2023

Also @bmwiedemann could you give a link to the full test logs?

@bmwiedemann
Author

full build+test log

@pitrou
Copy link
Member

pitrou commented Aug 23, 2023

Ok, so these are the same errors that can be reproduced locally as described above.

@js8544
Collaborator

js8544 commented Aug 23, 2023

Both tests assume a multi-threaded environment without checking the actual threading capacity. Given that they intend to check the behavior of multi-threaded execution, I think they should be skipped if there is only one thread available:

if (internal::GetCpuThreadPool()->GetCapacity() < 2) {
  GTEST_SKIP() << "Test requires at least 2 threads";
}

The above patch works on my machine. Should I submit a PR for this? @pitrou
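To make the failure mode concrete, here is a toy model in C++ of the `SegmentedAggregationWithMultiThreading` case. The log shows the test expected a NotImplemented error mentioning "multi-threaded", but on the 1-core VM it got an Invalid error about the key instead. The sketch assumes the multi-threading check runs before key validation; `ShouldSkipMultiThreadedTest` and `ValidateSegmentedAggregation` are hypothetical names, and this is not Arrow's code.

```cpp
#include <cassert>
#include <string>

// Mirrors the proposed guard: tests of multi-threaded behavior need >= 2 threads.
bool ShouldSkipMultiThreadedTest(int thread_capacity) {
  return thread_capacity < 2;
}

// Hypothetical toy validator (not Arrow's code). It assumes a multi-threaded
// plan is rejected before the keys are checked, which would explain why the
// test sees NotImplemented on many cores but Invalid on one.
std::string ValidateSegmentedAggregation(int thread_capacity,
                                         bool key_is_also_segment_key) {
  assert(thread_capacity >= 1);
  if (thread_capacity > 1) {
    return "NotImplemented: segmented aggregation is not supported multi-threaded";
  }
  // Only reached when the pool has a single thread.
  if (key_is_also_segment_key) {
    return "Invalid: field used as both key and segment key";
  }
  return "OK";
}
```

With a capacity of 8 the first branch fires, as the test expects; with a capacity of 1 the second branch is reached, which is the mismatch seen in the log. The proposed guard simply skips the test in that second case.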

@bmwiedemann
Author

In principle you can run multiple threads on a single CPU through the kernel scheduling one part after another. But that might already be covered in a different test, then skipping should be fine.
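As a quick illustration of that point, the following standard-C++ sketch (not Arrow code; `CountOnThreads` is a hypothetical helper) starts several threads that the kernel time-slices, so it completes correctly even when the process is pinned to one CPU, for example with taskset -c 0. Whether Arrow's tests exercise their multi-threaded paths depends on the configured thread pool capacity, not on whether the OS can run several threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts `num_threads` std::threads, each adding `increments` to a shared
// atomic counter. On a single CPU the kernel interleaves the threads, so the
// total is still num_threads * increments: concurrency without parallelism.
long CountOnThreads(int num_threads, int increments) {
  assert(num_threads >= 0);
  std::atomic<long> counter{0};
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, increments] {
      for (int i = 0; i < increments; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  // join() synchronizes with each thread, so all increments are visible here.
  for (auto& w : workers) w.join();
  return counter.load();
}
```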

@js8544
Collaborator

js8544 commented Aug 23, 2023

> In principle you can run multiple threads on a single CPU through the kernel scheduling one part after another. But that might already be covered in a different test, then skipping should be fine.

True. These two tests use the default capacity, which is inferred with this logic. We can also force them to use a custom threadpool with multiple threads, which will need some refactoring. I'm not sure if it's worth it though.
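For the gist of such inference, here is a simplified standalone sketch. It assumes the default capacity is taken from OMP_NUM_THREADS if set, otherwise from std::thread::hardware_concurrency(), and is then capped by OMP_THREAD_LIMIT. That is consistent with the OMP_THREAD_LIMIT=1 reproduction recipe earlier in the thread, but the function names and the fallback value of 4 are assumptions, and this is not Arrow's actual source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <thread>

// Hypothetical helper: reads the leading positive integer of an OpenMP-style
// value such as "4" or "4,2"; returns 0 if the value is unset or invalid.
int ParseThreadEnv(const char* value) {
  if (value == nullptr) return 0;
  char* end = nullptr;
  long n = std::strtol(value, &end, 10);
  if (end == value || n <= 0) return 0;
  return static_cast<int>(std::min(n, 1L << 16));  // clamp absurd values
}

// Simplified model (assumption, not Arrow's code): OMP_NUM_THREADS wins if
// set; otherwise the hardware thread count (or an arbitrary 4 if unknown);
// OMP_THREAD_LIMIT then caps the result.
int InferCapacity(const char* omp_num_threads, const char* omp_thread_limit,
                  unsigned hardware_threads) {
  int capacity = ParseThreadEnv(omp_num_threads);
  if (capacity == 0) {
    capacity = hardware_threads > 0 ? static_cast<int>(hardware_threads) : 4;
  }
  const int limit = ParseThreadEnv(omp_thread_limit);
  if (limit > 0) capacity = std::min(capacity, limit);
  assert(capacity >= 1);
  return capacity;
}

// Applies the model to the real environment and hardware count.
int DefaultCapacityFromEnvironment() {
  return InferCapacity(std::getenv("OMP_NUM_THREADS"),
                       std::getenv("OMP_THREAD_LIMIT"),
                       std::thread::hardware_concurrency());
}
```

Under this model, OMP_THREAD_LIMIT=1 yields a capacity of 1 on any machine, which is exactly the case the capacity < 2 guard skips.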

@pitrou
Member

pitrou commented Aug 23, 2023

> We can also force them to use a custom threadpool with multiple threads, which will need some refactoring. I'm not sure if it's worth it though.

Probably not. It's interesting that no public CI platform seems to provide single-CPU VMs, then.
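For reference, the custom-threadpool alternative would look roughly like the minimal fixed-capacity pool below, sketched in plain C++11. `FixedThreadPool` is hypothetical and is not Arrow's ThreadPool. The point is only that a test-owned pool created with an explicit capacity (say 4) keeps multi-threaded code paths reachable on a 1-core VM, at the cost of passing that pool through the code under test, which is the refactoring being weighed here.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool whose capacity is chosen by the caller instead of being
// inferred from the machine or the environment.
class FixedThreadPool {
 public:
  explicit FixedThreadPool(int capacity) : capacity_(capacity) {
    assert(capacity > 0);
    for (int i = 0; i < capacity_; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }
  // Drains queued tasks, then stops and joins all workers.
  ~FixedThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }
  FixedThreadPool(const FixedThreadPool&) = delete;
  FixedThreadPool& operator=(const FixedThreadPool&) = delete;

  int GetCapacity() const { return capacity_; }

  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (stopping_ && tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock
    }
  }

  const int capacity_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;  // declared last: started after the rest
};
```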

pitrou pushed a commit that referenced this issue Aug 23, 2023
### Rationale for this change

Some tests assume a multi-threaded environment without checking the actual threading capacity. Given that they intend to check the behavior of multi-threaded execution, they should be skipped if there is only one thread available.

### What changes are included in this PR?

`SegmentedAggregationWithMultiThreading` and `InvalidRowsSkippedAsync` are skipped if there is only one thread available in the default ThreadPool.

### Are these changes tested?

Tested on my local machine. Unfortunately there is no single-core CI pipeline available.

### Are there any user-facing changes?
 No.

* Closes: #37276

Authored-by: Jin Shang <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
pitrou added this to the 14.0.0 milestone Aug 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023

3 participants