iperf3 hangs with -R and -Z flags #129

Closed
bmah888 opened this issue Feb 28, 2014 · 14 comments

@bmah888
Contributor

bmah888 commented Feb 28, 2014

From [email protected] on December 20, 2013 14:51:23

When running the new test script (test_commands.sh), the iperf3 client hangs on 2 of the tests:

./src/iperf3 -c $host -P 2 -t 5 -R
and
./src/iperf3 -c $host -Z -t 5

And when you ^C the client, the server dies.

Original issue: http://code.google.com/p/iperf/issues/detail?id=129

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on December 20, 2013 15:20:49

This happened on OSX, but Linux seems OK.

Labels: Milestone-3.0-Release

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on December 22, 2013 07:09:16

This seems to reliably reproduce the problem on linux:

#!/bin/sh
set -x
while [ 1 ]
do
./src/iperf3 -P 2 -c localhost -t 5
./src/iperf3 -P 2 -c localhost -t 5 -R
done

It works for 3 to 6 iterations and then locks up (once, the server crashed).

Hopefully that will help track it down.

Owner: [email protected]
Labels: -Priority-Medium Priority-High

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on December 24, 2013 08:15:42

Running the server in gdb shows that the server is crashing on this line:

Program received signal SIGSEGV, Segmentation fault.
0x000000305784812c in vfprintf () from /lib64/libc.so.6

Which is called from here:

1808 iprintf(test, report_sum_bw_retrans_format, start_time, end_time, ubuf, nbuf, retransmits, irp->omitted?report_omitted:"");

Maybe Sasant's new patch will fix this?

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on December 24, 2013 09:26:55

I am also able to reproduce this. With the reverse (-R) option, the server crashes:

getsockopt(5, SOL_TCP, TCP_INFO, "\1\0\0\0\0\7w\0(\21\3\0@\234\0\0\270\377\0\0\30\2\0\0\0\0\0\0\0\0\0\0"..., [104]) = 0
getsockopt(7, SOL_TCP, TCP_INFO, "\1\0\0\0\0\7w\0(\21\3\0@\234\0\0\270\377\0\0\30\2\0\0\0\0\0\0\0\0\0\0"..., [104]) = 0
write(1, "- - - - - - - - - - - - - - - - "..., 50- - - - - - - - - - - - - - - - - - - - - - - - -
) = 50
write(1, "[  5]   8.02-9.00   sec   382 MB"..., 67[  5]   8.02-9.00   sec   382 MBytes  3.27 Gbits/sec    5         
) = 67
write(1, "[  7]   8.02-9.00   sec   381 MB"..., 67[  7]   8.02-9.00   sec   381 MBytes  3.26 Gbits/sec    0         
) = 67
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x5} ---
    killed by SIGSEGV (core dumped)    
Segmentation fault (core dumped)

(gdb) bt
#0 0x000000399144908f in vfprintf () from /lib64/libc.so.6
#1 0x000000000040542a in vprintf (__arg=0x7fffffffda08,
__fmt=0x4110e0 <report_sum_bw_retrans_format> "\340SUM] %6.2f-%-6.2f sec %ss %ss/sec", ' ' <repeats 14 times>, "%s\n") at /usr/include/bits/stdio.h:38
#2 iprintf (test=test@entry=0x617010, format=0x4110e0 <report_sum_bw_retrans_format> "\340SUM] %6.2f-%-6.2f sec %ss %ss/sec", ' ' <repeats 14 times>, "%s\n")
at iperf_api.c:2405
#3 0x000000000040618b in iperf_print_intermediate (test=test@entry=0x617010) at iperf_api.c:1808
#4 0x0000000000406468 in iperf_reporter_callback (test=0x617010) at iperf_api.c:2008
#5 0x000000000040c9ac in tmr_run (nowP=nowP@entry=0x7fffffffdd10) at timer.c:189
#6 0x0000000000409f43 in iperf_run_server (test=test@entry=0x617010) at iperf_server_api.c:586
#7 0x0000000000401e92 in run (test=0x617010) at main.c:116
#8 main (argc=, argv=0x7fffffffdf68) at main.c:91

(gdb) f 0
#0 0x000000399144908f in vfprintf () from /lib64/libc.so.6
(gdb) list
43 __STDIO_INLINE int
44 getchar (void)
45 {
46 return _IO_getc (stdin);
47 }
48
49
50 # ifdef __USE_MISC
51 /* Faster version when locking is not necessary. */
52 __STDIO_INLINE int

It looks like the stack is getting corrupted somewhere, which leads to the crash.
I need to dig further to find out what is really causing it.

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on December 24, 2013 11:12:44

I've been doing some digging into this. The hang and the crash might have two different causes, or might be two different manifestations of the same problem. Notes from a private email on this subject, where I was describing what I saw with FreeBSD 10.0 and -R. There's a hang but no crash.


A slightly lower-level symptom of this problem is that at the end of the
test, the client tries to send a TEST_END state change message to the
server over the control connection. When in -R mode, the server doesn't
seem to get it or read it reliably. However if I kill the client
(because it seems hung) the server immediately gets the TEST_END and
tries to do the end-of-test processing (it can't do this successfully
because at this point the client has died and closed its side of the
control connection).

In non -R mode this part all works as expected (I see the client send
the TEST_END and the server receives it immediately, as we would expect).

This is all on FreeBSD 10.0, client and server on the same machine (so
far it looks like the configuration where client and server are on the
same machine is particularly vulnerable to this problem).

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on January 03, 2014 10:09:27

Partial fix committed in c499d0008f7d. There was basically a deadlock between the client and server in -R mode; see the commit log for more details.

Not closing this yet...need to do some more tests to get a warm fuzzy feeling about the fix first. Also note that this doesn't address the server-side crashes that have been reported (but which I have not personally witnessed).

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on January 03, 2014 10:38:48

Fixed the -P and -R server-side crash reported via Comments 2, 3, and 4, in 423166a54849. This only affected Linux; it was a mangled printf format string that only got used on that platform (it would have been used on any other platform with retransmit statistics, but there aren't currently any).

It's clear to me now that there were multiple issues being reported in this one bug. :-p

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From AaronMatthewBrown on January 03, 2014 10:43:53

If gcc isn't spitting out warnings on format strings as const char variables, it'd probably make sense to turn the format strings into typedefs or something to ensure that gcc spits out a warning if this kind of mismatch happens.

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on January 03, 2014 11:04:17

Good point. I don't see any warning messages for the format string mismatch (on a working copy rolled back to before my fix), but gcc isn't compiling with any warnings enabled either, as far as I can tell:

gcc -DHAVE_CONFIG_H -I. -g -O2 -MT iperf_api.o -MD -MP -MF .deps/iperf_api.Tpo -c -o iperf_api.o iperf_api.c

I'm not sure why this is...I'm used to living under -Wall and -Werror. Yet another thing to investigate.

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on January 03, 2014 14:52:55

Update: Just one sub-issue remaining from this bug report...that's the hang with -Z. I've been able to observe this on Mac OS, as reported in the initial bug report. It doesn't happen every time, at least not on my MacBook; sometimes the -Z test works just fine.

So far I have not been able to reproduce this problem on my other two development platforms (FreeBSD 10 and CentOS 6).

It's not clear to me if there's something platform-specific lurking about or not, although the sendfile(2) call used by the -Z option is slightly different on the three platforms I've been using (therefore there are slightly different codepaths being used).

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From bltierney on January 04, 2014 07:21:54

In my tests, OSX hangs every time. Linux is now working fine.

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From [email protected] on January 21, 2014 13:08:21

Update: I'm still seeing this issue (but not consistently) on MacOS 10.8 and MacOS 10.9.

@bmah888 bmah888 added this to the 3.0 milestone Feb 28, 2014
bmah888 added a commit that referenced this issue May 1, 2014
@bmah888 bmah888 self-assigned this May 12, 2014
@bmah888 bmah888 removed this from the 3.0 milestone Jun 10, 2014
@bmah888
Contributor Author

bmah888 commented Jan 2, 2015

Somewhat prompted by issue #231, I retested this (MacOS, -Z flag TCP tests, mainline code) on MacOS 10.10.1. I did twelve 10-second tests and didn't see a single failure. I'm now running a bunch of 5-second tests in a tight loop and haven't seen anything yet. That doesn't mean the bug is gone, although it's doing much better than I remember ever seeing before.

@bmah888
Contributor Author

bmah888 commented Jan 5, 2015

By mutual agreement, @bltierney and I decided we should just close this bug, since it can't be reproduced (see previous comment).

@bmah888 bmah888 closed this as completed Jan 5, 2015