bandwidth limit overshoot after micro outages #1747
Comments
Here's a more degenerate example: a 5 Mb/s iperf throttle over a network capable of 20 Mb/s, with a 16-second outage in the middle of the test. After the outage, iperf saturates the network for another 10 seconds.
This seems to be a real issue when cellular/RF networks are used (e.g. a car going into a tunnel for several seconds). I tried to think about what a suitable algorithm might be and came up with the following options:
The question is which of the above (or other) options is best for this use case.
I think I'm overthinking this, but also consider the following:
I had a quick look at common algorithms. Most of these are designed for the two control systems in the other order: the traffic generator comes first and flows into some type of network throttle, which manages a queue and controls the output of that queue (e.g. token bucket, leaky bucket). The algorithm for the exit point of the queue is the part we're interested in, and it seems to boil down to some controlled release over a time quantum, which is what the options above describe anyway. That was my complex and slow way of getting there.

Another thought: what about quantisation? Do we slam the link at 100% until our 1-second amount has been sent and then stop dead until the next second, or do we want something more fine-grained with smoother pacing, or doesn't it matter?

A funny observation while reading up on shaping algorithms: there's a handy tool available called iperf to generate traffic to test your algorithm :)
"quantisation" is already how iperf3 works, using the
It is a good idea to look at these algorithms; I hadn't done so before. Having read about the shaping algorithms, they seem too complex for iperf3. In addition, your funny observation (which I agree is funny) leads me to believe that iperf3 does not actually need such complex algorithms, since it is the tool used to load the network when testing them. What I think may be good enough, and easy enough to implement in iperf3, is:
Option 2 sounds OK. It's pretty simple and similar to what's already there, but with a limited view into history rather than looking back to the start. This means you could still get small bursts, but at least they would be limited to a fraction of a second. Option 2 with a smaller time quantum becomes an implementation of Option 1. If increments of the pacing timer were used instead of the reporting timer, the user would have full control, but it would also be more confusing for a user to think about and calculate. Where the average is calculated over multiple intervals, it becomes a moving average, which will have a smoothing effect.

A broader question: is the current -b behaviour something users expect and actively exploit as a feature, is it considered a bug, or are historical behaviours left as they are to minimise change?
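To make the "limited view into history" idea concrete, here is a minimal sketch of a throttle that only looks at the most recent window instead of the whole test. It is an illustration, not iperf3 code: `WindowedThrottle`, `run_windowed`, the 1-second window and the 10 ms tick are all hypothetical names and values chosen for the example.

```python
from collections import deque


class WindowedThrottle:
    """Allow sending only while the last window_ms stays within the target rate."""

    def __init__(self, target_bps, window_ms):
        # Bit budget for any window of window_ms milliseconds.
        self.budget_bits = target_bps * window_ms // 1000
        self.window_ms = window_ms
        self.history = deque()  # (timestamp_ms, bits) for sends still in the window
        self.bits_in_window = 0

    def may_send(self, now_ms, bits):
        # Forget sends that have dropped out of the window (now - window, now].
        while self.history and self.history[0][0] <= now_ms - self.window_ms:
            self.bits_in_window -= self.history.popleft()[1]
        return self.bits_in_window + bits <= self.budget_bits

    def record(self, now_ms, bits):
        self.history.append((now_ms, bits))
        self.bits_in_window += bits


def run_windowed(target_bps, link_bps, duration_s, outage_s, window_ms=1000, tick_ms=10):
    """Return the bits sent in each second under the windowed throttle."""
    throttle = WindowedThrottle(target_bps, window_ms)
    chunk = link_bps * tick_ms // 1000  # bits the link can carry in one tick
    per_second = [0] * duration_s
    for now_ms in range(0, duration_s * 1000, tick_ms):
        second = now_ms // 1000
        network_up = not (outage_s[0] <= second < outage_s[1])
        if network_up and throttle.may_send(now_ms, chunk):
            throttle.record(now_ms, chunk)
            per_second[second] += chunk
    return per_second


# Hypothetical scenario: 5 Mb/s target, 20 Mb/s link, outage from t=10 s to t=26 s.
for second, bits in enumerate(run_windowed(5_000_000, 20_000_000, 40, (10, 26))):
    print(f"{second:2d}s  {bits / 1e6:4.1f} Mb")
```

In this model the sender still bursts at link speed at the start of each window, but the burst is bounded by one window's budget, and traffic lost during the outage is never made up afterwards. A shorter window gives smoother pacing at the cost of more bookkeeping.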
A more disruptive thought: the burst function already looks like a token bucket algorithm, but in the code it appears (at a quick glance) to be implemented separately from the throttle. In theory they could be unified into one simple algorithm that does both: allocate tokens at a constant rate based on the target throughput rate, and collect up to "burst" tokens in reserve. Less code, less logic, unified concept, same interface. Disclaimer: I'm talking more than reading code.
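As a rough illustration of that unified idea, the sketch below implements a plain token bucket: tokens accrue at the target rate and the reserve is capped at the burst size. It is not taken from the iperf3 source; the class name, the units (bits and seconds) and the choice to start with a full reserve are assumptions made for the example.

```python
class TokenBucket:
    """Token bucket: tokens accrue at rate_bps, capped at burst_bits."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # assumption: start with a full reserve
        self.last_s = 0.0

    def _refill(self, now_s):
        # Add tokens for the time that has passed, never exceeding the cap.
        elapsed = max(0.0, now_s - self.last_s)
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last_s = max(self.last_s, now_s)

    def try_send(self, now_s, bits):
        """Consume tokens and return True if `bits` may be sent now."""
        self._refill(now_s)
        if bits <= self.tokens:
            self.tokens -= bits
            return True
        return False


# Hypothetical values: 5 Mb/s target with a 1 Mb burst reserve.
bucket = TokenBucket(rate_bps=5_000_000, burst_bits=1_000_000)
bucket.try_send(0.0, 1_000_000)          # drains the initial reserve
print(bucket.try_send(16.0, 1_000_000))  # after 16 s idle, only one burst is available
print(bucket.try_send(16.0, 1))          # the reserve is empty again
```

Because the reserve is capped, a long outage cannot build up more than `burst_bits` of credit, so the post-outage burst is bounded regardless of how long the network was down.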
This situation is somewhat unusual in the environments for which iperf3 was originally intended (high-speed R&E networks, which tend to have both high bandwidth and high reliability). It's definitely counterintuitive. Basically, iperf3 doesn't really know when the network is unavailable or when it's in "catch up" mode with respect to its software pacing. If you really want to cap the sending rate of the connection so that the sender never, under any circumstances, exceeds some bitrate you specify, then (at least under Linux) you can try using the

EDIT: I'm going to advise against trying to add better pacing mechanisms within iperf3. Really, the main use case for iperf3 is to check end-to-end network and application performance on high-speed R&E networks. In this type of scenario the application-level pacing isn't very useful, and even small code changes can affect the ability of iperf3 to test on high-bandwidth paths (100 Gbps).
@jgc234, I created a branch in my private iperf3 clone with a suggested fix to the problem you raised: limit the maximum sending bitrate. This is not exactly the approach discussed above, as the average bitrate is still calculated over the full test period. However, it does not allow the sending rate to exceed a defined maximum. This change is much simpler than calculating the average bitrate over only the last intervals (which would require keeping data for those intervals), and it does not impact performance when the maximum rate is not set. Is such a change good enough? Will you be able to test this version to see that it works OK in your environment? To test, the new 3rd value of the
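One possible reading of "full-test average plus a hard maximum" is sketched below. This is not the code in that branch: the function name, its parameters, and the choice to measure the maximum over the current short interval are all assumptions made for illustration.

```python
def green_light(total_bits, elapsed_s, interval_bits, interval_elapsed_s,
                target_bps, max_bps=None):
    """Decide whether the sender may transmit right now.

    total_bits / elapsed_s describe the whole test so far;
    interval_bits / interval_elapsed_s describe only the current short interval.
    """
    # Existing rule: stay under the target averaged over the full test.
    if total_bits >= target_bps * elapsed_s:
        return False
    # Assumed extra rule: never exceed max_bps within the current interval.
    if max_bps is not None and interval_bits >= max_bps * interval_elapsed_s:
        return False
    return True


# Hypothetical state shortly after an outage: the long-term average is far
# below the 5 Mb/s target, but the current interval is already running at 8 Mb/s.
print(green_light(50e6, 26.5, 4e6, 0.5, target_bps=5e6))               # no cap set
print(green_light(50e6, 26.5, 4e6, 0.5, target_bps=5e6, max_bps=7e6))  # cap applies
```

When `max_bps` is left unset the second check is skipped entirely, which matches the stated goal of not affecting performance when no maximum is configured.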
Context
Bug Report
The --bitrate option is misleading, or not documented well, depending on how you want to classify it. Instead of being a maximum target bitrate, it is a long-term averaging target - it can overdrive without limit until the long-term average has settled. The comment on the optional burst rate ("can temporarily exceed the specified bandwidth limit") implies that the non-burst version does not temporarily exceed the intended bandwidth.
Actual Behavior
If you have small outages on a network (e.g. 10 seconds), the bitrate throttle will attempt to catch up on the lost traffic by behaving as if no throttle limit exists, driving the traffic as fast as it possibly can until the average bitrate since the start of the test matches the long-term target bitrate. This seems to make sense looking at iperf_check_throttle, which calculates the average since the start time.
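The behaviour described above can be reproduced with a small model. The sketch below is a simplified simulation, not the iperf3 source: on each tick the sender transmits at full link speed whenever its average rate since the start of the test is below the target. The function name, the tick size and the scenario numbers are assumptions chosen for the example.

```python
def simulate_average_throttle(target_bps, link_bps, duration_s, outage_s, ticks_per_s=10):
    """Return the bits sent in each one-second interval of the test.

    Simplified model: on every tick the sender transmits at full link speed
    if, and only if, its average rate since the start of the test is below
    target_bps. Nothing can be sent while the network is down.
    """
    outage_start, outage_end = outage_s
    tick_s = 1.0 / ticks_per_s
    sent_bits = 0.0
    per_second = [0.0] * duration_s
    for tick in range(duration_s * ticks_per_s):
        second = tick // ticks_per_s
        elapsed_s = (tick + 1) * tick_s
        network_up = not (outage_start <= second < outage_end)
        below_target = sent_bits / elapsed_s < target_bps
        if network_up and below_target:
            bits = link_bps * tick_s
            sent_bits += bits
            per_second[second] += bits
    return per_second


# Hypothetical scenario: 5 Mb/s target, 20 Mb/s link, outage from t=10 s to t=26 s.
per_second = simulate_average_throttle(5e6, 20e6, 40, (10, 26))
for second, bits in enumerate(per_second):
    print(f"{second:2d}s  {bits / 1e6:5.1f} Mb")
```

In this model the seconds immediately after the outage run at the full link rate until the cumulative average is back on target, which is the overshoot this report is about.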
This doesn't look too exciting on a LAN or high-speed network (maybe a second or so at maximum), but on a slower WAN it may saturate the link for many minutes trying to make up for the lost data.
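A back-of-the-envelope estimate shows why the link speed matters so much. Assuming the sender can use the whole link as soon as the outage ends (an idealisation; a real TCP flow needs time to recover, so the burst can last longer), the backlog is the target rate times the outage length, and it is paid off at the difference between link rate and target rate. The function name and the example numbers below are hypothetical.

```python
def catch_up_seconds(target_bps, link_bps, outage_s):
    """Idealised time spent saturating the link after an outage.

    The backlog (target_bps * outage_s) shrinks at (link_bps - target_bps),
    because the target keeps accumulating while the sender catches up.
    """
    if link_bps <= target_bps:
        raise ValueError("the link must be faster than the target to catch up")
    return target_bps * outage_s / (link_bps - target_bps)


# Fast link: 5 Mb/s target on a 1 Gb/s LAN, 10 s outage.
print(catch_up_seconds(5e6, 1e9, 10))
# Slow link: 5 Mb/s target on a 6 Mb/s WAN, 60 s outage.
print(catch_up_seconds(5e6, 6e6, 60))
```

The closer the target is to the link capacity, the longer the saturation lasts for the same outage.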
On a LAN, the overshoot looks like a quantisation error - just filling up the congestion window for a short blip.
Unfortunately I only have an example on a LAN for the moment. I can generate a WAN-like example if required.
Steps to Reproduce
Possible Solution