Ghosts of Unix past, part 2: Conflated designs

Posted Nov 11, 2010 23:55 UTC (Thu) by bronson (subscriber, #4806)
Parent article: Ghosts of Unix past, part 2: Conflated designs

Just a quick nod to the unix gurus for NOT conflating fork/exec, something that intuition would suggest should be conflated (if you think fork/exec actually is intuitive, just look at all the OSes that got it wrong).

Not a week goes by that I don't find my life made better by forking, tweaking, then execing. Mad props.

Ghosts of Unix past, part 2: Conflated designs

Posted Nov 17, 2010 12:42 UTC (Wed) by brinkmd (guest, #45122) [Link] (1 responses)

But fork() is very much a conflated system call, so is exec(). Fork duplicates the address space, and the descriptor table, and a bunch of other stuff. exec() loads a binary image into an address space and creates a thread and makes that thread runnable in the address space.

That fork() is conflated is even visible within the limited world view of Linux, see clone().

Ghosts of Unix past, part 2: Conflated designs

Posted Nov 17, 2010 12:52 UTC (Wed) by brinkmd (guest, #45122) [Link]

Sorry, of course exec does not create a thread, that would be spawn(), another conflated call. By the way, fork/exec is an example of bad design, see the interaction of fork with pthreads, or the problems of open file descriptors being inherited unwillingly (FD_CLOEXEC). People who write portable software learn to forget about fork and exec very quickly.

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 4, 2011 22:27 UTC (Tue) by lwn555 (guest, #72175) [Link] (8 responses)

Admittedly, the fork design pattern is appealing for achieving parallelism in a simple way. However it has many shortcomings which are not immediately obvious.

The fork design pattern is terribly inefficient on systems without virtual memory hardware. Even on those with a VMM unit, copying all the page tables just to perform a simple task is often needlessly expensive. This encourages applications to manage process pools which defeats the simplicity of using fork for simple tasks.

As mentioned by another poster, unfortunately unix file descriptors default to inheritable, which is the opposite of what is desired. In just about 100% of cases, the code doing the fork knows exactly which file handles it wants to pass into a child, yet this code knows nothing about the file descriptors opened in 3rd party libraries. In fact, even if the third party code sets CLOEXEC correctly for itself, a process wishing to spawn multiple children has no way to set the flags correctly for all children. This problem is amplified for multithreaded programs, which can be cloned with file handles and mutexes in invalid states, necessitating the kludge which is pthread_atfork.

This is exactly the reason it's common for security minded linux apps to cycle through closing the first 1024 file descripters immediately before calling exec. This is the only way to be reasonably confident (but not 100%) that handles are not inadvertently leaked to children.

In order to be efficient, the operating system must over commit resources to accommodate all processes using fork. Consider a web browser session occupying some 100MB of ram. Suppose it forks children to do parallel processing, such as downloading files. Now, the main browser continues to fetch new pages and media, which fits into the same 100MB of ram, however the existence of forked children means the kernel cannot free the old unused 100MB of ram since it belongs to a child.

Fork just gets more problematic as the parent processes get larger.
In principal it's not unreasonable for a 1.5GB database process to spawn a 5MB job, yet the fork implies over-committing 1.5GB of ram to this single child at least temporarily. In practice, over-committing can lead to insufficient memory conditions, which is why kernel developers invented the dreaded "Out of memory process killer" to kill otherwise well behaved processes under linux.

Consider that without fork, the fundamental need to overcommit disappears.

Combine all this with the fact that fork isn't very portable, one must come to the conclusion that fork should generally be avoided in large scale projects. Or, if it is used, the parent's role should be limited to forking and monitoring children. This largely precludes the benefits of the fork programming pattern in the first place.

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 4, 2011 23:28 UTC (Tue) by dlang (guest, #313) [Link] (7 responses)

what you are missing is that linux has for years not actually allocated all that extra ram for a fork, instead it has marked the ram as being shared, but copy-on-write (COW), so that if the memory is not written to, it is never duplicated.

there is some overhead in changing the page tables, but it's pretty low.

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 5, 2011 4:40 UTC (Wed) by khc (guest, #45209) [Link]

I don't think he missed it, isn't that all the overcommit stuff he was talking about?

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 5, 2011 5:04 UTC (Wed) by lwn555 (guest, #72175) [Link]

"what you are missing is that linux has for years not actually allocated all that extra ram for a fork, instead it has marked the ram as being shared, but copy-on-write (COW)"

I don't believe I've said anything to contradict this.
On systems with a MMU, fork copies the page tables and not the pages themselves such that the new processes share physical ram until they are written to.

" so that if the memory is not written to, it is never duplicated."

Whether you've realized it or not, the problem of over-committed memory remains present. At the time the kernel receives the "fork()" syscall from a large process (imagine 1.5GB working set) which uses more ram than is available to the child, it has to choose between two bad choices:
1. Either deny the request up front due to low memory constraints.
or
2. over-commit memory in a gamble that neither the parent nor the child will change too many pages.

Both answers are seriously flawed. I gave two examples of applications which demonstrate either the inefficiency of fork(), or the risky over commit behavior.

Most administrators will agree that the "OOM Killer" has no place in stable production environments. The only way to guaranty well behaved processes are not killed is for the kernel to guaranty resources by not over-committing them. This spells trouble for interfaces like fork(), which depend on over-committed memory to work efficiently.

Without over-committed memory, a large process would find itself unable to issue fork/exec calls to spawn a small process.

If the parent is a tiny daemon who's only purpose is to spawn children, this isn't such a big deal. However, it is a disappointment that the fork syscall is either very risky, or a resource hog when called from large parents.

Even if fork had no other problems, this is an excellent reason to seek alternatives.

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 5, 2011 6:32 UTC (Wed) by lwn555 (guest, #72175) [Link] (4 responses)

Obviously the following link is for Solaris rather than linux, but it provides additional insight into the problems of forking which I've attempted to explain.

http://developers.sun.com/solaris/articles/subprocess/sub...

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 7, 2011 0:32 UTC (Fri) by bronson (subscriber, #4806) [Link] (3 responses)

From the paper:

> Even though fork() has been improved over the years to use the COW (copy-on-write) semantics

If the years the author is referring to is the 70s, then sure! Otherwise, the paper appears to be little more than an indictment of a poor implementation of fork.

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 7, 2011 23:53 UTC (Fri) by lwn555 (guest, #72175) [Link] (2 responses)

I would not know when COW fork was implemented in various kernels.
Presumably not long after MMU hardware became available.

Still, a 1GB process needs 244,140 * 4KB page entries to be copied for the child. That's a lot of baggage if the child's sole purpose is to call exec(). Better to use vfork/exec when possible.

I'd like to be clear that the over commit issues with fork() are not an implementation problem but are a fundamental consequence of what fork does.

If the parent has a working data set of 100MB, and the child only needs 5MB from the parent, fork() still marks the remaining 95MB as needed by the child.

Assume the parent modifies it's entire 100MB working set while the child continues running with it's 5MB working set, then eventually both processes will consume 200MB instead of the 105MB which is technically needed.

So, regardless of the fork implementation, 95MB out of 200MB is wasted. As the parent spawns more children over time, the % wasted only gets worse.

Of course there are workarounds, but they come at the expense of forgoing the semantics which make fork appealing in the first place: inheriting context and data structures from the parent without IPC.

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 8, 2011 0:12 UTC (Sat) by dlang (guest, #313) [Link] (1 responses)

if the child really only needs the 5MB, it can free the rest of the allocations and you are back to the 105MB total.

if the programmer isn't sure if the child needs 5MB of data of the entire 100MB of data then they would need to keep everything around in any case.

the worst-case of COW is that you use as much memory as you would without it. In practice this has been shown empirically to be a very large savings. some people are paranoid about this and turn off overcommit so that even in this worst case they would have the memory, but even they benefit from the increased speed, and from the fact that almost all the time the memory isn't needed.

so I disagree with your conclusion that there is so much memory wasted.

Ghosts of Unix past, part 2: Conflated designs

Posted Jan 8, 2011 9:12 UTC (Sat) by lwn555 (guest, #72175) [Link]

"if the child really only needs the 5MB, it can free the rest of the allocations and you are back to the 105MB total."

Easily said. While it's technically possible to free all unused memory pages after a fork, it's unusual to actually do this. The piece of code calling fork() may not really be aware or related to the memory allocated by the rest of the process.

Consider how difficult it would be for one library to deallocate the structures of other libraries after performing a fork.

Even if we did track all objects to free after forking, malloc may or may not actually be able to free the pages back to the system, particularly with pages allocated linearly via sbrk() since objects needed by the child are likely to be near the end.

"the worst-case of COW is that you use as much memory as you would without it."

We can agree there are no reasons not to use copy on write to implement fork.

"so I disagree with your conclusion that there is so much memory wasted."

Then I think you misunderstood the example. No matter which way you cut it, so long as the child doesn't do anything to explicitly free unused pages, it is stuck with 95MB of unusable ram. If the parent updates it's entire working set, then the child will be the sole owner of the data. If the parent quits and the child is allowed to continue, then the useless 95MB is still there. And this is only for one child.

You may feel this is a contrived example, but I can think of many instances where it would be desirable for a large parent to branch work into child processes such that this is a problem.

Fork works great in academic examples and programs where the parent is small, doesn't touch it's data, or the children are short lived. But there are applications where the fork paradigm in and of itself leads to excessive memory consumption.