
Ghosts of Unix past, part 3: Unfixable designs

November 16, 2010

This article was contributed by Neil Brown

In the second installment of this series, we documented two designs that were found to be imperfect and have largely (though not completely) been fixed through ongoing development. Though there was some evidence that the result was not as elegant as we might have achieved had the original mistakes not been made, it appears that the current design is at least adequate and on a path towards being good.

However, there are some design mistakes that are not so easily corrected. Sometimes a design is of such a character that fixing it is never going to produce something usable. In such cases it can be argued that the best way forward is to stop using the old design and to create something completely different that meets the same need. In this episode we will explore two designs in Unix which have seen multiple attempts at fixes but for which it isn't clear that the result is even heading towards "good". In one case a significant change in approach has produced a design which is both simpler and more functional than the original. In the other case, we are still waiting for a suitable replacement to emerge. After exploring these two "unfixable designs" we will try to address the question of how to distinguish an unfixable design from a poor design which can, as we saw last time, be fixed.

Unix signals

Our first unfixable design involves the delivery of signals to processes. In particular it is the registration of a function as a "signal handler" which gets called asynchronously when the signal is delivered. That this design was in some way broken is clear from the fact that the developers at UCB (The University of California at Berkeley, home of BSD Unix) found the need to introduce the sigvec() system call, along with a few other calls, to allow individual signals to be temporarily blocked. They also changed the semantics of some system calls so that they would restart rather than abort if a signal arrived while the system call was active.

It seems there were two particular problems that these changes tried to address. Firstly there is the question of when to re-arm a signal handler. In the original Unix design a signal handler was one-shot - it would only respond the first time a signal arrived. If you wanted to catch a subsequent signal you would need to make the signal handler explicitly re-enable itself. This leads to races: if a signal is delivered before the handler has re-enabled itself, it can be lost forever. Closing these races involved creating a facility for keeping the signal handler always available, and blocking new deliveries while the signal was being processed.
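To make the race concrete, here is a minimal sketch of the original one-shot interface, written with modern prototypes (historical signal() reset the disposition to SIG_DFL on every delivery; current systems differ):

    #include <signal.h>
    #include <unistd.h>

    static void on_int(int sig)
    {
        /* Under one-shot semantics the disposition is already back to
         * SIG_DFL here; a second SIGINT arriving before the next line
         * kills the process instead of reaching this handler. */
        signal(SIGINT, on_int);        /* re-arm -- racily */
        /* ... handle the signal ... */
    }

    int main(void)
    {
        signal(SIGINT, on_int);
        for (;;)
            pause();                   /* wait for signals */
    }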

The other problem involves exactly what to do if a signal arrives while a system call is active. Options include waiting for the system call to complete, aborting it completely, allowing it to return partial results, or allowing it to restart after the signal has been handled. Each of these can be the right answer in different contexts; sigvec() tried to provide more control so the programmer could choose between them.
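The "abort completely" option is the one programmers meet as EINTR, and it obliges every caller to retry by hand whenever restarting is not in effect. A typical retry idiom (a sketch, not specific to any one program):

    #include <errno.h>
    #include <unistd.h>

    static ssize_t read_retry(int fd, void *buf, size_t count)
    {
        ssize_t n;
        do {
            n = read(fd, buf, count);      /* may fail with EINTR */
        } while (n == -1 && errno == EINTR);
        return n;
    }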

Even these changes, however, were not enough to make signals really usable, so the developers of System V (at AT&T) found the need for a sigaction() call which adds some extra flags to control the fine details of signal delivery. This call also allows a signal handler to be passed a "siginfo_t" data structure with information about the cause of the signal, such as the UID of the process which sent the signal.
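A sketch of that interface as it stands today; the handler receives the extra information through its second argument (printf() is not async-signal-safe and is used here only to keep the demonstration short):

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void on_usr1(int sig, siginfo_t *si, void *context)
    {
        printf("SIGUSR1 from pid %d, uid %d\n",
               (int)si->si_pid, (int)si->si_uid);
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_usr1;
        sa.sa_flags = SA_SIGINFO | SA_RESTART;   /* the extra flags */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);
        for (;;)
            pause();
    }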

As these changes, particularly those from UCB, were focused on providing "reliable" signal delivery, one might expect that at least the reliability issues would be resolved. Not so it seems. The select() system call (and related poll()) did not play well with signals so pselect() and ppoll() had to be invented and eventually implemented. The interested reader is encouraged to explore their history. Along with these semantic "enhancements" to signal delivery, both teams of developers chose to define more signals generated by different events. Though signal delivery was already problematic before these were added, it is likely that these new demands stretched the design towards breaking point.

An interesting example is SIGCHLD and SIGCLD, which are sent when a child exits or is otherwise ready for the parent to wait() for it. The difference between these two (apart from the letter "H" and different originating team) is that SIGCHLD is delivered once per event (as is the case with other signals) while SIGCLD would be delivered constantly (unless blocked) while any child is ready to be waited for. In the language of hardware interrupts, SIGCHLD is edge triggered while SIGCLD is level triggered. The choice of a level-triggered signal might have been an alternate attempt to try to improve reliability. Adding SIGCLD was more than just defining a new number and sending the signal at the right time. Two of the new flags added for sigaction() are specifically for tuning the details of handling this signal. This is extra complexity that signals didn't need and which arguably did not belong there.

In more recent years the collection of signal types has been extended to include "realtime" signals. These signals are user-defined signals (like SIGUSR1 and SIGUSR2) which are only delivered if explicitly requested in some way. They have two particular properties. Firstly, realtime signals are queued so the handler in the target process is called exactly as many times as the signal was sent. This contrasts with regular signals which simply set a flag on delivery. If a process has a given (regular) signal blocked and the signal is sent several times, then, when the process unblocks the signal, it will still only see a single delivery event. With realtime signals it will see several. This is a nice idea, but introduced new reliability issues as the depth of the queue was limited, so signals could still be lost. Secondly (and this property requires the first), a realtime signal can carry a small datum, typically a number or a pointer. This can be sent explicitly with sigqueue() or less directly with, e.g., timer_create().
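For example, queueing a realtime signal with an attached value is just this (a sketch; the receiver would use sigaction() with SA_SIGINFO and read the value from the siginfo_t, or use sigwaitinfo()):

    #include <sys/types.h>
    #include <signal.h>

    /* Send 'counter' to process 'target'; each successful call queues a
     * distinct delivery rather than collapsing into a single flag. */
    static int notify(pid_t target, int counter)
    {
        union sigval value;
        value.sival_int = counter;          /* the small datum */
        return sigqueue(target, SIGRTMIN + 1, value);
    }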

It could be thought that this addition of more signals for more events is a good example of the "full exploitation" pattern that was discussed at the start of this series. However, when adding new signal types requires significant changes to the original design, it could equally seem that the original design wasn't really strong enough to be so fully exploited. As can be seen from this retrospective, though the original signal design was quite simple and elegant, it was fatally flawed. The need to re-arm signals made them hard to use reliably, the exact semantics of interrupting a system call were hard to get right, and developers repeatedly needed to significantly extend the design to make it work with new types of signals.

The most recent step in the saga of signals is the signalfd() system call which was introduced to Linux in 2007 for 2.6.22. This system call extends "everything has a file descriptor" to work for signals too. Using this new type of descriptor returned by signalfd(), events that would normally be handled asynchronously via signal handlers can now be handled synchronously just like all I/O events. This approach makes many of the traditional difficulties with signals disappear. Queuing becomes natural so re-arming becomes a non-issue. Interaction with system calls ceases to be interesting and an obvious way is provided for extra data to be carried with a signal. Rather than trying to fix a problematic asynchronous delivery mechanism, signalfd() replaces it with a synchronous mechanism that is much easier to work with and which integrates well into other aspects of the Unix design - particularly the universality of file descriptors.
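A sketch of the synchronous style (Linux-specific): the signal must first be blocked, so that it is queued to the descriptor rather than delivered asynchronously:

    #include <sys/signalfd.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGINT);
        sigprocmask(SIG_BLOCK, &mask, NULL);   /* stop async delivery */

        int fd = signalfd(-1, &mask, 0);       /* signals as I/O */
        struct signalfd_siginfo si;
        while (read(fd, &si, sizeof(si)) == sizeof(si))
            printf("got signal %u from pid %u\n",
                   (unsigned)si.ssi_signo, (unsigned)si.ssi_pid);
        return 0;
    }

The descriptor can, of course, also be handed to select(), poll(), or epoll alongside all of a process's other event sources.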

It is a fun, though probably pointless, exercise to imagine what the result might have been had this approach been taken to signals when problems were first observed. Instead of adding new signal types we might have new file descriptor types, and the set of signals that were actually used could have diminished rather than grown. Realtime signals might instead be a general and useful form of interprocess communication based on file descriptors.

It should be noted that there are some signals which signalfd() cannot be used for. These include SIGSEGV, SIGILL, and other signals that are generated because the process tried to do something impossible. Just queueing these signals to be processed later cannot work; the only alternatives are switching control to a signal handler or aborting the process. These cases are handled perfectly by the original signal design. They cannot occur while a system call is active (system calls return EFAULT rather than raising a signal) and issues with when to re-arm the signal handler are also less relevant.

So while signal handlers are perfectly workable for some of the early use cases (e.g. SIGSEGV) it seems that they were pushed beyond their competence very early, thus producing a broken design for which there have been repeated attempts at repair. While it may now be possible to write code that handles signal delivery reliably, it is still very easy to get it wrong. The replacement that we find in signalfd() promises to make event handling significantly easier and so more reliable.

The Unix permission model

Our second example of an unfixable design which is best replaced is the owner/permission model for controlling access to files. A well known quote attributed to H. L. Mencken is "there is always a well-known solution to every human problem - neat, plausible, and wrong." This is equally true of computing problems, and the Unix permissions model could be just such a solution. The initial idea is deceptively simple: six bytes per file gives simple and broad access control. When designing an operating system to fit in 32 kilobytes of RAM (or less), such simplicity is very appealing, and thinking about how it might one day be extended is not a high priority, which is understandable though unfortunate.

The main problem with this permission model is that it is both too simple and too broad. The breadth of the model is seen in the fact that every file stores its own owner, group owner, and permission bits. Thus every file can have distinct ownership or access permissions. This is much more flexibility than is needed. In most cases, all the files in a given directory, or even directory tree, have the same ownership and much the same permissions. This fact was leveraged by the Andrew filesystem, which only stores ownership and permissions on a per-directory basis, with little real loss of functionality.

When this only costs six bytes per file it might seem a small price to pay for the flexibility. However, once more than 65,536 different owners are wanted, or more permission bits and more groups are needed, storing this information begins to become a real cost. The bigger cost, though, is in usability.

While a computer may be able to easily remember six bytes per file, a human cannot easily remember why various different settings might have been assigned, and so is very likely to create sets of permission settings which are inconsistent, inappropriate, and hence not particularly secure. Your author has memories from University days of often seeing home directories given "0777" permissions (everyone has any access) simply because a student wanted to share one file with a friend, but didn't understand the security model.

The excessive simplicity of the Unix permission model is seen in the fixed, small number of permission bits, and, particularly, that there is only one "group" that can have privileged access. Another maxim from computer engineering, attributed to Alan Kay, is that "Simple things should be simple, complex things should be possible." The Unix permission model makes most use cases quite simple but once the need exceeds that common set of cases, further refinement becomes impossible. The simple is certainly simple, but the complex is truly impossible.

It is here that we start to see real efforts to try to "fix" the model. The original design gave each process a "user" and a "group" corresponding to the "owner" and "group owner" in each file, and they were used to determine access. The "only one group" limit constrains both sides; the Unix developers at UCB saw that, for the process side at least, this limit was easy to extend. They allowed a process to have a list of groups for checking filesystem access against. (Unfortunately this list originally had a firm upper limit of 16, and that limit made its way into the NFS protocol, where it was hard to change and is still biting us today.)

Changing the per-file side of this limit is harder as that requires changing the way data is encoded in a filesystem to allow multiple groups per file. As each group would also need its own set of permission bits, a file would need a list of groups and permission bits, and these became known quite reasonably as "access control lists" or ACLs. The POSIX standardization effort made a couple of attempts to create a standard for ACLs, but never got past draft stage. Some Unix implementations have implemented these drafts, but they have not been widely successful.
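Where one of those drafts is implemented - Linux's libacl, for instance - reading a file's ACL looks roughly like this (a sketch; link with -lacl):

    #include <sys/types.h>
    #include <sys/acl.h>
    #include <stdio.h>

    static void show_acl(const char *path)
    {
        acl_t acl = acl_get_file(path, ACL_TYPE_ACCESS);
        if (acl != NULL) {
            /* Text form, e.g. "user::rw-", "group:staff:r--", "mask::r--" */
            char *text = acl_to_text(acl, NULL);
            if (text != NULL) {
                printf("%s:\n%s", path, text);
                acl_free(text);
            }
            acl_free(acl);
        }
    }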

The NFSv4 working group (under the IETF umbrella) were tasked with creating a network filesystem which, among other goals, would provide interoperability between POSIX and WIN32 systems. As part of this effort they developed yet another standard for ACLs which aimed to support the access model of WIN32 while still being usable on POSIX. Whether this will be more successful remains to be seen, but it seems to have a reasonable amount of momentum with an active project trying to integrate it into Linux (under the banner of "richacls") and various Linux filesystems.

One consequence of using ACLs is that the per-file storage space needed to store the permission information is not only larger than six bytes, it is not of a fixed length. This is, in general, more challenging than any fixed size. Those filesystems which implement these ACLs do so using "extended attributes" and most impose some limit on the size of these - each filesystem choosing a different limit. Hopefully most ACLs that are actually used will fit within all these arbitrary limits.

Some filesystems - ext3 at least - attempt to notice when multiple files have the same extended attributes and just store a single copy of those attributes, rather than one copy for each file. This goes some way to reduce the space cost (and access-time cost) of larger ACLs that can be (but often aren't) unique per file, but does nothing to address the usability concerns mentioned earlier. In that context, it is worth quoting Jeremy Allison, one of the main developers of Samba, and so with quite a bit of experience with ACLs from WIN32 systems and related interoperability issues. He writes: "But Windows ACLs are a nightmare beyond human comprehension :-). In the 'too complex to be usable' camp." It is worth reading the context and follow up to get a proper picture, and remembering that richacls, like NFSv4 ACLs, are largely based on WIN32 ACLs.

Unfortunately it is not possible to present any real example of replacing rather than fixing the Unix permission model. One contender might be that part of "SELinux" that deals with file access. This doesn't really aim to replace regular permissions but rather tries to enhance them with mandatory access controls. SELinux follows much the same model of Unix permissions, associating a security context with every file of interest, and does nothing to improve the usability issues.

There are however two partial approaches that might provide some perspective. One partial approach began to appear in Seventh Edition Unix with the chroot() system call. It appears that chroot() wasn't originally created for access control but rather to provide a separate namespace in which to create a clean filesystem for distribution. However it has since been used to provide some level of access control, particularly for anonymous FTP servers. This is done by simply hiding all the files that the FTP server shouldn't access. Anything that cannot be named cannot be accessed.
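The whole of that anonymous-FTP pattern fits in a few lines (a sketch; the path and uid are illustrative, and the order of operations matters):

    #include <stdlib.h>
    #include <unistd.h>

    static void enter_jail(uid_t unprivileged_uid)
    {
        /* Confine the process to /srv/ftp; files outside simply have
         * no name here, and so cannot be accessed. */
        if (chroot("/srv/ftp") != 0 || chdir("/") != 0)
            exit(1);
        /* Drop root afterwards, or the jail is trivially escapable. */
        if (setuid(unprivileged_uid) != 0)
            exit(1);
    }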

This concept has been enhanced in Linux with the possibility for each process not just to have its own filesystem root, but also to have a private set of mount points with which to build a completely customized namespace. Further it is possible for a given filesystem to be mounted read-write in one namespace and read-only in another namespace, and, obviously, not at all in a third. This functionality is suggestive of a very different approach to controlling access permissions. Rather than access control being per-file, it allows it to be per-mount. This leads to the location of a file being a very significant part of determining how it can be accessed. Though this removes some flexibility, it seems to be a concept that human experience better prepares us to understand. If we want to keep a paper document private we might put it in a locked drawer. If we want to make it publicly readable, we distribute copies. If we want it to be writable by anyone in our team, we pin it to the notice board in the tea room.
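A sketch of that in Linux terms (paths illustrative; CAP_SYS_ADMIN is required, and applying the read-only flag to a bind mount takes a separate remount step):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/mount.h>

    static int readonly_view(void)
    {
        /* Give this process a private copy of the mount table. */
        if (unshare(CLONE_NEWNS) != 0)
            return -1;
        /* Bind /data into this namespace... */
        if (mount("/data", "/view/data", NULL, MS_BIND, NULL) != 0)
            return -1;
        /* ...and make this view of it read-only; other namespaces
         * still see /data read-write. */
        return mount(NULL, "/view/data", NULL,
                     MS_BIND | MS_REMOUNT | MS_RDONLY, NULL);
    }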

This approach is clearly less flexible than the Unix model as the control of permissions is less fine grained, but it could well make up for that in being easier to understand. Certainly by itself it would not form a complete replacement, but it does appear to be functionality that is growing - though it is too early yet to tell if it will need to grow beyond its strength. One encouraging observation is that it is based on one of those particular Unix strengths observed in our first pattern, that of "a hierarchical namespace" which would be exploited more fully.

A different partial approach can be seen in the access controls used by the Apache web server. These are encoded in a domain-specific language and stored in centralized files or in ".htaccess" files near the files that are being controlled. This method of access control has a number of real strengths that would be a challenge to encode into anything based on the Unix permission model (a sample configuration follows the list):

  • The permission model is hierarchical, matching the filesystem model. Thus controls can be set at whichever point makes most sense, and can be easily reviewed in their entirety. When the controls set at higher levels are not allowed to be relaxed at lower levels it becomes easy to implement mandatory access controls.

  • The identity of the actor requesting access can be arbitrary, rather than just from the set of identities that are known to the kernel. Apache allows control based on source IP address or username plus password; using plug-in modules, almost anything else could be made available.

  • Access can be provided indirectly through a CGI program. Thus, rather than trying to second-guess all possible access restrictions that might be desirable and define permission bits for them in a new ACL, the model can allow any arbitrary action to be controlled by writing a suitable script to mediate that access.
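As a concrete (hypothetical) illustration of the first two points, here is a short ".htaccess" in Apache 2.2-era syntax, granting access either to anyone on the local subnet or to anyone holding a password that the kernel has never heard of:

    AuthType Basic
    AuthName "Team documents"
    AuthUserFile /srv/www/.htpasswd
    Require valid-user

    Order deny,allow
    Deny from all
    Allow from 192.168.1.0/24
    Satisfy any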

It should be fairly obvious that this model would not be an easy fit with kernel-based access checking and, in any case, would have a higher performance cost than a simpler model. As such it would not be suitable to apply universally. However it could be that such a model would be suitable for that small percentage of needs that do not fit in a simple namespace based approach. There the cost might be a reasonable price for the flexibility.

While an alternate approach such as these might be appealing, it would face a much bigger barrier to introduction than signalfd() did. signalfd() could be added as a simple alternative to signal handlers. Programs could continue to use the old model with no loss, while new programs can make use of the new functionality. With permission models, it is not so easy to have two schemes running in parallel. People who make serious use of ACLs will probably already have a bunch of ACLs carefully tuned to their needs, and enabling an alternate parallel access mechanism is very likely to break something. So this is the sort of thing that would best be trialed in a new installation rather than imposed on an existing user-base.

Discerning the pattern

If we are to have a convincing pattern of "unfixable designs" it must be possible to distinguish them from fixable designs such as those that we found last time. In both cases, each individual fix appears to be a good idea addressing a real problem without obviously introducing more problems. In some cases this series of small steps leads to a good result; in others these steps only help you get past the small problems enough to be able to see the bigger problem.

We could use mathematical terminology to note that a local maximum can be very different from a global maximum. Or, using mountain-climbing terminology, it is hard to know the true summit from a false summit which just gives you a better view of the mountain. In each case the missing piece is a large scale perspective. If we can see the big picture we can more easily decide if a particular path will lead anywhere useful or if it is best to head back to base and start again.

Trying to move this discussion back to the realm of software engineering, it is clear that we can only head off unfixable designs if we can find a position that can give us a clear and broad perspective. We need to be able to look beyond the immediate problem, to see the big picture and be willing to tackle it. The only known source of perspective we have for engineering is experience, and few of us have enough experience to see clearly into the multiple facets and the multiple levels of abstraction that are needed to make right decisions. Whether we look for such experience by consulting elders, by researching multiple related efforts, or finding documented patterns that encapsulate the experience of others, it is vitally important to leverage any experience that is available rather than run the risk of simply adding bandaids to an unfixable design.

So there is no easy way to distinguish an unfixable design from a fixable one. It requires leveraging the broad perspective that is only available through experience. Having seen the difficulty of identifying unfixable designs early we can look forward to the final part of this series, where we will explore a pernicious pattern in problematic design. While unfixable designs give a hint of deeper problems by appearing to need fixing, these next designs do not even provide that hint. The hints that there is a deeper problem must be found elsewhere.

Exercises

  1. Though we found that signal handlers had been pushed well beyond their competence, we also found at least one area (i.e. SIGSEGV) where they were still the right tool for the job. Determine if there are other use cases that avoid the observed problems, and so provide a balanced assessment of where signal handlers are effective, and where they are unfixable.

  2. Research problems with "/tmp", attempts to fix them, any unresolved issues, and any known attempts to replace rather than fix this design.

  3. Describe an aspect of the IP protocol suite that fits the pattern of an "Unfixable design".

  4. It has been suggested that dnotify, inotify, fanotify are all broken. Research and describe the problems and provide an alternate design that avoids all of those issues.

  5. Explore the possibility of using fanotify to implement an "apache-like" access control scheme with decisions made in user-space. Identify enhancements required to fanotify for this to be practical.

Next article

Ghosts of Unix past, part 4: High-maintenance designs


Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 16:10 UTC (Tue) by bfields (subscriber, #19510) [Link] (5 responses)

The NFSv4 working group (under the IETF umbrella) were tasked with creating a network filesystem which, among other goals, would provide interoperability between POSIX and WIN32 systems. As part of this effort they developed yet another standard for ACLs which aimed to support the access model of WIN32 while still being usable on POSIX.

Actually, it's really just a copy of Windows ACLs as far as I can tell--different implementors have made different choices as to how to reconcile with POSIX.

The Richacl implementors (mainly Andreas Gruenbacher) have added some extra "mask bits" as a way to ensure that a chmod can still restrict permissions without permanently losing information from any ACL set on the file. Interestingly enough, the hardest part then becomes mapping the resulting masked ACL to a Windows/NFSv4-like ACL....

Readers in search of a challenge can go look at their code and figure out if there's a better mapping. I've drawn a blank so far. It's likely what we'll end up doing.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 21:04 UTC (Tue) by wazoox (subscriber, #69624) [Link] (4 responses)

> Interestingly enough, the hardest part then becomes mapping the resulting masked ACL to a Windows/NFSv4-like ACL....

That reminds me of the ACL parts of the samba code. There is a long page of comments that reads something like "beware, here follows a long, hairy, complicated and intractable explanation of a longer, hairier and more incomprehensible code". Then more lines with comments like "Don't touch this code!" :)

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 23:21 UTC (Tue) by vonbrand (guest, #4458) [Link] (2 responses)

Due to the "ACL model" of Windows being an unmanageable mess?

The user/group/others model is certainly lacking (it can't describe the full permissions matrix like the Bell-LaPadula model uses), but what are the real, usable alternatives?

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 0:57 UTC (Wed) by rahvin (guest, #16953) [Link] (1 responses)

SELinux and an infinite level of fine-grained control? I guess it really depends on how much control you need and how many man-hours you want to put into maintaining it.

I'd imagine the US DOD has permission levels and tables that would make your head spin, after all their paper permission levels are nearly incomprehensible, I can't even imagine their computer permissions. In fact I'd wager there is an entire staff of people that do nothing but manage permissions.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 18, 2010 17:51 UTC (Thu) by davecb (subscriber, #1574) [Link]

I took the course, and they have the same four or five levels for everyone (unclassified, restricted, confidential, secret and top secret), and a plethora of categories, possibly including "the commandant's cat's litter-box", assuming of course that you have secrets about it.

--dave

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 3:18 UTC (Wed) by jra (subscriber, #55261) [Link]

Hey, there's ascii art in there explaining everything! How can you not love code with ascii art in it? :-).

Jeremy.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 16:43 UTC (Tue) by foom (subscriber, #14868) [Link] (8 responses)

> signalfd

Unfortunately, signalfd has a very irritating practical issue.

To use it, you need to block the signal you're interested in (using e.g. sigsetmask). However, the set of blocked signals is not reset by exec (blocked signals and signals set to SIG_IGN are preserved, but other signal actions are reset to default). So, if you use signalfd, whenever you spawn a process, it will not receive that signal. And processes tend to misbehave when not receiving signals they expect to.

You can, of course, fix that. You simply need to unblock the signal after forking, but before exec'ing. *IF* you control everything that ever calls fork/exec from your process. In many situations, that is impossible -- programs tend to use all sorts of libraries, some of which spawn processes.
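For concreteness, the fix looks something like this in each spawn path you control (a sketch; assumes SIGCHLD is what your signalfd is watching, and 'cmd' is the program being spawned):

    pid_t pid = fork();
    if (pid == 0) {
        /* child: undo the block inherited from the parent */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGCHLD);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        execvp(cmd[0], cmd);
        _exit(127);            /* exec failed */
    }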

Okay, so, you might say: "Hey, that's what pthread_atfork is for! Just set a child-side after-fork handler to unblock the signal". Well, unfortunately, pthread_atfork doesn't always get called when spawning a child process, so you can't really use it for that.

Three examples of that:
1) For the system() call, POSIX says: "It is unspecified whether the handlers registered with pthread_atfork() are called as part of the creation of the child process." In glibc, they aren't.
2) Regarding posix_spawn, POSIX says: "It is implementation-defined whether the fork handlers are run when posix_spawn() or posix_spawnp() is called." In glibc, they are.
3) The linux-specific clone() system-call does not have atfork handlers called.

So, basically, end result: signalfd is unusable in many circumstances where it'd be really nice to be able to use it -- you're better off just setting a standard signal handler which writes to an fd. Sigh.

(POSIX spec URL: http://www.opengroup.org/onlinepubs/9699919799/)

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 17:17 UTC (Tue) by mjthayer (guest, #39183) [Link] (2 responses)

Wouldn't a named pipe in the filesystem do as well in many of the cases where signalfd is a feasible solution? Presumably we are looking at the more specialised cases of signal handling here which are likely to be application-specific protocols, and not just handling SIGINT.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 17:29 UTC (Tue) by foom (subscriber, #14868) [Link] (1 responses)

I think you've got the wrong idea. Signalfd isn't intended for "specialized" applications of signals (and I'm not sure how named pipes come into play at all). It would be nice to use for completely normal uses of signal handling: for example, it would be ideal for replacing a SIGCHLD handler in an application with an event loop: you're already waiting on fds to become readable/writable, so waiting on a signalfd fd to notify you of a child that finished is exactly what you want.

Most sensible such applications will already implement that by writing a signal handler for SIGCHLD which simply writes a byte into a pipe, and then has the event loop look for readability on that pipe. Signalfd would let you do that more easily -- if you could actually use it.
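For reference, that workaround is the classic "self-pipe trick" (a sketch; self_pipe comes from pipe() at startup, with both ends set non-blocking):

    static int self_pipe[2];

    static void on_chld(int sig)
    {
        char c = 0;
        /* write() is async-signal-safe; the event loop polls
         * self_pipe[0] for readability and reaps children there. */
        write(self_pipe[1], &c, 1);
    }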

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 17:33 UTC (Tue) by mjthayer (guest, #39183) [Link]

> Signalfd isn't intended for "specialized" applications of signals (and I'm not sure how named pipes come into play at all). It would be nice to use for completely normal uses of signal handling: for example, it would be ideal for replacing a SIGCHLD handler in an application with an event loop [...]

That makes sense - and obviously a named pipe would be no good there whatsoever. I was more thinking of things like SIGUSR1 sorts of interactions.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 8:11 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

It seems like what you want is... another arbitrary patch atop the mess! Specifically, a rule that unblocked signals which have open signalfds act exactly as if blocked (pending signals being sent down the signalfd 'instead' of being conventionally delivered) until the fd is closed. Now, assuming that the user has opened the signalfd O_CLOEXEC (hey, that should be the default! but we repeat ourselves), the signal will automatically 'unblock itself' at exec() time, which is exactly what we want.

Bonus: no change to signal semantics when signalfd is not in use, and nobody sane would want the current semantics in any case.

What am I missing?

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 14:43 UTC (Wed) by madcoder (guest, #30027) [Link]

No, you want a real atfork() interface that is not related to threads. pthread_atfork is here because when you fork() from threads, you fork one new process, not "all the threads" and that often means that you have to deregister stuff.

Anyway, there is a solution for that (which is messy but…) on linux which is to redefine fork(), system(), posix_spawn{,p} and every similar problematic fork() wrapper using dlsym chaining to reset your signal masks properly. This isn't *that* complicated, and chains nicely. Or if you're sure that pthread_atfork() works for some then only divert the ones where it doesn't. I know it's not portable but signalfd() isn't in the first place either ;)

WRT clone() I'd say that this is a very low level interface which has a really high chance to break the libc when used (e.g. TSD breaks in interesting ways in the glibc if you use clone without emulating what the glibc does IIRC), so I'd say people using it Know What They Are Doing in the first place and should have worried about resetting the signal mask to a sane default in the first place.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 24, 2010 22:04 UTC (Wed) by neilbrown (subscriber, #359) [Link] (1 responses)

I was wondering if you had brought this up with the developer of signalfd - Davide Libenzi?

Fixing it would probably require adding a new 'flags' option, so adding a new syscall and deprecating the old. This 'flags' could allow atomic setting of close-on-exec and an auto-block flag which causes all signals being tracked by signalfd to be blocked just as long as the signalfd is open.

If you haven't and don't want to, I might....

Thanks,
NeilBrown

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 25, 2010 18:28 UTC (Thu) by dcoutts (subscriber, #5387) [Link]

Please do bring this up with the kernel hackers. We were thinking of using signalfd in the GHC runtime / IO system until we discovered this problem with having to block all signals in all threads, which makes it unusable (it's not just child processes, libraries can make their own threads). We have to stick with the approach of installing a signal handler that writes to a pipe (or we can use eventfd for the cases where there is no data associated with the signal).

Ghosts of Unix past, part 3: Unfixable designs

Posted Apr 14, 2016 7:01 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]

Can't you #include <dlfcn.h> and intercept calls to glibc's fork()?

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 20:05 UTC (Tue) by dlang (guest, #313) [Link] (5 responses)

.htaccess is a convenient way to think about permissions, but in terms of performance it's a disaster.

every time apache has to access a file it needs to look in .htaccess for that directory, AND EVERY PARENT DIRECTORY.

As a result, just about every production apache server disables .htaccess files.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 20:47 UTC (Tue) by alvieboy (guest, #51617) [Link] (1 responses)

That could be solved by simply caching access data in memory. dnotify() also allows you to do proper reloading of ACLs, without having to stat() every file in the hierarchy.

But Apache is not only meant for Linux. Other OSes do not provide these functionalities.

What's really harder is to apply all constraints in a fast and efficient way. I never benchmarked Apache on this, but I'd bet it's not that fast or efficient.

Alvaro

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 21:21 UTC (Tue) by dlang (guest, #313) [Link]

under the covers, .htaccess files are not just access control files, they can contain any config options that can be in an apache config file, they just apply to that directory and its subdirectories, discovered per-hit.

so yes, they are horribly inefficient

in terms of caching the combined constraints, that seems hard in the face of directories being moved around.

there's also the issue of the interaction with links and figuring out the 'true' path to a file.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 1:55 UTC (Wed) by buck (subscriber, #55985) [Link] (1 responses)

maybe i'm a moron, but i find that trying to map the non-hierarchical layout of the apache httpd.conf to a mental overlay for the server's (virtual) filesystem/URI space to be way beyond my competence. .htaccess files at least have the virtue of controls being in proximity to the stuff they control, though that thinking runs entirely counter to the point being made in the article, the extended-attribute bloat, etc. so maybe i just drank the inode/xattr Kool-Aid to my permanent detriment, but the ``composability'' of the permissions by masking them through the filesystem's links down to the object of concern is something i just totally grok. (i think there must be some connection i should make here about exploiting the grafted-on filesystem trees design to the full being part-and-parcel, but i am obviously not a big-picture type, and i think that case was made for chroot/namespace forking)

i'll concede that maybe AFS directory-only permissions might simplify things a bit, at the fringes

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 2:50 UTC (Wed) by dlang (guest, #313) [Link]

no disagreement that they are easier to understand, the problem is the performance implications of them.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 9:53 UTC (Wed) by iq-0 (subscriber, #36655) [Link]

The beautiful thing is that one doesn't have to do this expensive lookup if the main filesystem/kernel used such a scheme. It would work pretty much like the dentry-cache and one could probably even JIT the possible complex patterns or at least byte-code compile them in memory.
Apache doesn't do this because it is hard to get a good cross-platform file-change notification (which doesn't have possible side-effects).

It even has a good chance to be cheaper than the current unix model, since in a practical setup there would only be a few (compiled) rulesets in effect (still hundreds, but a lot less than actual dentries). One could possibly cache a pointer to the list of effective rules to a dentry/inode (depending on how the rules are to be applied, this is semantics, but I suspect you'd want them on the inode level).

But the decoupling of the details from every single inode can probably be done without any real performance impact (and possibly even performance gains). Whether you use hierarchical ACLs or matching rules shouldn't really make a difference and constant tree traversals shouldn't be necessary when done at the VFS level.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 20:35 UTC (Tue) by pj (subscriber, #4506) [Link] (8 responses)

s/using maintain climbing terminology/using mountain climbing terminology/

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 20:47 UTC (Tue) by smurf (subscriber, #17840) [Link] (7 responses)

Thou Shalt Not Post Typos As Comments.

Presumably, that also holds for spelling mis-corrections.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 23:36 UTC (Tue) by tpo (subscriber, #25713) [Link] (6 responses)

> Thou Shalt Not Post Typos As Comments

Why not? This series has very much the flavour of a classic text. Thus fixing it now for posteriority makes a lot of sense:

s/Each of these can be the right answer is different contexts/Each of these can be the right answer in different contexts/

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 23:59 UTC (Tue) by ABCD (subscriber, #53650) [Link]

To quote the text that appears every time you post a comment:

> Please do not post typos as comments, send them to [email protected] instead.

Typos

Posted Nov 17, 2010 0:05 UTC (Wed) by corbet (editor, #1) [Link] (4 responses)

"Posteriority"? You're going to sit on it? :)

As noted elsewhere; future readers of a classic text are likely to be supremely uninterested in the typos that made it through the editing process. That's why we prefer that people email them to us.

Typos

Posted Nov 18, 2010 18:15 UTC (Thu) by RobSeace (subscriber, #4435) [Link] (3 responses)

You really should add a "report typo" button at the bottom of every story, with a web form to fill in... When people are in their browser reading a web site, they hate to jump through the hoop of firing up their E-mail program (or navigating to their web-mail site) just to report a typo, especially when there's a handy-dandy easier-to-use forum thread right there that they can mention it in instead...

And, yes, I know you've already got a "mailto:" link, but for many of us, "mailto:" is useless... It doesn't bring up my preferred E-mail client (elm, running on a completely different machine than where my browser is currently running)...

Typos

Posted Nov 21, 2010 10:56 UTC (Sun) by Darkmere (subscriber, #53695) [Link] (2 responses)

Fix ye'r mailclient configuration ;)

( ssh -t user@host 'something something something %U' ) should do it for you, add to a .desktop, associate as a Mailer and you should be good to go

Typos

Posted Nov 22, 2010 18:46 UTC (Mon) by wookey (guest, #5501) [Link] (1 responses)

This is a great suggestion, but where does one put it? It's the desktop or browser config that decides what to do with mailto: links. mozilla seems to have its own internal config with useful options like 'yahoo mail' or 'googlemail' or 'thunderbird'. I can pick 'other' and specify '/usr/bin/ssh' there but not a useful command line SFAICT. Those internal options live in some kind of internal database thing SFAIK so there is no handy text file in .mozilla/firefox/435l2hlia.default to edit (is there?)

see gnome-default-applications-properties

Posted Nov 22, 2010 19:41 UTC (Mon) by jku (subscriber, #42379) [Link]

In GNOME gnome-default-applications-properties will let you choose the default mailto handler. You can set a custom entry like "/home/wookey/bin/mail-handler.sh %s". Mozilla products respect this, I think.

The current setup seems to allow what you want but it's pretty limited in many ways. See Bastien Nocera's blog for some recent mimetype-related developments: http://www.hadess.net/2010/10/new-control-center-and-you....

Signals vs. system calls

Posted Nov 16, 2010 20:43 UTC (Tue) by madscientist (subscriber, #16861) [Link] (2 responses)

Only lightly touched on here is the horribleness of signals interrupting system calls. The ability to set SA_RESTART is better than nothing but it has significant problems, in particular that it's not implemented appropriately everywhere (Linux is pretty good but Solaris, for example, is pretty bad).

This means that if, for example, you set a signal handler for SIGCHLD you have major problems since SA_RESTART can't be considered reliable (portably). The trick of having an internal pipe to communicate between your signal handler and your main event loop is still subject to this problem.

One assumes that signalfd() would not interrupt system calls on signals delivered through the FD so it solves that problem--but it's Linux-specific and Linux already handles SA_RESTART reliably.

Signals vs. system calls

Posted Nov 19, 2010 22:57 UTC (Fri) by giraffedata (guest, #1954) [Link] (1 responses)

I can't quite tell what problem you're pointing out. Are you saying it's horrible that anything that makes a system call of an interruptible type has to check for EINTR or partial completion and repeat/resume the system call?

Of course, any solution in which system calls are uninterruptible defeats half the purpose of a signal.

The trick of having an internal pipe to communicate between your signal handler and your main event loop is still subject to this problem.

I'm familiar with the trick of having such an internal pipe -- it solves the problem of select() not getting interrupted when a signal arrives just as select() is starting. But I don't see the connection between that and horribleness of signals interrupting system calls.

One assumes that signalfd() would not interrupt system calls on signals delivered through the FD

signalfd() just generates the file descriptor, so of course it doesn't interrupt anything. If you mean that in a program that uses signalfd(), system calls don't get interrupted, I think you're right because a program that uses signalfd() normally blocks signals, and a blocked signal can't interrupt a system call. But that just means that a program that uses signalfd() can't fully exploit signals -- control-C won't unhang some things.

Signals vs. system calls

Posted Nov 25, 2010 5:55 UTC (Thu) by rqosa (subscriber, #24136) [Link]

> But that just means that a program that uses signalfd() can't fully exploit signals -- control-C won't unhang some things.

The solution to that is to use non-blocking system calls (or at least ones that you know will only block for a short time). That's something that you should already be doing if you're using signalfd(); the purpose of signalfd() is to handle signals with an event loop, and an event loop shouldn't have any blocking system calls in it (or anything else that takes a long time) except for the one select() (or epoll_wait() or similar) that drives the event loop.

(If there's some type of event whose handler must take a long time to run, then have the event loop hand it off to a worker thread/process.)

Signals is the WORST part of Unix.

Posted Nov 16, 2010 21:08 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Signals are way too complicated and introduce a lot of corner cases. And for no real gain - it's not possible to use signals reliably (in part because signal numbers are shared) beyond simple SIGHUP signaling.

Windows DPC shows us that signals _can_ be done right.

Signals is the WORST part of Unix.

Posted Nov 19, 2010 14:57 UTC (Fri) by Yorick (guest, #19241) [Link] (1 responses)

Windows DPC shows us that signals _can_ be done right.

I'm not very familiar with Windows, but isn't DPC a pure kernel-mode concept rather than something available in userspace? Windows does not appear to believe in pre-empting running userspace threads by user code - an approach that clearly solves some problems but mainly by taking options away from the programmer. This is not necessarily a bad thing, of course.

Of course, since Unix signals are used for so many very different purposes, they cannot and should not be replaced by a single new mechanism.

Signals is the WORST part of Unix.

Posted Nov 19, 2010 17:06 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

It's possible to drop from DPC to userlevel (in effect, replacing the currently executed code). In fact, signals are trivial to implement using DPC.

>Of course, since Unix signals are used for so many very different purposes, they cannot and should not be replaced by a single new mechanism.

Unix signals are MISused for many purposes. They are broken and should be deprecated.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 16, 2010 21:14 UTC (Tue) by jengelh (subscriber, #33263) [Link] (14 responses)

Neil, will you, at some point, release short answers to the exercises?

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 4:28 UTC (Wed) by neilbrown (subscriber, #359) [Link] (13 responses)

Most of the exercises are research questions for which short answers do not exist.
Question 4 (on Xnotify) would result in an article that I would very much like to read. Question 5 could result in a potentially useful slab of code (though it is less clear whether it would be used).

I can tell you what I was thinking of in the "IP protocol suite" questions though, as no-one seems to have taken a stab at those in the comments.

The 'full exploitation' in IP relates to UDP. It is most nearly an application layer (layer 7) protocol (as applications can use it to communicate) yet it is used at multiple levels of the stack - particularly for routing (at least back when we used RIP; BGP uses TCP) which is a layer 3 concern. It is used for VPNs and other network management. And even sometimes for application level protocols.

The "conflated design" in IP is the fact that end-point addresses and rendezvous addresses are equivalent at the IP level. They aren't at higher levels. "lwn.net" is a rendezvous address, but at the IP level you only see 72.51.34.34, which could (in a shared-hosting config) map from several rendezvous addresses. So upper level protocols (like http/1.1) need to communicate the *real* rendezvous address, because IP doesn't.

The "unfixable design" in IP is obviously the tiny address space, which we have attempted to fix by NAT and VPNs etc, but they aren't real fixes. Had IP used a distinct rendezvous address it would have only been needed in the first packet of a TCP connection, so it would have been cheap to make it variable-length and then we might not have needed IPv6 (though that doesn't really address UDP).

So those were my thoughts. I haven't spent as much time fighting with network protocols as I have with the Unix API so I'm a lot less confident of these ideas than of the ones I wrote formally about.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 4:40 UTC (Wed) by dlang (guest, #313) [Link] (4 responses)

just setting the address space to 64 bits instead of 32 bits would have been enough to eliminate the need for IPv6

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 10:00 UTC (Wed) by iq-0 (subscriber, #36655) [Link] (3 responses)

No, it would only eliminate the address-shortage part of the problem. There are more significant changes that really are long due to be made which are also addressed by IPv6. The reasoning is: don't upgrade to half a solution when you know you really must do another upgrade soon after; the cost is in the breaking upgrade, not in the amount of changes.
That is not to say that IPv6 is the holy grail, it's design by committee and as such is probably too different on one front and not different enough on another. And of course it's trial by jury with a terribly large jury, so there is probably not one protocol (now or ever) that would meet all the demands.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 23:23 UTC (Wed) by dlang (guest, #313) [Link] (2 responses)

what are the other problems that IPv6 solves?

at the time it was designed, there were a lot of things that it did that were not possible in IPv4, but most (if not all) of the features that people really care about have been implemented in IPv4

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 18, 2010 8:11 UTC (Thu) by Cato (guest, #7643) [Link] (1 responses)

You are right about many things such as QoS, IPv6, etc.

However, Mobile IP is much better implemented in IPv6 so you don't get inefficient 'triangular routing' - http://www.usipv6.com/ppt/MobileIPv6_tutorial_SanDiegok.pdf

The biggest benefit of course is not having to use NAT for IPv6 traffic.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 18, 2010 13:19 UTC (Thu) by vonbrand (guest, #4458) [Link]

Yep, that's why people are clamoring for NATv6 ;-) (Just as the idiotic firewalling going on has made everything run over HTTP.)

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 18, 2010 4:27 UTC (Thu) by paulj (subscriber, #341) [Link] (7 responses)

That the rendezvous address is (potentially) at a higher level than the end-point address is normal layering. For any given layer that provides some kind of addressing semantics, there can always be another layer above it that implements richer addressing and must map its richer addresses down to the lower layer. That's good and normal.

So to look for conflation in networking addressing you probably need to stay within a layer. E.g. within IP, there is conflation in addressing because each address encodes both the identity of a node and its location in the network. Or perhaps more precisely: IP addressing lacks the notion of identity really, but an IP address is the closest you get and so many things use it for this. This may be fixed in the future with things like Shim6 or ILNP, which separate IP addressing into location and identity. This would allow upper-layer protocols like TCP to bind their state to a host identity, and so decouple them from network location.

Variable length addresses would have been nice. The ISO packet protocol CLNP uses variable length NSAP addresses. However, hardware people tend to dislike having to deal with VL address fields. The tiny address space of IPv4 perhaps needn't have been unfixable - it could perhaps have been extended in a semi-compatible way. However it was decided (for better or worse) a long time ago to create IPv6.

Possibly another problem with IP, though I don't know where it fits in your list, is multicast. This is an error of foresight, due to the fact that multicast still had to be researched and it depended on first understanding unicast - i.e. IP first had to be deployed. The basic problem is that multicast is bolted on to the side of IP. It generally doesn't work, except in very limited scopes. One case is where it can free-ride on existing underlying network multicast primitives, i.e. ones provided by local link technologies. Another is where a network provider has gone to relatively great additional trouble to configure multicast to work within some limited domain - needless to say this is both very rare and even when done is usually limited to certain applications (i.e. not available generally to network users). In any new network scheme one hopes that multicast services would be better integrated into the design and be a first-class service alongside unicast.

Another retrospectively clear error is IP fragmentation. It was originally decided that fragmentation was best done on a host by host basis, on the assumption that path MTU discovery could be done through path network control signalling and that fragmentation/reassembly was a reasonably expensive process that middle-boxes ought not to be obliged to do. IMO this was a mistake: path MTU signalling turned out to be very fragile in modern deployment (IP designers didn't anticipate securo-idiocy); it turned out fragmentation/reassembly was relatively cheap - routers routinely use links both for internal buses and external connections which require fragmenting packets into small fixed size cells. As a consequence of the IP fragmentation choices, the IP internet is effectively limited to an (outer) path MTU of 1500 for ever more, regardless of changes in hardware capability. This causes problems for any IP packet protocol which wants to encap itself or another. One imagines that any new network scheme would learn from the IP MTU mess, make different trade-offs and come up with something better and more robust.

We should of course be careful to not overly condemn errors of foresight. Anticipating the future can be hard, particularly where people are busy designing cutting-edge new technology that will define the future. ;)

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 18, 2010 8:13 UTC (Thu) by Cato (guest, #7643) [Link] (5 responses)

One performance improvement of IPv6 is that it has a much more regular IP header structure, which involves a lower cost for hardware-based forwarding.

http://en.wikipedia.org/wiki/IPv6#Features_and_difference... has a good summary of the benefits of IPv6 including this one.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 19, 2010 1:15 UTC (Fri) by dlang (guest, #313) [Link] (4 responses)

that sounds like something that was a really big deal when IPv6 was created, but with the increased processor speeds we have now, not nearly as important.

this isn't just that clock speeds are higher, but that the ratio of clock speeds to the system bus speeds is no longer 1:1; this means that it's possible to execute far more steps without slowing the traffic down.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 19, 2010 11:15 UTC (Fri) by job (guest, #670) [Link]

If you're switching IP in hardware, it makes your design simpler and faster.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 19, 2010 11:41 UTC (Fri) by Cato (guest, #7643) [Link] (2 responses)

IP routers have not done CPU-based forwarding as the main path for a long time - the largest one is probably the Cisco CRS-3 which forwards 322 terabits per second when fully scaled (http://newsroom.cisco.com/dlls/2010/prod_030910.html), but even quite low end routers now also use hardware forwarding (i.e. ASICs, network processors, etc, not CPU).

You can probably manage to forward anything in hardware, but it helps somewhat that IPv6 has a regular header design.

IPV6 and hardware-parseable IP headers

Posted Nov 19, 2010 23:26 UTC (Fri) by giraffedata (guest, #1954) [Link]

I don't think CPU speed per se (how fast a single CPU is) is relevant. It's all about cost, since most IP networks are free to balance the number of CPUs, system buses, network links, etc.

And from what I've seen, as the cost of routing in a general purpose CPU has come down, so has the cost of doing it in a specialized network link processor (what we're calling "hardware" here) -- assuming the IP header structure is simple enough. So today, as ten years ago, people would rather do routing in an ASIC than allocate x86 capacity to it.

I think system designers balance system bus and CPU speed too, so it's not the case that there are lots of idle cycles in the CPU because the system bus can't keep up with it.

Ghosts of Unix past, part 3: Unfixable designs

Posted Dec 3, 2010 9:05 UTC (Fri) by paulj (subscriber, #341) [Link]

FWIW, major router vendors still use software routing for many of their lower-end enterprise routers, even some mid-range.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 19, 2010 11:18 UTC (Fri) by job (guest, #670) [Link]

Path MTU discovery is indeed broken in practice.

One thing I never really understood is why TCP MSS is a different setting from MTU. Given the belief that the MTU could be auto detected, MSS could be deduced from it.

Perhaps someone can enlighten me?

Ghosts of Unix past, part 3: returning -1 for system call failure

Posted Nov 16, 2010 21:21 UTC (Tue) by jhhaller (guest, #56103) [Link] (1 responses)

One of the other problems resulting from the early Unix implementation is the return code of -1 for failed system calls, with errno set as a side effect. If I remember correctly, this was a side effect of the PDP-11 instruction set, with the actual interface being a condition code and a single register return. If the condition code was set, the return register was errno; otherwise, it was the return value of the system call. Because of the limited data in the function return, it was impossible to give partial results when a system call failed. This has led to such things as failed writes not being able to report how many bytes were written before the failure, as well as the signal/select/poll problems described in the article. While the condition code was a good idea, allowing fast failure checking, the translation into the C library interface and the single return code was a problem.
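
To illustrate, here is a sketch of the convention being described (not actual libc source; modern Linux kernels encode errors as small negative values where the PDP-11 used the carry flag):

#include <errno.h>

/* Illustrative only: how a C library might fold the kernel's raw
 * return convention into "-1 plus errno".  Everything except the
 * error code is discarded, so there is no room for partial results. */
static long wrap_syscall(long raw)
{
    if (raw < 0 && raw > -4096) {  /* the kernel signalled an error */
        errno = -raw;              /* error code goes out the side channel */
        return -1;                 /* any partial progress is lost */
    }
    return raw;                    /* success: the real return value */
}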

Ghosts of Unix past, part 3: returning -1 for system call failure

Posted Nov 19, 2010 23:48 UTC (Fri) by giraffedata (guest, #1954) [Link]

I don't know if it was an accident due to PDP-11 practicality or just good design philosophy, but I very much appreciate the convention in Unix of not returning information when a system call fails. I.e. a failure is a failure. If you get back useful information, or the system changes state, it's not a failure, but a different kind of success.

So I guess you're saying if a read of 10 sectors fails due to a media error on the 5th sector, you'd like to see the result, "failed due to media error, but read the first 4 sectors." I like Unix's version much better: instead of requesting 10 sectors, you request "up to 10 sectors" and it succeeds with "4 sectors read", and the next read truly fails.
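
In code, the semantics being praised look roughly like this (a sketch):

#include <errno.h>
#include <unistd.h>

/* Sketch: read() returns however many bytes it could transfer; it
 * only fails when it can make no progress at all, so the caller
 * never has to untangle a "partial failure". */
ssize_t read_some(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);        /* may return fewer than len bytes */
    } while (n < 0 && errno == EINTR); /* interrupted early: just retry */
    return n;                          /* >0 progress, 0 EOF, <0 real failure */
}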

Things that fail but don't fail are much harder to program to. They engender mistakes and convoluted code. "Failure" has the special implication that you can probably just stop thinking about it and give up on whatever you were doing. They are the inspiration for exception throwing in programming languages.

Partial answer to #2

Posted Nov 16, 2010 23:03 UTC (Tue) by jamesmrh (guest, #31622) [Link]

Per-user /tmp has been implemented with Linux namespaces for use with SELinux kiosk mode. It's configurable via PAM (see pam_namespace(8)) and not limited to any particular use.

The fs namespace ideas came from Plan 9, but weren't really useful until integrated with PAM.
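
For the curious, pam_namespace reads its policy from /etc/security/namespace.conf; a minimal per-user /tmp entry looks roughly like this (treat the exact fields and values as illustrative):

# polydir   instance-prefix   method   exempt-users
/tmp        /tmp-inst/        user     root,adm

The "user" method gives each user their own private instance mounted over the shared /tmp.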

Null-Terminated Strings

Posted Nov 17, 2010 0:13 UTC (Wed) by ldo (guest, #40946) [Link] (19 responses)

That's my number-one example of an unfixable design crock in *nix systems and things that derive from them (like C).

Null-Terminated Strings

Posted Nov 17, 2010 3:23 UTC (Wed) by pr1268 (subscriber, #24648) [Link] (18 responses)

Do you have a better suggestion? Pascal-style strings?

While I agree that C-style strings are bothersome at times, there just doesn't seem to be any better alternative. And never mind that Java's Strings are hideously inefficient (but again, is there a better way?).

I don't mean to argue; I'm just playing devil's advocate here. I honestly don't know myself whether there could have been a better way to do character strings way back in the day.

Null-Terminated Strings

Posted Nov 17, 2010 4:37 UTC (Wed) by neilbrown (subscriber, #359) [Link] (13 responses)

I actually think nul terminated strings are simple and elegant and work.

The problem is that strcpy and strcat and sprintf should never have existed. strlcpy etc. are much better interfaces when you have static or preallocated buffers.
If you want dynamic strings, then talloc_strdup and talloc_strdup_append etc. (in libtalloc) are probably your friends, though I confess I haven't used them extensively.
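
For example, a sketch of the fixed-buffer idiom (strlcpy comes from the BSDs; on Linux it lives in libbsd):

#include <string.h>   /* strlcpy on the BSDs; <bsd/string.h> with libbsd */

/* strlcpy cannot overflow dst and always nul-terminates (for a
 * non-zero size); a return value >= the buffer size means the
 * source was truncated rather than silently corrupting memory. */
void set_name(char *dst, size_t dstsz, const char *src)
{
    if (strlcpy(dst, src, dstsz) >= dstsz) {
        /* src did not fit: handle the truncation explicitly */
    }
}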

strlcpy

Posted Nov 17, 2010 6:26 UTC (Wed) by ncm (guest, #165) [Link]

Sorry, strlcpy is a failure. It takes more and uglier code to use it correctly than to use strcpy with the same level of checking. As a consequence, it is rarely used correctly, and unprofitably when it is. This is not to say that one cannot improve on strcpy, just that strlcpy doesn't.

Null-Terminated Strings

Posted Nov 17, 2010 8:26 UTC (Wed) by nix (subscriber, #2304) [Link] (6 responses)

nul terminated strings have a huge problem: you can accidentally overwrite the nul, and then you're dead. But if you're doing that, you can accidentally overwrite bits of the inside of the string as well, and then you have wrong results! Is that better? Probably not. Oh, and overwriting off the start or end can break your memory allocator or stack frame anyway, so you'd be dead in any case, even if not using nul-terminated strings.

And then we have Pascal-layout strings (as opposed to actual Pascal 'strings', a nightmare for other reasons, see Kernighan). They don't fix this problem (you just have to overwrite the start of the string, not its end) and have two much bigger problems: finite string length, and an increase in size of every string. The finite string length means that writing general string-handling algorithms without special cases for the rare event of large strings is impossible, and the increase in size of every string bloats small strings, which are by far the common case. You can patch both of these: the first, by making the finite string length as large as a pointer; and the second, by noting that alignment constraints in existing systems bloat the effective size of strings anyway. But of course this soon turns into a special case of nul-terminated strings: point the pointer at the end of the string, bingo, one rather hard-to-consult nul by any other name.

The biggest downside is probably a long-term ABI problem. The scheme is inflexible. If your Pascal string-length header is too short, however do you expand it? It's wired into every string-using program out there! At least nul-terminated strings need no expansion.

The real solution to string-handling unfortunately requires a VM of some description which can prevent the program from accidentally overwriting fields in aggregates by writes to any other field or variable. Then you can do reliable Pascal strings, separating the length from the content, or reliable null-terminated strings, with the separate compartment containing a pointer into the string. Unfortunately this is incompatible with low-level all-the-world's-a-giant-arena languages like C without very specialized fine-grained MMU hardware.

(I have, like everyone, written my own dynamic string-handling library when younger. It starts out simple, but it's amazing how soon you have to introduce extra code to track pointers and make freeing them in error cases less verbose, and extra code to track memory leaks... you need that in C anyway of course, but the massive increase in dynamic memory use that dynamically allocating most strings brings tends to force them on you sooner than otherwise.)

Null-Terminated Strings

Posted Nov 17, 2010 12:38 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

"The biggest downside is probably a long-term ABI problem. The scheme is inflexible. If your Pascal string-length header is too short, however do you expand it? It's wired into every string-using program out there! At least nul-terminated strings need no expansion."

Come on.

You'd naturally use 32 bits for the string length on 32-bit systems. And by a strange coincidence, that's the maximum amount of contiguous RAM that you can address on a 32-bit system. On 64-bit systems, you'd naturally use a 64-bit counter.

Null-Terminated Strings

Posted Nov 18, 2010 16:40 UTC (Thu) by nix (subscriber, #2304) [Link] (3 responses)

Yes, I talked about that as well. Nice to know that reading five paragraphs of text is too much for you.

Null-Terminated Strings

Posted Nov 18, 2010 17:04 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

I read it completely. However, your point about "The biggest downside is probably a long-term ABI problem. The scheme is inflexible. If your Pascal string-length header is too short, however do you expand it? It's wired into every string-using program out there! At least nul-terminated strings need no expansion" is not correct.

Null-Terminated Strings

Posted Nov 25, 2010 13:10 UTC (Thu) by renox (guest, #23785) [Link] (1 responses)

I agree with him that Pascal's strings are inflexible: think about two computers communicating, one with a 32-bit CPU and one with a 64-bit CPU. If you use a machine word as the length, you have an issue with Pascal's strings,
but C strings don't care.

Null-Terminated Strings

Posted Nov 25, 2010 16:22 UTC (Thu) by vonbrand (guest, #4458) [Link]

No, you haven't... (Original) Pascal "strings" were just (packed) arrays of characters of a fixed length.

/me ducks and runs for cover

Null-Terminated Strings

Posted Nov 19, 2010 11:24 UTC (Fri) by job (guest, #670) [Link]

A modern string handling library would have to handle different character sets and different encodings as well, so there's already metadata to be stored with every string.

If memory efficiency is a problem for you, multibyte encodings are a much worse problem than storing string length. But UTF-8/16 is here to stay; there is simply no competition. I think we have to accept it.

Null-Terminated Strings

Posted Nov 18, 2010 9:50 UTC (Thu) by stijn (guest, #570) [Link]

Does it not give an easy fuzzing attack? For anything that parses an input stream, the presence of nul bytes in that stream can lead to very unpredictable results unless one is really careful. Additionally, it is painful to have a string version (str) and a byte array version (mem) of everything, especially with a richer API (e.g. splice(), substr(), squash()). I've come to the conclusion that keeping length alongside the array is the only sane solution. Perhaps that already commits it too much down one path, so that it does not properly belong in the C library. By now I think the best is to have a byte-array API, and leave it up to the user of that API whether they want to keep it C-string compatible. If the keeping-length overhead is unacceptable, it is possible to do the string manipulations painlessly with the more generic API, and isolate a classic C-string as the very last step.

Null-Terminated Strings

Posted Nov 18, 2010 15:38 UTC (Thu) by etienne (guest, #25256) [Link] (3 responses)

> I actually think nul terminated strings are simple and elegant and work.

And you can also combine them to do things like:
#include <stdio.h>

enum {lang_english, lang_french, lang_german} current_language = lang_french;
const char mltstr_language[] = "english\0francais\0deutch\0";
const char *curlang(const char *mltstr)
{
    /* select the right sub-string depending on current_language */
}
void fct(void)
{
    printf("LANG=%s", curlang(mltstr_language));
}
It saves *a lot of space*; having strings, (aligned) pointer arrays everywhere, and, worse, (aligned) sizes for Pascal strings easily takes more memory than the program code and data altogether.

Null-Terminated Strings

Posted Nov 18, 2010 17:24 UTC (Thu) by pr1268 (subscriber, #24648) [Link] (2 responses)

I like your code example, but it might only work in C (not C++).

Two cases in point:

  • Using the enum value as an array index might give unpredictable results since C++ treats enumerations as a distinct type (instead of int as in C)[1]
  • The C++ standard library string can have '\0' characters anywhere inside the string (which may also lead to unpredictable behavior at runtime)[2]. Of course, you're referring to a C-style string, so this may be a moot point.

[1] Stroustrup, B. The C++ Programming Language, Special Edition, p. 77
[2] Ibid, p. 583

Null-Terminated Strings

Posted Nov 19, 2010 10:58 UTC (Fri) by etienne (guest, #25256) [Link]

The enum is only used like this (so that an empty substring defaults to english), so there is no problem with its C++ size:

const char *curlang(const char *mltstr)
{
    const char *ptr = mltstr;
    for (unsigned cptlang = 0; cptlang < current_language; cptlang++)
        while (*ptr++) {}
    return (*ptr) ? ptr : mltstr;
}

Obviously none of the substrings can have an embedded zero char.

A C++ line of code like:
cout << "The " << (big ? "big " : "small ") << "dog is " << age << " year old.";
needs efficient storage for small strings, even more so when writing multi-language software.

Null-Terminated Strings

Posted Nov 20, 2010 1:03 UTC (Sat) by cmccabe (guest, #60281) [Link]

> I like your code example, but it might only work in C (not C ).

Sorry, you are confused. It works in both C and C++.

> Using the enum value as an array index might give unpredictable results
> since C++ treats enumerations as a distinct type (instead of int as in C)

Nope.

Here the enum is promoted to an integer. C++, like C, promotes a lot of types to integers under the right situations.

> The C++ standard library string can have '\0' characters anywhere inside
> the string (which may also lead to unpredictable behavior at runtime). Of
> course, you're referring to a C-style string, so this may be a moot point.

There is no std::string in this example. You are confused.

Null-Terminated Strings

Posted Nov 18, 2010 12:44 UTC (Thu) by Kwi (subscriber, #59584) [Link] (2 responses)

A better suggestion might be D-style strings, which are dynamic arrays of char. In D, a dynamic array is a (pointer, length) tuple. This gives you the ability to work on substrings without having to allocate new memory, since a substring is nothing more than a new reference to the same character data.
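
In C terms the representation looks roughly like this (an illustrative sketch, not D's actual ABI):

#include <stddef.h>

/* A (pointer, length) string: a substring is just a new view into
 * the same character data, so no allocation or copying is needed. */
struct dstr {
    const char *ptr;
    size_t      len;
};

static struct dstr dstr_slice(struct dstr s, size_t from, size_t to)
{
    struct dstr sub = { s.ptr + from, to - from };  /* shares storage */
    return sub;
}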

(Incidentally, Java strings work the same way behind the scenes, but are immutable, which I guess is what you object to when you call them inefficient?)

Of course, one problem with this suggestion is that it doubles the size of a string reference (8 bytes on 32-bit architectures, 16 bytes on 64-bit architectures).

Null-Terminated Strings

Posted Nov 18, 2010 16:14 UTC (Thu) by pr1268 (subscriber, #24648) [Link] (1 responses)

> (Incidentally, Java strings work the same way behind the scenes, but are immutable, which I guess is what you object to when you call them inefficient?)

Exactly. And, my semi-rhetorical question immediately after that ("is there a better way?") begs the question of whether the Sun engineers who developed the Java language imposed that immutability for thread safety (since thread safety was/is a primary goal of the Java language). I don't know for sure; just going off intuition here.

Null-Terminated Strings

Posted Nov 19, 2010 10:07 UTC (Fri) by mfedyk (guest, #55303) [Link]

python has this immutable storage for values as well, but then they stopped and made the GIL...

Null-Terminated Strings

Posted Nov 18, 2010 18:03 UTC (Thu) by nevyn (guest, #33129) [Link]

I do: http://www.and.org/ustr/

I think it solves almost all the "normal" problems people have with non-nil terminated strings:

1. You can easily allocate them on the stack.

2. You can easily allocate them in constant memory.

3. "" and "x" don't have overhead in the 1,000% range (depending on how you count).

...but it still has good solutions to the problems of nil-terminated strings, in that it allows you to know both the allocated size and the length used (and to put \0 in your string).

Saying that, the solution was far from obvious ... so while I think it would have been usable in the 1970s, using NIL terminated strings was much more obvious.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 10:27 UTC (Wed) by rusty (guest, #26) [Link]

I've always found signalfd a bit confusing. After all, the standard method of handling signals in server programs has long been to write to a pipe in the signal handler. Otherwise you need pselect and related nonsense. But for normal signals, signalfd didn't add anything except performance.
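
(For readers who haven't seen it, the pipe method is roughly this - a sketch, error handling omitted:)

#include <signal.h>
#include <unistd.h>

/* The self-pipe trick: the handler only writes a byte, so the main
 * loop's select()/poll() wakes up on the pipe's read end and deals
 * with the signal synchronously, alongside its other fds. */
static int sigpipe_fds[2];         /* created with pipe() at startup */

static void on_signal(int signum)
{
    char c = (char)signum;
    (void)write(sigpipe_fds[1], &c, 1);  /* write() is async-signal-safe */
}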

Cheers,
Rusty.


Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 17, 2010 23:14 UTC (Wed) by Yorick (guest, #19241) [Link] (3 responses)

The articles in this series progress very neatly from minor inelegancies to serious mistakes. We can only speculate what unspeakable horrors the next installment will feature for our entertainment. The TTY system, perhaps? STREAMS? Or maybe telldir/seekdir?

telldir/seekdir

Posted Nov 18, 2010 16:36 UTC (Thu) by pr1268 (subscriber, #24648) [Link] (2 responses)

Just curious, why do you consider telldir()/seekdir() to be "unspeakable horrors"? Their interfaces are simple and straightforward, and even their manual pages are easy to read/understand.

telldir/seekdir

Posted Nov 18, 2010 17:19 UTC (Thu) by foom (subscriber, #14868) [Link] (1 responses)

Because of the requirements they impose on filesystem implementations: what happens when you add/delete files from the directory you have open while holding a saved position from "telldir"? How can you stuff enough information into a "long" to allow a stable iteration position in the face of concurrent modification of the directory contents? It's just a pain in the ass to implement.

And it's so tricky and so unused, that the implementation was actually horribly broken from its inception in BSD until 2008, 25 years later!

http://www.vnode.ch/fixing_seekdir

telldir/seekdir

Posted Nov 18, 2010 17:31 UTC (Thu) by pr1268 (subscriber, #24648) [Link]

Ahh yes, I almost forgot about those cases. In fact, there was some discussion here on LWN about the difficulties of these in the context of UnionFS. Thanks for jogging my memory.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 8:03 UTC (Thu) by zmi (guest, #4829) [Link] (21 responses)

It's always very amusing to read about problems with permissions and ACLs within Unix and Windows. I cannot understand why developers don't look at the model Novell developed for their NetWare systems back in the 1980s.

In Netware, you could quickly define an unlimited number of users/groups to a dir/file with any privileges that should be available, and you are finished. The filesystem did *not* have to go down to every file and write that ACL/permission there.

I remember we had a big hierarchical tree, with every department having their working dir, and within that we could define another department as having access to some subdirs if wanted. Like this, everything was secure, and every needed access was quickly possible.

Really, if someone would use that approach for a Linux filesystem, the world would be easier and better. Maybe the btrfs devs will read this; they should look at NetWare 3, which already had this neat ACL solution.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 9:01 UTC (Thu) by Fowl (subscriber, #65667) [Link] (10 responses)

Did I miss the explanation of why Windows ACLs are so horrendous?

A list of users/groups and their permissions on this object (and optionally its children) seems pretty straightforward to me.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 9:41 UTC (Thu) by zmi (guest, #4829) [Link] (9 responses)

> Did I miss the explanation of why Windows ACLs are so horrendous?

Maybe you've never had a big file system. Take this example:

Company with 900 employees, 40TB storage in about 40 departments, with a total of 100 million files.
Now you have a hierarchical structure, each dept. has its own dir, and below that you have other dirs shareable with other depts.

And then someone needs to set a new permission for a top-level dir. Both Unix and Windows ask to write those permissions to all files below that dir. If there are 10 million entries, the session will be blocked for a pretty long time.

Novell's NetWare didn't have that: set a new permission, and it was done within a second. I don't know how they stored permissions, but it never depended on the amount of data below that dir.

Also, in Unix and Windows there's a mix of permissions from a share and permissions on a file. In NetWare you assigned a right, and that's it. Much easier to review.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 10:14 UTC (Thu) by Fowl (subscriber, #65667) [Link] (8 responses)

> If there are 10 million entries, the session will be blocked for a pretty long time.

Ah, I get you. It's the implementation that's the problem, not the concept.

> Both Unix and Windows ask to write those permissions to all files below that dir.

I'm fairly certain Explorer (the Windows shell) uses the most naive method possible for applying permissions.

> Also, in Unix and Windows there's a mix of permissions from a share and permissions to a file. In Netware you assigned a right, at that's it. Much easier to review.

For as long as I can remember, giving full access to shares to "Everyone" and then using filesystem permissions has been the recommended practice. It is useful occasionally for enforcing "no remote access" policies, etc.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 10:42 UTC (Thu) by zmi (guest, #4829) [Link] (4 responses)

> For as long as I can remember giving full access to shares to "Everyone", and then using filesystem permisions has been the recommended practise.

And it brings the feature "you see the share, but clicking on it tells you you can't access it". Again it's the implementation that's wrong: if I have no right on it anyway, don't display it. Seems to be laziness on the part of the programmers to have chosen this way.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 13:55 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (2 responses)

And it brings the feature "you see the share, but clicking on it tells you you can't access it". Again it's the implementation that's wrong: if I have no right on it anyway, don't display it. Seems to be laziness on the part of the programmers to have chosen this way.

Counterpoint: /bin/ls lists the names of directories not owned by the user it's running as whose access control mode is 0700 (user rwx, all others forbidden).

Access Control: take them from Novell Netware

Posted Nov 18, 2010 14:04 UTC (Thu) by dskoll (subscriber, #1630) [Link] (1 responses)

Counterpoint: /bin/ls lists the names of directories not owned by the user it's running as whose access control mode is 0700 (user rwx, all others forbidden).

Which is perfectly correct behavior according to the way UNIX permissions are defined. The ability to list names in a directory is controlled only by the r bit of the directory itself.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 14:16 UTC (Thu) by zmi (guest, #4829) [Link]

ls is a technical unix command, not a user tool (which "clicki-clicki" mouse user knows ls?). ls must show everything, and it follows the kiss principle (keep it small and simple).

Using a graphical dir browser like Dolphin to hide such unreadable contents would be nice, as normally users don't need to see that. It should be a config option.

Browsing a server over the network is a roughly 20-years-younger "command", solving completely different needs, and it would help security a bit if shares that are not accessible were not seen by a user. But by the time Microsoft reinvented networking, they did not have the slightest clue about security (and I'd say that only started with Win7, where a user can work as a user, not admin). Maybe we'll see that improvement once someone at Microsoft gets the idea. Or maybe the Samba team can implement a setting to hide this, and MS later adopts it as it's clever.

Access Control: take them from Novell Netware

Posted Nov 21, 2010 0:27 UTC (Sun) by Fowl (subscriber, #65667) [Link]

That issue seems completely unrelated.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 17:57 UTC (Thu) by davecb (subscriber, #1574) [Link]

Also from Multics: it is relatively parsimonious, and has the concept of "initial acls", so it really only needs to store acls that are different from the iacl (or from the base acl of the tree).

--dave

Access Control: take them from Novell Netware

Posted Nov 18, 2010 20:38 UTC (Thu) by jra (subscriber, #55261) [Link] (1 responses)

> Ah, I get you. It's the implementation that's the problem, not the concept.

No, with Windows ACLs it's the concept.

Look at this:

http://www.pcguide.com/ref/hdd/file/ntfs/secRes-c.html

as an example. Explain that to a user. Don't forget to include why the sort order of DENY ACEs depends on where in the file hierarchy they came from.

Good luck ! :-)

Jeremy.

Access Control: take them from Novell Netware

Posted Nov 21, 2010 0:31 UTC (Sun) by Fowl (subscriber, #65667) [Link]

You recurse up the tree, until you find an applicable entry, with deny taking precedence over allows.

How is that complicated?

Access Control: take them from Novell Netware

Posted Nov 18, 2010 9:05 UTC (Thu) by Fowl (subscriber, #65667) [Link] (9 responses)

(Oops, wrong reply button.)

---

Could you explain the Netware model a bit more? It just sounds like ACLs to me.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 10:38 UTC (Thu) by zmi (guest, #4829) [Link] (8 responses)

Yes, but not stored within the filesystem, I believe. It never took more than a millisecond to assign a right, no matter how much data was below this dir.

Also, the way you assigned rights was simple: take an object (user, group, department, etc.), assign it to a dir with rights, and specify if it's for subdirs as well or only this dir.

And when you didn't have a right on a dir, you didn't even see it. I dislike the Windows approach of seeing a share, and upon click you get the info "no permission". That's just stupid.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 11:09 UTC (Thu) by dgm (subscriber, #49227) [Link]

That would be a nice addition to file managers like Nautilus or Dolphin. Not even showing whatever you cannot open would remove much clutter.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 18:43 UTC (Thu) by jeremiah (subscriber, #1221) [Link] (1 responses)

So permissions were only on the directory? And there were no file-specific ACLs?

Access Control: take them from Novell Netware

Posted Nov 18, 2010 22:06 UTC (Thu) by zmi (guest, #4829) [Link]

You could make file ACLs also. But if you specified a dir ACL, it was taken for each file in that dir automatically. That makes sense, as most things are done on a per-dir basis anyway, right? At least, if you have a system supporting it that way, you automatically use that approach to order things in directories, as it makes life - and administration! - much easier.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 19:05 UTC (Thu) by jeremiah (subscriber, #1221) [Link] (4 responses)

One ACL approach I tried to take once, but the product got dropped before I could see the problems was the following:

you had a file/object, and a list of permissions/security attributes for each object. An object could be a group of objects, but group depth was not a concern. Multiple applications (controlled by us) could access the permissions and make decisions based on what they found. If there was a permission that they didn't understand, access was not allowed. This was a situation where we could trust the apps, and not the people. We also took the approach that permissions were subtractive. Everything started as readable/writable, and access could only be removed. The nice thing about this was that it was extendable.

This isn't relevant to Novell ACLs; just trying to get people's thoughts.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 22:09 UTC (Thu) by zmi (guest, #4829) [Link] (3 responses)

From a security point of view, I don't like it. A system should deny everything, and only allow what I explicitly allow. The "default everybody everything yes" way you describe is so Windows, and it's for this reason most viruses are for this system today.

Access Control: take them from Novell Netware

Posted Nov 18, 2010 23:41 UTC (Thu) by jeremiah (subscriber, #1221) [Link] (1 responses)

but it seems much harder to administer the other way around. Once something is marked as inaccessible, that's it: you get to stop looking. Whereas it seems like when something is marked as visible you have to establish some sort of hierarchy in case a parent thinks it shouldn't be visible, which would be indicated by nothing being set. Or you run into a situation like Unix, where you have permissions going in either direction and you have to again determine which overrides which. I guess that would be a fail-safe as opposed to a fail-open, though, which I prefer. But SELinux is a clear demonstration of how complicated things can get if you do it in a complete fashion, starting with the idea that everything is hidden from everything first, and then transitions are made between them. Yet they bail when it comes to initrc, and almost mark everything as visible first.

Access Control: take them from Novell Netware

Posted Nov 19, 2010 13:19 UTC (Fri) by jeremiah (subscriber, #1221) [Link]

I feel the urge to clarify my initrc comment. Although it's been a while since I dealt with it, here's what I remember, and some context. I run a payment gateway, so we decided to use SELinux to enforce a true division of roles. We made root a second class citizen to the role a user belonged to. The most difficult part of doing this was that root could transition through rpm_t into initrc_t into any other role on the system. The idea, I think, being that root should be able to install packages, and packages, if they were related to a service, should be able to restart themselves. This had the unwanted effect of giving root the ability to transition to just about anything. Trying to remove the 20 bazillion independent transition paths took a hard 2 weeks. This was with the reference policy, and not a vendor supplied policy, which is much more strict than the strict policy. What it really boiled down to, is what it always boils down to in the end. That delicate balance between usability, and security. In the end it was doable, but it wasn't easy.

I think SELinux is amazingly complete. It allowed us to implement a solution that always requires 2 users, from a group of 3. You throw LUKS, encrypted drives, and removable media into the mix, and you have as close to a bullet proof scenario as possible. On the other hand, I don't want to have to write code that the average admin can't administer without spending a month dealing with a sharp learning curve.

Like a lot of us here, I'm a developer and a system administrator. When I have my development hat on I try to think of the user and what they have to put up with, while balancing it with security requirements etc. As an administrator, I know I'm willing to tolerate more than most users. The difficult part for me is defining my target audience, understanding their abilities and tolerance, and shooting for that. And sometimes the perfect solution has to be hobbled security-wise, or the product won't sell. The only way I've found to begin addressing that is through intelligent defaults and meaningful dialogs/user interaction.

I am intrigued by the Netware ACL's though, since you seem to have found a happy place when dealing with them as opposed to other permission systems. Thanks for the input.

Access Control: take them from Novell Netware

Posted Nov 21, 2010 0:35 UTC (Sun) by Fowl (subscriber, #65667) [Link]

The reason that most viruses are for Windows is the user, plain and simple: the huge number of "users". </OT>

If you don't find a specific ACE allowing you access, you don't have access.

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 18, 2010 23:11 UTC (Thu) by skissane (subscriber, #38675) [Link] (2 responses)

Regarding permissions, I think they ideally should not be stored with the files, but in some kind of separate security database. Then, one can easily review and manage them in aggregate. Much safer than littering the file system with them.

I don't think evaluating access control should belong in the kernel. Policy questions like this should be handled by a trusted user-space process. When a process tries to access a resource, the kernel should ask the security daemon if it is OK. The security daemon can do whatever you want to answer that, and then just tell the kernel permit or deny. There could be several different security daemons to choose from, depending on needs. A smartphone, or a home user's desktop, will have very different needs from a server operated by an intelligence agency - rather than one system to fit all needs, multiple systems for different needs may be better.

In principle, what I am suggesting is similar to how I understand RACF (and TopSecret and ACF2) work on IBM mainframes. (OK, not all the associated mainframe-warts, but I think the basic idea is good.) The security software is an add-on to the base OS, the base OS just exposes standard hooks for the security software to integrate with.

To avoid the performance hit of constantly context-switching to the security daemon, the kernel should have some kind of cache. That way, the first time a process tries to access a resource, the kernel asks the security daemon for an OK. If the daemon says yes, then the kernel remembers and doesn't need to ask again. If everything is a file descriptor, a good time to do this would be at OPEN time - the kernel asks once, when the FD is opened, what access will be permitted for that FD, then remembers that for the duration of the FD.

I suppose this is the classical capability architecture - if everything is an FD (or in Windows-ese, a "handle"), then we can merge the concepts of FDs with capabilities. A security daemon is then used at FD/handle creation time to determine what the FD/handle can do for its lifetime. It might need to be consulted occasionally once the FD is created - e.g. is it OK to pass this FD to another process? Also, if the security permissions are revoked while the FD is open, the daemon should be able to ask the kernel to forcibly close, or downgrade the rights of, an FD it earlier approved to be opened.
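
The shape of such an upcall might look roughly like this (entirely hypothetical - no such kernel interface exists):

#include <sys/types.h>

/* Hypothetical query the kernel would send to the security daemon
 * once, at open() time; the verdict is then cached in the resulting
 * FD, and the daemon is consulted again only for special events
 * (passing the FD to another process, revocation). */
struct access_query {
    pid_t requester;       /* which process is opening */
    uid_t uid;             /* as which user */
    int   wanted;          /* requested access, e.g. read/write bits */
    char  resource[256];   /* which object */
};

enum access_verdict { VERDICT_DENY = 0, VERDICT_ALLOW = 1 };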

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 20, 2010 0:32 UTC (Sat) by giraffedata (guest, #1954) [Link]

You've hit the nail on the head better than any of the other comments or the article itself, by talking not about what the right permission scheme for all future applications is, but a fixable design that lets us recover if we pick the wrong permission scheme today.

Incidentally, I think the separateness of RACF happened out of necessity more than architecture. The filesystem formats were already cast in stone with no concept of permissions whatsoever in them. There was no concept of a user identity either. I don't know if designers of RACF considered building all that into the supervisor code and felt it would be less fixable that way or just that it would be harder, but I do like the result.

RACF and its alternatives also encompass resources other than files.

I think there are plentiful examples of this on Linux too, but I don't follow those things. SELinux? AppArmor?

Ghosts of Unix past, part 3: Unfixable designs

Posted Nov 25, 2010 20:18 UTC (Thu) by slashdot (guest, #22014) [Link]

Linux does this, although security is handled by kernel modules ("LSM"s) instead of daemons.

AppArmor has separately stored policy, while SELinux has separately stored policy which is, however, automatically baked into the filesystem.

The real problem Linux has is that nobody seems to have the interest, authority and/or ability to figure out the optimal security model to use, so there are several ones in wide use, but none is actually polished and widespread.

Also, security UI and user-friendliness work seems quite lacking, with the result that advanced security often gets just turned off and even if enabled, only distribution-provided policies tend to be used.

Six bytes?

Posted Nov 26, 2010 13:16 UTC (Fri) by Ross (guest, #4065) [Link] (3 responses)

I think you mean the basic rwxrwxrwx permission bits. That's two bits less than a byte, not six bytes :)

Six bytes?

Posted Nov 26, 2010 15:02 UTC (Fri) by cladisch (✭ supporter ✭, #50193) [Link] (2 responses)

I'd really like to know on what architecture bytes have eleven bits. ;-)

The user and group bits wouldn't make much sense without the corresponding IDs.

Six bytes?

Posted Nov 29, 2010 19:57 UTC (Mon) by Ross (guest, #4065) [Link] (1 responses)

True, I failed to multiply 3 by 3 :) So it's nine bits. The suid, sticky, and sgid bits make it twelve if you count them -- one and a half bytes.

However, if you are going to count the uid and gid, then it started with 2 + 2 + 1.5 = 5.5 bytes (which sounds like the count in the article, I agree), but it moved to 4 + 4 + 1.5 = 9.5 bytes. Since the point was that this was something which can't be changed (and hasn't been, except for POSIX ACLs, which are mostly ignored), I don't think that's what the author meant.

But maybe it's true. He seems to be responding to comments, so he can clear it up very easily.

Six bytes?

Posted Nov 29, 2010 21:20 UTC (Mon) by neilbrown (subscriber, #359) [Link]

Yes, that is what I meant.

As you say, permission information only uses 5 bytes and 1 bit (setuid etc. are not permission bits; they are really part of the 'type' of the object and so are in some ways more closely related to S_IFREG etc.). Being that precise in the article would have been excessive, I think. It is still true that the permissions were stored in 6 bytes; it is just that some room was left over for the file type as well.

POSIX ACLs may well be mostly ignored, but ACLs are still the only direction being explored for making the permission model more complete. My point was simply that they have a storage cost which gets worse quickly, but, worse than that, a serious usability cost.

The "impossible"

Posted Nov 26, 2010 14:04 UTC (Fri) by Ross (guest, #4065) [Link] (1 responses)

The article says in reference to applying complex group-based permissions: "The simple is certainly simple, but the complex is truly impossible."

Clearly you haven't had to find a way to do it. :) There are a few different ways.

The easiest way to be able to apply permissions for more than one group is to create many additional groups which are unions of the others by putting the users in them. Yes, now you get to maintain these. It's best to write a tool to generate them.

If you want intersections you can do that with groups too, or by nesting subdirectories and applying the traversal permissions for each group to those.

But what if you want to mix read permission for one set of groups with write permission for another?

Well, you have to use the file's real write bit and group owner for the write permission since that's the only way to control it traditionally. Then use the parent directory's permissions to prevent read access from anyone not in the second set of groups and set the file's world-readability bit.
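
Concretely, the layout might look like this (a sketch; the paths and group names are invented, and it assumes the writers are also members of the readers group so they can traverse the directory):

#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write access rides on the file's group; read access is gated by
 * who may traverse the parent directory.  World-read on the file is
 * harmless because only the "readers" group can reach it. */
void grant(gid_t writers_gid, gid_t readers_gid)
{
    (void)chown("/srv/proj/report", (uid_t)-1, writers_gid);
    (void)chmod("/srv/proj/report", 0664);     /* group rw, world r */
    (void)chown("/srv/proj", (uid_t)-1, readers_gid);
    (void)chmod("/srv/proj", 0750);            /* only readers may traverse */
}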

If you want to grant execute permission to a third set of groups -- that's a problem. That one really is impossible but execute doesn't mean much if you can read something (and it's not suid or sgid).

Please don't think I'm saying any of that is nice or preferable to POSIX or Windows ACLs, because it clearly sucks horribly for so many use cases, but it does show that it is possible to apply read/write permissions to arbitrary sets of groups if you're willing to deal with this kind of setup.

The "impossible"

Posted Nov 26, 2010 22:23 UTC (Fri) by neilbrown (subscriber, #359) [Link]

If you actually had to implement something like this - which your second line seems to suggest - you have my sympathies!

Yes: it does seem that it was a slight over-statement to say "impossible". If you have unlimited groups per user, allow users to create their own groups, and don't worry too much about giving new access to already-running processes, then many complex things are indeed possible.

Maybe we need a different maxim: "simple things should be simple, complex things shouldn't drive you insane" !

Thanks for your thoughts.


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds