Ghosts of Unix past, part 3: Unfixable designs
In the second installment of this series, we documented two designs that were found to be imperfect and have largely (though not completely) been fixed through ongoing development. Though there was some evidence that the result was not as elegant as we might have achieved had the original mistakes not been made, it appears that the current design is at least adequate and on a path towards being good.
However, there are some design mistakes that are not so easily corrected. Sometimes a design is of such a character that fixing it is never going to produce something usable. In such cases it can be argued that the best way forward is to stop using the old design and to create something completely different that meets the same need. In this episode we will explore two designs in Unix which have seen multiple attempts at fixes but for which it isn't clear that the result is even heading towards "good". In one case a significant change in approach has produced a design which is both simpler and more functional than the original. In the other case, we are still waiting for a suitable replacement to emerge. After exploring these two "unfixable designs" we will try to address the question of how to distinguish an unfixable design from a poor design which can, as we saw last time, be fixed.
Unix signals
Our first unfixable design involves the delivery of signals to processes. In particular it is the registration of a function as a "signal handler" which gets called asynchronously when the signal is delivered. That this design was in some way broken is clear from the fact that the developers at UCB (The University of California at Berkeley, home of BSD Unix) found the need to introduce the sigvec() system call, along with a few other calls, to allow individual signals to be temporarily blocked. They also changed the semantics of some system calls so that they would restart rather than abort if a signal arrived while the system call was active.
It seems there were two particular problems that these changes tried to address. The first is the question of when to re-arm a signal handler. In the original Unix design a signal handler was one-shot - it would only respond the first time a signal arrived. If you wanted to catch a subsequent signal, the signal handler had to explicitly re-enable itself. This leads to a race: if a signal is delivered before the handler has re-enabled itself, that signal can be lost forever. Closing this race involved creating a facility for keeping the signal handler always available, and blocking new deliveries while the signal was being processed.
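The one-shot pattern and its re-arming race can be sketched in C. The helper name demo_rearm() is invented for illustration, and on modern glibc signal() actually provides persistent (BSD) semantics, so the re-install shown here is redundant there; on historic Unix it was mandatory, and the window just before it is where a second signal could be lost:

```c
#include <signal.h>

static volatile sig_atomic_t handled = 0;

/* Historic one-shot semantics: the handler must re-install itself.
   A signal arriving before the re-install below would be lost. */
static void on_usr1(int sig)
{
    handled++;
    signal(sig, on_usr1);   /* re-arm; the racy window is right here */
}

/* Deliver SIGUSR1 twice and report how many deliveries were seen. */
int demo_rearm(void)
{
    handled = 0;
    signal(SIGUSR1, on_usr1);
    raise(SIGUSR1);
    raise(SIGUSR1);
    return handled;
}
```

With no delivery lost, both signals are seen; under true one-shot semantics a tight burst of signals could make the count come up short.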
The other problem involves exactly what to do if a signal arrives while a system call is active. Options include waiting for the system call to complete, aborting it completely, allowing it to return partial results, or allowing it to restart after the signal has been handled. Each of these can be the right answer in different contexts; sigvec() tried to provide more control so the programmer could choose between them.
Even these changes, however, were not enough to make signals really usable, so the developers of System V (at AT&T) found the need for a sigaction() call which adds some extra flags to control the fine details of signal delivery. This call also allows a signal handler to be passed a "siginfo_t" data structure with information about the cause of the signal, such as the UID of the process which sent the signal.
As these changes, particularly those from UCB, were focused on providing "reliable" signal delivery, one might expect that at least the reliability issues would be resolved. Not so it seems. The select() system call (and related poll()) did not play well with signals so pselect() and ppoll() had to be invented and eventually implemented. The interested reader is encouraged to explore their history. Along with these semantic "enhancements" to signal delivery, both teams of developers chose to define more signals generated by different events. Though signal delivery was already problematic before these were added, it is likely that these new demands stretched the design towards breaking point.
An interesting example is SIGCHLD and SIGCLD, which are sent when a child exits or is otherwise ready for the parent to wait() for it. The difference between these two (apart from the letter "H" and different originating team) is that SIGCHLD is delivered once per event (as is the case with other signals) while SIGCLD would be delivered constantly (unless blocked) while any child is ready to be waited for. In the language of hardware interrupts, SIGCHLD is edge triggered while SIGCLD is level triggered. The choice of a level-triggered signal might have been an alternate attempt to try to improve reliability. Adding SIGCLD was more than just defining a new number and sending the signal at the right time. Two of the new flags added for sigaction() are specifically for tuning the details of handling this signal. This is extra complexity that signals didn't need and which arguably did not belong there.
In more recent years the collection of signal types has been extended to include "realtime" signals. These signals are user-defined signals (like SIGUSR1 and SIGUSR2) which are only delivered if explicitly requested in some way. They have two particular properties. Firstly, realtime signals are queued so the handler in the target process is called exactly as many times as the signal was sent. This contrasts with regular signals which simply set a flag on delivery. If a process has a given (regular) signal blocked and the signal is sent several times, then, when the process unblocks the signal, it will still only see a single delivery event. With realtime signals it will see several. This is a nice idea, but introduced new reliability issues as the depth of the queue was limited, so signals could still be lost. Secondly (and this property requires the first), a realtime signal can carry a small datum, typically a number or a pointer. This can be sent explicitly with sigqueue() or less directly with, e.g., timer_create().
It could be thought that this addition of more signals for more events is a good example of the "full exploitation" pattern that was discussed at the start of this series. However, when adding new signal types requires significant changes to the original design, it could equally seem that the original design wasn't really strong enough to be so fully exploited. As can be seen from this retrospective, though the original signal design was quite simple and elegant, it was fatally flawed. The need to re-arm signals made them hard to use reliably, the exact semantics of interrupting a system call were hard to get right, and developers repeatedly needed to significantly extend the design to make it work with new types of signals.
The most recent step in the saga of signals is the signalfd() system call which was introduced to Linux in 2007 for 2.6.22. This system call extends "everything has a file descriptor" to work for signals too. Using this new type of descriptor returned by signalfd(), events that would normally be handled asynchronously via signal handlers can now be handled synchronously just like all I/O events. This approach makes many of the traditional difficulties with signals disappear. Queuing becomes natural so re-arming becomes a non-issue. Interaction with system calls ceases to be interesting and an obvious way is provided for extra data to be carried with a signal. Rather than trying to fix a problematic asynchronous delivery mechanism, signalfd() replaces it with a synchronous mechanism that is much easier to work with and which integrates well into other aspects of the Unix design - particularly the universality of file descriptors.
It is a fun, though probably pointless, exercise to imagine what the result might have been had this approach been taken to signals when problems were first observed. Instead of adding new signal types we might have new file descriptor types, and the set of signals that were actually used could have diminished rather than grown. Realtime signals might instead be a general and useful form of interprocess communication based on file descriptors.
It should be noted that there are some signals for which signalfd() cannot be used. These include SIGSEGV, SIGILL, and other signals that are generated because the process tried to do something impossible. Just queueing these signals to be processed later cannot work; the only alternatives are switching control to a signal handler, or aborting the process. These cases are handled perfectly by the original signal design. They cannot occur while a system call is active (system calls return EFAULT rather than raising a signal) and issues with when to re-arm the signal handler are also less relevant.
So while signal handlers are perfectly workable for some of the early use cases (e.g. SIGSEGV) it seems that they were pushed beyond their competence very early, thus producing a broken design for which there have been repeated attempts at repair. While it may now be possible to write code that handles signal delivery reliably, it is still very easy to get it wrong. The replacement that we find in signalfd() promises to make event handling significantly easier and so more reliable.
The Unix permission model
Our second example of an unfixable design which is best replaced is the owner/permission model for controlling access to files. A well known quote attributed to H. L. Mencken is "there is always a well-known solution to every human problem - neat, plausible, and wrong." This is equally true of computing problems, and the Unix permissions model could be just such a solution. The initial idea is deceptively simple: six bytes per file gives simple and broad access control. When designing an operating system to fit in 32 kilobytes of RAM (or less), such simplicity is very appealing, and thinking about how it might one day be extended is not a high priority, which is understandable though unfortunate.
The main problem with this permission model is that it is both too simple and too broad. The breadth of the model is seen in the fact that every file stores its own owner, group owner, and permission bits. Thus every file can have distinct ownership or access permissions. This is much more flexibility than is needed. In most cases, all the files in a given directory, or even directory tree have the same ownership and much the same permissions. This fact was leveraged by the Andrew filesystem which only stores ownership and permissions on a per-directory basis, with little real loss of functionality.
When this only costs six bytes per file it might seem a small price to pay for the flexibility. However once more than 65,536 different owners are wanted, or more permission bits and more groups are needed, storing this information begins to become a real cost. However the bigger cost is in usability.
While a computer may be able to easily remember six bytes per file, a human cannot easily remember why various different settings might have been assigned, and so is very likely to create sets of permission settings which are inconsistent, inappropriate, and hence not particularly secure. Your author has memories from University days of often seeing home directories given "0777" permissions (everyone has any access) simply because a student wanted to share one file with a friend, but didn't understand the security model.
The excessive simplicity of the Unix permission model is seen in the fixed, small number of permission bits, and, particularly, that there is only one "group" that can have privileged access. Another maxim from computer engineering, attributed to Alan Kay, is that "Simple things should be simple, complex things should be possible." The Unix permission model makes most use cases quite simple but once the need exceeds that common set of cases, further refinement becomes impossible. The simple is certainly simple, but the complex is truly impossible.
It is here that we start to see real efforts to try to "fix" the model. The original design gave each process a "user" and a "group" corresponding to the "owner" and "group owner" in each file, and they were used to determine access. The "only one group" limit is limiting on both sides; the Unix developers at UCB saw that, for the process side at least, this limit was easy to extend. They allowed a process to have a list of groups for checking filesystem access against. (Unfortunately this list originally had a firm upper limit of 16, and that limit made its way into the NFS protocol where it was hard to change and is still biting us today.)
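The process side of this extension is visible through the getgroups() interface; a minimal sketch (the helper name is invented):

```c
#include <unistd.h>
#include <sys/types.h>

/* Passing 0/NULL asks getgroups() only for the number of supplementary
   groups the calling process carries, without copying the list out. */
int count_supplementary_groups(void)
{
    return (int)getgroups(0, (gid_t *)0);
}
```

A second call with an appropriately sized gid_t array would then retrieve the list itself.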
Changing the per-file side of this limit is harder as that requires changing the way data is encoded in a filesystem to allow multiple groups per file. As each group would also need its own set of permission bits a file would need a list of groups and permission bits and these became known quite reasonably as "access control lists" or ACLs. The Posix standardization effort made a couple of attempts to create a standard for ACLs, but never got past draft stage. Some Unix implementations have implemented these drafts, but they have not been widely successful.
The NFSv4 working group (under the IETF umbrella) were tasked with creating a network filesystem which, among other goals, would provide interoperability between POSIX and WIN32 systems. As part of this effort they developed yet another standard for ACLs which aimed to support the access model of WIN32 while still being usable on POSIX. Whether this will be more successful remains to be seen, but it seems to have a reasonable amount of momentum with an active project trying to integrate it into Linux (under the banner of "richacls") and various Linux filesystems.
One consequence of using ACLs is that the per-file storage space needed to store the permission information is not only larger than six bytes, it is not of a fixed length. This is, in general, more challenging than any fixed size. Those filesystems which implement these ACLs do so using "extended attributes" and most impose some limit on the size of these - each filesystem choosing a different limit. Hopefully most ACLs that are actually used will fit within all these arbitrary limits.
Some filesystems - ext3 at least - attempt to notice when multiple files have the same extended attributes and just store a single copy of those attributes, rather than one copy for each file. This goes some way to reduce the space cost (and access-time cost) of larger ACLs that can be (but often aren't) unique per file, but does nothing to address the usability concerns mentioned earlier.
In that context, it is worth quoting Jeremy Allison, one of the main developers of Samba, and so with quite a bit of experience with ACLs from WIN32 systems and related interoperability issues. He writes: "But Windows ACLs are a nightmare beyond human comprehension :-). In the 'too complex to be usable' camp." It is worth reading the context and follow up to get a proper picture, and remembering that richacls, like NFSv4 ACLs, are largely based on WIN32 ACLs.
Unfortunately it is not possible to present any real example of replacing rather than fixing the Unix permission model. One contender might be that part of "SELinux" that deals with file access. This doesn't really aim to replace regular permissions but rather tries to enhance them with mandatory access controls. SELinux follows much the same model of Unix permissions, associating a security context with every file of interest, and does nothing to improve the usability issues.
There are however two partial approaches that might provide some perspective. One partial approach began to appear in Seventh Edition Unix with the chroot() system call. It appears that chroot() wasn't originally created for access control but rather to provide a separate namespace in which to create a clean filesystem for distribution. However it has since been used to provide some level of access control, particularly for anonymous FTP servers. This is done by simply hiding all the files that the FTP server shouldn't access. Anything that cannot be named cannot be accessed.
This concept has been enhanced in Linux with the possibility for each process not just to have its own filesystem root, but also to have a private set of mount points with which to build a completely customized namespace. Further it is possible for a given filesystem to be mounted read-write in one namespace and read-only in another namespace, and, obviously, not at all in a third. This functionality is suggestive of a very different approach to controlling access permissions. Rather than access control being per-file, it allows it to be per-mount. This leads to the location of a file being a very significant part of determining how it can be accessed. Though this removes some flexibility, it seems to be a concept that human experience better prepares us to understand. If we want to keep a paper document private we might put it in a locked drawer. If we want to make it publicly readable, we distribute copies. If we want it to be writable by anyone in our team, we pin it to the notice board in the tea room.
This approach is clearly less flexible than the Unix model as the control of permissions is less fine grained, but it could well make up for that in being easier to understand. Certainly by itself it would not form a complete replacement, but it does appear to be functionality that is growing - though it is too early yet to tell if it will need to grow beyond its strength. One encouraging observation is that it is based on one of those particular Unix strengths observed in our first pattern, that of "a hierarchical namespace" which would be exploited more fully.
A different partial approach can be seen in the access controls used by the Apache web server. These are encoded in a domain-specific language and stored in centralized files or in ".htaccess" files near the files that are being controlled. This method of access control has a number of real strengths that would be a challenge to encode into anything based on the Unix permission model:
- The permission model is hierarchical, matching the filesystem model. Thus controls can be set at whichever point makes most sense, and can be easily reviewed in their entirety. When the controls set at higher levels are not allowed to be relaxed at lower levels it becomes easy to implement mandatory access controls.
- The identity of the actor requesting access can be arbitrary, rather than just from the set of identities that are known to the kernel. Apache allows control based on source IP address or username plus password; with plug-in modules, access can be based on almost anything else that might be available.
- Access can be provided indirectly through a CGI program. Thus, rather than trying to second-guess all possible access restrictions that might be desirable and define permission bits for them in a new ACL, the model can allow any arbitrary action to be controlled by writing a suitable script to mediate that access.
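As a sketch of what this hierarchical, centrally reviewable style of control looks like in practice, here is a hypothetical Apache configuration fragment; all paths, filenames, and realm names are invented for illustration:

```apache
# Default-deny at the top of the tree; relaxed selectively further down.
<Directory /srv/www>
    Require all denied
</Directory>

<Directory /srv/www/public>
    Require all granted            # world-readable subtree
</Directory>

<Directory /srv/www/team>
    AuthType Basic
    AuthName "Team area"
    AuthUserFile /etc/apache2/team.passwd
    Require valid-user             # identity checked by Apache, not the kernel
</Directory>
```

The entire policy for the tree can be read in one place, something the scattered per-file bits of the Unix model cannot offer.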
It should be fairly obvious that this model would not be an easy fit with kernel-based access checking and, in any case, would have a higher performance cost than a simpler model. As such it would not be suitable to apply universally. However it could be that such a model would be suitable for that small percentage of needs that do not fit in a simple namespace based approach. There the cost might be a reasonable price for the flexibility.
While an alternative approach such as these might be appealing, it would face a much bigger barrier to introduction than signalfd() did. signalfd() could be added as a simple alternative to signal handlers. Programs could continue to use the old model with no loss, while new programs could make use of the new functionality. With permission models, it is not so easy to have two schemes running in parallel. People who make serious use of ACLs will probably already have a bunch of ACLs carefully tuned to their needs, and enabling an alternate parallel access mechanism is very likely to break something. So this is the sort of thing that would best be trialed in a new installation rather than imposed on an existing user-base.
Discerning the pattern
If we are to have a convincing pattern of "unfixable designs" it must be possible to distinguish them from fixable designs such as those that we found last time. In both cases, each individual fix appears to be a good idea addressing a real problem without obviously introducing more problems. In some cases this series of small steps leads to a good result, in others these steps only help you get past the small problems enough to be able to see the bigger problem.
We could use mathematical terminology to note that a local maximum can be very different from a global maximum. Or, using mountain-climbing terminology, it is hard to know the true summit from a false summit which just gives you a better view of the mountain. In each case the missing piece is a large scale perspective. If we can see the big picture we can more easily decide if a particular path will lead anywhere useful or if it is best to head back to base and start again.
Trying to move this discussion back to the realm of software engineering, it is clear that we can only head off unfixable designs if we can find a position that can give us a clear and broad perspective. We need to be able to look beyond the immediate problem, to see the big picture and be willing to tackle it. The only known source of perspective we have for engineering is experience, and few of us have enough experience to see clearly into the multiple facets and the multiple levels of abstraction that are needed to make right decisions. Whether we look for such experience by consulting elders, by researching multiple related efforts, or finding documented patterns that encapsulate the experience of others, it is vitally important to leverage any experience that is available rather than run the risk of simply adding bandaids to an unfixable design.
So there is no easy way to distinguish an unfixable design from a fixable one. It requires leveraging the broad perspective that is only available through experience. Having seen the difficulty of identifying unfixable designs early we can look forward to the final part of this series, where we will explore a pernicious pattern in problematic design. While unfixable designs give a hint of deeper problems by appearing to need fixing, these next designs do not even provide that hint. The hints that there is a deeper problem must be found elsewhere.
Exercises
- Though we found that signal handlers had been pushed well beyond their competence, we also found at least one area (i.e. SIGSEGV) where they were still the right tool for the job. Determine if there are other use cases that avoid the observed problems, and so provide a balanced assessment of where signal handlers are effective, and where they are unfixable.
- Research problems with "/tmp", attempts to fix them, any unresolved issues, and any known attempts to replace rather than fix this design.
- Describe an aspect of the IP protocol suite that fits the pattern of an "Unfixable design".
- It has been suggested that dnotify, inotify, and fanotify are all broken. Research and describe the problems and provide an alternative design that avoids all of those issues.
- Explore the possibility of using fanotify to implement an "apache-like" access control scheme with decisions made in user-space. Identify enhancements required to fanotify for this to be practical.
Next article: Ghosts of Unix past, part 4: High-maintenance designs
Index entries for this article:
Kernel: Development model/Patterns
GuestArticles: Brown, Neil
Posted Nov 16, 2010 16:10 UTC (Tue) by bfields (subscriber, #19510)
Actually, it's really just a copy of Windows ACLs as far as I can tell--different implementors have made different choices as to how to reconcile with POSIX.
The Richacl implementors (mainly Andreas Gruenbacher) have added some extra "mask bits" as a way to ensure that a chmod can still restrict permissions without permanently losing information from any ACL set on the file. Interestingly enough, the hardest part then becomes mapping the resulting masked ACL to a Windows/NFSv4-like ACL....
Readers in search of a challenge can go look at their code and figure out if there's a better mapping. I've drawn a blank so far. It's likely what we'll end up doing.
Posted Nov 16, 2010 21:04 UTC (Tue) by wazoox (subscriber, #69624)
That reminds me of the ACL parts of the samba code. There is a long page of comments that reads something like "beware, here follows, long, hairy, complicated and untractable explanation of a longer, hairier and more incomprehensible code". Then more lines with comments like "Don't touch this code!" :)
Posted Nov 16, 2010 23:21 UTC (Tue) by vonbrand (guest, #4458)
Due to the "ACL model" of Windows being an unmanageable mess?
The user/group/others model is certainly lacking (it can't describe the full permissions matrix like the Bell-LaPadula model uses), but what are the real, usable alternatives?
Posted Nov 17, 2010 0:57 UTC (Wed) by rahvin (guest, #16953)
I'd imagine the US DOD has permission levels and tables that would make your head spin, after all their paper permission levels are nearly incomprehensible, I can't even imagine their computer permissions. In fact I'd wager there is an entire staff of people that do nothing but manage permissions.
Posted Nov 16, 2010 16:43 UTC (Tue) by foom (subscriber, #14868)
Unfortunately, signalfd has a very irritating practical issue.
To use it, you need to block the signal you're interested in (using e.g. sigsetmask). However, the set of blocked signals is not reset by exec (blocked signals and signals set to SIG_IGN are preserved, but other signal actions are reset to default). So, if you use signalfd, whenever you spawn a process, it will not receive that signal. And processes tend to misbehave when not receiving signals they expect to.
You can, of course, fix that. You simply need to unblock the signal after forking, but before exec'ing. *IF* you control everything that ever calls fork/exec from your process. In many situations, that is impossible -- programs tend to use all sorts of libraries, some of which spawn processes.
Okay, so, you might say: "Hey, that's what pthread_atfork is for! Just set an child-side after-fork handler to unblock the signal". Well, unfortunately, pthread_atfork doesn't always get called when spawning a child process, so you can't really use it for that.
Three examples of that:
So, basically, end result: signalfd is unusable in many circumstances where it'd be really nice to be able to use it -- you're better off just setting a standard signal handler which writes to an fd. Sigh.
(POSIX spec URL: http://www.opengroup.org/onlinepubs/9699919799/)
Posted Nov 16, 2010 17:29 UTC (Tue) by foom (subscriber, #14868)
Most sensible such applications will already implement that by writing a signal handler for SIGCHLD which simply writes a byte into a pipe, and then has the event loop look for readability on that pipe. Signalfd would let you do that more easily -- if you could actually use it.
Posted Nov 16, 2010 17:33 UTC (Tue) by mjthayer (guest, #39183)
That makes sense - and obviously a named pipe would be no good there whatsoever. I was more thinking of things like SIGUSR1 sorts of interactions.
Posted Nov 17, 2010 8:11 UTC (Wed) by nix (subscriber, #2304)
Bonus: no change to signal semantics when signalfd is not in use, and nobody sane would want the current semantics in any case.
What am I missing?
Posted Nov 17, 2010 14:43 UTC (Wed) by madcoder (guest, #30027)
Anyway, there is a solution for that (which is messy but…) on linux which is to redefine fork(), system(), pthread_spawn{,p} and every similar problematic fork() wrapper using dlsym chaining to reset your signal masks properly. This isn't *that* complicated, and chains nicely. Or if you're sure that pthread_atfork() works for some then only divert the ones where it doesn't. I know it's not portable but signalfd() isn't in the first place either ;)
WRT clone() I'd say that this is a very low level interface which has a really high chance to break the libc when used (e.g. TSD breaks in interesting ways in the glibc if you use clone without emulating what the glibc does IIRC), so I'd say people using it Know What They Are Doing in the first place and should have worried about resetting the signal mask to a sane default in the first place.
Posted Nov 24, 2010 22:04 UTC (Wed) by neilbrown (subscriber, #359)
Fixing it would probably require adding a new 'flags' option, so adding a new syscall and deprecating the old. This 'flags' could allow atomic setting of close-on-exec and an auto-block flag which causes all signals being tracked by signalfd to be blocked just as long as the signalfd is open.
If you haven't and don't want to, I might....
Thanks,
Posted Nov 16, 2010 20:05 UTC (Tue) by dlang (guest, #313)
every time apache has to access a file it needs to look in .htaccess for that directory, AND EVERY PARENT DIRECTORY.
As a result, just about every production apache server disables .htaccess files.
Posted Nov 16, 2010 20:47 UTC (Tue) by alvieboy (guest, #51617)
But Apache is not only meant for Linux. Other OSes do not provide these functionalities.
What's really harder is to apply all constraints in a fast and efficient way. I never benchmarked Apache on this, but I'd bet it's not that fast nor efficient.
Alvaro
Posted Nov 16, 2010 21:21 UTC (Tue) by dlang (guest, #313)
so yes, they are horribly inefficient
in terms of caching the combined constraints, that seems hard in the face of directories being moved around.
there's also the issue of the interaction with links and figuring out the 'true' path to a file.
Posted Nov 17, 2010 1:55 UTC (Wed) by buck (subscriber, #55985)
i'll concede that maybe AFS directory-only permissions might simplify
Posted Nov 17, 2010 9:53 UTC (Wed) by iq-0 (subscriber, #36655)
It even has a good chance to be cheaper than the current unix model, since in a practical setup there would only be a few (compiled) rulesets in effect (still hundreds, but a lot less than actual dentries). One could possibly cache a pointer to the list of effective rules to a dentry/inode (depending on how the rules are to be applied, this is semantics, but I suspect you'd want them on the inode level).
But the decoupling of the details from every single inode can probably be done without any real performance impact (and possibly even performance gains). Whether you use hierarchical ACLs or matching rules shouldn't really make a difference and constant tree traversals shouldn't be necessary when done at the VFS level.
Posted Nov 16, 2010 20:47 UTC (Tue) by smurf (subscriber, #17840)
Presumably, that also holds for spelling mis-corrections.
Posted Nov 16, 2010 23:36 UTC (Tue) by tpo (subscriber, #25713)
Why not? This series has very much the flavour of a classic text. Thus fixing it now for posterity makes a lot of sense:
s/Each of these can be the right answer is different contexts/Each of these can be the right answer in different contexts/
Posted Nov 16, 2010 23:59 UTC (Tue)
by ABCD (subscriber, #53650)
[Link]
> Please do not post typos as comments, send them to [email protected] instead.
Posted Nov 17, 2010 0:05 UTC (Wed)
by corbet (editor, #1)
[Link] (4 responses)
As noted elsewhere; future readers of a classic text are likely to be supremely uninterested in the typos that made it through the editing process. That's why we prefer that people email them to us.
Posted Nov 18, 2010 18:15 UTC (Thu)
by RobSeace (subscriber, #4435)
[Link] (3 responses)
And, yes, I know you've already got a "mailto:" link, but for many of us,
Posted Nov 21, 2010 10:56 UTC (Sun)
by Darkmere (subscriber, #53695)
[Link] (2 responses)
( ssh -t user@host 'something something something %U' ) should do it for you, add to a .desktop, associate as a Mailer and you should be good to go
Posted Nov 22, 2010 18:46 UTC (Mon)
by wookey (guest, #5501)
[Link] (1 responses)
Posted Nov 22, 2010 19:41 UTC (Mon)
by jku (subscriber, #42379)
[Link]
The current setup seems to allow what you want but it's pretty limited in many ways. See Bastien Nocera's blog for some recent mimetype-related developments: http://www.hadess.net/2010/10/new-control-center-and-you....
Posted Nov 16, 2010 20:43 UTC (Tue)
by madscientist (subscriber, #16861)
[Link] (2 responses)
This means that if, for example, you set a signal handler for SIGCHLD you have major problems since SA_RESTART can't be considered reliable (portably). The trick of having an internal pipe to communicate between your signal handler and your main event loop is still subject to this problem.
One assumes that signalfd() would not interrupt system calls on signals delivered through the FD so it solves that problem--but it's Linux-specific and Linux already handles SA_RESTART reliably.
Posted Nov 19, 2010 22:57 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (1 responses)
Of course, any solution in which system calls are uninterruptible defeats half the purpose of a signal.
I'm familiar with the trick of having such an internal pipe -- it solves the problem of select() not getting interrupted when a signal arrives just as select() is starting. But I don't see the connection between that and horribleness of signals interrupting system calls.
signalfd() just generates the file descriptor, so of course it doesn't interrupt anything. If you mean that in a program that uses signalfd(), system calls don't get interrupted, I think you're right because a program that uses signalfd() normally blocks signals, and a blocked signal can't interrupt a system call. But that just means that a program that uses signalfd() can't fully exploit signals -- control-C won't unhang some things.
Posted Nov 25, 2010 5:55 UTC (Thu)
by rqosa (subscriber, #24136)
[Link]
> But that just means that a program that uses signalfd() can't fully exploit signals -- control-C won't unhang some things.

The solution to that is to use non-blocking system calls (or at least ones that you know will only block for a short time). That's something that you should already be doing if you're using signalfd(); the purpose of signalfd() is to handle signals with an event loop, and an event loop shouldn't have any blocking system calls in it (or anything else that takes a long time) except for the one select() (or epoll_wait() or similar) that drives the event loop. (If there's some type of event whose handler must take a long time to run, then have the event loop hand it off to a worker thread/process.)
Posted Nov 16, 2010 21:08 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Windows DPC shows us that signals _can_ be done right.
Posted Nov 19, 2010 14:57 UTC (Fri)
by Yorick (guest, #19241)
[Link] (1 responses)
I'm not very familiar with Windows, but isn't DPC a pure kernel-mode concept rather than something available in userspace? Windows does not appear to believe in pre-empting running userspace threads by user code - an approach that clearly solves some problems but mainly by taking options away from the programmer. This is not necessarily a bad thing, of course.
Of course, since Unix signals are used for so many very different purposes, they cannot and should not be replaced by a single new mechanism.
Posted Nov 19, 2010 17:06 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
>Of course, since Unix signals are used for so many very different purposes, they cannot and should not be replaced by a single new mechanism.
Unix signals are MISused for many purposes. They are broken and should be deprecated.
Posted Nov 16, 2010 21:14 UTC (Tue)
by jengelh (subscriber, #33263)
[Link] (14 responses)
Posted Nov 17, 2010 4:28 UTC (Wed)
by neilbrown (subscriber, #359)
[Link] (13 responses)
I can tell you what I was thinking of in the "IP protocol suite" questions though, as no-one seems to have taken a stab at those in the comments.
The 'full exploitation' in IP relates to UDP. It is most nearly an application layer (layer 7) protocol (as applications can use it to communicate) yet it is used at multiple levels of the stack - particularly for routing (at least back when we used RIP; BGP uses TCP), which is a layer 3 concern. It is used for VPNs and other network management. And even sometimes for application level protocols.
The "conflated design" in IP is the fact that end-point addresses and rendezvous addresses are equivalent at the IP level. They aren't at higher levels. "lwn.net" is a rendezvous address, but at the IP level you only see 72.51.34.34, which could (in a shared-hosting config) map from several rendezvous addresses. So upper level protocols (like http/1.1) need to communicate the *real* rendezvous address, because IP doesn't.
The "unfixable design" in IP is obviously the tiny address space, which we have attempted to fix by NAT and VPNs etc, but they aren't real fixes. Had IP used a distinct rendezvous address it would have only been needed in the first packet of a TCP connection, so it would have been cheap to make it variable-length and then we might not have needed IPv6 (though that doesn't really address UDP).
So those were my thoughts. I haven't spent as much time fighting with network protocols as I have with the Unix API so I'm a lot less confident of these ideas than of the ones I wrote formally about.
Posted Nov 17, 2010 4:40 UTC (Wed)
by dlang (guest, #313)
[Link] (4 responses)
Posted Nov 17, 2010 10:00 UTC (Wed)
by iq-0 (subscriber, #36655)
[Link] (3 responses)
Posted Nov 17, 2010 23:23 UTC (Wed)
by dlang (guest, #313)
[Link] (2 responses)
at the time it was designed, there were a lot of things that it did that were not possible in IPv4, but most (if not all) of the features that people really care about have been implemented in IPv4
Posted Nov 18, 2010 8:11 UTC (Thu)
by Cato (guest, #7643)
[Link] (1 responses)
However, Mobile IP is much better implemented in IPv6 so you don't get inefficient 'triangular routing' - http://www.usipv6.com/ppt/MobileIPv6_tutorial_SanDiegok.pdf
The biggest benefit of course is not having to use NAT for IPv6 traffic.
Posted Nov 18, 2010 13:19 UTC (Thu)
by vonbrand (guest, #4458)
[Link]
Yep, that's why people are clamoring for NATv6 ;-)
(Just as the idiotic firewalling going on has made everything run over HTTP.)
Posted Nov 18, 2010 4:27 UTC (Thu)
by paulj (subscriber, #341)
[Link] (7 responses)
So to look for conflation in networking addressing you probably need to stay within a layer. E.g. within IP, there is conflation in addressing because each address encodes both the identity of a node and its location in the network. Or perhaps more precisely: IP addressing lacks the notion of identity really, but an IP address is the closest you get and so many things use it for this. This may be fixed in the future with things like Shim6 or ILNP, which separate IP addressing into location and identity. This would allow upper-layer protocols like TCP to bind their state to a host identity, and so decouple them from network location.
Variable length addresses would have been nice. The ISO packet protocol CLNP uses variable length NSAP addresses. However, hardware people tend to dislike having to deal with VL address fields. The tiny address space of IPv4 perhaps needn't have been unfixable - it could perhaps have been extended in a semi-compatible way. However it was decided (for better or worse) a long time ago to create IPv6.
Possibly another problem with IP, though I don't know where it fits in your list, is multicast. This is an error of foresight, due to the fact that multicast still had to be researched and it depended on first understanding unicast - i.e. IP first had to be deployed. The basic problem is that multicast is bolted on to the side of IP. It generally doesn't work, except in very limited scopes. One case is where it can free-ride on existing underlying network multicast primitives, i.e. ones provided by local link technologies. Another is where a network provider has gone to relatively great additional trouble to configure multicast to work within some limited domain - needless to say this is both very rare and even when done is usually limited to certain applications (i.e. not available generally to network users). In any new network scheme one hopes that multicast services would be better integrated into the design and be a first-class service alongside unicast.
Another retrospectively clear error is IP fragmentation. It was originally decided that fragmentation was best done on a host by host basis, on the assumption that path MTU discovery could be done through path network control signalling and that fragmentation/reassembly was a reasonably expensive process that middle-boxes ought not to be obliged to do. IMO this was a mistake: path MTU signalling turned out to be very fragile in modern deployment (IP designers didn't anticipate securo-idiocy); it turned out fragmentation/reassembly was relatively cheap - routers routinely use links both for internal buses and external connections which require fragmenting packets into small fixed size cells. As a consequence of the IP fragmentation choices, the IP internet is effectively limited to an (outer) path MTU of 1500 for ever more, regardless of changes in hardware capability. This causes problems for any IP packet protocol which wants to encap itself or another. One imagines that any new network scheme would learn from the IP MTU mess, make different trade-offs and come up with something better and more robust.
We should of course be careful to not overly condemn errors of foresight. Anticipating the future can be hard, particularly where people are busy designing cutting-edge new technology that will define the future. ;)
Posted Nov 18, 2010 8:13 UTC (Thu)
by Cato (guest, #7643)
[Link] (5 responses)
http://en.wikipedia.org/wiki/IPv6#Features_and_difference... has a good summary of the benefits of IPv6 including this one.
Posted Nov 19, 2010 1:15 UTC (Fri)
by dlang (guest, #313)
[Link] (4 responses)
this isn't just that clock speeds are higher, but that the ratio of clock speeds to the system bus speeds is no longer 1:1, this means that it's possible to execute far more steps without slowing the traffic down.
Posted Nov 19, 2010 11:15 UTC (Fri)
by job (guest, #670)
[Link]
Posted Nov 19, 2010 11:41 UTC (Fri)
by Cato (guest, #7643)
[Link] (2 responses)
You can probably manage to forward anything in hardware, but it helps somewhat that IPv6 has a regular header design.
Posted Nov 19, 2010 23:26 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
And from what I've seen, as the cost of routing in a general purpose CPU has come down, so has the cost of doing it in a specialized network link processor (what we're calling "hardware" here) -- assuming the IP header structure is simple enough. So today, as ten years ago, people would rather do routing in an ASIC than allocate x86 capacity to it.
I think system designers balance system bus and CPU speed too, so it's not the case that there are lots of idle cycles in the CPU because the system bus can't keep up with it.
Posted Dec 3, 2010 9:05 UTC (Fri)
by paulj (subscriber, #341)
[Link]
Posted Nov 19, 2010 11:18 UTC (Fri)
by job (guest, #670)
[Link]
One thing I never really understood is why TCP MSS is a different setting from MTU. Given the belief that the MTU could be auto detected, MSS could be deduced from it.
Perhaps someone can enlighten me?
Posted Nov 16, 2010 21:21 UTC (Tue)
by jhhaller (guest, #56103)
[Link] (1 responses)
Posted Nov 19, 2010 23:48 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
I don't know if it was an accident due to PDP-11 practicality or just good design philosophy, but I very much appreciate the convention in Unix of not returning information when a system call fails. I.e. a failure is a failure. If you get back useful information, or the system changes state, it's not a failure, but a different kind of success.
So I guess you're saying if a read of 10 sectors fails due to a media error on the 5th sector, you'd like to see the result, "failed due to media error, but read the first 4 sectors." I like Unix's version much better: Instead of requesting 10 sectors, you request "up to 10 sectors" and it succeeds with "4 sectors read" and the next read truly fails.
Things that fail but don't fail are much harder to program to. They engender mistakes and convoluted code. "Failure" has the special implication that you can probably just stop thinking about it and give up on whatever you were doing. They are the inspiration for exception throwing in programming languages.
Posted Nov 16, 2010 23:03 UTC (Tue)
by jamesmrh (guest, #31622)
[Link]
The fs namespace ideas came from Plan 9, but weren't really useful until integrated with PAM.
Posted Nov 17, 2010 0:13 UTC (Wed)
by ldo (guest, #40946)
[Link] (19 responses)
Posted Nov 17, 2010 3:23 UTC (Wed)
by pr1268 (subscriber, #24648)
[Link] (18 responses)
Do you have a better suggestion? Pascal-style strings? While I agree that C-style strings are bothersome at times, there just doesn't seem to be any better alternative. And never mind that Java's Strings are hideously inefficient (but again, is there a better way?). I don't mean to argue; I'm just playing devil's advocate here. I honestly don't know myself whether there could have been a better way to do character strings way back in the day.
Posted Nov 17, 2010 4:37 UTC (Wed)
by neilbrown (subscriber, #359)
[Link] (13 responses)
The problem is that strcpy and strcat and sprintf should never have existed. strlcpy etc. are much better interfaces when you have static or preallocated buffers.
Posted Nov 17, 2010 6:26 UTC (Wed)
by ncm (guest, #165)
[Link]
Posted Nov 17, 2010 8:26 UTC (Wed)
by nix (subscriber, #2304)
[Link] (6 responses)
And then we have Pascal-layout strings (as opposed to actual Pascal 'strings', a nightmare for other reasons, see Kernighan). They don't fix this problem (you just have to overwrite the start of the string, not its end) and have two much bigger problems: finite string length, and an increase in size of every string. The finite string length means that writing general string-handling algorithms without special cases for the rare event of large strings is impossible, and the increase in size of every string bloats small strings, which are by far the common case. You can patch both of these: the first, by making the finite string length as large as a pointer; and the second, by noting that alignment constraints in existing systems bloat the effective size of strings anyway. But of course this soon turns into a special case of nul-terminated strings: point the pointer at the end of the string, bingo, one rather hard-to-consult nul by any other name.
The biggest downside is probably a long-term ABI problem. The scheme is inflexible. If your Pascal string-length header is too short, however do you expand it? It's wired into every string-using program out there! At least nul-terminated strings need no expansion.
The real solution to string-handling unfortunately requires a VM of some description which can prevent the program from accidentally overwriting fields in aggregates by writes to any other field or variable. Then you can do reliable Pascal strings, separating the length from the content, or reliable null-terminated strings, with the separate compartment containing a pointer into the string. Unfortunately this is incompatible with low-level all-the-world's-a-giant-arena languages like C without very specialized fine-grained MMU hardware.
(I have, like everyone, written my own dynamic string-handling library when younger. It starts out simple but it's amazing how soon you have to introduce extra code to track pointers and make freeing them in error cases less verbose, and extra code to track memory leaks... you need that in C anyway of course but the massive increase in dynamic memory use that dynamically allocating most strings brings tends to force them on you sooner than otherwise.)
Posted Nov 17, 2010 12:38 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
Come on.
You'd naturally use 32 bits on 32 bit systems for string length. And by a strange coincidence, that's the maximum amount of contiguous RAM that you can address on 32-bit systems. On 64-bit systems, you'd naturally use 64-bit counter.
Posted Nov 18, 2010 16:40 UTC (Thu)
by nix (subscriber, #2304)
[Link] (3 responses)
Posted Nov 18, 2010 17:04 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Nov 25, 2010 13:10 UTC (Thu)
by renox (guest, #23785)
[Link] (1 responses)
Posted Nov 25, 2010 16:22 UTC (Thu)
by vonbrand (guest, #4458)
[Link]
No, you haven't... (Original) Pascal "strings" were just (packed) arrays of characters of a fixed length.
/me ducks and runs for cover
Posted Nov 19, 2010 11:24 UTC (Fri)
by job (guest, #670)
[Link]
If memory efficiency is a problem for you, multibyte encodings are a much worse problem than storing string length. But UTF-8/16 is here to stay; there is simply no competition. I think we have to accept it.
Posted Nov 18, 2010 9:50 UTC (Thu)
by stijn (guest, #570)
[Link]
Posted Nov 18, 2010 15:38 UTC (Thu)
by etienne (guest, #25256)
[Link] (3 responses)
And you can also combine them to do things like:
Posted Nov 18, 2010 17:24 UTC (Thu)
by pr1268 (subscriber, #24648)
[Link] (2 responses)
I like your code example, but it might only work in C (not C++). Two cases in point:
1 Stroustrup, B. The C++ Programming Language, Special Edition, p. 77
Posted Nov 19, 2010 10:58 UTC (Fri)
by etienne (guest, #25256)
[Link]
const char *curlang(const char *mltstr)
Obviously none of the substrings can have embedded zero chars.
A C line of code like:
Posted Nov 20, 2010 1:03 UTC (Sat)
by cmccabe (guest, #60281)
[Link]
Sorry, you are confused. It works in both C and C++.
> Using the enum value as an array index might give unpredictable results
Nope.
Here the enum is promoted to an integer. C++, like C, promotes a lot of types to integers under the right situations.
> The C standard library string can have '\0' characters anywhere inside
There is no std::string in this example. You are confused.
Posted Nov 18, 2010 12:44 UTC (Thu)
by Kwi (subscriber, #59584)
[Link] (2 responses)
(Incidentally, Java strings work the same way behind the scenes, but are immutable, which I guess is what you object to when you call them inefficient?)
Of course, one problem with this suggestion is that it doubles the size of a string reference (8 bytes on 32-bit architectures, 16 bytes on 64-bit architectures).
Posted Nov 18, 2010 16:14 UTC (Thu)
by pr1268 (subscriber, #24648)
[Link] (1 responses)
> (Incidentally, Java strings work the same way behind the scenes, but are immutable, which I guess is what you object to when you call them inefficient?)

Exactly. And, my semi-rhetorical question immediately after that ("is there a better way?") begs the question of whether the Sun engineers who developed the Java language imposed that immutability for thread safety (since thread safety was/is a primary goal of the Java language). I don't know for sure; just going off intuition here.
Posted Nov 19, 2010 10:07 UTC (Fri)
by mfedyk (guest, #55303)
[Link]
Posted Nov 18, 2010 18:03 UTC (Thu)
by nevyn (guest, #33129)
[Link]
I think it solves almost all the "normal" problems people have with non-nil terminated strings:
1. You can easily allocate them on the stack.
2. You can easily allocate them in constant memory.
3. "" and "x" don't have overhead in the 1,000% range (depending on how you count).
...but still has good solutions to the nil terminated strings problems, in that it allows you to know the allocated size and length used (and put \0 in your string).
Saying that, the solution was far from obvious ... so while I think it would have been usable in the 1970s, using NIL terminated strings was much more obvious.
Posted Nov 17, 2010 10:27 UTC (Wed)
by rusty (guest, #26)
[Link]
I've always found signalfd() a bit confusing. After all, the standard method of handling signals in server programs has long been to write to a pipe in the signal handler. Otherwise you need pselect and related nonsense. But for normal signals signalfd() didn't add anything except performance.
Cheers,
Posted Nov 17, 2010 23:14 UTC (Wed)
by Yorick (guest, #19241)
[Link] (3 responses)
Posted Nov 18, 2010 16:36 UTC (Thu)
by pr1268 (subscriber, #24648)
[Link] (2 responses)
Just curious, why do you consider telldir()/seekdir() to be "unspeakable horrors"? Their interfaces are simple and straightforward, and even their manual pages are easy to read/understand.
Posted Nov 18, 2010 17:19 UTC (Thu)
by foom (subscriber, #14868)
[Link] (1 responses)
And it's so tricky and so unused, that the implementation was actually horribly broken from its inception in BSD until 2008, 25 years later!
Posted Nov 18, 2010 17:31 UTC (Thu)
by pr1268 (subscriber, #24648)
[Link]
Ahh yes, I almost forgot about those cases. In fact, there was some discussion here on LWN about the difficulties of these in the context of UnionFS. Thanks for jogging my memory.
Posted Nov 18, 2010 8:03 UTC (Thu)
by zmi (guest, #4829)
[Link] (21 responses)
In Netware, you could quickly define an unlimited number of users/groups to a dir/file with any privileges that should be available, and you are finished. The filesystem did *not* have to go down to every file and write that ACL/permission there.
I remember we had a big hierarchical tree, with every department having their working dir, and within that could define another department having access to some subdirs if wanted. Like this, everything was secure, and every needed access was quickly possible.
Really, if someone would use that approach for a Linux filesystem, the world would be easier and better. Maybe the btrfs devs read this, then they should look at Netware 3, which already had this neat ACL solution.
Posted Nov 18, 2010 9:01 UTC (Thu)
by Fowl (subscriber, #65667)
[Link] (10 responses)
A list of users/groups and their permissions to this object (and optionally its children) seems pretty straight forward to me.
Posted Nov 18, 2010 9:41 UTC (Thu)
by zmi (guest, #4829)
[Link] (9 responses)
Maybe you've never had a big file system. Take this example:
Company with 900 employees, 40TB storage in about 40 departments, with a total of 100 million files.
And then someone needs to set a new permission for a top-level dir. Both Unix and Windows ask to write those permissions to all files below that dir. If there are 10 million entries, the session will be blocked for a pretty long time.
Novell's Netware didn't have that: Set a new permission, done within the second. I don't know how they stored permissions, but it never depended on the amount of data below that dir.
Also, in Unix and Windows there's a mix of permissions from a share and permissions to a file. In Netware you assigned a right, and that's it. Much easier to review.
Posted Nov 18, 2010 10:14 UTC (Thu)
by Fowl (subscriber, #65667)
[Link] (8 responses)
Ah, I get you. It's the implementation that's the problem, not the concept.
> Both Unix and Windows ask to write those permissions to all files below that dir.
I'm fairly certain Explorer (the Windows shell) uses the most naive method possible for applying permissions.
> Also, in Unix and Windows there's a mix of permissions from a share and permissions to a file. In Netware you assigned a right, at that's it. Much easier to review.
For as long as I can remember, giving full access to shares to "Everyone", and then using filesystem permissions, has been the recommended practice. It is useful occasionally for enforcing "no remote access" policies, etc.
Posted Nov 18, 2010 10:42 UTC (Thu)
by zmi (guest, #4829)
[Link] (4 responses)
And it brings the feature "you see the share, but clicking on it tells you you can't access it". Again it's the implementation that's wrong: If I have no right on it anyway, don't display it. Seems to be laziness of programmers to have chosen this way.
Posted Nov 18, 2010 13:55 UTC (Thu)
by mpr22 (subscriber, #60784)
[Link] (2 responses)
Counterpoint:
Posted Nov 18, 2010 14:04 UTC (Thu)
by dskoll (subscriber, #1630)
[Link] (1 responses)
Counterpoint: /bin/ls lists the names of directories not owned by the user it's running as whose access control mode is 0700 (user rwx, all others forbidden).
Which is perfectly correct behavior according to the way UNIX permissions are defined. The ability to list names in a directory is controlled only by the r bit of the directory itself.
Posted Nov 18, 2010 14:16 UTC (Thu)
by zmi (guest, #4829)
[Link]
Using a graphical dir browser like Dolphin could hide such unreadable contents, that would be nice, as normally users don't need to see that. Should be a config option.
Browsing a server over the network is a roughly 20-years-younger operation, solving completely different needs, and it would help security a bit if shares that are not accessible were not shown to the user. But by the time Microsoft reinvented networking, they did not have the slightest clue about security (and I'd say that only started with Win7, where a user can work as a user, not admin). Maybe we'll see that improvement once someone at Microsoft gets the idea. Or maybe the Samba team can implement a setting to hide this, and later MS adopts it as it's clever.
Posted Nov 21, 2010 0:27 UTC (Sun)
by Fowl (subscriber, #65667)
[Link]
Posted Nov 18, 2010 17:57 UTC (Thu)
by davecb (subscriber, #1574)
[Link]
--dave
Posted Nov 18, 2010 20:38 UTC (Thu)
by jra (subscriber, #55261)
[Link] (1 responses)
No, with Windows ACLs it's the concept.
Look at this:
http://www.pcguide.com/ref/hdd/file/ntfs/secRes-c.html
as an example. Explain that to a user. Don't forget to include why the sort order of DENY ACEs depends on where in the file hierarchy they came from.
Good luck ! :-)
Jeremy.
Posted Nov 21, 2010 0:31 UTC (Sun)
by Fowl (subscriber, #65667)
[Link]
How is that complicated?
Posted Nov 18, 2010 9:05 UTC (Thu)
by Fowl (subscriber, #65667)
[Link] (9 responses)
---
Could you explain the Netware model a bit more? It just sounds like ACLs to me.
Posted Nov 18, 2010 10:38 UTC (Thu)
by zmi (guest, #4829)
[Link] (8 responses)
Also, the way you assigned rights was simple: take an object (user, group, department, etc.), assign it to a dir with rights, and specify if it's for subdirs as well or only this dir.
And when you didn't have a right on a dir, you didn't even see it. I dislike the Windows approach of seeing a share, and upon click you get the info "no permission". That's just stupid.
Posted Nov 18, 2010 11:09 UTC (Thu)
by dgm (subscriber, #49227)
[Link]
Posted Nov 18, 2010 18:43 UTC (Thu)
by jeremiah (subscriber, #1221)
[Link] (1 responses)
Posted Nov 18, 2010 22:06 UTC (Thu)
by zmi (guest, #4829)
[Link]
Posted Nov 18, 2010 19:05 UTC (Thu)
by jeremiah (subscriber, #1221)
[Link] (4 responses)
you had a file/object, and a list of permissions/security attributes for each object. Object could be a group of objects, but group depth was not a concern. Multiple applications (controlled by us) could access the permissions, and make decisions based on what they found. If there was a permission that they didn't understand, access was not allowed. This was a situation where we could trust the apps, and not the people. We also took the approach that permissions were subtractive. Everything started as readable/writable and access could only be removed. The nice thing about this was that it was extendable.
This isn't relevant to Novell ACL's just trying to get people's thoughts.
Posted Nov 18, 2010 22:09 UTC (Thu)
by zmi (guest, #4829)
[Link] (3 responses)
Posted Nov 18, 2010 23:41 UTC (Thu)
by jeremiah (subscriber, #1221)
[Link] (1 responses)
Posted Nov 19, 2010 13:19 UTC (Fri)
by jeremiah (subscriber, #1221)
[Link]
I think SELinux is amazingly complete. It allowed us to implement a solution that always requires 2 users, from a group of 3. You throw LUKS, encrypted drives, and removable media into the mix, and you have as close to a bullet proof scenario as possible. On the other hand, I don't want to have to write code that the average admin can't administer without spending a month dealing with a sharp learning curve.
Like a lot of us here I'm a developer, and a system administrator. When I have my development hat on I try to think of the user, and what they have to put up with, while balancing it with security requirements etc. As an administrator, I know I'm willing to tolerate more than most users. The difficult part for me, is defining my target audience, and understanding their abilities and tolerance, and shooting for that. And sometimes the perfect solution, has to be hobbled security wise, or the product won't sell. The only way I've found to begin addressing that is though intelligent defaults, and meaningful dialogs/user interaction.
I am intrigued by the Netware ACL's though, since you seem to have found a happy place when dealing with them as opposed to other permission systems. Thanks for the input.
Posted Nov 21, 2010 0:35 UTC (Sun)
by Fowl (subscriber, #65667)
[Link]
If you don't find a specific ACE allowing you access, you don't have access.
Posted Nov 18, 2010 23:11 UTC (Thu)
by skissane (subscriber, #38675)
[Link] (2 responses)
I don't think evaluating access control should belong in the kernel. Policy questions like this should be handled by a trusted user space process. When a process tries to access a resource, the kernel should ask the security daemon if it is OK. The security daemon can do whatever you want to answer that, and then just tell the kernel permit or deny. There could be several different security daemons to choose from, depending on needs. A smartphone, or a home user's desktop, will have very different needs from a server operated by an intelligence agency - rather than one system to fit all needs, multiple systems for different needs may be better.
In principle, what I am suggesting is similar to how I understand RACF (and TopSecret and ACF2) work on IBM mainframes. (OK, not all the associated mainframe-warts, but I think the basic idea is good.) The security software is an add-on to the base OS, the base OS just exposes standard hooks for the security software to integrate with.
To avoid the performance hit of constantly context-switching to the security daemon, the kernel should have some kind of cache. That way, the first time a process tries to access a resource, the kernel asks the security daemon for an OK. If the daemon says yes, then the kernel remembers and doesn't need to ask again. If everything is a file descriptor, a good time to do this would be at OPEN time - the kernel asks once, when the FD is opened, what access will be permitted for that FD, then remembers that for the duration of the FD.
I suppose this is the classical capability architecture - if everything is an FD (or in Windows-ese, a "handle"), then we can merge the concepts of FDs with capabilities. And a security daemon is then used at FD/handle creation time to determine for the lifetime of the FD/handle what it can do. It might need to be consulted occasionally once FD is created - e.g. is it OK to pass this FD to another process? Also, if the security permissions are revoked while FD is open, the daemon should be able to ask kernel to forcibly close, or downgrade the rights, of an FD it earlier approved to be opened.
Posted Nov 20, 2010 0:32 UTC (Sat)
by giraffedata (guest, #1954)
[Link]
Incidentally, I think the separateness of RACF happened out of necessity more than architecture. The filesystem formats were already cast in stone with no concept of permissions whatsoever in them. There was no concept of a user identity either. I don't know if designers of RACF considered building all that into the supervisor code and felt it would be less fixable that way or just that it would be harder, but I do like the result.
RACF and its alternatives also encompass resources other than files.
I think there are plentiful examples of this on Linux too, but I don't follow those things. Selinux? AppArmor?
Posted Nov 25, 2010 20:18 UTC (Thu)
by slashdot (guest, #22014)
[Link]
AppArmor has separately stored policy, while SELinux has separately stored policy which is, however, automatically baked into the filesystem (as security labels).
The real problem Linux has is that nobody seems to have the interest, authority and/or ability to figure out the optimal security model to use, so there are several ones in wide use, but none is actually polished and widespread.
Also, security UI and user-friendliness work seems quite lacking, with the result that advanced security often gets just turned off and even if enabled, only distribution-provided policies tend to be used.
Posted Nov 26, 2010 13:16 UTC (Fri)
by Ross (guest, #4065)
[Link] (3 responses)
Posted Nov 26, 2010 15:02 UTC (Fri)
by cladisch (✭ supporter ✭, #50193)
[Link] (2 responses)
The user and group bits wouldn't make much sense without the corresponding IDs.
Posted Nov 29, 2010 19:57 UTC (Mon)
by Ross (guest, #4065)
[Link] (1 response)
However, if you are going to count the uid and gid, then it started with 2 + 2 + 1.5 = 5.5 bytes (which sounds like the count in the article, I agree), but it moved to 4 + 4 + 1.5 = 9.5 bytes. Since the point was that this was something which can't be changed (and hasn't, except for POSIX ACLs, which are mostly ignored), I don't think that's what the author meant.
But maybe it's true. He seems to be responding to comments, so he can clear it up very easily.
Posted Nov 29, 2010 21:20 UTC (Mon)
by neilbrown (subscriber, #359)
[Link]
As you say, the permission information only uses 5 bytes and 1 bit (setuid etc. are not permission bits; they are really part of the 'type' of the object and so are in some ways more closely related to S_IFREG etc.). Being that precise in the article would have been excessive, I think. It is still true that the permissions were stored in 6 bytes; it is just that some room was left over for the file type as well.
POSIX ACLs may well be mostly ignored, but ACLs are still the only direction being explored for making the permission model more complete. My point was simply that they have a storage cost which gets worse quickly but, worse than that, a serious usability cost.
Posted Nov 26, 2010 14:04 UTC (Fri)
by Ross (guest, #4065)
[Link] (1 responses)
Clearly you haven't had to find a way to do it. :) There are a few different ways.
The easiest way to be able to apply permissions for more than one group is to create many additional groups which are unions of the others by putting the users in them. Yes, now you get to maintain these. It's best to write a tool to generate them.
If you want intersections you can do that with groups too, or by nesting subdirectories and applying the traversal permissions for each group to those.
But what if you want to mix read permission for one set of groups with write permission for another?
Well, you have to use the file's real write bit and group owner for the write permission, since that's the only way to control it traditionally. Then use the parent directory's permissions to prevent read access by anyone not in the second set of groups, and set the file's world-readability bit.
If you want to grant execute permission to a third set of groups -- that's a problem. That one really is impossible but execute doesn't mean much if you can read something (and it's not suid or sgid).
Please don't think I'm saying any of that is nice or preferable to POSIX or Windows ACLs, because it clearly sucks horribly for many use cases, but it does show that it is possible to apply read/write permissions to arbitrary sets of groups if you're willing to deal with this kind of setup.
Posted Nov 26, 2010 22:23 UTC (Fri)
by neilbrown (subscriber, #359)
[Link]
Yes: it does seem that it was a slight over-statement to say "impossible". If you have unlimited groups per user, allow users to create their own groups, and don't worry too much about giving new access to already-running processes, then many complex things are indeed possible.
Maybe we need a different maxim: "simple things should be simple, complex things shouldn't drive you insane" !
Thanks for your thoughts.
Ghosts of Unix past, part 3: Unfixable designs
The NFSv4 working group (under the IETF umbrella) were tasked with creating a network filesystem which, among other goals, would provide interoperability between POSIX and WIN32 systems. As part of this effort they developed yet another standard for ACLs which aimed to support the access model of WIN32 while still being usable on POSIX.
Ghosts of Unix past, part 3: Unfixable designs
1) For the system() call, POSIX says: "It is unspecified whether the handlers registered with pthread_atfork() are called as part of the creation of the child process." In glibc, they aren't.
2) Regarding posix_spawn, POSIX says: "It is implementation-defined whether the fork handlers are run when posix_spawn() or posix_spawnp() is called." In glibc, they are.
3) The Linux-specific clone() system call does not cause atfork handlers to be called.
NeilBrown
Ghosts of Unix past, part 3: Unfixable designs
layout of the apache httpd.conf to a mental overlay for the server's (virtual) filesystem/URI space to be way beyond my competence. .htaccess files at least have the virtue of controls being in proximity to the stuff they control, though that thinking runs entirely counter to the point being made in the article, the extended-attribute bloat, etc. so maybe i just drank the inode/xattr Kool-Aid to my permanent detriment, but the "composability" of the permissions by masking them through the filesystem's links down to the object of concern is something i just totally grok. (i think there must be some connection i should make here about exploiting the grafted-on filesystem-trees design to the full being part-and-parcel, but i am obviously not a big-picture type, and i think that case was made for chroot/namespace forking)
things a bit, at the fringes
Ghosts of Unix past, part 3: Unfixable designs
Apache doesn't do this because it is hard to get a good cross-platform file-change notification mechanism (one that doesn't have possible side effects).
Ghosts of Unix past, part 3: Unfixable designs
"Posteriority"? You're going to sit on it? :)
Typos
with a web form to fill in... When people are in their browser reading a web
site, they hate to jump through the hoop of firing up their E-mail program (or
navigating to their web-mail site) just to report a typo, especially when there's
a handy-dandy easier-to-use forum thread right there that they can mention it
in instead...
"mailto:" is useless... It doesn't bring up my preferred E-mail client (elm,
running on a completely different machine than where my browser is currently
running)...
Typos
see gnome-default-applications-properties
Signals vs. system calls
I can't quite tell what problem you're pointing out. Are you saying it's horrible that anything that makes a system call of an interruptible type has to check for EINTR or partial completion and repeat/resume the system call?
Signals vs. system calls
The trick of having an internal pipe to communicate between your signal handler and your main event loop is still subject to this problem.
One assumes that signalfd() would not interrupt system calls on signals delivered through the FD
Signals vs. system calls
Signals is the WORST part of Unix.
Windows DPC shows us that signals _can_ be done right.
Ghosts of Unix past, part 3: Unfixable designs
Question 4 (on Xnotify) would result in an article that I would very much like to read. Question 5 could result in a potentially useful slab of code (though it is less clear whether it would be used).
Ghosts of Unix past, part 3: Unfixable designs
Ghosts of Unix past, part 3: Unfixable designs
That is not to say that IPv6 is the holy grail; it's design-by-committee and as such is probably too different on one front and not different enough on another. And of course it's trial by jury, with a terribly large jury, so there is probably not one protocol (now or ever) that would meet all the demands.
Ghosts of Unix past, part 3: Unfixable designs
I don't think CPU speed per se (how fast a single CPU is) is relevant. It's all about cost, since most IP networks are free to balance the number of CPUs, system buses, network links, etc.
IPV6 and hardware-parseable IP headers
Ghosts of Unix past, part 3: returning -1 for system call failure
Partial answer to #2
Null-Terminated Strings
If you want dynamic strings, then talloc_strdup and talloc_strdup_append etc (in libtalloc) are probably your friends, though I confess I haven't used them extensively.
strlcpy
Null-Terminated Strings
but C-strings don't care..
Null-Terminated Strings
enum {lang_english, lang_french, lang_german} current_language = lang_french;
const char mltstr_language[] = "english\0francais\0deutsch\0";
const char *curlang(const char *mltstr)
{
    /* select the right sub-string depending on current_language */
}
void fct(void)
{
    printf("LANG=%s", curlang(mltstr_language));
}
It saves *a lot* of space; having arrays of (aligned) pointers to strings everywhere, and worse, an (aligned) size for each Pascal-style string, easily takes more memory than the program code and data altogether.
Null-Terminated Strings
2 Ibid, p. 583
Null-Terminated Strings
{
    const char *ptr = mltstr;
    for (unsigned cptlang = 0; cptlang < current_language; cptlang++)
        while (*ptr++) {}
    return (*ptr) ? ptr : mltstr;
}
cout << "The " << (big ? "big " : "small ") << "dog is " << age << " year old.";
needs efficient storage for small strings, even more so when writing multi-language software.
Null-Terminated Strings
> since C++ treats enumerations as a distinct type (instead of int as in C)1
> the string (which may also lead to unpredictable behavior at runtime)2. Of
> course, you're referring to a C-style string, so this may be a moot point.
Ghosts of Unix past, part 3: Unfixable designs
Rusty.
When Do We Want it? Before LCA 2011!
telldir/seekdir
Access Control: take them from Novell Netware
Now you have a hierarchical structure: each dept. has its own dir, and below that you have other dirs shareable with other depts.
Access Control: take them from Novell Netware
And it brings the feature "you see the share, but clicking on it tells you you can't access it". Again, it's the implementation that's wrong: if I have no rights on it anyway, don't display it. It seems to be laziness on the programmers' part to have chosen this way.
/bin/ls
lists the names of directories not owned by the user it's running as whose access control mode is 0700 (user rwx, all others forbidden).
Ghosts of Unix past, part 3: Unfixable designs
You've hit the nail on the head better than any of the other comments or the article itself, by talking not about what the right permission scheme for all future applications is, but about a fixable design that lets us recover if we pick the wrong permission scheme today.
Six bytes?
The "impossible"