Exploiting symlinks and tmpfiles

By Jake Edge
September 19, 2007

"Insecure tmpfile creation" and "arbitrary file overwrite using symlinks" (and other similar names) are commonly seen vulnerabilities listed in the LWN daily security update. The problems are related in many ways and can be very serious, with damage ranging from corrupted files to full takeover of a vulnerable system. By and large, they are easy to avoid, so it is disheartening to see them crop up time and time again.

Typically, these kinds of attacks exploit race conditions, where correct functioning depends, inappropriately, on the order of operations between two or more processes in the system. The classic example is a program that checks for the existence of a file, in a directory writable by others, before opening it, to avoid overwriting an existing file. An attacker can arrange, usually through repeated attempts, to create the file just after the existence check and before the open. The vulnerable program's author made an incorrect assumption about what else could be going on in the system, which allows the attacker's program to race with it.
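The check-then-open pattern can be sketched in a few lines of C. This is an illustration only, not code from any real program: the names racy_create and plant_decoy are hypothetical, and the between hook is an assumption added so that the race window can be triggered deterministically instead of through repeated attempts.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Deliberately vulnerable, for illustration only: check, then open.
 * `between` is a hypothetical hook standing in for whatever another
 * process manages to do inside the race window. */
int racy_create(const char *path, void (*between)(const char *))
{
    struct stat st;

    if (lstat(path, &st) == 0)
        return -1;              /* something is already there: refuse */

    if (between != NULL)
        between(path);          /* the attacker's window */

    /* No O_EXCL: whatever appeared in the window is opened and truncated. */
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
}

/* Example of an attacker action: drop a file at the path mid-window. */
void plant_decoy(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd >= 0) {
        ssize_t n = write(fd, "decoy", 5);
        (void)n;
        close(fd);
    }
}
```

Calling racy_create(path, plant_decoy) succeeds even though the existence check "passed": the program ends up holding a descriptor for a file it did not create.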

At first blush, it doesn't seem particularly harmful for a program to mistakenly overwrite the attacker's carefully inserted file. After all, the victimized program will probably just truncate the file before writing whatever data it had planned. This is where symbolic links (symlinks) come into play.

A symlink is just an alias for an entry in the filesystem, and it can be created by anyone with write access to the directory where the symlink will reside. The target of the symlink can be almost any string; normally it is the path to the target of the alias, but there is no requirement that the target exist. More importantly, there is no check that the process which creates the symlink has access rights to the target. When operations are performed on a symlink, the filesystem layer follows the pointer to the actual file, checking the permissions on the inode of that file.

What that means is that any random Linux user can create a symbolic link from /tmp/foo to /etc/passwd, though they will not be able to write to the former, because the permissions on the latter do not allow it. But, privileged programs, either setuid or those run as root, do have the proper permission. If they open and write to /tmp/foo, they have just corrupted the password file.

Vulnerable programs aren't usually quite that simple, but they do use predictable filenames or patterns. If an attacker knows that the administrator often runs a vulnerable program or script, which writes to /tmp/fooNNNNN where NNNNN is a random number, they can run a program which continuously makes links from those filenames to some file they wish to corrupt. If their program happens to generate the right link at the right time, the corruption succeeds. Normally, a program that creates a temporary file will delete it when it is done executing, but for symlinks that just removes the link, leaving the file that was pointed to with whatever contents were written.
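A minimal sketch of that behavior, assuming a POSIX system: the function below plays the part of a careless program that writes its "temporary file" and then cleans up. The name write_then_cleanup is hypothetical. If the path it is given is a symlink planted by someone else, the write lands in the link's target and the cleanup removes only the link.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `msg` to `path`, then remove `path`, the way a careless program
 * treats its temporary file. Returns 0 on success, -1 on any failure. */
int write_then_cleanup(const char *path, const char *msg, size_t len)
{
    /* open() follows a symlink; permissions are checked on the target. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, msg, len);
    close(fd);

    /* unlink() removes only the link; the target keeps the new contents. */
    if (unlink(path) != 0 || n != (ssize_t)len)
        return -1;
    return 0;
}
```

Run with sufficient privilege against a link such as /tmp/foo pointing at a sensitive file, a function like this would leave that file holding msg after /tmp/foo itself is gone.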

A setuid program provides even more opportunities for exploitation as the attacker can run it many times, under his control, while running other programs that create the symlinks. If the attacker can control, via input to the program, what gets written, the problem becomes worse still, quite possibly leading to complete compromise of the system. The scenarios for abusing this kind of hole are endless.

It doesn't necessarily have to be a temporary file that gets exploited; any file that gets opened in a directory that is writable by others can potentially be symlinked elsewhere. This can lead to unexpected results for reads, or corruption of unexpected files for writes. These types of vulnerabilities can be used when a regular user login (or a system user like 'apache') is compromised, by an exploit or password disclosure, to further compromise the system. Some may be difficult to exploit reliably, but the consequences are such that it may be worth the effort.

As always, David Wheeler's Secure Programming for Linux and Unix HOWTO is an excellent resource for avoiding these kinds of problems. The basic idea is to avoid the race by using atomic filesystem operations or, for tmpfiles, mkstemp(). When creating files, ensure that the open() call uses O_CREAT | O_EXCL, which will fail if the file already exists. Another important note is that a program should not close and reopen files that live in shared directories; instead, those files should be left open until the program is done with them.

These kinds of problems have been around for twenty years or more, but still keep cropping up, which is a good indication that many programmers aren't following secure coding practices. Whenever one is writing code that opens files, which is, after all, a very common operation, some consideration should be given to symlink/tmpfile vulnerabilities. With some perseverance, these kinds of vulnerabilities could become a thing of the past.

Exploiting symlinks and tmpfiles

Posted Sep 20, 2007 8:58 UTC (Thu) by intgr (subscriber, #39733) [Link] (6 responses)

When creating files, ensure that the open() call uses O_CREAT | O_EXCL

What about programs that use the buffered ANSI fopen() call? As far as I can tell, there is no easy and portable way to use it for atomic file creation. Using open() and fdopen() is an alternative, but again is usable on POSIX systems only.
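For reference, the open() and fdopen() combination mentioned here might look like the following on a POSIX system. The function name fopen_exclusive is hypothetical, and the sketch does nothing for non-POSIX platforms.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* POSIX-only: atomically create `path`, then wrap the descriptor in a
 * stdio stream. Returns NULL if the file exists or on any other error. */
FILE *fopen_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return NULL;

    FILE *f = fdopen(fd, "w");
    if (f == NULL)
        close(fd);   /* fdopen() failed, so the descriptor is still ours */
    return f;
}
```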

Exploiting symlinks and tmpfiles

Posted Sep 20, 2007 9:51 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

Indeed, why shouldn't O_CREAT and O_EXCL be the default (clunky but safe) behaviour - and if your program needs to overwrite an existing file rather than creating a new one, you can give a flag to specify this unusual behaviour.

Well, like C string functions, I guess it's too late to fix these design decisions now.

Exploiting symlinks and tmpfiles

Posted Sep 20, 2007 11:58 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

Way too late. I mean, even the insecure pre-stdio gets() function is still around, decades after deprecation.

This is the downside of code reuse and common interfaces: it's very hard to change stuff once it's been widely used. (This seems to be a common trait in all sorts of complex systems: consider the degree of conservation of ancient regulatory systems in metazoan genetics, for instance, simply because once it's set up it's too hard to change. Building foundations is easy: changing them once a house is built on top is really difficult.)

Exploiting symlinks and tmpfiles

Posted Sep 20, 2007 15:05 UTC (Thu) by ken (subscriber, #625) [Link]

Thank you for the clarification. I was a bit confused but once you made the link to metazoan genetics things cleared up. :-)

No easy solution with fopen()

Posted Sep 20, 2007 14:19 UTC (Thu) by dwheeler (guest, #1216) [Link] (2 responses)

You're correct, there is NO easy solution with fopen(). I believe that there should be work to modify the standards to create an additional option character (just as "b" is a flag for "binary" on some systems), but it'd be a fair amount of work to get it through the standards process.

Actually, there is an easy solution with fopen()

Posted Sep 20, 2007 16:29 UTC (Thu) by hummassa (guest, #307) [Link] (1 responses)

I tested it on linux, works ok -- except for the fact that it WILL clobber an empty file:

  
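#include <stdio.h>

/* Returns NULL if the file is non-empty or cannot be opened.
 * Caveat: an existing empty file (or a symlink to one) is still accepted. */
FILE *open_only_if_does_not_exist(const char *filename) {
  FILE *f = fopen(filename, "a");
  if (f == NULL)
    return NULL;
  /* Append mode does not guarantee the initial position; seek explicitly. */
  if (fseek(f, 0, SEEK_END) != 0 || ftell(f) != 0) {
    fclose(f);
    return NULL;
  }
  return f;
}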

Actually, there is an easy solution with fopen()

Posted Sep 20, 2007 18:45 UTC (Thu) by vmole (guest, #111) [Link]

...except for the fact that it WILL clobber an empty file

So, in other words, it doesn't do what the function name claims. Not to mention no error checking on the fopen() call. Yeah, I know, it's just pseudo-code in a comment. But since the whole article is about correct code without security holes, I'm being a dick about it.

Anyway, it's pointless to try to do this within the C standard. If you don't have POSIX calls (open(), fdopen()), then you don't have POSIX file system semantics, so you've got no guarantees anyway. For temporary files, use tmpfile(). If your OS/library doesn't have tmpfile() (which means it's not even C89), implement it using whatever OS-specific tools are necessary. For non-temporary but unique files, the most general technique looks to be mkstemp() and rename(), but I'd guess plain old open()/fdopen() is just as well supported.

O_EXCL|O_CREAT over NFS: DON'T!!!

Posted Sep 21, 2007 20:25 UTC (Fri) by AnswerGuy (subscriber, #1256) [Link] (1 responses)

While the original article did not cover this particular topic, I'd like to remind everyone that the desired semantics of open(..., O_EXCL|O_CREAT) are NOT supported over NFS (at least as late as version 3 of that protocol).

Quoting from the open(2) man page:

O_EXCL When used with O_CREAT, if the file already exists it is an
error and the open will fail. In this context, a symbolic link
exists, regardless of where its points to. O_EXCL is broken on
NFS file systems, programs which rely on it for performing
locking tasks will contain a race condition. The solution for
performing atomic file locking using a lockfile is to create a
unique file on the same fs (e.g., incorporating hostname and
pid), use link(2) to make a link to the lockfile. If link()
returns 0, the lock is successful. Otherwise, use stat(2) on
the unique file to check if its link count has increased to 2,
in which case the lock is also successful.

A portion of the link(2) man page provides details on why an fstat() is necessary if the link() fails. (Basically, there are conditions where the NFS server's RPC (success) response could fail to reach the client even after the link was created. The subsequent fstat() on the originally opened file descriptor can detect cases where the link() erroneously returned an error.)

I should add that the use of stat() would be sufficient for cases where one is concerned about inadvertent race conditions --- but I think that fstat() is required for situations where one must defend against potentially hostile processes with write access to the directory in which all this locking is taking place. In other words, the advice in the man page only covers the non-hostile case (suitable for non-SUID/non-SGID use in a directory which allows neither group nor world write access).

In cases where security is a consideration, I think we have to unconditionally perform an fstat() on the originally opened file descriptor. Otherwise we are vulnerable to an unlink() and recreation race. A stat() will check the file/inode which is on the underlying filesystem at the time the call is performed, so the link that's present is resolved to an inode during that call. An fstat() checks against the inode that was originally opened (synchronizing the vnode to the underlying inode). In the case where an unlink() was slipped in after the open(), the new link points at a new and different inode. (The original inode may be, at that point, anonymous, in which case the target of our successfully called link() is also pointing at the new, now compromised, inode.)
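The recipe from the man page, with the link count taken from fstat() on the open descriptor as suggested above, might be sketched as follows. This is an untested-on-NFS illustration, not a vetted locking library: the function name acquire_link_lock is hypothetical, the caller is assumed to supply a unique name on the same filesystem (for example one built from hostname and pid), and NFS attribute caching is not addressed at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to take `lockpath` using the link(2) recipe. `uniquepath` is a
 * caller-chosen name on the same filesystem as `lockpath`.
 * Returns 0 if the lock was acquired, -1 otherwise. */
int acquire_link_lock(const char *uniquepath, const char *lockpath)
{
    struct stat st;
    int got = -1;

    int fd = open(uniquepath, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    if (link(uniquepath, lockpath) == 0) {
        got = 0;
    } else if (fstat(fd, &st) == 0 && st.st_nlink == 2) {
        /* link() reported failure, but the link count says it happened. */
        got = 0;
    }

    close(fd);
    unlink(uniquepath);   /* the lock itself is the `lockpath` name */
    return got;
}
```

Releasing the lock is then a matter of calling unlink() on lockpath.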

Of course I'm just speculating here ... reasoning things out from my understanding. I'm not an expert in secure programming and I can't cite any canonical sources.

I have personally seen that open(...,O_EXCL...) is NOT supported on NFS. So that's not speculation. I've read hearsay that it's supposed to work under NFSv3 ... but I haven't seen a convincing, credible statement to that effect. I don't know if it's "intended" to be supported and if there are buggy NFSv3 implementations that fail to achieve this. In short, I would recommend a more conservative approach for the foreseeable future.

I will forward this comment along to David Wheeler and suggest that he review it and consider adding anything he considers worthy and appropriate to his HOWTO on the topic ... and I would welcome any comments from others with deeper expertise. I'd be particularly interested in pointers to any stress testing harness which could be deployed to a few hundred clients to beat the tar out of any code which is supposed to be doing such things correctly. (My first test case would be the venerable old lockfile utility that ships with the procmail package. My next one would be an internally used utility that my employers are trying to fix as I write this.)

[Of course I realize the essential futility of trying to prove that a given work of code doesn't have any race conditions. You can never be sure of that via any form of black-box testing. However, I do want to be able to definitely demonstrate when a program is failing to be race-free in a reproducible fashion. I've proposed a crude design for such a harness; it makes a "contest" comprised of processes which each create qmail "lock free" styled results files, then busy wait on a starting sentinel (which I call the starting gun and implement as touch $LOCKDIR/BANG). Then they all contend for the lock; all the losing contestants post their results to their private files, renaming those to *.done and exiting. Then the winner waits, holding the lock, until all the other contestants are "done" and then tallies up the results, searching for any other processes which claim to also be "winners." There's some additional timeout handling. Any case where there appears to be more than one "winning contestant" means that the locking semantics being tested are definitely broken. Cases with a single winner are inconclusive (an underlying race condition could simply have been missed, as is always the case with races). Timeouts resulting from "losing contestants" who fail to complete are indications of unreliability among the client systems, the networking infrastructure, or the filers --- but they say nothing about the locking semantics under test. Anyone who is interested in more details of my proposed test harness is welcome to contact me (I'll monitor this thread), and anyone who sees potential flaws or can suggest code which has already robustly implemented something like this is especially encouraged to do so.]

Jim "The AnswerGuy" Dennis

O_EXCL|O_CREAT over NFS: DON'T!!!

Posted Sep 28, 2007 8:50 UTC (Fri) by Ross (guest, #4065) [Link]

If you don't mind having temp directories, the easy workaround is to make a directory with a unique name. That is an all or nothing operation even on NFS.

Another much more complicated trick you can use if you don't have O_EXCL is to open, lstat the file, then fstat the descriptor, and compare the results. There's still a race condition of course, but now you detect it. You should only continue if they have the same device and inode number, are empty, and have a link count of 1. If they don't check out you close the file and try another name. The bad thing there is that some device files can be affected by just being open()ed (kind of a bug really). The other problem is that you probably only need this for NFS and I'm not sure how it interacts with attribute caching, though it should work locally. (This technique can also be used to open files owned by another user and verify they didn't change out from under you.)
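The open/lstat/fstat comparison could be sketched like this. It is an illustration of the checks listed above under local-filesystem assumptions, with a hypothetical function name; as noted, the race still exists and is only detected after the fact, and the behavior over NFS is not something this sketch can vouch for.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `path` without O_EXCL, then verify after the fact that the name
 * and the descriptor refer to the same empty, singly linked file.
 * Returns a descriptor, or -1 if any check fails (try another name). */
int open_checked(const char *path)
{
    struct stat by_name, by_fd;

    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    if (lstat(path, &by_name) != 0 || fstat(fd, &by_fd) != 0 ||
        by_name.st_dev != by_fd.st_dev ||   /* a symlink has its own inode, */
        by_name.st_ino != by_fd.st_ino ||   /* so these differ if one was followed */
        by_fd.st_size != 0 ||               /* must be empty */
        by_fd.st_nlink != 1) {              /* must have exactly one name */
        close(fd);
        return -1;
    }
    return fd;
}
```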

Some systems also have O_NOFOLLOW which will help against symlinks (but not hardlinks, though it's hard to think of an attack with those).
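On systems that provide it, O_NOFOLLOW is simply one more flag to open(). A small sketch with a hypothetical wrapper name follows; note that the flag only concerns the final component of the path, so symlinks in the leading directories are still followed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open (or create) `path` for writing, refusing to follow a symlink in
 * the final path component. Returns a descriptor, or -1 on failure. */
int open_nofollow(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_NOFOLLOW, 0600);
}
```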

And above all, if you use an unpredictable filename with enough characters it will make things far more difficult statistically for the attacker, to the point the race condition is only of a theoretical concern (similar to concern over MD5 checksum collisions). Use of rand() with time() or getpid() for a seed wouldn't cut it of course :)

O_EXCL over NFS: Don't!!! (Repost in HTML)

Posted Sep 21, 2007 20:48 UTC (Fri) by AnswerGuy (subscriber, #1256) [Link] (1 responses)

My apologies for posting this twice; given the complexity of the commentary, I'd intended for it to be posted in HTML for easier reading. I hope John or someone on the LWN team will delete the earlier copy of this.

O_EXCL over NFS: Don't!!! (Repost in HTML)

Posted Sep 27, 2007 22:39 UTC (Thu) by cras (guest, #7000) [Link]

I've read hearsay that it's supposed to work under NFSv3 ... but I haven't seen a convincing, credible statement to that effect.

My NFS tester shows that it at least appears to work with Linux, Solaris and FreeBSD: http://www.dovecot.org/list/dovecot/2007-July/024102.html. Looking at the Linux 2.6 sources, it doesn't look like it tries to implement a racy O_EXCL check on the client side (fs/nfs/nfs3proc.c nfs3_proc_create()), so the test's results should be correct. I don't know if other OSes do that. I guess it would be nice to have a better O_EXCL tester which tries to catch race conditions.