Best method (mv/cp) to replace an active file in Linux?

Problem :

I am making a script for updating a text include/conf file for an active service. The script will first write changes to a temp file. When done, to replace the include file with the temp file, considering the service is active, is it better to use cp, mv, echo > or other? It is not clear to me how it works with programs holding on to file handles.

If the answer is that it’s not possible while something is holding the file handle, then what is the next best option? Assume the service is just opening, reading, closing. What is the safest method to get the file replaced?

Almost forgot: I’m using PHP for the script. Is there a difference between using the built-in PHP move/copy methods and Bash mv / cp?

Solution :

Analysis

After a program (e.g. the service in question) obtains a file handle, the handle refers to the same file identified by its inode number in the filesystem that holds the file. The pathname of the file doesn’t matter.

The pathname will change if the file gets renamed or moved within the filesystem. The pathname will disappear if the file gets unlinked (deleted) or moved to another filesystem (this operation is in fact copying, then unlinking). None of these matters. The handle will stay valid even if the link count drops to zero (meaning there is no longer a pathname in the filesystem that leads to the file). The filesystem will truly get rid of the file only if the link count is zero and nothing uses the file.

If you write something to the file with whatever … > the_file or cp foo the_file then the content of the file will change, but “the file” will still mean the file with its unchanged inode number. The old handle will refer to the modified file. The program holding the handle will be able to see the changes immediately.
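
The behaviour described above can be observed with a small experiment. This is only a sketch: the names the_file and foo follow the text, the temporary directory is an assumption for illustration, and file descriptor 3 stands in for the handle a service would hold.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                      # scratch directory (illustration only)
printf 'old\n' > "$dir/the_file"
printf 'new\n' > "$dir/foo"

inode_before=$(ls -i "$dir/the_file" | awk '{print $1}')
exec 3< "$dir/the_file"               # hold a read handle, as a service might
cp "$dir/foo" "$dir/the_file"         # overwrite in place: same file, new content
inode_after=$(ls -i "$dir/the_file" | awk '{print $1}')
seen_via_handle=$(cat <&3)            # read through the handle opened earlier
exec 3<&-                             # close the handle

echo "inode before: $inode_before, after: $inode_after"
echo "old handle sees: $seen_via_handle"
```

The inode number should be the same before and after, and the handle opened before the cp should read the new content.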

One big problem with this approach is that the program may, by chance, operate on the file while it is in an invalid state. By “invalid” I mean “invalid by the logic of the program”, not “broken in the filesystem”. E.g. > truncates the file first, so its size is zero, and only then does something get written. The program may find the file empty although, by its own logic, the file should never be empty. More generally, you may overwrite without truncating, or truncate (to the desired size) only at the end, and then at some moment the file may be half-old, half-new. Either way, the program cannot easily know whether the file is complete and valid at any given moment. For config files, a sane approach is to let programs assume the file is always valid, and therefore not to update configs this way.
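
To make the truncate-first behaviour of > visible, here is a minimal sketch. It assumes a scratch directory and uses a brace group to look at the file after the redirection has been set up but before anything has been written; a real reader would hit this window only by bad luck.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                      # scratch directory (illustration only)
printf 'line1\nline2\n' > "$dir/the_file"

# The shell opens and truncates the_file before the group runs,
# so the first command inside sees an empty file.
{
    size_during=$(wc -c < "$dir/the_file" | tr -d ' ')
    printf 'new\n'
} > "$dir/the_file"

size_after=$(wc -c < "$dir/the_file" | tr -d ' ')
echo "size while writing: $size_during, size when done: $size_after"
```

Any program that opens the file between the truncation and the write sees an empty file.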

An alternative way of updating is with mv. When you do mv foo the_file and both foo and the_file are in the same filesystem, the_file gets replaced by foo atomically and the result is the content of former foo under the name the_file.

“Atomically” here means if any program opens the_file anew at any moment then it will obtain a file handle that will lead to the old content or to the new content. There is no way to spot a mixed state or otherwise mangled content because such content never exists.

Think of it as hardlinking foo to the_file (this act switches the_file filename from its old inode to the one of foo) and then unlinking foo. “Atomically” in this context does not mean you cannot spot foo and the updated target file existing simultaneously. It only means the_file is always either all old or all new; there is no intermediate state.
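
A quick way to see the inode switch is sketched below. The scratch directory is an assumption for illustration; foo and the_file are in the same directory, hence in the same filesystem.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                      # scratch directory (illustration only)
printf 'old\n' > "$dir/the_file"
printf 'new\n' > "$dir/foo"

foo_inode=$(ls -i "$dir/foo" | awk '{print $1}')
mv "$dir/foo" "$dir/the_file"         # rename within one filesystem
new_inode=$(ls -i "$dir/the_file" | awk '{print $1}')
content=$(cat "$dir/the_file")

echo "foo had inode $foo_inode, the_file now has inode $new_inode"
echo "the_file contains: $content"
```

After the mv, the_file should carry the inode number that foo had, and foo should no longer exist.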

Notes:

  • mv between filesystems is like cp + unlink, so it behaves like cp (discussed above).
  • See Is mv atomic on my fs?

Because the_file filename is now associated with another inode (the one foo pointed to before it got unlinked), all file handles to the_file from before mv still point to the old file. The old file still exists in the filesystem even if there is no pathname to it. It can be read, written to; it can grow. It’s a separate file, different than what you see as the_file now. It’s different in the same way the_file earlier was different than foo.
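
The following sketch shows a handle that was opened before the mv still reading the old file. As before, the scratch directory is an assumption and file descriptor 3 plays the role of the handle held by a service.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                      # scratch directory (illustration only)
printf 'old\n' > "$dir/the_file"
printf 'new\n' > "$dir/foo"

exec 3< "$dir/the_file"               # handle opened before the replacement
mv "$dir/foo" "$dir/the_file"         # the name now leads to another inode
via_old_handle=$(cat <&3)             # still the old, now nameless, file
via_path=$(cat "$dir/the_file")       # opening anew by path gives the new file
exec 3<&-                             # last user gone: the old file can be freed

echo "old handle: $via_old_handle, by path: $via_path"
```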

This means updating a config file held open by a service will be futile if done this way, unless the service reopens it by path and reads the config again.

When updating without switching to another inode (e.g. with cp), you need some mechanism to tell the service to stop using the file (which may or may not include closing the file) and to tell it when to resume using it. Otherwise there’s a risk the service uses some intermediate state of the file. The most basic mechanism is simply to stop the service and start it again when the file is ready. The service may support some mechanism to do this without stopping, but I wouldn’t count on it.

In general, allowing intermediate states to exist in the filesystem is risky. In case of a power failure or another interruption (e.g. a rash Ctrl+C) you may end up with an invalid file. I already explained “invalid” as “invalid by the logic of the program”, not “broken in the filesystem”, but then we were talking about a situation where the file was invalid only temporarily; it was going to be valid eventually. Now we’re talking about a situation where the invalid state persists because the process that was turning one valid state into another terminated in the middle.

mv allows you to avoid invalid states. With mv you definitely need some mechanism to tell the service to reload the file by path; otherwise it will stick to the old file. The most basic mechanism is to restart the service. The service may support an alternative, e.g. sshd rereads its config upon SIGHUP, without restarting. This is the Right Way.

Note that much depends on how the service behaves:

  • The service may read the file once, configure itself and never read again, unless restarted or asked to (if supported).
  • Or it may read the file again sometimes for whatever reason by using the same file handle.
  • Or it may read again sometimes for whatever reason by opening anew.
  • It may monitor the file and reread it when it changes. In general it’s not easy to tell when a change is complete, so I would say this approach is flawed.
  • The service may monitor the filename and reread it when it starts pointing to another inode. This works well with the mv method of updating, but only for a single file. If configuration can be stored in many files then the service should reread it only on demand. With mv you can atomically update any single file but not a multi-file config as a whole.
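
The last variant, watching for the filename to start pointing to another inode, could look roughly like the sketch below. Everything here is hypothetical: the file name service.conf, the helper functions inode_of and check_reload, and the counter that stands in for an actual reload. A real service would more likely use a file-watching facility of the operating system than a polling check.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                      # scratch directory (illustration only)
conf="$dir/service.conf"              # hypothetical config path
printf 'v1\n' > "$conf"

inode_of() { ls -i "$1" | awk '{print $1}'; }

seen=$(inode_of "$conf")              # inode the "service" last loaded
reloads=0

# Hypothetical check a service could run periodically.
check_reload() {
    now=$(inode_of "$conf")
    if [ "$now" != "$seen" ]; then
        seen=$now
        reloads=$((reloads + 1))      # stand-in for rereading the config
    fi
}

check_reload                          # nothing changed yet
printf 'v2\n' > "$dir/new"
mv "$dir/new" "$conf"                 # atomic replacement switches the inode
check_reload                          # detects the switch

echo "reloads triggered: $reloads"
```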

Explicit answer

What is the safest method to get the file replaced?

Definitely mv within the same filesystem. You need support from the service if you want it to work with the new file without being restarted though. I don’t know how your service behaves.
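
Put together, an update script might look like the sketch below. The function name replace_atomically and the file name service.conf are hypothetical, and the reload step is left as a comment because it depends entirely on the service. The temporary file is created next to the target so that both are in the same filesystem.

```shell
#!/bin/sh
# replace_atomically TARGET: read the new content from stdin, then swap it in.
replace_atomically() {
    target=$1
    # Same directory as the target, hence the same filesystem.
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
    # mktemp creates the file with mode 0600; chmod/chown it here if the
    # service runs as another user, because the new file replaces the old
    # one together with its permissions.
    if cat > "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"                      # do not leave a half-written temp file
    return 1
}

dir=$(mktemp -d)                      # scratch directory (illustration only)
printf 'mode=old\n' > "$dir/service.conf"
printf 'mode=new\n' | replace_atomically "$dir/service.conf"

# Then ask the service to reread its config, in whatever way it supports,
# e.g. (hypothetical PID file): kill -HUP "$(cat /run/myservice.pid)"
```

If writing the temporary file fails or is interrupted, the target is never touched, so readers keep seeing the complete old version.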

Is there a difference in using the built-in PHP move/copy methods vs Bash mv / cp?

(Side note: mv and cp you call in Bash are standalone executables not related to Bash.)

I don’t know PHP, I cannot answer this. I think there is no reason for its move method not to be atomic, but I may be wrong. I hope the information from this answer will help in your own research. Now you know what to pay attention to.

There are two concepts here: File-name and file-data.

If a file is deleted by rm, it loses its name, but its data is not freed until the
last program that has it open closes it.
However, if the file is opened in an exclusive manner, it cannot be manipulated
(on Linux this is unusual, since file locks are normally advisory).

So, if the file cannot be deleted, a known workaround is to rename it instead.

In your case, where the other program is opening the file continuously,
you risk finding yourself in a situation where you deleted the file, but
haven’t copied the new file in, so that the other program will find the file as
missing.

If the file being missing is not a problem, I would do it like this:

  • Rename the file to some temporary name
  • Copy the replacement file in its place
  • Delete the temporary name.
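
The three steps can be sketched as follows. The names the_file, the_file.old and replacement, and the scratch directory, are assumptions for illustration. The check in the middle shows the window during which the file is missing under its usual name.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                      # scratch directory (illustration only)
printf 'old\n' > "$dir/the_file"
printf 'new\n' > "$dir/replacement"

mv "$dir/the_file" "$dir/the_file.old"    # 1. rename to a temporary name
if [ -e "$dir/the_file" ]; then missing_window=no; else missing_window=yes; fi
cp "$dir/replacement" "$dir/the_file"     # 2. copy the replacement in
rm "$dir/the_file.old"                    # 3. delete the temporary name

echo "file was missing between steps 1 and 2: $missing_window"
```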

If this method does not work for you, you will need to use a semaphore mechanism.
You could possibly use the lockfile command as a lock between the two programs.
This example comes from the man page:

...
lockfile important.lock
...
access_"important"_to_your_hearts_content
...
rm -f important.lock
...
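
Since lockfile may not be installed everywhere, here is a sketch of the same idea using mkdir instead, which is a different technique from the man page example: mkdir either creates the directory or fails if it already exists, so it can serve as a simple lock. The helper names acquire and release are hypothetical, and the scheme only works if both programs cooperate, which an unmodified service will not do.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)                      # scratch directory (illustration only)
lock="$dir/important.lock"

acquire() { mkdir "$lock" 2>/dev/null; }  # succeeds for exactly one caller
release() { rmdir "$lock"; }

if acquire; then first=got; else first=busy; fi
# ... access the protected file here ...
if acquire; then second=got; else second=busy; fi   # lock is still held
release
if acquire; then third=got; else third=busy; fi     # free again
release

echo "first: $first, second: $second, third: $third"
```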
