When Windows copies a file, does it ever copy bytes that are in the slack space?

Date:February 13, 2018 / year-entry #37
Tags:tipssupport
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20180213-00/?p=98015
Comments:    31
Summary:Keeping tabs on the slackers.

A customer who works with highly sensitive information wanted confirmation that when they copy a file with Explorer, Windows will copy only the data that logically belongs to the file and no data that happens to reside in the slack space.

Slack space refers to physical storage allocated to a file but not used to hold any file contents. Slack space typically appears if the last unit of file data storage is not filled with file data. For example, if you have a cluster size of 4KB, and the file is 10KB in size, then one way of storing the information is to allocate three clusters: The first cluster holds the first 4KB of data; the second cluster holds the next 4KB of data, and the third cluster holds the last 2KB of data. The last 2KB of the third cluster is unused.
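
The arithmetic in that example can be sketched in a few lines. This is only an illustration of the "one way of storing the information" described above: the function names `clusters_needed` and `slack_bytes` are made up for this sketch, and a real file system may lay data out differently.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole clusters required to hold file_size bytes.
std::uint64_t clusters_needed(std::uint64_t file_size, std::uint64_t cluster_size) {
    assert(cluster_size > 0);
    return (file_size + cluster_size - 1) / cluster_size;  // round up
}

// Bytes allocated to the file but not occupied by file data.
std::uint64_t slack_bytes(std::uint64_t file_size, std::uint64_t cluster_size) {
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size;
}
```

For the 10KB file on 4KB clusters, `clusters_needed(10 * 1024, 4096)` is 3 and `slack_bytes(10 * 1024, 4096)` is 2048, matching the unused 2KB at the end of the third cluster.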

The concern is that the last 2KB of the third cluster, which formally does not contain any file data, may nevertheless be copied to the destination file (also as slack data). When copying the last piece of the file, is it possible that Windows reads the entire cluster (even the slack space), and then writes the entire cluster (including the slack space) to the destination? If there is any confidential information in the slack space of a non-confidential file, then could copying the non-confidential file inadvertently copy the confidential information?

Fortunately, the answer is no.

The contents of the slack space are not visible outside the file system driver itself. File copying is handled at a higher level. For example, the CopyFile function has access only to user-mode-visible file contents. It cannot see the slack space in the source file, and therefore cannot copy it.
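
To make the point concrete, here is a minimal sketch of a copy loop that works through ordinary file reads. It is not how CopyFile is implemented; `copy_logical_bytes` is a hypothetical helper, and the 4096-byte chunk is chosen only to mimic a cluster-sized read. Even though the loop asks for a full chunk each time, the final read comes back short, so only the file's logical contents reach the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Copy src to dst using ordinary stream reads; returns the bytes copied.
std::size_t copy_logical_bytes(const std::string& src, const std::string& dst) {
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    assert(in && out);

    char chunk[4096];  // deliberately one "cluster" in size
    std::size_t total = 0;
    while (in) {
        in.read(chunk, sizeof chunk);
        std::streamsize got = in.gcount();  // short on the final read
        if (got <= 0) break;
        out.write(chunk, got);              // write only what was read
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Copying a 10KB source this way produces a 10KB destination: two full chunks and one 2KB chunk, with nothing from beyond the end of the file.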

Even in the case of offloaded data transfer (ODX), the code that performs the transfer communicates with the file system, and the file system driver won't let anybody see the data in the slack space. Therefore, the transfer will not transfer any data that resides in the slack space of the source file.

On the other hand, if you are copying data by operating at a level below the file system driver, then the file system driver can't stop you from seeing the slack space. For example, if you use direct volume access and read sectors straight off the hard drive, you will see everything on the volume, including slack space.


Comments (31)
  1. kantos says:

    It would seem to me that the only way to get confidential information into the slack space in the first place would be to do a non-secure erase. It would be nice if Windows had secure-erase functionality available by default, but it doesn’t. Numerous utilities out there DO have that capability, though you need to be aware of your medium before using them, as they can cause unnecessary wear and tear.

    1. Clockwork-Muse says:

      …except most cases of needing a secure erase (such as selling a drive you no longer need) are covered adequately by drive encryption. For everything else, a “secure erase” utility is going to bump up, hard, against the drive maintaining itself (defragging, moves due to bad or expired sectors, etc.), so you’d have to regularly clean up the _entire_ drive for such scenarios. It’s just so much easier to 1) encrypt the drive, and 2) have the OS properly handle “extra” bytes in these situations.

      1. And that’s only considering magnetic hard drives. On an SSD it’s worse because of wear-leveling algorithms. Overwriting a file will almost never overwrite the same bytes on the drive, so you effectively have to secure-erase all of the unallocated space on the drive in order to “securely erase” the data. Which is also a great way to significantly reduce the lifespan of said drive.

    2. mikeb says:

      Another possible way to get information in slack space might be: 1) write a bunch of data to a file; 2) truncate the file to a shorter length (maybe with `SetEndOfFile()`).

      Note: I’m not saying that Windows systems do leave whatever information is in the cluster there when truncating – I don’t know the actual behavior. But it wouldn’t surprise me if the data got left in the cluster.

      Interestingly, the docs for SetEndOfFile() mention that there are three size-related attributes for a file stream:

      file size
      allocation size
      valid data size

      This article talks about when (file size < allocation size), but what about when (valid data size < file size)? The docs for `SetFileValidData()` say:

      If SetFileValidData is used on a file, the potential performance gain is obtained by not filling the allocated clusters for the file with zeros. Therefore, reading from the file will return whatever the allocated clusters contain, potentially content from other users. This is not necessarily a security issue at this point, because the caller needs to have SE_MANAGE_VOLUME_NAME privilege for SetFileValidData to succeed, and all data on disk can be read by such users. However, this caller can inadvertently expose this data to other users that cannot acquire the SE_MANAGE_VOLUME_PRIVILEGE privilege if the following holds:

      If the file was not opened with a sharing mode that denies other readers, a nonprivileged user can open it and read the exposed data.
      If the system stops responding before the caller finishes writing up the ValidDataLength supplied in the call, then, on a reboot, such a nonprivileged user can open the file and read exposed content.

      If the caller of SetFileValidData opened the file with adequately restrictive access control, the previous conditions would not apply. However, for partially written files extended with SetFileValidData (that is, writing was not completed up to the ValidDataLength supplied in the call) there exists yet another potential privacy or security vulnerability. An administrator could copy the file to a target that is not properly controlled with restrictive ACL permissions, thus inadvertently exposing the extended area’s data to unauthorized reading.

      Sounds like a pretty unlikely scenario, but maybe one that the customer who originally asked the question might still want to be aware of.

    3. Antonio Rodríguez says:

      I’ve never understood the need for secure erase in consumer computing. If your application depends on confidential data, then you should not sell the used hard drives or equipment. Rather, you should store them securely. And if you have to dispose of them, destroy them first: a simple hammer is enough to render any hard drive nonoperational (SSDs are a horse of a different color). If you rely on confidential data, you should accept the (small) cost of not selling used hardware.

      Of course, data recovery companies are able to repair hard drives with damaged mechanics (hammer or otherwise). But they are also able to recover from many types of “secure erases”, except the most thorough variations. The ones that, by the way, take a loooong time to complete: calculate how much time it takes to write all over a 2 TB hard drive, and multiply that by 99. You’ll find that the average 2 TB hard drive takes about 20 days to be securely erased (at a sustained speed of 120 MB/s, 24 hours a day).
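
The estimate in that comment can be checked with a small calculation. This sketch assumes decimal units (2 TB as 2e12 bytes, 120 MB/s as 120e6 bytes per second) and takes the 99 passes from the comment at face value; `full_overwrite_days` is a name invented here.

```cpp
#include <cassert>

// Days needed to overwrite an entire drive `passes` times at a steady rate.
double full_overwrite_days(double capacity_bytes, double bytes_per_second, int passes) {
    assert(bytes_per_second > 0.0 && passes >= 0);
    const double seconds_per_pass = capacity_bytes / bytes_per_second;
    return seconds_per_pass * passes / 86400.0;  // 86400 seconds per day
}
```

With these assumptions, `full_overwrite_days(2e12, 120e6, 99)` comes out at roughly 19 days, in line with the "about 20 days" figure. A single pass, by contrast, takes under five hours.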

      1. alegr1 says:

        Information recorded on modern drives is unrecoverable after a single erase. There is no recoverable residual left.

  2. Alan says:

    My first thought on seeing the title was “Who cares?” Upon reading the context, yeah, that’s a good question, and I’m glad to know the answer.

    1. Well, my first thought was “obviously not”! Has this person never copied data from a disk with an 8 kB cluster size to a disk with a 4 kB cluster size?

  3. creaothceann says:

    IIRC when people were collecting the ROM contents of video game console cartridges like the SNES, they started to see source code in them. Games were sent to Nintendo on DOS-formatted floppy disks. One theory is that the ROM writing process simply copied whole sectors, including ‘erased’ data.

    See the category “Games with uncompiled source code” on the website “The Cutting Room Floor”.

    1. BOFH says:

      The original Commodore Amiga 1000 was accidentally shipped with source code fragments in the slack space of the Kickstart 1.0 diskette:
      http://www.pagetable.com/?p=34

    2. ender9 says:

      Microsoft’s own floppies contained things like e-mail fragments in slack space.

  4. 12BitSlab says:

    Raymond, thanks! This is good to know!

  5. Joshua says:

    It turns out you can see into the slack space from usermode by playing games with SetFileValidData. See https://msdn.microsoft.com/en-us/library/windows/desktop/aa365544(v=vs.85).aspx

    1. Tim says:

      I guess that’s technically correct, but the usermode process needs an assist from an improperly secured process with the correct permissions to call SetFileValidData. The issue there is in the hypothetical buggy application, not the filesystem.

    2. Beldantazar says:

      Sure, but you have to be an administrator to do that, which means you could just do whatever else you wanted to get at the sensitive data anyway.

    3. If you can use that function, you’re already way on the other side of that airtight hatchway, though. You might as well just read sectors until you find something interesting at that point.

      1. Joshua says:

        I wasn’t talking about security. Hint: user mode not unprivileged user.

        1. Then the confusing part of your claim is that this is some novel technique, compared to just raw reading all data via whatever sector API you prefer.

  6. IanBoyd says:

    This is probably one of the scenarios where you could answer the question by asking, “What would happen if it did copy slack space?”

    We know that when you create a file, the file contents are zero-initialized by the file system. If you attempt to:

    – create a file
    – seek 100 MB forward
    – write 4 KB
    – seek to the beginning

    You will find that your first 100 MB contain zeros. That’s because:

    – the *valid* length of your file is only valid up until the last spot that you wrote
    – and any place you didn’t write data is going to be zero

    Attempting to read past the end of a file will result in EOF – no data.
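
The experiment described in that comment can be sketched with portable C++ streams. Treat it as an illustration under assumptions: the zero-fill comes from the operating system and file system (the C++ standard itself does not promise it), and `gap_reads_as_zero` is a hypothetical helper that also deletes its scratch file when done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Create a file, seek `gap` bytes forward, write `tail` bytes of 0xAB,
// then reopen it and report whether every byte in the gap reads as zero.
bool gap_reads_as_zero(const std::string& path, std::size_t gap, std::size_t tail) {
    assert(tail > 0);
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out.seekp(static_cast<std::streamoff>(gap));  // skip without writing
        std::vector<char> data(tail, static_cast<char>(0xAB));
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    }
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(gap);
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    const bool complete = static_cast<std::size_t>(in.gcount()) == gap;
    in.close();
    std::remove(path.c_str());  // clean up the scratch file

    if (!complete) return false;
    for (char c : buf) {
        if (c != 0) return false;  // anything nonzero would be leaked old data
    }
    return true;
}
```

Calling `gap_reads_as_zero("gap_demo.bin", 100 * 1024 * 1024, 4096)` reproduces the 100 MB case from the comment; a smaller gap shows the same thing faster.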

    If you’re an administrator, you can bypass the file-system’s zero-initialization by calling `SetFileValidData(handle, 100*1024*1024)`. This lets you read old data on the hard drive; which is why it’s limited to administrators. (Technically someone with the SE_MANAGE_VOLUME_NAME right). This feature can be used by SQL Server (Instant File Initialization) to grow a file instantly without having to wait for the file system to zero all the new pages.

    If file copy *could* read slack space, it would mean:

    – a file has a length beyond its end-of-file (which isn’t how it works)
    – users can read slack space (which isn’t how it works)

    1. Brian_EE says:

      >This is probably one of the scenarios where you could answer the question by asking, “What would happen if it did copy slack space?”

      It’s more like one of those scenarios where the PHB wanted “something official” from Microsoft, so the minion had to ask the question he knew the answer to.

      1. Tim says:

        I don’t know. A good programmer probably has some intuition that it’s probably impossible to copy “slack space” in a userland application like Explorer, but there are a lot of underlying assumptions there. For example, that’s assuming there isn’t any kernel or filesystem special API for “fast copying” of files wherein the slack data isn’t abstracted away and perhaps would be copied in some situations.

      2. Harry Johnston says:

        I don’t think it’s that obvious that the copy is always done in user-mode. I mean, you could look at the File System Drivers documentation and note that there’s no IRP_MJ_COPY control code, but that’s hardly conclusive.

        1. IanBoyd says:

          > I don’t think it’s that obvious that the copy is always done in user-mode.

          It’s certainly not done *in* user-mode; but it’s done *by* user-mode.

          And the rule for users in user-mode is that they can’t see slack space. The implementation, wherever it is done, will follow that rule.

  7. alegr1 says:

    Not so simple. When you read the last sector of a file opened with FILE_FLAG_NO_BUFFERING, the whole sector gets read to memory, even though the file may end in the middle of it. Same thing happens when you memory-map the file.

    1. John Doe says:

      So, did you read the slack space?

    2. I’ve just tested sample code, and this is emphatically NOT TRUE. ReadFile will not read more than the actual size of the file into the buffer; if your buffer is initialized to 0xCC, it’ll still be mostly 0xCC if you read a 10-byte file, even if the actual slack space of the file is zeroes or some cryptographic key. The airtight hatchway is still sealed.

      You can test yourself:

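      #include <windows.h>
      #include <malloc.h>
      #include <string.h>

      int main(int argc, char* argv[]) {
      // FILE_FLAG_NO_BUFFERING needs a sector-aligned buffer.
      char* buf;
      buf = (char*)_aligned_malloc(4096,4096);
      // Fill with 0xCC so untouched bytes are easy to spot afterward.
      memset(buf, 0xcc, 4096 * sizeof(char));

      HANDLE hIFile;
      DWORD actualsize = 0;

      hIFile = CreateFileA((LPCSTR)argv[1], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL
      | FILE_FLAG_NO_BUFFERING
      , NULL);
      // lpNumberOfBytesRead must not be NULL when lpOverlapped is NULL.
      ReadFile(hIFile, buf, 4096, &actualsize, NULL);

      return 0;
      }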

      Little programs, no error checking, etc, etc. Break on the return and examine the contents of buf.

  8. cheong00 says:

    If I were asked such question, I’d probably answer with something like:

    It’s much like copying the data in an array to another location. The block of memory allocated to the array may hold other, potentially sensitive data in the part that is allocated but not used*. But when you copy an array, you loop over only the assigned elements, up to the length counter, and never copy beyond that boundary. So unless you use a block memory copy instruction or API that ignores the array’s logical length entirely, the answer is “no data in the unassigned section will be copied in the copy operation”.

    * assumes the programming language does not require zero out memory before giving to the code.
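
The array analogy can be sketched as follows. The names are invented for illustration: `storage` stands for the whole allocated block, `used` for the length counter, and `copy_used_part` for a copy routine that respects it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy only the first `used` elements of `storage`; whatever sits in the
// unused tail of the allocation is never read.
std::vector<char> copy_used_part(const std::vector<char>& storage, std::size_t used) {
    assert(used <= storage.size());
    return std::vector<char>(storage.begin(),
                             storage.begin() + static_cast<std::ptrdiff_t>(used));
}
```

If `storage` holds `{'d','a','t','a','S','E','C','R'}` and `used` is 4, the copy contains only `data`; the trailing bytes stay behind, just as slack space stays behind when a file is copied by its logical length.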

  9. Neil says:

    What scenarios would get bytes into the slack space? I suppose file truncation would do it, but are there others?

    1. M Hotchin says:

      Sector re-use after a file is deleted. Most file systems do not overwrite a file’s contents on deletion.

  10. Kirby FC says:

    >>Antonio Rodríguez
    >>Of course, data recovery companies are able to repair hard drives with damaged mechanics (hammer or else). But they are also able to recover many types of “secure erases”, except the most thoughtful variations.

    This is a common misconception that hasn’t actually been true for many years. At one time, ~20 years ago, it was theorized that you could use Magnetic Force Microscopy or Scanning Tunneling Microscopy to image bits recorded on magnetic media and recover data that had been overwritten (see “Secure Deletion of Data from Magnetic and Solid-State Memory”, written by Peter Gutmann in 1996). However, there is no documented evidence that this has ever actually been done.

    But, that’s now irrelevant. Because of increased data density on hard drive platters, any hard drive (not SSD) manufactured in the last 10+ years can be rendered “securely erased” with a single over-write of random bits. Of course, if you are really that paranoid about what’s on your old hard drives, then you are correct, you shouldn’t be selling or giving them away.

    1. alegr1 says:

      Many enterprise-grade drives have an “instant secure erase” feature, where an internal encryption key gets overwritten, instantly making the whole drive contents unrecoverable.
