Can I be sure that turning on automatic generation of short file names will get me short file names?

Date: October 4, 2018 / year-entry #225
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20181004-00/?p=99895
Comments: 44
Summary: You can try hard, but it may not succeed.

A customer wanted to know if administrative permissions were sufficient to enable generation of short file names (8 dot 3). Their plan was to set the registry key, call GetShortPathName to get the short file names, and then restore the registry key to its original value.

They wanted to know if there were any group policies that would override the registry key and thereby foil their fiendish plan.

That registry key is the one controlled by group policy, so you could in theory get unlucky and a group policy refresh could occur just after you updated the value.

But wait, let's try to understand the customer's problem before coming up with a solution.

First of all, the customer's explanation didn't make sense. You don't have to change the policy in order to call GetShortPathName. The short file name exists (or doesn't) regardless of the policy setting. Short file names are generated automatically at the time a file is created, and it is at the point of file creation that the policy is consulted to determine whether to auto-generate a short file name. Changing the policy for short file name generation does not retroactively add or remove short file names for existing files.
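To make that concrete, here is a minimal sketch in Python that asks for whatever short name a path already has, without touching any policy. The helper name short_path_name is made up for illustration. It calls the Win32 function GetShortPathNameW (the wide-character form of GetShortPathName) through ctypes, and it simply returns None on non-Windows systems or when the call fails.

```python
import ctypes
import sys


def short_path_name(long_path):
    """Return the existing 8.3 form of long_path, or None if it cannot be obtained.

    Hypothetical helper for illustration. It only queries; it never creates
    a short name and never changes the short-name policy.
    """
    if sys.platform != "win32":
        return None  # short names are a Windows file-system feature

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_short = kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD

    # First call with an empty buffer to learn the required size in characters.
    needed = get_short(long_path, None, 0)
    if needed == 0:
        return None  # failure, for example the path does not exist

    buffer = ctypes.create_unicode_buffer(needed)
    if get_short(long_path, buffer, needed) == 0:
        return None
    return buffer.value


if __name__ == "__main__":
    print(short_path_name(r"C:\Program Files"))
```

When a path component has no short name, the call typically hands back the long form of that component unchanged, so comparing the result with the input is one way to notice that no short name was ever generated.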

The customer clarified that they did leave out a step in their quick description:

  • Set the registry key.
  • Install their program.
  • Call GetShortPathName.
  • Restore the registry key.

They installed their program with short file names enabled so that GetShortPathName would indeed return a short name. They use this short name to ensure that there aren't any spaces-related command line parsing errors.

Okay, the correct way to fix spaces-related command line parsing errors is not to make the spaces go away. It's to fix your command line parser so you don't choke on file names with spaces in their name!
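As a rough illustration of what "not choking" looks like, the following Python sketch contrasts naive splitting on spaces with a quote-aware parser. It uses the standard shlex module, which follows POSIX shell quoting rules (Windows command line rules differ in the details), and shlex.join requires Python 3.8 or later. The paths and the argument list are invented for the example.

```python
import shlex

# Hypothetical argument list: both the program path and the file name contain spaces.
args = ["/opt/My Tool/bin/tool", "--input", "report final.txt"]

# Quote each argument so the list survives being flattened into a single string.
command_line = shlex.join(args)

# Splitting on spaces ignores the quoting and tears the paths apart.
naive = command_line.split(" ")

# A quote-aware parser recovers the original arguments.
parsed = shlex.split(command_line)

print(command_line)
print(len(naive), "pieces from naive splitting")
print("round trip preserved arguments:", parsed == args)
```

The design point is that the spaces stay in the file names and the quoting carries the boundaries between arguments. Better still, when the caller can pass a list of arguments directly (as subprocess.run does when given a list), there is no string to parse in the first place.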

The customer explained that their program is just fine with files that have spaces in their name, but their program is a utility program, and it is used by other programs and scripts. Those other programs and scripts have a Unix heritage, and Unix file names rarely have spaces in their name. Consequently, those programs and scripts tend to have poor support for files with spaces in their name.

The customer was hoping to force generation of short file names during their program installation, so that those external programs can be given a spaces-free path to the program.

Okay, so the first note is that there is no way to absolutely guarantee that there will be a spaces-free short name for a file, because support for short file names is gradually fading away. Short file names were originally created to maintain backward compatibility with 16-bit programs, but the population of 16-bit programs has been dwindling for quite some time, especially since 64-bit Windows doesn't support them natively. ReFS and exFAT don't support short file names. Network shares from Unix systems rarely do. In general, any file system invented in the past 15 years or so will most likely not support short file names.

Furthermore, since short names are auto-assigned at the point of creation, it means that if the user installs the program into a pre-existing directory that lacks a short name, then you're not going to have a short name for the full path. Setting the registry key will not be sufficient by itself.

The customer thought about the situation for a while and came up with a different solution: If the user chooses to install the program into a path that contains spaces, then their installer also creates a symbolic link in a path with no spaces¹ that points to the installation directory.
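A minimal sketch of that idea in Python follows. The function name link_without_spaces and the paths in the usage example are hypothetical; the customer did not share their installer code. It relies on os.symlink, which on Windows generally requires the privilege to create symbolic links (an elevated installer or Developer Mode).

```python
import os


def link_without_spaces(install_dir, link_path):
    """Return a spaces-free path that reaches install_dir.

    Hypothetical helper: if install_dir already has no spaces, it is returned
    unchanged. Otherwise a directory symbolic link is created at link_path,
    which the caller must choose so that it contains no spaces.
    """
    if " " not in install_dir:
        return install_dir
    if " " in link_path:
        raise ValueError("link_path must not contain spaces")

    # target_is_directory matters on Windows and is ignored elsewhere.
    os.symlink(install_dir, link_path, target_is_directory=True)
    return link_path


if __name__ == "__main__":
    import tempfile

    base = tempfile.mkdtemp()
    target = os.path.join(base, "My Utility")
    os.mkdir(target)
    print(link_without_spaces(target, os.path.join(base, "myutility")))
```

External scripts can then be handed the returned path. Unlike a short name, the link does not depend on the file system or on a policy setting that was in effect when the directory was created.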

¹ They didn't say exactly where, but I suspect they put it in C:\ProgramData.


Comments (44)
  1. Joker_vD says:

    > Those other programs and scripts have a Unix heritage, and Unix file names rarely have spaces in their name.
    Having word-splitting enabled by default during variable expansion was one of sh’s worst design decisions. That’s why sooner or later people invariably switch from (ba)sh scripts to Perl or Python or whatever other language where strings don’t randomly explode in your face.

    1. Kevin says:

      Unfortunately, it’s not that simple. There are numerous ways that a simple find … | xargs … can blow up in your face, most having nothing to do with spaces (ASCII 0x20). In particular, LF and CRLF are both legal in Unix file names, which confuses xargs (I’m ignoring the obvious problem of “what if my filename starts with a dash and gets interpreted as a flag” because that’s an easy fix). Then you have display fun such as ls either breaking your terminal or unconditionally munging weird file names. Not to mention lots of legal names are really hard to type, and you can even have raw byte sequences that aren’t legal under any character encoding. GNU tools have nonstandard workarounds for some of these issues under some circumstances, but if you want it to work under POSIX, your best bet is to give up and stop writing shell scripts altogether.

      Windows is not completely innocent either, seeing as you have to quote the arguments you pass to CreateProcess() (whereas Unix only uses quoting for the benefit of the shell – if you quote the arguments to exec*(), it will interpret the quotes literally and choke). And I’m sure someone with more batch programming experience than me can come up with all kinds of fun misfeatures of cmd.exe (which I’m sure we will all stop using just in time for the heat death of the universe).

      TL;DR: All systems are terrible, just in different ways.

      1. Scott H. says:

        One of my favorite moments was when I somehow managed to get a file named * in a Linux directory. Needless to say I was very careful getting that deleted.

      2. alegr1 says:

        Windows CreateProcess has quoting rules pretty close to those of the POSIX system() call, if you supply NULL for the first argument, which everybody does. What’s unusual about CreateProcess is that it tries to deduce whether you forgot to quote the program name.

        1. Kevin says:

          That’s true, but misleading, because system() is just a convenience function that invokes the shell (which then calls exec*()). The standard syscall for executing a program is execve() – ~everything else is a frontend to it.

      3. Bulletmagnet says:

        > There are numerous ways that a simple find … | xargs … can blow up in your face, most having nothing to do with spaces (ASCII 0x20)
        If you have GNU utilities, you can do find … -print0 | xargs -0 …
        If you don’t have GNU utilities, too bad.

        1. Joker_vD says:

          Ah, GNU utilities, amazing things they are. For example, “diff -u” can compare two files with spaces in their names and produces a unified diff file which “patch -u” can *not* apply! See https://www.gnu.org/software/diffutils/manual/html_node/Unusual-File-Names.html

  2. morlamweb says:

    Global solution, meet a local problem. I seem to recall hearing about the inherent problems of this sort of “problem-solving”. I wonder where…

  3. Scott H. says:

    Not to mention, MS-DOS/8.3 supported spaces in filenames. So even if their plan succeeded and they got short names for everything, they couldn’t guarantee space-free names!

    1. Erik F says:

      While it’s true that spaces were supported, most programs had big problems with them: many assumed that spaces weren’t allowed, so you’d receive a message saying that the filename wasn’t valid. Of the programs that I used, I think that WordPerfect (possibly GW-BASIC too?) was the only one that actually allowed you to use files that had spaces in their names.

    2. Yuhong Bao says:

      It was not common, though the “EA DATA.SF ” file used it.

    3. ender9 says:

      While DOS did indeed support spaces in filenames, the generated short names on Windows never contain them.

  4. Harold H20 says:

    >>”the correct way to fix spaces-related command line parsing errors is . . . . . . to fix your command line parser so you don’t choke on file names with spaces in their name!”

    I am constantly amazed by this. Spaces in filenames have been a thing for 23 years, and yet we still have problems with it.

    1. Antonio Rodríguez says:

      Many programmers, especially developers of corporate or governmental applications, fail to pay their taxes. Just install Windows to a drive other than C: or give your account’s user name two words separated by a space (which is perfectly legal) and you will find many programs that break.

      The Spanish Ministry of Treasury distributed for years a free application to prepare your income tax declaration. It had millions of downloads every year, but it didn’t correctly handle spaces in file paths, and it blocked the upload of the declaration if your user name had a space. The application installed itself under C:\AEAT\ (even if Windows was on another drive!), so the developers probably were aware of the parsing bug, but it generated the temporary file for the declaration in the user’s AppData folder.

      1. alegr1 says:

        Windows drive is always named C:, since I think Windows 7.

        1. ender9 says:

          Not when you do an upgrade (and, until Windows 7, when you run the installer from an existing Windows installation, even if you choose to do a clean install).

        2. DWalker07 says:

          Where did you get the notion that the “windows drive” (do you mean the drive Windows is installed on?) is always C? That’s not true at all. Windows can be installed to the D drive, or to a drive that uses any letter you want.

          1. alegr1 says:

            You can install Windows on any drive or partition, but when you boot it, it will get drive letter C

    2. Joshua says:

      Windows quoting rules are obscure, and some commands deliberately do it differently.

    3. Richard says:

      Cmd Batch files have a lot of trouble too, even now. Partly because its string handling barely exists, but also for the same reason (ba)sh can have trouble.

      Hence PowerShell, Python and Perl which are all considerably more wonderful but have the nasty downside of not existing on a lot of Windows installations.

      1. Nick says:

        PowerShell has been installed by default in every version of Windows since Windows 7 and Windows Server 2008 R2. If you’re stuck using versions of Windows before then, I’m sorry, I hope you’re not stuck there for much longer.

    4. Luaan says:

      ASCII also should have died a long time ago, but even newly released applications often have trouble with non-US encodings, especially Unicode. As I found to my chagrin when I decided to create a Windows account linked to my Microsoft account in Windows 10, which created my Windows profile folder with my real name – which contains non-ASCII characters. My Users folder contains about ten different mis-encodings of my name (including having the offending characters URL encoded), each of the broken applications having its own way of mishandling paths. Some outright don’t work, so I had to create another account with an ASCII name and use “runas” to run those. In 2018.

  5. “In general, any file system invented in the past 15 years or so will most likely not support short file names.”

    Yet the setup program of Windows 10 (really: the routines which handle DISM.exe /Apply-Image) turns generation of short filenames ON – even for the AMD64 processor architecture – and thus overrides both the documented registry setting and the volume-specific setting which can be set via FSUTIL.exe or FORMAT.exe /FS:NTFS /S:Disable.

    1. skSdnW says:

      Could it be for compatibility? Even new 64-bit systems probably work better if progra~1 exists and is the short name for Program Files. Some silly people have probably hard-coded “c:\progra~1” or “%windir%\..\progra~1” in some scripts/tools.

      1. Dan says:

        Search your registry for “PROGRA~1”, and you’re likely to find plenty of hits.

        1. Not a SINGLE PROGRA~1 (or any other short name) here for 22+ years!

      2. It might be anything a weird mind can come up with, but it’s useless to speculate, since M$FT did not document it.
        Before Windows 10, generation of short filenames was NOT forcibly turned on.

        Please define “works better”!
        The 22+ year old “Designed for Windows Guidelines” (well: their update for 64-bit architectures) explicitly state that 64-bit applications MUST support long filenames.

        JFTR: I have routinely turned short filename creation off during setup for more than 20 years. There were some “silly” programs which used hard-coded short filenames in the past, but I haven’t seen one in the last ten years or so. Besides that, they would have failed anyway, since there is NO guarantee that the (automatically generated) short filename for “Program Files” will always be PROGRA~1.

  6. morlamweb says:

    If the customer can control the installer to the point where they monkey around with the reg key as part of their install process, why can’t they simply enforce space-less paths in their installer? It’s their program and their compatibility constraint; why force the file system to create short file names for all other programs just for their needs? Also, consider what would happen if the installer crashed during the installation process: 8.3 file names would be left enabled unless someone or something comes along to disable it.

  7. DWalker07 says:

    “… then their installer also creates a symbolic link in a path with no spaces¹ that points to the installation directory.”

    What if the target system doesn’t have any paths with no spaces? The ProgramData path can get renamed. Not easily, but it can…… Maybe the installer will try to create a path with no spaces, in order to store the symbolic link? If the installer is able to create a directory in the root of some drive, that is.

    I have seen an example of this: A program that HAS been updated recently, but which relies on some ancillary packages that have not been updated in 20 years or so.

  8. florian says:

    My spinal reflex during reading was that the customer could use a hardlink to solve his problem, but a symlink seems better to preserve the hierarchy original vs. link.

    Short file names allow accessing full paths with at least 32 nesting levels (or even more, if some of the individual components are shorter than 8 characters; the total length is MAX_PATH). So these are also the minimum prerequisites to test programs with “super long paths” beyond MAX_PATH, to make sure the short file names won’t come to the rescue, as they seem to be used automagically by Windows Explorer or the Win32 API, if necessary.

    For this testing scenario, I can only come to think: really?

  9. Ben says:

    This is another reminder that filenames are not just an arbitrary sequence of bytes.

    They are part of the user interface.

    That means they are subject to restrictions on names, and motivations for names, on that account. That’s why you can’t use \u0001 in a filename and also why you have to handle spaces.

    If you need your filename to encode arbitrary data, hex encode it.

  10. Adrian says:

    Your suspicion about them selecting ProgramData got me thinking – it’s not guaranteed to be a path with no spaces, is it? Can a user relocate this directory without using symlinks? Is it guaranteed to keep a space-free name in all localized Windows versions? It only takes a single rogue translation to break this (somewhat sketchy) assumption.

  11. cheong00 says:

    > Those other programs and scripts have a Unix heritage, and Unix file names rarely have spaces in their name.

    I have a question about that. *nix programs have accepted spaces in file/directory names since forever. I remember that in school I created files with “\ ” in the filename a few times without a problem, and you don’t even need to surround it with double quotes.

    I’d suggest they try running their program or script in the Bash from Cygwin, or even the one that comes with Git, instead.

    1. Simon says:

      Oh, UNIX itself has no problem with spaces in filenames… they’re totally legal. But spaces in filenames do cause headaches in shells, with the result that people tend to avoid using spaces in any file that might be accessed from the shell, and since they therefore don’t encounter a lot of them, the scripts they write don’t tend to handle it well.

      1. Just try to write a Makefile with whitespace in file names… https://stackoverflow.com/q/9838384/214671

      2. BTW there are shells which handle filenames in a far more intelligent way… my personal favorite is fish, which, among other awesome features that work out of the box (the autocompletion is frigging magic, I suspect it has some direct connection with the brain), avoids all the mess with whitespace-splitting after expanding variables/globbing. All variables are actually lists, and when they are put on the command line the single elements are kept as they are, without re-splitting them on whitespace; same for the output of globbing.

  12. BobVul says:

    Hm. I wonder if avoiding spaces is the reason a certain <bean-derived drink> programming language’s most common runtime puts its (PATH-listed) launcher in ProgramData.

  13. aitap says:

    > Those other programs and scripts have a Unix heritage, and Unix file names rarely have spaces in their name.

    I wouldn’t phrase it like that. The problem is that the system calls to launch a program in a Unix-like system expect an array of command-line arguments, while CreateProcess only takes one string. Mistakes arise when one expects the environment to take care of safely passing the command line arguments as opaque zero-terminated strings, while the rules say that the application being launched is responsible for how a single string passed to it is going to be parsed.

  14. Baltasar says:

    > Okay, the correct way to fix spaces-related command line parsing errors is not to make the spaces go away. It’s to fix your command line parser so you don’t choke on file names with spaces in their name!

    I don’t even think Windows supports spaces in file names correctly, given that it tries to find “c:\program”, “c:program and”… blindly. Spaces in file names were a bad addition, poorly implemented. It would have been more sensible to design an encoding involving ‘_’, for example.

    1. It doesn’t do it blindly. It does it only if the standard interpretation (stop at the first unquoted space) fails.

  15. Brian says:

    What I have a hard time figuring out are the results from calling (from a .NET language) Path.GetInvalidFileNameChars and Path.GetInvalidPathChars. One of the differences between the two sets of “invalid characters” is that while both “?” and “*” are illegal in file names, they are legal in path names. That surprised me. I have no idea what would happen to most apps if you included an asterisk in a folder name.

    1. Eryk Sun says:

      > while both “?” and “*” are illegal in file names

      In practice I think it’s always true that wildcard characters are reserved in file and directory names. However, it does depend on the file-system driver and the extent to which it relies on the file-system runtime library for working with names (e.g. FsRtlIsNameInExpression).

      The only character that’s reserved in a path up to and including the device name is backslash, which is reserved by the kernel’s object manager. This means you can use DefineDosDevice to assign unusual names to volume devices, including names that include slash (not backslash), wildcard characters, and control characters. From the Windows API and common shells, such unusual device names are only usable (if at all) within a fully-qualified, local-device path (i.e. prefixed with “\\?\” or “\\.\”). The native NT API can even support null (‘\0’) in names, at least in principle, because the system uses counted strings. However, we can’t use the Windows API with such path components because it’s based on null-terminated strings.

      Once parsing reaches the file-system device and attached filter devices that manage a volume device, the associated drivers have complete control over parsing the remaining path. For the sake of sanity, a file-system should reserve backslash, slash and null. Also, if it expects to play nice with how FindFirstFile[Ex] and NtQueryDirectoryFile[Ex] (i.e. IRP_MJ_DIRECTORY_CONTROL: IRP_MN_QUERY_DIRECTORY) are used, it should reserve the five wildcard characters in file and directory names. This includes asterisk, question mark, less-than sign (DOS_STAR), greater-than sign (DOS_QM), and quotation mark (DOS_DOT).

      NTFS additionally reserves colon as the file-stream delimiter (i.e. FileName:StreamName:StreamType), vertical bar (pipe), and control characters (ordinals 1-31). In stream names, NTFS disallows colon, backslash, slash, and null. The VBoxSharedFolderFS file system (installed in a VirtualBox guest system) reserves backslash, slash, null, and the 5 wildcard characters in filenames. In contrast to NTFS, it allows colon, vertical bar, and control characters in filenames, which facilitates working with a POSIX host system.

  16. Ian Yates says:

    I was thinking symlink as a dodgy workaround. They work wonders for so many things.

    I had a customer with a medical device running Windows 7 on there. Running out of space, and the app had hard-coded local path for its data. Its data was separated into *many* subfolders.

    So I just moved a bunch of those folders to a NAS. Then enabled Windows to allow symlinks to point to network drives (other machines on the network would access this device and expect the data to be available via a share), and problem solved. Saved the customer many $1000’s



This is an archived reproduction of an entry from Raymond Chen's "Old New Thing" blog. It may have slight formatting modifications for consistency and to improve readability.
