The joys of search wildcards in Win32 command interpreters

You've come to this page because you've asked a question similar to

My wildcards are matching more extensions than they should be. I ran the command

dir *.txt

and it matched example.txt1, example.txt2, and example.txt3. It should have only matched example.txt.

Worse, when I ran the command

del *1*

on another occasion it deleted all of my files. What's going on?

This is the Frequently Given Answer to that question.

What's going on

What's going on is that your command interpreter, and Windows, are working as designed. It's just that the design isn't straightforward. And it's all because Windows wants to allow you to run DOS programs.

On Win32, files have two names, long filenames (LFNs) and short filenames (SFNs). The long filename is what, nowadays, people are accustomed to thinking of as the name of a file. It's the name as displayed to the user by most modern Win32 tools. It's in Unicode, and it has fairly generous limitations as to length. But it isn't the name. Files can have short filenames as well. Short filenames are restricted to the 8.3 format, as inherited from CP/M via PC/MS/DR-DOS, and are in an 8-bit character set. Short filenames are, in Win32, on equal footing with long filenames.

The reasons for this are complex and long. To précis: Windows provides short filenames for the benefit of DOS and Win16 programs, which hardwire a fixed 8.3 format for filenames. Out of the various choices that can be made here, Microsoft chose to present both long and short filenames to Win32 programs and to present short filenames to DOS and Win16 programs (with an extension to the DOS API that they can use if they want to use long filenames), ensuring that every file has a short filename created for it automatically if its long filename does not happen to fit the 8.3 format. Thus as long as a short filename is used, every program, Win32, Win16, or DOS, can access the file using the same single name.

The aspect of the design that fools most people is that Microsoft's CMD, and several other Win32 command interpreters, thus match search wildcards against both short and long filenames. This produces unexpected behaviour when the short filename matches a search wildcard where the long filename for the file would not match. The unexpected behaviour has its roots in how short filenames are automatically created when a file is created with a long filename.
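To make the effect concrete, here is a minimal Python sketch of a search that tests each wildcard against both names of every file. It is an illustration only, not CMD's actual code: the file list and its short filenames are made-up sample values, the search() and matches() helpers are hypothetical, and Python's fnmatch is used as a rough stand-in for Win32 wildcard matching (whose rules differ in several corner cases).

```python
from fnmatch import fnmatchcase

# (long filename, short filename) pairs. The short names are illustrative
# values of the kind a FAT or NTFS volume might generate, not real output.
FILES = [
    ("example.txt", "EXAMPLE.TXT"),
    ("example.txt1", "EXAMPL~1.TXT"),
    ("example.txt2", "EXAMPL~2.TXT"),
    ("notes.document", "NOTES~1.DOC"),
]

def matches(pattern, long_name, short_name):
    """Return True if the pattern matches either name, ignoring case."""
    pat = pattern.upper()
    return (fnmatchcase(long_name.upper(), pat)
            or fnmatchcase(short_name.upper(), pat))

def search(pattern, files=FILES):
    """List the long names that a both-names search would report."""
    return [long_name for long_name, short_name in files
            if matches(pattern, long_name, short_name)]

if __name__ == "__main__":
    print(search("*.txt"))   # also reports example.txt1 and example.txt2
    print(search("*1*"))     # also reports notes.document, via NOTES~1.DOC
```

Note that the results are reported by long filename, which is why the matches look inexplicable: the characters that matched are in a name that one never sees.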

One of the most widely-experienced unexpected behaviours is the unexpected matching of a tilde character ('~') or the digit one ('1') when there are no such characters in the long filename. This is because on DOS-Windows 9x/ME and Windows NT, short filenames can be generated from long filenames using an algorithm that truncates the filename and then (if more than one long filename would result in the generation of the same short filename) adds a tilde character and a numerical suffix, starting from 1. This means that a lot of files' short filenames (basename portions) end in the string "~1".

Here's a directory listing of some of the files on a Windows NT 5.1 system, for example (The /X option turns on the display of short filenames. The other options are simply eliminating irrelevant parts of the output. See the documentation if you are curious.):

[c:\windows]dir /x/k/m *~1*.bmp
  4-08-04  12:00           1,272  BLUELA~1.BMP    Blue Lace 16.bmp
  4-08-04  12:00          17,062  COFFEE~1.BMP    Coffee Bean.bmp
  4-08-04  12:00          16,730  FEATHE~1.BMP    FeatherTexture.bmp
  4-08-04  12:00          17,336  GONEFI~1.BMP    Gone Fishing.bmp
  4-08-04  12:00          26,582  GREENS~1.BMP    Greenstone.bmp
  4-08-04  12:00          65,954  PRAIRI~1.BMP    Prairie Wind.bmp
  4-08-04  12:00          17,362  RHODOD~1.BMP    Rhododendron.bmp
  4-08-04  12:00          26,680  RIVERS~1.BMP    River Sumida.bmp
  4-08-04  12:00          65,832  SANTAF~1.BMP    Santa Fe Stucco.bmp
  4-08-04  12:00          65,978  SOAPBU~1.BMP    Soap Bubbles.bmp

[c:\windows]

A similarly widely-experienced unexpected behaviour is the unexpected matching of 4-or-more-character extensions by a wildcard that only specifies a 3-character extension (e.g. "*.txt" matching example.txt1 as above). This is because the extension of a short filename is generated by taking the (final) extension of the long filename and truncating it to three characters. (So files with long filenames ending in ".txt1", ".txt2", ".txt3", and so forth will all have short filenames ending in ".txt", for example.)
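The two effects just described (the "~1" suffix and the truncated extension) can be sketched in a few lines of Python. This is a deliberately simplified model, assuming only the steps mentioned above plus the removal of spaces and conversion to upper case seen in the directory listing; the function name basic_short_name is hypothetical, and real filesystem drivers handle many more cases (illegal characters, very short names, collisions) than this does.

```python
def basic_short_name(long_name, number=1):
    """Simplified 8.3 name: drop spaces, upper-case, truncate, add ~number."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:                       # no extension at all
        stem, ext = long_name, ""
    # A name that already fits the 8.3 format needs no tilde suffix.
    fits = (len(stem) <= 8 and len(ext) <= 3
            and " " not in long_name and "." not in stem)
    stem = stem.replace(" ", "").replace(".", "").upper()
    ext = ext.replace(" ", "").upper()[:3]    # final extension, cut to 3
    if not fits:
        suffix = "~%d" % number
        stem = stem[:8 - len(suffix)] + suffix
    return stem + "." + ext if ext else stem

if __name__ == "__main__":
    for name in ("Blue Lace 16.bmp", "example.txt", "example.txt1"):
        print(name, "->", basic_short_name(name))
```

Under these assumptions "Blue Lace 16.bmp" becomes BLUELA~1.BMP, as in the listing above, and "example.txt1" becomes EXAMPL~1.TXT, which is exactly the sort of name that "*.txt" and "*1*" will match.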

The situation isn't even this simple, however. This is because the algorithm used to generate a short filename from a long filename can vary. DOS-Windows 9x/ME uses one algorithm. Windows NT leaves the choice of algorithm up to the filesystem driver controlling the volume, and its FAT FSD makes subtly different choices from its NTFS FSD. And both systems have registry options that can modify their default behaviours.

As explained in Chapter 17 of the Windows NT 4 Resource Kit documentation, although they start off with the same basic initial steps, Windows NT's algorithm differs from the DOS-Windows 9x/ME algorithm when it encounters more than four collisions on the same short filename. DOS-Windows 9x/ME will keep incrementing the appended number. Windows NT (versions 4 and later) instead switches to a different algorithm that constructs a four-character hash string from the (entire) long filename in order to reduce the frequency of short filename collisions.
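The difference between the two collision strategies can be sketched as follows. This is a rough model under loudly stated assumptions: the real Windows NT hash function is not reproduced here (a CRC-32 truncated to four hexadecimal digits stands in for it), the layout of the hashed name (two leading characters, the hash, then "~1") is an assumption, and the sketch assumes that every long filename needs a generated short name. The name pick_short_name and its helpers are hypothetical.

```python
import zlib
from itertools import count

def _parts(long_name):
    """Split into an upper-cased, space-free stem and a 3-character extension."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:
        stem, ext = long_name, ""
    clean = lambda s: s.replace(" ", "").replace(".", "").upper()
    return clean(stem), clean(ext)[:3]

def _join(stem, number, ext):
    """Truncate the stem so that stem plus ~number fits in 8 characters."""
    suffix = "~%d" % number
    name = stem[:8 - len(suffix)] + suffix
    return name + "." + ext if ext else name

def pick_short_name(long_name, taken, nt_style):
    """Pick a short name for long_name that is not already in `taken`."""
    stem, ext = _parts(long_name)
    # 9x/ME style: keep counting. NT style: give up after ~4.
    numbers = range(1, 5) if nt_style else count(1)
    for number in numbers:
        candidate = _join(stem, number, ext)
        if candidate not in taken:
            return candidate
    # NT-style fallback. ASSUMPTION: CRC-32 is only a stand-in hash here.
    digest = "%04X" % (zlib.crc32(long_name.encode("utf-8")) & 0xFFFF)
    hashed = stem[:2] + digest
    number = 1
    while _join(hashed, number, ext) in taken:
        number += 1
    return _join(hashed, number, ext)

if __name__ == "__main__":
    for nt_style in (False, True):
        taken = set()
        for i in range(1, 7):
            taken.add(pick_short_name("example.txt%d" % i, taken, nt_style))
        print("NT" if nt_style else "9x/ME", sorted(taken))
```

In the counting style the tenth such file would be named EXAMP~10.TXT; in the hashing style the fifth and later files get names that no longer resemble one another, which makes wildcard matches against short filenames even harder to predict.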

As also explained in the Windows NT 4 Resource Kit, applications that run under the Windows NT POSIX subsystem (which as of Windows NT 5 was unbundled from the operating system, and re-branded as a separate package, Interix, part of Windows Services for Unix) don't generate short filenames when they create files. Short filename generation is only caused by Win32 applications (or by DOS or Win16 applications using long filename extensions to the DOS API).

The options that modify the default behaviours are, variously:

Win32FileSystem

This controls the FAT FSD on Windows NT. If enabled, the FAT FSD will restrict all names to the 8.3 form, thereby effectively eliminating long filenames on FAT volumes.

NtfsDisable8dot3NameCreation

This controls the NTFS FSD on Windows NT. If enabled, the NTFS FSD will not create any short filenames when creating files on NTFS volumes. A user with local administrator privileges on a machine can change this setting using the fsutil utility:

fsutil behavior set disable8dot3

Win95TruncatedExtensions

This controls both the FAT and NTFS FSDs on Windows NT, but only in Windows NT versions 5.2 and later. If disabled, the FSDs will use a slightly different algorithm for generating short filenames from long ones, which substitutes tildes when converting 4-or-more-character extensions to 3-character ones. If enabled, then the usual behaviour of just truncating the extension to 3 characters occurs.

The algorithm described in Microsoft KnowledgeBase article 142982 is one of the algorithms, but, contrary to what the MSKB article might mislead one into believing, it isn't the sole algorithm and isn't the whole of the picture.

What the local fix is

How one addresses this with a local fix depends, in large part, on whether one wants to retain the ability to have access to all files and directories on the system from DOS and Win16 applications. If one does, then one's options are largely limited to using JP Software's command interpreters (more of which later).

Nowadays, however, it's most usually the case that one doesn't want all of this shortname silliness. To that end, the most convenient option is to just turn off as much generation of short filenames as one can, which can be done with the fsutil command by a local administrator:

fsutil behavior set disable8dot3 1

Caveats apply, of course. This only applies to Windows NT, not to DOS-Windows 9x/ME. It only affects the NTFS filesystem driver, and won't affect FAT volumes or network volumes. And it (obviously enough) only applies to files created after the setting is changed (which requires the system to be rebooted for the NTFS FSD to note the new value of the setting, since it only does this once, at initialization).

But as long as one enables this setting on a pristine all-NTFS system before creating one's files, the problems of short filenames matching wildcards unexpectedly can be averted.

A more universal local fix, that applies across all filesystems (not just NTFS) and that will work even where files have already been created with short filenames, is to use a different command interpreter. JP Software's command interpreters (TCC a.k.a. Take Command and TCC/LE a.k.a. 4NT) have an option to include or to not include short filenames when matching search wildcards, and the default is to not match short filenames. The behaviour observed above will not happen by default with JP Software command interpreters. The JP Software on-line help makes a point, in several places, that turning on short filename matching "can delete files you did not expect" and "can result in […] disastrous results".

Writing better applications

Even JP Software's command interpreters don't avoid one of the pitfalls of short filenames, which is, as Microsoft KnowledgeBase article 130694 notes, that they slow down directory scans. Even if an application (such as JP Software's command interpreters when the SFN search option is turned off) doesn't use the short filename information obtained from a directory search, the filesystem driver still has to expend the effort to look that information up in the first place.

New in Windows NT versions 6.1 and later (Microsoft's documentation states that it is not supported before Windows 7 and Windows Server 2008 R2) is a FindExInfoBasic information level that one can specify in a program when calling the FindFirstFileEx() function. This information level instructs the filesystem driver to not look up the short filename information in the first place. If you are writing a new Win32 application, targeting those versions of Windows NT, and don't want all of this mucking around with short filenames and their unexpected side-effects when using search wildcards, this is the value to use.
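For the curious, here is a minimal sketch of such a call, written in Python with ctypes so that it can be tried without a compiler. FindFirstFileExW(), FindNextFileW(), FindClose(), and the two constants are real Win32 names; the wrapper function find_long_names and its error handling are assumptions made for illustration. It only runs on Windows, and it makes no claim about how the filesystem driver matches the wildcard internally: it merely asks that no short filename be returned.

```python
import ctypes
import sys
from ctypes import wintypes

# Values from the Win32 headers.
FindExInfoBasic = 1          # FINDEX_INFO_LEVELS: do not return short names
FindExSearchNameMatch = 0    # FINDEX_SEARCH_OPS: ordinary name matching
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value
ERROR_FILE_NOT_FOUND = 2

def find_long_names(pattern):
    """Return the long names found for pattern (hypothetical helper)."""
    if sys.platform != "win32":
        raise OSError("FindFirstFileExW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    find_first = kernel32.FindFirstFileExW
    find_first.restype = wintypes.HANDLE
    find_first.argtypes = [wintypes.LPCWSTR, ctypes.c_int,
                           ctypes.POINTER(wintypes.WIN32_FIND_DATAW),
                           ctypes.c_int, wintypes.LPVOID, wintypes.DWORD]
    find_next = kernel32.FindNextFileW
    find_next.restype = wintypes.BOOL
    find_next.argtypes = [wintypes.HANDLE,
                          ctypes.POINTER(wintypes.WIN32_FIND_DATAW)]
    find_close = kernel32.FindClose
    find_close.restype = wintypes.BOOL
    find_close.argtypes = [wintypes.HANDLE]

    data = wintypes.WIN32_FIND_DATAW()
    handle = find_first(pattern, FindExInfoBasic, ctypes.byref(data),
                        FindExSearchNameMatch, None, 0)
    if handle is None or handle == INVALID_HANDLE_VALUE:
        error = ctypes.get_last_error()
        if error == ERROR_FILE_NOT_FOUND:
            return []                 # nothing matched the pattern
        raise ctypes.WinError(error)
    names = []
    try:
        while True:
            names.append(data.cFileName)   # cAlternateFileName stays empty
            if not find_next(handle, ctypes.byref(data)):
                break
    finally:
        find_close(handle)
    return names

if __name__ == "__main__" and sys.platform == "win32":
    print(find_long_names("*.txt"))
```

A C or C++ program would make the same three calls directly; the only change from an ordinary FindFirstFile() loop is the second argument.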


© Copyright 2010 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.