Date: May 16, 2006 / year-entry #167
Tags: tips, support
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20060516-07/?p=31193
Comments: 31
Summary: The magic characters like <, >, and | in command lines like myprogram.exe | sort > output.txt are interpreted by the command interpreter CMD.EXE; they aren't built into the CreateProcess function. (This is obvious if you think about it. That command line created two processes; which one should CreateProcess return a handle to?) If you...
The magic characters like <, >, and | in command lines like myprogram.exe | sort > output.txt are interpreted by the command interpreter CMD.EXE; they aren't built into the CreateProcess function. (This is obvious if you think about it. That command line created two processes; which one should CreateProcess return a handle to?)
If you want those characters to be interpreted, you have to hand the command line to a command interpreter, for example by passing a command line like cmd.exe /C myprogram.exe | sort > output.txt. Since different command line interpreters use different syntax, you have to specify which command line interpreter you want to use.
If the command line came from the user, you probably want to use the COMSPEC environment variable so the command goes to the user's command interpreter of choice.
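For what it's worth, here is a minimal sketch of that approach in C, assuming the goal is to honor the user's interpreter of choice: read COMSPEC (falling back to cmd.exe), prepend /C, and hand the whole thing to CreateProcess. The function name and buffer sizes are illustrative, and error handling is abbreviated.

    #include <windows.h>

    /* Minimal sketch: run a user-supplied command line through the user's
       command interpreter so that <, >, and | are honored. */
    int RunThroughInterpreter(const char *userCommand)
    {
        char interpreter[MAX_PATH];
        char commandLine[1024];
        STARTUPINFOA si = { sizeof(si) };
        PROCESS_INFORMATION pi;

        /* Prefer the interpreter the user chose; fall back to cmd.exe. */
        if (!GetEnvironmentVariableA("COMSPEC", interpreter, MAX_PATH))
            lstrcpyA(interpreter, "cmd.exe");

        /* Hand the whole command to the interpreter with /C. */
        wsprintfA(commandLine, "\"%s\" /C %s", interpreter, userCommand);

        if (!CreateProcessA(NULL, commandLine, NULL, NULL, FALSE,
                            0, NULL, NULL, &si, &pi))
            return -1;

        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return 0;
    }

Calling RunThroughInterpreter("myprogram.exe | sort > output.txt") then gets you the piping and redirection, courtesy of the interpreter rather than of CreateProcess.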
Comments (31)
Comments are closed.
Or you could implement the pipe and redirection yourself — this is not necessarily easy (in fact I have no idea how to do it; I assume CreateProcess takes some info that determines where a process’s stdin comes from and where its stdout/stderr go to, but I don’t know for sure), and you’ll be duplicating code that’s already in cmd.exe. But it is (probably) another way to do it. ;-)
If you want shell interpretation performed, I think the proper thing to do is simply to call the system() function.
Did old versions of DOS have redirection handled by the program itself (myprogram.exe)? I recall simplistic redirection being explained in terms of parameters when you typed "EDIT /?" <shudder>
When I first started using PCDOS, it seemed bizarre to me that Command.com would interpret some special characters, like < and >, but not others, like * and ?. On Unix they were all done by the shell.
Dave Harris: I believe the wildcard characters (* and ?) were not expanded by the shell in DOS due to the limited RAM available for the process block (the location where the command line parameters were stored). Wildcards can expand out to hundreds if not thousands of files, overflowing the PCB. This means that expanding wildcards was always left up to the application. It persists in this manner simply for backward compatibility.
Doesn’t system() just run Raymond’s command line for you on Win systems? (I.e., %COMSPEC% /C "whatever string you pass to system()".)
That’s what it does on Linux, at least — it forks, then runs /bin/sh -c "whatever string you pass to system()", and then waits for that child process to exit. (Note that the quotes — or something emulating them — are *required*; the -c option requires *one* slot in the argv array for its argument. I believe cmd.exe’s /C option is the same, but I don’t know for sure. OTOH, WinMain receives a single string, not an argv array, so maybe it doesn’t matter.)
On Windows I doubt it can wait for the process to exit, because of Raymond’s post a couple weeks ago about "what can you do with the HINSTANCE returned by ShellExecute?". Unless it uses CreateProcess instead of ShellExecute; I’m not sure if that makes a difference in this case or not.
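To illustrate, a rough sketch of that fork/exec/wait sequence in C; real system() implementations also fiddle with SIGINT, SIGQUIT, and SIGCHLD, which is omitted here, and the name my_system is made up.

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Simplified sketch of what system() does on a Unix system: fork, run
       the shell with -c and the whole command as ONE argv slot, then wait. */
    int my_system(const char *command)
    {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: the entire command occupies a single argv element. */
            execl("/bin/sh", "sh", "-c", command, (char *)NULL);
            _exit(127);             /* exec failed */
        }
        /* Parent: wait for the child and return its exit status. */
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return status;
    }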
Search MSDN for sample code using the phrase "Creating a Child Process with Redirected Input and Output" (I have a CD) and it will tell you how to shell with redirection without cmd.
BryanK, the system() function probably just calls CreateProcess(), so it waits until the command exits. ShellExecute is a relatively new function that analyzes its input to determine how to handle it, whereas system() is defined to always use the system shell to execute the given command — they are not interchangeable.
duggie, old versions of DOS did redirection the same way UNIX does (i.e. via changing the location of file descriptors 0, 1, and 2). However DOS could not multitask, so piping was handled by redirecting the output of the first program to a temp file, then redirecting that file to stdin on the next program once the first had exited.
Dave Harris, it only seems bizarre because DOS (and Windows) have different ideas of which characters are special. On Unix, the * and ? characters are special and must be interpreted by the shell. On Windows those characters aren't special and are interpreted by the individual program. This allows programs to process wildcards in ways that make sense. If a wildcard refers to files in a directory, the standard file enumeration functions will handle it the same way for every program. Only very naive programs (those that take an unordered list of filenames) can really be helped by shell-processed wildcards. For example "ren *.bat *.cmd" doesn't make sense when interpreted by the shell. It is also nice to have "dir /s *.exe" work just as easily as "dir *.exe".
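To illustrate what "the standard file enumeration functions" means here, a minimal sketch of how a typical Windows program expands its own wildcards: the pattern goes straight into FindFirstFile and the matching happens during enumeration (the helper name is made up).

    #include <windows.h>
    #include <stdio.h>

    /* Enumerate everything that matches a pattern such as "*.txt". */
    void ForEachMatch(const char *pattern)
    {
        WIN32_FIND_DATAA fd;
        HANDLE h = FindFirstFileA(pattern, &fd);
        if (h == INVALID_HANDLE_VALUE)
            return;                       /* no matches (or bad pattern) */
        do {
            printf("%s\n", fd.cFileName); /* real code would act on the file */
        } while (FindNextFileA(h, &fd));
        FindClose(h);
    }

Calling ForEachMatch("*.txt") lists the matches; a real program would do its work on each file instead of printing the name.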
On UNIX, the command "rm *.txt" will fail on a directory with a million text files because the shell will spend lots of time and memory reading in the whole directory to find the text files, sorting them, and then trying to pass them to "rm", only to have it fail on exec() because you can't pass a million parameters to a process. To do this properly you either need to write a custom program or do something like "find . -name '*.txt' -print0 | xargs -0 rm". On Windows, "del *.txt" will work no matter what.
On sufficiently old Unix systems, although the shell interpreted wildcards, memory space was too cramped to put them all in the command line. Of course there was no such thing as a million filenames, but even with one filename the result wouldn’t be put in the command line. The shell created a temporary file and told the executable program to read that file to get the list of filenames.
On the other hand, on Windows, "del *.txt" also sometimes doesn’t work, though of course the reasons are different.
Dave Harris and I.P. Overscsi:
I think that not expanding those characters is also a marginal safety feature. It allows commands like DEL to check for total erasure (and perhaps, to optimize).
Gabe, oh and by the way:
1. dir /s is inappropriate as an example because the recursion is implemented by cmd.exe, not "standard file enumeration functions". Otherwise, I should be able to do ren /s, right?
2. I seriously doubt that NTFS implements the wildcards, which means FindFirstFile is handling the "del *.txt" in your example. Which in turn says to me that Windows will have the same problem that you claim Unix has handling a 1-million file directory (running out of memory and what not).
Gabe: Actually Unix does not treat * and ? as special. They are interpreted by the user’s shell. That is why shells like zsh can extend wildcards beyond the basic * and ?.
While the Windows behavior may be nice for certain things (like your ren example), it makes extending the wildcard set almost impossible, because you have to worry about backward compatibility.
Leaving wildcard expansions to the shell allows them to implement whatever new shorthand they want (** in zsh for subdirectory recursion, for example).
We can debate endlessly as to which is better, but I find the model of keeping the OS simple more attractive.
Huh, it is very interesting to read these comments. I’ve often wondered why the Windows cmd shell works the way it does.
In fact, at my first job as a tester, one of the bugs we encountered was "wildcards work on Unix but not Windows." ;)
Personally, I much prefer the Unix way of wildcard processing. It’s predictable within the shell, and I always use the same shell. Even if I’m on a Solaris machine, I can still use bash. I know what "find" will do, I know what "grep" will do, and I know what "ls" and "rm" will do. With respect to running out of memory expanding filenames, there is nothing wrong with dumping the names to a file, processing the file, etc. Plus, at a past job we split up directories with that many files anyway because of the performance hit we took (on Windows Server).
But that’s my personal preference.
The performance of the native Windows shell commands and the Cygwin implementations of their Unix counterparts is very different, as Gabe points out. I find it best to use a mix of "dir /s", "findstr" and "grep." :)
For running the cmd line…
/C runs the command and then closes the window when the process is done.
/K runs the command and keeps the window open when the process is done.
Useful sometimes when launching a DOS-based process from your application.
Having the shell expand command line args means expansion is done by one program – the shell – rather than by every command the user may want to use. The user learns one syntax and it works for most programs that take filename args.
If wildcards need to be passed through literally, there are well-defined ways to escape those chars, namely the backslash.
find / -name \*bak …
Of course there are drawbacks. "ren *.bat *.cmd" is one simple operation that requires a script in UNIX.
What if you have a web server that produced daily logs and you want to move 2005 Q1 logs to another directory?
mv all/logs.20050[123]* Q1
Needs three commands in Windows – one for each month.
What I don't understand is why Windows demands that every program parse its own command line. When I invoke a program, the args I will pass are distinct pieces of data. A filename is ONE ARG even if it has spaces. Forcing the args into one string loses the distinctiveness of each arg and requires the callee to re-parse and separate the pieces.
Apparently, NOT EVERYONE KNOWS HOW TO PARSE A COMMAND LINE!
The whole business of quoting %1 could have been avoided if the unix style argv[] is used. After all, Explorer knows that the filename the user clicked on is an indivisible datum.
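For what it's worth, Windows will hand a program an argv-style view on request via CommandLineToArgvW; the catch is that it bakes in one particular quoting convention, which is the very re-parsing problem being complained about here. A minimal sketch:

    #include <windows.h>
    #include <shellapi.h>   /* CommandLineToArgvW; link with shell32.lib */
    #include <stdio.h>

    /* Print the process's own command line split into argv[] elements. */
    int wmain(void)
    {
        int argc;
        LPWSTR *argv = CommandLineToArgvW(GetCommandLineW(), &argc);
        if (argv == NULL)
            return 1;
        for (int i = 0; i < argc; i++)
            wprintf(L"argv[%d] = %ls\n", i, argv[i]);
        LocalFree(argv);
        return 0;
    }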
I should point out that my Unix version of "del *.txt" was actually "del /s *.txt" because I forgot to pass "-maxdepth 1" to find.
Norman, if old shells put filename lists into files for the commands to read, every single command would have to know how to get its parameters from a file and xargs would never need to have been written.
64bitter, my point was not that recursion should be handled by the shell, but that it should be handled along with wildcards by the application. Since the shell doesn’t know about the recursion, it doesn’t make sense for it to handle wildcard expansion. That means that "dir /s *.txt" works just as well as "dir *.txt", but "ls -lR *.txt" doesn’t work nearly as well as "ls -l *.txt", and "ls -l **/*.txt" probably doesn’t do what you want either. And if for some reason your directory has a file called "-rf" in it, "rm *" almost certainly doesn’t do what you want, while "del *" almost certainly does. Likewise, rm doesn’t need wildcard processing because the shell already does it, but that means that "rm *.txt" works and "rm -r" works, but "rm -r *.txt" doesn’t work (and neither does "rm -r *.txt" as it would with ls).
As for "keeping the OS simple", it’s hard to tell which standpoint is better. In Unix, every shell has to have its own (possibly incompatible) wildcard implementation and since that wildcard handling is fairly naive, each program that wants to handle wildcards or recursion has to have its own (possibly incompatible) implementation. Whenever you run into a situation with too many files to fit on a command line, you’re stuck running find or find|xargs, so you don’t even get your shell’s wildcards anymore. In Windows, only the filesystem layer has to implement wildcards. This makes Windows simpler, right?
NTFS implements wildcards, but the processing is done by an OS library routine (FsRtlIsNameInExpression). Just to try it out, I wrote a Perl script to make a directory with 100,000 .exe files and 100,000 .txt files. It took about 2.5 minutes to run on an NTFS volume. I ran "dir /o > nul" (to sort the directory) and it took about 12 seconds and used 30MB of memory. Then I ran "del *.txt", and the command processor took about 1 minute and never used more than 1500kB of memory.
If you ran that test on Unix you wouldn’t run out of memory (who doesn’t have 32MB laying around?), but you would hit ARG_MAX (128k in Linux) and execve() would return E2BIG.
For those of you who are more interested in the gritty details of command line redirection, RaymondC…
Yes, it is annoying when you want to use wildcards and the Windows program you are running doesn’t accept them. Of course there’s nothing stopping you from just running a wildcard-expanding shell (either a native POSIX version or a Win32 version like cygwin). If you’re compiling it yourself with VC, you could even link in setargv.obj (as described in http://msdn2.microsoft.com/en-us/library/8bch7bkk.aspx) and you get wildcards expanded automatically.
If you’re really annoyed by some program not getting wildcards expanded, you can write your own program to handle it via ImageFileExecutionOptions. Raymond may hate you if you do this.
Anyway, having wildcards in the shell is a nice feature, but it still doesn’t prevent programs from having to implement wildcards (like ls, find, ftp, and unzip). If the shell expansion isn’t good enough and the program you want to use doesn’t implement them, you’re stuck with running it through for or find anyway.
If your directory has a file named -rf in it, then that's when you would use "rm -- *". That is, after all, the whole point of the "--" argument.
As far as "only the filesystem layer has to implement wildcards" — yes, but if only every program would *use* that layer. Use of the pattern matching function(s) in NTFS is sadly inconsistent between programs — too many of them expect each file to be given to them, one at a time, and don’t use FindFirstFile/FindNextFile at all. (And this isn’t just programs ported from Unix, where they wouldn’t have to worry about it; several programs provided as part of a "bare" win2k install do it as well. Unfortunately I can’t remember which ones they are at the moment.)
We should take the further debate out of Raymond's blog. I am sure no one else cares about it :)
I have no quarrel over remote queries like the one you mention; however, the Unix way to do it would probably (I'm not a Unix expert by any means) be to run "rsh ls *.txt" on the remote machine and collect the output.
Remember, Unix was networked when networks were even slower :-)
To me, doing it in the driver is optimizing for the pathological case at the cost of future extensibility (which you haven't addressed).
BTW, if I add a new FSD, does it also have to implement the wildcard expansion? I would hope not. It's been a while since I looked at the IFS docs, but I don't remember them saying anything about that.
running "rsh ls *.txt" seems like a strange way to solve the problem. An operation that used to require only directory read privileges now needs remote logon privileges?
But "remote logon privileges" may be required anyway, depending on how the directory tree was exported to the client.
(NFS? Yes, that would need extra privileges to do it on the server. SMB? Depends; the client would probably need new privileges, but possibly not, depending on the way the SMB "handlers" on both ends were done. SCP/SFTP? No new privileges are required for that. Granted, exposing a remote directory tree via only scp/sftp is probably pathological…)
Wednesday, May 17, 2006 5:56 AM by Gabe
> Norman, if old shells put filename lists
> into files for the commands to read, every
> single command would have to know how to get
> its parameters from a file
In sufficiently old versions of Unix, that was true, exactly as I stated.
> and xargs would never need to have been
> written.
In sufficiently old versions of Unix, that was true, exactly as you stated.
Wednesday, May 17, 2006 7:41 AM by Anonymous Coward
> Personally, I much prefer the Unix way of
> wildcard processing. It’s predictable within
> the shell,
and it doesn’t change until you open up an editor and start typing search expressions or commands.
Gabe: Good points, but my gripe is exactly with the filesystem having to implement wildcards. To me, that’s a more complex OS, not simpler. It also means that the Windows wildcard expansion is sadly limited. * and ? do not make for much power beyond simple groupings.
For example, can I do "ls [A-C]*.txt" with dir?
As far as your argument that the shell should not handle wildcards, I disagree. As I was trying to point out, dir /s is fine, but how do you handle ren /s without re-writing cmd.exe? What about some other arbitrary command that needs to operate on a set of files?
Wouldn't it be better if no command ever had to worry about whether it should recurse or not, or how it should parse wildcards (or not parse them)?
If I want to add ** (for arguments sake) to windows, I have to update an FSD and/or ntdll/kernel32 instead of a shell.
How often does a normal user run into the pathological case of a > 128k command line? (which, btw, is only 32k on Windows :) Is it really worth optimizing for that case? They can always write scripts to get around pathological cases.
In my experience Unix shells have been consistent on the basic regex language they support.
ls -l **/*.txt does exactly what dir /s *.txt does.
And the nice thing about that is that any program that runs under zsh would have that pattern expanded for it. None of them would have to learn new patterns as they were developed. For example, ** was developed just to overcome the recursion problem. It required no changes to ls, cp, mv or any other existing tools.
I can understand both approaches, and I think they each fill a need.
The best part, of course, is that you can use Unix shells on Windows and get most of the Unix behaviors you want. If you use native ports instead of cygwin, relative speed is not an issue generally.
So the debate about which is better is really a non-issue. They both work on Windows, so the lesson is to use Windows :-)
64bitter, the reason that filesystems *should* implement wildcards is to put the processing closest to the data. Running "dir 64bit*" will take the same amount of time running on an NTFS directory of 10 files as 1000000 files. Running "ls 64bit*" will require 100000x more time to process, even if the filesystem happens to have directories indexed by name.
This is even more pronounced over a network. I would rather be able to ask a fileserver "which files end with '.txt'?" than to have to send every single filename over the network just to figure out which ones end with txt. Remember, networks can be slow and directories can be large.
It’s essentially the same difference as using SQL Server vs. accessing an MDB file sitting on a server.
And "dir /s *.txt" is the equivalent of "ls -lR *.txt" because when you run "ls -l **/*.txt" ls doesn’t have the ability to show listings per-directory.
> Granted, exposing a remote directory tree via only scp/sftp is probably pathological…
Not quite that pathological: the Linux user-mode-filesystem driver interface lufs (http://freshmeat.net/projects/lufs/) includes an implementation of 'sshfs' which is essentially the same as mounting a remote filesystem while only having 'scp' access. This is deployed in live end-user systems; access to this filesystem driver is one of the most common reasons for people to install the package, I understand.
Sorry Norman, but I have a hard time believing that. Do you have any evidence? The V3 man page for sh says "…a list of names is obtained which match the argument…and the resulting sequence of arguments replaces the single argument…and finally the command is called with the resulting list of arguments." This means that as early as Feb 1973 the standard Unix shell was inserting the results directly into the command line.
64bitter, you’re not too far off the mark. zsh actually does some pretty heinous stuff to perform its tab-completion.
Any new filesystem would have to implement wildcards, but only to the extent of calling the system-provided functions. Feel free to start your own blog entry about it if you wish.
Thursday, May 18, 2006 4:54 PM by Gabe
> Do you have any evidence?
Mine dated from 1976 rather than 1973. Sorry I didn’t bring it to Japan and can’t quote from it directly.
I do thank you for quoting from the 1973 version.
Still, think about the memory problems they would have run into in those days. It’s not surprising that they might have experimented with workarounds and then reverted an experiment like this.
It is quite instructive to see how systems have evolved. For example, back then (1973) piping used the same syntax as file redirection. You might see a command like this:
ls >"pr -h ‘My directory’">
Note that the quotes were necessary in order to tell that the next tokens belong to pr instead of ls. This caused them to rethink things, and they decided on the | character we all know and love now (or ^ for those who are ASCII impaired).
Friday, May 19, 2006 1:03 AM by Gabe
> and they decided on the | character we all
> know and love now (or ^ for those who are
> ASCII impaired).
Yeah that brings back a memory. I saw someone use ^ in a command line, asked what it was, received a correct answer, and was puzzled. This was in an environment where all the terminals were ASCII, so even if I fully understood the answer then I still might have been puzzled.
In some non-ASCII environments there could be a different answer. Typing | requires pressing a Shift key, but ^ is a shiftless character. A command line typist could obtain a performance improvement 5% at a time ^_^