Why do some processes stay in Task Manager after they’ve been killed?

Date: July 23, 2004 / year-entry #287
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20040723-00/?p=38363
Comments: 35

When a process ends (either of natural causes or due to something harsher like TerminateProcess), the user-mode part of the process is thrown away. But the kernel-mode part can't go away until all drivers are finished with the process's threads, too.

For example, if a thread was in the middle of an I/O operation, the kernel signals to the driver responsible for the I/O that the operation should be cancelled. If the driver is well-behaved, it cleans up the bookkeeping for the incomplete I/O and releases the thread.

If the driver is not as well-behaved (or if the hardware that the driver is managing is acting up), it may take a long time for it to clean up the incomplete I/O. During that time, the driver holds that thread (and therefore the process that the thread belongs to) hostage.

(This is a simplification of what actually goes on. Commenter Skywing gave a more precise explanation, for those who like more precise explanations.)

If you think your problem is a wedged driver, you can drop into the kernel debugger, find the process that is stuck and look at its threads to see why they aren't exiting. You can use the !irp debugger command to view any pending IRPs to see what device is not completing.

After all the drivers have acknowledged the death of the process, the "meat" of the process finally goes away. All that remains is the "process object", which lingers until all handles to the process and all the threads in the process have been closed. (You did remember to CloseHandle the handles returned in the PROCESS_INFORMATION structure that you passed to the CreateProcess function, didn't you?)
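
The parenthetical reminder about CloseHandle can be illustrated with a short sketch. This is only one way to do it, assuming Python's ctypes module is used to call the Win32 functions directly; the helper name run_and_release and the local structure definitions are hypothetical and exist only for this example. The point is the finally block: both handles returned in PROCESS_INFORMATION are closed, so the process object is not kept alive by the caller.

```python
import ctypes
import sys

DWORD = ctypes.c_uint32      # Win32 DWORD is a 32-bit unsigned integer
WORD = ctypes.c_uint16
HANDLE = ctypes.c_void_p
INFINITE = 0xFFFFFFFF


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD),
        ("lpReserved", ctypes.c_wchar_p),
        ("lpDesktop", ctypes.c_wchar_p),
        ("lpTitle", ctypes.c_wchar_p),
        ("dwX", DWORD), ("dwY", DWORD),
        ("dwXSize", DWORD), ("dwYSize", DWORD),
        ("dwXCountChars", DWORD), ("dwYCountChars", DWORD),
        ("dwFillAttribute", DWORD), ("dwFlags", DWORD),
        ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", ctypes.c_void_p),
        ("hStdInput", HANDLE), ("hStdOutput", HANDLE), ("hStdError", HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("hProcess", HANDLE),
        ("hThread", HANDLE),
        ("dwProcessId", DWORD),
        ("dwThreadId", DWORD),
    ]


def run_and_release(command_line):
    """Start a child, wait for it, and close BOTH handles CreateProcess returned."""
    if sys.platform != "win32":
        raise OSError("this sketch only applies to Windows")
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateProcessW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p, ctypes.c_void_p,
        ctypes.c_int, DWORD, ctypes.c_void_p, ctypes.c_wchar_p,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION)]
    k32.CreateProcessW.restype = ctypes.c_int
    k32.WaitForSingleObject.argtypes = [HANDLE, DWORD]
    k32.WaitForSingleObject.restype = DWORD
    k32.GetExitCodeProcess.argtypes = [HANDLE, ctypes.POINTER(DWORD)]
    k32.GetExitCodeProcess.restype = ctypes.c_int
    k32.CloseHandle.argtypes = [HANDLE]
    k32.CloseHandle.restype = ctypes.c_int

    si = STARTUPINFOW()
    si.cb = ctypes.sizeof(STARTUPINFOW)
    pi = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line, so pass a writable buffer.
    cmd = ctypes.create_unicode_buffer(command_line)
    if not k32.CreateProcessW(None, cmd, None, None, False, 0, None, None,
                              ctypes.byref(si), ctypes.byref(pi)):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        k32.WaitForSingleObject(pi.hProcess, INFINITE)
        exit_code = DWORD()
        k32.GetExitCodeProcess(pi.hProcess, ctypes.byref(exit_code))
        return exit_code.value
    finally:
        # Without these two calls the process object lingers after the child exits.
        k32.CloseHandle(pi.hThread)
        k32.CloseHandle(pi.hProcess)
```

If the thread handle is not needed at all, it can be closed immediately after CreateProcess succeeds rather than at the end; the sketch closes both together only to keep the cleanup in one place.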

In other words, if a process hangs around after you've terminated it, it's really dead, but its remnants will remain in the system until all drivers have cleaned up their process bookkeeping, and all open handles to the process have been closed.


Comments (35)
  1. Jack Mathews says:

    Man, I really wish that CreateProcess had flags for if you want to fill out PROCESS_INFORMATION (starting with NOT filling them out by default).

    About two weeks ago I looked through our codebase and found so many instances of it being done wrong (not CloseHandling threads and processes). Then I looked at a popular competing compiler and noticed it had thousands of handles open. Sure enough, they were zombie threads from spawning the command line compiler.

  2. This is one of my biggest complaints about Windows – tasks that I cannot kill.

    This problem doesn’t seem to exist on the various Unix architectures – how is the interaction between processes and drivers different on Unix such that it’s possible to "always" kill a process there, compared to how it is on Windows?

    Thanks

    – Steve

  3. Actually come to think of it what I run into more commonly are tasks that I cannot kill because I’m apparently not allowed to. Or tasks whose priority I’m not allowed to change.

    For example, EverQuest is a CPU hog, so I’d like to lower its priority. Task manager says access is denied. I’m logged in with Admin rights. Why does this happen?

  4. Michael Hoffman says:

    Run the command "at TIME /interactive taskmgr" where TIME is one minute from now. At that time a Task Manager with SYSTEM privileges instead of your privileges will run. Then you can do all sorts of ill-advised things like change the priority of CSRSS.

  5. DrPizza says:

    "This problem doesn’t seem to exist on the various Unix architectures – how is the interaction between processes and drivers different on Unix such that it’s possible to "always" kill a process there, compared to how it is on Windows?"

    On all the *nix OSes I use (Linux, Solaris, AIX), there’s the same issue; you can’t kill something if it’s inside the kernel. I would imagine for the same reason.

    In either case it happens extremely rarely.

  6. Cooney says:

    This problem doesn’t seem to exist on the various Unix architectures

    You still have zombie processes on unix – they show up as status Z on top. I wonder why taskman doesn’t offer that sort of info.

  7. James Day says:

    Start Everquest with a batch file like this:

    start /belownormal orion95.exe

    Change the priority to whatever you desire. Assuming Everquest doesn’t itself adjust priority, other tasks will get more time.

  8. Merle says:

    If you can’t kill them, try Process Explorer (linked above as my "name"). It can kill unkillable things.

    Taskman is just too protective. Then again, the average user has access to taskman, and you don’t want the average user to kill off things like winlogon…

    I would guess that it would also indicate these processes Raymond is mentioning. It gives you pretty good details about tasks. What I like best (over taskman) is that you can see command-line arguments. Very convenient, esp if you have a lot of "javaw" processes floating around — which is Eclipse, and which is my job?

  9. Skywing says:

    Actually, you could kill those types of "unkillable" things that you "need" to run taskmgr as LocalSystem to kill while logged on as an administrator if you did a bit of work.

    The reason why administrators seemingly can’t kill some processes but LocalSystem can is because of the process security descriptor.

    To circumvent this, you need only take ownership of the process object and then modify the security descriptor to grant yourself PROCESS_TERMINATE access.

    Other things that will prevent a process from getting terminated include an attached debugger (kill the debugger process first). In the debugger case, though, you will probably get a Win32 error of ERROR_ACCESS_DENIED returned from TerminateProcess (as opposed to the "stuck in kernel mode due to buggy driver" case, where TerminateProcess succeeds and yet appears to do nothing).

  10. Alex Feinman says:

    Most recently I came across an "unkillable" process while debugging my own video server application. The moment it activates the Windows Media 9 writer object, it becomes impossible to kill, even if started under the VStudio debugger. If it crashes or has to be abnormally terminated, the process stays in memory until the machine is rebooted. Only if WMWriter has been stopped properly will the process exit.

  11. This problem doesn’t seem to exist on the various Unix architectures

    This is wrong, the exact same problem exists in all Unix architectures.

    The processes in the ‘Z’ (zombie) state are not really unkillable; in fact, they have already been killed. They are waiting for the parent process to collect their return value, and have already freed almost all of their resources (the only remaining resource is the structure which tracks their state in the kernel, much like the "process object" Raymond mentioned). They waste almost no memory, and when the parent dies, they are "reparented" to the init process, which collects the return value and makes the zombie go away.

    The true unkillable processes are the ones in the ‘D’ state (uninterruptible sleep). If you try to kill one (or send any other signal), the kernel will wait until it comes out of that state before killing it (or letting it handle the signal). A broken driver or hung hardware (exactly the causes Raymond mentioned) or a bug in the kernel (much rarer, I believe it’s also a possible cause on Windows, and also much rarer there) can make a process unkillable, either waiting in uninterruptible sleep for something that will never happen, or (less common) stuck in a loop in the kernel (the latter case won’t show as ‘D’, but as an unkillable running process using 100% CPU).

    What happens is that (again, the same as Raymond mentioned Windows does) a process cannot be killed while in the kernel; whatever code the kernel is running will either receive a request to interrupt what it’s doing (if it’s in interruptible sleep) or will finish what it’s doing without even noticing (if it’s not waiting for anything; for instance, if you just asked it what time it is). If neither happens, the process can’t be killed.

    A special case is the init process (the process with PID 1); it ignores the KILL signal (which cannot be ignored by any other process), and the kernel will panic (the Unix equivalent of a BSOD) if it exits (or is killed).

    As an aside, I remember hearing that you could make notepad.exe impossible to kill via taskman by simply renaming it to winlogon.exe; is that still true, or did taskman.exe get smarter?
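
The zombie ('Z') state described in the comment above can be observed directly. The following is a minimal Python sketch, assuming a Linux system with /proc mounted (the state letter is read from /proc/<pid>/stat, which is Linux-specific); the helper names proc_state and observe_zombie are hypothetical. It forks a child that exits at once, waits until the child shows up as a zombie, then reaps it with waitpid and checks that the process entry is gone.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state letter is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def observe_zombie(timeout=5.0):
    """Fork a child that exits immediately; return its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit right away, skipping Python cleanup handlers
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    # Until the parent calls waitpid, the dead child is expected to stay in state 'Z'.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    os.waitpid(pid, 0)  # parent collects the exit status, which releases the zombie
    return state, proc_state(pid)
```

Calling observe_zombie() returns a pair: the state seen while the exit status was still uncollected, and the state after waitpid. The second value being None corresponds to the last remnant of the process having been released, which is the Unix analogue of closing the final handle to a Windows process object.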

  12. Anonymous Coward says:

    The canonical example that happens on UNIX is when processes are doing i/o to an NFS server and something happens to the network. With defaults of hard mounting, and/or 10,000 retries no amount of kill -9 gets the processes back.

    Most UNIX code is fairly bad at checking for filesystem i/o failures (when was the last time you saw someone checking the result of close?) hence the default of hard mounting.

    Mind you, much of the Windows code I have looked at isn’t much better. Older code was better, because in the days of floppy disks and small hard drives, disk-full and I/O errors were more likely to happen.

  13. So, if I get one of these unkillable processes, I shouldn’t worry, because it’s <PRINCESS BRIDE>mostly dead</PRINCESS BRIDE>, right? Of course, only the truly obsessive want it gone out of their process list just because it bothers them.

    Except…

    There’s this feature that recycles programs. If I try to launch Finale twice, it detects one instance is running, and raises the existing window. Except, when it’s mostly dead, it won’t let me launch a new one.

    So when one Finale goes mostly dead, I’m out of luck. I can’t restart it.

    So how do I work around that? Ending the process via task manager gives me "permission denied", and a command-line kill and kill -f don’t work either. Even if I run the shell as Admin.

    I’ve had to reboot many times just to clear out the "mostly dead" process, because I really needed it to be "all dead".

  14. Ben Cooke says:

    Cesar,

    If you try to kill any process named winlogon.exe, Windows 2000 task manager displays a message box explaining that it’s a critical system process and that it cannot be killed.

    I don’t have a Windows XP system handy to test there.

  15. Raymond Chen says:

    I already described how to debug this problem. Use the kernel debugger to determine which driver has taken the thread hostage.

  16. Almost Anonymous says:

    Remember folks, name your viruses winlogin.exe!!

  17. Almost.. says:

    yes.. winlogin.exe

  18. qwerty says:

    Yup, I’ve got WinXP Pro SP1 and it also has a hardcoded check for "winlogon.exe". LOL

  19. I don’t really mind the hung process – but how do I tell it not to try recycling a hung process, so I can start my app without rebooting or running a debugger?

    At work, I debug programs. At home, I just want to use Finale to print up some music. I can’t debug it, or Windows for that matter.

    Another example, exiting Netscape will go off in the background and do "some stuff". If I decide I need to start Netscape between the time it closes the window and it’s done grinding away at the disk, I can’t. I click on the icon to launch, and it finds netscape.exe still running, and does nothing else.

    How do I stop programs from recycling themselves? (I’m smart enough not to launch two apps at once, if I didn’t want to.)

  20. Raymond Chen says:

    That’s how the program was written; you’ll have to take it up with each program.

    Windows comes with a debugger (ntsd) that I often use to connect to a wedged program to figure out why it is stuck.

  21. Nicholas Allen says:

    Such is the tyranny of DDE. You can set the MOZ_NO_REMOTE environment variable to 1 and your browser won’t be bothered to check for already running instances. But running multiple instances out of the same profile is a not-at-all not-even-a-little totally unsupported scenario.

  22. Keith Moore [exmsft] says:

    (Sorry for jumping into this thread a little late…)

    Skywing wrote "Terminating a process or thread is actually implemented as a kernel mode APC, and most drivers disable kernel mode APCs while doing processing, delaying the termination request until kernel APCs are re-enabled."

    A more likely reason for a hung/unkillable process is that it has made an I/O request to a driver that a) pended the IRP, and b) didn’t make it cancellable first. NT’s IO manager adds a reference to the thread object that initiates the IRP, and this reference is not removed until the IRP either completes or is cancelled.

    My Toshiba laptop has a process 00HOTKEY.EXE that responds to the goofy special-purpose buttons littering the system case. If you try to kill this process, it won’t actually go away because it’s *always* waiting for a non-cancellable IRP to complete. In those rare cases where I really want to kill it, I "kill -f <pid>", then press one of the goofy buttons. The button press causes the pended IRP to complete, which dereferences the thread object, which dereferences the process object, which allows all of the expected cleanup to occur.

  23. Petr Kadlec says:

    Re: recycling hung processes: Some time ago, I was solving some problems with my WLAN. As part of the software for the adapter, there is a GUI applet to check the state of the connection, set WEP keys, etc. If it hangs and you kill it, it won’t allow another instance to run. So I debugged it, found that it uses a global atom to enforce that restriction, and wrote a simple tool that just deletes that atom…

  24. Petr Kadlec says:

    :-o Sorry, but I am reasonably sure I hit Submit only once (and no refresh, mind you) !

  25. Qbeuek says:

    "If the driver is not as well-behaved (or if the hardware that the driver is managing is acting up), it may take a long time for it to clean up the incomplete I/O. During that time, the driver holds that thread (and therefore the process that the thread belongs to) hostage."

    I guess the built-in NetBios driver isn’t well-behaved – when trying to access a share on a turned-off machine (or through a slow link), the process becomes unkillable :-(

  26. rhino-x says:

    That’s because the redirector filesystem driver is a giant black hole that sucks things in and refuses to let go.

  27. Ilya Birman says:

    My ATI TV Player that came on a CD with the graphics card driver sometimes hangs when recording something with Digital VCR. The TV Player GUI hangs completely and there is no way to kill it. When you kill it in taskman, it reports no error. You just press "yes, i’m really sure i want to kill it" and the confirm dialog closes. The process remains in the list. The WINDOW remains on the screen. It’s very funny if it’s in always-on-top mode, because from that moment the only thing you can do to be able to work with your system is reboot. This is not about hung drivers, because, as I have already said, the WINDOW is still always-on-top!

    So what can I do with it? Is there a magic "kill -something" to kill it?

  28. Norman Diamond says:

    7/28/2004 9:53 AM Ilya Birman

    > The WINDOW remains on the screen.

    Can you minimize the window? In my experience the Windows GUI can force some hung application windows to minimize even without the application’s consent, though with others it doesn’t. One of Mr. Chen’s postings a few months ago touched on this.

  29. Ander says:

    Scott Tringali> If I try to launch Finale twice, it detects one instance is running, and raises the existing window…

    Why would you want to run Finale even once? Haven’t you tried Sibelius?



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
