Date: | March 5, 2004 / year-entry #85 |
Tags: | history |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20040305-00/?p=40373 |
Comments: | 81 |
Summary: | In a comment to one of my earlier entries, someone mentioned a driver that bluescreened under normal conditions, but once you enabled the Driver Verifier (to try to catch the driver doing whatever bad thing it was doing), the problem went away. Another commenter bemoaned that WHQL certification didn't seem to improve the quality of... |
In a comment to one of my earlier entries, someone mentioned a driver that bluescreened under normal conditions, but once you enabled the Driver Verifier (to try to catch the driver doing whatever bad thing it was doing), the problem went away. Another commenter bemoaned that WHQL certification didn't seem to improve the quality of the drivers.
Video drivers will do anything to outdo their competition. Everybody knows that they cheat benchmarks, for example. I remember one driver that ran the DirectX "3D Tunnel" demonstration program extremely fast, demonstrating how totally awesome their video card is. Except that if you renamed TUNNEL.EXE to FUNNEL.EXE, it ran slow again.
There was another one that checked if you were printing a specific string used by a popular benchmark program. If so, then it only drew the string a quarter of the time and merely returned without doing anything the other three quarters of the time. Bingo! Their benchmark numbers just quadrupled.
Anyway, similar shenanigans are not unheard of when submitting a driver to WHQL for certification. Some unscrupulous drivers will detect that they are being run by WHQL and disable various features so they pass certification. Of course, they also run dog slow in the WHQL lab, but that's okay, because WHQL is interested in whether the driver contains any bugs, not whether the driver has the fastest triangle fill rate in the industry.
The most common cheat I've seen is drivers which check for a secret "Enable Dubious Optimizations" switch in the registry or some other place external to the driver itself. They take the driver and put it in an installer which does not turn the switch on and submit it to WHQL. When WHQL runs the driver through all its tests, the driver is running in "safe but slow" mode and passes certification with flying colors.
The vendor then takes that driver (now with the WHQL stamp of approval) and puts it inside an installer that enables the secret "Enable Dubious Optimizations" switch. Now the driver sees the switch enabled and performs all sorts of dubious optimizations, none of which were tested by WHQL.
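The switcheroo described above can be sketched as a toy model. This is only an illustration, not real driver code: a plain dict stands in for the registry, and the key name and function names are made up for the example. It shows why testing the binary alone never exercises the dubious code path.

```python
# Toy model only: a dict stands in for the registry.
# The key name below is hypothetical, chosen for illustration.
SWITCH = "EnableDubiousOptimizations"


def draw_triangles(count, registry):
    """Pretend driver entry point: returns (triangles_drawn, code_path)."""
    if registry.get(SWITCH, 0):
        # Fast path that skips validation. It only runs when the
        # external switch is set, so a lab without the switch never sees it.
        return count, "fast-unvalidated"
    # Safe path: the only code exercised when the switch is absent.
    return count, "safe"


def install_for_whql():
    """Installer submitted for certification: leaves the switch unset."""
    return {}


def install_for_retail():
    """Retail installer: same driver binary, but flips the switch on."""
    return {SWITCH: 1}


lab_result = draw_triangles(3, install_for_whql())
retail_result = draw_triangles(3, install_for_retail())
print("lab:   ", lab_result)
print("retail:", retail_result)
```

The function body is identical in both runs; only the external state differs, which is the whole trick.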
Comments (81)
Ferris, how do you test a negative? The registry key ISN’T there. How could the WHQL people possibly know that turning this magic registry flag on would cause the driver to misbehave?
The manufacturer sent Microsoft the driver. We tested it. And it passed. So we signed the binary and gave it back to them. Then they changed the external conditions and effectively changed the driver binary without changing the code – they enabled new codepaths that Microsoft wasn’t able to test.
There’s not much that Microsoft could do in that situation.
And I disagree about the "WHQL means jack all" comment – WHQL actually means a lot – at a minimum, it means that the user doesn’t get the "This driver wasn’t tested by Microsoft, if you install it, it’s gonna crash your system." dialog that comes up on non WHQL signed binaries.
That’s why nVidia goes through all the trouble to sign their drivers – it makes customers feel better.
["Ferris Beuller"s messages have been removed for spamming. You may see responses to "his" messages without context. It’s too bad because some of the comments were worthwhile.]
"Ferris Beuller" asks (via a deleted message):
"Spamming, they where legitimate issues with WHQL, if you cannot put them into a single post, you are an asshat. Is it that hard to copy and paste into a single post."
My response to "Ferris":
If it’s so easy to copy and paste into a single post, then go ahead and do it yourself. I am not your editor.
When I was at 3dfx, I remember Microsoft using other parts of a contract to enforce WHQL compliance, such as wanting to withhold our OpenGL ICD and such. Aren’t there any kinds of damages or fraud protection that could be put into WHQL agreements to stop this behavior?
1) So-called WHQL Certified drivers are NOT always on Windows Update and cannot always be installed via the MMC Device Manager snap-in. We are forced to install via a custom .EXE that stomps over our system (DirectX; luckily this fails on attempts to install a lesser version) or installs AOL-type adware. Logitech, Creative Labs and Nvidia are prime candidates that do this, which is also why I blacklist these companies: theirs are not pure driver installs and are not installable via Windows Update or the MMC (Creative Labs are ignorant about this; speak to their Dublin call center and see what I mean).
2) You can test registry keys via Sysinternals Regmon, it will show FAILED read attempts in the registry.
3) Full disclosure should be a prerequisite for WHQL certification if bug finding is the main reason for certification, you can then run static analyzers over this code. No source, no certification, if they want certification so bad they will do this.
4) Full retail installs should be part of the process: you test what they ship. Most people don’t care or understand what a certified driver is; they run Windows Update.
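Point 2 in the list above (watching for failed registry reads) can be sketched roughly as follows. This is not Regmon and not a real registry API: `TracingRegistry`, `pretend_driver_init` and the key names are hypothetical stand-ins that only show the idea of logging lookups for keys that are not there.

```python
class TracingRegistry:
    """Dict-backed stand-in for a registry that records failed reads."""

    def __init__(self, values=None):
        self._values = dict(values or {})
        self.failed_reads = []

    def read(self, key, default=None):
        if key not in self._values:
            # A read of a key that does not exist is the interesting event.
            self.failed_reads.append(key)
            return default
        return self._values[key]


def pretend_driver_init(registry):
    """Hypothetical driver start-up that probes two made-up keys."""
    registry.read("DisplayMode", "default")
    registry.read("EnableDubiousOptimizations", 0)


reg = TracingRegistry({"DisplayMode": "1024x768"})
pretend_driver_init(reg)
print(reg.failed_reads)  # ['EnableDubiousOptimizations']
```

A trace like this only says that the key was looked for, not what setting it would do, and it is defeated as soon as the flag moves somewhere other than the registry.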
Why not make the WHQL certification process stricter. If the driver accesses the registry or files the manufacturer could be asked to explain what the driver is looking for. This would involve more work on both Microsoft’s and the manufacturers part but wouldn’t the end result run more reliably?
WTF are you doing, I just put them into a single fuckin post.
Talk about ignorance.
I’m not sure of the specifics because I don’t work on WHQL (although I did work in the DirectX team at one time), but I think we (MS) have to be quite careful, because we’re sensitive to the charge that we might be playing favorites in a way that could get us into legal trouble.
So, in general, it’s a lot safer (legally) for us if everything is wrapped up in a well-defined suite of tests that’s either pass/fail, with not much room for human judgement. That way we can say that the test specifications are the same for everyone, and we’re not playing favorites. Of course, it does open the door to hardware vendors finding ways to sneak around the spirit of the tests while still conforming to the letter.
I can almost guarantee that if we started using our judgement to attempt to yank the WHQL certification from drivers/setups that we ‘know’ are bad even though they passed the tests, then we’d get all sorts of whining and possibly legal threats from IHVs.
"Ferris": Your account was still blocked when you posted the "consolidated" message. I’ve unblocked your account, but if you act up again, I will reinstate the block (and retroactively delete all your previous comments).
Ferris,
2) Then they’ll find some other sneaky way to look for the special flags. There might also be legitimate reasons to look for some missing registry keys. At the end of the day, if they really want to mislead the tests, they will.
3) Never going to fly. Hardware vendors have a lot of sensitive IP tied up in their drivers. I can’t imagine them being happy to give us complete source code.
4) That’s a nice idea, but then hardware vendors would complain that we were hurting their ability to change their shipping bundles, which typically come with control panel applets and all sorts of other goop apart from the driver itself. They do special bundles for OEMs all the time. We’d either be swamped with all the various configurations of installation bundles, or we’d have angry hardware vendors claiming that we were stifling their ability to distribute things the way they wanted to.
IP in the drivers? Isn’t that what patents, copyright and NDAs are for?
It’s not like they’re giving their source to the consumers.
OK, well, I want to enforce a standard MMC and Windows Update install option.
Consider: Your company’s entire livelihood is based on the success of this driver. Are you going to give its complete source code to Microsoft for scrutiny? Even if it’s patented and copyrighted and NDA’d up the wazoo? (Note that you can’t NDA a patent. A patent is public information.) Are you going to take the chance that the source code might get Half-Life 2’d or Windows 2000’d?
And even if vendors were willing to do that, that doesn’t solve the problem. All they have to do is find a way to fake out the static analyzers. "Oh, Microsoft looks for registry access? Fine, I’ll put my flag in a named global object instead. Oh, Microsoft is now looking for named global objects? I’ll create a custom ioctl."
Performance isn’t part of WHQL but stability is. However, if they are deliberately avoiding you testing those codepaths, then their WHQL certificate should be revoked. It doesn’t mean anything to the consumer anyway.
You can enforce install methods for drivers (MMC and Windows Update; a driver MUST be on Windows Update, and hence available via the MMC snap-in, to even be considered for WHQL), yet you don’t enforce this. We the users get assreamed by the lack of MMC install support, yet that’s what it’s there for. Whatever happened to this so-called driver model?
Show me a company based on drivers and drivers alone; they sell hardware and licenses for the reference platforms.
How do you automate "deliberately avoiding you testing those codepaths"? And suppose you manually discover a secret switch and retroactively revoke their certification. Then they complain, "Hey, Microsoft is picking on me! No fair!"
You’re saying you’ve never said, "No, don’t buy that video card, their drivers suck."
I don’t buy NV anymore because their performance sucks on the hardware. Their drivers are good, or were until the 50.xx series, but my hardware decisions are still based on hardware capabilities; I never see the drivers when I buy hardware, for obvious reasons.
I do know I don’t buy Creative Labs because of their install practices, and their hardware is a bus hog.
You must excuse the crowd – video cards are a particularly sensitive topic among hardware enthusiasts. I wasn’t too thrilled to learn about so-called "optimizations" that a certain IHV was carrying out last year, which is probably why I currently own a Radeon 9800 Pro (but I digress).
You just wait Raymond, once the folks at Rage3D, nVNews and Beyond3D find this blog post, you’ll have a full-scale party (or war) on your hands…
Again I have to cry foul on some of this WHQL stuff as Microsoft played dirty ball with 3dfx.
Like, let’s see, they pressured us for OpenGL ICD source code and made us write a 2D device driver for NT to certify Voodoo2. Of course, 3dfx probably signed an idiotic contract.
Either way, Microsoft could easily take measures to stop this WHQL madness, but I’m pretty sure they clearly don’t care all that much about it.
>>You just wait Raymond, once the folks at Rage3D, nVNews and Beyond3D find this blog post
Yeah, never let the facts get in the way of a good M$-bashing session. ;)
I just want installable drivers, nothing more, nothing less: available via Windows Update and installable via the MMC. How hard is that? Isn’t that why we have that MMC snap-in and Windows Update? At the very least make them submit the final WHQL driver to Windows Update; they can package their shitty CD however they want, and we can always bag the latest after an install or even do it as a company policy via MS SMS.
Jack,
I’m not sure what measures you think MS could "easily take to stop this WHQL madness". Trust me, fixing driver quality is pretty high up the priority list around here, because when things go wrong users tend to blame Microsoft at least as often as the driver vendor. If it were easy to fix, we’d do it in a heartbeat.
The solution here is not really to bash WHQL. I think this thread fairly establishes that WHQL does all it can politically and legally, if not technically.
There is a company that makes very popular video cards. I hate this company. I hate their inability to write drivers that do not crash. I hate their incessant attempts to install things in my system tray. I hate their consistent refusal to release updated drivers for older cards when new OS releases require them. I hate the buggy apps they ship with the cards to allow you to use special features of the cards.
And I let them know every chance I get. I refuse to buy their cards, and I send them an e-mail when I buy a competitor’s card, just to let them know.
Ferris: Well perhaps that’s what you want, just because you want somebody to do something doesn’t mean that they have to do it. Maybe the driver writer wants to sell the new driver in a "premium edition" or something.
Why bother having Windows Update for third-party WHQL drivers in the first place, then? Isn’t that what it’s there for? Why not mandate, as part of WHQL certification, that the driver be available on Windows Update? Why the hell not?
Ferris,
Because then Microsoft would get all sorts of complaints about how we’re forcing people to go through our distribution channel. Bad Microsoft, abuse of power, blah blah blah.
Tony, you wouldn’t be forcing anybody to do anything. Companies are not required to get WHQL certification, their crappy drivers can be installed and used without it. If they don’t want to comply with the rules they will not be certified, how difficult is that?
By allowing companies to cheat WHQL you (as in Microsoft) are making your own certifications worthless. Nobody cares about the certification if you willingly allow drivers that are not up to par to be certified.
And as far as I remember, using MSI is a requirement for Designed for Windows certification. Why exactly can’t you make a similar requirement for WHQL certification?
You can’t stop people from cheating. You can try to catch them, but they will just look for new ways of cheating that are harder to detect.
See Tony’s answer to question 4 for why IHVs want to be able to repackage their drivers. Now you might say, "Well Microsoft should tell those IHVs to screw off." And then you wonder why people hate Microsoft. "Microsoft is so selfish; you have to do things their way or they tell you to screw off."
The difference between "Designed for Windows" and WHQL is that "Designed for Windows" is a marketing program that doesn’t really have much of a ‘stick’ inside Windows itself. Whereas if you try to install a non-WHQL driver, Windows will pop up all sorts of dire warnings in an attempt to discourage you from doing that. So for hardware vendors, WHQL is somewhat less optional than "Designed for Windows", and hence they have more cause for complaint if the terms become too onerous.
Whenever Microsoft tries to ‘dictate’ to other companies how they can interact with Windows, even if it’s really for ultimate good of end users, there’s always a vocal group of people complaining about how we’re abusing our power, yadda, yadda, yadda. So we have to be sensitive to that, and strike a balance.
Windows is at its heart a platform upon which partners build their products. Those products can be apps, drivers, hardware, services or a combination of these. When you ask why Microsoft can’t "make" vendors just submit installable drivers to Windows Update, consider how inconsistent this is with our app story. Do we tell Adobe that we don’t like some features in Photoshop, so we won’t logo their app until they remove them? Do we tell our users that we think WordPad is good enough, so they shouldn’t need anything other than "inbox" apps? Partners ship whatever they want. Their customers decide whether they like their products. As a platform, we can do more to enable OUR customers to identify third-party products better. We should definitely allow our customers to decide whether they want a product based on its source (e.g. "I trust Symantec but not NVidia"). And we need to make it easier to yank stuff they don’t like. But in the end we deal with those products in a black-box way.
In a way, expecting Microsoft to police every driver on the planet is akin to expecting your home’s construction company to prevent you from putting ugly furniture in it. Say Company X sold a couch that unfolded at night and drilled holes in your wall. Would you be crying out to the construction companies of America to test all furniture for this kind of behavior? Would you blame the builder of your house for letting such a couch exist? No. It would be cool if the house told you that the couch did it. It would be cool if the house could automatically eject any piece of furniture you put in it. It would be cool if the house could warn you if you’re ever putting in another piece of furniture from the same company. That’s what we’re working on building into Windows.
Windows is PROPRIETARY, not a standard (in the committee sense), so they play by the rules or they’re on their own.
I want drivers easily installable (MMC Device Manager snap-in) and accessible (Windows Update).
I guess we, the users, mean jack all when it comes to drivers.
Sure, we could do that. And who knows, maybe we could annoy our business partners to the point that they stop supporting our platform.
And then, of course, the Slashdot crowd starts screaming about anti-trust.
OK, screw Windows Update then. I already can’t find a few drivers I use on it, so why bother?
I just ran Windows Update for a piece of hardware that is very, very common on systems.
Logitech QuickCam – very common web camera – FAILED, no such driver on Windows Update.
WHY BOTHER with Windows Update? I’m sure I can find more popular drivers that are not there.
I guess every consumer KNOWS where to get all those drivers for computers. I guess they love spending time visiting umpteen websites fighting with their different layouts looking for the RIGHT MODEL of device drivers.
Isn’t that the entire point of Windows Update? A one-stop shop for updating your system.
I guess you are talking out yer arse.
So go complain to Logitech and tell them "Hey, give Microsoft a copy of your (WHQL-certified) driver so they can put it on Windows Update."
Thats my point numbnuts, make it part of the certification process.
Um, I believe Tony Cox already explained why it isn’t. And please lay off the rude insults or I’m going to have to ban you again.
Ban, rofl, how :D IP? Its a socks proxy, name? Yeah I cant change that.
Maybe the installer executable should be certified, along with the driver files? That would at least stop the old installer switcheroo trick.
Wouldn’t stop anybody for long. You can just wrap the installer inside a bigger installer (and have the bigger installer set the secret flag).
> I guess every consumer KNOWS where to get all
> those drivers for computers. I guess they love
> spending time visiting umpteen websites
> fighting with their different layouts looking
> for the RIGHT MODEL of device drivers.
No, most consumers just pop in the CD that came in the box with the hardware, install the drivers, and NEVER THINK ABOUT THEM AGAIN. EVER. Because they have no reason to. The drivers work, the hardware works. It is gamers, enthusiasts, and technicians that are concerned with staying on the leading edge of drivers–and as such, we might have to work a little harder to do it. So be it–it’s not MS’s job to cater to that fringe market.
Hasn’t anyone figured out that Ferris is just going to disagree with whatever justification is offered for the current balance between enforcing quality and allowing customization? It has to be his way or the highway.
Don’t feed the troll. I know I do feed it by saying that, but it has to be said once.
At first, yes, but not always: when they no longer have that box (and a lot of people do not keep those boxes), they are told to go to Windows Update.
Since when is the corporate enterprise a fringe market? Would you prefer to use SMS to roll out a driver to every machine in a domain easily or just run the MMC device manager remotely to install a driver or go to every machine with a CD and install it for oh lets say 100 machines?
So why doesn’t the WHQL driver certification process certify the complete package, installer and all?
Then if the developer repackaged the binary, it would lose its certification.
If an end user decided at a later stage to enable the optimisations, via a bundled tool or registry hack, at least it’s an explicit action.
This would mean that a vendor couldn’t get a driver certified until they finished writing their entire CD, down to the README file.
And this still won’t stop somebody from just putting the certified installer inside another installer.
Mr. Chen, I must say I’m very impressed with your patience toward the misspelled-movie-character guy. I don’t know how you do it.
I hope the situation doesn’t sour the blogging experience for you because I really enjoy your writing and insights. If it gets out of hand, I suggest you simply disable comments altogether. It would be a shame to lose all the interesting discussion your more mature readers provide though…
Misspelled-movie-character guy: I agree with you that it’s a lot nicer when drivers are made available from windowsupdate.com: easier to install, no buggy extraneous software, etc. And I would also be happy to see the WHQL certification program be a little tougher on such abuse as described above. Believe it or not, it is possible to make these points without being rude and repeatedly lambasting our host…
Here’s the deal:
If the driver isn’t already on my harddrive before I install new hardware, then the out-of-box experience is already at 1 on a scale of 1 to 100.
If a video driver makes the desktop properties/settings dialog take a fraction of a second longer to load so it can put its Flash-powered custom tab in the dialog, divide the rating by 10.
If I can’t choose to install the driver from Add/Remove New Hardware, divide the rating by 10.
If a driver (or any other component) of any kind adds an item to "systray" and there is no way to remove it, divide the rating by 10.
The only reason for gaming benchmarks, elaborate custom installs, systray icons and custom tabs in standard dialogs is EGO. These things do not benefit me, the end user. Never.
— The guy who didn’t use BBS front ends if they required a "modem setup string" other than ATZ.
Your comment about renaming TUNNEL.EXE to FUNNEL.EXE reminded me of an interesting story.
Way back in the Win 3.1 days, when PC Magazine was considered one of the leaders in technical computer mags, there was a particular video benchmark we would run. It would print alternating screens of polygons and text. The text was the words "PC Magazine". (Remember, this was long before 3D acceleration).
A certain video card from a certain company (now out of VGA card business) scored way too well on this test. So we did a "strings" on their VGA BIOS, and found the words "PC Magazine" in several places in their BIOS code. Hrm… I wonder what THAT was doing there?
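The "strings" check described above can be approximated in a few lines. This is a minimal sketch, not the actual tool used at the time: `extract_strings` is a hypothetical helper that pulls runs of printable ASCII out of a binary blob, and the "BIOS image" here is fabricated bytes for demonstration only.

```python
import re


def extract_strings(blob: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, blob)]


# Made-up stand-in for a firmware image; a real one would be read from a file.
fake_bios = b"\x00\x01\xffVGA BIOS v1\x00\x90\x90PC Magazine\x00\xfe"

found = extract_strings(fake_bios)
suspicious = [s for s in found if "PC Magazine" in s]
print(found)       # ['VGA BIOS v1', 'PC Magazine']
print(suspicious)  # ['PC Magazine']
```

Finding the benchmark's text inside the firmware proves nothing by itself, but it is a strong hint about where to look next.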
The vendor is free to package it full of turds in the CD box if they want but they MUST make it available on windowsupdate and installable via the MMC Device Manager snapin as an option.
Tony and Raymond:
Yeah, you guys are right. I keep forgetting the perspective we had at the hardware companies about Microsoft controlling everything, homogenizing everything, that sort. I suppose it’s a really slippery path with legal and there’s two sides to everything.
Bleh. :)
So, the driver makers are deliberately deceiving consumers about the compatibility, technical capabilities or both of their hardware and software. Has Microsoft been contacted by the FTC to assist in their investigation of the fraud yet?
How bad is the hit from moving the video drivers out of the more trusted security rings so the failures they are causing are not as damaging? Would it actually help?
James: I dunno. If the software that controls what displays on your screen crashes, regardless of what ring it’s running in it’s probably not good for your machine. =)
James Day:
The part of the Win32 subsystem dealing with graphics (GDI) was a quite ordinary user-mode process in the early versions of Windows NT (NT 3.5 and NT 3.51). However, it was moved to the kernel side for NT 4.0.
You can read about why they did this here: http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/dnarwbgen/html/msdn_movuser.asp
Maybe it would be better for Windows Server to have the GDI still as a user-level process?
Note that it was GDI, not the display driver, that ran in user-mode on NT 3.5. I.e., the component that worries about things like HBRUSHes and DIBs. The display driver still ran in kernel mode (since it needs privileged access to hardware resources).
You can’t do (accelerated) graphics at user-level these days because modern graphics cards have complex DMA engines (just like network cards and SCSI cards). There needs to be a kernel driver to handle the memory mapping and transfer of data. If there is a bug in that component, it will probably bring down the entire machine, so there isn’t much point in splitting out work to user space.
I am worried. What is MS doing to ensure this kind of deception doesn’t occur in Longhorn? From what I read, Longhorn will tie itself more directly to these video drivers.
Raymond Chen wrote:
"The display driver still ran in kernel mode (since it needs privileged access to hardware resources). "
To me, it seems like the display driver was split in two parts, one in each mode. See this figure: http://msdn.microsoft.com/archive/en-us/dnarwbgen/html/movuser02.gif
—
With regard to drivers and "Longhorn", you can find lots of information in the WinHEC 2003 slides (http://www.microsoft.com/whdc/winhec/pres03.mspx). And I assume WinHEC 2004 will bring even more information.
Why not just split the WHQL process in two?
1) Test the driver as usual (let’s assume the driver passes).
2) When driver is released to end users, download it from manufacturer’s website as a normal consumer would. Then test it again. If it fails, WHQL certification is retroactively revoked (and there should be a website loudly proclaiming this, and that the manufacturer is releasing dodgy drivers).
Now, if a small-time hardware site did 2), especially the loudly proclaiming bit, they would be feeling the pressure from nVidia lawyers post haste. But Microsoft? Ner… nVidia can’t afford that risk, not with ATi eating their lunch with the Radeon 9x00 series.
Note: Not saying nVidia are (the only ones) cheating. Just using them as an example because they’re big and well-known.
I’m not sure that the WHQL certification is much use anyway — drivers for Creative Labs products have been flaky for as long as I can remember, yet they’re now all certified. It doesn’t stop them crashing at random, but I suppose at least when I install them I don’t have to click on "Continue anyway" a bunch of times…
Although Ferris is a bit of a troll, the point about forcing drivers to install via the MMC is a pretty good one; I’ve lost count of the number of things I’ve attached to my PC that include the line "When the add new hardware wizard prompts you to locate the drivers, click Cancel" in the install instructions.
> Are you going to take the chance that the source code might get Half-Life 2’d or Windows 2000’d?
Surely you aren’t saying that MS is more likely to "lose" the source than the original company, Raymond? I must say though that the idea of MS agreeing that making access to the source mandatory to make sure that there is no code doing things it shouldn’t would make the /. crowd wet themselves with laughter.
Frankly, it really doesn’t matter if GDI/USER and the associated drivers live in user- or kernel-mode. If these critical subsystems die, the OS is toast. They are as critical to a running Windows system as CSRSS.EXE — "just" a user-mode process, but kill it and your system is kaput.
What does it matter for a database-server if GDI is there or not?
In _theory_ you should be able to create, for example an e-mail service, running in Interix or OS/2 subsystem, and it should not make a difference for it if CSRSS.EXE is alive or not.
And that would help a few hundred out of the 300 million Windows-based machines out there. And even then, only until you wanted to do some remote administration via Terminal Server and realized that you can’t.
I assume that’s why SFU/Interix comes with a telnet server :P
That last one was just a joke… I agree with you that this isn’t a very useful thing. I’m just a bit curious in how dependent the rest of the system (NT Executive etc.) is on the Win32 subsystem.
I remember reading an interview with Jim Allchin a year ago where he spoke a bit about this issue (removing the need for a server to have GDI etc.). Unfortunately I can’t find the link back :(
It’d be very nice if Watson/OCA could tell you which device driver it was that caused your last BSOD. Then at least you would know who to chase up to get it fixed. Just knowing the file name might be enough although it might also be good to get the device description where it is a hardware driver fault.
70% of the BSODs I get say "Error caused by a device driver" but I never know which one is responsible, so I can’t do much about upgrading it or bothering the manufacturer about fixing it. Also, it would be good to know if it is an MS device driver and if it has been signed. It sometimes seems like an easy way out to blame a device driver, as if it were not MS’s responsibility, when in fact it may be a device driver created by MS.
At the moment I can crash my Tablet PC by reading a file off a Compact Flash card using BlueJ a Java IDE. I am using the XP SP2 beta so I suppose I should expect problems, but unless I know where the problem lies it doesn’t help much.
I agree about Windows update. All the drivers it suggests for my hardware are at least a year out of date and a lot of people I know have more problems after accepting a Windows Update supplied driver than they had before installing it.
The problem is that assigning blame is not easy. Suppose driver X crashes – is the problem in driver X? Or maybe the problem was in driver Y who corrupted driver X’s memory? Or maybe driver Z passed driver X a bogus pointer? Or driver Q called driver X when the rules state that Q cannot call X until it first does P and it forgot to do P?
The Driver Verifier tries to catch these sorts of problems, but it slows your machine way, way down due to all the extra checks. But if you suspect a driver, you can tag it for extra attention and see if it survives.
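The blame problem and the cost of extra checking can be illustrated with a toy model. This is not how the Driver Verifier is implemented; `Pool`, `run` and the driver names X and Y are invented for this sketch. Without checks, the failure shows up in X even though Y caused it; with a check on every write, Y is caught at the moment it misbehaves.

```python
class Pool:
    """Shared 'kernel memory' in which any driver can write anywhere."""

    def __init__(self, size, verify=False):
        self.mem = bytearray(size)
        self.owner = {}          # offset -> name of the driver that owns it
        self.verify = verify
        self.next_free = 0

    def alloc(self, driver, n):
        start = self.next_free
        for i in range(start, start + n):
            self.owner[i] = driver
        self.next_free += n
        return start

    def write(self, driver, offset, value):
        # The ownership check runs on every write when verification is on,
        # which is where the slowdown in this toy model comes from.
        if self.verify and self.owner.get(offset) != driver:
            raise RuntimeError(f"verifier: {driver} wrote outside its allocation")
        self.mem[offset] = value


def run(verify):
    pool = Pool(16, verify)
    x = pool.alloc("X", 4)   # offsets 0..3
    y = pool.alloc("Y", 4)   # offsets 4..7
    try:
        pool.write("X", x + 3, 7)   # X stores a value it will rely on later
        pool.write("Y", y - 1, 0)   # bug in Y: off-by-one lands in X's buffer
    except RuntimeError as err:
        return str(err)
    if pool.mem[x + 3] != 7:
        return "crash in X: its data was corrupted"
    return "ok"


print(run(verify=False))  # crash in X: its data was corrupted
print(run(verify=True))   # verifier: Y wrote outside its allocation
```

In the unverified run, a crash report that names only the faulting module would point at X, which is the innocent party.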
OK, I realise in kernel space any driver can corrupt any part of the OS memory so it is difficult to work out where it all started to go wrong.
In that case does the OCA determine that it was a driver that caused the problem to begin with, or is it just where the faulting instruction occurred?
How much is the OCA data looked at? Do they work out if a particular driver is consistently causing problems and contact the developer?
I just think it might be better if consumers were able to apply pressure to IHVs based on some evidence that they are to blame.
If I can’t get an updated driver, or a fix for the problem, I might decide to rip that component out and buy a competitor’s.
Don’t you think that Microsoft makes too much use of kernel mode drivers? The more stuff you run in there the more chance there is of something going wrong. Do I need an Acoustic echo canceller or a DRM audio descrambler able to bring down my machine?
How about you let me know when something else is trying to install a driver? I don’t want these CD copy protection schemes to use Autorun to attach a filter driver to my CD-Rom drive.
A criterion of WHQL certification is that the driver does not adversely affect the operation of another device. A driver that does not let me read data off a particular audio CD should not qualify, thus it should not be installed without my agreement.
The "Debugging tools for Windows"[1] gives you all the information you need :) There is support to load a minidump and it then can analyze it to find out what the cause of the crash possibly is. However, the more information you give it – the better the result.
So, for example, having the symbol files for all your drivers helps it a lot.
Just start for example windbg.exe (you can use the console-based debuggers too if you prefer) and load the crash-dump. After it has loaded the dumpfile you can type this command to get the analysis: !analyze -v
Tip: You can set the _NT_SYMBOL_PATH environment variable to download symbols from Microsoft when needed (for example set _NT_SYMBOL_PATH=SRV*C:\DbgSymbols*http://msdl.microsoft.com/download/symbols).
[1] http://www.microsoft.com/whdc/ddk/debugging/default.mspx
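The symbol path in the tip above has the form SRV*<local cache>*<symbol server>. As a minimal sketch of how that string is put together, the helper names below (build_symbol_path, debugger_environment) are made up for illustration, and the cache directory is just a placeholder; only the _NT_SYMBOL_PATH variable and the server URL come from the comment itself.

```python
import os

# Symbol server URL quoted in the comment above.
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server_url=MS_SYMBOL_SERVER):
    """Return a symbol path of the form SRV*<cache_dir>*<server_url>.

    Hypothetical helper: it only formats the string and does not
    check that the directory or the server exists.
    """
    return "SRV*{}*{}".format(cache_dir, server_url)


def debugger_environment(cache_dir):
    """Return a copy of the current environment with _NT_SYMBOL_PATH set.

    The copy can be handed to a child process (for example a debugger
    launched via subprocess) without changing this process's environment.
    """
    env = dict(os.environ)
    env["_NT_SYMBOL_PATH"] = build_symbol_path(cache_dir)
    return env


if __name__ == "__main__":
    # Placeholder cache directory; any writable local folder would do.
    print(build_symbol_path(r"C:\DbgSymbols"))
```

Building the environment as a copy keeps the setting scoped to whatever debugger process you start with it, instead of changing it machine-wide.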
[Oh, sorry for splitting this up into multiple answers, but I didn’t see the last response before posting the first one]
OCA uses, AFAIK, the same analysis as you get with the use of the debuggers I explained above. See especially this WinHEC 2003 presentation for more information about how OCA works: http://download.microsoft.com/download/c/f/1/cf1806ad-5a4f-4f7d-a5b2-07fdb59a7adb/WH03_DDT36.exe. However there’s a lot more about driver quality etc. in the other slides which you can find here: http://www.microsoft.com/whdc/winhec/pres03.mspx
Thanks for the info, but it would be nice if this could be built into OCA. Instead of "A Device Driver" tell me the filename, then let me work the rest out for myself.
I remember hitting one of these myself and after a few days I got a piece of email telling me it was my video driver.
Edward, Andreas:
I can vouch for the WinDbg utility. I used it at work after I foolishly volunteered to find out why certain computers (about 200 on the network) got the blue screen while generating MS Access reports via a VB 6 application.
Luckily, I had kept all my Windows dump files over the last 3 months, then analyzed a few of them after loading all the symbols, and bingo, it told me what DLL and what function was responsible for the crash.
I then blamed a video driver (I was half right, half wrong). The driver was old, and wasn’t robust enough to handle bad programming. The function in the driver had something to do with generating a GDI bitmap. So MS Access I thought was to blame. As it turned out, someone (no names) was using some stupid threading library to display the progress bar!! Yes, it was the progress bar, in combination with an old driver.
Moral of the story, BSODs can be debugged and it doesn’t take *that* much effort. Google some more if you need help.
Sorry if I’ve gone off topic.
Andreas: You said "In _theory_ you should be able to create, for example an e-mail service, running in Interix or OS/2 subsystem, and it should not make a difference for it if CSRSS.EXE is alive or not."
In this case (as in many others), "in theory" means "not really".
How would this email service communicate with the OS/2 subsystem? Via CSRSS.EXE, that’s how.
However, I do agree with the basics of what you’re saying. As a geek, I’ve always thought it would be cool if Microsoft made the NT OS available separately from Windows, with a "subsystem developers kit" of some kind. The marketing dweebs obviously thought differently.
On a slightly different (but related) topic: I’ve recently noticed an interesting behavioral difference between the NT and Linux kernels. In NT, *any* unhandled kernel exception causes a BSOD, even when running in "process context". In Linux, the kernel attempts to kill the offending process. Obviously, this is not always possible (an exception in an interrupt handler is always a Bad Thing), but it often *is* possible.
I think this is one reason Linux is considered more stable than NT. If some daemon hits some corner case down in a kernel API, the daemon is killed, the OS continues to operate, and the administrator restarts the daemon. If the same thing happens in NT, the system BSODs and must be rebooted.
I totally understand the philosophy behind NT’s behavior: kernel-mode is trusted, if there’s an unhandled exception, then trust is "broken", the system is considered compromised, and *wham*, BSOD. But I wonder if it would be possible for the NT kernel to be just a little more "forgiving".
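The difference described in this comment can be written down as a toy decision table. This is only a model of the two policies as the comment states them, not actual kernel code; the names Context, nt_policy and linux_policy and the action strings are invented for illustration.

```python
from enum import Enum


class Context(Enum):
    """Where the unhandled kernel exception happened (toy model)."""
    PROCESS = "process context"
    INTERRUPT = "interrupt handler"


def nt_policy(context):
    """NT as described above: any unhandled kernel exception stops the system."""
    return "bugcheck"


def linux_policy(context):
    """Linux as described above: kill the offending process when that is
    possible; an exception in an interrupt handler has no process to kill."""
    if context is Context.PROCESS:
        return "kill process"
    return "panic"


if __name__ == "__main__":
    for ctx in Context:
        print(ctx.value, "-> NT:", nt_policy(ctx), "| Linux:", linux_policy(ctx))
```

In this model the two policies differ only in the process-context row, which is exactly the case the comment is asking NT to be more "forgiving" about.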
keithmo wrote:
"How would this email service communicate with the OS/2 subsystem? Via CSRSS.EXE, that’s how."
I can’t immediately see why you have to go via CSRSS for this. All the environment subsystems are isolated from each other. The communication primitives, such as events, pipes, mutexes etc. are implemented in ntdll.dll. Where does the need for having CSRSS in the middle come from?
Yeah, it would be really cool to see the Windows Subsystem SDK :) No need to get C64 emulators when you instead can create a C64 subsystem :) Maybe marketing could use that as an argument for why the microkernel-architecture of Windows is soo much cooler than the architecture of Linux? :D
More fault-tolerance would be good, but it seems like a very hard job to implement it correctly. The heuristics would have to be really advanced for the system to continue to operate instead of BSODing.
However, there is already a little of this in Windows XP. Or maybe this is a different issue? Sometimes when a graphics driver gets unresponsive Windows decides to shift to the standard VGA driver and tell the user to clean up the box and restart. Would be nice if it handled my crappy TV-card driver the same way :)
3/9/2004 3:34 AM keithmo [exmsft]:
> I think this is one reason Linux is
> considered more stable than NT.
Maybe that depends on who is doing the considering. Yes in that particular way Linux has that particular stability which NT doesn’t have. But overall, counting all kinds of crashes and hangs from all kinds of reasons, in my experience NT4 SP3 still beat Linux.
I’ve often written in public forums that there are two essential differences between Windows and Linux. (1) With Linux, you DO get what you paid for, except if you paid for it. (2) With Linux, when the code needs fixing, you DO have a snowball’s chance in hell of getting it fixed.
Of course Linux frequently improves (some steps forward, some steps back, but frequently catching up and improving). So maybe someday it will beat NT4 SP3. Of course if you were to compare Linux to NT4 SP4, or W2000, or XP, or W2003, there would be no contest, at the kernel level.
What has happened HAPPENED, what is happening now is happening.
Stop it from happening by changing things in Longhorn.
Make the OS smarter than the WHQL test at detecting the errors.
That way the user can see if the hard earned money bought them a Legit Video card or if they were duped.
You can do that today: Run the video driver under the Driver Verifier. Of course, all the extra checking is not free. Your computer will run a lot slower. But if you want it, you can do it.
Hoping that no one has mentioned this yet, but why not publish the benchmark from the WHQL test for a given driver? Then the review websites & consumers would be able to notice a WHQL cheat…
I’m not sure what these people are thinking.