Date: December 1, 2005 / year-entry #369
Tags: other
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20051201-09/?p=33133
Comments: 56
Summary: Of all the things I did for Windows XP, if I had to choose the one feature that I'm most proud of, it's fixing Pinball so it doesn't consume 100% CPU. The program was originally written for Windows 95 and had a render loop that simply painted frames as fast as possible. In the checked build, you...
Of all the things I did for Windows XP, if I had to choose the one feature that I'm most proud of, it's fixing Pinball so it doesn't consume 100% CPU. The program was originally written for Windows 95 and had a render loop that simply painted frames as fast as possible. In the checked build, you could tell the program to display the number of frames per second. They reserved room for two digits of FPS. When I got to looking at Pinball's CPU usage, I built the checked version and took a peek at the frame rate. Imagine my surprise when I saw that Pinball's frame rate on contemporary hardware was over one million frames per second. I added a limiter that capped the frame rate to 120 frames per second. This was enough to drop the CPU usage from 100% to 1%. Now you can play Pinball while waiting for your document to print without noticeably impacting printing speed.
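For illustration only (this is not the actual Pinball source, and RenderOneFrame is a made-up stand-in for the game's real drawing code), a cap of the kind described might look roughly like this:

    #include <windows.h>

    void RenderOneFrame();   // hypothetical: the game's existing drawing code

    void RenderLoop()
    {
        const DWORD frameIntervalMs = 1000 / 120;   // roughly 8 ms per frame at 120 fps

        for (;;) {
            DWORD frameStart = GetTickCount();

            RenderOneFrame();

            // If the frame finished early, sleep away the remainder of the
            // interval instead of immediately painting the next frame.
            DWORD elapsed = GetTickCount() - frameStart;
            if (elapsed < frameIntervalMs)
                Sleep(frameIntervalMs - elapsed);
        }
    }

Sleeping for the unused part of each interval is what turns 100% CPU into roughly 1%: the process is idle for most of every 8 ms frame instead of painting continuously.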
Comments (56)
Comments are closed.
Yay pinball works
now if only driver installs/updates/uninstalls were as simple.
For your next trick, could you make the full screen option on pinball truly full screen? :)
Do you think someone could issue a patch for Age of Mythology so it doesn’t gobble up CPU time even when the user has switched away on XP using fast user switching?
It drives me mad when my son leaves it running on his account on a slowish machine around here.
I’m sure a word from Raymond could get the games group to "pay their taxes" ;-)
Printing is CPU-bound? I would have thought that sending data out the parallel port (or USB port, or network, depending on the printer type) wouldn’t cause much of a burden on the CPU — certainly not enough to bind it to the CPU’s speed, to the point where Pinball would slow it down.
(Of course, for the parallel port, it would depend on the port type. ECP would have almost no CPU burden at all, since it uses DMA. EPP/SPP might, but only because each byte has to be sent out using a separate I/O port write, so the inter-byte timings need to be just right.)
Oh, wait, maybe you’re talking about all the print driver/spooler/whatever code that has to run to get the document in a format that the printer can understand? If the printer supported GDI somehow, directly (similar to a PostScript capable printer under Unix), then most of the translation code wouldn’t need to run anymore.
But AFAIK there’s no "wire protocol" for GDI, like there is for PostScript (just write the ASCII-encoded PS characters to the device, and let it interpret it), so that would make it hard. Hmm…
That is nice. However, one thing worth looking into is reducing the damage that a 100%-CPU-using application causes to the system. A method that works and is used by almost every OS except Windows is penalizing the priority of the offending thread.
NT uses a weird "randomly boost priorities" method, which is not nearly as good, for obvious reasons.
Of course you have to balance this, as the application may actually need the CPU (for example, games), and the current method gives it more smoothness. That said, I prefer the "penalizing" method. And it'd be awesome if you could limit the percentage of CPU time a process can use, regardless of whether the CPU would be idle otherwise (forceful suspension). I know you can use external applications to do this, but…
By the way, why do games using VSync (with the framerate capped to the monitor's refresh rate) always use 100% CPU? And it's kernel time; it seems like the driver spinlocks waiting to update.
Nobody — probably because there’s no such thing as an interrupt to tell the OS when the vblank is happening. (Or at least, if it is there for certain video cards, it isn’t being used by the video driver for whatever reason.) So there’s no way for the kernel driver to say "wake me up when a vblank is about to happen", so there’s nothing it can do except spin while monitoring the card or monitor.
(And even if an interrupt did happen, it may be preempted by other interrupts in the system, or the Windows equivalent of Linux’s "bottom half"/"tasklet" (piece of code that runs outside interrupt context, but is triggered by the interrupt handler code) might get delayed, so there’s no guarantee that the driver could reliably wait until the next vblank even with the interrupt capability.)
Yes, this is crappy, but I don’t know of any really reliable way of waiting for an event whose duration is as short as the vblank…
Nobody, games use 100% CPU because an infinite rendering loop is the simplest way to achieve a high framerate. Typically, a new game will be designed to stress the hardware, which means it needs maximum performance. If you know you need max performance, there’s no point in introducing an artificial limit that won’t be hit for two or three years (when sufficiently fast hardware is available), because 1) game companies don’t care about two or three years away; old games aren’t their money maker, and 2) If you’re in full-screen mode, why do you really care if it uses 100% CPU or not? In theory, you’re not doing anything else anyway.
Tony, just turn off fast user switching. Then he’ll either log off properly (and it’ll close his game), or he won’t log off at all, and you can easily close it for him.
Hmm, apparently I should have thought about VSYNC a little more before I replied. Then I wouldn’t have referred to the max framerate as an "artificial limit". No matter. I think my previous comment still applies. If you don’t expect to get enough frames per second on the newest hardware to max out the vsync, there’s no point in trying to limit CPU to when you can update.
So why would you limit the frames to 120 instead of the unofficial frame rate standard of 60fps?
I know I'm splitting hairs, but aren't you technically doubling the used CPU (ok, less than half a percent, I'm talking principles here).
Does the ball move so fast that it’s possible to see a minor difference between 60 and 120?
BryanK — Raymond likely just used printing as an example of something that pinball could interfere with.
"If the printer supported GDI somehow…" "But AFAIK there’s no "wire protocol" for GDI, like there is for PostScript…"
As far as I know, XPS (XML Paper Specification) is quite like this.
Raymond,
Pinball hasn’t been included in beta 1 and later builds of Windows Vista. Since it was originally developed by Maxis, there was speculation that Microsoft’s license to redistribute it had expired. Any official word on this?
I expect that you probably either a) don’t know or b) aren’t allowed to say, but on the chance that you do know and can say, that would be great. :-)
IIRC, back in EGA/VGA days there used to be an IRQ wired to the vertical blanking signal…
And WMF is somewhat like a GDI protocol, although it doesn’t support all operations.
Correct me if I’m wrong, but I believe tearing is more pronounced when FPS is less than your monitor’s refresh rate due to more movement between frames. In the absence of vsync, higher FPS is better for those of us with CRTs (70Hz+).
I think most points are invalid :)
Reasons:
1. Yielding CPU guarantees you NOTHING. Any other thread may start execution at any given time. You are just wasting time.
2. Example of something that has to be synched to the hardware, and is done correctly with a high priority system thread: audio.
3. Some modern games (DooM 3 and derivatives for example) which are capped at 60 fps will release CPU if they don’t need it – and they run just fine. Unless you use vsync, which will lock them at 100% CPU usage.
This is a problem with the driver model. If the card is busy (either waiting for vsync or waiting for the render to complete (GPU bound, not CPU bound)), the driver will eat 100% CPU. Read:
http://www.virtualdub.org/blog/pivot/entry.php?id=74#body
It could do sleep(1)s, for example. At 60 fps, that wouldn’t cause more than an 8% performance hit, but benchmarks are more important than real world usage.
That said, Travis Owens, given the right screen, I can easily tell the difference between 60 and 100 fps, so I’m glad it’s capped at 120, which is quite reasonable.
Pinball was even more fun when using Terminal Server. If you ran it over an RDP connection, then display updates stopped. That made it impossible to resolve the situation. IIRC even doing anything on the console was really hard.
A bit off topic but you did mention Pinball ;-)
Anyone know why the old Windows NT driver for HPFS (OS/2 partitions) was called pinball.sys?
Can't you make Explorer not consume 100% CPU when clicking on an .avi file?
http://miataru.computing.net/windowsxp/wwwboard/forum/14632.html
Regarding printing being CPU-bound – remember that cheap printers are really popular these days, and they are cheap precisely because they offload the heavy-duty work like rasterization to the software driver. Like WinModems, only they’re WinPrinters. (personally I only buy "real" printers that can interpret PostScript on-board, but I think I’m in the minority on this :)
Regarding games still eating 100% CPU even when GPU-bound – this is probably a graphics driver issue. I see no reason why an OpenGL implementation couldn’t sleep waiting for the buffer swap. Perhaps video driver companies have just been careless because typical users don’t complain about this. (another possibility might be audio mixing; a lot of games spin off a separate thread for handling audio).
I’m interested to know what method you used to successfully yield CPU time but guarantee you received another timeslice in under 10ms.
All the methods I've tried for yielding in games work most of the time, but a couple of times a second or so the system will sleep for much longer, say 10ms, and not give enough time for a game loop to complete at the required 60fps. Only Sleep(0) has proved reliable in terms of time, but you've pointed out the fallacy of believing Sleep(0) is much use in the past, and it certainly doesn't help reduce power consumption on mobile or other downclocking CPUs.
I’ve tried to use a timer object after you suggested this method, and while it was very stable it was unable to wait for the amounts of time I specified – I concluded that the timer resolution was too low to be useful. Of course, I could just be doing it completely wrong…
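For reference, a waitable-timer version of that kind of loop might look roughly like the sketch below (RenderOneFrame is a made-up stand-in). Note that by default the timer still only fires on the system timer interrupt, so without raising the timer resolution the effective granularity can be 10-15 ms, which would match the longer-than-requested waits described above:

    #include <windows.h>

    void RenderOneFrame();   // hypothetical: one pass of the game loop

    void TimerPacedLoop()
    {
        HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);   // auto-reset timer
        LARGE_INTEGER due;
        due.QuadPart = -1;   // relative due time in 100-ns units: fire almost immediately
        SetWaitableTimer(timer, &due, 16, NULL, NULL, FALSE);    // then roughly every 16 ms (~60 fps)

        for (;;) {
            WaitForSingleObject(timer, INFINITE);   // blocks without spinning until the next period
            RenderOneFrame();
        }
    }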
Pinball was the code name for HPFS — I suppose the name stuck long enough to make it into driver history as pinball.sys
HPFS of course was the high performance file system we wrote for OS/2
Maybe you can fix that one too:
– You hold up a flipper to stop the ball
– when the ball is in flipper-range you decide not to stop it
– let go and immediately flip again
–> ball goes right through the flipper.
This bug has been annoying me for years (was already there in Win95).
Otherwise I really like pinball, it almost acts like the real thing.
Nobody:
1. I’m assuming that’s supposed to say "Not yielding guarantees you nothing." True, but if you’re holding onto the CPU, you’re more likely to be there when you need to be. Standing in front of a store doesn’t guarantee you’ll be the first customer in when it opens, but it increases your chances.
2. Audio's not really the same. Not at all. An audio player queues up audio in advance and hands it off to the OS. A game cannot queue frames in advance because 1) it needs to write into a buffer that's already locked, 2) it can't render fast enough for that (newer games, anyway), and 3) it doesn't typically know what the frame will look like until it gets there. Audio uses a buffer. Video cannot. Slight delays in game feedback are unacceptable.
3. A cap of 60 Hz seems ridiculous to me. My old monitor ran at 85 Hz, and I imagine the game would look better if it matched the refresh rate. As for why it wastes cycles trying to lock onto the buffer? If there’s no reliable way to get notified when it becomes available, a spin lock is the appropriate way to do it.
Capping 3d games in general seems silly to me. I expect one to use 100% CPU usage. I’m in a full screen game, and if it’s new, it’s taxing the hardware anyway. I don’t really see the need to bother capping it at all.
PatriotB — I’ll have to start looking into XPS; it sounds like it could be a fairly decent spec. Thanks!
josh — Yeah, I didn’t think of WMF. You’re right, that might be an option for some operations at least. (But without support for everything, it starts to get a bit unsavory…)
Nobody: "This is a problem with the driver model." — That would not surprise me in the least. When you make writing a driver as complicated as possible (COM, interfaces, etc.), you make it harder for people to write "obviously-correct" drivers. And then bad stuff happens.
Dan Maas — agreed, el-cheapo printers are a problem. (This isn’t the only one by any stretch, but when Lexmark makes a Z15 model that costs about $20, what should I expect? Yikes.) I would love to be able to use PS capable printers at home (mostly because I use Linux at home, and everything prints to PS format — then I could just pipe it directly to the printer). My one printer, however, is a LaserJet 1100 that speaks PCL4 or 5, not PS.
But at least PCL is documented somewhere, and people have written PCL backends for Ghostscript. (But note that when I’m writing directly from GS to the printer, my CPU is still down in the 1% range.)
IIRC only a certain IBM manufactured EGA/VGA card had an interrupt dedicated to VBLANK.
If there was room for only two digits of FPS, how could you tell whether it was rendering a million frames per second, or even 120 for that matter?
eruprahgp,
"When I got to looking at Pinball’s CPU usage, I built the checked version and took a peek at the frame rate. Imagine my surprise when I saw that Pinball’s frame rate on contemporary hardware was over one million frames per second."
Steve
"The program was originally written for Windows 95 and had a render loop that simply painted frames as fast as possible. In the checked build, you could tell the program to display the number of frames per second. They reserved room for two digits of FPS. "
Perhaps Raymond noticed that the two-digit framerate counter was ‘reporting’ a framerate of "00" and decided to increase the number of characters reserved for it? After all, if he can modify it to yield CPU, he could surely have modified it to display more characters :)
Also, on the vsync issue. The amount of time for a monitor to do a vsync is on the order of microseconds. That means you have a VERY short amount of time to actually swap your back/front buffers in order to get it done before the vsync finishes. If you used an interrupt to notify the application that the vsync was happening, it would hardly have enough time to context-switch to the process, let alone actually do the buffer swap. That's why you need to use a spin loop – to ensure the right process is already running when the vsync happens.
eruprahgp, I leave it to you as a puzzle how I was able to figure out the actual frame rate.
A little interesting info on using vsync with directx I read a while back.
http://www.virtualdub.org/blog/pivot/entry.php?id=74
Yeah Raymond. I’ve loved that game since day one. So smooth.
thanks.
So does that mean the Pinball game uses frame based animation?
I don’t think sleep(N) is relevant to reducing CPU usage in GPU-bound programs. The graphics driver just needs to put the thread to sleep waiting for an interrupt that signals fresh space in the graphics command buffer. From the application’s point of view, it would just block in glSwapBuffers().
NVIDIA cards definitely send an interrupt for each refresh; I’ve seen them on interrupt profiles of a Linux system (just idling in X).
Since the graphics command transfer is DMA based I also expect there are interrupts to signal completion of command buffers (just like ethernet and SCSI cards that use bus-master DMA).
NVIDIA recently added an OpenGL extension that allows fine-grained synchronization within the command stream. There are functions that insert markers into the command buffer and sleep until execution reaches a specific marker.
Fact nugget :o)
On his homepage, http://homepages.borland.com/dthorpe/products.html , Danny Thorpe wrote that he was the original author of Pinball, having written it in Delphi sometime before 1995.
And when he got wind that Microsoft was purchasing the game, he converted it from Delphi to C++.
I’d like to point out that Raymond is most proud of his PERFORMANCE work :)
*laugh*
>> Can't you make Explorer not consume 100%
>> CPU when clicking on an .avi file?
Oh yeah. But the REALLY pathetic thing is:
You select an .AVI file. It builds a preview, even if there is no need for it (just to show duration or video dimensions on the status line). A small spike of CPU, but bearable. (Depending on file length, it might be one second or more.)
Now, another application starts to write to ANOTHER FILE on the same folder. For example, a download. EVERY WRITE on ANY FILE of that folder will re-trigger the preview.
So, welcome to 100% CPU usage for no apparent reason, even if the window is not even visible. Heck, when focusing it again it will regenerate preview, so just hitting ALT+TAB again and again will keep it at 100% CPU.
That is if you are lucky. If the file is incomplete, see quoted post. If it is corrupt, cross your fingers, chances of a segfault are very high.
And if you try to move/delete the file while the preview is generating, the F**KIN’ shell will LIE to you and tell you that ANOTHER app is locking the file. Perfect, just perfect.
No wonder the thing that makes R.C. most proud is this: it might very well be the only thing that is not insanely, hopelessly broken in XP.
I guess this doesn’t happen with .WMV files. Somebody should sue Microsoft for that. :D
And the limiter is 120 fps? Then why does the game run at 100 fps? Ah yes! Windows XP still uses a 100Hz clock interrupt frequency, right? What about changing that to 1KHz, like Linux did some time ago? Oh, can somebody with an SMP machine check if it's 66 fps for them? I think the HAL used 15 ms for the interrupt on SMP machines; but as always with Windows, the constant is buried deep in some binary file, to give the user zero options.
hpfs was called pinball.sys because pinball was the code name of the project that resulted in the filesystem known as pinball (more-or-less).
There were a number of "-ball" code names in Lan Manager, pinball, football, winball, etc.
HPFS was IBM’s name for the filesystem that was implemented in the pinball project.
Sleep()’s precision can be improved by calling timeBeginPeriod(1), but that globally taxes the system with interrupt overhead. Also, in a 3D program, it’s hard to gauge whether you’re going to block due to the command buffer becoming full, and there’s no way to tell the driver not to spin. The good news is that I think Vista is going to force a resolution on this issue, given the much broader use of the GPU.
Programs unnecessarily using 100% of the CPU is a pet peeve of mine, since a program written for a Pentium doesn’t need the full power of a Pentium 4, and it causes my laptop to heat up and turn all its fans on. I usually force the CPU to low speed, and then patch the import for PeekMessage() and insert a Sleep(). :)
Derek:
Yeah, that’s what I meant.
You don't need "to be there" for VSYNC to work. If the HW doesn't have an interrupt, it has a special register to change the screen buffer on vertical blank. This is the only correct way to do this. If you didn't do it this way, you could see tearing if you came too late, or have to wait again (and that would have the potential to repeat itself forever).
The spinlock is to start rendering the next frame. The front buffer is displayed. The back buffer is waiting to be displayed, switched with the front buffer. And the application is waiting until the swap happens, so it has a buffer to draw to.
Version "B" is GPU-bound games, not vsynched. If an app is CPU bound, kernel (driver) CPU time is about 10%. If GPU bound, kernel CPU time becomes 50%. The application issues a request, the GPU is busy and the command queue full, so the driver spinlocks until the request can go through, instead of releasing the CPU gracefully.
VSync is a special case of "GPU bound": GPU causes delay while waiting for vblank.
On GPU-bound games, you can't really release the CPU unless the command buffer is huge, because it has the potential to fill several times in a single frame, and doing sleep(1)s would cause great harm.
But consider vsync at 60 hz: doing sleep(1)s, you would lose at most 1 ms per frame. At a maximum of 60fps, that’s 60 ms per second. I think that is very reasonable, 6% max performance impact, much nicer multitasking.
60 fps caps are crap, but it'd appear that "id software" felt lazy and didn't want to do things the good, parametric way. Fixed ticrates solve a lot of problems, but they inherently cap the framerate *AND* waste processing power if the framerate falls below that, because multiple tics have to be calculated per frame. But if you can sustain 60fps, it's very nice that the CPU usage falls below 100%. If it took 4 ms to render a frame, the game does a sleep(12) and everybody is very happy.
The "consume 100% CPU to run a bit faster" is a very bad policy. Even in fullscreen mode, if you leave CPU time, you could do things like encoding in the background. Dual core CPUs are a crappy "patch" to this: if the game used two threads which never sleep or lock, you’d be in the same situation again.
Finally, the only reasonable reason to go beyond the monitor’s refresh rate is to reduce controller latency. If you DON’T sleep between capturing the input and drawing the image (like a lot of games do, and like vsync forces you inherently to), you wouldn’t have to do that. 125 fps is a reasonable "true" ceiling: that’s the USB HID capture rate, you’d render frames that had no input between them, and most people would lose perception at around 90 fps anyway.
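A rough sketch of the fixed-tick-rate arrangement mentioned above, assuming a 60 Hz tick (SimulateOneTick and RenderOneFrame are made-up names): run as many fixed-size simulation steps as needed to catch up, render once, then sleep away whatever is left of the budget.

    #include <windows.h>

    void SimulateOneTick();   // hypothetical: advance the game state by exactly 1/60 s
    void RenderOneFrame();    // hypothetical: draw the current state

    void FixedTickLoop()
    {
        const DWORD tickMs = 1000 / 60;       // ~16 ms budget per tick
        DWORD nextTick = GetTickCount();

        for (;;) {
            // If rendering fell behind, several tics get computed for one frame.
            while ((int)(GetTickCount() - nextTick) >= 0) {
                SimulateOneTick();
                nextTick += tickMs;
            }

            RenderOneFrame();

            // Finished early (say 4 ms of work)? Sleep the remaining ~12 ms
            // instead of spinning until the next tick is due.
            int remainingMs = (int)(nextTick - GetTickCount());
            if (remainingMs > 0)
                Sleep(remainingMs);
        }
    }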
A lot of cheap printers basically implement a proprietary control protocol and a proprietary compression algorithm and have the software drivers rasterize everything into a raster image, compress it with the proprietary compression algorithm, and send it (in bits so it can fit into the memory of the printer) to be printed.
Derek: "http://xp.c2.com/DoTheSimplestThingThatCouldPossiblyWork.html"
That's a pretty flawed argument. Following that logic, the best way to wait for anything is just to spin-lock.
Being someone who writes tools for games as a living, there is nothing worse than a rendering engine running full tilt. After 2 1/2 years, I am still fighting that flawed mentality.
Derek said:
"Capping 3d games in general seems silly to me. I expect one to use 100% CPU usage. I’m in a full screen game, and if it’s new, it’s taxing the hardware anyway. I don’t really see the need to bother capping it at all."
This is a HORRIBLE mindset because you're now wasting power. Intel and Microsoft have both proven power conservation is important (for your wallet, the environment, and the noise of your system) and both companies' upcoming products take this seriously.
This issue isn't limited to just laptops (where it's obvious what happens if you burn 100% CPU), and don't fall into the mindset of "laptops shouldn't play games" either; this applies to desktops just as much as laptops. Just because my desktop has access to an unlimited supply of power doesn't mean I want it to use (waste) it. Unless you like the idea of owning a 600watt PSU, which is where desktops are going nowadays, and it's not a good thing.
Computers need to use less power, not more, watts are the lazy man’s answer to faster.
Jonathan Wilson said:
"A lot of cheap printers … rasterize everything … and send it to the printer"
Aren't a lot of mid-level printers doing this too? It sure seems like it on the <insert a popular office printer company> color laser printers I've used (both mid-level and high-level models).
I've noticed when I flatten an image, images with lots of blank data (lots of white) send significantly faster than an image full of colors, since IIRC the image compression favors lots of blank/repeatable areas.
Andy, that’s not at all what I’m saying. While waiting in a spin lock is pretty much the simplest thing, it doesn’t always work. It doesn’t work for Mozilla to spin lock while waiting for a connection to this blog. It would drive the user insane. A game, on the other hand, can spin lock waiting to update with VSYNC, because 1) the typical user isn’t doing anything else with their computer then anyway, and 2) there’s a definite (small) maximum amount of time before the next VSYNC event occurs.
The Simplest Thing That Could Possibly Work is a great rule, but you have to look at the spirit of the rule, not the most literal interpretation.
Travis, I’m advocating it because modern games typically push the hardware to its max anyway. I see little reason to bother with a cap that may not be hit for at least a year (or more). It’s more code that now must be tested, now must be supported, etc. It’s not economical to bother with it, especially since most customers don’t seem to be asking for it.
If you’re writing a game that you expect to hit 150 fps on typical hardware, sure, put in a cap. But who’s writing games like that?
Nobody, if you’re not constantly waiting, how are you going to know when that register changes? Yeah, you can look when your thread wakes up, but it’s potentially faster to just spin. (If there’s an interrupt, though, the driver should probably use it instead of spinning.)
`But consider vsync at 60 hz: doing sleep(1)s, you would lose at most 1 ms per frame.` No. You lose, at a *minimum*, 1 ms per frame, and probably much more. You can't expect the CPU to wake you up in exactly 1 ms. It's extremely unlikely to happen.
`The "consume 100% CPU to run a bit faster" is a very bad policy.` It’s the simplest thing. There’s little compelling reason for a full-screen game to bother trying to share the CPU. (I don’t personally see the need to encode while playing a game.)
http://xp.c2.com/DoTheSimplestThingThatCouldPossiblyWork.html
`Dual core CPUs are a crappy "patch" to this: if the game used two threads which never sleep or lock, you’d be in the same situation again.`
I can’t imagine why a typical rendering engine would use both threads, unless it was CPU bound, and somehow resulted in a better framerate if it split processing. Audio, file i/o, etc. have no need to be infinite loops.
Phaeron, will that actually go through? Or will it return ERROR_NOCANDO? And of course, how bad is the penalty on that?
Also, why doesn’t this thing ever freaking remember me, even when I check the box?
Derek: timeBeginPeriod(1) works for me on XP, although in practice you would want to call timeGetDevCaps() to retrieve the actual supported minimum. As for the overhead, I don’t have numbers, but I suspect it’s fairly minimal now. It used to be noticeable in the days of Windows 95 — I could see a significant difference in WinTop that went away when I reset the timer rate from a DOS prompt.
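For reference, the usual pattern is to query the minimum supported period, raise the resolution only around the timing-sensitive work, and then restore it. A minimal sketch (link with winmm.lib):

    #include <windows.h>
    #include <mmsystem.h>   // timeGetDevCaps, timeBeginPeriod, timeEndPeriod

    void TimingSensitiveWork()
    {
        TIMECAPS tc;
        if (timeGetDevCaps(&tc, sizeof(tc)) == TIMERR_NOERROR) {
            timeBeginPeriod(tc.wPeriodMin);    // typically 1 ms

            Sleep(1);   // now wakes after roughly 1 ms instead of the default 10-15 ms
            // ... timing-sensitive loop goes here ...

            timeEndPeriod(tc.wPeriodMin);      // always paired with timeBeginPeriod
        }
    }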
Derek: No, no, no :)
Ok, so the game might be "hardware pushing" right now. Will it be in two years' time? Heck, on most occasions you *HAVE* to put in a cap, otherwise the game will behave incorrectly on future hardware. The boxed version of Quake 2 runs at over 1000fps here, and it DOES NOT WORK. You can't move; it's hitting some numeric precision limit.
Newer games usually come with a reasonable cap. For example, in Half-Life 2 it’s 300fps. In HL1/CS it was 72 fps (you can increase it to 100), and most machines now will run solid at that framerate. A lot of people play it still, and they are suffering your mentality.
Moreover:
1. If you release the CPU on a "right" moment, when you have nothing to do, background apps/services that have to do very little work will do it when they don’t "annoy", instead of when you were doing useful work.
2. Again, in the general case, there's not much point in going over the monitor's refresh rate. With top-of-the-line hardware, even the latest games can run at ~70fps most of the time, so a cap is relevant even today.
3. It's better if the game runs consistently than at varying speeds. Think about console games: all of them are VSynched, some at 60 and some at 30 Hz. Obviously not all scenes require the same amount of work, so just think how much processing power they are wasting most of the time. Except they aren't wasting it: that's the best option for the work at hand.
100% CPU usage makes sense for data processing (such as video encoding, offline rendering, compiling if you aren't IO-bound, that kind of thing). For real time stuff (audio, video, videogames) what you want is a picture every 16 or 33 ms, so to speak. Not all will take the same time, and obviously the CPU will be below 100%. That's the right way. Don't think "ooh, it's cool it uses the hardware to its fullest". It doesn't. Either the graphics card or the CPU is waiting for the other. And if you get 500fps, you'll just get tearing, and cause other problems (too high C->S traffic in some games, which is why Quake3 and UT2004 are capped at around 90fps).
Just think HDTV video clips: my media player uses 80% CPU and it still runs "just fine". Likewise, 60fps clips: they play smooth, and the player doesn't have to hog the CPU. Decoding a huge video frame can take as much CPU time as the CPU processing of a game does. With your mentality, it'd be OK for video players to use 100% CPU, since "most people don't do other things when they're watching movies".
I don't know if I'm "normal", but not only do I wish games left CPU power to do "offline stuff" (as mentioned above), but a lot of games (in particular multiplayer ones) are prone to be run windowed and left unattended. Quake 4 does that right: if you leave it in the background, it caps itself to 8fps, and only needs 3% CPU time.
Think about it :)
Game companies aren’t making their money selling two year old games. It’s hard to see a good reason for them to bother with something expected to be needed in two years. But, I guess if they are putting the cap in, it must be more requested than I would expect.
However, I don’t think that you *have* to put a cap in older games. Pinball was running at over a million fps just fine. I don’t know what’s wrong with Quake, but that sounds like a bug.
1. Eh. I’m not sure that’s necessarily the case. If the background stuff has little enough to do, it shouldn’t matter much if it grabs the processor during rendering. On the other hand, if it’s not a small amount of processing, it might be problematic even if it takes over between rendering frames.
2. I don’t know anyone currently running top-of-the-line hardware. Most people aren’t. And most people’s hardware is getting quite stressed by new games, I’m pretty sure.
3. You make it sound like VSYNC guarantees a new frame every time it refreshes. That’s just not the case. If the framerate drops to 10 fps, that’s just all you’re going to get.
I do think it would probably be okay for full-screen video players to use 100% CPU. But there’s no point in it. Again, you’re comparing a buffered program to one which doesn’t buffer. It’s not a fair comparison. It would be more complex for a video player to use 100% CPU (what’s it doing when it finishes filling the buffer?), whereas a game using 100% CPU is simpler than one which is capped.
As for capping when not the front window, I’ll agree that’s useful. (It’s also nice that most games started handling alt-tabbing without crashing, finally.)
I'm not saying it's bad to put a cap on full screen games. I just don't think it's necessary, or can always be considered a good expenditure of resources. (Putting the game in the background is a really good example of a time when it's appropriate to cap the framerate, though.)
I do agree that drivers probably shouldn’t spinlock waiting for vsync if they can catch an interrupt notification instead.
Noninteractive video playback isn't comparable with interactive rendering. Video playback can buffer several frames in advance, but with interactive rendering it's worth striving for the least latency from input to output. With triple buffering that's possible without tearing; it's also better to always render frames than give timeslices away to other apps. When I play games I want the minimum latency possible between input & output; other running tasks should not be prioritized.
But guys, most games today are GPU bound (at least at resolutions like 1920×1200 and higher). There should be "plenty" (20% or more in most cases) CPU left…?
Derek, the problem with the "The CPU isn’t doing anything else" mentality is that it doesn’t play well with systems like laptops that could be using that time to conserve power and so on. Playing nice with the rest of the system is one of Raymond’s taxes and games developers really ought to be as aware of them as anyone else is.
I just built an app to measure sleep() accuracy, doing sleep(1)s and averaging large sets of measurements; also running at real-time (or high, same results) priority and only outputting every 1000 measurements or so to avoid having the printing affect the results.
On a Pentium-M, at any frequency, sleep(1) takes 10 milliseconds; that's the interrupt running at 100 Hz. When you open some application that uses timeBeginPeriod(1) (such as a media player), it rises to 1000 Hz. This is what I expected.
On a (desktop) Pentium 4, things are different. I tested this on a non-HT 2.53 GHz and an HT 3 GHz machine, and surprisingly got the same results. About 64 Hz normally, 512 Hz when the media player is open. Very strange.
Also, Sysinternals' Process Explorer shows "context switches" statistics. For my test process, they matched what I said. For the fake "Interrupts" process, on the P-M it hovers around 100-200 normally and 1050-1150 when the media player is open, which matches what I expected.
On the P-4s, the machines were more interrupt-busy, but they were at about 250 normally and rose over 1200 when the media player was open, which doesn’t make sense since I expected them to grow only 512-64=448.
Interesting stuff:
* Most media players set timeBeginPeriod(1) when starting to play, and won’t release it until they are closed.
* VirtualDub doesn’t set it at all.
* Flash player browser plugin (at least on Firefox) sets it when a flash object is used on a page, and releases it when you change to another page or close that tab/window. Very good :)
* Quicktime plugin sets it when it’s first loaded and never releases it, until you close the browser :( (somehow this doesn’t surprise me)
* Viewing animated GIF files doesn’t activate it (I think they only had 10ms delay accuracy, so that’s OK)
There's no significant performance degradation from using a 1000 Hz timer; I measured this with an app that just counts the time it takes to loop a few million times. The services that are loaded at startup affect the result much more than the 1000 Hz timer does. Of course, if there's a stupid application using sleep(1)s as a sync primitive, that'd increase its CPU usage 10-fold :D
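A minimal version of that kind of measurement might look like the sketch below (using QueryPerformanceCounter for timing; this is not the commenter's actual test program):

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        // Raise priority so other work perturbs the measurement as little as possible.
        SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);

        LARGE_INTEGER freq, before, after;
        QueryPerformanceFrequency(&freq);

        const int samples = 1000;
        double totalMs = 0.0;
        for (int i = 0; i < samples; i++) {
            QueryPerformanceCounter(&before);
            Sleep(1);
            QueryPerformanceCounter(&after);
            totalMs += 1000.0 * (after.QuadPart - before.QuadPart) / freq.QuadPart;
        }

        // Print only once per batch so the output itself doesn't skew the numbers.
        printf("average Sleep(1) = %.2f ms over %d samples\n", totalMs / samples, samples);
        return 0;
    }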
If you ever hope to be a great Windows programmer, take the time to read/subscribe to Raymond Chen's blog.