Date: | January 24, 2006 / year-entry #31 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20060124-17/?p=32553 |
Comments: | 52 |
Summary: | Polling kills. A program should not poll as a matter of course. Doing so can have serious consequences on system performance. It's like checking your watch every minute to see if it's 3 o'clock yet instead of just setting an alarm. First of all, polling means that a small amount of CPU time gets eaten... |
Polling kills. A program should not poll as a matter of course. Doing so can have serious consequences on system performance. It's like checking your watch every minute to see if it's 3 o'clock yet instead of just setting an alarm.

First of all, polling means that a small amount of CPU time gets eaten up at each poll even though there is nothing to do. Even if you tune your polling loop so its CPU usage is only, say, a measly one tenth of one percent, once this program is placed on a Terminal Server with 800 simultaneous connections, your 0.1% CPU has magnified into 80% CPU.

Next, the fact that a small snippet of code runs at regular intervals means that it (and all the code that leads up to it) cannot be pruned from the system's working set. They remain present just to say "Nope, nothing to do." If your polling code touches any instance data (and it almost certainly will), that's a minimum of one page's worth of memory per instance. On an x86-class machine, that's 4K times the number of copies of the program running. On that 800-user Terminal Server machine, you've just chewed up 3MB of memory, all of which is being kept hot just in case some rare event occurs.

Finally, polling has deleterious effects even for people who aren't running humongous Terminal Server machines with hundreds of users. A single laptop will suffer from polling, because it prevents the CPU from going to more power-efficient sleep states, resulting in a hotter laptop and shorter battery life.

Of course, Windows itself is hardly blame-free in this respect, but the performance team remains on the lookout for rogue polling in Windows and "politely reminds" teams they find engaging in polling that they should "strongly consider" other means of accomplishing what they're after.
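To make the contrast concrete, here is a minimal sketch of the two approaches. The worker and its WorkIsPending/DoWork helpers are hypothetical, not taken from any actual Windows component.

// Hypothetical worker, for illustration only.
#include <windows.h>

BOOL WorkIsPending();    // placeholder: however the program detects pending work
void DoWork();           // placeholder: the actual work

HANDLE g_workAvailable;  // auto-reset event, signaled by whoever produces the work

// Polling: wakes up every second just to discover there is nothing to do.
DWORD WINAPI PollingWorker(LPVOID)
{
    for (;;) {
        if (WorkIsPending()) DoWork();
        Sleep(1000);     // a timer wakeup and a working-set touch, even when idle
    }
}

// Notification: consumes no CPU at all until someone calls SetEvent(g_workAvailable).
DWORD WINAPI WaitingWorker(LPVOID)
{
    for (;;) {
        WaitForSingleObject(g_workAvailable, INFINITE);
        DoWork();
    }
}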
Comments (52)
Comments are closed. |
Polling doesn’t kill performance. People kill performance.
Take regmon for a spin and watch explorer.exe itself apparently poll tons of network tcp/ip related registry settings every few seconds on an "idle" XP SP2 laptop.
I assume this has something to do with either the firewall or wifi? Seems pretty lame on the performance front.
Even better is the "poll then screen update" trick <cough>Taskman</cough><cough>Process Explorer</cough>
I once had TS pegging an entire CPU on a quad proc because Taskman was running on the server’s desktop. That’s frugal cycle management!
Does the Taskbar clock get updated using polling?
So what is the alternative?
Especially where waiting for data to turn up in a database is concerned?
Use a notification-based mechanism, of course.
I already discussed how the clock updates. http://blogs.msdn.com/oldnewthing/archive/2003/08/29/54728.aspx Observe that it is based on notifications not polling.
jwf: Mark Russinovich noticed this in April last year: http://www.sysinternals.com/blog/2005/04/explorers-registry-polling.html. It appears to be caused by the Network Connection icon(s).
If there were a way for the database to send a message to a program, a lot of problems would go away.
My data processor service wouldn’t have to constantly poll the database for new jobs.
My database-driven web sites could use caching.
Ok, so there are only two problems. But they are both huge.
Is there another technique other than employing an extended stored procedure, in the case of SQL Server, to make a remote call to a process to notify it of updates? And, if there are potentially multiple processes interested in those updates?
I actually discovered some sample code in the MSDN help that used polling in demonstrating how to use a non-blocking API!
I’ll have to wait until I get home before I can dig up the URL, though…
Jim, you could write update/insert triggers that create some system-wide notification. I believe it is possible, on Windows, to register a window message that is then usable in all processes, and broadcast this message to the entire system.
If not, can’t event objects be allocated system-wide? And can’t you release all threads waiting on such an object at once? (Note the race condition here, though.)
As a last resort, you could make an RPC to an update server, which then distributes the notification to all registered applications. That solves the multiple application problem, but at the expense of more coding from you. (It’s a reusable mechanism, though.)
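A rough sketch of the registered-message idea described above; the message string, the broadcast helper, and the listener's window procedure are made up for illustration.

#include <windows.h>

// Both the notifier and the listeners register the same message name.
static const UINT g_msgDataChanged =
    RegisterWindowMessageW(L"Example.DataChanged");    // name is just an example

// Notifier side (e.g. a helper process kicked off by the database trigger):
void BroadcastDataChanged()
{
    // SendNotifyMessage returns without waiting for each recipient to process it.
    SendNotifyMessageW(HWND_BROADCAST, g_msgDataChanged, 0, 0);
}

// Listener side: react in the window procedure instead of polling on a timer.
LRESULT CALLBACK ListenerWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == g_msgDataChanged) {
        // Re-query the database (or invalidate a cache) here.
        return 0;
    }
    return DefWindowProcW(hwnd, msg, wParam, lParam);
}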
Jonathan wrote:
"Polling doesn’t kill performance. People kill performance."
Sledghammer has this to say:
"No, Bullets kill performance."
A whole 3MB? Unless you get Terminal Server running on a Gameboy, how can this be a problem?
Sebastian, how does the update server learn when to broadcast the systemwide message?
800 users Terminal Server? And each of the users is running Office 2000, right? What kind of hardware are these servers running? :-) Even if you have over 4GB of RAM, Windows only allows 2GB for applications by default unless your app uses AWE or PAE. Eric, not sure why your servers crash with 8 procs, but I know Unisys has a 32-proc server running Windows 2003 Server.
Registered messages are fine if your application runs on the same server as your database. But for real applications those won't work. And in a corporation with an overzealous IT team, your DB server is not allowed to make any connections to any other servers, so notifications are not an option at all.
I’ve seen 32-way Windows servers with 32GB of RAM running just fine on Windows 2000 Datacenter Server.
As for polling the database, SQL Server 2005 has some great solutions for just that problem. Have a look here:
http://www.microsoft.com/sql/prodinfo/overview/whats-new-in-sqlserver2005.mspx
Specifically, check out the section on "Query Notification." Basically, you give SQL Server a command, and it notifies you if the results of that command ever change. It's absolutely brilliant!
Jonathan Allen Said:
"If there were a way for the database to send a message to a program, a lot of problems would go away. "
There is. You've got triggers in MS SQL and, I believe, MySQL. Oracle, I understand, may also allow Java callbacks. If you're talking about MS Access though … :)
<i>800 users Terminal Server? And each of the users is running Office 2000, right? What kind of hardware are these servers running? :-) Even if you have over 4GB of RAM, Windows only allows 2GB for applications by default unless your app uses AWE or PAE.</i>
While you're right that 3 MB for 800 instances is very little compared to, say, the 8 GB of RAM required for 800 instances of a hypothetical program that takes 10 MB RAM per instance, I think you misunderstand the 2 GB limit. The 2 GB program limit is PER PROCESS. While the 2 GB limit (assuming that the 3 GB option is not enabled) will prevent you from running a single program that uses 3 GB RAM, you could easily run 10 programs that require 300 MB each.
<i>FILE servers? Unless you have a huge amount of users or have some insane performance requirements, serving FILES never requires more than the OS recommended minimum. The hard drive is almost certainly faster than the network, so only small read buffers are needed to maintain throughput. 4GB? More like 512MB, if that.</i>
‘Never’ is always wrong. How much RAM a file server "needs" depends on what files you’re serving. Ideally you’d want the entire file system (at least the part that you serve) in RAM at all times. If that’s not possible (i.e. you have much more data you access than RAM), then the more RAM the better, as it means a higher percentage of the data is kept in cache, reducing disk access.
You try running the MSDN Library web server (which has several gigs of data, as well as heavy traffic, I’d imagine) on 512 MB RAM, and we’ll see how many hours before you want your mommy.
Talking about polling…
It happens that I'm going to write an object wrapper in C++ for controlling the phone recorder card. The card comes with a set of APIs. The manual doesn't mention whether the driver generates any event to be handled, so we figure that "the way" to handle it is to set a timer and check to see whether each line channel rings or not (by running RingDetect() on each channel) when the timer "ticks".
I hope it can be done in a better way, but I can't figure out how.
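For what it's worth, here is roughly what that timer-based fallback would look like; RingDetect() is the vendor's API (signature assumed), and the channel count and OnRing handler are placeholders. It is, of course, exactly the kind of polling this article warns about, which is why a callback from the driver would be preferable.

#include <windows.h>

int  RingDetect(int channel);    // vendor API (assumed signature)
void OnRing(int channel);        // placeholder: hand the event to the wrapper object

const int kChannelCount = 4;     // assumed number of line channels

void CALLBACK RingTimerProc(HWND, UINT, UINT_PTR, DWORD)
{
    for (int channel = 0; channel < kChannelCount; ++channel) {
        if (RingDetect(channel)) {
            OnRing(channel);
        }
    }
}

// During startup, on a thread that pumps messages:
//     SetTimer(nullptr, 0, 250, RingTimerProc);    // poll every 250 ms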
An amendment to my previous post (lest Stu think I completely sidestepped his point about network vs. disk throughput):
Even if the disk drives on the server are faster than the network (which, admittedly, will be the case on most servers, which have connections slower than about 50 MBps), seek time is the real killer. Even with modern, fast hard drives, seek time can be 7-10 ms. That limits your serving to 100-150 uncached requests/second. On a 10 MBps line running at full throughput, the average request only needs to be smaller than about 100 KB to kill a server, given the constraint of seek time. Once you start an I/O request queue on a server like that, it's all over (this is assuming that that level of traffic is sustained, and not just a short spike of activity).
Note that my example was illustrating this point, as the MSDN Library contains a large amount of data contained in small files, and many different files are likely to be accessed simultaneously (by different clients).
Joke over, I’d just like to point out that there *are* valid reasons to poll, but only on a macroscopic (in terms of CPU time) scale. The best example I can think of offhand is automatic email checking. That, technically, counts as polling. You wait 5 minutes (or so), connect to the POP server, download any new mail, and disconnect.
Yes, it’s a far cry from waiting for a system update via polling, but it’s still polling. :)
Also, Jonathan Allen asked about database update notification rather than polling. If I remember correctly from that launch event I went to last month, SQL Server 2005 will do this for you.
Recently I HAD to go from a passive-wait design to a polled design and it *helped* performance. Here's why (and I agree it is a particular case):
(The OS was not Windows and it was a relatively slow processor, but I think it could happen elsewhere.)
It was a networked application whose sole purpose was to receive (and process) UDP frames as fast as possible, but without hogging too much CPU since there were other applications running.
1st ‘design’: while(1) { select() => receive }
This design was the most CPU-friendly but did not deliver enough performance, because returning to select() ate at least 500us between two receives, whereas polling only cost 50us. So what did I do?
I mitigated damages that way:
2nd ‘design’: while(1) { select() => for(i=1; i<100; i++) { receive, break if nothing } }
I still 'lost' 500us each time the app called select(), but even if the UDP frames were spaced less than 500us apart I didn't lose frames, thanks to the small amount of OS buffering.
That way the application could deliver high performance when the network was heavily used and not waste CPU when there was no network transfer.
It only wasted CPU when the network was lightly used, and even then it didn't waste that much. I guess it could have detected network usage and adapted, but in fact I could not afford to lose UDP frames, so the adaptation delay might have penalized me if/when the bandwidth abruptly increased.
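In rough code, the second design looks something like this; plain BSD sockets, with the socket assumed to already be non-blocking, error handling omitted, and the command pipe (mentioned in a later comment) left out. process_frame() is a placeholder.

#include <sys/select.h>
#include <sys/socket.h>

void process_frame(const char *data, int len);   // placeholder handler

void receive_loop(int sock)
{
    char frame[2048];
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(sock, &readable);
        select(sock + 1, &readable, 0, 0, 0);    // the expensive (~500us) wait

        // Drain up to 100 frames with cheap (~50us) non-blocking receives
        // before paying for select() again.
        for (int i = 0; i < 100; ++i) {
            int len = (int)recv(sock, frame, sizeof(frame), 0);
            if (len <= 0) break;                 // nothing left for now
            process_frame(frame, len);
        }
    }
}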
@Nawak:
Would something like:
while(1){ioctlsocket(s,FIONREAD,&cb);if(!cb)select(…)else recv(…);}
work?
Well, ioctlsocket is the winsock subset of bsd ioctl
Connect a removable flash media reader like a Dazzle flash card reader (or is it Zio! now ?!) to your XP (sp2 or otherwise) system and you’ll see explorer poll the device every second for media presence.
What would Raymond suggest as an alternate mechanism to the explorer team?
I understand polling sucks…but..
Wasn't it Access that used PeekMessage() incorrectly instead of the usual GetMessage() loop under Windows 3.1, thereby causing Windows 3.1 to always be busy and not able to write .ini file changes, etc.? It was a common thing to do then, and with MsgWaitForMultipleObjects() surely you can still do this kind of thing in the UI thread quite efficiently now, or not?
Also: 0% CPU usage means nothing, there’s not enough granularity to measure very small usage. The minimum I’ve seen reported is about 0.9%.
Many apps exploit this and think they’re not doing damage; but they are overloading the kernel (which has to check every timer interrupt whether to wake them up), and most importantly, they are trashing the cache. If they use 0.5% and all of it is cache trashing, at least another 0.5% will be wasted by an app doing useful work "restoring" its own cache, so to speak.
The granularity of reporting should be improved. Heck, I’ve seen this cause problems already: DOSBox (and other low-level emulators) usually run at a frequency of around 1000 Hz, and sometimes Windows fails to account for their CPU time, so it shows 20% but they are using 80%. If you have a CPU with variable frequency, Windows will reduce the frequency when it sees 20%, but then suddenly it "saturates" and goes all the way up to 100%. It dithers between those two settings until you lock it :)
A nice way to check the "quality" of an application is to watch its "context switch" performance counter (available per thread). Sysinternal’s Process Explorer can do it. It scares me when I see apps that do 100 or 1000 switches per second, yet register 0% CPU usage.
Microsoft apps usually do decently, with notable exceptions such as Windows Media Player (even when non playing and minimized, although most media players do that too).
Also, to check for the ultimate shame: if an app registers about 100 switches per second, but that number rises to 1000 when you open a media file, it means it was doing sleep(1)s…
I’m not sure about SQL Server, but Oracle has a stored procedure package you can use to block queries until they get triggered by something else. We used that to build a fairly sophisticated change notification system that worked without polling. (It even handled multiple clients that consumed change notification messages at different rates.)
@Chris Becke:
Yes, it would also work, but since my socket is already non-blocking, I prefer "receive" to "if something then receive", because receive will also have to check if there's something to receive anyway, so basically your code does the same thing as mine but checks twice for data presence. I admit it's clearer though.
But now I understand why you suggested that code: I didn't explain clearly why I used a "for(100)" (and not a "for(1000)" or "10000"). I should have said that. It's because the code also has to react to commands it receives through a pipe, so the pipe is also part of the select's fd_sets. Therefore, to keep a minimum of responsiveness, I had to set an upper limit on the number of consecutive receives that happen without selecting on both the command pipe and the socket.
"but in fact I could not afford to lose UDP frames so maybe the adaptation delay would have penalized me if/when the bandwith abruptly increased."
Just out of curiosity, can you share the code that you wrote to guarantee reception of UDP frames when the frame is corrupted by another computer sending a packet at the same time? :)
If you "can’t afford" to lose packets, then you may as well give up and go home because lost packets are impossible to completely prevent. Maybe you could afford the extra processor usage to minimise lost packets and that’s cool, but you stil didn’t actually guarantee reception of every packet that ever got transmitted – and merely .
Cheong: The "better way" is to get the vendor to write their interface properly. You can only do as well as the underlying system allows – and theirs is probably either badly documented or broken.
[i]But that's a WEB SERVER, not a FILE SERVER (although simple web servers could be classified as file servers.) In my book, a FILE server is a server that provides read and/or write access to static files stored on secondary storage, be that through SMB/CIFS, NFS, FTP or even HTTP.
If the server is running a database query service, serving dynamic web pages, providing some sort of remote application system or other such things then it is more than just a FILE server.
Pure file serving is neither CPU nor RAM intensive, it is network-bound, so there is no real need for multiple CPUs or gigs of RAM.[/i]
I was making a couple of assumptions about the MSDN Library server, which may or may not be correct: 1) the pages of the library are primarily static, individual web pages, 2) web page servers qualify as file servers, as long as they don't do a large amount of dynamic content generation.
Agreed, file servers are not CPU intensive. Nor are they RAM intensive in the sense that an MMORPG server would be, where it has to store massive amounts of rapidly changing state data; but the fact that they can often be I/O-bound means that large amounts of RAM may be needed for caches. I previously posted some rough calculations (well, the results, as the formulas themselves are trivial) showing how quickly a hard drive can get bogged down by serving lots of small files.
I am interested in how this relates to GAME programming.
I’ve started working through a few books and am currently using the following variation of the message loop:
BOOL bval = TRUE;
while (TRUE)
{
    bval = ::PeekMessage(&msg, NULL, 0, 0, PM_REMOVE);
    if (bval)
    {
        if (msg.message == WM_QUIT)
            break;
        ::TranslateMessage(&msg);
        ::DispatchMessage(&msg);
    }
    else
    {
        // irrelevant code snipped here ...
        tickCount = ::GetTickCount();
        if (tickCount > tickTrigger)
        {
            tickTrigger = tickCount + GameEngine::GetFrameDelay();
            mainFrame.GameCycle();
        }
    }
}
Conceptually, it seems that this hammers the CPU. Is there another approach to implementing decent fps-type action without constantly polling ticks etc. in every loop iteration?
If this is off topic, please ignore …
Thanks,
-Luther
@Anonymous Coward
"If you "can’t afford" to lose packets, then you may as well give up and go home because lost packets are impossible to completely prevent"
*I* know, but I am not the PHB who sold the product. In fact, in the final application the total system may afford to lose frames, but what was sold was: no frame loss on our system! Every valid frame that enters the NIC shall be processed! The fact that, if you consider the whole system, there *will* be lost frames was totally irrelevant to the final client. Their contract said that we shall not lose frames!
With such high expectations (that you cannot negotiate after the boss agreed to fulfill the client’s dream), the client may well have to sue Cisco (or whatever brand of switch they use) for ‘frame-stealing’ ;)
The "no frame-loss" requirement really made the product more expensive but I don’t care since it’s not my money and since in the end we managed to do it (from NIC to App) with the incredibly high throughput the PHB sold. And it was really an interesting job! Lots of things learned (from the NIC chip configuration to the OS tuning etc.)
I would like to add something about the clients… they’re the kind of people that believe that because the NIC is 100 Mbps, they can *have* 100 Mbps delivered by/to an application. So just to be on the safe side they took 96 Mpbs as the throughput they would get from an FTP server (run on the same kind of anemic hardware we were struggling with (and they *knew* we were struggling with it)) and computed the time it would take to download their gigabytes of files and sold *that time* to their respective client!
You cannot imagine how warm it is for me to know that, knowing the pricks they are!
@Luther, that loop is crazy. Why would the game not simply run as fast as it can? Frame-rate independent code is not *that* difficult to get right, and seeing as how GetTickCount can be off by up to 10ms anyway, you'd get much smoother animations.
The loop is crazy because if the GameCycle() runtime exceeds the GetFrameDelay() time, then the game will start to run at half speed.
It's not crazy, because making it run as fast as it can will needlessly burn all the CPU.
And frame-rate independent code might work well for 3D shooters. But for 2D games that use frame-animated sprites as resources, having a consistent game-wide framerate tied to the art resources is important to avoid unpleasant beat effects.
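One middle ground, as a sketch only: wait for either a message or the next frame deadline with MsgWaitForMultipleObjects, so the thread sleeps instead of spinning in PeekMessage. The GetFrameDelay/GameCycle names are borrowed from Luther's loop above, and this fragment would replace that loop inside the same function.

MSG msg;
DWORD nextFrame = ::GetTickCount() + GameEngine::GetFrameDelay();
for (;;)
{
    DWORD now = ::GetTickCount();
    DWORD wait = (nextFrame > now) ? nextFrame - now : 0;

    // Sleep until either a message is posted or the frame deadline passes.
    ::MsgWaitForMultipleObjects(0, NULL, FALSE, wait, QS_ALLINPUT);

    while (::PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
    {
        if (msg.message == WM_QUIT)
            return (int)msg.wParam;
        ::TranslateMessage(&msg);
        ::DispatchMessage(&msg);
    }

    if (::GetTickCount() >= nextFrame)
    {
        nextFrame = ::GetTickCount() + GameEngine::GetFrameDelay();
        mainFrame.GameCycle();
    }
}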
I doubt that TS can handle 800 sessions even without any running applications. Every single session allocates >3 MB, so where's the actual problem here?
It seems to me that a notification-based mechanism can only work if whatever you're monitoring notifies you of changes, so other than polling for information, are there any alternatives? As Randolpho mentioned, automatic email checking requires polling, because you don't get notified by the server when new email arrives (or notification is beyond your control). It doesn't seem like there's an alternative to polling if notification is out of your control and you (your software, etc.) are not notified.
FWIW, there *is* no way other than polling to do media detection in USB mass storage devices. USB mass storage is basically SCSI over USB, and the way you check for media presence is to poll with SCSI TEST_UNIT_READY commands.
What’s more, USB Flash keys typically lie about being removable — ie, they say they’re removable-media drives when really they’re not — so Windows *has* to poll them.
I believe they do this because identifying as removable also means that Windows turns off write caching by default, making them more resilient in the face of surprise removal.
I had a think about that – what we really need is a way to just sit idle and then say "OH WAIT WE GOT SOMETHIN" when something happens and then send that on. Like,
int main() {
    // …
    inethandle::wait("hello");
    // …
}

void hello() {
    // handle packet that we received…
}
Of course that might already exist for a case like that, I’ve never done much programming like that.. but you see what I mean.
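Something fairly close to that already exists: a blocking receive on a dedicated thread sits completely idle until data arrives. A minimal Winsock sketch, with HandlePacket standing in for the hello() handler above:

#include <winsock2.h>

void HandlePacket(const char *data, int len);    // placeholder for hello() above

DWORD WINAPI ReceiveLoop(LPVOID param)
{
    SOCKET s = (SOCKET)param;                    // assumed already created and connected
    char buffer[2048];
    for (;;) {
        int len = recv(s, buffer, sizeof(buffer), 0);    // blocks; no polling, no CPU
        if (len <= 0)
            break;                               // connection closed or error
        HandlePacket(buffer, len);
    }
    return 0;
}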
Recently I found that explorer polls the hard disks every 60 seconds, apparently to determine if disk space is low so it can pop up a nice warning balloon or some such. Using regmon, I determined that setting the DWORD registry value HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer\NoLowDiskSpaceChecks to 1 and restarting explorer turns off this behavior.
It clogs up the messaging system.
Continuing from yesterday’s article about IChannelFactory, today we’re looking at the server side of…
The non-GUI way of scheduling code to run on a thread.
Raymond Chen has a nice blog entry that discusses it.
Must read.
PingBack from http://smallcode.weblogs.us/2006/08/15/an-approach-to-api-calls-optimization/
PingBack from http://cpfh.wordpress.com/2006/01/25/performance-implication-of-polling/