Date: | May 18, 2005 / year-entry #122 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20050518-42/?p=35613 |
Comments: | 28 |
Summary: | Looking at the profile for our program so far, 35% of the CPU time is spent copying strings around. Let's see if we can improve that. The best way to speed up copying strings is not to copy them in the first place. Using a wstring in our DictionaryEntry structure forces the vector class to... |
Looking at the profile for our program so far, 35% of the CPU time is spent copying strings around. Let's see if we can improve that.
The best way to speed up copying strings is not to copy them in
the first place. Using a
Let's use plain string pointers rather than struct DictionaryEntry { bool Parse(const WCHAR* begin, const WCHAR* end); void Destruct() { delete[] m_pszTrad; delete[] m_pszSimp; delete[] m_pszPinyin; delete[] m_pszEnglish; } LPWSTR m_pszTrad; LPWSTR m_pszSimp; LPWSTR m_pszPinyin; LPWSTR m_pszEnglish; };
The
Since we aren't using LPWSTR AllocString(const WCHAR* begin, const WCHAR* end) { int cch = end - begin + 1; LPWSTR psz = new WCHAR[cch]; lstrcpynW(psz, begin, cch); return psz; } bool DictionaryEntry::Parse( const WCHAR* begin, const WCHAR* end) { const WCHAR* pch = std::find(begin, end, L' '); if (pch >= end) return false; m_pszTrad = AllocString(begin, pch); begin = std::find(pch, end, L'[') + 1; if (begin >= end) return false; pch = std::find(begin, end, L']'); if (pch >= end) return false; m_pszPinyin = AllocString(begin, pch); begin = std::find(pch, end, L'/') + 1; if (begin >= end) return false; for (pch = end; *--pch != L'/'; ) { } if (begin >= pch) return false; m_pszEnglish = AllocString(begin, pch); return true; }
There isn't a class Dictionary { public: Dictionary(); ~Dictionary(); int Length() { return v.size(); } const DictionaryEntry& Item(int i) { return v[i]; } private: vector<DictionaryEntry> v; }; Dictionary::Dictionary() { ... if (cchResult){
The last bits of the change are to get rid of the
temporary This new program clocks in at 120ms (or 290ms if you include the time it takes to destroy the dictionary). Again, we're twice as fast as the previous version, but still haven't quite crossed the 100ms barrier. And the time to destroy the dictionary isn't starting to be a more significant factor. But I have another trick up my sleeve. [Raymond is currently on vacation; this message was pre-recorded.] |
Comments (28)
|
What is amusing is we are slowly stripping out all the object oriented ease-of-use features, and getting down to some hard, optimized code. I suspect by the time this is done, the OO code will be pretty small… :)
I remember Larry Osterman making a comment about the standard libraries disappearing from this sample :)
I hope I don’t spoil the fun, but as we’ve arrived where we are now, I get the slight feeling of I wouldn’t want to get there from here.
Wouldn’t it have been better to open the file as memory mapped, then convert the whole thing to unicode with a single call on the length of the whole buffer, having allocated another buffer double the size, and then walk through the converted data in place, marking end of strings with 00 00 pairs, and setting up the pointers in the dictionary?
Then the whole thing can be destroyed with a single delete, as there is nothing extra to clean up, no individual allocations from the heap, etc.
What’s the point of destroying anything while you’re quitting the program? I suppose that’s one way of getting rid of all of the destruction time.
I bet allocation time can be improved by grabbing a large chunk of memory at the beginning and then parcelling it out in the AllocString function.
Why not use smart pointers?
boost::shared_array<wchat_t>?
or perhaps boost::shared_ptr<std::wstring>?
it might be a little slower, but you’d have much cleaner code.
(www.boost.org for those who don’t know)
Copying the strings wouldn’t be a problem in VC++ 6 since its std::basic_string implementation uses shared reference-counted buffers. VC++ 7.1 (maybe 7.0 as well?) doesn’t do that, which is good for thread-safety but sometimes bad for memory and processor efficiency. It does have the "small string optimisation", avoiding the need for (de)allocation when copying short strings, but that unfortunately bloats the string objects (to 32 bytes each, I think) and the DictionaryEntry objects (128 bytes?!).
Hopefully C++0x is going to address the "move" constructor issue with the new r-value references. The STL will then be changed to use this when resizing vectors and you’ll be able to get performance like this without losing the nice stl syntax.
Shame we’ll all have retired by the time we get C++0x though.
Stewart: Move constructors are a great idea but they probably wouldn’t help here as most of the strings are short enough to benefit from the "small string" optimisation anyway.
a couple of minor nits: first, m_pszSimp is uninitialized, which causes the delete[] to choke.
second, if you fail to parse a DirectoryEntry, Destruct() is never called and any partially-parsed info will be leaked.
otherwise, though, great series so far, and i’m looking forward to the rest.
I notice a general consensus among the commenters that people are not entirely happy with the direction that this code is moving in. As the code gains speed, it loses readability. We seem to be moving closer and closer to writing plain C. I wouldn’t be surprised to see this program grow a customized memory allocator, or string class.
As a C++ programmer I recognize this pattern. It seems writing in C++ often comes down to (re)writing things yourself. While there is nothing wrong with this, and it can be fun to do, it is certainly not productive. Think how much more features could have been added to this program in the time spent optimizing.
As a comparison, look at Rico Mariani’s blog, which has had an equivalent program in C# for more than a week now, and even in its original incarnation it’s still faster than the C++ version, and significantly easier to brain-parse.
I don’t mean to put down Raymond’s efforts. I myself love reading (and writing) this low-level stuff. This series – indeed the whole blog – is a hacker’s delight. If the purpose of this series is to teach program optimization techniques, it definately succeeds.
But if your objective is to write functional, maintainable software in a reasonable amount of time, C++ is looking increasingly less attractive.
But, but, but… If the vector was large enough you wouldnt have had all that copying of DictionaryEntrys in the first place.
The best optimisation here would have been to reserve a decent sized vector. And replace wstring with a class like flex_string using the small string optimisation
I dunno… the vector stuff didn’t seem to be what was killing us in the profile. Sure there’s a cost there but it wasn’t at the top of the list.
Would we really want to bake in assumptions about how big the dictionary is? The final growth of the vector is the most costly.
If that were the source of problems I would be switching to an alternate data structure because the vector semantics aren’t especially needed to make the dictionary work. A linked list of dictionary chunks for instance would probably be good. Maybe with an index in front of that.
I’ll be posting my analysis of this version later today. Hope you’re all having fun with this exercise.
But, but, but…why dont you reserve enough space in your vector up front? That way you wouldnt have had all the string copies in the first place!
Now the code is seriously messy and will leak a vast amount of memory if there is an exception.
Denis – STL is used in many high performance products. STL must be used with care, but that is no different than any other framework. It won’t automatically make your code fast, no.
Tito,
Switching a smart pointer class that runs slower would kind of defeat the purpose of speed-optimizing, wouldn’t you agree?
Switching to a smart pointer class that runs slower would kind of defeat the purpose of speed-optimizing, wouldn’t you agree?
What I don’t understand is why, if the goal was simply to avoid copying the wstrings, you jump directly from wstring to WCHAR*, when a much smaller jump from wstring to wstring* would net you the same lack-of-copying benefit without the requirement of rewriting your code. A few extra *’s, ->’s, news and deletes would have enabled you to use the existing code.
This is an example of how wrong approach to C++ programming could make C looks better :)
crap, no wonder that C is still so popular in MS.
This is pretty funny, but all you have demonstrated so far is that you should *never* use STL in products that should run in reasonable timeframe.
IMHO next step will be ‘get rid of AllocString() & Destruct()’ since we have delimiter characters to place ‘