Loading the dictionary, part 5: Avoiding string copying

Doug says:

May 18, 2005 at 9:19 am

What is amusing is we are slowly stripping out all the object oriented ease-of-use features, and getting down to some hard, optimized code. I suspect by the time this is done, the OO code will be pretty small… :)

Sriram says:

May 18, 2005 at 9:49 am

I remember Larry Osterman making a comment about the standard libraries disappearing from this sample :)

Anthony Wieser says:

May 18, 2005 at 10:03 am

I hope I don’t spoil the fun, but as we’ve arrived where we are now, I get the slight feeling of I wouldn’t want to get there from here.

Wouldn’t it have been better to open the file as memory mapped, then convert the whole thing to unicode with a single call on the length of the whole buffer, having allocated another buffer double the size, and then walk through the converted data in place, marking end of strings with 00 00 pairs, and setting up the pointers in the dictionary?

Then the whole thing can be destroyed with a single delete, as there is nothing extra to clean up, no individual allocations from the heap, etc.

nksingh says:

May 18, 2005 at 10:15 am

What’s the point of destroying anything while you’re quitting the program? I suppose that’s one way of getting rid of all of the destruction time.

I bet allocation time can be improved by grabbing a large chunk of memory at the beginning and then parcelling it out in the AllocString function.

Tito says:

May 18, 2005 at 10:17 am

Why not use smart pointers?

boost::shared_array<wchat_t>?

or perhaps boost::shared_ptr<std::wstring>?

it might be a little slower, but you’d have much cleaner code.

(www.boost.org for those who don’t know)

Ben Hutchings says:

May 18, 2005 at 10:19 am

Copying the strings wouldn’t be a problem in VC++ 6 since its std::basic_string implementation uses shared reference-counted buffers. VC++ 7.1 (maybe 7.0 as well?) doesn’t do that, which is good for thread-safety but sometimes bad for memory and processor efficiency. It does have the "small string optimisation", avoiding the need for (de)allocation when copying short strings, but that unfortunately bloats the string objects (to 32 bytes each, I think) and the DictionaryEntry objects (128 bytes?!).

Stewart Tootill says:

May 18, 2005 at 10:21 am

Hopefully C++0x is going to address the "move" constructor issue with the new r-value references. The STL will then be changed to use this when resizing vectors and you’ll be able to get performance like this without losing the nice stl syntax.

Shame we’ll all have retired by the time we get C++0x though.

Ben Hutchings says:

May 18, 2005 at 10:31 am

Stewart: Move constructors are a great idea but they probably wouldn’t help here as most of the strings are short enough to benefit from the "small string" optimisation anyway.

CornedBee says:

May 18, 2005 at 11:33 am

There isn’t a std::rfind function

And what’s wrong with wcsrwc or whatever the wide-character version of strrchr is called?

Dave says:

May 18, 2005 at 11:40 am

a couple of minor nits: first, m_pszSimp is uninitialized, which causes the delete[] to choke.

second, if you fail to parse a DirectoryEntry, Destruct() is never called and any partially-parsed info will be leaked.

otherwise, though, great series so far, and i’m looking forward to the rest.

Aleko Petkov says:

May 18, 2005 at 11:45 am

I notice a general consensus among the commenters that people are not entirely happy with the direction that this code is moving in. As the code gains speed, it loses readability. We seem to be moving closer and closer to writing plain C. I wouldn’t be surprised to see this program grow a customized memory allocator, or string class.

As a C++ programmer I recognize this pattern. It seems writing in C++ often comes down to (re)writing things yourself. While there is nothing wrong with this, and it can be fun to do, it is certainly not productive. Think how much more features could have been added to this program in the time spent optimizing.

As a comparison, look at Rico Mariani’s blog, which has had an equivalent program in C# for more than a week now, and even in its original incarnation it’s still faster than the C++ version, and significantly easier to brain-parse.

I don’t mean to put down Raymond’s efforts. I myself love reading (and writing) this low-level stuff. This series – indeed the whole blog – is a hacker’s delight. If the purpose of this series is to teach program optimization techniques, it definately succeeds.

But if your objective is to write functional, maintainable software in a reasonable amount of time, C++ is looking increasingly less attractive.

indranil banerjee says:

May 18, 2005 at 11:48 am

But, but, but… If the vector was large enough you wouldnt have had all that copying of DictionaryEntrys in the first place.

The best optimisation here would have been to reserve a decent sized vector. And replace wstring with a class like flex_string using the small string optimisation

ricom says:

May 18, 2005 at 1:04 pm

I dunno… the vector stuff didn’t seem to be what was killing us in the profile. Sure there’s a cost there but it wasn’t at the top of the list.

Would we really want to bake in assumptions about how big the dictionary is? The final growth of the vector is the most costly.

If that were the source of problems I would be switching to an alternate data structure because the vector semantics aren’t especially needed to make the dictionary work. A linked list of dictionary chunks for instance would probably be good. Maybe with an index in front of that.

I’ll be posting my analysis of this version later today. Hope you’re all having fun with this exercise.

IndranilBanerjee says:

May 18, 2005 at 1:38 pm

But, but, but…why dont you reserve enough space in your vector up front? That way you wouldnt have had all the string copies in the first place!

Now the code is seriously messy and will leak a vast amount of memory if there is an exception.

Ken says:

May 18, 2005 at 2:40 pm

Denis – STL is used in many high performance products. STL must be used with care, but that is no different than any other framework. It won’t automatically make your code fast, no.

Rick c says:

May 18, 2005 at 4:03 pm

Tito,

Switching a smart pointer class that runs slower would kind of defeat the purpose of speed-optimizing, wouldn’t you agree?

Rick C says:

May 18, 2005 at 4:11 pm

Switching to a smart pointer class that runs slower would kind of defeat the purpose of speed-optimizing, wouldn’t you agree?

Ken says:

May 18, 2005 at 10:14 am

What I don’t understand is why, if the goal was simply to avoid copying the wstrings, you jump directly from wstring to WCHAR*, when a much smaller jump from wstring to wstring* would net you the same lack-of-copying benefit without the requirement of rewriting your code. A few extra *’s, ->’s, news and deletes would have enabled you to use the existing code.

Michael Pryhodko says:

May 18, 2005 at 6:25 pm

This is an example of how wrong approach to C++ programming could make C looks better :)

crap, no wonder that C is still so popular in MS.

ASeverin says:

May 18, 2005 at 1:20 pm

This is pretty funny, but all you have demonstrated so far is that you should *never* use STL in products that should run in reasonable timeframe.

IMHO next step will be ‘get rid of AllocString() & Destruct()’ since we have delimiter characters to place ‘’ and can use just pointers to already converted unicode buffer

/denis

asdf says:

May 18, 2005 at 9:07 pm

shared_ptr isn’t just in boost anymore. It’s in C++TR1 due out any time now (see http://aristeia.com/EC3E/TR1_info_frames.html).

Ken says:

May 18, 2005 at 9:14 pm

Rico – the data structure you describe (almost) is called std::deque.

deque (pronounced "deck") is basically a vector of pointers to fixed size chunks. A linked list of pointers wouldn’t allow random access into the container, but a vector of pointers is easy (and fast) to resize – the actual items don’t need to be copied when the container expands, it just has to allocate a new chunk. With large containers, this also eliminates the need for enormous blocks of contiguous memory.

Using a deque would eliminate all the vector resizing related string copying.

If the compiler is smart enough, replacing DictionaryEntry::Parse with a constructor that did the same thing and passing in an unnamed DictionaryEntry to the push_back method could eliminate the rest of the string copying. However, that approach would necessitate relying on exceptions to handle unparsable lines, since constructors have to result in a valid object or an exception. Or, have failed parsing result in a sentinel valued DictionaryEntry and pop_back if the most recently inserted DictionaryEntry was a failed parsing.

Anders Dalvander says:

May 19, 2005 at 2:05 am

There isn’t a std::rfind function

No, and there is no direct need for it either, use std::find with std::reverse_iterator instead.

Ken says:

May 19, 2005 at 8:29 am

Rick C – that depends if the smart pointer is doing more work than you would be otherwise doing yourself.

shared_ptr does reference counting, which might add a little bit of unnecessary overhead on pointer copies, but I doubt it would hurt a lot. It would certainly be less than the previous cost of copying strings.

The correct answer, of course, is to profile.

Tim Smith says:

May 19, 2005 at 9:02 am

I’m still wondering what is the point of this series. As others have pointed out, there are some bad and silly things STL does, but many of the optimizations are from what might be a limited understanding of STL. We seem to go from one extreme to another.

I have been using STL for many years now. I really like it but I understand many of the limitations. Far to many people act like all you have to do is start using technique X and all your problems will be solved. You see this with such things as STL and exceptions. One of the reasons I love reading Herb Sutter’s books is because he doesn’t gloss over the failings of C++ and STL. Often he goes out of his way to tell you why something is the "right thing" to do while telling you the ramifications of doing such things.

James says:

May 27, 2005 at 1:15 pm

I see this series as an exercise in showing that there’s much to be said for transparent code. In the final version, you can see exactly what’s happening. In the original version, you can see what the code is supposed to do, but you don’t know what’s going on when it executes, and that’s where the trouble comes from. I’d call the final version of the code simpler and more robust as a result.

The Old New Thing : Very late remarks on the original Chinese dictionary series says:

March 3, 2006 at 10:00 am

PingBack from http://blogs.msdn.com/oldnewthing/archive/2006/03/03/542924.aspx

Rico Mariani's Performance Tidbits says:

January 23, 2007 at 9:29 pm

Well today Raymond hits a snag in version 5 of the program. I started by profiling the code as provided

Date:	May 18, 2005 / year-entry #122
Tags:	code
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20050518-42/?p=35613
Comments:	28
Summary:	Looking at the profile for our program so far, 35% of the CPU time is spent copying strings around. Let's see if we can improve that. The best way to speed up copying strings is not to copy them in the first place. Using a wstring in our DictionaryEntry structure forces the vector class to...