Very late remarks on the original Chinese dictionary series

Date: March 3, 2006 / year-entry #79
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20060303-13/?p=32073
Comments: 5

I have not forgotten about the Chinese/English dictionary series, but I simply haven't had the motivation to sit down and write up descriptions and discussion for the code that I wrote along the way, so instead of adding to the program, I'm going to answer some questions that were asked back when I started the series but which I didn't respond to at the time since I was out of town.

More than one commenter suggested using v.reserve() to pre-allocate the vector memory. First of all, the cost of vector reallocation really didn't factor into the performance after the first few rounds of optimization, so adding a reservation step ended up being unnecessary. Furthermore, getting the correct value to pass to v.reserve() would mean making two passes over the dictionary, one to get the number of entries in the dictionary and set the vector reservation size, and another to fill the dictionary itself. The alternative would have been to make a guess as to the number of entries in the dictionary based on the total file size and the average length of each entry. Fortunately, it never came to that.
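For illustration, here is a minimal sketch of what the "guess from the file size" alternative might have looked like. It is not code from the series: the `Entry` type, the function names, and the idea of passing in an assumed average entry length are all hypothetical stand-ins.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the dictionary entry type used in the series.
struct Entry
{
    const wchar_t* start;
    const wchar_t* end;
};

// Guess how many entries a file contains from its total size and an
// assumed average entry length (both in bytes). The average is a tuning
// guess, not something measured from the real dictionary.
std::size_t EstimateEntryCount(std::size_t fileSizeInBytes,
                               std::size_t averageEntryBytes)
{
    if (averageEntryBytes == 0) return 0; // no usable guess
    // Round up so that a short trailing entry is still counted.
    return (fileSizeInBytes + averageEntryBytes - 1) / averageEntryBytes;
}

// Reserve space up front based on the estimate. If the guess is low,
// the vector simply reallocates as usual; if it is high, some memory
// goes unused.
void ReserveForDictionary(std::vector<Entry>& v,
                          std::size_t fileSizeInBytes,
                          std::size_t averageEntryBytes)
{
    v.reserve(EstimateEntryCount(fileSizeInBytes, averageEntryBytes));
}
```

A wrong guess costs nothing in correctness, only in wasted memory or an extra reallocation, which is part of why the reservation is an optimization you can safely skip when it doesn't show up in the profile.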

Another commenter suggested preprocessing the file. That is also a valid technique, but I intentionally avoided it partly for expository purposes (it would have removed much of the challenge), and partly because I wanted to be able to update the dictionary by merely replacing the dict.b5 file.

Commenter CornedBee suggested using the wcsrchr function as an alternative to the missing std::rfind method. Note, however, that the DictionaryEntry::Parse method takes a string in the form of a start and end; it is not a null-terminated string. Passing this to wcsrchr would have resulted in quite undesirable behavior.
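To make the distinction concrete, here is a sketch of a reverse search that respects a start/end pair instead of relying on a null terminator. The helper name `FindLast` is hypothetical and is not part of the dictionary program; it uses std::find over std::reverse_iterator, which is one way to search a range backward with the standard library.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper: returns a pointer to the last occurrence of ch in
// the half-open range [start, end), or end if ch does not occur.
// Unlike wcsrchr, it never reads at or beyond end.
const wchar_t* FindLast(const wchar_t* start, const wchar_t* end, wchar_t ch)
{
    // Walking a reverse_iterator from end toward start visits the
    // characters of the range from last to first.
    std::reverse_iterator<const wchar_t*> rbegin(end);
    std::reverse_iterator<const wchar_t*> rend(start);

    auto it = std::find(rbegin, rend, ch);
    if (it == rend) return end; // not found

    // base() points one past the matched element, so step back by one.
    return it.base() - 1;
}
```

Given a buffer holding `a/b/c/zzz` and a range that covers only the first five characters, this returns a pointer to the second slash, whereas wcsrchr would keep scanning to the null terminator and report the third slash, which lies outside the range.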


Comments (5)
  1. Andy says:

    So far this has been one of my all-time favorites of your post themes. I eagerly await every installment in this series.

  2. Frank says:

    Andy: I second that.

    The series was interesting to me in three ways: it had to deal with character encodings that I have somehow mercifully managed to avoid in my work, it dealt with performance optimizations, and it dealt with nice Win32 GUI tricks to serve the user better.

    I would love to see more!

  3. Cooney says:

    You can still preprocess the file and make the upgrade seamless: all you have to do is come up with a new extension, say .b5-chewed, and process the .b5 file into .b5-chewed every time it’s newer than the .b5-chewed file.

    Of course, this will increase startup time once in a while. You can make that a win by telling the user that you noticed the new file and are chewing on it. After all, they just stuck the new file in there, right?

  4. This assumes of course that you have write permission into the directory that contains the raw data. If the administrator updates the b5 file, the user won’t be able to save out the "chewed" file.

    (If you respond that the chewed file should go into a separate directory, then you have the problem of what to do when there are multiple b5 files in the system.)

    But the real reason was simply that I didn’t want to create another file and have to manage it.

  5. Miral says:

    "Commenter CornedBee suggested using the wcsrchr function as an alternative to the missing std::rfind method."

    But, like Anders Dalvander pointed out, you don’t need std::rfind — just use a std::find and std::reverse_iterator combo.



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
