Converting from traditional to simplified Chinese, part 2: Using the dictionary

Date:July 12, 2005 / year-entry #187
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20050712-10/?p=34963
Comments:    8

Now that we have our traditional-to-simplified pseudo-dictionary, we can use it to generate simplified Chinese words in our Chinese/English dictionary.

class StringPool
{
public:
 StringPool();
 ~StringPool();
 LPWSTR AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd);
 LPWSTR DupString(const WCHAR* pszBegin)
 {
  return AllocString(pszBegin, pszBegin + lstrlen(pszBegin));
 }
 ...
};

The DupString method is a convenience we will use below.

Dictionary::Dictionary()
{
 ...
    if (de.Parse(buf, buf + cchResult, m_pool)) {
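     // Does any character in this word have a simplified form?
     bool fSimp = false;
     for (int i = 0; de.m_pszTrad[i]; i++) {
      if (pmap->Map(de.m_pszTrad[i])) {
       fSimp = true;
       break;
      }
     }
     if (fSimp) {
      // Copy the traditional string, then overwrite each character
      // that has a simplified form.
      de.m_pszSimp = m_pool.DupString(de.m_pszTrad);
      for (int i = 0; de.m_pszTrad[i]; i++) {
       if (pmap->Map(de.m_pszTrad[i])) {
        de.m_pszSimp[i] = pmap->Map(de.m_pszTrad[i]);
       }
      }
     } else {
      // NULL means "same as the traditional string"
      de.m_pszSimp = NULL;
     }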
     v.push_back(de);
    }
 ...
}

After we parse each entry from the dictionary, we scan the traditional Chinese characters to see if any of them have been simplified. If so, then we copy the traditional Chinese string and use the Trad2Simp object to convert it to simplified Chinese.
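The scan-and-convert step can also be tried outside the program. The following is a minimal sketch, not the program's code: Trad2SimpSketch is a hypothetical stand-in for the Trad2Simp object (assumed, as in the code above, to return zero for a character with no simplified form), SimplifyOrEmpty is an invented helper, and std::wstring takes the place of the string pool. Unlike the original, it makes a single pass and copies the string only when it hits the first character that changes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the Trad2Simp object: Map returns the
// simplified form of a character, or 0 if the character has none.
class Trad2SimpSketch
{
public:
 void Add(wchar_t chTrad, wchar_t chSimp) { m_map[chTrad] = chSimp; }
 wchar_t Map(wchar_t chTrad) const
 {
  auto it = m_map.find(chTrad);
  return it != m_map.end() ? it->second : 0;
 }
private:
 std::map<wchar_t, wchar_t> m_map;
};

// Returns the simplified form of pszTrad, or an empty string if no
// character changes (the analogue of setting m_pszSimp to NULL).
std::wstring SimplifyOrEmpty(const Trad2SimpSketch& map,
                             const wchar_t* pszTrad)
{
 assert(pszTrad != nullptr);
 std::wstring simp;
 for (int i = 0; pszTrad[i]; i++) {
  wchar_t ch = map.Map(pszTrad[i]);
  if (ch) {
   // Copy the traditional string the first time a character changes.
   if (simp.empty()) simp = pszTrad;
   simp[i] = ch;
  }
 }
 return simp;
}
```

An empty result plays the same role as a NULL m_pszSimp: it means the word is written the same way in both scripts.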

If the string is the same in both simplified and traditional Chinese, then we set m_pszSimp to NULL. This may seem a bit odd, but it'll come in handy later. Yes, it makes the m_pszSimp member difficult to use. I could have created an accessor function for it (so that it falls back to traditional Chinese if the simplified Chinese is NULL), but I'm feeling lazy right now, and this is just a one-shot program.
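For reference, the accessor mentioned above might look something like the following. This is only a sketch: DictionaryEntrySketch is a hypothetical, pared-down version of the dictionary entry, and GetSimp is an invented name that does not appear in the program.

```cpp
#include <cassert>

// Hypothetical, pared-down dictionary entry holding only the two
// Chinese strings.
struct DictionaryEntrySketch
{
 const wchar_t* m_pszTrad;
 const wchar_t* m_pszSimp; // nullptr if same as m_pszTrad

 // Returns the simplified string, falling back to the traditional
 // string when the two are the same.
 const wchar_t* GetSimp() const
 {
  assert(m_pszTrad != nullptr);
  return m_pszSimp ? m_pszSimp : m_pszTrad;
 }
};
```

With an accessor like this, callers would not have to remember the NULL convention, but code that wants to know whether the two strings differ would still look at m_pszSimp directly.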

void RootWindow::OnGetDispInfo(NMLVDISPINFO* pnmv)
{
 ...
  switch (pnmv->item.iSubItem) {
   case COL_TRAD:    pszResult = de.m_pszTrad;    break;
   case COL_SIMP:    pszResult =
      de.m_pszSimp ? de.m_pszSimp : de.m_pszTrad; break;
   case COL_PINYIN:  pszResult = de.m_pszPinyin;  break;
   case COL_ENGLISH: pszResult = de.m_pszEnglish; break;
  }
 ...
}

Finally, we tell our OnGetDispInfo handler what to return when the listview asks for the text that goes into the simplified Chinese column. With these changes, we can display both the traditional and simplified Chinese for each entry in our dictionary.

Next time, a minor tweak to our display code, which happens to illustrate custom-draw as a nice side-effect.


Comments (8)
  1. hmmm says:

    All well and good, but will this help get Longhorn shipped (with some features, please) any quicker? Or is this whole blogging thing (not Raymnod Chan specifically, but M$-wide) just a way to increase "visibility" and play a little CYA for the stack-rank game?

  2. Kris says:

    I just happened to come across this dictionary design. Very interesting. Just wondering if you would take this all the way through and finally expose it as a COM component.

    I am also interested in how MS folks design their UI apps (like Office) with automation in mind. Would you please blog on this sometime in the future? Thanks for the wonderful insights your blog brings.

  3. Ben says:

    hmmm: The #1 priority at all times at Microsoft is helping existing customers. The #2 priority varies between fixing security issues (when there are some assigned to you), and working on your project.

    This isn’t about ranking (god knows Raymond doesn’t need more reputation); it’s about helping people deal with the strange world of Win32 programming.

  4. ryanmy says:

    Ben makes excellent points… and in any case, Raymond is known to write posts for this blog far, far in advance — sometimes months ahead — in order to ensure that they keep coming even when all of us are hunkered down for Beta 1. (That’s why I haven’t updated lately :P)

    By the way, you might want to spew your drivel over at some of the Google guys — they’re actually required to spend part of their day working on something other than their product. (But then, if it spends years in public beta, can it really be said to ship?) It’s funny how double standards work…

  5. Craig Ringer says:

    Personally, I find this weblog very interesting and useful. That’s despite the fact that I don’t even *use* win32, let alone program for it, unless I really can’t avoid it.

    Also, consider the public discussion and feedback that comes of things like this. I can’t help but see that being useful. It might not "help get longhorn shipped" any faster, but I imagine it’ll help it be better designed. Personally, I’d prefer that.

    You might also do well to get the name of the person whose weblog you are criticising correct in future.

  6. mattd says:

    hmmmm,

    Why are you in a hurry to get Longhorn? What is so big about it? I thought basically everything a dev would care about is being back-ported anyway? With WinFS pulled I just don’t see much to it. Even the newly dropped screenshots were a bit *yawn*. I will say that the new driver model with WDF looks cool but…

  7. Nathan Moore says:

    Is there some reasoning behind the choice to use LPWSTR (or LPCWSTR) over WCHAR* (or const WCHAR*)? At first it seemed that LPWSTR was only being used for a null terminated array, however the StringPool::DupString method eliminates that idea.

    I guess that I have never really understood the point of the LPTYPE vs TYPE*. Or the point of the CHAR typedef for that matter.

  8. Nathan: You’re right, I should’ve used LPCWSTR since the string is null-terminated.


