Date: | February 12, 2004 / year-entry #58 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20040212-00/?p=40643 |
Comments: | 40 |
Summary: | So what's with all these different ways of saying the same thing? There's actually a method behind the madness. The plain versions without the underscore affect the character set the Windows header files treat as default. So if you define UNICODE, then GetWindowText will map to GetWindowTextW instead of GetWindowTextA, for example. Similarly, the TEXT... |
So what's with all these different ways of saying the same thing? There's actually a method behind the madness.

The plain versions without the underscore affect the character set the Windows header files treat as default. So if you define UNICODE, then GetWindowText will map to GetWindowTextW instead of GetWindowTextA, for example. Similarly, the TEXT macro will map to L"..." instead of "...".

The versions with the underscore affect the character set the C runtime header files treat as default. So if you define _UNICODE, then _tcslen will map to wcslen instead of strlen, for example. Similarly, the _TEXT macro will map to L"..." instead of "...".

What about _T? Okay, I don't know about that one. Maybe it was just to save somebody some typing.
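Here's a minimal sketch of how the two sets of symbols line up in practice, assuming you define both UNICODE and _UNICODE before including the headers (which is the usual arrangement):

#define UNICODE     // Windows headers: GetWindowText -> GetWindowTextW, TEXT("...") -> L"..."
#define _UNICODE    // C runtime headers: _tcslen -> wcslen, _TEXT("...") and _T("...") -> L"..."
#include <windows.h>
#include <tchar.h>

int main()
{
    TCHAR title[256];

    // Expands to GetWindowTextW because UNICODE is defined.
    GetWindowText(GetForegroundWindow(), title, 256);

    // Expands to wcslen because _UNICODE is defined; TEXT produces a wide literal.
    size_t cch = _tcslen(TEXT("hello"));

    return (int)cch;
}

If only one of the two is defined, the other family quietly stays on its ANSI mappings, so the two halves of your program can end up disagreeing about character size.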
Comments (40)
Comments are closed. |
Not related to this post in particular, but damn, your blog rocks. Just thought I’d let you know.
Interesting… I didn’t actually know about the existence of the underscore _TEXT(), though I have a feeling that stuff I do defines both UNICODE and _UNICODE (so in the end it makes no difference, even if it is slightly incorrect).
I could be wrong (and it could well be a different ball game there — it often is) but I think MS stuff written for WinCE (sample/OEM-customisable sources) just uses TEXT() unilaterally.
What about .Text? :)
Don’t forget TCHAR vs. _TCHAR
And __T and __TEXT
I for one am glad for _T because it causes the least noise in the code. TEXT() is just too long for something that does so little. ;)
I personally loathe the way this is necessary in C and C++. Everything looks awful if all your strings have to have an L in front of them, and converting between narrow/wide strings is a mess.
I understand why it has to be that way, but damn, it’s not nice to work with.
The Delphi model is so much better. You just declare your string variables as being narrow or wide, and the compiler deals with the details. I consider that to also be superior to the Java and .NET model which says your strings will be wide whether you need it or not. I don’t speak any languages other than English, and hence most of the time my strings don’t need to either. ;)
.text is a section in your executable and object files which contains your executable code (under normal circumstances), and has nothing to do with string constants, which are stored elsewhere. Just to be confusing.
Although I have a sneaking feeling that you know this already, judging by the smiley ;)
Shane: If C figured this out for you, it wouldn’t be C anymore. I noticed another blogs.msdn.com post on this very subject today, in fact (should be pretty easy to find).
C is C because it requires you to be explicit in many cases where other languages "just figure it out". C programmers tend to like the fact that C compilers *have to be told* instead of just guessing.
That said, if you ignore most C compiler warnings, C is suddenly worse in this respect than anything else…
You really should have explained this with reference to "the standard" which you mentioned a few days ago: the runtime library mustn’t define a name that conflicts with user-space identifiers and the Windows headers cannot use a reserved name.
Before now, I’d not realised the difference and not seen it documented, instead assuming that the people employed by Microsoft were ignorant fools: that’s an easy assumption to make of a company that’s kicking out products like Win95 and VC6. Of course, since those days, the company has made a dramatic turnaround on the C++ front, but other standards are still lacking and the intrinsic security of its products is pathetic: see http://www.eeye.com/html/Research/Upcoming/index.html for today’s revelation.
No wonder MS made front-page news in the UK with its last round of patching.
If you "let the compiler decide" then you no longer have C++. C++ is strongly typed for a reason, it prevents errors. If you want DWIM*, use Perl ;)
* Do What I Mean
To say that the intrinsic security of Microsoft products is "pathetic" is unfounded; Linux products release more patches every year than the MS operating systems.
I said I know why C does what it does. I’m saying that I don’t like having to work with it.
With Delphi, there’s actually no guesswork involved. If you assign to a narrow string, generate a narrow string constant. Likewise for a wide string constant. If assigning narrow to wide, or wide to narrow, do the conversion. The compiler doesn’t guess anything; it follows a set of well-defined rules. The programmer controls the decision, the compiler implements the busywork. As it should be!
My comment was more along the lines of wondering why .NET takes a step backwards from languages like Delphi. Delphi is an example of how narrow and wide strings can live together at no inconvenience to the programmer, unlike the inconvenience in C. That’s not a criticism of C. C doesn’t even have a string type, so it wouldn’t make much sense to have automatic string conversion. :)
As far as the strongly typed comment goes, C++ isn’t strongly typed. A strongly typed language doesn’t allow an object to be forcibly cast to a type it’s not. It is, however, statically typed, which is what I assume you mean. It’s still an open question as to whether static typing is helpful or harmful to error prevention.
I think the theory behind .NET supporting only unicode is to make it impossible to write a program that cannot be internationalized.
I get you Raymond, but I can’t resist replying that using Unicode strings is only one among many steps to make your app world-ready.
Not exactly comparing like for like there.
C++… ah, the language that became a language of kludges and fixes for crappy compilers and libraries.
Jim:
This is straying somewhat off-topic, but I’ll follow-up anyway…
The number of patches per year is a ludicrous metric to use to measure the security of a system. It could be argued either way that more patches ultimately means increased security.
In reality, there are two metrics which actually matter: the time taken from problem report to public patch release, and the time between an exploit being released and public patch release.
I don’t know about the first metric in the Windows world, but judging by the fact that exploits are often already in the wild before patches are available (the second metric), it doesn’t look good. I’m not going to try and push an open vs. closed source debate, because this isn’t remotely the right forum for it, but one of the advantages of having so many disparate pairs of eyes looking over the sources is that problems get found quickly. It’s worth looking at the speed at which, for example, the Debian or FreeBSD security teams get fixes out after an issue is discovered. Times are typically measured in *hours*. The ASN.1 bug, in contrast, was (apparently) reported some time last year. Nobody can claim that this is acceptable.
This isn’t an anti-Microsoft bash, in case it appears as one. Everyone can do more to improve security, but a patch for a non-disclosed exploit needs to be out within 7 days, *at most*. As soon as that problem is exploited by someone with nefarious intent, you’ve got a problem, and you need to have the patches *already out there* by that point.
That is why people say that Linux and the BSDs are so secure compared to Windows. It has absolutely nothing to do with the sheer number of patches.
(Also, bear in mind that a Linux distribution often contains all the applications that you’d ever want to run on it – so the number of patches released over the course of a year is going to be huge compared to those released for an OS and a few associated applets)
Shane King wrote:
>I don’t speak any languages other than English,
>and hence most of the time my strings don’t need
>to either.
Your literal strings might not, but you never know what the user may want to enter. And then you will end up either (1) converting wide user input into (1a) local ANSI encoding, (1b) local OEM encoding, (1c) some hardcoded encoding like 1250, or (2) converting your narrow format string into a wide format. Neither of these alternatives looks good.
I have been working in Delphi. The fact that the whole VCL is ANSI makes it nearly impossible to produce an application that will allow the user to enter text in any language of his choice.
Conversion from wide to narrow must not be done implicitly as it loses information. The original string may contain such pairs of characters (a, b) that for every narrow encoding E in the world at least one of {a, b} is not representable in E. Or the user might want to use an encoding different from the system default ANSI encoding. For example, in Russia the ANSI encoding is windows-1251, but it is considered bad style to send mail in windows-1251. Instead, mail is to be encoded in koi8-r (aka CP20866).
Conversion from narrow to wide also cannot be done implicitly as it requires additional information as to which encoding the original narrow string uses. It may (or may not) be the default ANSI encoding, if the string came from the GUI. It may (or may not) be the default OEM encoding, if the string came from console input. It may be any other encoding if the string came from a socket or a file.
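To make that concrete, here's a rough sketch of what the explicit narrow-to-wide conversion looks like on Win32; the caller has to name the code page (the function name Widen and its parameter names are just made up for illustration):

#include <windows.h>
#include <string>

// Convert a narrow string to UTF-16. The caller supplies the code page,
// because only the caller knows which encoding the bytes actually use
// (CP_ACP, CP_OEMCP, 20866 for koi8-r, and so on). Error handling omitted.
std::wstring Widen(const std::string& narrow, UINT codePage)
{
    if (narrow.empty()) return std::wstring();
    int cch = MultiByteToWideChar(codePage, 0, narrow.c_str(), (int)narrow.size(), NULL, 0);
    std::wstring wide(cch, L'\0');
    MultiByteToWideChar(codePage, 0, narrow.c_str(), (int)narrow.size(), &wide[0], cch);
    return wide;
}

Going the other way with WideCharToMultiByte has the same shape, plus the possibility of losing characters the target code page can't represent, which is why neither direction is something a compiler can safely do behind your back.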
Encodings are a pain in the backend, it’s time we all moved to one unified encoding (like UTF-8) for default, and formats that specify the encoding explicitly (like <?xml version="1.0" encoding="koi8-r"?>) for advanced uses where, for example, it would reduce network traffic.
I’ll apply the counter-argument: ASN.1 is critical to protocols used by Exchange and Windows (2000 and later) domains, such as X.400, X.500, LDAP, and many others. A patch which caused these protocols to fail would wreck the IT infrastructure for many businesses. Therefore it’s more important to ensure that the patch is correct than for it to be released as soon as it’s complete. As long as the patch release is _timely_, I don’t care how long it takes to test.
I’m guessing here, but I suspect that the patch was _code-complete_ a long time ago. If an exploit _had_ been published, the patch could have been released at that time.
As we’ve seen recently, the publication of a patch seems to _lead_ to exploits in the Windows world.
I’ll admit that it sounds a lot like a game of chicken, but I believe it’s better to ensure that the software functions correctly for everybody when used in-spec, rather than to protect a small number of possible exploits. I’m not saying ‘don’t patch it’ but I _am_ saying that both sides need to be considered and weighed carefully.
Oh yes, and the ‘many eyes’ fallacy. Many eyes are only of value if those eyes are trained, and if they even exist – I’m not convinced that they do. I prefer my patches to be produced by trained, experienced developers who are familiar with the source code and its conventions, who are employed to produce the patches. I don’t care for J Random Hacker producing a claimed patch which breaks functionality I rely upon _and doesn’t fix the hole_.
C/C++: it’s no longer a language, it’s a MACRO programming language these days. How much of the code you write is preprocessor? A lot, I bet.
That’s what’s wrong with the language and why the rest of the world moved on to better languages.
Mo:
By the second metric you mention, Windows is just fine. Every (correct me if you can find data, please) major incident of Windows OS or tool hacking, including worms, in the past few years has taken advantage of an exploit for which patches were already available. Period. The fact that these patches were not deployed everywhere speaks to the difficulty of managing patches in Windows as well as the ignorance of sysadmins, not to the quality of the underlying product.
As for "patches for every conceivable application", this sounds not unlike the criticisms people level at Microsoft for enabling too many products and features in their releases.
Jim (not an MS employee)
"Jack": I very much appreciate your comments but please change your handle and please remove "MSFT" since you aren’t actually a Microsoft employee as far as I can determine. This is your second warning.
Jim:
My point was simply that the "Linux vendors release more patches than Microsoft do, therefore Linux is more insecure" metric is downright stupid, because it fails to take into account any reasons why there would, even with a minimal number of incidents, be significantly more patches released for one than the other.
Mike:
Then don’t deploy the patch. It’s then up to *you* to decide, and *you* can review the code of the patch (or pay somebody you trust to) if you want. Personally, I want patches from Microsoft as soon as they’re complete, *I* want to be in the position to decide whether it’s safe to install a patch on my system, I don’t want somebody else saying "uh, well, we wrote the code months ago, but we’re not sure if it works yet".
Also, given the sheer amount of resources Microsoft have (and claim to have thrown at security), do you really think sitting on a buffer overrun fix for six months is justifiable?
Mo:
I agree that the patch count metric is stupid, and I apologize for implying anything else. My actual belief is that neither Linux/BSD/UNIX nor MS products are as secure as they can be, and that MS products take more than their share of the heat.
Microsoft is also in a tough position when it comes to releasing patches; if you attended the PDC in October, you know Microsoft is catching MASSIVE amounts of flak for releasing patches TOO often, and God help MS when a patch they release breaks a production system.
I actually do think it’s justifiable to sit on a fix when it’s as fundamental as ASN.1; the sheer number of vectors by which this could be attacked demands it. What if Microsoft released the patch as soon as somebody cranked it out, but the patch didn’t actually cover all vectors (due to insufficient testing)? The announcement of the vulnerability would lead within days to exploits that would hurt even people who are patched.
I know people should test patches before deployment; I know people should stay patched. MS can do more to make patches easier to apply and manage, but they can’t make their customers do the right thing.
Jim (not an MS employee)
Jim:
I agree with you almost 100% (wow)
Yes, they all suck. Personally, I mostly get sick of "X is more secure than Y" arguments, but this one bugged me for some reason.
I too think it’s justifiable to sit on a fix when it’s as fundamental as ASN.1, except for the fact that:
– Certain other platforms had remarkably similar vulnerabilities (which were patched) last year. The smart money would be on nefarious folks trying similar exploits against Windows, too.
– This was reported in July 2003. It’s now February 2004 – there’s "sitting on" a patch and there’s *sitting on a patch*. 7 months is just far too damned long!
Personally, I think the whinging about the number of patches MS are releasing is silly; although I suspect a certain proportion actually mean to complain about the number of problems that are having to be fixed, rather than the fixes themselves :)
How’s this?
// File 1.
#define UNICODE
#include <windows.h>
#include <tchar.h>
// TCHAR defined in <windows.h> based
// on UNICODE.
// sizeof(TCHAR) == 2
// File 2.
#define UNICODE
#include <tchar.h>
#include <windows.h>
// TCHAR defined in <tchar.h> based
// on _UNICODE.
// sizeof(TCHAR) == 1
So just changing the order of common includes can introduce hard-to-find bugs.
Remarks on Chapter 2 – An Introduction to Unicode
The project has UNICODE defined, so when I reviewed this piece of code today it looked wrong, because the Find method has a character parameter that is not really treated as a Unicode character. However, it compiled fine (and worked too) under Visual C++ 6. I still don't understand why it didn't require the L (or TEXT or _T) macro.
2/18/2004 2:55 PM Pingback/TrackBack:
> I still don't understand why it didn't
> require the L (or TEXT or _T) macro.
Me too. So I looked at MSDN pages, and they’re broken too. Both October 2001 since you mentioned VC++ 6, and October 2003.
October 2003, page CStringT::Find
ms-help://MS.MSDNQTR.2003OCT.1033/vclib/html/vclrfCStringTFind.htm
> //typedef CStringT< TCHAR, StrTraitATL< TCHAR > > CAtlString;
>
> CAtlString s( "abcdef" );
> _ASSERT( s.Find( 'c' ) == 2 );
> _ASSERT( s.Find( "de" ) == 3 );
>
> CAtlString str("The waves are still");
> int n = str.Find('e', 5);
> _ASSERT(n == 7);
Not a _T() to be seen.
October 2001, page CString::Find
> // First example demonstrating
> // CString::Find ( TCHAR ch )
> CString s( "abcdef" );
> ASSERT( s.Find( 'c' ) == 2 );
> ASSERT( s.Find( "de" ) == 3 );
>
> // Second example demonstrating
> // CString::Find( TCHAR ch, int nStart )
> CString str("The stars are aligned");
> int n = str.Find('e', 5);
> ASSERT(n == 12);
Not for all the _T() in China.
Because the numerical value of ‘x’ is equal to the numerical value of L’x’ so you get away with it. This works only for characters below 0x80 though.
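A tiny illustration of that point, assuming a Windows-1252 ANSI code page (the 0x80/U+20AC pairing below is specific to that code page):

#include <cassert>

int main()
{
    // For ASCII characters the char and wchar_t values coincide,
    // so s.Find('c') happens to work even in a Unicode build.
    assert('c' == L'c');            // both are 0x63

    // Above 0x7F the coincidence ends: in code page 1252 the byte 0x80
    // is the euro sign, but the euro sign's Unicode code point is U+20AC,
    // so the char value and the wide character value no longer agree.
    assert((unsigned char)0x80 != L'\u20AC');

    return 0;
}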
I suspect in the second and third cases that the program was compiled in ANSI, so _T() is the same as not saying anything.
The cited MSDN pages recommend the use of TCHAR, to work regardless of Unicode vs. ANSI options, even though the example code puts it in comments instead of actual code. It looks like MSDN is recommending that the arguments to the Find method be coded that way too, to work regardless of Unicode vs. ANSI options. That is why the cited pages are broken.
You and I both suspect that the second and third cases (and I suspect the first case too) were only compiled as ANSI by MSDN’s testers/proofreaders. This does not lessen the breakage in the cited MSDN pages.
Okay I’m not familiar with this class so I went to look at the documentation and lo, this class supports constructing from either character set. You don’t have to construct it from the character set of the underlying string.
http://msdn.microsoft.com/library/en-us/vclib/html/vclrfCStringTCStringT.asp
CStringT(
LPCWSTR pszSrc
);
CStringT(
LPCSTR pszSrc
);
So there’s no bug. The code will compile fine either ANSI or UNICODE.
OK, it looks like the bug was fixed in Visual Studio .NET. But the examples still fail in Visual Studio 6, which is used by Pingback/Trackback and (except for personal experiments) myself. MSDN editions which addressed Visual Studio 6 were defective in this way, though current MSDN editions which have the same example code can be considered valid for Visual Studio .NET.
On the other hand, it seems that a new bug was added in Visual Studio .NET. According to the MSDN page which you cited, CStringT can construct a string from a single wide character:
> CStringT(
> wchar_t ch,
> int nLength = 1
> );
but cannot construct a string from a single ANSI character. Visual Studio 6 went through a minor contortion to enable constructing a CString from a single char regardless of whether the compilation environment was ANSI or UNICODE. (Though Visual Studio 6 couldn’t construct a CString from a single wide character in an ANSI compilation environment.)
This is all very fascinating but note that I have no influence over what Visual Studio does. If you can find a blogger from the VS team maybe they can do something about it.
3/3/2004 5:43 PM Raymond Chen:
> This is all very fascinating but note that I
> have no influence over what Visual Studio
> does.
You had said that you had interest and contacts in fixing the MSDN library, which in some circumstances would include the issue raised by Pingback/Trackback (for Visual Studio 6). You investigated code in Visual Studio .NET and found a fix, which made the example code in MSDN become accurate. The example code in MSDN at the time of Visual Studio 6 was inaccurate at the time of publication and remains that way, which I guess doesn’t interest you; it simply remains that way.
> If you can find a blogger from the VS team
> maybe they can do something about it.
Hmm, good idea. By the way, about two months ago I reported a more immediately obvious bug in VS .NET, Microsoft’s PSS replied that localized versions of VS .NET 2003 English are different from North American versions of VS .NET 2003 English (I’m not sure which English or English version I have since it came in MSDN), and then PSS replied that MS does not honor the printed warranties which it distributes with MSDN subscriptions. So you are right, a blog has a better chance. By any chance can you point me to one?
My focus is on the core Platform SDK. I have influence over that. I do not have influence over other parts of MSDN, like the Visual Studio or .NET Framework documentation. A list of Microsoft bloggers with areas of expertise can be found on http://blogs.gotdotnet.com/
How do I get the CStringT doc for VC6? The VC7 version is all I can find on MSDN.
Commenting closes after two weeks.
http://weblogs.asp.net/oldnewthing/archive/2004/02/21/77681.aspx
Sucking a file off the disk while converting its character set.
This is where LVN_SETINFOTIP is used.
PingBack from http://blog.m-ri.de/index.php/2007/05/31/_unicode-versus-unicode-und-so-manches-eigentuemliche/