Date: | October 17, 2006 / year-entry #350 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20061017-03/?p=29363 |
Comments: | 35 |
Summary: | The "T" in LPTSTR comes from the "T" in TCHAR. I don't know for certain, but it seems pretty likely that it stands for "text". By comparison, the "W" in WCHAR probably comes from the C language standard, where it stands for "wide". |
The "T" in |
Comments (35)
Ain’t it nice that in .Net types have names instead of acronyms? ;-)
I always thought the "T" stood for "typed char" because just CHAR would have been confusing; though "text" makes more sense.
Hum. It’ll still always be "Long Pointer To a STRing" for me.
Why are the Win32 calls named ‘W’ for ‘Wide’ and ‘A’ for ‘ANSI’?
Why not ‘W’ and ‘N’ (for ‘Narrow’), or ‘U’ (for ‘Unicode’) and ‘A’ (for ‘ANSI’)?
(It’s especially galling, since it’s the ‘A’ functions that support MBCS …)
[Wow, I think that’s a new record. A two-sentence post, and somebody doesn’t read the second sentence. As for the A, that was discussed two years ago. -Raymond]
I think it’s T for Typed Char, as stated above, since T can change type based on the compiler settings (with _UNICODE defined it’s a WCHAR; with _UNICODE not defined it’s a CHAR, i.e. ANSI). Similar to the _T("") macro, which does the same thing with a constant string, as opposed to L"", which always makes a wide string.
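A minimal sketch of that switch, assuming the usual tchar.h behavior (the variable names here are purely illustrative):

#include <wchar.h>   /* wchar_t in plain C */
#include <tchar.h>   /* TCHAR and _T(), which follow _UNICODE */

/* With _UNICODE defined, TCHAR is wchar_t and _T("hello") becomes L"hello";
   without it, TCHAR is char and _T("hello") stays "hello". */
TCHAR tgreeting[] = _T("hello");   /* follows the compiler settings */
wchar_t wgreeting[] = L"hello";    /* always wide, no matter the settings */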
>Wow, I think that’s a new record. A two-sentence post, and somebody doesn’t read the second sentence.
I think you misunderstood my point. I’m obviously aware that the programming language contains ‘wide chars’, and it doesn’t take much to figure out that ‘W’ comes from ‘wide char’.
My point is that, if you’ve got two things to distinguish by a
naming convention, you could name them systematically based on size
(‘wide’ or ‘narrow’) or on content (‘unicode’ versus ‘ansi’).
Mixing the two (‘wide’ versus ‘ansi’) seems a little, well,
arbitrary.
>As for the A, that was discussed two years ago.
That’s as may be, but I wasn’t here two years ago, and it didn’t occur to me to check whether this subject was already covered.
[Given that the type for Unicode characters is WCHAR, decorating functions as GetWindowTextU would lead to people saying “That’s completely moronic. It should be decorated with a W. WCHAR, GetWindowTextW. Duh.” What I’ve learned from writing this blog is that no matter what you do, half of the people will tell you that you made the obviously wrong decision and that even a moron would have recognized that the other way was clearly superior. (As for old topics: Perhaps I should just put the blog on infinite repeat. Every day, I just repost something from two years ago. Would save me a lot of work.) -Raymond]
Why comment, if you’re unwilling to do any research? Do you honestly expect your complaints to be taken seriously when you respect someone that little?
Now if it was Q or G, maybe you’d have a point.
Let us all be nice.
First, Raymond types at a high level. I am mid-level, in the usual Win32/C realm he discusses, and out of date. Slack is due. We are here at his sufferance. Did you get what you paid for?!?
On the other hand, the above poster reads the blog to learn, so let us not try to discourage him.
Nivenesque third hand, keep up the posts, without dupes.
Thx
As far as I remember, ‘T’ means ‘transparent’ – by using TCHAR as character type you handle character size transparently.
<<half of the people will tell you that you made the obviously wrong decision>>
Here you are wrong. At least 70% will say you made the wrong decision :-) And if you change lines, the one you just left will go faster. Murphy is always right :-)
<<"W" in WCHAR probably comes from the C language standard>>
I think WCHAR was used in Windows before the C standard decided to adopt the half-baked wchar_t.
Wow, things can certainly get heated when one asks what appears (to me) to be a straightforward question.
I take it the answer is "it’s W versus A (rather than W versus N, or W versus C) for no particular reason".
Lest you think I am somehow telling you that you made the wrong decision, I’ll point out that the only note of complaint was me using "galling" about using MBCS with functions tagged "A". Which isn’t really that irritating to me personally, since I tend to use Unicode.
The T in TCHAR definitely means "TEXT", as Raymond thought. And I have proof :)
In tchar.h there is a macro _T, intended to be used around your TCHAR-based string constants [e.g. _T("foo")], so that they will change to Unicode as appropriate as well.
There is an alias for the same macro, which actually predates it (it was shortened because it was a hassle to type the whole thing in everywhere). And guess what it’s called? That’s right, _TEXT.
So an LPCTSTR is a Long Pointer to a Constant Text STRing (or _TEXT String).
The T may come from _T and _TEXT historically, but it is a bit unhelpful, because the difference between LPTSTR and LPSTR is not that one is ‘T’ext and the other is not; it is that one is ‘T’yped, i.e. it will switch between W and A. But oh well; that’s how these things come about.
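A simplified sketch of how those typedefs relate (not the literal header text; in the real headers the Win32 TCHAR follows the UNICODE define):

typedef char CHAR;
typedef wchar_t WCHAR;
#ifdef UNICODE
typedef WCHAR TCHAR;
#else
typedef CHAR TCHAR;
#endif
typedef CHAR *LPSTR;    /* always narrow */
typedef WCHAR *LPWSTR;  /* always wide */
typedef TCHAR *LPTSTR;  /* the ‘T’yped one, switching between the two */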
@Raymond
‘What I’ve learned from writing this blog is that no matter what you do, half of the people will tell you that you made the obviously wrong decision and that even a moron would have recognized that the other way was clearly superior.’
It took years of blogging to work this out?! Did the religious wars regarding the best placement of { in code not give you an insight into typical programmer mentality? Obviously K&R bracing is best and emacs is the only text editor – only a moron would disagree. Oh, and pointers should be declared thus: ‘int* ptr’. All alternatives such as ‘int *ptr’ and ‘int * ptr’ are unholy.
@Mihai
Ah, but multiple declarations on a single line are wrong too ;)
<<pointers should be decalred thus: ‘int* ptr’>>
Nope, ‘int *ptr’ is the right thing :-)
Only half kidding. This is my reasoning:
int* a, b, c; // a, b, c are int* ???
versus
int *a, b, c; // only a is pointer.
Amazing the arguments people can have in the comments of blog posts, isn’t it? I was reminded of this
>half of the people will tell you that you made the obviously wrong decision<
Is that a problem, or is this an answer to Dave’s point?
I read such blogs to see why and where different minds come to
different conclusions. Sometimes they match my thinking, sometimes not,
sometimes they change my mind.
I’ve read Dave’s first post and agreed with him. Then I read Raymond’s note on it, and got confused about where (like Dave) I had missed something.
After re-reading the blog statement and Dave’s post several times, I do not see a reason to award Dave a Guinness Book entry for “fewest sentences understood”.
Dave’s reasoning is comprehensible, and it obviously applies also to WCHAR, which could also have been named UCHAR.
Thus, arguing with “Given that the type for Unicode characters is WCHAR” is just not applicable, as it does not take Dave’s point into account.
After reading the first two sentences of Dave’s post, this was clear to me; how about you?
Regards Xavi
(unrelated to any of the posters here)
[WCHAR, because that’s what the C language calls it (wchar_t). -Raymond]
“I think WCHAR was used in Windows before the C standard decided to adopt the half-baked wchar_t.”
Well, wchar_t was in the first (1990) ISO C standard, and AFAIK most of the things in the standard had been tried out in /at least/ one implementation for some time to make sure it was workable, so it’d almost certainly have been available in a couple of compilers since 1989.
If anyone knows when WCHAR was first introduced into Windows, that’d be interesting.
Xavi wrote:
Not really. UCHAR is already defined to mean "unsigned char".
To clear up a common misunderstanding… the UNICODE define affects Win32, the _UNICODE define affects the Microsoft C runtime library. The difference is subtle. UNICODE controls the following definitions:
TCHAR
LPTSTR
TEXT()
Win32 A/W APIs and structures
_UNICODE, on the other hand:
_TCHAR
_TINT
_T()
CRT string APIs and related structures
Yes, you can define UNICODE and _UNICODE differently. I think ATL has specific support for that. IIRC it also supports the obscure corner case in which OLECHAR is defined as CHAR rather than WCHAR (16-bit Windows & MacOS)
Closing note about LPSTR/LPWSTR/LPTSTR: they are not just typedefs for CHAR/WCHAR/TCHAR *, they mean "NUL-terminated string of CHAR/WCHAR/TCHAR". This is actually enforced in PREfast ("lint" mode of the latest Microsoft compiler)
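A short sketch of that split, assuming both symbols are defined before the headers are included:

#define UNICODE    /* Win32 side: GetWindowText -> GetWindowTextW, LPTSTR -> LPWSTR */
#define _UNICODE   /* CRT side: _TCHAR -> wchar_t, _tcslen -> wcslen, _T("x") -> L"x" */
#include <windows.h>
#include <tchar.h>

void demo(HWND hwnd)
{
    TCHAR title[64];                   /* WCHAR here, because UNICODE is defined */
    GetWindowText(hwnd, title, 64);    /* expands to GetWindowTextW */
    size_t n = _tcslen(title);         /* expands to wcslen, because _UNICODE is defined */
    (void)n;
}

Define only one of the two, and the Win32 macros and the CRT macros land on opposite sides of the fence, which is exactly the subtlety being described.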
<<If anyone knows when WCHAR was first introduced into Windows, that’d be interesting.>>
The need was definitely out there, but who was really, really first is tough to say.
Windows NT 3.1 was Unicode, and was started in 1988 (http://en.wikipedia.org/wiki/Windows_NT_3.1)
And no, I don’t have a Windows NT 3.1 SDK around to check if it used WCHAR :-)
This is pointless, but I actually did check the other day when ‘wchar’ became part of the C Standard – figuring that it must have been much later than NT. easynews revealed discussion about gcc’s wchar early in the 90s. So maybe it wasn’t in the actual standard yet, but it existed on Unix.
p.s. : yes, UCHAR is a common typedef for ‘unsigned char’.
wchar_t in Unix was not originally Unicode, though it can be (I’m not sure if anyone does it though).
A few things I’ve read lend the impression that wchar_t in NT was not originally Unicode, though it is now.
For comparison, char was not originally EBCDIC, but it can be. For comparison, char was not originally either (depending on its value) 100% or 50% of a Shift-JIS character, but it can be.
True, Norman; wchar_t does not mean Unicode or anything in particular except ‘wide’.
On Unix, wchar_t is in fact 32-bit, but the spec says nothing about how it should be implemented.
Wednesday, October 18, 2006 11:46 PM by Ulric
I’m pretty sure it was 16-bit when I was reading related code. Even though most processors had 32-bit words, 16-bit unsigned shorts existed.
Of course that was for EUC Japanese only, so other locales could vary. Locales existed by the time I was reading that code, but they might not have existed yet when that code was written.
Ouch, I lied, sorry.
Should be:
Even though most general-purpose processors (that could run compilers, database systems, etc.) had 32-bit words, 16-bit unsigned shorts existed.
I believe Raymond should take heart that there is likely a silent majority who reads what Raymond has to say… nods… and either agrees with Raymond or appreciates the insight regardless of personal opinion.
They do not bother reading the comments.
Well, you need 32 bits to store a UCS-4 character. (Or is that UTF-32? I can never remember which is which. In this case I think they’re exactly the same bit pattern for every code point, but in the case of UTF-16/UCS-2, some non-BMP stuff is different. One of UTF-16/UCS-2 can’t represent non-BMP characters, and I can never remember which.)
But anyway, if you want your strings to be arrays of characters, without any special escaping required at all, you’ll need a 32-bit wchar_t. And you’ll waste incredible amounts of memory on a long wchar_t string that contains only 7-bit-ASCII characters, but there’s not much you can do about that.
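To make the tradeoff concrete, here is one non-BMP code point (U+1D11E, picked purely as an example) worked out by hand: UTF-32 spends a single 32-bit unit on it, UTF-16 needs a surrogate pair, and UCS-2 cannot represent it at all:

#include <stdio.h>

int main(void)
{
    unsigned long cp = 0x1D11E;                    /* a code point above the BMP */
    unsigned long v  = cp - 0x10000;               /* the 20 bits split across the pair */
    unsigned hi = 0xD800 + (unsigned)(v >> 10);    /* high surrogate: 0xD834 */
    unsigned lo = 0xDC00 + (unsigned)(v & 0x3FF);  /* low surrogate:  0xDD1E */
    printf("UTF-32: %08lX  UTF-16: %04X %04X\n", cp, hi, lo);
    return 0;
}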
@Mihai
typedef int* intptr;
intptr a, b, c;
‘int*’ IS the type; I think it’s just the C syntax that makes it look stupid without the typedef.
>Given that the type for Unicode
>characters is WCHAR
No, it is actually unsigned short, so only GetWindowTextUS would be proper. :)
> int* a, b, c; // a, b, c are int* ???
>versus
> int *a, b, c; // only a is pointer.
Tsk, tsk, tsk, only a is a pointer in both cases.
Last time I checked you were supposed to write:
int *a, *b, *c;
if you wanted them all to be pointers.
Igor, that’s exactly the point. The first form makes it *look* like "a", "b", and "c" are of type "int*", instead of "a" being "int*" and "b" and "c" being "int", because the * is next to the "int" instead of the "a".
An int in ANSI C is supposed to be the natural integer size of the hardware. Why isn’t int 64-bit in Visual C++ on Win64?
>Igor, that’s exactly the point.
>The first form makes it *look* like…
I personally always use the second form, but everyone familiar with C syntax knows that the compiler doesn’t give a damn where the whitespace in the above-mentioned code is.
>Why isn’t int 64-bit on visual c++ in win64?
Apart from Raymond’s explanation there is one potential reason I perceive — code size.
Since x64 is essentially just an expansion of the GPRs (exactly as happened in the 16-bit to 32-bit transition), and the underlying CPU architecture wasn’t completely widened (ALUs, multipliers, data paths, etc.), both Intel and AMD suggested that 64-bit code still use 32-bit integers wherever possible because:
a) it is more efficient in terms of performance, and
b) instructions that use 32-bit operands are shorter, leading to more compact code which fits better in the L1 instruction cache, thus again improving performance.
As a matter of fact, I believe that is the only justifiable reason.
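A quick way to see the resulting model for yourself (Win64 is LLP64: int and long stay 32-bit, only pointers and long long widen); the exact numbers are of course platform-dependent:

#include <stdio.h>

int main(void)
{
    /* On 64-bit Windows this prints 4, 4, 8, 8 (LLP64);
       a typical 64-bit Unix prints 4, 8, 8, 8 (LP64). */
    printf("int=%u long=%u long long=%u void*=%u\n",
           (unsigned)sizeof(int), (unsigned)sizeof(long),
           (unsigned)sizeof(long long), (unsigned)sizeof(void *));
    return 0;
}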
Raymond mentioned that changes in the SDK would break code. I think there are alternatives to what they did, based on one simple fact:
The fact is that you release a new Platform SDK to enable developers to use features present in future operating systems.
Based on that fact we may assume that whoever installs the latest Platform SDK is doing it because they intend to support those future operating systems.
From that it is easy to conclude that they will bite the bullet and fix their code.
Those who don’t want to fix it can continue using the old Platform SDK.
That is at least what I would do because this will have to be fixed sooner or later.
Sorry for the O/T.