Date: | September 6, 2017 / year-entry #201 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20170906-00/?p=96955 |
Comments: | 46 |
Summary: | No, but you can maybe fake it. |
The
Unfortunately, there is no way to change the setting at runtime,
nor is there an override in
But wait, all is not lost.
What the customer could do is ship two versions of the program,
byte-for-byte identical except that one of them has the
large-address-aware flag set and the other does not.
Another approach is to register the large-address-aware version in the Start menu, and have it check the feature-flight flag when it is launched. If the flag says to disable large-address-awareness, then the program launches the not-large-address-aware version with the same command line. Yes, it's a bit clunky, but at least it's do-able.
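Here is a minimal sketch of the launcher half of that approach, in C. It assumes a hypothetical naming convention in which the non-large-address-aware copy sits next to the main EXE with a "-nolaa" suffix. It leaves the feature-flight check to the caller, since the article does not say how that flag is stored. The names make_nolaa_path, relaunch_without_laa and NOLAA_SUFFIX are illustrative. The Win32 calls (GetModuleFileNameW, GetCommandLineW, CreateProcessW) are real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical naming convention: app.exe (large-address-aware) and
   app-nolaa.exe (not large-address-aware) ship side by side. */
#define NOLAA_SUFFIX L"-nolaa"

/* Turns "C:\dir\app.exe" into "C:\dir\app-nolaa.exe".
   Returns 1 on success, 0 if the file name has no extension or `out` is too small. */
int make_nolaa_path(const wchar_t *exe_path, wchar_t *out, size_t out_len)
{
    const wchar_t *dot = wcsrchr(exe_path, L'.');
    const wchar_t *slash = wcsrchr(exe_path, L'\\');
    size_t stem_len;

    if (dot == NULL || (slash != NULL && dot < slash)) {
        return 0;   /* the only dot is in a directory name, or there is none */
    }
    stem_len = (size_t)(dot - exe_path);
    if (stem_len + wcslen(NOLAA_SUFFIX) + wcslen(dot) + 1 > out_len) {
        return 0;
    }
    wmemcpy(out, exe_path, stem_len);
    wcscpy(out + stem_len, NOLAA_SUFFIX);
    wcscat(out, dot);
    return 1;
}

#ifdef _WIN32
/* Called by the large-address-aware build when the feature flight says to
   disable large-address-awareness. Starts the sibling EXE with the same
   command line; the caller should exit afterward. Returns 0 on success. */
int relaunch_without_laa(void)
{
    wchar_t self[MAX_PATH], target[MAX_PATH];
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    wchar_t *cmdline;
    BOOL ok;
    DWORD len = GetModuleFileNameW(NULL, self, MAX_PATH);

    if (len == 0 || len == MAX_PATH || !make_nolaa_path(self, target, MAX_PATH)) {
        return -1;
    }
    /* CreateProcessW may write to its command-line buffer, so pass a copy. */
    cmdline = _wcsdup(GetCommandLineW());
    if (cmdline == NULL) {
        return -1;
    }
    ok = CreateProcessW(target, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
    free(cmdline);
    if (!ok) {
        return -1;
    }
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
#endif
```

The child receives the parent's command line verbatim, including the original program name as argv[0]. A program that inspects argv[0] would need to allow for that.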
¹ To reduce disk space, they could move the bulk of their code
into a DLL and have the EXE be a stub that loads the DLL and then
calls the RunTheProgram function in the DLL.
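Such a stub might look roughly like the following. The export name RunTheProgram is the one quoted in the comments. Its signature (no arguments, int result) and the helper name run_program_from_dll are assumptions. Large-address-awareness is decided by the EXE's header, not the DLL's, so the same DLL can serve two stubs: one linked with /LARGEADDRESSAWARE and one without. The non-Windows branch exists only so the sketch compiles elsewhere.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>

/* Hypothetical signature for the DLL's exported entry point. */
typedef int (WINAPI *RUN_THE_PROGRAM)(void);
#endif

/* Loads `dll_name` and calls its RunTheProgram export.
   Returns the export's result, or -1 if the DLL or the export is missing. */
int run_program_from_dll(const char *dll_name)
{
#ifdef _WIN32
    HMODULE dll = LoadLibraryA(dll_name);
    RUN_THE_PROGRAM run;

    if (dll == NULL) {
        return -1;
    }
    run = (RUN_THE_PROGRAM)GetProcAddress(dll, "RunTheProgram");
    if (run == NULL) {
        FreeLibrary(dll);
        return -1;
    }
    /* The DLL stays loaded until the process exits. */
    return run();
#else
    (void)dll_name;   /* the technique is Windows-specific */
    return -1;
#endif
}
```

Each stub's wWinMain would then be little more than a call to run_program_from_dll with the name of the DLL the product ships. Any DLL name used with it here would be a placeholder.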
Comments (46)
>> To reduce disk space, they could move the bulk of their code into a DLL and have the EXE be a stub that loads the DLL and then calls the RunTheProgram function in the dll
Let’s be bold: one of the EXEs is a stub that loads the full EXE with LoadLibrary and calls its main() function :-)
That is something that would be nice to do.
Inb4 someone gets picky with why this doesn’t work.
Tried that; it’s theoretically possible if you’re willing to code against RawEntryPoint() and use a bunch of horrible-looking assembly to decode whether you are in DllMain or not.
You don’t need assembler to decode whether your entry point is called as _DllMainCRTStartup() or [w]WinMainCRTStartup(): use the MSC intrinsic _ReturnAddress() to determine whether your caller is NTDLL.dll or KERNEL32.dll.
And you don’t need assembler to clean up the stack: [w]WinMainCRTStartup() does not return, but needs to call ExitProcess(), so you can always use a _DllMainCRTStartup() entry point.
I can’t believe you’re recommending that people rely on an extremely subtle implementation detail.
I only wrote that you don’t need assembler to detect whether a “raw” entry point is called as DLL or application.
Regarding the subtle details: the _DLLMainCRTStartup() entry is only called by NT’s user-space module loader, which is implemented in NTDLL.dll. As long as the module loader continues to be implemented in NTDLL.dll, the _DLLMainCRTStartup() callback will be called from NTDLL.dll.
This hasn’t changed in the last 24 years.
Sometimes things that haven’t changed in 24 years change. Especially extremely implementation-dependent things like this. (Example coming in a few months.)
Of course things change from time to time.
I don’t expect any production code to rely on the fact that the _DLLMainCRTStartup() callback is called from within NTDLL.dll, as this detail is not documented.
Microsoft also dared to change other details of this interface at least twice in this millennium: with Windows XP they introduced the undocumented callback reason 4 (DLL_APPLICATION_VERIFIER), and since Windows 7 they call this entry twice with reason 1 (DLL_PROCESS_ATTACH) when the DLL is registered as an application verifier provider.
You say “I don’t expect any production code to rely on” what you wrote, but what you wrote was not qualified with any disclaimers like “Not for production use.” You just came right out and said to do it: “use the MSC intrinsic _ReturnAddress() to determine whether your caller is NTDLL.dll or KERNEL32.dll”.
I wrote what ACTUALLY works!
Maybe I should introduce a nitpicker’s corner ;-)
Again, you are not being clear whether the information you are providing is intended for production use. You say “This is what actually works” which implies that it is suitable for production. But it’s not. On this site, I try to be clear about the distinction between contractual behavior and implementation detail. Because I’ve seen what happens when people confuse the two.
I replied to the false statement “it’s theoretically possible if you’re willing to code against RawEntryPoint() and use a bunch of horrible-looking assembly to decode whether you are in DllMain or not” to show how this can be done practically and without a bunch of horrible-looking assembly.
The purpose/intention is solely educational.
It was not clear at what point we crossed from documented behavior to implementation-defined. Certainly nobody bothered to say that we landed in the implementation-defined part.
We never crossed that line, simply because Microsoft provides NO documentation on how or by whom _DLLMainCRTStartup() and [w]WinMainCRTStartup() are called.
The existing documentation for DllMain(), https://msdn.microsoft.com/en-us/library/ms682583.aspx and https://msdn.microsoft.com/en-us/library/ms682596.aspx, is even ambiguous: the name DllMain() is used for the function called by the CRT too. Only the documentation for the IDE/Compiler, https://msdn.microsoft.com/en-us/library/988ye33t.aspx, or the linker, https://msdn.microsoft.com/en-us/library/f9t8842e.aspx and https://msdn.microsoft.com/en-us/library/aa235421.aspx, introduces the names *MainCRTStartup and states their calling convention.
Even the fact that returning from an application’s *MainCRTStartup() function does not terminate the process when secondary threads have been created is NOT documented there.
Historically, MS has treated most implementation details as public interfaces and kept them backward compatible instead of breaking them in favor of a better technical solution as a whole. This is what happens when you make such decisions.
@640K:
No, this is what happens when people are lazy and/or do not want to play by the book.
Play stupid games (relying on implementation details), win stupid prizes (a nightmarish code base).
So you’re saying that we were always on the “documented and supported” side of the line? Or you’re saying that we were always on the “not documented and unsupported” side? Because the original comment sounded like a “working within the bounds of documented and supported” comment.
I choose the second alternative: we were always in the land of the undocumented.
I didn’t expect that I would need to mention that explicitly.
As a general rule, on this site, any venture into the undocumented is explicitly called out.
On the contrary, I started this thread while remaining in the land of the documented: https://blogs.msdn.microsoft.com/oldnewthing/20040614-00/?p=38903
The horrible assembly is for determining your load address directly and for repairing the stack.
You don’t need a bunch of ugly assembly to access your own module’s MZ and PE headers and determine whether it was built as a DLL or an EXE: use
extern IMAGE_DOS_HEADER __ImageBase;
to access the MZ header, get the offset of the PE header from __ImageBase.e_lfanew, then test …->FileHeader.Characteristics for IMAGE_FILE_DLL. Unfortunately, this does not tell whether your module’s raw entry point was called as [w]WinMainCRTStartup(void) or _DLLMainCRTStartup(HMODULE, DWORD, LPVOID).
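That header walk can be sketched with raw offsets, so it compiles without windows.h. The path is: MZ signature, then e_lfanew at offset 0x3C, then the "PE\0\0" signature, then IMAGE_FILE_HEADER, whose Characteristics field sits 22 bytes past e_lfanew. The DLL bit (IMAGE_FILE_DLL, 0x2000) lives there, as the later reply about IMAGE_FILE_HEADER::Characteristics says. OptionalHeader.DllCharacteristics holds unrelated flags such as IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE. The function and macro names are illustrative. On Windows one would pass (const unsigned char *)&__ImageBase, together with an assumed bound such as one page, since the headers sit at the start of the mapped image. The winnt.h structures express the same thing more readably.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from winnt.h, renamed so they cannot clash with it. */
#define MY_IMAGE_FILE_LARGE_ADDRESS_AWARE 0x0020
#define MY_IMAGE_FILE_DLL                 0x2000

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walks MZ header -> e_lfanew -> "PE\0\0" -> IMAGE_FILE_HEADER and stores its
   Characteristics field. Returns 0 if `image` does not look like a PE image. */
int pe_file_characteristics(const unsigned char *image, size_t size,
                            uint16_t *characteristics)
{
    uint32_t e_lfanew;

    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z') {
        return 0;
    }
    e_lfanew = read_le32(image + 0x3C);       /* IMAGE_DOS_HEADER.e_lfanew */
    if (e_lfanew > size || size - e_lfanew < 24) {
        return 0;
    }
    if (memcmp(image + e_lfanew, "PE\0\0", 4) != 0) {
        return 0;
    }
    /* 4-byte signature + 18 bytes of IMAGE_FILE_HEADER fields precede Characteristics. */
    *characteristics = (uint16_t)(image[e_lfanew + 22] | (image[e_lfanew + 23] << 8));
    return 1;
}
```

As the comment says, this tells how the module was linked, not how its entry point is being called at the moment.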
Fortunately, another commenter pointed out that LoadLibrary(“some.exe”) succeeds but does not call the entry point at all. Likewise, CreateProcess(“some.dll”, “*”, …) fails with ERROR_BAD_EXE_FORMAT.
So: your precondition that a module linked as DLL may be loaded as EXE and vice versa does not hold!
This is actually all documented in the linker documentation; you don’t even need to do any of that work. You can tell NTLOADER exactly what you want called. That said, you DO need to meet your side of the contract if you’re going to do that.
No, you can’t tell NT’s module loader HOW to call the entry point: this is, as Raymond Chen already wrote, an implementation detail. JFTR: you can specify any symbol you like with /ENTRY: to the linker. If this symbol’s prototype does not match the prototype expected by NT’s module loader, your module will crash sooner or later: _DLLMainCRTStartup(HMODULE, DWORD, LPVOID) differs from [w]MainCRTStartup().
After playing around a lot with the raw entry point, it may surprise you to learn that you don’t have to call ExitProcess.
While it is true that the CRT/VCRuntime source shows that, after returning from (w)main or (w)WinMain, it calls exit, this is just an easy way to invoke the process cleanup. Returning from the entry point function exits the process naturally; in fact, exit is in general only called for unmanaged applications. This could change at some point in the future. So again, this is relying on an implementation detail.
Simply returning from [w]WinMainCRTStartup() does NOT terminate the process if there is a secondary thread running.
I don’t like zombies.
I once experimented with that too, in a simple program. As Raymond said in “The old-fashioned theory on how processes exit” ( https://blogs.msdn.microsoft.com/oldnewthing/20070502-00/?p=27023 ), it’s fine when you control all threads (since, assuming no one calls ExitProcess(), the process exits naturally when the last thread does), but nowadays you don’t. There is no CleanupThreads() function, no way for a DLL to register for “thread cleanup notification”, and no way (outside of .Net) to mark your threads as “background threads” or register a standard “this thread should exit its main loop within this second” event.
Which means, as soon as you use any function that spawns a background thread, you’re screwed.
In the lengthy exchange here, the “what actually works” comment is exactly what gets code into trouble, and causes the need for application compatibility shims, and all kinds of other things-that-should-not-be-necessary.
It has already been pointed out that Microsoft could choose to have an upgrade path that breaks lots of programs, and say “well, the program didn’t do the right thing”, even if the program did “what actually works” (or worked at the time). Then no one would upgrade, because no business-critical software would still run.
There is a huge distinction between what works and what works but is also future-proof. Some old code written for Windows 95 works, and even installs, and some does not. Some code relies on the presence of C:\Program Files, and some code does not. It may work for YOU, right now…
1. My comment targeted the wrong claim about “a bunch of horrible-looking assembly”.
2. Even Microsoft introduces incompatible changes from time to time, or removes old interfaces from the Win32 API.
3. If I write code which relies on undocumented behaviour, I don’t expect Microsoft or any other OS vendor to fix my bugs if things go wrong.
4. Microsoft “documents” quite a few interfaces only in header files like winnt.h. See Raymond’s next blog entry.
Do you even need to do that? I know that having a dll literally export a __stdcall function called “main”, and having the EXE simply DLLImport it worked just fine in a quick console mode app test. Is there some reason that one cannot do that with a function named WinMain, which just happens to be present in the dll, but is not the DLL’s entrypoint?
When LoadLibrary is called to load an executable rather than a DLL, it behaves like LoadLibraryEx(..DONT_RESOLVE_DLL_REFERENCES), so your executable loaded as a DLL will crash when it tries to call any API. So you essentially need to write your own PE loader (or trick NTDLL’s loader into thinking it’s really a DLL).
So how does the loader know it’s an executable not a DLL?
Obviously, by looking for IMAGE_FILE_DLL in IMAGE_FILE_HEADER::Characteristics.
An unexpected problem with doing that is that it causes Malware/Virus scanner heuristics to start flagging you up.
Clunky? I’d call it clever, particularly the small stub solution.
This is just a single byte change? Can you write a small utility that modifies the binary?
Modifying a binary is likely to trigger anti-malware protection software. It will certainly invalidate the digital signature.
Jigsaw patch will take care of the digital signature just fine: the modification is known in advance so we can just carry both signatures in the modifier. I prefer your solution of loading a giant DLL from two stub EXEs anyway.
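The "single byte change" asked about above can be sketched in C. The large-address-aware flag (0x0020) sits in the low byte of IMAGE_FILE_HEADER.Characteristics, 22 bytes past e_lfanew, so only that byte changes. The function name is illustrative and works on an in-memory copy of the file. Reading the file in and writing it back are left out. Microsoft's editbin tool already offers /LARGEADDRESSAWARE and /LARGEADDRESSAWARE:NO, so a hand-rolled patcher is only an illustration. As noted above, the change invalidates a digital signature. This sketch also does not recompute OptionalHeader.CheckSum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sets or clears IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020) in an in-memory copy
   of a PE file. Returns the file offset of the one byte that was modified,
   or -1 if the buffer does not look like a PE image. */
long set_large_address_aware(unsigned char *image, size_t size, int enable)
{
    size_t e_lfanew, offset;

    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z') {
        return -1;
    }
    e_lfanew = (size_t)image[0x3C] | ((size_t)image[0x3D] << 8) |
               ((size_t)image[0x3E] << 16) | ((size_t)image[0x3F] << 24);
    if (e_lfanew > size || size - e_lfanew < 24 ||
        memcmp(image + e_lfanew, "PE\0\0", 4) != 0) {
        return -1;
    }
    offset = e_lfanew + 22;   /* low byte of IMAGE_FILE_HEADER.Characteristics */
    if (enable) {
        image[offset] |= 0x20;
    } else {
        image[offset] &= (unsigned char)~0x20u;
    }
    return (long)offset;
}
```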
If the customer is concerned only about third-party DLLs in their own address space, they can just MEM_RESERVE the upper 2GB of their own (large) address space before loading the risky DLL.
And put their own heap manager over that reserved space to allocate their own buffers in that 21st-century high memory area.
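Here is a sketch of that reservation. It assumes a 32-bit large-address-aware process and leaves out the heap manager. It walks the address space upward from 2GB with VirtualQuery and reserves only MEM_FREE regions. Each region is trimmed to the allocation granularity, because VirtualAlloc rounds a reservation's starting address down to that boundary. The names reservable_span and reserve_high_address_space are illustrative. The caveats raised in the replies (the main thread's stack, DLLs loaded earlier, address space freed later) still apply.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

#define TWO_GB  0x80000000ULL
#define FOUR_GB 0x100000000ULL

/* Given a free region [base, base + size), computes the part at or above
   `threshold` whose ends are aligned to `granularity`. Returns 1 and fills
   *out_base / *out_size if anything is left, 0 otherwise. */
int reservable_span(uint64_t base, uint64_t size, uint64_t threshold,
                    uint64_t granularity, uint64_t *out_base, uint64_t *out_size)
{
    uint64_t start = base < threshold ? threshold : base;
    uint64_t end = base + size;

    start = (start + granularity - 1) / granularity * granularity; /* round up */
    end = end / granularity * granularity;                         /* round down */
    if (start >= end) {
        return 0;
    }
    *out_base = start;
    *out_size = end - start;
    return 1;
}

#ifdef _WIN32
/* Reserves (without committing) every free block between 2GB and 4GB in the
   current process. In a process that is not large-address-aware there is
   nothing above 2GB, so the loop does not run. */
void reserve_high_address_space(void)
{
    SYSTEM_INFO si;
    MEMORY_BASIC_INFORMATION mbi;
    uint64_t addr = TWO_GB;
    uint64_t limit;

    GetSystemInfo(&si);
    limit = (uint64_t)(uintptr_t)si.lpMaximumApplicationAddress + 1;
    if (limit > FOUR_GB) {
        limit = FOUR_GB;
    }
    while (addr < limit &&
           VirtualQuery((LPCVOID)(uintptr_t)addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        uint64_t region = (uint64_t)(uintptr_t)mbi.BaseAddress;
        uint64_t next = region + (uint64_t)mbi.RegionSize;
        uint64_t end = next > limit ? limit : next;
        uint64_t rbase, rsize;

        if (mbi.State == MEM_FREE &&
            reservable_span(region, end - region, TWO_GB,
                            si.dwAllocationGranularity, &rbase, &rsize)) {
            /* A failure here is tolerated; the sketch simply moves on. */
            VirtualAlloc((LPVOID)(uintptr_t)rbase, (SIZE_T)rsize,
                         MEM_RESERVE, PAGE_NOACCESS);
        }
        if (next <= addr) {
            break;  /* defensive: always make forward progress */
        }
        addr = next;
    }
}
#endif
```

The reservation has to happen before the risky DLL is loaded. Ideally the add-in would also run on a thread created afterward, so that the new thread's stack is placed in the space that is still free below 2GB.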
I expected that would be the hack. It doesn’t seem entirely safe though:
– What if the stack of the main thread ends up in the high 2G, and the add-in then crashes when doing arithmetic on addresses of local variables? Fix: run the add-in on another thread started after MEM_RESERVE.
– What if e.g. kernel32 gets loaded in the high 2G?
– What if a system DLL reserves some address space before your MEM_RESERVE, but frees it later, and the add-in then happens to allocate it?
I also expected there would be an override via UpdateProcThreadAttribute, but apparently not.
> What if the stack of the main thread ends up in the high 2G, and the add-in then crashes when doing arithmetic on addresses of local variables? Fix: run the add-in on another thread started after MEM_RESERVE.
Yeah. That fixes it. There’s also a trick using only documented APIs to reallocate your stack (begins with ConvertThreadToFiber).
> What if e.g. kernel32 gets loaded in the high 2G?
Impossible.
> What if a system DLL reserves some address space before your MEM_RESERVE, but frees it later, and the add-in then happens to allocate it?
Easily prevented by delay-loading everything but kernel32.dll.
BTW, I think system DLLs never load above the 2GB boundary on a 32-bit system, because if they did, you couldn’t have any non-LargeAddressAware processes running on that system. Any time these return a memory pointer above 2GB, the non-LargeAddressAware processes may fail.
Only if their “currently preferred base address” (i.e. the one selected by ASLR) is available. Now, you are probably doing something nasty if the address where it wants to load kernel32.dll is already reserved before kernel32.dll is mapped, but for other system DLLs it might happen. The system tries to use the same address for all instances of the same DLL so that pages with relocations can be shared as much as possible. This also weakens ASLR, because an information leak in one process can be used as part of an exploit of another. Other OSes fix this by using position-independent code, but that comes with a performance penalty on x86.
That is at least how it works for 32-bit processes, I’m not sure about 64-bit processes since you already get IP-relative addressing for free.
I think the DLLs are preloaded, but each process gets the translated address mapped into its address space. And the system DLLs should always return an address within 2GB when a pointer is needed, like when calling GetProcAddress() for some function in kernel32.dll.
There were tons of old applications (and DLLs) that use signed variable types to store pointers. As the memory allocation rules for DLLs follow those of the main program, returning >2GB values from system DLLs would certainly induce hangs and crashes more often, and general users would think it’s Windows’ fault, as “the program runs without problem on WinXP”. I don’t think Microsoft will choose to risk that.
kernel32/kernelbase/user32 cannot be loaded above 2GB, because there are non-large-address-aware processes and these DLLs are always mapped at the same address in every process in the session (so ASLR randomizes their base only across reboots).
To make sure nothing is allocated or freed there, it’s better to reserve the memory before the process actually starts. That means calling CreateProcess(..CREATE_SUSPENDED..) and using VirtualQueryEx/VirtualAllocEx to reserve the high addresses. Then resume the process and somehow tell it where the reserved addresses it may actually use are.
I’m not sure CREATE_SUSPENDED will work here. During process creation the first thread will be created (suspended), but its user-mode stack will probably be allocated as part of that. I haven’t tested it, but it seems it must work like that, since if the stack were not allocated until the process was resumed, the stack allocation might fail, causing process creation to fail after CreateProcess has returned.
Yes, thread’s stack will be there as well as TEB, PEB, RTL_USER_PROCESS_PARAMETERS and some stuff related to SXS and compatibility layer.
But assuming it’s your application and you don’t use tricks like exiting from the main thread and continuing to live afterward (by bypassing the CRT’s main epilog that calls ExitProcess), everything should be fine.
PS: I mentioned VirtualQueryEx because, AFAIR, VirtualAlloc(Ex) (MEM_RESERVE) succeeds when trying to reserve already-reserved pages, so you need to check with VirtualQueryEx to find MEM_FREE regions and reserve those and only those. And (if you’re paranoid) that logic can be extended so that if it finds anything unexpected there (anything not listed in the first sentence), it disables the ‘risky’ extensions.