Can I enable Large Address Awareness dynamically at runtime?

Comments (46)

Stéphan Leclercq says:

September 6, 2017 at 7:51 am

>> To reduce disk space, they could move the bulk of their code into a DLL and have the EXE be a stub that loads the DLL and then calls the RunTheProgram function in the dll

Let’s be bold: one of the EXEs is a stub that loads the full EXE with LoadLibrary and calls its main() function :-)
1. Darran Rowe says:
  
  September 6, 2017 at 8:08 am
  
  That is something that would be nice to do.
  Inb4 someone gets picky with why this doesn’t work.
2. Joshua says:
  
  September 6, 2017 at 8:52 am
  
  Tried that; it’s theoretically possible if you’re willing to code against RawEntryPoint() and use a bunch of horrible-looking assembly to decode whether you are DllMain or not.
  1. Stefan Kanthak says:
    
    September 6, 2017 at 12:03 pm
    
    You don’t need assembler to decode whether your entry point is called as _DllMainCRTStartup() or [w]WinMainCRTStartup(): use the MSC intrinsic _ReturnAddress() to determine whether your caller is NTDLL.dll or KERNEL32.dll
    And you don’t need assembler to clean up the stack: [w]WinMainCRTStartup() does not return, but needs to call ExitProcess(), so you can always use a _DllMainCRTStartup() entry point.
    1. Raymond Chen - MSFT says:
      
      September 6, 2017 at 12:17 pm
      
      I can’t believe you’re recommending that people rely on an extremely subtle implementation detail.
      1. Stefan Kanthak says:
        
        September 6, 2017 at 12:46 pm
        
        I only wrote that you don’t need assembler to detect whether a “raw” entry point is called as DLL or application.
        
        Regarding the subtle details: the _DllMainCRTStartup() entry is only called by NT’s user-space module loader, which is implemented in NTDLL.dll. As long as the module loader continues to be implemented in NTDLL.dll, the _DllMainCRTStartup() callback will be called from NTDLL.dll.
        This hasn’t changed in the last 24 years.
      2. Raymond Chen - MSFT says:
        
        September 6, 2017 at 1:02 pm
        
        Sometimes things that haven’t changed in 24 years change. Especially extremely implementation-dependent things like this. (Example coming in a few months.)
      3. Stefan Kanthak says:
        
        September 6, 2017 at 2:14 pm
        
        Of course things change from time to time.
        I don’t expect any production code to rely on the fact that the _DllMainCRTStartup() callback is called from within NTDLL.dll, as this detail is not documented.
        Microsoft also dared to change other details of this interface at least twice in this millennium: with Windows XP they introduced the undocumented callback reason 4 alias DLL_APPLICATION_VERIFIER, and since Windows 7 they call this entry twice with reason 1 alias DLL_PROCESS_ATTACH when the DLL is registered as an application verifier provider.
      4. Raymond Chen - MSFT says:
        
        September 6, 2017 at 2:29 pm
        
        You say “I don’t expect any production code to rely on” what you wrote, but what you wrote was not qualified with any disclaimers like “Not for production use.” You just came right out and said to do it: “use the MSC intrinsic _ReturnAddress() to determine whether your caller is NTDLL.dll or KERNEL32.dll”.
      5. Stefan Kanthak says:
        
        September 6, 2017 at 2:36 pm
        
        I wrote what ACTUALLY works!
        Maybe I should introduce a nitpickers corner.-)
      6. Raymond Chen - MSFT says:
        
        September 6, 2017 at 2:37 pm
        
        Again, you are not being clear whether the information you are providing is intended for production use. You say “This is what actually works” which implies that it is suitable for production. But it’s not. On this site, I try to be clear about the distinction between contractual behavior and implementation detail. Because I’ve seen what happens when people confuse the two.
      7. Stefan Kanthak says:
        
        September 6, 2017 at 2:53 pm
        
        I replied to the false statement “it’s theoretically possible if you’re willing to code against RawEntryPoint() and use a bunch of horrible-looking assembly to decode whether you are DllMain or not” to show how this can be done practically and without a bunch of horrible-looking assembly.
        The purpose/intention is solely educational.
      8. Raymond Chen - MSFT says:
        
        September 6, 2017 at 3:11 pm
        
        It was not clear at what point we crossed from documented behavior to implementation-defined. Certainly nobody bothered to say that we landed in the implementation-defined part.
      9. Stefan Kanthak says:
        
        September 7, 2017 at 6:05 am
        
        We never crossed that line, simply because Microsoft provides NO documentation of how or by whom _DllMainCRTStartup() and [w]WinMainCRTStartup() are called.
        The existing documentation for DllMain(), https://msdn.microsoft.com/en-us/library/ms682583.aspx and https://msdn.microsoft.com/en-us/library/ms682596.aspx, is even ambiguous: the name DllMain() is used for the function called by the CRT too. Only the documentation for the IDE/Compiler, https://msdn.microsoft.com/en-us/library/988ye33t.aspx, or the linker, https://msdn.microsoft.com/en-us/library/f9t8842e.aspx and https://msdn.microsoft.com/en-us/library/aa235421.aspx, introduces the names *MainCRTStartup, and tells their calling convention.
        Even the fact that returning from an application’s *MainCRTStartup() function does not terminate the process when secondary threads have been created is NOT documented there.
      10. 640k says:
        
        September 6, 2017 at 10:42 pm
        
        Historically, MS has treated most implementation details as public interfaces and kept them backwards compatible instead of breaking them to maintain a better technical solution as a whole. This is what happens when you make such decisions.
      11. Maximilien Noal says:
        
        September 7, 2017 at 5:16 am
        
        @640K :
        
        No, this is what happens when people are lazy and/or do not want to play by the book.
        
        Play stupid games (: relying on implementation detail), win stupid prizes (: a nightmarish code base).
      12. Raymond Chen - MSFT says:
        
        September 7, 2017 at 7:39 am
        
        So you’re saying that we were always on the “documented and supported” side of the line? Or you’re saying that we were always on the “not documented and unsupported” side? Because the original comment sounded like a “working within the bounds of documented and supported” comment.
      13. Stefan Kanthak says:
        
        September 7, 2017 at 7:59 am
        
        I choose the second alternative: we were always in the land of the undocumented.
        I didn’t expect that I would need to mention that explicitly.
      14. Raymond Chen - MSFT says:
        
        September 7, 2017 at 9:13 am
        
        As a general rule, on this site, any venture into the undocumented is explicitly called out.
      15. Joshua says:
        
        September 7, 2017 at 9:35 am
        
        On the contrary, I started this thread while remaining in the land of the documented: https://blogs.msdn.microsoft.com/oldnewthing/20040614-00/?p=38903
        
        The horrible assembly is for determining your load address directly and for repairing the stack.
      16. Stefan Kanthak says:
        
        September 8, 2017 at 8:27 am
        
        You don’t need a bunch of ugly assembly to access your own module’s MZ and PE headers and determine whether it was built as a DLL or an EXE: use extern IMAGE_DOS_HEADER __ImageBase; to access the MZ header, get the offset of the PE header from __ImageBase.e_lfanew, then test IMAGE_FILE_DLL in …->FileHeader.Characteristics.
        Unfortunately, this does not tell whether your module’s raw entry point was called as [w]WinMainCRTStartup(void) or _DllMainCRTStartup(HMODULE, DWORD, LPVOID).
        Fortunately, another commenter pointed out that LoadLibrary("some.exe") succeeds, but does not call the entry point at all. Likewise, CreateProcess("some.dll", "*", …) fails with ERROR_BAD_EXE_FORMAT.
        So: your precondition that a module linked as DLL may be loaded as EXE and vice versa does not hold!
      17. kantos says:
        
        September 7, 2017 at 10:02 am
        
        This is actually all documented in the linker documentation; you don’t even need to do any of that work… you can tell NTLOADER exactly what you want called. That said, you DO need to meet your side of the contract if you’re going to do that.
      18. Stefan Kanthak says:
        
        September 8, 2017 at 8:36 am
        
        No, you can’t tell NT’s module loader HOW to call the entry point: this is, as Raymond Chen already wrote, an implementation detail. JFTR: you can specify any symbol you like with /ENTRY: to the linker. If this symbol’s prototype does not match the prototype expected by NT’s module loader, your module will crash sooner or later: _DllMainCRTStartup(HMODULE, DWORD, LPVOID) differs from [w]MainCRTStartup().
    2. Darran Rowe says:
      
      September 6, 2017 at 2:41 pm
      
      After playing around a lot with the raw entry point, it may surprise you to learn that you don’t have to call ExitProcess.
      While it is true that the CRT/VCRuntime source shows that after it returns from (w)main or (w)WinMain it calls exit, this is just an easy way to invoke the process cleanup. Returning from the entry point function exits the process naturally; in fact, exit in general is only called for unmanaged applications. This could change at some point in the future. So again, this is relying on an implementation detail.
      1. Stefan Kanthak says:
        
        September 6, 2017 at 2:56 pm
        
        Simply returning from [w]WinMainCRTStartup() does NOT terminate the process if there is a secondary thread running.
        I don’t like zombies.
      2. Medinoc says:
        
        September 7, 2017 at 1:54 am
        
        I once experimented with that too, in a simple program. As Raymond said in “The old-fashioned theory on how processes exit” ( https://blogs.msdn.microsoft.com/oldnewthing/20070502-00/?p=27023 ), it’s fine when you control all threads (since assuming no one calls ExitProcess(), the process exits naturally when the last thread does) but nowadays you don’t. There is no CleanupThreads() function, no way for a DLL to register for “thread cleanup notification”, and no way (outside of .Net) to mark your threads as “background threads” or register a standard “this thread should exit its main loop within this second” event.
        
        Which means, as soon as you use any function that spawns a background thread, you’re screwed.
    3. DWalker07 says:
      
      September 7, 2017 at 9:25 am
      
      In the lengthy exchange here, the “what actually works” comment is exactly what gets code into trouble, and causes the need for application compatibility shims, and all kinds of other things-that-should-not-be-necessary.
      
      It has already been pointed out that Microsoft could choose to have an upgrade path that breaks lots of programs, and say “well the program didn’t do the right thing”, even if the program did “what actually works” (or worked at the time). Then no one would upgrade, because no business-critical software would still run.
      
      There is a huge distinction between what works, and what works but is also future-proof. Some old code written for Windows 95 works, and even installs, and some does not. Some code relies on the presence of C:\Program Files, and some code does not. It may work for YOU, right now….
      1. Stefan Kanthak says:
        
        September 8, 2017 at 10:39 am
        
        1. my comment targeted the wrong claim “a bunch of horrible-looking assembly”.
        2. even Microsoft introduces incompatible changes from time to time, or removes old interfaces from the Win32 API.
        3. if I write code which relies on undocumented behaviour I don’t expect Microsoft or any other OS-vendor to fix my bugs if things go wrong.
        4. Microsoft “documents” quite some interfaces in header files like winnt.h only. See Raymond’s next blog entry.
  2. Kevin Cathcart says:
    
    September 14, 2017 at 10:50 am
    
    Do you even need to do that? I know that having a dll literally export a __stdcall function called “main”, and having the EXE simply DLLImport it worked just fine in a quick console mode app test. Is there some reason that one cannot do that with a function named WinMain, which just happens to be present in the dll, but is not the DLL’s entrypoint?
3. Killer{R} says:
  
  September 6, 2017 at 1:03 pm
  
  When LoadLibrary is called to load an executable rather than a DLL, it behaves like LoadLibraryEx(..DONT_RESOLVE_DLL_REFERENCES), so your executable loaded as a DLL will crash when it tries to call any API. So you essentially need to write your own PE loader (or spoof NTDLL’s one into thinking it’s really a DLL).
  1. Joshua says:
    
    September 6, 2017 at 1:29 pm
    
    So how does the loader know it’s an executable not a DLL?
    1. Killer{R} says:
      
      September 6, 2017 at 1:52 pm
      
      obviously by looking for IMAGE_FILE_DLL in IMAGE_FILE_HEADER::Characteristics
  2. ZLB says:
    
    September 8, 2017 at 7:10 am
    
    An unexpected problem with doing that is that it causes Malware/Virus scanner heuristics to start flagging you up.
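
The check Killer{R} describes above (looking for IMAGE_FILE_DLL in IMAGE_FILE_HEADER::Characteristics) can be sketched in portable C against a raw image buffer. This is only an illustration, not the loader’s actual code: the names pe_is_dll, rd16 and rd32 are made up for this example, and the offsets assume the documented PE layout (e_lfanew at 0x3C of the MZ header, Characteristics 18 bytes into the 20-byte file header that follows the "PE\0\0" signature).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE_IMAGE_FILE_DLL 0x2000u /* value of IMAGE_FILE_DLL in winnt.h */

/* Read little-endian integers without assuming alignment or host byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 if the image is marked as a DLL,
 * 0 if it is not, and -1 if the buffer does not look like a PE image. */
int pe_is_dll(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;                      /* no MZ header */

    uint32_t pe = rd32(img + 0x3C);     /* e_lfanew: offset of the PE header */
    if (pe > len || len - pe < 24)      /* 4-byte signature + 20-byte file header */
        return -1;
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;

    /* Characteristics is the last 2-byte field of IMAGE_FILE_HEADER. */
    uint16_t characteristics = rd16(img + pe + 4 + 18);
    return (characteristics & PE_IMAGE_FILE_DLL) ? 1 : 0;
}
```

Inside a running module the same field could be reached through __ImageBase instead of a file buffer, as Stefan Kanthak suggests; either way it only says how the module was linked, not how its entry point was called.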
mikeb says:

September 6, 2017 at 9:35 am

Clunky? I’d call it clever, particularly the small stub solution.
George says:

September 6, 2017 at 11:40 am

This is just a single byte change? Can you write a small utility that modifies the binary?
1. Raymond Chen - MSFT says:
  
  September 6, 2017 at 11:59 am
  
  Modifying a binary is likely to trigger anti-malware protection software. It will certainly invalidate the digital signature.
  1. Joshua says:
    
    September 6, 2017 at 3:26 pm
    
    Jigsaw patch will take care of the digital signature just fine: the modification is known in advance so we can just carry both signatures in the modifier. I prefer your solution of loading a giant DLL from two stub EXEs anyway.
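
For what George’s “small utility” would have to touch: the large-address-aware flag is IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020) in the Characteristics field of the PE file header, so flipping it changes one bit in one byte. The sketch below does that on an in-memory copy of the image. It is a hedged illustration only: pe_set_large_address_aware and pe_characteristics_offset are hypothetical names, reading and writing the file is left out, and the optional-header checksum is not recomputed. Raymond’s caveats still apply: patching the file invalidates its digital signature and may alarm anti-malware software.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE_LARGE_ADDRESS_AWARE 0x20u /* IMAGE_FILE_LARGE_ADDRESS_AWARE in winnt.h */

/* Hypothetical helper: returns the buffer offset of the low byte of
 * IMAGE_FILE_HEADER::Characteristics, or 0 if this is not a PE image. */
static size_t pe_characteristics_offset(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;

    /* e_lfanew (little-endian) at offset 0x3C of the MZ header */
    uint32_t pe = (uint32_t)img[0x3C] | ((uint32_t)img[0x3D] << 8) |
                  ((uint32_t)img[0x3E] << 16) | ((uint32_t)img[0x3F] << 24);
    if (pe > len || len - pe < 24)
        return 0;
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return 0;

    return (size_t)pe + 4 + 18; /* signature + offset within the file header */
}

/* Sets (enable != 0) or clears the flag in place.
 * Returns 1 if the byte changed, 0 if it already had the requested value,
 * and -1 if the buffer is not a PE image. */
int pe_set_large_address_aware(uint8_t *img, size_t len, int enable)
{
    size_t off = pe_characteristics_offset(img, len);
    if (off == 0)
        return -1;

    uint8_t old = img[off];
    uint8_t updated = enable ? (uint8_t)(old | PE_LARGE_ADDRESS_AWARE)
                             : (uint8_t)(old & ~PE_LARGE_ADDRESS_AWARE);
    if (updated == old)
        return 0;

    img[off] = updated; /* the flag lives in the low byte, so one byte suffices */
    return 1;
}
```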
Killer{R} says:

September 6, 2017 at 12:25 pm

If the customer is concerned only about 3rd-party DLLs in their own address space, then they can just MEM_RESERVE the upper 2GB of their own (large) address space before loading that risky DLL.
And put their own heap manager over that reserved space to allocate their own buffers in that 21st century’s high memory area.
1. ranta says:
  
  September 6, 2017 at 3:30 pm
  
  I expected that would be the hack. It doesn’t seem entirely safe though:
  – What if the stack of the main thread ends up in the high 2G, and the add-in then crashes when doing arithmetic on addresses of local variables? Fix: run the add-in on another thread started after MEM_RESERVE.
  – What if e.g. kernel32 gets loaded in the high 2G?
  – What if a system DLL reserves some address space before your MEM_RESERVE, but frees it later, and the add-in then happens to allocate it?
  
  I also expected there would be an override via UpdateProcThreadAttribute, but apparently not.
  1. Joshua says:
    
    September 6, 2017 at 5:59 pm
    
    > What if the stack of the main thread ends up in the high 2G, and the add-in then crashes when doing arithmetic on addresses of local variables? Fix: run the add-in on another thread started after MEM_RESERVE.
    
    Yeah. That fixes it. There’s also a trick using only documented APIs to reallocate your stack (begins with ConvertThreadToFiber).
    
    > What if e.g. kernel32 gets loaded in the high 2G?
    
    Impossible.
    
    > What if a system DLL reserves some address space before your MEM_RESERVE, but frees it later, and the add-in then happens to allocate it?
    
    Easily prevented by delay-loading everything but kernel32.dll.
  2. cheong00 says:
    
    September 6, 2017 at 7:53 pm
    
    Btw, I think system DLLs never load above the 2GB boundary on a 32-bit system, because if they did, you couldn’t have any non-LargeAddressAware processes running on that system: any time one of them returned a pointer above 2GB, the non-LargeAddressAware process might fail.
    1. poizan42 says:
      
      September 6, 2017 at 10:28 pm
      
      Only if their “currently preferred base address” (i.e. the one selected by ASLR) is available. Now you are probably doing something nasty if the address where it wants to load kernel32.dll is already reserved before kernel32.dll is mapped, but for other system DLLs it might happen. The reason that the system tries to use the same address for all instances of the same DLL is so pages with relocations can be shared as much as possible. This also weakens ASLR, because an information leak in one process can be used as part of an exploit of another. Other OSes fix this by using position-independent code, but that comes with a performance penalty on x86.
      
      That is at least how it works for 32-bit processes, I’m not sure about 64-bit processes since you already get IP-relative addressing for free.
      1. cheong00 says:
        
        September 7, 2017 at 1:18 am
        
        I think the DLLs are preloaded, but the process gets the translated address mapped into its address space. And the system DLLs should always return an address within 2GB when a pointer is needed, like when calling GetProcAddress() for some function in kernel32.dll.
        
        There were tons of old applications (and DLLs) that use a signed variable type to store pointers. As the memory allocation rules for DLLs follow those of the main program, returning values above 2GB from system DLLs would certainly induce hangs and crashes more often, and general users would think it’s Windows’ fault, as “the program runs without problem on WinXP”. I don’t think Microsoft will choose to risk that.
  3. Killer{R} says:
    
    September 7, 2017 at 1:08 am
    
    kernel32/kernelbase/user32 cannot be higher than 2GB because there are non-large-address-aware processes and these DLLs are always mapped at the same address in each process in a session (so ASLR randomizes their base across reboots).
    To make sure nothing is allocated/freed there, it’s better to reserve the memory before the process actually starts. That means CreateProcess(..CREATE_SUSPENDED..) and using VirtualQueryEx/VirtualAllocEx to reserve the high addresses. Then resume the process and tell it somehow where those reserved addresses are, so that it can actually use them.
    1. Stewart says:
      
      September 8, 2017 at 12:45 am
      
      I’m not sure CREATE_SUSPENDED will work here. During process creation the first thread will be created (suspended), but its user-mode stack will probably be allocated as part of that. I haven’t tested it, but it seems it must work like that, since if it didn’t allocate the stack until the process was resumed, the stack allocation might fail, causing process creation to fail after CreateProcess has returned.
      1. Killer{R} says:
        
        September 8, 2017 at 9:43 am
        
        Yes, the thread’s stack will be there, as well as the TEB, PEB, RTL_USER_PROCESS_PARAMETERS and some stuff related to SXS and the compatibility layer.
        But assuming it’s your application and you don’t use tricks like exiting from the main thread and continuing to live (by bypassing the CRT’s main epilog that calls ExitProcess), everything should be fine.
        PS: I mentioned VirtualQueryEx because, AFAIR, VirtualAlloc(Ex) with MEM_RESERVE succeeds when trying to reserve already reserved pages, so you need to check with VirtualQueryEx to find MEM_FREE regions and reserve them and only them. And (if you’re paranoid) that logic can be extended so that if it finds anything unexpected there (something not from the 1st sentence), it disables the ‘risky’ extensions.
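
The “find MEM_FREE and reserve them and only them” step in this thread can be sketched as a planning function, separate from the Win32 calls. This is an assumption-laden illustration, not tested against a real process: region_t is a hypothetical record that a caller would fill from a VirtualQueryEx walk (BaseAddress, RegionSize, State == MEM_FREE), span_t is the list of ranges one would then pass to VirtualAllocEx with MEM_RESERVE, addresses are 32-bit, and the allocation granularity is assumed to be 64 KB. It also does nothing about the races ranta raises (memory reserved earlier and freed later).

```c
#include <stddef.h>
#include <stdint.h>

#define GRANULARITY 0x10000u    /* assumed 64 KB allocation granularity */
#define HIGH_START  0x80000000u /* the 2 GB boundary */

/* Hypothetical record describing one region of a 32-bit address space. */
typedef struct {
    uint32_t base;    /* start address of the region */
    uint32_t size;    /* length in bytes */
    int      is_free; /* nonzero if the region is unallocated */
} region_t;

/* A range that the caller would try to reserve. */
typedef struct {
    uint32_t base;
    uint32_t size;
} span_t;

/* Collects the free spans that lie above 2 GB. Each span start is rounded
 * up to the allocation granularity, because a reservation request whose
 * address is rounded down could overlap a neighbouring in-use region.
 * Stores at most out_cap spans and returns how many spans were found. */
size_t plan_high_reservations(const region_t *regions, size_t n,
                              span_t *out, size_t out_cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (!regions[i].is_free)
            continue;

        /* 64-bit arithmetic so that base + size cannot wrap at 4 GB */
        uint64_t lo = regions[i].base;
        uint64_t hi = lo + regions[i].size; /* exclusive end */

        if (lo < HIGH_START)
            lo = HIGH_START;                /* ignore the part below 2 GB */
        lo = (lo + GRANULARITY - 1) & ~(uint64_t)(GRANULARITY - 1);

        if (lo >= hi)
            continue;                       /* nothing usable is left */

        if (count < out_cap) {
            out[count].base = (uint32_t)lo;
            out[count].size = (uint32_t)(hi - lo);
        }
        count++;
    }
    return count;
}
```

Keeping the planning separate from the system calls makes it easy to add the “paranoid” check Killer{R} mentions: the caller can inspect the non-free regions above 2 GB before deciding whether to load the risky extensions at all.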


Date:	September 6, 2017 / year-entry #201
Tags:	code
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20170906-00/?p=96955
Comments:	46
Summary:	No, but you can maybe fake it.