How can I reserve a range of address space and create nonzero memory on demand when the program reads or writes a page in the range, even when multithreading?

Last time, we described how you can become the page access manager for a range of pages, but it required that all the accesses came from one thread at a time because you don't want another thread to be able to access the memory while it is still being prepared. That requirement exists because we are preparing the pages in place, and once you unprotect the page so you can prepare the page, another thread can sneak in and see the pages before they're ready.

Let's see what we can do to get this to work in the multithreading case, too.

Unfortunately, I don't see a version of VirtualAlloc that lets you say, "Please take this page of memory I already have and map it into that location over there." You can do it if you're willing to use AWE, but that requires permission to allocate physical memory, and you lose the ability to write-protect pages (which makes detecting dirty pages harder), and it works only on natively 32-bit versions of Windows.

So we'll have to use a different trick: mapping the same block of memory into two locations. We'll take the trick a step further and map the same memory twice, but with different permissions.

First, create a shared memory block with CreateFileMapping, specifying a page protection of PAGE_READWRITE. This gives you read/write access to the underlying memory.

Next, map the shared memory block with MapViewOfFile, specifying a file mapping access of FILE_MAP_WRITE, since we will eventually want to give the client write access (just not at first). This is the memory region that will be used to hold client-visible memory. Right now, it's filled with zeroes, but we'll fix that soon.

Use VirtualProtect to change the page protection to PAGE_NOACCESS for all the pages. This removes access to all the pages. The client-visible memory is now ready.

When an access violation occurs and you want to swizzle some memory and map it in, here's what you do:

Use the faulting address to figure out which page of data needs to be swizzled and mapped in.

Use some sort of synchronization to make sure only one thread is doing the swizzling for this page. If you discover that the page has already been swizzled, then you are done because the other thread already did the work for you.

Otherwise, you are the first thread to handle the access violation. Find the corresponding page in your file mapping and use MapViewOfFile with a file mapping access of FILE_MAP_WRITE. This creates a second view of the page in which the client just took an access violation.

Use this second view to create the data that you eventually want to make visible to the client. Note that we have two views to the same data: A no-access view that the client knows about and a read-write view that only you know about.

When you're happy with the page of data, you can unmap the second view since you don't need it any more.

Use VirtualProtect to change the page protection of the client-visible page to PAGE_READONLY. Do this only for the one page that you prepared. This "opens up" that page in the view, converting it from PAGE_NOACCESS to PAGE_READONLY.

Similarly, when you encounter a write access violation on a page in the client-visible view, you mark the page as dirty and upgrade the page to PAGE_READWRITE. When the client closes the database, you unswizzle the dirty pages and write them back out. (If you want to be super-clever, you could also unswizzle the pages and write them out even before the client closes the database. Remember to make the pages read-only, so that you can detect when the client dirties the pages again.)
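
The page lifecycle described so far (no-access, then read-only, then read-write, and back to read-only after an early write-out) can be captured as a small state machine. This is only a sketch of the decision logic with hypothetical names (`SlotState`, `FaultAction`, `decide_fault_action`, `flush_if_dirty`); it deliberately leaves out the Win32 calls, and it assumes the caller holds whatever lock protects the page's state.

```c
/* Hypothetical page states, mirroring the protections in the text. */
typedef enum SlotState {
    SLOT_ABSENT,  /* PAGE_NOACCESS: not yet prepared */
    SLOT_CLEAN,   /* PAGE_READONLY: prepared, unmodified */
    SLOT_DIRTY    /* PAGE_READWRITE: client has written to it */
} SlotState;

/* What the exception handler should do before retrying the access. */
typedef enum FaultAction {
    DO_PREPARE_READONLY,   /* swizzle the page in, then PAGE_READONLY */
    DO_UPGRADE_READWRITE,  /* mark dirty, then PAGE_READWRITE */
    DO_RETRY               /* another thread already handled it */
} FaultAction;

/* is_write is nonzero for a write fault. For an access violation,
   ExceptionInformation[0] in the exception record is 0 for a read
   and 1 for a write. */
static FaultAction decide_fault_action(SlotState *state, int is_write)
{
    switch (*state) {
    case SLOT_ABSENT:
        /* Even for a write, open the page read-only first. The retried
           write faults again and takes the upgrade path below, so the
           dirty tracking stays accurate. */
        *state = SLOT_CLEAN;
        return DO_PREPARE_READONLY;
    case SLOT_CLEAN:
        if (is_write) {
            *state = SLOT_DIRTY;
            return DO_UPGRADE_READWRITE;
        }
        return DO_RETRY;
    default:
        return DO_RETRY;
    }
}

/* Early write-out: returns nonzero if the page was dirty, in which case
   the caller would write it out and set it back to PAGE_READONLY. */
static int flush_if_dirty(SlotState *state)
{
    if (*state != SLOT_DIRTY) {
        return 0;
    }
    *state = SLOT_CLEAN;
    return 1;
}
```

Routing a first-touch write through the read-only state costs one extra fault, but it means there is exactly one place where a page becomes dirty.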

Notice that the client-visible file mapping now contains a mix of no-access pages, read-only pages, and read-write pages.

There are some obvious optimizations you can perform here.

First of all, you don't have to create a single file mapping for everything. Creating the file mapping will take a commit charge for the entire size of the mapping, even if you end up not using all of it. Instead, you can start with a small file mapping (say, one megabyte), and when you use up all those pages, you create a new file mapping to hold the next megabyte. This creates extra bookkeeping for your page management code, but you won't have more than a megabyte of "extra" memory committed.
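
The extra bookkeeping is mostly arithmetic. Here is one way it might look, assuming backing pages are handed out sequentially from one-megabyte file mappings. The names `CHUNK_BYTES`, `ChunkSlot`, `locate_backing_page`, and `chunks_committed` are made up for this sketch, and it does not show how each chunk gets mapped into the client-visible range.

```c
#include <stddef.h>

/* One megabyte per file mapping, as in the text. */
#define CHUNK_BYTES ((size_t)1 << 20)

/* Where the n-th backing page lives. */
typedef struct ChunkSlot {
    size_t chunk;   /* which file mapping */
    size_t offset;  /* byte offset inside that file mapping */
} ChunkSlot;

/* Translate "the n-th page handed out" into a chunk and an offset. */
static ChunkSlot locate_backing_page(size_t page_number, size_t page_size)
{
    size_t byte_offset = page_number * page_size;
    ChunkSlot slot;

    slot.chunk = byte_offset / CHUNK_BYTES;
    slot.offset = byte_offset % CHUNK_BYTES;
    return slot;
}

/* How many file mappings must exist to hold pages_used pages.
   The commit charge is this count times CHUNK_BYTES. */
static size_t chunks_committed(size_t pages_used, size_t page_size)
{
    size_t bytes = pages_used * page_size;

    return (bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
}
```

With 4KB pages, each chunk holds 256 pages, so the 257th page is the one that triggers creation of the second file mapping.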

Another optimization is to cache the views that you use to prepare the swizzled pages. At one extreme, you could just map them in as read-write and just leave them mapped indefinitely. Or you could keep the few most recent views around, hoping for data locality.

Anyway, that's the sketch of how you can have a process-wide block of user-mode-managed addresses where you control what happens the first time the client reads from or writes to that page.

Marvy says:

January 27, 2018 at 7:01 pm

I’ve long wondered something about mapping the same memory at two different addresses. If you program in assembly, then it makes perfect sense. If you program in some high-level language, then maybe it also makes sense, depending on the language spec. But what if you program in C? Consider:

#include <assert.h>

void f(int* x, int* y) {
    if (x == y) return;
    *x = 4;
    *y = 8;
    assert(*x == 4); // can this check be optimized out??
}

The pointers are not const, so we are allowed to write to them.
There are no threading games going on.
They are not volatile, so presumably anything we wrote we can read back unchanged.
But if x and y are two mappings of the same memory, this fails.
Does this mean that to be fully standards compliant, we should declare them volatile, or is there some subtlety at work here?

cheong00 says:

January 28, 2018 at 8:37 pm

The use of mapped address is like the use of “pointer to pointer” in C.

In this case, it’s “You have two pointers. One points to the memory address directly, the other points to some memory location that contains address to the memory address referenced by the first pointer”.

1. David Trapp says:
  
  January 30, 2018 at 9:09 am
  
  Yes but the question here was what the compiler’s view on this would be, right?
  I guess it won’t optimize this away because as soon as you dereference a pointer, the compiler can never know what value you will get, even if you accessed the same pointer one line above. Even when your application is single-threaded, *another process* may have changed the memory in the meantime (in case of shared memory), or maybe you accessed some hardware I/O which is done through memory access (there are even cases in which a “write” is actually a call to kind of a “function implemented by hardware” and the “read” will give you something entirely different). So I would assume that this check can never be optimized out.
  
  1. JDG says:
    
    February 2, 2018 at 12:36 pm
    
    C and C++ have something called the Strict Aliasing Rule. It states that no two parameters to the same function that are pointers to different fundamental types may alias. Implied by this wording is that if they *are* of the same fundamental type, then they *might* alias, which rules out certain optimizations. In some variants, such as C99 with its “restrict” keyword, you can actually tell the compiler, “Even though there are other pointer parameters of the same type, I promise they won’t alias, go ahead and optimize!”
    
    1. Marvy says:
      
      February 3, 2018 at 11:06 am
      
      But here, we explicitly check that x and y are not equal, so the compiler “knows” they don’t alias.

Joe White says:

January 28, 2018 at 5:37 pm

Huh. It wouldn’t have occurred to me that you could change the VirtualProtect settings on a block of memory that came from MapViewOfFile. I would’ve expected that MapViewOfFile “owns” that memory. Changing its protection feels like visiting someone’s house and replacing their curtains.

SI says:

February 1, 2018 at 7:17 am

The missing version of VirtualAlloc would be quite useful for expanding sparse matrices, without having to copy them to the new location. Ran out of column indices? VirtualAlloc a new larger block of memory, and reuse the existing memory block for the known data.

Date:	January 26, 2018 / year-entry #23
Tags:	code
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20180126-00/?p=97905
Comments:	7
Summary:	Some memory mapping magic.
