Date: July 14, 2016 / year-entry #147
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20160714-00/?p=93875
Comments: 18
Summary: Set the file allocation information.
A customer wanted to create a log file with a particular set of properties, the last of which was that the file remain accessible through traditional file I/O. That requirement exists because there are third-party tools that read the log files, and those tools are just going to use traditional file I/O to access the log file.
The customer suggested an analogy, and the file system team responded with this solution: Use the SetFileInformationByHandle function with the FileAllocationInfo information class.
The effect of setting the file allocation info lasts only as long as you keep the file handle open. When you close the file handle, all the preallocated space that you didn't use is freed. Here's a Little Program to demonstrate. Remember, Little Programs do little to no error checking.

```cpp
#include <windows.h>

int __cdecl main(int argc, char** argv)
{
    auto h = CreateFile(L"test.txt", GENERIC_ALL, FILE_SHARE_READ,
                        nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);

    // Preallocate 100GB of disk space for the file.
    FILE_ALLOCATION_INFO info;
    info.AllocationSize.QuadPart = 1024LL * 1024LL * 1024LL * 100; // 100GB
    SetFileInformationByHandle(h, FileAllocationInfo, &info, sizeof(info));

    // Write to the file very slowly.
    for (int i = 0; i < 10; i++) {
        DWORD written;
        WriteFile(h, "hello\r\n", 7, &written, nullptr);
        Sleep(5000);
    }

    CloseHandle(h); // any unused preallocated space is released here
    return 0;
}
```
This program creates a file and preallocates 100GB of disk space for it.
It then writes to the file very slowly.
While the program is running, you can do a dir of the drive and watch the free disk space: it drops by 100GB as soon as the allocation is set, even though the file itself contains only a few bytes.
The preallocated disk space is also released if you call SetFileInformationByHandle again with a smaller allocation size.
There's a special gotcha about setting the file allocation info: If you set the file allocation info to a nonzero value, then the file contents are forced to be nonresident, even if they are small enough to fit inside the MFT record.
Comments (18)
Comments are closed.
Given that a file is more like an std::deque than an std::vector (data is allocated in biggish chunks and is never copied around) it’s not really clear what kind of performance advantage they are after by preallocating everything; after all, deque itself doesn’t have a reserve method because it’s mostly useless. Even the additional locality shouldn’t matter much, given that a log is normally append only (and is read sequentially). Maybe the customer had some mistaken idea about the inner workings of the file system?
I see two reasons they may want to do this:
1) you get fail on open (well, failure around the time the Open happens) rather than fail on write
2) you get to reserve your space up front, making sure that no other user of the volume can take the space you need (again working to prevent fail on write). It's like sending someone into the movie theatre early to reserve 8 seats before anyone else arrives.
But… but… let's say the disk only had 50GB of space left. Which is better: to write 50GB of log and then fail, or to fail when trying to create a 100GB log file? With the second choice, nothing gets logged and the program might not even start.
In that case, in a program bigger than a “little program”, you have a second failure path that creates a smaller log file. In that log file, you write “Couldn’t create Log file – Quitting” (or, you steal as much space as makes sense and pre-allocate a smaller log). The idea is to reduce the likelihood of a fail-on-write-to-the-log as much as you can.
Logging is a particularly good example of this pattern because it is a cross-cutting concern. A well-written application is likely to perform logging calls at many different layers of abstraction and in many different contexts. It is not practical to correctly handle a logging failure at every one of these call sites, so most sane logging frameworks just swallow logging errors silently (with perhaps a message to stderr, if you’re lucky). In this regard it is much like how many garbage-collected languages handle throwing finalizers: you can’t clean up from a failed cleanup, nor is the application in a good position to decide what to do about it, so just ignore it and destroy the object anyway.
What are those “LL” in front of 1024? What do they do?
It’s a C++ number suffix to say that the constant is a long long.
It indicates that they’re of type “long long”, which is important here mostly to ensure that the multiplication ends up with the correct type (if it stayed in a 32-bit type, it’d end up as 0 instead since 100GB = 0 modulo 2^32).
LL indicates that the integer literal should be treated as type “long long”.
http://en.cppreference.com/w/cpp/language/integer_literal
From https://msdn.microsoft.com/en-us/library/c70dax92.aspx
To specify an unsigned type, use either the u or U suffix. To specify a long type, use either the l or L suffix. To specify a 64-bit integral type, use the LL or ll suffix.
Thanks for the collective answers. :)
Why can't the compiler decide that the way the .NET Framework and the Delphi compiler do? Is this some sort of power-developer feature?
Well, things like C++, C# and Delphi are different languages.
```cpp
auto myint = 0;
auto mylong = 0L;
auto myreallylong = 0LL;
```

create two 32-bit numbers (one an int and the other a long) and a 64-bit "long long" in Microsoft C++ (remember, C++ does not specify the bit length of its types). In C#:

```csharp
var myint = 0;
var mylong = 0L;
```

specify 32- and 64-bit integers (in the .NET world, the bit length of integral types is part of the standard).
In a way, I was asking why "C++ does not specify the bit length of its types". But I guess you implied the answer: the same reason the Wright brothers' plane didn't have a jet engine. So, thanks.
.NET actually has the same issue.
long l = 1024 * 1024 * 1024 * 100; -> “Error CS0220: The operation overflows at compile time in checked mode”
Dim l As Long = 1024 * 1024 * 1024 * 100 -> “Error BC30439: Constant expression not representable in type ‘Integer'”
The compiler treats the literals as int32s and performs int32*int32 multiplication on them, which overflows. If you add the L suffix everything works, because now the literals are all int64s and you are performing int64*int64 multiplication.
long l = 1024L * 1024L * 1024L * 100L;
Dim l As Long = 1024L * 1024L * 1024L * 100L
Yes. Interesting how I never ran into this on .NET: I never had to assign such a large constant to a variable during my career.
Actually, you might need the suffix to declare constants for use in Interop too.
Taking an example from a recent support case in the MSDN forum:
```vb
Public Enum ACCESS_MASK As UInteger
    ' ...
    GENERIC_READ = &H80000000UI
    ' ...
End Enum
```
Try taking away the "UI" at the end and see if it still compiles.
Reserve disk space with this one weird trick!
(Sorry, couldn’t help myself. Looks like quite a neat solution actually.)
I agree with Brian: it’s not a bad thing to make sure you have space to write to your log file. If something goes wrong and the disk fills up, at least you can write ‘couldn’t generate output, disk full’ to your log file. Yes, you’d probably find out from disk usage monitoring, but having it in the log can save time troubleshooting, particularly if it’s only a brief condition – for instance if your app cleans up a large output file after a failed write.
This seems similar to fallocate() on Linux, which I have used a couple of times to achieve the same sort of result. File transfer programs don't seem to use it, though, even though this method would be handy when copying a large file: you could guarantee beforehand that the copy won't fail due to lack of space on the destination. I'm sure there's a good reason, but off the top of my head I can't think of what it might be.
The documentation for SetFileInformationByHandle() seems to imply that not all file systems support all features: is there any documented guidance regarding what common file systems support which information class?