Date: | May 13, 2005 / year-entry #119 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20050513-26/?p=35643 |
Comments: | 28 |
Summary: | Even after moving the character conversion out of the getline function, profiling reveals that getline is still taking nearly 50% of our CPU. The fastest code is code that isn't there, so let's get rid of getline altogether. Oh wait, we still need to break the file into lines. But maybe we can break the... |
Even after moving the character conversion out of the getline function, profiling reveals that getline is still taking nearly 50% of our CPU. The fastest code is code that isn't there, so let's get rid of getline altogether. Oh wait, we still need to break the file into lines. But maybe we can break the file into lines without copying each one out of the file: map the file into memory and find the line breaks in place.
#include <algorithm>

class MappedTextFile
{
public:
 MappedTextFile(LPCTSTR pszFile);
 ~MappedTextFile();

 const CHAR *Buffer() { return m_p; }
 DWORD Length() const { return m_cb; }

private:
 PCHAR m_p;
 DWORD m_cb;
 HANDLE m_hf;
 HANDLE m_hfm;
};

MappedTextFile::MappedTextFile(LPCTSTR pszFile)
    : m_hfm(NULL), m_p(NULL), m_cb(0)
{
 m_hf = CreateFile(pszFile, GENERIC_READ, FILE_SHARE_READ,
                   NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
 if (m_hf != INVALID_HANDLE_VALUE) {
  DWORD cb = GetFileSize(m_hf, NULL);
  m_hfm = CreateFileMapping(m_hf, NULL, PAGE_READONLY, 0, 0, NULL);
  if (m_hfm != NULL) {
   m_p = reinterpret_cast<PCHAR>
             (MapViewOfFile(m_hfm, FILE_MAP_READ, 0, 0, cb));
   if (m_p) {
    m_cb = cb;
   }
  }
 }
}

MappedTextFile::~MappedTextFile()
{
 if (m_p) UnmapViewOfFile(m_p);
 if (m_hfm) CloseHandle(m_hfm);
 if (m_hf != INVALID_HANDLE_VALUE) CloseHandle(m_hf);
}

This very simple class babysits a read-only memory-mapped file. (Yes, there is a bit of oddness with files greater than 4GB, but let's ignore that for now, since it's a distraction from our main point.)

Now that the file is memory-mapped, we can just scan it directly.

Dictionary::Dictionary()
{
 MappedTextFile mtf(TEXT("cedict.b5"));
 typedef std::codecvt<wchar_t, char, mbstate_t> widecvt;
 std::locale l(".950");
 const widecvt& cvt = _USE(l, widecvt); // use_facet<widecvt>(l);
 const CHAR* pchBuf = mtf.Buffer();
 const CHAR* pchEnd = pchBuf + mtf.Length();
 while (pchBuf < pchEnd) {
  const CHAR* pchEOL = std::find(pchBuf, pchEnd, '\n');
  if (*pchBuf != '#') {
   size_t cchBuf = pchEOL - pchBuf;
   wchar_t* buf = new wchar_t[cchBuf];
   mbstate_t state = 0;
   const char* nextsrc; // must be const so it can bind to codecvt::in's "from_next" parameter
   wchar_t* nextto;
   if (cvt.in(state, pchBuf, pchEOL, nextsrc,
              buf, buf + cchBuf, nextto) == widecvt::ok) {
    wstring line(buf, nextto - buf);
    DictionaryEntry de;
    if (de.Parse(line)) {
     v.push_back(de);
    }
   }
   delete[] buf;
  }
  pchBuf = pchEOL + 1;
 }
}
We simply scan the memory-mapped file for a '\n' character to find where each line ends, and treat the characters in between as the line itself.

Exercise: Why don't we have to worry about the carriage return that comes before the linefeed?

Exercise: Why don't we have to worry about possibly reading past the end of the file when we check *pchBuf != '#'?
With this change, the program now loads the dictionary in 480ms (or 550ms if you include the time it takes to destroy the dictionary). That's over twice as fast as the previous version.
But it's still not fast enough. A half-second delay between hitting Enter and having a usable program is still noticeable.
[Raymond is currently on vacation; this message was pre-recorded.]
Comments (28)
Comments are closed.
Exercise: Why don't we have to worry about possibly reading past the end of the file when we check *pchBuf != '#'?
Because that check is only reached after the loop condition has been tested:
while (pchBuf < pchEnd)
Hmm
Something is strange here.
When I run your versions 2 and 3 I get the following times (constructor only):
version 2: 570 ms
version 3: 430 ms
So, on my system version 3 is only 25% faster.
This is far from over twice as fast.
Are you sure that you did not change anything else between the two versions?
Perhaps you changed to a single-threaded library, or you changed some optimization setting?
What compiler are you using?
What does your compiler command line look like?
My profiling of your version 2 indicates that the biggest problem is still in the character conversion.
40% is spent in do_in, and only 12% in getline.
I am using VS 2003 with a single-threaded library and /O2 optimization.
If I dig a little deeper into the profile of do_in, I find that it is still calling MultiByteToWideChar once for each character in the file.
A much better solution is to call MultiByteToWideChar manually for each line instead of using codecvt.
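For illustration, a minimal sketch of that per-line approach (my own sketch, not code from the post or the comment) might look like this; it assumes code page 950 (Big5) to match the ".950" locale used in the constructor, and it omits error handling:

#include <windows.h>
#include <string>

// Sketch: convert one Big5 line to UTF-16 with MultiByteToWideChar.
std::wstring ConvertLineBig5(const char* pchLine, int cchLine)
{
 // First call: ask how many wide characters the converted line needs.
 int cchWide = MultiByteToWideChar(950, 0, pchLine, cchLine, NULL, 0);
 std::wstring line(cchWide, L'\0');
 if (cchWide > 0) {
  // Second call: perform the conversion into the string's buffer.
  MultiByteToWideChar(950, 0, pchLine, cchLine, &line[0], cchWide);
 }
 return line;
}

The Dictionary constructor could then call something like this once per line instead of going through cvt.in.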
Raymond’s on vacation but maybe I can offer a consolation reply. As I recall he did the time testing on an older machine of his at home. It’s entirely possible that different microarchitectures have different costs.
I have a copy of exactly what Raymond built and I’ll be posting another analysis on my own blog later today and we’ll see if my measurements (on modern hardware) are more in line with what you see.
I suspect he built it with /O2 and that's all, but I'm not sure it would matter much because virtually all the cost (to date) has been in the libraries.
What are you guys using to profile the code? Is there some tool in vs.net that I’m not aware of?
What tool do you use to profile your code?
Are there any good free (or open source) profilers out there and what’s the best commercial one?
Raymond answered which profiler he uses here http://blogs.msdn.com/oldnewthing/archive/2005/05/11/416430.aspx#416480
The fun continues as today we look at Raymond’s third improvement.
Raymond starts using some pretty…
This program hits a lot of different elements of the computer: disk drives, memory, and CPU. Each hardware configuration will have its bottlenecks and performance issues manifest in different areas.
Perhaps the codecvt class in VS.NET 2005 does the whole string at once, while the one in VS.NET 2003 does it one character at a time? I think Raymond’s using 2005 (from his last post anyway).
I have a question. Is it possible for a control character such as '\n' to appear in the trail byte of a DBCS character? If the answer is yes, will std::find safely ignore these bytes?
I have a side question though. Why isn't there newline handling or any extra features in notepad.exe? I know a lot of us use notepad for the most basic of files. I was wondering why there isn't support for different encodings and such.
Berlin Brown:
Because (as answered in another post) notepad is a simple wrapper around the standard Windows Edit control (and WordPad a wrapper around the RichEdit control).
I’m wondering how long it will take before the entire C++ runtime library is gone from this example :)
while (pchBuf < pchEnd) {
 const CHAR* pchEOL = std::find(pchBuf, pchEnd, '\n');
 pchBuf = pchEOL + 1;
}
The last line had better end with '\n'. Otherwise you'll get extremely poor performance.
I don’t like the way the MappedTextFile constructor silently fails. I think you should be using exceptions to report errors. Much of the standard library can throw exceptions anyway, so you have no hope of avoiding them.
indranilbanerjee wrote: "The last line had better end with '\n'. Otherwise you'll get extremely poor performance"
If std::find() doesn’t find the target value, it returns the "last" iterator (pchEnd). This does mean that "pchBuf = pchEOL + 1;" could potentially put pchBuf beyond the end of the buffer, which in general has undefined behaviour, but by using memory-mapped files Raymond has already stepped outside portable C++.
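If the file might not end in a '\n', one way to keep pchBuf inside the buffer (a sketch, not code from the post) is to stop as soon as std::find reports no match:

while (pchBuf < pchEnd) {
 const CHAR* pchEOL = std::find(pchBuf, pchEnd, '\n');
 // process the line [pchBuf, pchEOL) here
 if (pchEOL == pchEnd) break; // last line had no trailing '\n'
 pchBuf = pchEOL + 1;
}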
Nikos,
File mapping IS the disk cache on NT. The disk cache on NT works by mapping the file into memory and then reading from the mapped section.
In your constructor's initializer list, the order of initialization should match the order of declaration of your members.
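Applied to the MappedTextFile constructor above, that would mean listing the members in the order the class declares them (m_p, m_cb, m_hfm; m_hf is assigned in the body), for example:

MappedTextFile::MappedTextFile(LPCTSTR pszFile)
    : m_p(NULL), m_cb(0), m_hfm(NULL)
{
 // body unchanged
}

Members are initialized in declaration order regardless of how the list is written, so this only affects readability and compiler warnings, not behavior.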
My personal experience with mapped files is at best a mixed bag.
I'm using a read-only file mapping like Raymond's to search for text in files (OK, a bit more complex, since I'm not mapping the whole file at once, to cope with huge files, Win9x, etc., but basically the same).
Many times I feel that straight CreateFile/ReadFile is faster. My suspicions are:
* the overhead associated with file mapping is significant, especially so for small files
* somehow file mapping misses the disk cache. So whereas repeated searches the simple way (ReadFile) are practically instantaneous from the 2nd time on (the file is read from the cache rather than the HD), searches based on file mapping take more or less the same time, be it the first or the 10th scan.
Any feedback on performance? It isn't inconceivable that I'm doing something wrong, but the read loop is so simple...
thanks
Dean,
Every call to ReadFile goes to kernel mode. File mappings only need to go to kernel mode (through a page fault) when a page is touched that doesn't yet contain the data it's supposed to hold.
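For contrast, a plain ReadFile scan (a sketch, not code from the post; hFile is assumed to be a handle opened with CreateFile) pays one kernel transition per call even when the data is already in the cache:

char buf[64 * 1024]; // arbitrary buffer size chosen for the sketch
DWORD cbRead;
while (ReadFile(hFile, buf, sizeof(buf), &cbRead, NULL) && cbRead > 0) {
 // scan buf[0..cbRead) for '\n' here
}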
Ah, makes sense then. Thanks!
I assume Raymond has a good reason for using the multithreaded C runtime library.
Looking at the analysis that ricom has done over at http://blogs.msdn.com/ricom/, locking plays a major part in the poor performance.
Nikos:
You must be doing something wrong. I have never seen reading from mapped files be slower than traditional ReadFile.
Can you produce an example program that demonstrates what you are saying?
Would reading a file from start to finish (just once, then closing it) be faster with a mapped file, or would it be the same as using ReadFile?
Larry
I'm not trying to dispute you or anything. But I'm certain that repeated reads of a file while mapped don't get any faster, whereas normal reads get *much* faster from the 2nd time onwards. This is from experience.
It’s also different in 2000/XP: the former gets slightly faster even in mapping reads, whereas XP is a total dog in this respect :)
Well, I have tried both your programs, and I see no noticeable difference between the two.
Both programs behave just as you might expect:
The first time you run a search it is very slow because it is actually reading from the disk.
The second time, it is very fast and the disk is not accessed at all.
This is exactly as it is supposed to work.
I am running on Windows XP SP2 with a 1.2 GHz Pentium 4 and 700 MB of memory.