The case of the DLL that was not present in memory despite not being formally unloaded, part 2 - The Old New Thing
Skip to main content
Dev Blogs
AI
All .NET posts
.NET MAUI<br>ASP.NET Core<br>Blazor<br>Entity Framework
C++<br>C#<br>F#<br>TypeScript
NuGet<br>Servicing<br>.NET Blog in Chinese
Microsoft for Developers<br>Agent Framework<br>Develop from the cloud<br>Xcode<br>ISE Developer<br>TypeScript<br>PowerShell<br>Python<br>Java<br>Java Blog in Chinese<br>Go<br>Microsoft Edge Dev<br>Microsoft 365 Developer<br>Microsoft Entra Identity Developer<br>Microsoft Entra PowerShell
Visual Studio<br>Visual Studio Code<br>Aspire
All things Azure<br>Azure SDK<br>Azure VM Runtime Team<br>Microsoft Azure<br>Azure Cosmos DB<br>Azure DocumentDB<br>Azure Data Studio<br>Azure SQL<br>DevOps<br>DirectX<br>Microsoft Foundry<br>Power Platform
OData<br>Unified Data Model (IDEAs)
Windows Command Line<br>#ifdef Windows<br>Inside MSIX<br>MIDI and music<br>React Native<br>The Old New Thing<br>Windows Developer
Raymond Chen
Last time, we looked at crashes caused by a DLL being removed from memory behind everybody’s back, causing crashes when somebody tried to call into that no-longer-there DLL that everybody thought was still there.
A colleague of mine who was looking at other crashes coming from this process found that most of those other crashes were also of the form "a data structure was corrupted because somebody wrote the single byte 01 into it." That piece of information made everything fall into place for my side of the investigation.
We saw earlier that the bottom bit of the HMODULE is set for datafile module handles. Therefore, if one of these stray 01 bytes happens to overwrite the bottom byte of an existing HMODULE handle, that turns it into a (fake) datafile module handle. And then, during process destruction, a component dutifully cleans up the DLLs they loaded by freeing them (say because they were stored in an RAII type like wil::unique_hmodule), the code will pass this (fake) datafile module handle to FreeLibrary. The FreeLibrary function sees the bottom bit set and says, "Oh, this must be the handle to a module that was loaded via LOAD_LIBRARY_AS_DATAFILE," so it frees it as a datafile.
Freeing a datafile module means undoing the steps that were taken when the module was loaded as a datafile: Unmapping the DLL from memory. In particular, loading a module as a datafile does not add the DLL to the list of DLLs that were loaded as code; therefore, unloading a datafile module doesn’t remove it from that list. As far as the DLL list is concerned, the DLL is still in memory.
A one-bit error caused the code to lie and attempt to free a module handle that did not correspond to a LoadLibrary call, resulting in mass havoc.
The "DLL unmapped from memory" crash is just an alternate manifestation of the "somebody is writing 01 bytes to places they shouldn’t" bug. The original bug had a larger bucket spray than we initially thought.
The good news is that all of the crashes have funneled down to a single bug. The bad news is that you now have to debug this one memory corruption bug.
Unfortunately, at the time of this writing, the root memory corruption bug in the third party program has yet to be identified. We don’t know whether it’s coming from an operating system component or from the program itself. Though the fact that it appears to occur only in one process, where it sprays across multiple modules, suggests that it’s a problem with that program, or that there’s something peculiar about how this specific process uses the system.
If you look at the original stack trace, you can see that the problem is occurring at process termination. That’s probably why the problem has lurked for so long: Crashes at exit often go unnoticed because there is no end-user loss of functionality. The user was finished with the program anyway. Whether it exits cleanly or with a crash doesn’t affect the user much.
Sorry. Not all stories have a happy ending.
19
Category<br>Old New Thing
Topics<br>Code
Share
Author
Raymond Chen
Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.
19 comments
Join the discussion.
Leave a commentCancel reply<br>Sign in
Code of Conduct
Sort by :
Newest
Newest<br>Popular<br>Oldest
I think the problem lies in UnbindCrtHandles. Console handles are not managed by the CRT, and the CRT does not change their state. Therefore, if UnbindCrtHandles destroys the handles, FreeConsole will fail. You can reach out to the Windows Console team at github.com/microsoft/terminal, they don’t own the UCRT, but they do own the console host, the driver, and FreeConsole.
Thank you, your insights have been awesome and I appreciate...