A Tale of Two File Names

GranPC1 pts1 comments

A Tale of Two File Names - tomgalvin.uk

2015-06-09

A Tale of Two File Names

25 minute read

Users of DOS or older versions of Windows will have invariably stumbled upon a quirk of Windows’ handling of file names at some point. File names which are longer than 8 characters, have an extension other than 3 character long, or aren’t upper-case and alphanumeric, are (in some situations) truncated to an ugly shorter version which contains a tilde (~) in it somewhere. For example, 5+6 June Report.doc will be turned into 5_6JUN~1.DOC. This is relic of the limitations brought about by older versions of FAT used in DOS and older versions of pre-NT Windows.

In case you aren’t aware of how 8.3 file names work, here’s a quick run-down.

All periods other than the one separating the filename from the extension are dropped - a.testing.file.bat turns into atestingfile.bat.

Certain special characters like + are turned into underscores, and others are dropped. The file name is upper-cased. 1+2+3 Hello World.exe turns into 1_2_3HELLOWORLD.EXE.

The file extension is truncated to 3 characters, and (if longer than 8 characters) the file name is truncated to 6 characters followed by ~1. SomeStuff.aspx turns into SOMEST~1.ASP.

If these would cause a collision, ~2 is used instead, followed by ~3 and ~4.

Instead of going to ~5, the file name is truncated down to 2 characters, with the replaced replaced by a hexadecimal checksum of the long filename - SomeStuff.aspx turns into SOBC84~1.ASP, where BC84 is the result of the (previously-)undocumented checksum function.

The name “8.3” derives from the 8 characters in the filename and the 3 characters in the extension (rather than a version number, as I previously assumed). You can still use 8.3 file names in modern Windows - type C:\PROGRA~1\ into Windows Explorer and see where it takes you.

The short file names don't update retroactively - files 5 and 6 were created after the short file name for files 1 to 4 had been created, so only 5 and 6 use the checksum. As expected for an obscure Windows feature, all of this is about as well-documented as the Antikythera mechanism.

This post focuses on the undocumented checksum function mentioned in the final bullet point. I’m a stickler for uncovering undescribed or hidden behaviour, which is why The Cutting Room Floor is one of my favourite websites. As far as I’m aware, there is absolutely no external documentation on this particular checksum function, even on Microsoft’s own support website1 or MSDN23 - this is just asking to be uncovered.

I started digging around by making a small executable in C++ to call the Windows API function GetShortPathName on the command-line parameter, so I could see exactly how this API function behaved. Here’s the source code (you’ll need to import Windows.h):

#include "stdafx.h"

int _tmain(int argc, _TCHAR* argv[])<br>WCHAR shortBuffer[512];<br>GetShortPathName(argv[1], shortBuffer, 512);<br>wprintf_s(L"%s\r\n", shortBuffer);<br>return 0;

I found that the short path name will not change (including tilde number and checksum) once it is created, unless the file name itself is also changed. This means that GetShortPathName must just retrieve the 8.3 file name from disk rather than actually calculating it on the fly. I tested the file checksums against some common hash functions, to no avail - Microsoft must’ve rolled their own at some point. Since I’m not a cryptanalyst I realised the only way that I would find the checksum function would be to open up the Windows API for myself.

My go-to utility for these sorts of things is OllyDbg, a debugger which can be used without the original source code. It has a few nice analysis features and a light footprint so I keep it on a USB drive at all times. I figured I’d be able to use it to step inside the Windows API call and see where it leads, which the VS debugger doesn’t allow you to do. Olly allows you to pass command-line parameters to the executable you’re debugging, which is a file name in this case.

I’ve never disassembled anything bigger than a simple crack-me program before, so this was certainly jarring: you’re greeted with an info-dense view of the x86 registers, the disassembly of the executable with calls and jump paths highlighted, and what I can only assume is a memory dump of the executable or the stack. Execution pauses at the entry point and allows you to step through, set breakpoints, and other typical debugger features.

I stepped through the executable, watching crap from the MSVC runtime shuffle around the registers, until I reached a call to KERNEL32. GetShortPathNameW, which is the call we’re looking for. I stepped inside, not too sure what to expect, and was greeted with… a tangle of unintelligible assembly.

I soon realised that I’d make no progress here unless I knew what I was looking for. I’d need some form of reference to guide myself along the disassembly. Unfortunately the Windows API is closed tighter than an aeroplane door; it may as...

file windows name names characters checksum

Related Articles