PE file format: DOS Header


DOS header

First of all, what’s the DOS header?

The DOS header is a structure located at the very start of DOS-MZ executables and preserved in the newer PE format for backwards compatibility. The beginning of the DOS header is marked by the ASCII sequence “MZ” (4D 5A in hex). Those are the initials of Mark Zbikowski, the designer of the DOS-MZ format.

MZ

In Portable Executable files, the DOS header is followed by a small DOS stub program containing the following ASCII sequence:

"This program cannot be run in DOS mode."

DOS header stub

Key fields in the DOS header

WORD e_magic: a 2-byte signature that must be the ASCII characters MZ (bytes 4D 5A on disk; read as a little-endian WORD, that is 0x5A4D). This signature tells the loader that the file is a DOS executable, which is also the first check a PE file has to pass.

LONG e_lfanew: in PE files, this 4-byte field (stored at file offset 0x3C) holds the offset of the NT header. The offset is relative to e_magic (the MZ signature), which is the start of the file.

e_lfanew field

In this example, the e_lfanew field holds the byte sequence C8 00 00 00. The value is stored in little-endian order, so it reads as 0x000000C8. That means the NT header of this file starts at offset 0xC8.
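Here is a minimal sketch of that decoding step in C, assuming the whole file has already been read into a byte buffer. The function names `has_mz_signature` and `read_e_lfanew` are made up for this post, and the value is treated as unsigned for simplicity even though winnt.h declares it as LONG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_HEADER_SIZE 64u   /* size of IMAGE_DOS_HEADER */
#define E_LFANEW_OFFSET 0x3Cu /* e_lfanew is the last 4 bytes of the header */

/* Returns 1 if the buffer starts with the bytes 'M' 'Z' (4D 5A). */
int has_mz_signature(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 2 && buf[0] == 0x4D && buf[1] == 0x5A;
}

/* Decodes e_lfanew byte by byte, so the result does not depend on the
 * endianness of the machine running this code.
 * Returns 0 on success, -1 if the buffer is too short or lacks "MZ". */
int read_e_lfanew(const uint8_t *buf, size_t len, uint32_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < DOS_HEADER_SIZE || !has_mz_signature(buf, len))
        return -1;

    const uint8_t *p = buf + E_LFANEW_OFFSET;
    *out = ((uint32_t)p[0])
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
    return 0;
}
```

For the bytes C8 00 00 00, the least significant byte comes first, so the decoded offset is 0xC8.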

To find the NT header in HxD, let’s go to offset C8, the value found in the e_lfanew field. This offset is relative to the start of the file (0x0):

Going to offset 0xC8

New PE header

Can you see the PE\0\0 signature that I highlighted?

PE\0\0 sig

This marks the start of the IMAGE_NT_HEADERS structure, and Windows uses e_lfanew to find the offset to that structure.
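The same lookup can be sketched in C. This is a simplified illustration of the idea, not how the Windows loader is actually implemented: `find_nt_headers` is a hypothetical helper that works on a file already loaded into memory, and it only checks the two signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the file offset of the PE\0\0 signature,
 * or -1 if the buffer does not look like a PE file. */
int64_t find_nt_headers(const uint8_t *buf, size_t len)
{
    static const uint8_t pe_sig[4] = { 'P', 'E', 0, 0 };

    assert(buf != NULL);

    /* The DOS header is 64 bytes long and starts with "MZ". */
    if (len < 64 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    /* e_lfanew: little-endian 32-bit value at offset 0x3C. */
    uint32_t off = ((uint32_t)buf[0x3C])
                 | ((uint32_t)buf[0x3D] << 8)
                 | ((uint32_t)buf[0x3E] << 16)
                 | ((uint32_t)buf[0x3F] << 24);

    /* The 4-byte signature must lie entirely inside the buffer. */
    if (off > len || len - off < sizeof pe_sig)
        return -1;

    if (memcmp(buf + off, pe_sig, sizeof pe_sig) != 0)
        return -1;

    return (int64_t)off;
}
```

The bounds check matters because e_lfanew comes straight from the file, so a malformed or malicious file can point it anywhere.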

Below is a diagram I made, illustrating it:

Sections diagram I made

C representation of IMAGE_DOS_HEADER

We can take a look at the contents of the DOS header by looking at the IMAGE_DOS_HEADER structure definition from winnt.h:

typedef struct _IMAGE_DOS_HEADER {  
    WORD e_magic;    // Magic number (MZ)
    WORD e_cblp;     // Bytes on last page of file
    WORD e_cp;       // Pages in file
    WORD e_crlc;     // Relocations
    WORD e_cparhdr;  // Size of header in paragraphs
    WORD e_minalloc; // Minimum extra paragraphs needed
    WORD e_maxalloc; // Maximum extra paragraphs needed
    WORD e_ss;       // Initial (relative) SS value
    WORD e_sp;       // Initial SP value
    WORD e_csum;     // Checksum
    WORD e_ip;       // Initial IP value
    WORD e_cs;       // Initial (relative) CS value
    WORD e_lfarlc;   // File address of relocation table
    WORD e_ovno;     // Overlay number
    WORD e_res[4];   // Reserved words
    WORD e_oemid;    // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;  // OEM information; e_oemid specific
    WORD e_res2[10]; // Reserved words
    LONG e_lfanew;   // File address of new exe header (important!)
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

The struct holds 30 WORD-sized values (2 bytes each, counting every element of the e_res and e_res2 arrays), plus the 4-byte e_lfanew (a LONG).

Total size: 30 * 2 + 4 = 64 bytes.
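If you want the compiler to confirm that arithmetic, here is a small sketch that mirrors the layout with fixed-width types, so it builds without windows.h. The type name `dos_header_t` is my own; the field names follow winnt.h, with uint16_t standing in for WORD and int32_t for LONG.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */
#include <stdint.h>

/* Portable mirror of IMAGE_DOS_HEADER. */
typedef struct {
    uint16_t e_magic;
    uint16_t e_cblp;
    uint16_t e_cp;
    uint16_t e_crlc;
    uint16_t e_cparhdr;
    uint16_t e_minalloc;
    uint16_t e_maxalloc;
    uint16_t e_ss;
    uint16_t e_sp;
    uint16_t e_csum;
    uint16_t e_ip;
    uint16_t e_cs;
    uint16_t e_lfarlc;
    uint16_t e_ovno;
    uint16_t e_res[4];
    uint16_t e_oemid;
    uint16_t e_oeminfo;
    uint16_t e_res2[10];
    int32_t  e_lfanew;
} dos_header_t;

/* 30 two-byte values followed by one four-byte value. */
static_assert(sizeof(dos_header_t) == 30 * 2 + 4,
              "DOS header must be 64 bytes");

/* e_lfanew sits right after the 30 words, at offset 60 (0x3C). */
static_assert(offsetof(dos_header_t, e_lfanew) == 0x3C,
              "e_lfanew must be at offset 0x3C");
```

No packing pragma is needed here: 30 two-byte fields end at offset 60, which is already a multiple of 4, so the compiler has no reason to insert padding before e_lfanew.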

Historical background

A lot of you may be confused right now: why does every PE file need a DOS header when modern Windows systems don’t even run DOS programs?

Two words: backwards compatibility.

See, the PE format was designed as an extension of the old DOS MZ executable format. Back then, the Portable Executable format was the new kid on the block. Both DOS-MZ and PE executables were distributed with a .exe extension, so whenever a user attempted to run a PE .exe on MS-DOS, the DOS loader would behave unexpectedly, because it expected the DOS MZ format behind a .exe extension, not Portable Executable.

So, to make DOS loaders exit predictably when handed a PE executable, Microsoft shipped every PE file with a DOS stub, placed right after the DOS header, that prints:

This program cannot be run in DOS mode

So if a DOS loader tries to execute a PE file, it doesn’t crash or hang; instead, the stub runs, immediately prints this message to the console and exits the process.

This program cannot be run in DOS mode
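To make the stub less of a black box, here are the bytes of the stub commonly emitted by Microsoft's linker, annotated with their 16-bit disassembly. Treat this as a sketch: other toolchains may emit a different stub, and `build_dos_stub` is a hypothetical helper that only shows how the code and the message sit next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-bit code of the classic stub (14 bytes). */
const uint8_t stub_code[] = {
    0x0E,             /* push cs                                     */
    0x1F,             /* pop  ds        ; DS = CS                    */
    0xBA, 0x0E, 0x00, /* mov  dx, 000Eh ; DS:DX -> message           */
    0xB4, 0x09,       /* mov  ah, 09h   ; print '$'-terminated text  */
    0xCD, 0x21,       /* int  21h                                    */
    0xB8, 0x01, 0x4C, /* mov  ax, 4C01h ; AH=4Ch exit, AL=01h code   */
    0xCD, 0x21        /* int  21h                                    */
};

/* DOS function 09h stops printing at the '$' character. */
const char stub_message[] =
    "This program cannot be run in DOS mode.\r\r\n$";

/* Writes code followed by message into out.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_dos_stub(uint8_t *out, size_t cap)
{
    assert(out != NULL);

    size_t msg_len = sizeof stub_message - 1; /* drop C's trailing NUL */
    size_t total = sizeof stub_code + msg_len;

    if (cap < total)
        return 0;

    memcpy(out, stub_code, sizeof stub_code);
    memcpy(out + sizeof stub_code, stub_message, msg_len);
    return total;
}
```

The operand of mov dx is 0x0E because the code is 14 bytes long, so the message begins 14 bytes into the stub.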

Looking back, baking this ASCII sequence into every PE file might have been a mistake on Microsoft’s part 😅. The PE format went on to become far more powerful and modular than originally anticipated, leaving the DOS-MZ format to die in the shadows. Yet nearly every PE file in existence still carries those 39 bytes of ASCII bloat (counting the final period), to remind DOS loaders from 40 years ago that our modern .exe file cannot be run on DOS systems.

DOS stub code injection

The DOS stub can be modified to inject custom code. Back in the days of DOS-MZ, malware authors would inject their own 16-bit DOS code into this area. Then, when a DOS machine runs the modified .exe, it executes the injected DOS code instead of the boring “This program cannot be run in DOS mode” message.

To demonstrate how this is done, I’ll take a random PE executable and replace its DOS stub with custom 16-bit shellcode that prints “@AlonAlush” 5 times in green.

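To exit the program cleanly after our payload runs, the shellcode has to end with the DOS terminate call: INT 21h function 4Ch, which terminates a process with a return code. Note that 4C is a function number, not an opcode: it goes in AH, the exit code (01 here) goes in AL, and the instruction pair is mov ax, 4C01h followed by int 21h (bytes B8 01 4C CD 21). These bytes belong inside the stub, before the offset stored in e_lfanew. The 4C 01 that follows the PE\0\0 signature in a 32-bit x86 file is something else entirely, the Machine field of the file header (0x014C), and should be left untouched.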

DOS stub shellcode injection
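If you’d rather patch the file programmatically than in a hex editor, the idea can be sketched as below. `patch_dos_stub` is a hypothetical helper, and it assumes the usual layout where the 16-bit entry point sits right after the 64-byte DOS header (offset 0x40); a file with unusual e_cparhdr, e_cs or e_ip values would need a different offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DOS_STUB_OFFSET 0x40u /* assumed: stub starts right after the header */

/* Overwrites the start of the DOS stub with payload, refusing to write
 * past e_lfanew so the NT headers stay intact.
 * Returns 0 on success, -1 on any failed check. */
int patch_dos_stub(uint8_t *file, size_t len,
                   const uint8_t *payload, size_t payload_len)
{
    assert(file != NULL && payload != NULL);

    if (len < DOS_STUB_OFFSET || file[0] != 'M' || file[1] != 'Z')
        return -1;

    /* e_lfanew: little-endian 32-bit value at offset 0x3C. */
    uint32_t e_lfanew = ((uint32_t)file[0x3C])
                      | ((uint32_t)file[0x3D] << 8)
                      | ((uint32_t)file[0x3E] << 16)
                      | ((uint32_t)file[0x3F] << 24);

    /* The stub region is [0x40, e_lfanew). */
    if (e_lfanew < DOS_STUB_OFFSET || e_lfanew > len)
        return -1;
    if (payload_len > e_lfanew - DOS_STUB_OFFSET)
        return -1;

    memcpy(file + DOS_STUB_OFFSET, payload, payload_len);
    return 0;
}
```

With e_lfanew at 0xC8, that leaves 0xC8 - 0x40 = 136 bytes of room for the payload, including its final exit sequence.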

Now let’s run this patched PE in DOSBox, a very popular MS-DOS emulator:

Injected code in DOSBox

As you can see, instead of printing the generic “This program cannot be run in DOS mode”, our exe ran the injected code, which printed “@AlonAlush” in green 5 times, as we expected. Of course, you can modify the DOS stub to run any 16-bit machine code you’d like.
