Wasted Neurons Wednesday - COM and EXE files

These days, all Windows programs are .EXE files. But back in the days of DOS, there were also .COM files.

Why no .COM files anymore?

The answer lies with the precursor of MS-DOS - CP/M.

MS-DOS was not actually written by Microsoft - IBM had tried to get CP/M for their new PC, but couldn't agree on the royalty rates. As a hedge, they asked Microsoft for an operating system - so Microsoft first licensed, then bought outright, someone else's CP/M clone (86-DOS, from Seattle Computer Products) and resold it to IBM!

86-DOS was designed to be source compatible with CP/M, albeit loosely. That meant that most programs could be recompiled for 86-DOS without any difficulties.

So IBM's PC-DOS - IBM always called their MS-DOS version "PC-DOS" - was basically a CP/M clone. And CP/M used .COM files for external commands (programs)...

 

However, .COM files were pretty basic. Just a memory image, really - they were always loaded at the same offset, 0100h, just after the Program Segment Prefix. A .COM file is often thought to be limited to under 64 KB in size, but it can be larger - however, you have to write the memory management yourself!

That's the big difference between a .COM file and a .EXE file - the .EXE file can be loaded anywhere rather than at a fixed point, and MS-DOS can also load different parts into different memory segments.

If that last part made no sense to you, be thankful. Segmented memory models are evil, and we should never repeat those mistakes!
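
For the curious, here is a rough Python sketch of the two loading styles described above. It is not how DOS itself was implemented: the function names are made up for illustration, and the 256-byte PSP size and the "add the load segment to each relocated word" fix-up are details this post doesn't spell out. It also only handles a .COM image that fits in one segment, and ignores the stack space a real program needs.

```python
import struct

PSP_SIZE = 0x100        # the Program Segment Prefix fills offsets 0000h-00FFh
SEGMENT_SIZE = 0x10000  # one 8086 segment covers 64 KB


def load_com_image(image: bytes) -> bytearray:
    """Copy a .COM memory image into a single segment at offset 0100h."""
    if len(image) > SEGMENT_SIZE - PSP_SIZE:
        raise ValueError("image does not fit in one segment after the PSP")
    segment = bytearray(SEGMENT_SIZE)  # PSP area is left zeroed in this sketch
    segment[PSP_SIZE:PSP_SIZE + len(image)] = image
    return segment


def relocate_exe_image(image: bytearray, relocations, load_segment: int) -> None:
    """Patch segment references so a .EXE image works wherever it was loaded.

    `relocations` is a list of (offset, segment) pairs, each pointing at a
    16-bit word inside the image that holds a segment value.
    """
    for offset, segment in relocations:
        position = segment * 16 + offset
        (value,) = struct.unpack_from("<H", image, position)
        # Add the segment the program was actually loaded at, wrapping at 16 bits.
        struct.pack_into("<H", image, position, (value + load_segment) & 0xFFFF)
```

The .COM loader has nothing to patch because the image always sits at the same offset. The .EXE loader has to go back and fix up every segment reference, which is the price of being loadable anywhere.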

 

So that's why we have .COM and .EXE files - the .COM files are there for CP/M compatibility, and the .EXE files are the newer, shinier MS-DOS format.

BUT WAIT!

Then why were so many MS-DOS programs in the old .COM format, rather than the shiny new native .EXE?

 

Well, basically, most of them were small and didn't need to be a .EXE. But a few did grow larger than 64 KB - COMMAND.COM and FORMAT.COM spring to mind - so they should have become .EXE files.

Except that MS-DOS didn't actually care whether the file was named .EXE or .COM, because the program loader checked the first two bytes of the file when loading it - if they were "MZ" or "ZM", it was treated as a .EXE. If those two characters weren't there, it was loaded as a .COM file.

So it didn't actually matter what the file extension was!
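
That signature check is simple enough to sketch in a few lines of Python. The function name is hypothetical and this only mirrors the rule as described here; it is not a transcription of the real loader.

```python
def dos_executable_format(first_bytes: bytes) -> str:
    """Classify a DOS program by its content, not by its file name."""
    # "MZ" is the usual .EXE signature; "ZM" was accepted as well.
    if first_bytes[:2] in (b"MZ", b"ZM"):
        return "EXE"
    # Anything else is treated as a flat .COM memory image.
    return "COM"
```

You would call it with the first couple of bytes of the file, for example `dos_executable_format(open(path, "rb").read(2))`, where `path` is whatever program you want to inspect.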

More importantly, the filenames had to stay the same for compatibility - it would have been chaos if you'd renamed COMMAND.COM, as many other applications called it for various reasons!

 

And it gets worse.

Of course, there is one final thing to note about .COM/.EXE files that might be useful to know...

.COM takes precedence.

So if you have "PROGRAM.COM" and "PROGRAM.EXE" in the same directory, and you type "PROGRAM" at the command prompt to run it, you're going to load "PROGRAM.COM". Every time.

There were actually some viruses that used that to spread. Very stupid, very dumb viruses that were incredibly easy to spot and remove. But it happened!

There were also applications that would use this trick to load an initial "wrapper" application that would do any checking that they needed - hardware, licensing, whatever - before unloading and loading the .EXE file.
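
The precedence rule itself can be sketched like this in Python. The names are made up for illustration, and it only models the two extensions discussed in this post and a single directory, not the full search the command prompt performs.

```python
from typing import Iterable, Optional

# Extensions are tried in this order, so .COM always wins over .EXE.
SEARCH_ORDER = (".COM", ".EXE")


def resolve_command(name: str, directory_listing: Iterable[str]) -> Optional[str]:
    """Return the file that would run when `name` is typed with no extension."""
    available = {entry.upper() for entry in directory_listing}
    for extension in SEARCH_ORDER:
        candidate = name.upper() + extension
        if candidate in available:
            return candidate
    return None  # nothing runnable with that name in this directory
```

With both "PROGRAM.COM" and "PROGRAM.EXE" in the listing, `resolve_command("program", ...)` picks "PROGRAM.COM", which is exactly the behaviour those wrapper programs (and viruses) relied on.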

 

Of course, you're very unlikely to have to run a true .COM file today. These days pretty much everything is a .EXE. So all of this is just wasted neurons...