A few lessons from building a PC emulator
I spent the (many) rainy days this summer building a PC emulator. At this point it works with an unmodified original IBM PC BIOS and can run DOS in text mode. While building it I learned a few important lessons which I thought would be nice to share.
Lesson 1 - Undefined REP combinations
According to the Intel manual, the REP, REPE and REPNE prefixes can only be combined with certain instructions. REP can be combined with INS, OUTS, MOVS, LODS and STOS. REPE and REPNE can be used with CMPS and SCAS. The effect of other combinations and uses is undefined.
It is tempting to think that an emulator can do something arbitrary in those undefined cases, but that would not be a good idea at all! As it turns out, it is not at all uncommon for software to use these undefined combinations, and that software expects any string instruction to work with any kind of REP prefix. In practice, REPE and REPNE both behave like a plain REP when combined with MOVS, LODS, STOS, INS or OUTS.
When it comes to combining REP prefixes with non-string instructions, real processors seem to simply ignore the REP prefixes and run the non-string instruction as is. The counter register remains unchanged.
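As a rough illustration of the behaviour described in this lesson, here is a minimal sketch of a REP dispatcher. The `cpu` dictionary and the `step` callback are hypothetical stand-ins for whatever your emulator uses, and a real implementation would also have to allow interrupts between iterations.

```python
# Opcodes of the string instructions (byte and word forms).
INS, OUTS = (0x6C, 0x6D), (0x6E, 0x6F)
MOVS, CMPS = (0xA4, 0xA5), (0xA6, 0xA7)
STOS, LODS, SCAS = (0xAA, 0xAB), (0xAC, 0xAD), (0xAE, 0xAF)

STRING_OPS = set(INS + OUTS + MOVS + CMPS + STOS + LODS + SCAS)
COMPARE_OPS = set(CMPS + SCAS)  # only these test ZF to stop early

REPNE, REPE = 0xF2, 0xF3  # 0xF3 doubles as plain REP


def run_with_rep(cpu, prefix, opcode, step):
    """Run `opcode` under a REP-family prefix.

    `cpu` is a hypothetical dict with 'cx' (int) and 'zf' (bool);
    `step(cpu, opcode)` is a hypothetical callback that executes
    exactly one iteration of the instruction.
    """
    if opcode not in STRING_OPS:
        # Non-string instruction: ignore the prefix, leave CX alone.
        step(cpu, opcode)
        return

    while cpu["cx"] != 0:
        step(cpu, opcode)
        cpu["cx"] = (cpu["cx"] - 1) & 0xFFFF
        if opcode in COMPARE_OPS:
            # REPE continues while ZF is set, REPNE while it is clear.
            if cpu["zf"] != (prefix == REPE):
                break
        # For every other string instruction both prefixes act as REP.
```

With this structure a REPNE MOVSB simply copies CX bytes, and a REP in front of a NOP runs the NOP once without touching CX.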
Lesson 2 - Handling FPU detection code without FPU emulation
FPU detection can turn out to be a major headache if you choose not to implement FPU emulation. FPU detection code can look something like this:
If you just assume that any floating-point instruction should cause an exception, you will be in for a nasty surprise when software tries to run the detection sequence. In fact those three instructions should not raise an exception. The emulator should just continue with the next instruction as if nothing happened.
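One way to "continue as if nothing happened" is to treat every coprocessor escape opcode (0xD8 to 0xDF) as a no-op, while still decoding the ModRM byte so that any displacement bytes are skipped. The sketch below assumes 16-bit addressing and a flat `code` byte sequence; the function name is made up for illustration. The byte sequences in the usage note are the encodings of FNINIT, FNSTSW and FNSTCW, which are typical of detection code but are only an assumption about what a given program uses.

```python
def skip_esc(code, ip):
    """Return the IP of the instruction following the ESC at code[ip].

    Nothing is executed and no memory is written, which is what
    detection code sees when no coprocessor is installed.
    """
    opcode = code[ip]
    if not 0xD8 <= opcode <= 0xDF:
        raise ValueError("not a coprocessor escape opcode")

    modrm = code[ip + 1]
    mod, rm = modrm >> 6, modrm & 7

    if mod == 1:
        disp = 1                      # 8-bit displacement
    elif mod == 2 or (mod == 0 and rm == 6):
        disp = 2                      # 16-bit displacement or direct address
    else:
        disp = 0                      # register form or no displacement

    return (ip + 2 + disp) & 0xFFFF
```

For example, `skip_esc(bytes([0xDB, 0xE3]), 0)` steps over FNINIT, and `skip_esc(bytes([0xDD, 0x3E, 0x34, 0x12]), 0)` steps over FNSTSW [1234h] without touching the memory operand, so the sentinel value the detection code stored there stays intact.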
Lesson 3 - PUSH SP differences
Processors before the 80286 handle PUSH SP differently from the 80286 and later processors. Before the 80286, PUSH SP pushed the value of SP after it had been decremented for the push. The 80286 and later processors push the value of SP as it was before the instruction was executed.
So does it really matter which way you choose to do it in an emulator? Take a look at the following code:
PUSH SP
POP AX
CMP AX, SP
This code sequence uses the difference to detect whether the processor is older than an 80286. Imagine you make your emulator look like an 80286 while it only implements the 80186 instruction set. Some software will then try to use unsupported 80286 instructions. The lesson here is that PUSH SP must always be implemented according to the instruction set you emulate.
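A minimal sketch of a PUSH SP handler that follows this rule might look like the following. The `cpu` dictionary, the 1 MiB `memory` bytearray and the `is_286_or_later` switch are all hypothetical; the only point is which value ends up on the stack.

```python
def write_word(memory, seg, off, value):
    """Store a 16-bit little-endian word at seg:off (real mode)."""
    for i in range(2):
        addr = ((seg << 4) + ((off + i) & 0xFFFF)) & 0xFFFFF
        memory[addr] = (value >> (8 * i)) & 0xFF


def read_word(memory, seg, off):
    """Load a 16-bit little-endian word from seg:off (real mode)."""
    lo = memory[((seg << 4) + (off & 0xFFFF)) & 0xFFFFF]
    hi = memory[((seg << 4) + ((off + 1) & 0xFFFF)) & 0xFFFFF]
    return lo | (hi << 8)


def push_sp(cpu, memory, is_286_or_later):
    """PUSH SP with the model-specific choice of pushed value."""
    old_sp = cpu["sp"]
    new_sp = (old_sp - 2) & 0xFFFF
    # 8086/8088/80186 push the already decremented SP;
    # the 80286 and later push the value SP had before the push.
    value = old_sp if is_286_or_later else new_sp
    write_word(memory, cpu["ss"], new_sp, value)
    cpu["sp"] = new_sp
```

After PUSH SP followed by POP AX, AX equals SP only in the 80286 style, which is exactly what the CMP in the detection sequence checks.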
Lesson 4 - POP SS, STI and MOV SS, Ew have one important thing in common
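POP SS, STI and MOV SS, Ew have one important thing in common: they inhibit all interrupts until the following instruction has been executed. For POP SS and MOV SS, this is what lets software load SS and SP back to back without an interrupt arriving while the stack pointer is only half updated. Forget this and you may need a few extra rainy days.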
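The one-instruction delay can be sketched as a check at the end of each instruction. This is a simplified model under stated assumptions: instructions are represented by made-up mnemonic strings, only STI and CLI have any effect, and the `Cpu` class is purely illustrative.

```python
INHIBITING = {"POP SS", "STI", "MOV SS"}


class Cpu:
    def __init__(self):
        self.interrupt_flag = False
        self.irq_pending = False

    def step(self, mnemonic):
        """Execute one instruction.

        Returns True if a pending interrupt is taken before the next
        instruction, False otherwise.
        """
        if mnemonic == "STI":
            self.interrupt_flag = True
        elif mnemonic == "CLI":
            self.interrupt_flag = False
        # All other mnemonics are no-ops in this sketch.

        if mnemonic in INHIBITING:
            # No interrupt may be taken until the following
            # instruction has been executed.
            return False

        if self.interrupt_flag and self.irq_pending:
            self.irq_pending = False
            return True
        return False
```

With a pending interrupt, `step("STI")` returns False and the interrupt is only taken after the next instruction. An STI immediately followed by CLI therefore never lets the interrupt through.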
Lesson 5 - Never interrupt a segment override prefix doing its job
This lesson depends on how you implement the emulator. Do you allow interrupts between all instructions? Do you treat segment override prefixes as instructions on their own? Do you forget them after the next instruction has run? Is the next instruction always the correct next instruction? Not if an interrupt gets between the two. The lesson here is to never interrupt a segment override prefix doing its job, because if you do you will surely spend another few rainy days hunting down the weirdest things you ever saw. Unfortunately I can tell you that spurious "Bad command error reading device CON" messages are not only caused by the Prague virus...
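One way to make this mistake impossible is to consume prefixes inside the fetch of a single instruction and to check for interrupts only at instruction boundaries. The sketch below is a hypothetical fetch loop, not a full decoder: it returns raw opcodes instead of executing them, and `interrupt_pending` is an assumed callback.

```python
SEGMENT_PREFIXES = {0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds"}


def fetch_instruction(code, ip):
    """Consume segment override prefixes together with the opcode.

    Returns (override, opcode, next_ip). The override is a local
    result, so it cannot leak into a later instruction.
    """
    override = None
    while code[ip] in SEGMENT_PREFIXES:
        override = SEGMENT_PREFIXES[code[ip]]
        ip += 1
    return override, code[ip], ip + 1


def run(code, ip, steps, interrupt_pending):
    """Fetch `steps` instructions, checking interrupts only between them."""
    trace = []
    for _ in range(steps):
        # The only place an interrupt can be taken: before any
        # prefix byte of the next instruction has been read.
        if interrupt_pending():
            trace.append("interrupt")
        override, opcode, ip = fetch_instruction(code, ip)
        trace.append((override, opcode))
    return trace
```

Because the prefix and its opcode are fetched in one call, an interrupt can never land between them, and the override is gone as soon as that instruction is done.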