DOS assembly in EMU8086

Ahmet Göker
8 min read · Nov 5, 2023


Hello everyone,

Welcome to this educational blog post. Today, we will look at how to print a string in the EMU8086 emulator. But before we dive into this topic, let’s first get an understanding of what the 8086 microprocessor is.

Introduction to the 8086

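The 8086 microprocessor is a significant improvement over its 8-bit predecessor, the 8085. It cannot run 8085 machine code directly, but its instruction set was designed so that 8085 assembly source could be translated to 8086 assembly almost mechanically, which made the transition relatively smooth.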

The 8086 has two main parts: the Bus Interface Unit (BIU) and the Execution Unit (EU). The BIU manages data and control signals between the microprocessor and other computer components, like memory and devices. The EU is where the real action happens. It contains the Arithmetic Logic Unit (ALU) for math operations, flags for decision-making, and 16-bit general-purpose registers for storing data.

These 16-bit registers provide more data-handling power than the 8-bit ones in the 8085. However, four of them (AX, BX, CX, and DX) can still be used as two separate 8-bit registers each, for example AH and AL for the high and low bytes of AX, which gives you flexibility.
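As a small sketch of that register splitting, the snippet below writes to AX as a whole and then to its two halves. The values are arbitrary and chosen only for illustration.

```asm
org 100h

mov ax, 1234h  ; AX = 1234h, so AH = 12h and AL = 34h
mov al, 0FFh   ; only the low byte changes: AX = 12FFh
mov ah, 00h    ; only the high byte changes: AX = 00FFh
ret
```

Stepping through it in the emu8086 debugger shows AH and AL changing independently while AX reflects both.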

In essence, the 8086 is an upgraded microprocessor that keeps a familiar programming model for 8085 developers while offering enhanced capabilities and more storage for data.

To understand how the 8086 operates, it helps to know what the Execution Unit (EU) and the Bus Interface Unit (BIU) each do. The BIU speeds up instruction processing by prefetching up to six bytes of instructions into a queue while the EU is busy executing. The queue acts as a FIFO (First In, First Out) buffer between the BIU and the EU.

Segment Registers

The 8086 microprocessor, known for its 16-bit architecture, uses a segmented memory model, which means that memory is accessed through segments, each a window of up to 64 KB addressed with a 16-bit offset. To manage these segments, the 8086 uses a set of specialized registers known as segment registers.

These segment registers play a vital role in addressing memory and data storage efficiently. Let’s explore these segment registers in detail:

1. CS (Code Segment Register):

- The CS register points to the segment that holds the currently executing code.
- The microprocessor relies on it to fetch instructions from the right place.
- The content of CS is combined with the instruction pointer (IP) to form the physical memory address of the next instruction to be fetched.

2. DS (Data Segment Register):

- The DS register points to the segment where the program's data is stored.
- It is used for accessing variables, constants, and data structures.
- When you access data, the DS register is combined with the offset (a memory location within the segment) to form the physical memory address.

3. ES (Extra Segment Register):

- The ES register is used as an additional data segment for certain instructions. String instructions such as MOVSB and STOSB, for example, write to the address ES:DI.
- It lets a program work with two data segments at once without reloading DS.

4. SS (Stack Segment Register):

- The SS register points to the segment where the stack, a special area of memory used for temporary data storage, is located.
- All push and pop operations, as well as the return addresses saved by call and restored by ret, use this segment together with the stack pointer (SP).
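The stack behaviour can be seen with a minimal sketch like the one below. The value 1234h is arbitrary, and the program assumes the .COM layout that emu8086 sets up with org 100h.

```asm
org 100h

mov ax, 1234h
push ax        ; SP decreases by 2 and AX is stored at SS:SP
pop bx         ; BX = 1234h and SP returns to its previous value
ret
```

Watching SP in the emulator while single-stepping makes the push and pop visible.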

The value in a segment register is shifted left by 4 bits and added to an offset to calculate a 20-bit physical address. This mechanism allows the 8086 to address up to 1 MB of memory efficiently.
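As a worked example of that calculation, the sketch below uses an arbitrary segment value of 1234h and an offset of 0010h. Both numbers are illustrative assumptions, not addresses that mean anything special.

```asm
org 100h

; physical address = segment * 10h + offset
; with DS = 1234h and offset 0010h:
;   1234h * 10h    = 12340h   (shift left by 4 bits)
;   12340h + 0010h = 12350h
mov ax, 1234h
mov ds, ax     ; a segment register cannot be loaded with an immediate value
mov bx, 0010h  ; BX holds the offset
mov al, [bx]   ; reads the byte at DS:BX, physical address 12350h
ret
```

Because the result is 20 bits wide, the highest reachable address is just above FFFFFh, which is where the 1 MB limit comes from.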

Segment registers are crucial for proper memory management and data access in the 8086 architecture. They facilitate the segmentation and address calculation processes, ensuring that the microprocessor can access the right data and execute instructions accurately. Understanding how to manipulate these registers is fundamental when programming for the 8086 microprocessor.

How Assembly Assembles to Machine Code:

As a programmer, it is worth understanding how instructions are assembled into machine code. You’ve probably heard the phrase ‘0s and 1s’ when referring to what occurs at a low level in your computer. This is indeed the case with the code you run. CPUs process code as a series of instructions, which are binary sequences of 0s and 1s that instruct the CPU on what actions to perform.

For example, take a made-up instruction word such as 10 000001 0010 0011 (an illustrative format, not a real 8086 encoding). Here, 10 might indicate an arithmetic operation, 000001 could correspond to the ‘add’ operation, and 0010 and 0011 might represent the binary values for 2 and 3, the operands.

It’s worth noting that early programmers did work directly with just 0s and 1s, which could be quite challenging. This is why assembly languages were developed: they offered a more human-readable and user-friendly way to express binary machine code instructions.

So, how do we transition from assembly language to machine code? Assembly languages are translated into binary code using an assembler. Unlike a compiler, the assembler’s task is relatively straightforward. This is because each assembly language instruction directly corresponds to a machine code instruction. The assembly process essentially involves matching the various components of each assembly instruction to their binary equivalents.
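To make the mapping concrete, here is a sketch of a few 8086 instructions with the bytes they assemble to shown in the comments. The mov and ret encodings are fixed by the instruction set. The add ax, bx instruction has two valid encodings, and which one a given assembler emits is its own choice, so treat that comment as listing the possibilities rather than what emu8086 necessarily produces.

```asm
org 100h

mov ax, 5      ; B8 05 00  (B8 = "mov ax, imm16", then 0005h low byte first)
mov bx, 2      ; BB 02 00  (BB = "mov bx, imm16")
add ax, bx     ; 01 D8 or 03 C3 (both mean ax = ax + bx)
ret            ; C3
```

The emu8086 debugger shows the assembled bytes next to each instruction, so you can compare them with the comments.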

https://www.spiceworks.com/tech/tech-general/articles/machine-vs-assembly-language/

EMU8086

emu8086 is a software program that emulates the behavior of the 8086 microprocessor. It’s designed to help people learn and practice 8086 assembly language programming. This tool provides an integrated environment with features like a code editor, assembler, and debugger. With its graphical interface, emu8086 simplifies the process of writing and testing assembly language code. It’s often used in education to teach computer architecture and low-level programming. However, keep in mind that emu8086 doesn’t simulate all aspects of real-world hardware and operating systems, so it’s best suited for learning and early-stage development.

Download: https://emu8086-microprocessor-emulator.en.download.it/

Writing code in EMU8086

In this blog, I will not delve deeply into the microprocessor, as that topic will be explored in future posts. In this section, I will provide code examples to show what an emu8086 program looks like.

Example-1

In this example, I will walk through a simple addition operation and explain each part:

org 100h sets the origin of the program to offset 100h. This is where DOS loads a .COM program, because the first 256 bytes of the segment are taken by the program segment prefix (PSP).

The first two instructions initialize the registers. They use the mov (move) instruction to store values:

  • mov ax, 5 loads the value 5 into the AX register.
  • mov bx, 2 loads the value 2 into the BX register.

jmp calculator is an unconditional jump instruction. It redirects the program’s execution flow to a label named calculator. Labels in assembly language are used as markers for specific points in the code.

calculator: is a label, marking the beginning of a code section. In this case, it marks the start of the calculator section.

add ax, bx is the addition operation. It adds the value in the BX register to the value in the AX register and stores the result in the AX register. After this instruction, AX will hold the value 7 (5 + 2).

The ret instruction is normally used to return from a procedure that was entered with call. There is no matching call here, but ret still has a purpose: when DOS starts a .COM program it pushes the word 0000h onto the stack, so a final ret jumps to offset 0 of the PSP, where an int 20h instruction terminates the program and returns control to the operating system.
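Put together, the program described above might look like the following sketch. It is reconstructed from the walkthrough, so the exact layout and the comments are assumptions.

```asm
org 100h              ; .COM program: code starts at offset 100h

mov ax, 5             ; AX = 5
mov bx, 2             ; BX = 2
jmp calculator        ; unconditional jump to the label below

calculator:
    add ax, bx        ; AX = AX + BX = 7
    ret               ; return control to the operating system
```

Single-stepping in emu8086 should show AX change from 0005h to 0007h at the add instruction.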

Example-2

In this example, I will print “I love hacking” on the 8086 processor:

  • .model small: This line specifies the memory model. The "small" model uses one code segment and one data segment, which is typical for simple DOS programs.
  • .data: The .data section defines data elements used in the program.
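  • message db "I love hacking$": This line declares a string named "message" containing the text "I love hacking". The string is terminated by a dollar sign rather than a null byte, because the DOS print-string service stops at the first "$" and does not print it.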
  • .code: The .code section indicates the start of the code segment.
  • main proc: Here, a procedure named "main" begins. This is the main entry point of the program.
  • mov ah, 9: This instruction loads the value 9 into the AH register. In DOS, AH is often used to specify the function code for various interrupt services. Here, 9 indicates "print string."
  • mov dx, offset message: This instruction loads the offset address of the "message" variable (the address of the string within the data segment) into the DX register.
  • int 21h: This is a software interrupt instruction (interrupt 21h) that calls a DOS service. In this case, it's service 9, which is for printing a string. The AH register specifies the service, and DOS expects the string at DS:DX, so DS must point to the data segment and DX must contain the offset of the string.
  • mov ah, 4Ch: This instruction loads the value 4Ch into the AH register, indicating "program termination."
  • int 21h: Another software interrupt (interrupt 21h) is called to exit the program. The AH register tells DOS that the program should terminate.
  • main endp: This marks the end of the "main" procedure.
  • end main: The program's entry point is specified as "main."
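The lines above can be assembled into a complete program along the lines of the sketch below. Two things in it are my additions rather than part of the walkthrough: the .stack 100h directive, and the mov ax, @data / mov ds, ax pair. The pair is there because a program built with .model small starts with DS pointing at the PSP rather than at the data segment, so DS:DX would otherwise not point at the string.

```asm
.model small
.stack 100h                   ; assumption: a small stack, not in the walkthrough

.data
    message db "I love hacking$"

.code
main proc
    mov ax, @data             ; assumption: point DS at the data segment,
    mov ds, ax                ; since DOS reads the string from DS:DX

    mov ah, 9                 ; DOS service 9: print "$"-terminated string
    mov dx, offset message    ; DX = offset of the string
    int 21h

    mov ah, 4Ch               ; DOS service 4Ch: terminate program
    int 21h
main endp
end main
```

Running it in emu8086 should print the text in the emulator's output screen and then exit.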

Reference: https://en.m.wikipedia.org/wiki/DOS_API

Thank you for taking the time to explore this technical blog! I hope you found it both informative and engaging. I aimed to provide practical explanations in an accessible manner. Should you have any questions or require further clarification on any topic discussed, please don’t hesitate to reach out to me via social media.

Social Media

Twitter: https://twitter.com/lockpin010_

LinkedIn: https://www.linkedin.com/in/ahmetgoker/

Youtube: https://www.youtube.com/@lockpin010

Ahmet | Security Researcher | Sociologist

                                                                                                                     
