Introduction to Assembly Language

Introduction

So far in this course we have covered a number of topics, including Introduction to Data Representation, Architecture of the Atmel ATmega 328 Microcontroller, Introduction to Programming and Program Development, and Introduction to Programming with C. In this section we will study assembly language, a low-level language that provides a one-to-one mapping between mnemonic instructions and the machine code executed on the microcontroller. This will let you see how high-level statements and functions in C are expressed in assembly language, and what the final program transferred to the microcontroller looks like.

Topics discussed

In this lecture we will present an introduction to assembly language, including program structure, syntax and the classification of operations. We will also revisit the Direct Port Manipulation in C example from Digital I/O Example Program and translate it into assembly language, looking at some of the key instructions involved.

Machine Code and Assembly Language

Machine code

Programs are stored on a microcontroller as a series of binary codes located within sequential memory addresses …

… this is known as machine code[1].

The program in our microcontroller looks like this:

1000101010110001 1000001101111111 1000101010111001 
1000010010110001 1000001101100000 1000010010111001
1000010110110001 1000110001111111 1000010110111001

So what is wrong with machine code?

Well nothing really … if you are a computer!

Otherwise … if you’re a human, machine code is difficult to:

  • Write

  • Read

  • Understand

  • Debug,

and most importantly

  • Maintain

Instead, instructions can be written in a mnemonic form termed assembly language and then translated into machine code by an assembler.

Assembly Language

Every CPU (or family of CPUs) has an instruction set in which each operation that can be performed is represented by a particular binary combination.

The next step up in language levels is to represent each of these binary patterns with a short mnemonic.

Programs written using these mnemonics are known as assembly language programs[2].

Fig. 87 An example of an assembly code instruction: ADC (Add with Carry). Extracted from the Atmel ATmega328 reference manual.
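As a small illustration of this one-to-one mapping, the C sketch below packs register numbers into the 16-bit opcodes of ADD and ADC. It assumes the bit layouts given in the AVR instruction set manual, 0000 11rd dddd rrrr for ADD and 0001 11rd dddd rrrr for ADC, where the five d bits and five r bits hold register numbers 0 to 31. The function names are hypothetical; this is a sketch of what an assembler does for one instruction format, not a real assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a two-register instruction laid out as  pppp ppRD DDDD RRRR :
   bits 15..10 are the fixed opcode 'prefix', bit 9 is r4,
   bits 8..4 are d4..d0 and bits 3..0 are r3..r0. */
static uint16_t encode_rd_rr(uint16_t prefix, unsigned d, unsigned r)
{
    assert(d <= 31 && r <= 31);              /* registers R0..R31 only */
    return (uint16_t)(prefix
                      | ((r & 0x10u) << 5)   /* r4      -> bit 9     */
                      | ((d & 0x1Fu) << 4)   /* d4..d0  -> bits 8..4 */
                      | (r & 0x0Fu));        /* r3..r0  -> bits 3..0 */
}

/* ADD Rd, Rr : 0000 11rd dddd rrrr */
static uint16_t encode_add(unsigned d, unsigned r) { return encode_rd_rr(0x0C00u, d, r); }

/* ADC Rd, Rr : 0001 11rd dddd rrrr */
static uint16_t encode_adc(unsigned d, unsigned r) { return encode_rd_rr(0x1C00u, d, r); }
```

For example, encode_add(16, 17) gives 0x0F01 and encode_adc(16, 17) gives 0x1F01: the two mnemonics differ only in bit 12 of the opcode.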

A short history of assembly languages

Assembly languages were first developed in the 1950s and are referred to as second-generation programming languages. Assembly language is a low-level language that uses mnemonic codes (symbols) to represent machine code instructions, rather than the instructions' numeric (binary) values.

Essentially, assembly languages are a much more readable, but still directly translatable, representation of machine code.

Assembly language is commonly called just assembly, ASM, or symbolic machine code.

Despite the giant leap from machine code to assembly language, by the 1980s its use had largely been overtaken by higher-level languages such as Fortran and C and more recently Python for many applications.

Why learn low-level languages

High Level Language (HLL) programs are designed to be independent of a particular machine architecture. As a result, they rarely take into account any special features of the machine - features which are commonly available to assembly language programmers. Assembly language is therefore good for hardware-specific jobs such as device drivers.

If you understand assembly language, you’ll have an appreciation for the compiler, and you’ll know exactly what it is doing with HLL statements. Once you see how compilers translate seemingly innocuous statements into a ton of machine code, you will begin to understand how HLL code could be optimised.

Good assembly language programmers make better HLL programmers because they understand the limitations of the compiler and they know what it is doing with their code.

As well as these points, assembly programmers will have a better understanding of

  • how data is represented in memory and other external devices;

  • how processors access and execute instructions and how instructions access and process data;

  • how a program accesses external devices (I/O);

  • how to write efficient code that requires less memory and execution time.

Why do people use assembly language?

In short:

  • Speed - assembly language programs are generally the fastest programs around (up to ten times faster than optimized HLL programs, according to The Art of Assembly Language, 2010)[3].

  • Space - assembly language programs are often the smallest.

  • Capability - you can do things in assembly which are difficult or impossible in HLLs.

  • Knowledge - your knowledge of assembly language will help you write better programs, even when using HLLs.

  • Reverse engineering - you can read and modify ("hack") programs that have already been assembled.

Introduction to Assembly Language

Assembly Language 101

An assembly language program consists of a series of instructions to an assembler which will then produce the machine code program that is loaded to the microcontroller.

A program is written as a sequence of statements - one statement per line:

  • Lines can be empty to separate sections of code

  • Statements cannot span multiple lines

Each statement contains up to four fields each separated by a space or tab character as shown below:

[label:]    operator    [operand]     [;comment]

All statements must have something in the Operator field, but the label, operand and comment fields can be empty.
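The four-field layout can be made concrete with a short C sketch that splits one statement into label, operator, operands and comment. It is a simplification that assumes every label ends with a colon and that a semicolon always starts the comment; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* The four fields of one statement; a missing field is an empty string. */
struct statement {
    char label[32];
    char op[16];
    char operands[64];
    char comment[64];
};

/* Copy [start, end) into dst with surrounding blanks removed,
   truncating to fit a buffer of 'cap' bytes. */
static void copy_trimmed(char *dst, size_t cap, const char *start, const char *end)
{
    while (start < end && isspace((unsigned char)*start))
        start++;
    while (end > start && isspace((unsigned char)end[-1]))
        end--;
    size_t n = (size_t)(end - start);
    if (n >= cap)
        n = cap - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
}

/* Split "[label:] operator [operand] [;comment]" into its fields. */
static void parse_statement(const char *line, struct statement *s)
{
    const char *end = line + strlen(line);
    const char *semi = strchr(line, ';');
    const char *code_end = semi ? semi : end;

    /* comment: everything after the semicolon */
    copy_trimmed(s->comment, sizeof s->comment, semi ? semi + 1 : end, end);

    /* label: text before a colon, if the code part contains one */
    const char *p = line;
    const char *colon = memchr(line, ':', (size_t)(code_end - line));
    if (colon) {
        copy_trimmed(s->label, sizeof s->label, line, colon);
        p = colon + 1;
    } else {
        s->label[0] = '\0';
    }

    /* operator: first whitespace-delimited word; operands: the rest */
    while (p < code_end && isspace((unsigned char)*p))
        p++;
    const char *op_end = p;
    while (op_end < code_end && !isspace((unsigned char)*op_end))
        op_end++;
    copy_trimmed(s->op, sizeof s->op, p, op_end);
    copy_trimmed(s->operands, sizeof s->operands, op_end, code_end);
}
```

For example, the line 'LOOP:  IN  R16, PIND  ; read the buttons' yields the label LOOP, the operator IN, the operands R16, PIND and the comment "read the buttons", while a bare NOP leaves every field except the operator empty.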

Assembly language labels

The label field is used to create a reference point in the program that can be used to identify or locate a collection of instructions.

Examples:

LOOP        operator

  COUNTER:  operator

MY_CODE     operator

SECTION1    operator

Labels must follow a set of rules and a particular format:

  • All labels must be unique and cannot use system reserved phrases.

  • All labels must start with a letter.

  • Labels can contain letters, numbers and special characters (symbols such as @, $ and _).

  • Labels that don’t begin at column 1 must be followed by a colon character ( : ).

  • By convention, labels are written in capitals.
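The character rules above are easy to check mechanically. The sketch below tests a label name (without its trailing colon) against them: the first character must be a letter, and the remaining characters may be letters, digits, @, $ or _. It deliberately does not check uniqueness or reserved words, and the function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Check a label name (without its trailing ':') against the character rules
   listed above. Uniqueness and reserved words are NOT checked here. */
static bool is_valid_label(const char *name)
{
    if (!isalpha((unsigned char)name[0]))
        return false;                        /* must start with a letter */
    for (const char *p = name + 1; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c) && c != '_' && c != '@' && c != '$')
            return false;                    /* character not allowed */
    }
    return true;
}
```

With these rules LOOP, MY_CODE and SECTION1 are accepted, while 1ST_PASS (starts with a digit) and MY-CODE (contains a hyphen) are rejected.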

Assembly Language Operators

The operator field contains either an assembly directive or a mnemonic/instruction.

Assembly directives, sometimes termed pseudo-operations, are directives to the assembler itself. They are not translated into machine code, but they provide information that the assembler requires or that is critical to the program's function.

Assembly directives can be used to specify the starting address in memory, generate fixed tables and data, indicate the end of a program and several others.

Assembly mnemonics

A mnemonic is an instruction that will be directly translated into machine code and is used to manipulate data in some way.

The list of allowed mnemonics/instructions is called the instruction set and is specific to a particular microcontroller architecture. However, in general, the mnemonics can be classified into one of six groups:

  • Data Transfer: IN, LD, LDI, LDS, MOV, OUT, ST, STS;

  • Arithmetic: ADD, ADC, ADIW, SUB, SUBI, SBC, INC, DEC, MUL, MULS, FMUL;

  • Logical: AND, ANDI, EOR, OR, ORI;

  • Program flow: BREQ, BRGE, BRNE, BRLO, BRMI, BRPL, CALL, JMP, RET, RJMP;

  • Bit and Bit Test: LSL, LSR, ROL, ROR, ASR, SBI, CBI, BSET, BCLR;

  • MCU Control: BREAK, NOP, SLEEP, WDR.
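The grouping can be represented as a simple lookup table. The C sketch below covers a subset of the mnemonics listed above and returns the group for an exact, upper-case match; the enum, table and function names are hypothetical, and a complete table would list the whole instruction set.

```c
#include <assert.h>
#include <string.h>

/* The six groups listed above, plus a result for unlisted mnemonics. */
enum group { DATA_TRANSFER, ARITHMETIC, LOGICAL, PROGRAM_FLOW,
             BIT_AND_BIT_TEST, MCU_CONTROL, UNKNOWN };

struct entry { const char *mnemonic; enum group group; };

/* A few mnemonics from each group (upper case only). */
static const struct entry table[] = {
    {"IN", DATA_TRANSFER}, {"LDI", DATA_TRANSFER}, {"MOV", DATA_TRANSFER},
    {"OUT", DATA_TRANSFER},
    {"ADD", ARITHMETIC}, {"ADC", ARITHMETIC}, {"SUB", ARITHMETIC},
    {"INC", ARITHMETIC}, {"DEC", ARITHMETIC},
    {"AND", LOGICAL}, {"ANDI", LOGICAL}, {"OR", LOGICAL}, {"ORI", LOGICAL},
    {"EOR", LOGICAL},
    {"BREQ", PROGRAM_FLOW}, {"BRNE", PROGRAM_FLOW}, {"RJMP", PROGRAM_FLOW},
    {"CALL", PROGRAM_FLOW}, {"RET", PROGRAM_FLOW},
    {"SBI", BIT_AND_BIT_TEST}, {"CBI", BIT_AND_BIT_TEST},
    {"LSL", BIT_AND_BIT_TEST}, {"ROR", BIT_AND_BIT_TEST},
    {"NOP", MCU_CONTROL}, {"SLEEP", MCU_CONTROL}, {"WDR", MCU_CONTROL},
};

/* Linear search of the table; returns UNKNOWN if the mnemonic is not listed. */
static enum group classify(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].mnemonic, mnemonic) == 0)
            return table[i].group;
    return UNKNOWN;
}
```

Looking up ANDI, for example, returns LOGICAL, and an unlisted mnemonic returns UNKNOWN.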

For this module we have been working with an Atmel ATmega328 microcontroller which is based on the AVR® enhanced (AVRe+) architecture.

Assembler Directives

Some common directives include:

  • .CSEG / .DSEG / .ESEG

  • .ORG / .EXIT

  • .EQU / .SET / .DEF / .INCLUDE

  • .DB / .DW / .BYTE

Directives are separate from the instruction set, but like it they are specific to a particular microcontroller family and its assembler. A list of supported directives for the AVR based microcontrollers can be found here.

Assembly Language Operands

  • The operand field follows the operator and contains the address or data to be used by the instruction.

  • A name (‘label’) can be used to represent the address of the data or a symbol to represent a data constant.

  • The field can be empty if the instructions given by the operator do not need an address or data.

    • As an example the operator NOP (no-operation) requires no operand.

  • Some operators allow for multiple operands and in these cases the operands are separated by commas (,).

Examples 1
LDI     R16, 0b01010101
ADD     R16, R17
LDS     R2, 0xFF00
NOP
MOV     R16, R17
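Two of these examples can be traced down to their opcodes. The sketch below assumes the layouts from the AVR instruction set manual: LDI Rd, K is 1110 KKKK dddd KKKK and only accepts R16 to R31 (the four d bits store d - 16), while MOV Rd, Rr is 0010 11rd dddd rrrr and accepts any register. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LDI Rd, K : 1110 KKKK dddd KKKK, valid only for R16..R31. */
static uint16_t encode_ldi(unsigned d, uint8_t k)
{
    assert(d >= 16 && d <= 31);
    return (uint16_t)(0xE000u
                      | ((k & 0xF0u) << 4)   /* K7..K4 -> bits 11..8 */
                      | ((d - 16u) << 4)     /* d - 16 -> bits 7..4  */
                      | (k & 0x0Fu));        /* K3..K0 -> bits 3..0  */
}

/* MOV Rd, Rr : 0010 11rd dddd rrrr, valid for R0..R31. */
static uint16_t encode_mov(unsigned d, unsigned r)
{
    assert(d <= 31 && r <= 31);
    return (uint16_t)(0x2C00u
                      | ((r & 0x10u) << 5)   /* r4     -> bit 9      */
                      | (d << 4)             /* d4..d0 -> bits 8..4  */
                      | (r & 0x0Fu));        /* r3..r0 -> bits 3..0  */
}
```

For the first line of the examples, encode_ldi(16, 0b01010101) returns 0xE505, and MOV R16, R17 encodes as 0x2F01. The restriction to R16 to R31 also applies to ANDI and ORI, which is one reason to use R16 as the working register with these immediate instructions.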

Assembly Language Comments

As with the C language, the comment field is there to allow the programmer to include any comments which may make the program easier to understand at a later time or by another reader.

When the assembler is reading the line of text, the comment field is ignored.

Comments also follow a set of rules and a particular format dependent on the assembler being used[4]:

  • If an entire line is a comment, it must start with a semicolon or an asterisk symbol in the first column.

  • If not starting in the first column, the comment must start with a semicolon.

  • The comment must be separated from the operator or operand field by at least one space.

Examples 2
;This comment line starts with a semicolon
*This comment line starts with an asterisk

operator ;This comment follows an operator
operator

Assembler

It is important to be aware of the assembler and the structure assembly language programs follow.

The assembler takes the sequence of mnemonics (instructions) written in assembly language and translates them into machine code using the process illustrated in Fig. 88.

Fig. 88 The assembly process

The assembler processes the assembly language file and generates an object file and one or more listing files.

  • An object file contains the machine code for the program: essentially a one-to-one mapping of each mnemonic onto its binary (often written in hexadecimal) instruction code.

  • The listing file shows each line of the assembly language input along with the memory addresses resolved by the assembler, resulting machine code or data and other diagnostic information.

The linker combines multiple object files as well as any library files and generates an executable which can be loaded onto the microcontroller (this file is often a *.hex file).
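AVR toolchains normally write this .hex file in Intel HEX format: each line is a record made of ':', a byte count, a 16-bit address, a record type, the data bytes and a checksum, all in hexadecimal. The C sketch below formats one record, with the checksum computed as the two's complement of the low byte of the sum of all the other bytes. The function name and buffer handling are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write one Intel HEX record ":LLAAAATT<data>CC" into out.
   out must have room for 12 + 2*len characters (including the '\0'). */
static void hex_record(char *out, uint16_t addr, uint8_t type,
                       const uint8_t *data, uint8_t len)
{
    /* the checksum covers the length, both address bytes, type and data */
    unsigned sum = len + (addr >> 8) + (addr & 0xFFu) + type;
    int n = sprintf(out, ":%02X%04X%02X",
                    (unsigned)len, (unsigned)addr, (unsigned)type);
    for (unsigned i = 0; i < len; i++) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* two's complement of the low byte of the sum */
    sprintf(out + n, "%02X", (0x100u - (sum & 0xFFu)) & 0xFFu);
}
```

Because AVR opcodes are stored low byte first, the instruction LDI R16, 0xFF (opcode 0xEF0F) at address 0 appears as the data bytes 0F EF, giving the record :020000000FEF00. The end-of-file record (type 01, no data) is :00000001FF.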

Interfacing with Digital I/O Example

Example

Recall the example from Digital I/O Example Program, reproduced again here as Fig. 89. The left and right push buttons are connected to digital inputs D3 and D2 respectively, which correspond to Port D bits 3 and 2 on the ATmega328 microcontroller.

When the left push button is pressed the red LED (Port B bit 1) is illuminated, and when the right push button is pressed the green LED (Port B bit 0) is illuminated.

Fig. 89 A photograph of the example circuit, which has two buttons and two LEDs, discussed in Digital I/O Example Program.

C-Language Code

Let us start with the C-language program we wrote for this (main.c).

#include <stdint.h>

//I/O and ADC Register definitions taken from datasheet
#define	PORTD (*(volatile uint8_t *)(0x2B))
#define DDRD (*(volatile uint8_t *)(0x2A))
#define PIND (*(volatile uint8_t *)(0x29))

#define PORTB (*(volatile uint8_t *)(0x25))
#define DDRB (*(volatile uint8_t *)(0x24))
#define PINB (*(volatile uint8_t *)(0x23))

int main(void)
{
	//Set Data Direction Registers
	DDRD = DDRD & 0b11110011; //setup bits 2 and 3 of port D as inputs
	DDRB = DDRB | 0b00000011; //setup bits 0 and 1 of port B as outputs

	PORTB = PORTB & 0b11111100; //both pins B0 (D8) and B1 (D9) start low
	
	PORTD = PORTD | 0b00001100; // Enable the pull up resistor for bits 2 and 3 of port D
	
	for(;;)
	{
		if((PIND & 0b00000100) == 0)
		{
			PORTB = PORTB | 0b00000001; //sets port B, bit 0 to logic 1/high, switches the LED connected to D8 on
		}
		else if ((PIND & 0b00001000) == 0)
		{
			PORTB = PORTB | 0b00000010; //sets port B, bit 1 to logic 1/high, switches the LED connected to D9 on
		}
		else
		{
			PORTB = PORTB & 0b11111100; //clears bits 0 and 1 of port B to logic 0/low, switches off both LEDs
		}
	}
	
}
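To check the masking logic away from the hardware, the body of the for(;;) loop can be rewritten as a pure function that takes the current PIND and PORTB values and returns the value written back to PORTB. This host-side sketch is for testing only: next_portb is a hypothetical name, and hexadecimal masks replace the 0b literals, which are a compiler extension before C23.

```c
#include <assert.h>
#include <stdint.h>

/* One pass of the for(;;) loop body as a pure function: given the current
   PIND and PORTB values, return the value the loop writes back to PORTB. */
static uint8_t next_portb(uint8_t pind, uint8_t portb)
{
    if ((pind & 0x04) == 0)          /* 0b00000100: D2 low, right button pressed */
        return portb | 0x01;         /* 0b00000001: set B0, green LED on */
    else if ((pind & 0x08) == 0)     /* 0b00001000: D3 low, left button pressed */
        return portb | 0x02;         /* 0b00000010: set B1, red LED on */
    else
        return portb & 0xFC;         /* 0b11111100: clear B0 and B1, both off */
}
```

With both pull-ups reading high (PIND = 0xFF) the function clears bits 0 and 1 and leaves the other bits of PORTB untouched; with PIND = 0xFB (bit 2 low) it sets bit 0. If both buttons are pressed, the first if takes priority, so bit 0 is set.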

I/O Addresses

Fig. 90 The addresses of the I/O registers, with information on which can be used in particular contexts.
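The C program above used data-space addresses such as 0x2B for PORTD, whereas IN and OUT take I/O addresses. On the ATmega328 the 64 I/O registers sit at data addresses 0x20 to 0x5F, so the I/O address is the data address minus 0x20, and only I/O addresses 0x00 to 0x1F can be used with the single-bit instructions SBI, CBI, SBIC and SBIS. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 64 I/O registers occupy data-space addresses 0x20..0x5F;
   IN and OUT use the I/O address, i.e. the data address minus 0x20. */
static uint8_t io_address(uint16_t data_addr)
{
    assert(data_addr >= 0x20 && data_addr <= 0x5F);
    return (uint8_t)(data_addr - 0x20);
}

/* SBI, CBI, SBIC and SBIS only reach the lower 32 I/O registers. */
static bool bit_addressable(uint16_t data_addr)
{
    return io_address(data_addr) <= 0x1F;
}
```

For example, PORTD at data address 0x2B has I/O address 0x0B and can be used with SBI and CBI, whereas SREG at data address 0x5F (I/O address 0x3F) is reachable with IN and OUT but not with the single-bit instructions.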

Assigning a name to the I/O addresses

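We use .EQU or .SET to assign a name to a constant value, such as an I/O address, and .DEF to assign a name to a register[5]. A name defined with .SET can be redefined later in the program, whereas a name defined with .EQU cannot: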

.EQU label = 12345
.SET variable = 0x0100
.DEF my_register = R16
First assembler example - assigning names

Assembly equivalent of int main(void)

In C language we put our code (or calls to external functions) within a main function, written as:

int main(void) {
  // Program code
}

In assembly language there is no main function as such. Instead, the .CSEG, .DSEG and .ESEG directives, together with the .ORG directive, are used to define the start addresses of the code, data and EEPROM segments in memory.

In these lines we are telling the assembler that we want the code segment to start at memory location with address \(200_{16}\).

Second assembler example - setting the start address for a program

I/O port access and bitmasking operations

We can use the IN and OUT instructions to read from and write to ports respectively, and the ANDI and ORI instructions to apply bitmasks.

We include scans of the documentation for these operators in the following images.

The IN operator The OUT operator The ANDI operator The ORI operator

Setting up the I/O Ports

Using the C language, we wrote:

DDRD = DDRD & 0b11110011;

to ensure bits 2 and 3 of port D are configured as inputs.

Similar lines were written to set up the output bits in Port B, the starting condition of these bits and then to enable the pull up resistors on Port D.

The direct translation to Assembly language involves three lines for each action as illustrated in Fig. 91.

Fig. 91 Setting up and accessing I/O in assembler
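The three lines per action form a read-modify-write: IN copies the I/O register into a working register, ANDI or ORI applies the mask, and OUT writes the result back. The C sketch below traces the four set-up actions with a hypothetical model of the I/O space; the io array, the r16 variable, the helper names and the _IO constants are assumptions, with the I/O addresses taken as the data-space addresses from the C program minus 0x20. R16 is assumed as the working register because ANDI and ORI only accept R16 to R31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the 64-byte I/O space and working register R16. */
static uint8_t io[64];
static uint8_t r16;

/* I/O addresses (data-space address minus 0x20) of the registers used. */
enum { DDRB_IO = 0x04, PORTB_IO = 0x05, DDRD_IO = 0x0A, PORTD_IO = 0x0B };

static void op_in(uint8_t a)   { assert(a < 64); r16 = io[a]; }  /* IN   R16, A */
static void op_out(uint8_t a)  { assert(a < 64); io[a] = r16; }  /* OUT  A, R16 */
static void op_andi(uint8_t k) { r16 &= k; }                     /* ANDI R16, K */
static void op_ori(uint8_t k)  { r16 |= k; }                     /* ORI  R16, K */

/* The C set-up statements, each as an IN / ANDI-or-ORI / OUT triple. */
static void setup_ports(void)
{
    op_in(DDRD_IO);  op_andi(0xF3); op_out(DDRD_IO);   /* DDRD  &= 0b11110011 */
    op_in(DDRB_IO);  op_ori(0x03);  op_out(DDRB_IO);   /* DDRB  |= 0b00000011 */
    op_in(PORTB_IO); op_andi(0xFC); op_out(PORTB_IO);  /* PORTB &= 0b11111100 */
    op_in(PORTD_IO); op_ori(0x0C);  op_out(PORTD_IO);  /* PORTD |= 0b00001100 */
}
```

Only the masked bits change: an initial DDRD of 0xFF, for instance, becomes 0xF3 (bits 2 and 3 cleared), while the other six bits keep their previous values.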

Infinite loop

Using the C language, we created an infinite loop as follows:

for (;;) {
  // Program code
}

This essentially “traps” the program to ensure it continuously loops executing the program code within the code block.

In assembly language we can produce the same result by creating a “Label” and using the operation RJMP (relative jump):

An infinite loop using a label and rjmp.

The RJMP instruction is documented as follows:

The RJMP operators
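RJMP stores a signed 12-bit word offset rather than an absolute address: the CPU executes PC ← PC + k + 1. The C sketch below computes the opcode (1100 kkkk kkkk kkkk) for a jump from word address pc to word address target; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode RJMP k (1100 kkkk kkkk kkkk). The CPU executes PC <- PC + k + 1,
   so k = target - (pc + 1), measured in 16-bit words, range -2048..2047. */
static uint16_t encode_rjmp(int pc, int target)
{
    int k = target - (pc + 1);
    assert(k >= -2048 && k <= 2047);
    return (uint16_t)(0xC000u | ((unsigned)k & 0x0FFFu));  /* keep 12 bits */
}
```

A label that jumps to itself, such as LOOP: RJMP LOOP, gives k = -1 and the opcode 0xCFFF; a jump back from 0x210 to a LOOP label at 0x200 gives k = -17 and 0xCFEF. Because the offset is relative, the same machine code works wherever the loop is placed in memory.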

Detecting a button press

In C language, to detect a button press we used the ‘if statement’ below with a bit mask corresponding to a particular bit of the port and monitoring for its state changing to 0 or Low.

if ( (PIND & 0b00000100) == 0)
{
  PORTB = PORTB | 0b00000001;  // sets port B, bit 0 to logic 1 (high)
                               // which switches the LED connected to D8 on
}

In assembly we can use the compare (CP) and branch if equal (BREQ) instructions to achieve this same implementation.

Documentation for the CP (compare) operation Documentation for the BREQ (branch if equal) operation

Consider the assembly code shown below.

  • In lines 32-36 we read PIND, use a bit mask to select bit 2, compare the value of the selected bit with zero and branch to label LED1 if the value is zero (0x00).

  • In lines 38-42 we read PIND, use a bit mask to select bit 3, compare the value of the selected bit with zero and branch to label LED2 if the value is zero (0x00).

  • If we reach line 44, then both buttons were high (not pressed), so we use the bitmask 0b11111100 to ensure that both LEDs connected to bits 0 and 1 of port B are turned off. We then jump to the label LOOP (lines 45-47).

Detecting a button press in assembly code

In the next few lines (49-59) we read the current value of Port B into R16 and perform a bitwise OR with the immediate pattern provided (the mask). The new value in R16 is then written to the PORTB register, switching the LED on. Finally, the RJMP instruction ensures the program loops back to the start.

Assembly code to switch on LEDs

A similar process is used for LED2.
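CP Rd, Rr subtracts Rr from Rd without storing the result and updates the status flags; BREQ then branches if the Z (zero) flag is set. The sketch below models only the Z flag and replays the button tests described above to report which path the code would take for a given PIND value. It assumes the masked value is compared against zero, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum branch { BRANCH_LED1, BRANCH_LED2, BRANCH_OFF };

/* CP Rd, Rr: Z is set when the 8-bit result Rd - Rr is 0, i.e. Rd == Rr. */
static bool cp_sets_z(uint8_t rd, uint8_t rr)
{
    return (uint8_t)(rd - rr) == 0;
}

/* Which path the button-test code takes for a given PIND value. */
static enum branch button_branch(uint8_t pind)
{
    uint8_t r16 = pind & 0x04;       /* IN R16, PIND ; ANDI R16, 0b00000100 */
    if (cp_sets_z(r16, 0x00))        /* compare with zero ; BREQ LED1 */
        return BRANCH_LED1;

    r16 = pind & 0x08;               /* IN R16, PIND ; ANDI R16, 0b00001000 */
    if (cp_sets_z(r16, 0x00))        /* compare with zero ; BREQ LED2 */
        return BRANCH_LED2;

    return BRANCH_OFF;               /* fall through: both LEDs off */
}
```

A PIND value of 0xFB (bit 2 low) leads to LED1, 0xF7 (bit 3 low) to LED2, and 0xFF (no button pressed) falls through to the code that switches both LEDs off.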

Comparison with C

In Fig. 92 we compare the C program with the equivalent assembly program.

Fig. 92 Comparing an I/O program written in C with an equivalent assembly program.

The assembly code is available for study as a GIST main.asm.

An Advantage of Assembly Language

Using C, it is not possible to read or write an individual bit of a register or I/O port directly: the whole register is read, masked and written back, as in the program above.

In assembly language, individual bits can be accessed directly using bit instructions such as SBI, CBI, SBIC, SBIS and a handful of others:

The SBI operation. The SBIC operation.

These can only be used on certain registers as identified in the documentation for the I/O memory map:

The I/O memory map showing the registers for which bitwise operations are available.
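The effect of these single-bit instructions can be modelled in a few lines of C. In this sketch, which uses a hypothetical io array for the I/O space and hypothetical helper names, SBI and CBI change one bit without an IN/ORI/OUT sequence, and SBIC reports whether the following instruction would be skipped. The assertions enforce the restriction to I/O addresses 0x00 to 0x1F.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint8_t io[64];   /* hypothetical model of the I/O space */

/* SBI A, b : set bit b of I/O register A (A must be 0x00..0x1F). */
static void op_sbi(uint8_t a, uint8_t b)
{
    assert(a <= 0x1F && b <= 7);
    io[a] |= (uint8_t)(1u << b);
}

/* CBI A, b : clear bit b of I/O register A. */
static void op_cbi(uint8_t a, uint8_t b)
{
    assert(a <= 0x1F && b <= 7);
    io[a] &= (uint8_t)~(1u << b);
}

/* SBIC A, b : true when the next instruction would be skipped,
   i.e. when bit b of I/O register A is clear. */
static bool op_sbic_skips(uint8_t a, uint8_t b)
{
    assert(a <= 0x1F && b <= 7);
    return (io[a] & (1u << b)) == 0;
}
```

For instance, with PIND (I/O address 0x09) reading 0xFB, SBIC 0x09, 2 would skip the next instruction because bit 2 is clear, which is how SBIC followed by a jump can stand in for the IN, ANDI, compare and BREQ sequence when testing a button.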

Comparisons

In Fig. 93 we show the original C program, the first version of the assembly program, and a version that uses SBIC to branch directly on the value of a single bit in Port D. All three behave in the same way.

Fig. 93 Comparison of three equivalent programs

Summary

In this section:

  • We have introduced assembly language as a direct mapping of mnemonics to machine code.

  • We have explored the basic structure of an assembly language program including operator classification, operands and comments.

  • Finally, we have revisited our digital switch example from C, looked at how it translates directly into assembly language, and seen how it can be optimized using specific features of the Atmel ATmega328 microcontroller.

On Canvas

This week on the Canvas course pages you will find the sample program from today's lecture. Look through it and make sure you are confident in how it works, how the masks are defined and how the registers are set.

There is also a short quiz to test your knowledge on these topics.