Reading Through a String Character by Character in Armv8

Arrays, Address Arithmetic, and Strings

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

In both C and assembly, you can allocate and access memory in several different sizes:

C/C++ datatype   Bits   Bytes   Register   Access memory   Allocate memory
char             8      1       al         BYTE [ptr]      db
short            16     2       ax         WORD [ptr]      dw
int              32     4       eax        DWORD [ptr]     dd
long             64     8       rax        QWORD [ptr]     dq

For example, we can put full 64-bit numbers into memory using "dq" (Data Quadword), and then read them back out with QWORD[yourLabel].

We can put individual bytes into memory using "db" (Data Byte), then read them back with BYTE[yourLabel].

C Strings in Assembly

In plain C, you can put a string on the screen with the standard C library "puts" function:

puts("Yo!");      

(Try this in NetRun now!)

You can expand this out a bit by declaring a string variable.  In C, strings are stored as (constant) character pointers, or "const char *":

const char *theString="Yo!";
puts(theString);

(Try this in NetRun now!)

Internally, the compiler does two things:

  • Allocates memory for the string, and initializes that memory to 'Y', 'o', '!', and a special zero byte called a nul terminator that marks the end of the string.
  • Points theString to this allocated memory.

In assembly, these are separate steps:

  • Allocate memory with the db (Data Byte) pseudo-instruction, and store characters there, like    db `Yo!`,0
    • Unlike C++, you can declare a string using any of the three quotes: "doublequotes", 'singlequotes', or `backticks` (backtick is on your keyboard below tilde ~)
    • However, escapes like the newline \n ONLY work inside backticks, an odd peculiarity of the assembler we use (nasm).
  • Note we manually added ,0 after the string to insert a zero byte to terminate the string.
    • If you forget to terminate the string, puts can print weird garbage after the string until it hits a 0.
  • Point to this memory using a jump label, just like we were going to jmp to the string.

Here's an example:

mov rdi, theString ; rdi points to our string
extern puts  ; declare the function
call puts    ; call it
ret

theString:    ; label, just like for jumping
	db `Yo!`,0  ; data bytes for string (don't forget nul!)

(Try this in NetRun now!)

In assembly, there's no syntax difference between:
  • a label designed for a jump instruction (a block of code)
  • a label designed for a call instruction (a function ending in ret)
  • a label designed as a string pointer (a nul-terminated string)
  • a label designed as a data pointer (allocated with dq)
  • or many other uses--it's just a pointer!

We can also change the pointer to move down the string.  Since each char is one byte, moving by 4 bytes moves by 4 chars here, printing "o assembly":

mov rdi, theString ; rdi points to our string
add rdi,4 ; move down the string by 4 chars
extern puts ; declare the function
call puts ; call it
ret

theString: ; label, just like for jumping
	db `Hello assembly`,0 ; data bytes for string

(Try this in NetRun now!)

Address Arithmetic

If you allocate more than one constant with dq, they appear at larger addresses.  (Recall that this is backwards from the stack, which pushes each additional item at an ever-smaller address.)  So this reads the 5, like you'd expect:

dos_equis:
	dq 5   ; writes this constant into a "Data Qword" (8 byte block)
	dq 13  ; writes another constant, at [dos_equis+8] (bytes)

foo:
	mov rax, [dos_equis] ; read memory at this label
	ret

(Try this in NetRun now!)

Adding 8 bytes (the size of a dq, an 8-byte / 64-bit QWORD) to the address of the first constant puts us right on top of the second constant, 13:

dos_equis:
	dq 5   ; writes this constant into a "Data Qword" (8 byte block)
	dq 13  ; writes another constant, at [dos_equis+8] (bytes)

foo:
	mov rax, [dos_equis+8] ; read memory at this label, plus 8 bytes
	ret

(Try this in NetRun now!)

If you add anything between 0 and 8, like adding 1 byte, you will load part of the 5 and part of the 13, resulting in a weirdly split and shifted value.

Accessing an Array

An "array" is just a sequence of values stored in ascending order in memory.  If we list our data with "dq", the values show up in memory in that order, so we can do pointer arithmetic to pick out the value we want.  This returns 7:

mov rcx,my_arr ; rcx == address of the array
mov rax,QWORD [rcx+1*8] ; load element 1 of array
ret

my_arr:
dq 4 ; array element 0, stored at [my_arr]
dq 7 ; array element 1, stored at [my_arr+8]
dq 9 ; array element 2, stored at [my_arr+16]

(Try this in NetRun now!)

Did you ever wonder why the first array element is [0]?  It's because it's zero bytes from the start of the pointer!

Keep in mind that each array element above is a "dq", an 8-byte long, so I move down by 8 bytes during indexing, and I load into the 64-bit "rax".

If the array is of 4-byte integers, we'd declare them with "dd" (data DWORD), move down by 4 bytes per int array element, and store the answer in a 32-bit register like "eax".  But the pointer register is always 64 bits!
mov rcx,my_arr ; rcx == address of the array
mov eax,DWORD [rcx+1*4] ; load element 1 of array
ret

my_arr:
dd 0xaaabbbcc ; array element 0, stored at [my_arr]
dd 0xc001007 ; array element 1, stored at [my_arr+4]

(Try this in NetRun now!)

It's extremely easy to mismatch the size you declare with the size you load.  For example, if I declare values with dw (2-byte shorts), but load them into eax (4 bytes), I'll have loaded 2 values into one register.  So this code returns 0xbeefaabb, which is two 16-bit values combined into one 32-bit register:
mov rcx,my_arr ; rcx == address of the array
mov eax,[rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]

(Try this in NetRun now!)

You can reduce the likelihood of this type of error by adding an explicit memory size specifier, like "WORD" below.  That makes this a compile error ("error: mismatch in operand sizes") instead of returning the wrong value at runtime.
mov rcx,my_arr ; rcx == address of the array
mov eax, WORD [rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]

(Try this in NetRun now!)

(If we really wanted to load a 16-bit value into a 32-bit register, we could use "movzx" (zero-extend, for unsigned values) or "movsx" (sign-extend, for signed values) instead of a plain "mov".)
C++     Bits   Bytes   Assembly Create    Assembly Read
char    8      1       db (data byte)     mov al, BYTE [rcx+i*1]
short   16     2       dw (data WORD)     mov ax, WORD [rcx+i*2]
int     32     4       dd (data DWORD)    mov eax, DWORD [rcx+i*4]
long    64     8       dq (data QWORD)    mov rax, QWORD [rcx+i*8]
Human                               C++          Assembly
Declare a long integer.             long y;      rdx (nothing to declare, just use a register)
Copy one long integer to another.   y=x;         mov rdx,rax
Declare a pointer to a long.        long *p;     rax (nothing to declare, use any 64-bit register)
Dereference (look up) the long.     y=*p;        mov rdx,QWORD [rax]
Find the address of a long.         p=&y;        mov rax,place_you_stored_Y
Access an array (easy way)          y=p[2];      (sorry, no easy way exists!)
Access an array (hard way)          p=p+2;       add rax,2*8 ; (move forward by two 8-byte longs)
                                    y=*p;        mov rdx,QWORD [rax] ; (grab that long)
Access an array (too clever)        y=*(p+2);    mov rdx,QWORD [rax+2*8] ; (yes, that actually works!)

Loading from the wrong place, or loading the wrong amount of data, is an INCREDIBLY COMMON problem when using pointers, in any language.  You WILL make this mistake at some point over the course of the semester, and it results in either a crash (rare) or the wrong data (most often some strange shifted & spliced integer), so be careful!

Walking Pointers Down Arrays

There's a classic terse C idiom for iterating through a string, by incrementing a char * to walk down through the bytes until you hit the zero byte at the end:
        while (*p++!=0) { /* do something to *p   */ }

If you unpack this a bit, you find:

  • p points to the first char in the string.
  • *p is the first char in the string.
  • p++ adds 1 to the pointer, moving to the next char in the string.
  • *p++ extracts the first char, and moves the pointer down.
  • *p++!=0 checks if the first char is zero (the end of the string), and moves the pointer down.

Here's a typical example, in C:

char s[]="string";   // declare a string
char *p=s;           // point to the start
while (*p++!=0) if (*p=='i') *p='a';  // replace i with a
puts(s);

(Try this in NetRun now!)

Here's a similar pointer-walking trick, in assembly:

mov rdi,stringStart
again:
	add rdi,1 ; move pointer down the string
	cmp BYTE[rdi],'a' ; did we hit the letter 'a'?
	jne again  ; if not, keep looking

extern puts
call puts
ret

stringStart:
	db 'this is a great string',0

(Try this in NetRun now!)

(We'll see how to declare modifiable strings later.)


Source: https://www.cs.uaf.edu/2017/fall/cs301/lecture/09_15_strings_arrays.html
