Reading Through a String Character by Character in Armv8
Arrays, Address Arithmetic, and Strings
CS 301: Assembly Language Programming Lecture, Dr. Lawlor
In both C and assembly, you can allocate and access memory in several different sizes:
C/C++ datatype | Bits | Bytes | Register | Access memory | Allocate memory |
char | 8 | 1 | al | BYTE [ptr] | db |
short | 16 | 2 | ax | WORD [ptr] | dw |
int | 32 | 4 | eax | DWORD [ptr] | dd |
long | 64 | 8 | rax | QWORD [ptr] | dq |
For instance, we can put full 64-bit numbers into memory using "dq" (Data Quad-word), and then read them back out with QWORD[yourLabel].
We can put individual bytes into memory using "db" (Data Byte), then read them back with BYTE[yourLabel].
C Strings in Assembly
In plain C, you can put a string on the screen with the standard C library "puts" function:
puts("Yo!");
(Try this in NetRun now!)
You can expand this out a bit, by declaring a string variable. In C, strings are stored as (constant) character pointers, or "const char *":
const char *theString="Yo!";
puts(theString);
(Try this in NetRun now!)
Internally, the compiler does two things:
- Allocates memory for the string, and initializes the memory to 'Y', 'o', '!', and a special zero byte called a nul terminator that marks the end of the string.
- Points theString to this allocated memory.
In assembly, these are separate steps:
- Allocate memory with the db (Data Byte) pseudo instruction, and store characters there, like db `Yo!`,0
  - Unlike C++, you can declare a string using any of the three quotes: "doublequotes", 'singlequotes', or `backticks` (backtick is on your keyboard below tilde ~)
  - However, escapes like \n ONLY work inside backticks, an odd peculiarity of the assembler we use (nasm).
  - Note we manually added ,0 after the string to insert a zero byte to terminate the string.
  - If you forget to terminate the string, puts can print garbage after the string until it hits a 0.
- Point at this memory using a jump label, just like we were going to jmp to the string.
Here's an example:
mov rdi, theString ; rdi points to our string
extern puts ; declare the function
call puts ; call it
ret

theString: ; label, just like for jumping
db `Yo!`,0 ; data bytes for string (don't forget nul!)
(Try this in NetRun now!)
In assembly, there's no syntax difference between:
- a label designed for a jump instruction (a block of code)
- a label designed for a call instruction (a function ending in ret)
- a label designed as a string pointer (a nul-terminated string)
- a label designed as a data pointer (allocated with dq)
- or many other uses--it's only a pointer!
We can also change the pointer, to move down the string. Since each char is one byte, moving by 4 bytes moves by 4 chars here, printing "o assembly":
mov rdi, theString ; rdi points to our string
add rdi,4 ; move down the string by 4 chars
extern puts ; declare the function
call puts ; call it
ret

theString: ; label, just like for jumping
db `Hello assembly`,0 ; data bytes for string
(Try this in NetRun now!)
Address Arithmetic
If you allocate more than one constant with dq, they appear at larger addresses. (Recall that this is backwards from the stack, which pushes each additional item at an ever-smaller address.) So this reads the 5, like you'd expect:
dos_equis:
dq 5 ; writes this constant into a "Data Qword" (8 byte block)
dq 13 ; writes another constant, at [dos_equis+8] (bytes)
foo:
mov rax, [dos_equis] ; read memory at this label
ret
(Try this in NetRun now!)
Adding 8 bytes (the size of a dq, 8-byte / 64-bit QWORD) to the address of the first constant puts us directly on top of the second constant, 13:
dos_equis:
dq 5 ; writes this constant into a "Data Qword" (8 byte block)
dq 13 ; writes another constant, at [dos_equis+8] (bytes)
foo:
mov rax, [dos_equis+8] ; read memory at this label, plus 8 bytes
ret
(Try this in NetRun now!)
Accessing an Array
An "array" is just a sequence of values stored in ascending order in memory. If we list our data with "dq", they show up in memory in that order, so we can do pointer arithmetic to pick out the value we want. This returns 7:
mov rcx,my_arr ; rcx == address of the array
mov rax,QWORD [rcx+1*8] ; load element 1 of array
ret

my_arr:
dq 4 ; array element 0, stored at [my_arr]
dq 7 ; array element 1, stored at [my_arr+8]
dq 9 ; array element 2, stored at [my_arr+16]
(Try this in NetRun now!)
Did you ever wonder why the first array element is [0]? It's because it's zero bytes from the start of the pointer!
Keep in mind that each array element above is a "dq" or an 8-byte long, so I move down by 8 bytes during indexing, and I load into the 64-bit "rax".
If the array is of 4-byte integers, we'd declare them with "dd" (data DWORD), move down by 4 bytes per int array element, and store the answer in a 32-bit register like "eax". But the pointer register is always 64 bits!
mov rcx,my_arr ; rcx == address of the array
mov eax,DWORD [rcx+1*4] ; load element 1 of array
ret

my_arr:
dd 0xaaabbbcc ; array element 0, stored at [my_arr]
dd 0xc001007 ; array element 1, stored at [my_arr+4]
(Try this in NetRun now!)
It's extremely easy to have a mismatch between one or the other of these values. For example, if I declare values with dw (2 byte shorts), but load them into eax (4 bytes), I'll have loaded two values into one register. So this code returns 0xbeefaabb, which is two 16-bit values combined into one 32-bit register:
mov rcx,my_arr ; rcx == address of the array
mov eax,[rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]
(Try this in NetRun now!)
You can reduce the likelihood of this type of error by adding an explicit memory size specifier, like "WORD" below. That makes this a compile error ("error: mismatch in operand sizes") instead of returning the wrong value at runtime.
mov rcx,my_arr ; rcx == address of the array
mov eax, WORD [rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret

my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]
(Try this in NetRun now!)
(If we really wanted to load a 16-bit value into a 32-bit register, we could use "movzx" (unsigned) or "movsx" (signed) instead of a plain "mov".)
C++ | Bits | Bytes | Assembly Create | Assembly Read | Example |
char | 8 | 1 | db (data byte) | mov al, BYTE[rcx+i*1] | (Try this in NetRun now!) |
short | 16 | 2 | dw (data WORD) | mov ax, WORD [rcx+i*2] | (Try this in NetRun now!) |
int | 32 | 4 | dd (data DWORD) | mov eax, DWORD [rcx+i*4] | (Try this in NetRun now!) |
long | 64 | 8 | dq (data QWORD) | mov rax, QWORD [rcx+i*8] | (Try this in NetRun now!) |
Human | C++ | Assembly |
Declare a long integer. | long y; | rdx (nothing to declare, just use a register) |
Copy one long integer to another. | y=x; | mov rdx,rax |
Declare a pointer to a long. | long *p; | rax (nothing to declare, use any 64-bit register) |
Dereference (look up) the long. | y=*p; | mov rdx,QWORD [rax] |
Find the address of a long. | p=&y; | mov rax,place_you_stored_Y |
Access an array (easy way) | y=p[2]; | (sorry, no easy way exists!) |
Access an array (hard way) | p=p+2; y=*p; | add rax,2*8; (move forward by two 8 byte longs) mov rdx, QWORD [rax] ; (grab that long) |
Access an array (too clever) | y=*(p+2) | mov rdx, QWORD [rax+2*8]; (yes, that actually works!) |
Loading from the wrong place, or loading the wrong amount of data, is an INCREDIBLY COMMON problem when using pointers, in any language. You WILL make this mistake at some point over the course of the semester, and this results in a crash (rare) or the wrong data (most often some strange shifted & spliced integer), so be careful!
Walking Pointers Down Arrays
There's a classic terse C idiom for iterating through a string, by incrementing a char * to walk down through the bytes until you hit the zero byte at the end:
while (*p++!=0) { /* do something to *p */ }
If you unpack this a bit, you find:
- p points to the first char in the string.
- *p is the first char in the string.
- p++ adds 1 to the pointer, moving to the next char in the string.
- *p++ extracts the first char, and moves the pointer down.
- *p++!=0 checks if the first char is zero (the end of the string), and moves the pointer down.
Here's a typical example, in C:
char s[]="string"; // declare a string
char *p=s; // point to the start
while (*p++!=0) if (*p=='i') *p='a'; // replace i with a
puts(s);
(Try this in NetRun now!)
Here's a similar pointer-walking trick, in assembly:
mov rdi,stringStart
again:
add rdi,1 ; move pointer down the string
cmp BYTE[rdi],'a' ; did we hit the letter 'a'?
jne again ; if not, keep looking
extern puts
call puts
ret

stringStart:
db 'this is a great string',0
(Try this in NetRun now!)
(We'll see how to declare modifiable strings later.)
Source: https://www.cs.uaf.edu/2017/fall/cs301/lecture/09_15_strings_arrays.html