This version of the book is good! Great for beginners and experts alike. However, in later versions of this book (one of which I have on my bookshelf), the author uses High Level Assembler (HLA), which is, according to the back cover of the book, "a revolutionary tool...". It is basically a proprietary compiler with an "interesting" syntax that is not portable to the rest of the x86 assembler world.
So, if you buy the book (like I did), please be aware of this.
I agree. HLA syntax can best be described as "what you'd get if someone who knows only C decided to make an Asm syntax". I remember many flame wars on newsgroups about it.
GNU GAS syntax is roughly as repugnant, but became popular only because of GNU/Linux. Deviating from the official docs has caused much divisiveness and confusion.
As someone coming from a C++/Rust background and dabbling in retro game assembly, the biggest roadblocks I've faced trying to understand hand-written SPC700 assembly are flags, memorizing/looking up instructions, branches and jumps being "backwards" from C++ in x86-64 and a total wildcard in SPC700, unstructured control flow, and integer literals being dereferences rather than immediates by default. I haven't tried using HLA yet, but I get the impression its syntax would greatly assist with unstructured control flow (and possibly distinguishing between pointers/offsets and immediates). The manual at https://www.plantation-productions.com/Webster/HighLevelAsm/... is relatable:
> For example, one might test the carry flag after an addition to determine if an unsigned overflow has occurred using code like the following:
add eax, 5
jnc NoOverflow
    << code to execute if overflow occurs >>
NoOverflow:
> Although this code is straightforward, you would be surprised how many students cannot visualize this code. On the other hand, if you feed them some pseudo code like:
add eax, 5
if( the carry flag is set ) then
    << code to execute if overflow occurs >>
endif
> Those same students won't have any problems understanding this code.
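For anyone coming from a high-level language, here is a rough Python model of what the carry flag means in the quoted example. It's a sketch assuming a 32-bit register like EAX; `add32` is a hypothetical helper, and it models only CF, not the other flags `add` sets.

```python
MASK32 = 0xFFFFFFFF  # EAX is 32 bits wide


def add32(eax: int, value: int) -> tuple[int, bool]:
    """Model `add eax, value`: return the wrapped result and the carry flag."""
    full = (eax & MASK32) + (value & MASK32)
    result = full & MASK32   # what ends up in EAX
    carry = full > MASK32    # CF is set on unsigned overflow
    return result, carry


eax, carry = add32(0xFFFFFFFE, 5)
if carry:  # `jnc NoOverflow` would skip this block when CF is clear
    print("unsigned overflow, eax =", hex(eax))
```

In other words, the `if( the carry flag is set )` pseudocode and the `jnc` version test the same bit; the assembly just phrases it as "skip unless".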
I too find reading assembly a hindrance to understanding the intent and behavior of the code, and I mentally map assembly to high-level behaviors similar to structured programming. If HLA is more accessible and productive for regular programmers as a tool to read/write reasonably efficient code on architectures without good C compilers, I don't consider that repugnant but empowering. (Admittedly it's more disconnected from the output binary instructions than regular assembly.)
(Sidenote: are assemblers designed around simplicity of parsing and speed at processing compiler-produced assembly, or user experience for humans writing assembly, or is the latter more common on architectures without good C compilers?)
I added brace-to-branch conversion to an MSP430 assembler I wrote. You could write code like:
add eax, 5
jnc {
    nop
}
And it would automatically create the labels for you. Syntactically, it was equivalent to what you've written above, but with labels like ".auto_brace_X".
You could also do the typical "if-else" branching with slightly extended syntax like:
add eax, 5
jnc {
    <carry is set>
}:jmp {
    <carry is not set>
}
<continue>
That allowed easy extension to "do-while" loops:
{
    <some loop>
    dec ecx
}:jnz
The trickiest part is remembering that all branches are "unless" branches rather than "if" branches. It obviously only works for simple constructs and can make refactoring and optimization a pain; however, there was plenty of code that benefited from it.
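The brace expansion described above can be sketched as a small line-by-line rewriter. This is not the original MSP430 assembler's code; it's a hypothetical Python model that assumes comments are already stripped, that a forward block opens with `<branch> {`, that a loop block opens with a bare `{`, and that generated labels follow the `.auto_brace_X` pattern from the comment.

```python
def expand_braces(lines):
    """Rewrite brace-structured branches into plain labels and jumps."""
    out = []
    stack = []  # (label number, is_forward) for each open brace
    counter = 0

    def new_label():
        nonlocal counter
        counter += 1
        return counter

    for raw in lines:
        line = raw.strip()
        if line == "{":  # loop top: the label goes here
            n = new_label()
            out.append(f".auto_brace_{n}:")
            stack.append((n, False))
        elif line.endswith("{") and not line.startswith("}"):
            op = line[:-1].strip()  # e.g. "jnc {": skip the block unless...
            n = new_label()
            out.append(f"{op} .auto_brace_{n}")
            stack.append((n, True))
        elif line == "}":  # end of a forward block: emit its label
            n, forward = stack.pop()
            if not forward:
                raise ValueError("bare '}' cannot close a loop block")
            out.append(f".auto_brace_{n}:")
        elif line.startswith("}:"):
            n, forward = stack.pop()
            rest = line[2:].strip()
            if rest.endswith("{"):  # "}:jmp {" starts the else branch
                if not forward:
                    raise ValueError("'}:op {' must close a forward block")
                op = rest[:-1].strip()
                m = new_label()
                out.append(f"{op} .auto_brace_{m}")  # jump over the else
                out.append(f".auto_brace_{n}:")      # else starts here
                stack.append((m, True))
            else:  # "}:jnz" branches back to the loop top
                if forward:
                    raise ValueError("'}:op' must close a loop block")
                out.append(f"{rest} .auto_brace_{n}")
        else:
            out.append(line)
    if stack:
        raise ValueError("unbalanced braces")
    return out


for line in expand_braces(["add eax, 5", "jnc {", "nop", "}"]):
    print(line)
```

Keeping forward and loop blocks apart on the stack is what lets `}` emit a label while `}:jnz` emits a backward jump; it also makes the "unless" semantics explicit, since the branch is always placed before the block it skips.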
lol.. was just trying to be familiar. Here's some old code I dug up that used it:
; find next task in circular list Next(6 cycles) Top(14 cycles)
{
add.w #10, r10 ;2 increment pointer to next task area
cmp.w #mms_task.end, r10 ;2 are we at the end of the task list?
jnz { ;2 no? start next task
.init: eint ;1 ensure interrupts are enabled
bis.w #MMS_EACH, &mms_task.state ;5 ensure EACH state flag is set
mov.w #mms_task.list, r10 ;2 restart at top of list
}
; build state and check against task Skip(12 cycles) Take(31 cycles)
mov.b &IFG2, r8 ;3 r8 = ifg2 interrupt flags
and.w #(MMS_TX+MMS_RX), r8 ;1 get only TX and RX ready flags
mov.w &mms_task.state, r9 ;3 r9 = current task state
bis.w r8, r9 ;1 add TX and RX flags to global state
; determine if this task needs to run
bit.w @r10, r9 ;2 task request flags & system flags
}:jz ;2 no matches? find next task
; load the task state and return into it
mov.w r10, &mms_task.ptr ;4 set current task area to r10
mov.w @r10+, r12 ;2 r12 = task[0] (request flags)
mov.w @r10+, r8 ;2 r8 = task[1] (PC)
mov.w @r10+, r4 ;2 r4 = task[2]
mov.w @r10+, r5 ;2 r5 = task[3]
mov.b @r10+, r6 ;2 r6 = task[4] (low byte)
mov.b @r10, r7 ;2 r7 = task[4] (high byte)
mov.w #0, r10 ;1 clear r10 pointer for safety
mov.w r8, PC ;2 "return" into the yield
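To make the control flow of that routine easier to follow, here is a rough Python model of its task-selection loop. The bit values for MMS_TX, MMS_RX and MMS_EACH are hypothetical (the post doesn't show them), each task is reduced to its request-flag word, and since the real loop simply spins until an interrupt flag shows up, the model gives up after two passes instead.

```python
MMS_TX = 0x01    # hypothetical bit assignments; the original
MMS_RX = 0x02    # code does not show the real values
MMS_EACH = 0x04


def find_next_task(request_flags, current, state, ifg2):
    """Return (index, state) of the next runnable task, or (None, state)."""
    n = len(request_flags)
    i = current
    for _ in range(2 * n):            # state can only change once (EACH)
        i += 1                        # add.w #10, r10: next task record
        if i == n:                    # cmp.w #mms_task.end, r10
            i = 0                     # mov.w #mms_task.list, r10
            state |= MMS_EACH         # bis.w #MMS_EACH, &mms_task.state
        live = state | (ifg2 & (MMS_TX | MMS_RX))  # build r9
        if request_flags[i] & live:   # bit.w @r10, r9 / }:jz
            return i, state
    return None, state


print(find_next_task([MMS_RX, MMS_EACH, MMS_TX], 0, 0, MMS_TX))
```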
Assembly language usually has an almost one-to-one correspondence with the machine code instructions of the target architecture. Assemblers can build on that by providing macros, etc., but it's fundamentally about the programmer being aware of the instructions the CPU supports and using them to program.
I'm having trouble understanding the difficulty in the example. If "jnc" doesn't suit then there's "jc" to do the opposite.
If the difficulty is the conceptual leap to unstructured branching then I'd say that's what the teacher's job is - teach the students how to think about it.
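The jc/jnc pair is one of several opposite-condition pairs, and each mnemonic is shorthand for a flag test. Here's a small Python lookup of a few common x86 conditional jumps with spelled-out names and the flag test each performs; the long names are made up for illustration, while the flag conditions are the standard x86 ones.

```python
# Flag tests performed by a few common x86 conditional jumps.
# The long names are illustrative, not official Intel mnemonics.
CONDITIONS = {
    "jc":  ("jump_if_carry", lambda f: f["CF"]),
    "jnc": ("jump_if_not_carry", lambda f: not f["CF"]),
    "jz":  ("jump_if_zero", lambda f: f["ZF"]),
    "jnz": ("jump_if_not_zero", lambda f: not f["ZF"]),
    "ja":  ("jump_if_unsigned_above", lambda f: not f["CF"] and not f["ZF"]),
    "jbe": ("jump_if_unsigned_below_or_equal", lambda f: f["CF"] or f["ZF"]),
    "jl":  ("jump_if_signed_less", lambda f: f["SF"] != f["OF"]),
    "jge": ("jump_if_signed_greater_or_equal", lambda f: f["SF"] == f["OF"]),
}


def branch_taken(mnemonic, flags):
    """Return (descriptive name, whether the jump is taken) for the given flags."""
    name, test = CONDITIONS[mnemonic]
    return name, bool(test(flags))


flags = {"CF": False, "ZF": False, "SF": False, "OF": False}
print(branch_taken("jnc", flags))
print(branch_taken("jc", flags))
```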
Cryptic names like "jnc" and "jc" are a huge hindrance to understanding it. It would be so much easier if each instruction were just named for what it does instead of an acronym.
"jump_if_condition_is_met" or "jump_short_if_not_carry" (or whichever one it maps to) is much better and clearer than "jnc".
> are assemblers designed around simplicity of parsing and speed at processing compiler-produced assembly, or user experience for humans writing assembly, or is the latter more common on architectures without good C compilers?
That really depends on the assembler. Each one is different. Some support a degree of structured programming, while others do not even allow for relocatable addresses.
Are you sure it was TI? The Blackfin DSPs made by Analog Devices are known for using an infix assembly syntax in the official documentation, although it might be a stretch to tie it to C specifically.
If you are referring to "The Art of 64-Bit Assembly": while this is correct, there are two things to consider:
1. the real meat of ASM doesn't really depend on the assembler, so one can (and usually ends up doing so) code the exercises in the assembler of their choice
2. people have freely contributed translations of the listings/programs to all the major assemblers, so even if one doesn't want to code their own version, they can just use the contributed versions
Admittedly, some chapters are dedicated specifically to MASM features, so those are wasted.
As a matter of fact, I worked through it on Linux/NASM, and wasn't really bothered, as the book is really good quality.
I'm reading the RP2040 Assembly Language Programming book right now. It's a very interesting introduction to assembly language, and using assembly for microcontrollers seems very fitting to me, at least at the hobbyist level.
Another interesting book, at least for historical context, is Michael Abrash's Zen of Assembly Language Programming. A lot of the optimizations are a good read if you're into that sort of thing, but there's not much of practical interest for today. (Abrash in fact never wrote the Volume 2 he was planning.)
Would this still be relevant/approachable for a beginner coming from higher level languages, or is there a more modern resource for learning assembly that would be better suited?
I guess most of it is still relevant, but I'd rather not deal with DOS to follow along, and would prefer working on Linux.
Also, the x86 instruction set seems daunting to pick up for a beginner. Would it be better learning on a 4/8-bit or toy machine first?
I’m not an assembly programmer, but I’ve learned assembly as part of introductory CS courses (computer architecture classes) and the approachable alternative you’re looking for is the assembly language of a RISC architecture such as ARM, MIPS, or RISC-V. I’d recommend learning the latter because of how approachable it is.
I found the book "Programming From the Ground Up" to be a pretty good intro to programming in x86, and importantly, more recent than this text. I had to do just a tiny bit of research to find out how to get my shell to emulate 32-bit mode, since it's x86 and not x86-64, but it is in fact Linux-based. I imagine it'd be an order of magnitude easier than trying to figure out a DOS setup.
The full x86 ISA is daunting, I'm sure, but for a beginner looking to get a sense of what asm is, you don't engage with the entirety of the ISA. The PFTGU text is great specifically because it assumes an audience of beginners who want to learn the basics of how asm operates to enable core programming concepts. It's not aiming to be exhaustive like the linked text.
Edit: And also importantly, it's also (legitimately) available for free online.
To get a feel for assembly, then yes, any 8-bit CPU would be a good choice: the 8080, Z80, 6502 or 6809. There are plenty of emulated machines today, tons of material is available online, and there are even cross assemblers that run on modern machines to make development easier. The base concepts will carry over.
The x86 instruction set is daunting, and it can appear quite arbitrary unless one knows the nearly 50 years of historical baggage it carries along with it. But I think if you have some experience with an 8-bit CPU, plus just kind of accept the weirdness of it, it's not that bad (especially the 64-bit stuff, which in a way is simpler, even if half the registers are named and half are numbered).
The 8051 microcontroller is pretty easy to learn. I'd recommend that as a starter on assembly. But if you want to end up playing with x86, why not start there?
If you want to start small, the Z80 might be a good starting point, but that seems at odds with your not wanting to start with DOS; a small VM will be very useful, and with Asm you'll soon discover that even 64K is plenty to play with.
> Also, the x86 instruction set seems daunting to pick up for a beginner. Would it be better learning on a 4/8-bit or toy machine first?
The x86 instruction set isn't daunting; the 16-bit x86 memory model is. I say this as someone who learned x86 assembly at a young age. It is easier to work on Linux, because segmentation is a non-issue and you have a flat address space.
Segmentation is also a non-issue if you stay within a single segment. Especially if you're writing in Asm and just starting out, even 64k is a lot. A "hello world" has less than a dozen bytes of executable code. The majority of the standard commands in DOS are <64k binaries.
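To put a number on the hello-world claim, here is the machine code of a classic DOS .COM "hello world" (mov ah, 09h; mov dx, msg; int 21h; ret), written out as bytes in Python. It assumes the usual .COM load offset of 100h and a '$'-terminated message placed right after the code; it's an illustrative layout, not a specific program from this thread.

```python
# Hand-assembled 16-bit code for a DOS .COM "hello world".
# A .COM file is loaded at offset 100h, so the message (placed right
# after the 8 code bytes) lives at 108h.
MSG_OFFSET = 0x100 + 8

code = bytes([
    0xB4, 0x09,                                # mov ah, 09h  (DOS "print string")
    0xBA, MSG_OFFSET & 0xFF, MSG_OFFSET >> 8,  # mov dx, msg  (little-endian imm16)
    0xCD, 0x21,                                # int 21h
    0xC3,                                      # ret          (back to DOS)
])
message = b"Hello, world!$"  # DOS function 09h stops at '$'
program = code + message

print(len(code), "bytes of code,", len(program), "bytes in total")
```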
Is segmentation really that bad? With an assembler you have direct control of which segment is being used. It's unlikely a hobby project will span more than 64k. Even if it does, for a lot of projects the individual data structures are small enough to split between segments.
It only becomes an issue if you have a data structure that can exceed 64K in size, or multiple structures whose aggregate size is greater than 64K. Then you have to deal with so-called FAR pointers.
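The arithmetic behind that is the real-mode address formula: physical = segment × 16 + offset, where the offset is only 16 bits. A minimal Python sketch (the function names are hypothetical) shows why one segment tops out at 64K and why far pointers, which carry their own segment, are needed beyond that. It models the original 8086, whose 20-bit address bus wraps around at 1 MB.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


def normalize(segment: int, offset: int) -> tuple[int, int]:
    """Rewrite a far pointer so its offset is 0..15 (a 'normalized' pointer)."""
    addr = physical(segment, offset)
    return addr >> 4, addr & 0xF


# Two different segment:offset pairs can name the same byte.
print(hex(physical(0x1234, 0x0010)), hex(physical(0x1235, 0x0000)))
# A near pointer (offset only) can't reach past 64K of its segment, so
# walking a larger array means adjusting the segment, i.e. a far pointer.
print(normalize(0x1234, 0x0123))
```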
People interested in this might also enjoy 'Inner Loops' (1997), an excellent book that covers optimizing x86 at that time but is instructive in general and the concepts are still applicable today.
A very useful book for MS-DOS programming. MASM is also an excellent assembler, even in early versions. The proc/endp blocks for instance make reading a subroutine much easier. Does anyone know of a similar book for AMD64 assembly language programming?
Related(?): I'm looking for recommendations for a good intro/reference to the Intel architecture with respect to operations and registers, especially Unix conventions (if Unix differs from, e.g., Windows in creating a process or pushing/popping the stack...)
One might say it is the only programming language; everything else is just macro expansion, converting pseudo-English mnemonics into machine instructions.
Snark aside, yes, people program entire operating systems and programs and games in assembly. Famously, Rollercoaster Tycoon was written entirely in assembly.
This is true only for the most simple assemblers. In practice, each assembler comes with its own distinct set of features.
Microsoft Macro Assembler for example offers:
• Records/structs, bitfields
• Typed labels/pointers. Intel in particular introduced this feature.
• Procedure blocks which allow locally scoped labels within them
• The automatic allocation of local variables onto the stack.
• if/then blocks, for loop blocks
• Memory model directives
• Macros, equates, etc.
Other assemblers like NASM omit some of these features, as introducing versions of these features which are completely compatible with the MASM ones would be difficult. For example, NASM allows MASM-like procedure blocks, but they are not block-scoped. They're just a notation for the programmer.
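As a rough illustration of what the automatic local-variable allocation in that list saves you from doing by hand, here is a Python sketch that assigns frame-pointer offsets to declared locals. It assumes a 32-bit EBP-based frame, declaration-order allocation downward from EBP, and natural alignment capped at 4 bytes; this is not MASM's actual LOCAL algorithm, just the kind of bookkeeping such a directive takes over.

```python
def allocate_locals(decls):
    """Assign [ebp-N] offsets to (name, size) locals, in declaration order.

    Returns (offsets, frame_size); frame_size is rounded up to 4 bytes so
    a prologue could do `sub esp, frame_size`.
    """
    offsets = {}
    used = 0
    for name, size in decls:
        align = min(size, 4)                                # natural alignment, max 4
        used = (used + size + align - 1) // align * align   # reserve and align
        offsets[name] = -used                               # local lives at [ebp - used]
    frame_size = (used + 3) & ~3                            # keep ESP 4-byte aligned
    return offsets, frame_size


offsets, frame = allocate_locals([("count", 4), ("flag", 1), ("total", 4)])
print("sub esp,", frame)
for name, off in offsets.items():
    print(f"{name}: [ebp{off:+d}]")
```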
Yes, Assembly is certainly a programming language. If you could not express your program in it, our whole discipline would fall apart (if a script defined a process and what it translates to did not...). "Things work" because /other/ languages translate into it: it is the chief language, the named ("symbolic") reading of machine code. It has conditionals and loops, it has logic and arithmetic operators, it has "exactly" what the processor has: it is the most precise definition of the program, so how can it not be a programming language?
You are probably thinking of the process of assembling pieces of machine code through the assembler. But the assembler also translates symbolic names into machine code, with the goal of joining the different pieces of code and data, so it's more than just concatenation. To encode the memory addresses for jumps, for example, you must first have defined them. So even in the assembler's job of concatenating subroutines, you imply assembly in the sense of naming pieces of machine code and treating them symbolically.
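The point about jump addresses is why one classic assembler design works in two passes: the first pass walks the source to learn where each label lands, and the second emits code with those addresses filled in, which handles forward references. Below is a toy Python sketch with a made-up ISA where every instruction is 2 bytes; real encodings vary in size, which is part of what makes real assemblers harder.

```python
INSTR_SIZE = 2  # hypothetical fixed-width toy ISA


def assemble(lines, origin=0):
    """Two-pass resolution of 'label:' definitions and 'op label' references."""
    # Pass 1: assign an address to every label.
    labels, addr = {}, origin
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        else:
            addr += INSTR_SIZE
    # Pass 2: emit (address, instruction) with label operands resolved.
    out, addr = [], origin
    for line in lines:
        if line.endswith(":"):
            continue
        op, _, arg = line.partition(" ")
        if arg in labels:  # forward or backward reference
            arg = hex(labels[arg])
        out.append((addr, f"{op} {arg}".strip()))
        addr += INSTR_SIZE
    return out


source = ["jmp done", "loop:", "dec", "jnz loop", "done:", "ret"]
for addr, instr in assemble(source, origin=0x100):
    print(hex(addr), instr)
```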
Assuming you have enough registers, burn a small number of them for internal use in macros and assign the rest to values of interest. You may end up needing to write an additional layer of macros, with their own registers, that use the first layer as primitives.
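One way to read that advice, sketched in Python as a tiny macro expander: one register is reserved as macro scratch and never handed out, the rest are bound to named values, and a second-layer macro is built on top of the first. The register names follow MSP430 conventions (`mov.w src, dst`), but the pool split and the macro names `swap`/`rotate3` are hypothetical.

```python
SCRATCH = "r15"                              # reserved for macro internals only
VALUE_REGS = ["r4", "r5", "r6", "r7", "r8"]  # free for values of interest


def bind(names):
    """Give each named value its own non-scratch register."""
    if len(names) > len(VALUE_REGS):
        raise ValueError("out of value registers")
    return dict(zip(names, VALUE_REGS))


def swap(a, b):
    """First-layer macro: exchange two registers via the scratch register."""
    return [f"mov.w {a}, {SCRATCH}", f"mov.w {b}, {a}", f"mov.w {SCRATCH}, {b}"]


def rotate3(a, b, c):
    """Second-layer macro built from swap: a <- b, b <- c, c <- a."""
    return swap(a, b) + swap(b, c)


regs = bind(["x", "y", "z"])
for line in rotate3(regs["x"], regs["y"], regs["z"]):
    print(line)
```

Because `bind` never hands out the scratch register, any macro may clobber it freely without corrupting a named value.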
No, the book introduces these serial numbers as idealized versions of the processors. It allows the reader, who is treated as a beginner to processors, to develop a mental model of them without any complications. The edge cases, pitfalls, and issues of the true processors are revealed over the course of the book.
Not sure about the merits of calling the ideal processors by special names.
I never heard it called that in France with my friends; for me it was always 80-86, 80-88, or just 386 or 286 (or 80-286 if you wanted to sound technical).