Beginning x64 Assembly Programming Errata

November 19, 2021 -
Tags: assembler, performance

I’ve recently completed the book Beginning x64 Assembly Programming Errata.

The book has several errors, at least a couple of which are significant; since there is no official errata, I’m publishing my findings.



This book left me very conflicted. I was enthusiastic at the beginning, but I’ve found its production to be very unprofessional.

First, Apress didn’t publish an errata (besides a small file with two corrections in the book’s companion repository). One of the errors is also, very amusingly, caused by improper typography.

Second, the authors don’t seem to take the subject seriously, either:

We have carefully written and tested the code used in this book. However, if there are any typos in the text or bugs in the programs, we do not take any responsibility. We blame them on our two cats, who love to walk over our keyboard while we are typing.

It’s worrying that two errors are conceptual (the explanations given are fundamentally wrong); after finishing the book I’m now questioning the quality of what I’ve learned.

Errors are a fact of life, and there’s nothing wrong with them, but since readers consume time and hair because of them (I did), it’s important for the topic to be recognized, and addressed in some way. At a minimum, a publisher can setup a simple web page with a table, where readers can submit their findings (I’ve seen this approach a few times).

But now, to the errors!

Page 68: Addressing forms

I find this error very funny. In the following listing:

mov   rax, text1+1     ;load second character in rax
lea   rax, [text1+1]   ;load second character in rax

the operations actually load the address of the second character in rax.

The funny part is that the text is actually there: copy/pasting reveals the missing text (address) is there, but due to a typographic error in the PDF, it’s not visible.

Page 156: Using objdump

This is one the two conceptual errors.

From page 156:

The assembler took the liberty to change the sal instruction into shl, and that is for performance reasons.

The two instructions are exactly the same: they’re actually one; therefore, the explanation of why sal is turned into shl is baseless.

Consequently, also the statement that follows the previous:

As you remember from Chapter 16 on shifting instructions, this can be done without any problem in most cases.

is not exact; the change sal <> shl can be done without any problem in any case, not in most cases.

Page 160: Working with I/O

In the following listing:

push rbp
mov  rbp, rsp
; rsi contains address of the inputbuffer
; rdi contains length of the inputbuffer
     mov  rax, 0  ; 0 = read
     mov  rdi, 1  ; 1 = stdin

the length of inputbuffer is in rdx, not rdi.

Page 206: Moving Strings

This is the other conceptual error, which I find alarming; it is also very interesting.

In page 206, there is a routine to print a string in reverse:

;reverse copy my_string to other_string
  prnt string6,40
  mov rax, 48   ;clear other_string
  mov rdi,other_string
  mov rcx, length
      rep stosb
  lea rsi,[my_string+length-4]
  lea rdi,[other_string+length]
  mov rcx, 27   ;copy only 27-1 characters
  std           ;std sets DF, cld clears DF
  rep movsb
  prnt other_string,length

The companion repository has an additional error in the comment; it reads “copy only 10 characters”.

The paper version reads as above; since the string consists of the alphabet, it’s intuitive that the loops count should be 26 instead of 27.

However, the authors don’t notice this error, and in the following page, they give another baseless explanation to support the value 27:

Why do we put 27 in rcx when there are only 26 characters? It turns out that rep decreases rcx by 1 before anything else in the loop. You can verify that with a debugger such as SASM.

Anybody who really tries the routine in a debugger (I did) will find that it is incorrectly copying one byte more than it should (the first copied). As a consequence, also rsi and rdi should be decreased by one.

Something that I find mystyfying is that the authors themselves copy the specification of the rep instruction:

WHILE CountReg =/ 0
                Service pending interrupts (if any);
                Execute associated string instruction;
                CountReg ← (CountReg – 1);
                IF CountReg = 0
                    THEN exit WHILE loop; FI;
                IF (Repeat prefix is REPZ or REPE) and (ZF = 0)
                or (Repeat prefix is REPNZ or REPNE) and (ZF = 1)
                    THEN exit WHILE loop; FI;

This is in conflict with their statement (the associated operation is performed before rcx is decreased).

I find amusing that this bug is hidden (this is the likely reason why the authors didn’t notice the bug(s)) by the fact that the prnt routine takes the string length as argument, so copying any text before or after the correct locations, doesn’t yield any visible effect.

Above all, this bug leads to a fundamental reflection. It is an off-by-one error - a very famous type - which shows how difficult and utterly fragile Assembly programming is; so much, that the error found its way even in a book written by experienced programmers.

Page 217: Using cpuid

In the following listing:

    test ecx,9h          ;test bit 0 (SSE 3)
    jz sse41             ;SSE 3 available

the correct values are:

    test ecx,200h        ; test bit 9 (SSE 3)

Page 328: Matrix Print: printm4x4

In the following reference (emphasis mine):

To align the stack on a 16-byte boundary, we cannot use the trick with the and instruction from Chapter 16.

the trick is actually in Chapter 15 (page 125).

Page 384: Using More Than Four Arguments

The following listing shows how to perform a Windows call with more than four arguments:

  sub   rsp, 8
  mov   rcx, fmt
  mov   rdx, first
  mov   r8, second
  mov   r9, third
  push  tenth
  push  ninth
  push  eighth
  push  seventh
  push  sixth
  push  fifth
  push  fourth
  sub   rsp, 32       ;  shadow  space
  call  printf
  add   rsp, 32 + 8

However, the stack point reset following the call is not accounting for the (7) pushes; the correct reset is:

  add   rsp, 32 + 56 + 8    ; 56 = 7 * 8

In the alternative call structure, on pages 385/386, the stack pointer is correctly reset, by adding the value (32 + 56 + 8).


I’m not sure if I should suggest this book or not.

For a casual user who wants a fun (!) read, it may be an effective book. On the other hand, motivated readers who wish quality knowledge, should definitely consider The Art of 64-Bit Assembly, written by a veteran Assembly programmer (although, sadly, it’s based on MASM/Windows).

Happy optimizing 😃