MSG_LOOP_PostMessage:
mov a, MSG_Head
cjne a, #MSG_BUFFER_END - 2, mlpm0
mov a, #MSG_BUFFER
sjmp mlpm1
mlpm0:
inc a
inc a
mlpm1:
cjne a, MSG_Tail, mlpm2
clr a
ret
mlpm2:
mov r0, MSG_Head
mov @r0, MSG_Code
inc r0
mov @r0, MSG_Parameter
mov MSG_Head, a
mov a, #1
ret
I wrote that about fifteen years ago. Of course, the routine's label tells you something: "MSG_LOOP_PostMessage". It must post a message to a message buffer. The rest is gibberish. What's the intent behind each and every block of code? Well, of course, that's not how I wrote it. This is what I wrote:
; POST MESSAGE --------------------------------------------------------------------------
; This routine posts a new message to the message loop buffer.
;
; The message code and parameter are written to the buffer only if buffer space is
; available. This is determined by first looking at the difference between the
; head and tail pointers. By design, we sacrifice one set of locations (two
; bytes) in the buffer in order to create a gap between an advancing head pointer and
; a lagging tail pointer. If a lot of messages are issued and not processed immediately
; the head pointer will quickly wrap around and threaten to collide with the tail
; pointer. By sacrificing a set of locations we avoid having to keep a counter of
; unprocessed messages. This, because there would be ambiguity when both head and
; tail pointers point to the same location: it could mean that there are no messages
; to process or that there's a buffer full of messages to process. The two byte
; gap we are imposing between head and tail removes this ambiguity and makes it easy
; to determine the buffer empty and full conditions.
;
; If there's space for a new message it is stored and the head pointer advanced to
; the next available location. However, if no space remains, the message is discarded
; and the head pointer will remain one message (two bytes) away from the tail pointer.
;
; Arguments
; ----------------------------
; MSG_Head Message buffer head pointer
; MSG_Tail Message buffer tail pointer
; MSG_Code Message code
; MSG_Parameter Message parameter
;
; Return
; ----------------------------
; ACC = 0 -> Message did not post
; ACC <> 0 -> Message posted
;
; Modified
; ----------------------------
; MSG_Head
; ACC, R0
;
; Preserved
; ----------------------------
; MSG_Tail
; MSG_Code
; MSG_Parameter
;
MSG_LOOP_PostMessage:
mov a, MSG_Head ;Increment the head pointer by one message
cjne a, #MSG_BUFFER_END - 2, mlpm0 ;and parameter.
mov a, #MSG_BUFFER ;Need to wrap around.
sjmp mlpm1 ;
mlpm0: ;
inc a ;No need to wrap around, just increment.
inc a ;
mlpm1:
cjne a, MSG_Tail, mlpm2 ;Check for a buffer full condition.
clr a ;Flag that we did not post the message.
ret ;Exit if it is.
mlpm2:
;The buffer isn't full, we can store the new message
mov r0, MSG_Head ;Store message
mov @r0, MSG_Code
inc r0
mov @r0, MSG_Parameter ;Store parameter
;Now set the head pointer to the next available message location.
mov MSG_Head, a ;The accumulator already has the next "head" location
;there's no need to execute the same logic again.
mov a, #1 ;Flag that the message was posted
ret ;and exit
Now you don't have to know assembler to understand this code. A couple of years later I had to rewrite this in C. It was incredibly easy to do because of the exhaustive in-code documentation. This is what came out:
// POST MESSAGE --------------------------------------------------------------------------
// This routine posts a new message to the message loop buffer.
//
// The message code and parameter are written to the buffer only if buffer space is
// available. This is determined by first looking at the difference between the
// head and tail pointers. By design, we sacrifice one set of locations (two
// bytes) in the buffer in order to create a gap between an advancing head pointer and
// a lagging tail pointer. If a lot of messages are issued and not processed immediately
// the head pointer will quickly wrap around and threaten to collide with the tail
// pointer. By sacrificing a set of locations we avoid having to keep a counter of
// unprocessed messages. This, because there would be ambiguity when both head and
// tail pointers point to the same location: it could mean that there are no messages
// to process or that there's a buffer full of messages to process. The two byte
// gap we are imposing between head and tail removes this ambiguity and makes it easy
// to determine the buffer empty and full conditions.
//
// If there's space for a new message it is stored and the head pointer advanced to
// the next available location. However, if no space remains, the message is discarded
// and the head pointer will remain one message (two bytes) away from the tail pointer.
//
//
// Return
// ----------------------------
// 0 = Message did not post
// 1 = Message posted
//
//
U8 msg_loop_post_message(U8 message, U16 parameter)
{
if(msg_loop.count == MESSAGE_LOOP_SIZE)
{
return(0); // Can't post because buffer is full
}
else
{
msg_loop.item[msg_loop.head].message = message; // Post message
msg_loop.item[msg_loop.head].parameter = parameter; //
msg_loop.count++; // Update message count
if( msg_loop.head == (MESSAGE_LOOP_SIZE - 1)) // Wrap around?
{
msg_loop.head = 0; // Yes.
}
else
{
msg_loop.head++; // No
}
return(1);
}
}
This code is definitely readable, but I feel like the port from assembler to C might have been too direct. Originally head was a pointer and it made sense to have conditional jumps, but in the C code head is an offset starting at 0. Why keep the entire if structure when you can simply have
msg_loop.head = (msg_loop.head + 1) % MESSAGE_LOOP_SIZE; // Advance head and wrap
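Here's a minimal, self-contained sketch of that simplification. The field names come from the listing above, but the U8/U16 typedefs, the struct layout, and the buffer size of 16 are my assumptions; the original definitions aren't shown in the post.

```c
#include <stdint.h>

typedef uint8_t  U8;                    /* assumed widths for the original U8/U16 */
typedef uint16_t U16;

#define MESSAGE_LOOP_SIZE 16            /* capacity in messages (assumed value) */

static struct {
    struct {
        U8  message;
        U16 parameter;
    } item[MESSAGE_LOOP_SIZE];
    U8 head;                            /* next free slot */
    U8 tail;                            /* next message to consume */
    U8 count;                           /* number of unprocessed messages */
} msg_loop;

U8 msg_loop_post_message(U8 message, U16 parameter)
{
    if (msg_loop.count == MESSAGE_LOOP_SIZE)
    {
        return 0;                                       /* Can't post, buffer is full */
    }
    msg_loop.item[msg_loop.head].message   = message;   /* Post message */
    msg_loop.item[msg_loop.head].parameter = parameter;
    msg_loop.count++;                                   /* Update message count */
    msg_loop.head = (msg_loop.head + 1) % MESSAGE_LOOP_SIZE; /* Advance head and wrap */
    return 1;
}
```

Because the C version keeps an explicit count, it can use every slot in the buffer; the sacrificed two-byte gap from the assembly version is no longer needed to tell empty from full.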
I remember I had to convert this to C in a hurry because new features had to be added. Doing it in assembler was just about unthinkable. The entire project consists of nearly 100 assembler files and resulted in about 90 files once translated into C.
This was a marathon run, and once the entire project was translated and working the focus shifted squarely toward adding functionality rather than optimizing the code.
This code was running on an 8051-derivative. Performance was important; you are always counting clock cycles on these systems. I think that if you look at the resulting machine code, with the relevant variables mapped into the data memory space, the modulo statement doesn't really do any better than a spelled-out conditional. In fact, it could be a couple of clock cycles slower (just guessing).
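For what it's worth, there's a middle ground between the modulo and the branch: if the buffer size is a power of two, the wrap reduces to a single AND, which is cheap on nearly anything, the 8051 included. A sketch (the size value here is an assumption):

```c
#define MESSAGE_LOOP_SIZE 16                      /* must be a power of two */

unsigned char advance_head(unsigned char head)
{
    /* Same as (head + 1) % MESSAGE_LOOP_SIZE, but as a mask:
       no division, no branch */
    return (unsigned char)((head + 1) & (MESSAGE_LOOP_SIZE - 1));
}
```

The trade-off is the power-of-two constraint on the buffer size, which is often acceptable for a message queue.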
Again, it's been a long time. I can totally see looking at the generated assembly and saying something like "That's as fast as it is going to get. Move on!".
We all make less-than-elegant code at one point or another. Hopefully not too many times. :-)
> the modulo statement doesn't really do any better than a spelled-out conditional statement
Actually, on modern processors the modulo would probably be more efficient than the conditional due to cool stuff they do like pipelining. Pipelining is basically where they execute instructions ahead of time (sort of in pseudo-parallel), so when there's a branch there's a fair chance the results from that future processing have to be thrown out.
One of the techniques used to get around this is branch prediction[1], where they initially guess whether a branch will be taken and then change that guess depending on whether the branch actually was taken. This works particularly well for for loops and the like.
Another development on the ARM side (and MIPS too, I think) is instructions with conditionals embedded in them.[2]
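In C you usually get those conditional instructions out of a plain ternary: compilers will often lower a wrap like the one below to a conditional select (CSEL on ARM, CMOV on x86) instead of a jump, though that's a compiler decision rather than a guarantee. A sketch:

```c
/* Wrap-around advance written so the compiler can emit a conditional
   select rather than a branch (compiler- and target-dependent). */
unsigned advance(unsigned head, unsigned size)
{
    return (head + 1 == size) ? 0 : head + 1;
}
```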