Monday, July 11, 2011

The x86_64 Calling Convention


I suppose I can consider myself an 'old-school' developer now; even though I have been reading the AMD64 ABI documentation, I still haven't fully absorbed it into my head yet, which is evidenced by the recent two situations I had today where RTFM-ing would have had saved me hours of GDB debugging pain.

I have been coding some assembly instructions to make C-calls at runtime to a debugging routine, but the call seems to always ends up mysteriously trampling the JIT-ed routines, making the VM take unexpected execution paths and causing some unlikely assertions to be fired.

The situation is confounded by a number of issues:
  1. the code generated is dynamic, and therefore there are no debugging symbols associated with them compared to code typically generated by the assembler/compiler;
  2. there are different types of call-frames for a given method; 1 for a pre-compiled stub, 1 for a frame that's crossed-over from JIT-ed code to native code, and 1 for the JIT-ed code itself;
  3. when the eventual assertion does manifest, the code is already far away in the rabbit-hole from where the original problem manifested. And because some of the JIT-ed code actually makes a "JMP", unlike a "CALL", you can't actually figure out where the code originated from, since %rip is never saved on the call stack.
While situations 1 and 2 make debugging difficult by having the need to keep a lot of contextual information in order to figure out what's going on, situation 3 is just impossible to debug if the bug is non-deterministic in nature. For example, each compiled method in the VM generates a small assembly stub that replaces the actual code to be executed; when the stub gets executed for the first time, it triggers of the JIT compiler at runtime to compile the real method from its intermediate representation. The compiled method then replaces the stub, hence subsequent invocations will simply call the already JIT-generated method, thereby executing at native-speed, like just as you would get on compiled code.

To optimise on space, the stubs are made as small as possible (~20 bytes), and the common execution body shared by all stubs is factored into a common block. All stubs will eventually perform a global "JMP" instruction to this common block. In order to faciliate communication, all shared data between the stub and the common code block is passed on the thread stack, where the common offset to the method handle is agreed upon. 

While the design is elegant, it is also impossible to debug when it breaks; the non-deterministic-ness of the bug seems to surface from time-to-time, where it seems to suggest that the thread stack got corrupted or that it's not passing the method handle correctly. Even when GDB is left running, by the time the assertion triggers, it's already past the fact, and therefore it is unable to trace back to the originating path.

I thought it might be a good idea to inject some debugging calls to trace the execution and stack pointer at runtime, so that I can figure out which stub was last called and the stack height when the call was made; the two information combined should give me sufficient hints on where the problem might lie. However, my injected code has introduced two other issues that I had overlooked, which brings me back into the discussion of the x86_64 ABI again; if you ever wanted to template any assembly instructions into your code that relies on an external library call, do keep these 2 points from the ABI specification in mind:
  1. Save ALL caller-saves registers, not just only the ones that you are using.
  2. (§3.2.2) The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.
I have to say that I've dismissed (1) since I've gotten use to the style of only documenting and saving the registers that was used; the convention was something that I had picked up from Peter Norton's 1992 book, "Assembly Language for the PC". For those who don't know, he's the "Norton" that Symantec's Norton Antivirus is named after. I still have the out-of-print book on my desk as a keepsake; it reminds me of the the memories of reading it and scribbling code on a piece of paper at my local library. Remarkably, that was how I learnt assembly, since I didn't have a computer back then. Thumbing through the book today, I still have an incredible respect for Peter's coding prowess. He had a way of organising his procedures so elegantly such that each of them all fitted perfectly together from chapter to chapter.


Sorry, got sidetracked. So yes, point (1) - to save ALL registers; this is necessary because all caller-saved registers can actually be occupied by the JIT routines as input arguments to the callee; while this typically means the 6 defined registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9) for general input (see §3.2.3), other registers can also be trashed upon a call return, so as a rule-of-thumb save everything, except the callee-saved registers (%rbx, %rbp, %r12 to %r15), which are guaranteed to be preserved.

Point (2) - I haven't observed a reproducible side effect from this; however the failure points between adhering to it and not actually causes a visible difference in the JIT-ed code's path; therefore there is a need to be on the side of caution. I seem to have observed that some memory faults from not following this directive, but I can't ascertain this for a fact yet.

Finally, a self-inflicted bug that I'd like to remind myself of; remember make sure to deduct from %rsp if any memory has been written onto the thread stack; otherwise any function calls may unknowingly overwrite it!

For all the trouble with debugging that I've gotten myself into, there is at least a silver-lining to it; I had made the problem deterministic, or if it isn't the same problem, it was a similar class of problem that I can consistently reproduce to analyse its behaviour and learn from the mistakes I have been making. Because of the determinism, I was able to use GDB's reversible debugging feature to record the execution from the stub to the common code to gain a better understanding of how the generated code actually works. It's a really nifty feature, and I'm glad to have it as my first useful case of applied reversible debugging in practice.
Saturday, June 04, 2011

Page Faults

While going through some code emitted by a Just-In-Time compiler (JIT), I’ve encountered a curious piece of code which suggested that if access to the process' utilisable stack space isn’t done incrementally, it will cause an “access violation”.

Normally, I wouldn’t have bothered with the problem. But in this case, the JIT-ed code makes uninitialised reads to the process stack, causing valgrind to generate a huge amount of spurious warnings in its log. This makes it difficult to sieve through relevant details, and makes it impossible to generate a static suppression for because the JIT-ed code's call frames are dynamically generated and arbitrary in nature.

I didn’t see what’s wrong with touching memory that's already accessible by the application, so I didn’t really grok what the “access violation” exactly implies. A little sleuthing is required, so I wrote a little code to test out the “access violation”:

#.equ INCREMENT, 0x1000
.equ INCREMENT, 0x1800

.section .text

FORMAT_STR:
 .string "%d\n"

.globl _start
_start:
 push %rbp
 movq %rsp, %rbp

 # loop and keep touching stack space
 movq $640, %rcx
 movq $-0x1000, %rbx
again:
 movq (%rbp, %rbx), %rax
 subq $INCREMENT, %rbx

 movq %rbx, %r12   # use callee-save to prevent push to stack   
 movq %rbx, %rsi
 movq $FORMAT_STR, %rdi
 movq $0, %rax
 call printf       # but call pushes to stack too :(
 movq %r12, %rbx
 loop again

 movq $1, %rbx
 movq $1, %rax
 int $0x80


Note the commented code at the first line; this represents the original page size boundary in which the JIT emitted that’s causing the offending uninitialised memory access; if the size isn’t extended, the code will segmentation fault at around the 8MB mark. The corresponds to what ‘ulimit -s’ reports on the OS.

However, if the size gets incremented to 0x1800 bytes for example, the code will segmentation fault way much earlier, at around the 140k mark, which puzzled me. Looking at ‘dmesg’ shows something interesting:

[806331.042666] incremental[28910]: segfault at 7fff983ad8a8 ip 0000000000400256 sp 00007fff983cf8a8 error 4 in incremental[400000+1000]


I’m surprised that the kernel actually reports this error, so I started searching for the error string on Google code search. The likely matches came from cygwin which does mention “access violation” and xen-source where it indicates a page fault.

Reading through the definition suggests that I’m causing a hard page fault, but I wanted to make sure that the error code 4 is exactly meaning this. Some cursory research led me to a pretty helpful CS page explaining page faults, with excerpts from Linux 1.0’s sources; scanning through it which showed that "error 4" means that the error comes from user space (as opposed to kernel space).

The code also indicates that if an allocation exceeds the OS page size, the kernel is free to abort the program, which explains the error. Further research also led me to the getpagesize() system call, which verified that the page size for Linux is set at 4KB.

So mystery solved. I suppose the next thing I can do, is to make a nasty hack in the JIT to make spurious writes instead of reads instead; that should get rid of all the valgrind false positives, but I can’t say it’s the most elegant way of resolving the issue.
Sunday, April 17, 2011

The Future of Linux UI Scares Me

I don’t think I have mentioned that I moved from Ubuntu to Fedora. Two years ago.

Why has this to do with the state of the Linux desktop? I’d say at least somewhat to do with it. When I last switched from Gentoo to Ubuntu, it was due to the eventual frustration with the incessant amount of tinkering I had to do in order to get things work.

Most people would have jumped the bandwagon and moved to the newer, and trendy MacOSX. But you know what? What most Linux windows manager have is the “focus follows mouse” feature, which is the most Zen-like simplicity that no other non-unix OSes have. That was why I swapped to Ubuntu, which was the new poster-boy for the “Linux that Just Works”.

The charm however, did not last. “apt-get” was the loveliest feature that I embraced, and it was great that Ubuntu finally fast-tracked Debian to bring forth the most bleeding edge of software packages, albeit with a higher defect rate than the rock solid Debian. Even so, the defect rate in Ubuntu wasn’t something that I perceptively noticed. Not until it came to development tools.

Fedora is the undisputed leader for being the distro by the developers, for the developers. Ubuntu is great, but only when you don’t have to tinker under the hood. If you are, then be prepared for pain. Badly configured packages like GDB, with debugger instability and crashes, and badly placed debugging symbols for packages made it hard to treat Ubuntu as a serious development environment.

It so happened that my company was relocating, and as part of the transition, it was just a good time to think about the software infrastructure that we were using, and to set things up correctly. It was also fortuitous that at the same time, we had hired a very capable sysadmin who is an expert on Redhat based distros, so the decision was to maintain one and one (free) distribution only - Fedora.

I have to say it has been a good choice; personally, I think the QA behind Fedora is very solid generally, and especially in developer tools. But what I thought had been a good choice was that Fedora stuck to the original Gnome desktop where everything was simple, like Windows 98 simple. No, it isn’t a pun; older desktop environments did get it correct, like how OS/2, Windows XP and the KDE 3.x did. They just worked.

There is nothing wrong with the existing paradigm of having an app-menu selector, a taskbar, and a widget area for notifications, plus a few bells and whistles here and there. But Ubuntu decided that it wasn’t good enough; “No, we’ve got to look like Apple”, Mark Shuttleworth says. Then he starts tinkering with the menu icons, switching it from the right hand side to the left.

I’m glad that I’ve left Ubuntu before then; I'm sure he must have realised that getting about 80% of the desktop users to make a context switch of a long-established habit won't be pleasant. It’s like telling a heroine addict that going cold-turkey is a piece of cake. Bad analogy? But you get the point.

Then Mark decides that a singular change isn’t enough, “I have an idea, let’s revamp the whole desktop altogether!” And this is how the Unity interface came about. Still, that’s ok. Ubuntu is Mark’s baby, he’s entitled to drive the design of his distro any way he likes.

I don’t really have much to say about Unity, since I’ve never used it. I don’t think I will anyway; it looks too different to what I have come to be very comfortable with as a desktop environment. But it is not just that I ain't adventurous; field reports from users who had tried it just didn’t look encouraging.

However, the bad news is, Gnome 3 will start shipping the new Gnome-shell interface, which appears to have taken a leaf from Unity's design. It means that Gnome will be the last major window manager to jump the shark. Well so long Gnome, it’s fun while it lasted.

Fedora 15 will be shipping with Gnome 3. The thought of upgrading makes me shudder. Will I be productive with it, or will I be "enjoying" my time in discovering what new features the new UI will bring? Unfortunately, I don’t understand what all that fuss about, competing to reinvent the desktop. I’ll just get a Mac instead*.


*Oh wait, that’s a joke. Don’t get too upset, my Mac fanboy friends. I’ll show you my new shiny Xfce-compiz desktop, or my zen-like fluxbox windows manager. Trust me, you’ll love it.
Saturday, April 02, 2011

No Technical Support Provided

Readers,

If you're came here through the links from my other blog posts, please take time to read what I have to say.

The solutions to the problems I solve, are large done to "scratch my own itch". I like to share these solutions through my blog with the hope that it'll be useful to others facing the same issues. However, I have a pretty intense full-time job in my own company, which leaves me very little time to be providing any specific guidance in any related problems that you may face.

You may want to comment on the blog post, and hopefully, some good Samaritans may give out more wisdom or advice. Or you may be the one who's giving help to others, and good on you if that's the case. That is what community spirit is all about.

Unfortunately, I cannot be here to provide technical support for you and if I don't have further insight or time to the problems you have, you are largely on your own.

Thank you for understanding.
Thursday, March 31, 2011

Finding the inclination to write

I'm still here; yes, I have been very quiet.

Mostly because I'm busy with things. Work mainly. Some aspects I do not enjoy, but it is still fun when it comes to the technical bits. Enjoying those!

Off work, life consists of quite a bit of physical training, something I have come to enjoy too. That is complementary to another habit of mine: sleep. I'm sleeping sufficiently early to wake up for exercise these days. That is a luxury for most people. Ask yourself, when is the last time you have had a good sleep? People are usually so sleep deprived these days, which is sad, literally.

It does mean I sacrifice other material enjoyments in life; TV watching, net surfing or out having a few late night drinks with friends. I don't miss them much, perhaps just a little bit on the socialising. But there's no point in hanging out late in this country; people tend to go overboard with their drinks - there'll hardly be any sensibilities left for meaningful interactions after late ... but I think a lot of people here will digress!

Life is coasting along. It does feel a little aimless sometimes. But then again, what is there to be aiming for? Does it matter whether if you have not done the gazillon things that you wanted to do? In the long run, we are all dead anyway. Maybe that's not enough, for we need that epitaph to survive us?

Well right now, it is enough.

For myself, keep writing. For my friends, keep in touch.