Vincent Liu: 2009

Saturday, November 28, 2009

Compiling Ruby 1.9 for Mac OSX 10.4

This article contains specifics for installing Ruby on PowerPC Mac OS X 10.4. Newer Intel Macs running an OS later than 10.4 have more up-to-date dependencies and probably do not require the instructions described here. You shouldn't need to compile from source in that case either: there are one-click binary installers for Ruby 1.9 on the Internet, so look for one.

On 10.4 PPC, the default version of Ruby is 1.8.2. This version is now too old for some of the Ruby packages I wanted to install, specifically rubygems 1.3.5 in my case. I won't go into the reasons for not settling for an older version of rubygems (Read here if you are really interested why), but since an upgrade of Ruby is warranted anyway, instead of upgrading to the latest in the 1.8.x series, it might be worthwhile to try out the new features of the 1.9.x series.

In order to recompile ruby, you'll need Xcode from Apple. That will install the SDK for developing on Apple's MacOS, but more importantly, it contains the gcc part of the GNU toolchain required for compilation.

However, the GNU toolchain is incomplete at this stage; we'll need at minimum GNU m4, autoconf and automake to complete the chain of dependencies Ruby needs.

Not Using Darwin Port/Fink

I'm going to compile these things directly from source without any package management system, and for a good reason: 3rd party MacOS package management systems are usually more trouble than they're worth. You get longer compilation times from spurious dependencies, or compilation breakages caused by improperly configured parameters or missing compiler flags, which you'll end up having to fix by hand anyway.

That said, don't be put off by the exercise of compiling from scratch. It is not difficult, and you'll get to learn a thing or two about the internals of your OS.

Open up your terminal, and sudo into root:


Tigershark:~ vince$ sudo -i
Password:
Tigershark:~ root#

In order to separate your newly compiled binaries from your existing ones, I recommend you create a separate directory under the '/' directory so that they are cleanly partitioned. I'm using the directory '/lfs' in my example here, as a tribute to the Linux From Scratch project, from which much of my knowledge is derived. Feel free to choose your own directory name though.


Tigershark:~ root# export LFS=/lfs
Tigershark:~ root# echo "export PATH=$LFS/bin:\$PATH" >> /Users/YOURUSERNAME/.profile
Tigershark:~ root# export PATH=$LFS/bin:$PATH
Tigershark:~ root# mkdir -p $LFS/src
Tigershark:~ root# cd $LFS/src

The commands above create the '/lfs' directory and the '/lfs/src' directory beneath it. They also set up the PATH environment variable so that '/lfs/bin' is searched before all other paths. This is necessary so that you'll be using your new binaries instead of the old system binaries. The same change is appended to your user's .profile file so that Terminal will look for the new binaries whenever it starts up in the future.

Installing m4


Tigershark:/ root# cd $LFS/src
Tigershark:/lfs/src root# curl http://ftp.gnu.org/gnu/m4/m4-1.4.9.tar.gz > m4-1.4.9.tar.gz
Tigershark:/lfs/src root# tar -zxvf m4-1.4.9.tar.gz
Tigershark:/lfs/src root# cd m4-1.4.9
Tigershark:/lfs/src/m4-1.4.9 root# ./configure --prefix=$LFS
Tigershark:/lfs/src/m4-1.4.9 root# make && make check

Make sure that the tests complete without any errors. Once you are satisfied with the test results, install it.


Tigershark:/lfs/src/m4-1.4.9 root# make install

Installing autoconf


Tigershark:/ root# cd $LFS/src
Tigershark:/lfs/src root# curl http://ftp.gnu.org/gnu/autoconf/autoconf-2.65.tar.gz > autoconf-2.65.tar.gz
Tigershark:/lfs/src root# tar -zxvf autoconf-2.65.tar.gz
Tigershark:/lfs/src root# cd autoconf-2.65
Tigershark:/lfs/src/autoconf-2.65 root# ./configure --prefix=$LFS
Tigershark:/lfs/src/autoconf-2.65 root# make && make check

Make sure that the tests complete without any errors. Once you are satisfied with the test results, install it.


Tigershark:/lfs/src/autoconf-2.65 root# make install

Installing automake


Tigershark:/ root# cd $LFS/src
Tigershark:/lfs/src root# curl http://ftp.gnu.org/gnu/automake/automake-1.9.6.tar.gz > automake-1.9.6.tar.gz
Tigershark:/lfs/src root# tar -zxvf automake-1.9.6.tar.gz
Tigershark:/lfs/src root# cd automake-1.9.6
Tigershark:/lfs/src/automake-1.9.6 root# ./configure --prefix=$LFS
Tigershark:/lfs/src/automake-1.9.6 root# make && make check

Make sure that the tests complete without any errors. Once you are satisfied with the test results, install it.


Tigershark:/lfs/src/automake-1.9.6 root# make install

Additional Step: Upgrading libreadline

Apple's implementation of the readline library is missing symbols that Ruby (or more accurately, irb) requires in order to retrieve command line history. If you do not use irb (which I doubt, unless you never need to test some code you're uncertain of), you may skip compiling this. But I recommend you do :)


Tigershark:/ root# cd $LFS/src
Tigershark:/lfs/src root# curl http://ftp.gnu.org/gnu/readline/readline-6.0.tar.gz > readline-6.0.tar.gz
Tigershark:/lfs/src root# tar -zxvf readline-6.0.tar.gz
Tigershark:/lfs/src root# cd readline-6.0

Tigershark:/lfs/src/readline-6.0 root# curl http://ftp.gnu.org/gnu/readline/readline-6.0-patches/readline60-001 > readline60-001
Tigershark:/lfs/src/readline-6.0 root# curl http://ftp.gnu.org/gnu/readline/readline-6.0-patches/readline60-002 > readline60-002
Tigershark:/lfs/src/readline-6.0 root# curl http://ftp.gnu.org/gnu/readline/readline-6.0-patches/readline60-003 > readline60-003
Tigershark:/lfs/src/readline-6.0 root# curl http://ftp.gnu.org/gnu/readline/readline-6.0-patches/readline60-004 > readline60-004
Tigershark:/lfs/src/readline-6.0 root# patch -p0 < readline60-001
Tigershark:/lfs/src/readline-6.0 root# patch -p0 < readline60-002
Tigershark:/lfs/src/readline-6.0 root# patch -p0 < readline60-003
Tigershark:/lfs/src/readline-6.0 root# patch -p0 < readline60-004

Tigershark:/lfs/src/readline-6.0 root# ./configure --libdir=/usr/local/lib
Tigershark:/lfs/src/readline-6.0 root# make && make check
Tigershark:/lfs/src/readline-6.0 root# make install

The commands above also download and apply the additional patches that have been released since libreadline 6.0. Make sure you apply each of them, in order.

Also, note that the ./configure parameter has changed. In this case, we are installing a library, so the path to install to is different. We are installing it to '/usr/local/lib', which will make sure that we won't conflict with the original libreadline in '/usr/lib'.

Finally, make Ruby!


Tigershark:/ root# cd $LFS/src
Tigershark:/lfs/src root# curl ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p243.tar.gz > ruby-1.9.1-p243.tar.gz
Tigershark:/lfs/src root# tar -zxvf ruby-1.9.1-p243.tar.gz
Tigershark:/lfs/src root# cd ruby-1.9.1-p243
Tigershark:/lfs/src/ruby-1.9.1-p243 root# ./configure --prefix=$LFS LDFLAGS=-L/usr/local/lib
Tigershark:/lfs/src/ruby-1.9.1-p243 root# make && make install

Note the additional parameter 'LDFLAGS=-L/usr/local/lib' that we pass to './configure'. This is required so that when Ruby gets compiled, it'll first search '/usr/local/lib' for its library dependencies before looking at the system default paths. Because we have our new 'libreadline' installed there, this makes sure that Ruby is compiled against our newer library instead of the system one.

Congratulations! Once you've reached here, you have your new shiny Ruby 1.9 interpreter to play with! To check:


Tigershark:/ root# which ruby
/lfs/bin/ruby
Tigershark:/ root# ruby --version
ruby 1.9.1p243 (2009-07-16 revision 24175) [powerpc-darwin8.11.0]

Thursday, October 01, 2009

Why you should Blame your Tools, sometimes.

As software developers, we all wish for 3rd party libraries that work magically right after installation. While this is to be expected and holds true for popular and widely used libraries, people tend to forget that it is still more of a rule of thumb than a cast-iron fact.

As the urban myth goes, "a good workman never blames his tools". This mindset is so ingrained in software development culture that it is always bad form to assume that a bug may reside in a library rather than in your own code. After all, it is more likely that you are the culprit, given that a widely used code base is less likely to contain a mistake than code that has only been seen by you alone. But as with everything statistical, there is always a chance that an outlier event can happen.

And it did for me.

Internally, we rely on log4cxx as our logging library, and a peculiar bug seemed to be triggered only upon the termination of our application. After more than 2 days of debugging, the error consistently pointed to the line where the logger object was being accessed. And like all true developers, I bit my lip and kept soldiering on, reading through my own code, checking the same lines over and over again, trying to see if I had missed defining a variable, or if I had forgotten to allocate memory for it. Naturally, after a while of doing so, I began questioning my personal sanity rather than the lines of code I'd written, given the 'impossibility' that the error could actually be in the library.

And then I popped.

I turned over to my colleague and mentioned to him that the code kept failing on the logging line. The team had known that I've been stuck on this problem for the last day. Even as I started, I could see the mild derision on his face, "it can't be the library. Maybe gdb is not showing what actually is going on."

So on and on the circle I went. Then someone else suggested using valgrind.

Since I was running out of options, I didn't see any harm trying. At most I'll waste an hour more to the day I've already wasted, anyway.

Valgrind didn't really show anything different from what gdb was showing. It was basically saying the same thing: that log4cxx was accessing an invalid memory location. Given I wasn't an expert on valgrind, I got someone else to come over and we discussed what we were seeing in the stack trace.

No luck still.

I had gone past the point of frustration and into desperation, so I decided to just fucking comment out the offending statement and see if the problem went away.

After recompiling and re-running the application 30+ times without the bug resurfacing, I was confident enough to point at the logger library as the culprit. But of course even then I couldn't definitively prove it, and worse, I had no workaround; the entire code base relied on it for logging, and taking it away would mean a major re-engineering exercise.

Sometimes talking to others helps. Sometimes they offer a different perspective, and other times they may have caught something you hadn't.

Given that we had multiple threads still operating when a shutdown signal was received, I thought it might be worth trying to stop those threads safely before the termination signal was processed, since it was likely that some portions of memory were being freed while the threads were still using them. But I was just shooting wildly. It was a very long shot: given how the threading interactions were designed, it was difficult to believe that this could have been the problem. But trying something is better than doing nothing.

So I sat down with another colleague of mine and talked him through what I wanted to do. Invariably the discussion led to another, about the valgrind backtrace that I had generated, which showed memory destruction caused by exit(). After a bit of theorising back and forth about the behaviour, and about how I'd write a quick hack to test my hypothesis, my colleague decided to do a ubiquity search on 'log4cxx exit crash^H^H^H^H'. Our eyes popped when we saw the preview result, just before it changed into something else as he kept typing.

"Go back again", I said excitedly. He was already as excited as I was, so there wasn't really the need for any prompting.

It turns out that there is a known problem in log4cxx found months ago, and someone had even posted a patch on it already. But here's the thing about software: unless you are willing to roll your own fixes, be prepared to wait for a long time before someone eventually fixes it. And even if someone did fix it, the upstream maintainer will still take a while before committing it into the main trunk. Or even worse, he may just choose to ignore it. And yes it does happen - ask Ulrich Drepper. :P

As for how to fix it and package it up nicely for use, I'll write about that in a later post. For now I want to share an invaluable lesson in software development, which applies equally in life: while established conventions are more often right than wrong, it never hurts to question them just in case they aren't. It may not be okay to blame your tools all the time, but it is sometimes worth calling out a shoddy spade when you see one.

Tuesday, September 22, 2009

How to build a Debian Package for GDB

I've resisted titling this post as 'building an Ubuntu package' even though I'm building it for Ubuntu - technically it's more proper to call it a Debian package given its lineage. Nevertheless the mechanism behind building your own packages is pretty much the same for the two.

I'll use GDB as an example of how to build your own package, and for good reasons. Firstly, the stock version of GDB that ships with Ubuntu is terribly broken. Here's what I mean:


% gdb --args java
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(no debugging symbols found)
(gdb) r
Starting program: /usr/bin/java Test
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New process 16487]
Executing new program: /usr/lib/jvm/java-6-openjdk/jre/bin/java
warning: Cannot initialize thread debugging library: generic error
warning: Cannot initialize thread debugging library: generic error
warning: Cannot initialize thread debugging library: generic error
warning: Cannot initialize thread debugging library: generic error
[New process 16487]
[Thread debugging using libthread_db enabled]
Segmentation fault

The stock build of GDB doesn't handle multi-threaded applications properly, among other minor issues such as not pointing to the correct debug library paths, which makes it unusable for serious debugging tasks.

Secondly, GDB 7.0 has reversible debugging, which makes it doubly tempting to roll my own. Finally, GDB has minimal external library dependencies, which makes it an easy example for building a package without the complexity of having to set up a chrooted environment.

But why not just do the typical 'configure && make install' combination?

The drawback of doing so is that the process is one-way: once you've installed it like that, there is no easy way of uninstalling it, short of remembering the list of what was installed and removing the files manually. Doable? Sure, but certainly cumbersome. The neater way is to create a package and have the package manager deal with installation and uninstallation for you.

Building GDB

We need to perform the usual compiling and installing steps like we normally do; the only difference is that we want the installer to place all the resultant files into a separate directory for generating a package. Doing this is straightforward using the prefix flag provided by configure. The steps are commented and reproduced below:


# Assuming that you're in the source directory /home/user/gdb-sources
% mkdir -p custom-gdb-7.0-amd64/usr/
% ./configure --prefix=/home/user/gdb-sources/custom-gdb-7.0-amd64/usr
% make && make install

There are dependencies that GDB will need in order to compile properly (things like bison, lex as far as I remember), but I'll assume that you know how to resolve these dependencies yourself. Otherwise, the source should finish compiling and installing under /home/user/gdb-sources/custom-gdb-7.0-amd64/usr, so that the packaged files end up in /usr once the package is installed.

Generating the Control file

In order to generate a package, a Debian control file is required, which contains the information that the 'dpkg-deb' package generator will need. Here's how we write one:


% mkdir custom-gdb-7.0-amd64/DEBIAN
% cat > custom-gdb-7.0-amd64/DEBIAN/control
Package: customgdb
Version: 7.0
Section: base
Priority: optional
Architecture: amd64
Depends: lex, bison
Maintainer: Vincent Liu <blog@vinceliu.com>
Description: Custom build of GDB
 This version of GDB provides cutting edge
 capabilities that the stock package does not provide.
^D
%
# The control-D symbol above is to indicate the
# file termination character

There are plenty of details I've omitted here, and you will have to read more to understand and tune your own control file configuration. Here's the tutorial I referenced, and the Debian manual to help you figure out the details of each control field.

Generating and Installing the Package

Once you've got the control file generated, building the package is just a single dpkg-deb away:


% fakeroot dpkg-deb --build custom-gdb-7.0-amd64

You will get a resulting custom-gdb-7.0-amd64.deb package generated for installation. To install it, you'll have to remove the existing GDB package, as it conflicts with your new installation. Do the following:


# remove the original gdb
% dpkg -r gdb

# install the new gdb
% dpkg -i custom-gdb-7.0-amd64.deb

If you ever need to revert to the stock version of GDB, you can now easily remove your custom version with dpkg -r customgdb, and reinstall the stock one using apt-get or your favourite package manager.

Thursday, September 17, 2009

GDB 7.0 Is Out!

Read the announcement here. It's actually the pre-release version, nevertheless it's quite stable.

What makes this version interesting? It's got reversible debugging, which means that it's the first time you can actually make your code go back in time to find out what it did before it crashed.

I've tried the new commands, but I haven't been successful in stepping backwards yet. Well, more experimenting ahead!

Update: The tutorial is available at
http://www.sourceware.org/gdb/wiki/ProcessRecord/Tutorial. (Thanks to Michael!)

Thursday, August 13, 2009

Bug Hunting with Revision Control using Mercurial

My impression from reading other software developers' blogs is that, when presented with a choice of distributed revision control, there seems to be a majority preference for Git over Mercurial. This popularity is not surprising, since Git is the brainchild of the most prominent icon of open source software today, the famed Linux kernel hacker Linus Torvalds.

It feels like the Mercurial camp is a little under-represented, but today's post isn't a blow-by-blow comparison of the two revision control systems; you will be better informed by experts out there who are more qualified to critique the pros and cons of the two. More importantly, I feel these differences are mostly nitpicks for ordinary users like us: when the demands of our projects do not scale to a level where the performance differences are visible, it really doesn't matter.

People will have their biases, and the git-vs-mercurial battle will probably be not too different from the perennial ideological battles between the emacs-vs-vim crowds. And obviously, you should have detected my bias in this case already :)

Anyway, back to our original discussion.

I have only used git briefly, and I'll not comment much on it given my limited experience. I'd agree with most people that it is a remarkable revision control tool and leave it at that. It may strike you as curious why I am using Mercurial then. It's largely based on pragmatism, given that I've acquired much familiarity with it through constant use at work. Besides, if I haven't been complaining enough to be tempted to try out git again, it can't have been that bad, right?

I'll share with you the thing I love best about these revision control tools: their amazing ability to find out where your bugs come from. It's not uncommon for software to break between the revisions you've checked in, and sometimes a breakage goes unnoticed until a bunch of other code has been checked in on top of it, making your original error even harder to find.

Well, not anymore.

Mercurial has a command called 'bisect' which lets you move between revisions until you're able to zero in on the exact changeset where your code first breaks. Given that most check-ins tend to be small increments, knowing which changeset causes the break helps you narrow down the error to a very small subset where you can focus your efforts on finding the bug.

How does this work? Let me show you a real life session of how bisect works in finding an erroneous commit:


vincentliu@vm1:~/replicode$ hg bisect -g 1614
vincentliu@vm1:~/replicode$ hg bisect -b 1671
Testing changeset 1666:ee86f6717c42 (57 changesets remaining, ~5 tests)
36 files updated, 0 files merged, 8 files removed, 0 files unresolved
vincentliu@vm1:~/replicode$ hg bisect -b 1666
Testing changeset 1630:6e221cda7176 (28 changesets remaining, ~4 tests)
28 files updated, 0 files merged, 2 files removed, 0 files unresolved
vincentliu@vm1:~/replicode$ hg bisect -b 1630
Testing changeset 1623:f7b12d17a79b (14 changesets remaining, ~3 tests)
10 files updated, 0 files merged, 0 files removed, 0 files unresolved
vincentliu@vm1:~/replicode$ hg bisect -b 1623
Testing changeset 1618:ac9135ec8e99 (9 changesets remaining, ~3 tests)
17 files updated, 0 files merged, 0 files removed, 0 files unresolved
vincentliu@vm1:~/replicode$ hg bisect -g 1618
Testing changeset 1620:032f83fb6b8c (5 changesets remaining, ~2 tests)
26 files updated, 0 files merged, 1 files removed, 0 files unresolved
vincentliu@vm1:~/replicode$ hg bisect -g 1620
Testing changeset 1621:1d8191199d0d (3 changesets remaining, ~1 tests)
5 files updated, 0 files merged, 0 files removed, 0 files unresolved
vincentliu@vm1:~/replicode$ hg bisect -g 1621
Testing changeset 1622:65b4f19e8941 (2 changesets remaining, ~1 tests)
9 files updated, 0 files merged, 0 files removed, 0 files unresolved
vincentliu@vm1:~/replicode$ hg bisect -g 1622
The first bad revision is:
changeset:   1623:f7b12d17a79b
parent:      1618:ac9135ec8e99
parent:      1622:65b4f19e8941
user:        Anonymous Person <xxx@xxx.xxx>
date:        Fri Aug 07 14:13:56 2009 +0100
summary:     Automated merge with http://xxx

Some of the output has been slightly modified to protect the innocent[1].

The first two bisect commands, with the -g and -b arguments, mark the two changesets that you know for certain to be good and bad respectively. With the boundaries set, Mercurial goes on its way and starts checking out changesets that were committed in between.

For each checked out changeset, you have the opportunity to test it; once you are done, you indicate whether the changeset is good or bad by using the same commands with the checked out changeset number. Mercurial then checks out the next changeset for you to test.

Here's where there is some serious voodoo at work; notice that I had 57 changesets in between. It would be a nightmare if I had to test every single one of them. Instead, Mercurial smartly subdivides the range of changesets to isolate the problem (hence the term 'bisect') and gives you a rough estimate of how many more tests you have to do before it can definitively point out the problematic changeset. In my case, it took me 8 tries. Pretty impressive eh?
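
The subdividing is essentially a binary search. Below is a rough sketch of the idea, not Mercurial's actual implementation: it assumes the history is a straight line of revision numbers (real Mercurial has to cope with merges and branches) and that every revision after the first bad one is also bad. The class BisectSketch, the method firstBad and the isGood test are hypothetical names of mine.

```java
import java.util.function.IntPredicate;

public class BisectSketch {
    // Finds the first bad revision in the range (good, bad].
    // Assumes 'good' really is good, 'bad' really is bad, and that
    // the history is linear: once a revision is bad, all later ones are too.
    static int firstBad(int good, int bad, IntPredicate isGood) {
        int lo = good;   // highest revision known to be good
        int hi = bad;    // lowest revision known to be bad
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;   // the revision to test next
            if (isGood.test(mid)) {
                lo = mid;                   // like 'hg bisect -g'
            } else {
                hi = mid;                   // like 'hg bisect -b'
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        // Pretend the breakage was introduced in revision 1623.
        int culprit = firstBad(1614, 1671, rev -> rev < 1623);
        System.out.println("The first bad revision is: " + culprit);   // 1623
    }
}
```

Each round halves the remaining range, so the number of tests grows with the logarithm of the number of changesets instead of linearly, which is why 57 candidates need only a handful of checks.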

I certainly have my few criticisms of Mercurial, like my initial incredulity at how the developers came up with the command name hg when their software is called 'Mercurial'; but past those kinks and initial gag-reflexes, I have to admit that my experience of using it has largely been satisfactory. And if you're one of those who are still in two minds about using a distributed revision control system, I urge you to give it a try. You just might like it!

[1] It's not nice to put the names of the people I work with on my blog without their consent.

Wednesday, July 22, 2009

Java is not the JVM

For many IT people, it sounds funny to assert that the Java language has nothing to do with the JVM itself. But as incredible as it sounds, this is actually true. Let me explain, using some code as a shallow illustration of how this is the case.

When I was hacking at the Java bytecode level, one of the things I did was optimise for memory efficiency. There was a need to store an array of booleans, and the most obvious way of saving memory is to store them at the bit level, by stashing 8 boolean values within a byte.

Within the JVM, booleans are stored as bytes (executionally, they are worse: the VM treats booleans as ints!). Furthermore, in Java, there isn't a low-level means of utilising booleans as integral types like C can. If you had to write code in pure Java, at best you'll end up writing code like this:


// assume z == boolean[8]
byte b = 0;
for ( int i=0; i < 8; i++ ) {
  if  ( z[i] == true ) {
    b |= ( 1 << i );
  }
}

Compared with C, the code is clunky: you have to perform a conditional check on a boolean before you can do any bitwise operation with its value, because Java treats booleans as a non-integral type. How annoying!
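
To make the pure-Java route concrete, here is a minimal, self-contained sketch of the packing loop above together with its inverse. The class name BooleanPacker and the methods pack and unpack are names I've made up for illustration; the only assumption is that the array holds at most 8 values, with z[i] stored in bit i.

```java
public class BooleanPacker {
    // Packs up to 8 booleans into a single byte; z[i] ends up in bit i.
    static byte pack(boolean[] z) {
        if (z.length > 8) {
            throw new IllegalArgumentException("at most 8 booleans fit in a byte");
        }
        byte b = 0;
        for (int i = 0; i < z.length; i++) {
            if (z[i]) {            // the conditional that Java forces on us
                b |= (1 << i);     // compound assignment narrows the int back to byte
            }
        }
        return b;
    }

    // Expands a byte back into 8 booleans; bit i becomes element i.
    static boolean[] unpack(byte b) {
        boolean[] z = new boolean[8];
        for (int i = 0; i < 8; i++) {
            z[i] = ((b >> i) & 1) != 0;
        }
        return z;
    }

    public static void main(String[] args) {
        boolean[] z = {true, false, true, false, false, false, false, true};
        byte b = pack(z);
        // Mask with 0xFF so the byte prints as an unsigned bit pattern.
        System.out.println(Integer.toBinaryString(b & 0xFF));        // 10000101
        System.out.println(java.util.Arrays.equals(z, unpack(b)));   // true
    }
}
```

The round trip through unpack is only there to show that no information is lost when eight array elements are squeezed into one byte.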

But this constraint only affects the Java language; the same rules do not apply when it comes to the JVM. On the VM, it is perfectly legit to express code like this:


// assume z == boolean[8]
byte b = 0;
for ( int i=0; i < 8; i++ ) {
  b |= z[i] << i;
}

However, just about any Java compiler will refuse to compile this code, because the operations on the boolean violate type-safety. But don't blame the compilers; they are just conforming to the language specification. Since the JVM has nothing to do with the Java language, there is nothing illegal in doing so outside the Java language, let's say by using bytecode assembly. Here's an equivalent, using jasmin assembly code:

.source BooleanToByte.j
.class BooleanToByte
.super java/lang/Object

.method public static main([Ljava/lang/String;)V
.limit stack 4
.limit locals 3

iconst_0
istore_1      ; byte b = 0;

iconst_0
istore_2      ; int i = 0;

LOOP:

iload_2
bipush 8
if_icmpge LOOP_EXIT        ; if i>=z.length exit loop

; here's the magic code that allows you to do direct
; bitwise b |= z[i] << i
iload_1
aload_0
iload_2
baload
iload_2
ishl
ior
istore_1

iinc 2 1
goto LOOP

LOOP_EXIT:

return

.end method

The jasmin code will probably assemble, but don't expect the JVM to execute it; it serves only as an example, and lacks a few things (I'm missing the constructor block and other nitty-gritty little things that are needed to satisfy the bytecode verifier). It is but a case study showing that the JVM is separate from the Java language, contrary to what people typically assume.

A number of other languages have since mushroomed that rely on the JVM as their core; these include Groovy, Scala, Jython and JRuby. Many of them are rather interesting, although they are more of a curiosity at this stage: I've yet to see any of them deployed in a production environment, and I don't say that as a criticism of any of these languages. In fact, I am quite impressed with JRuby, and I recommend you give it a try. It's very faithful to the actual Ruby implementation and allows you to use Java directly. Good fun, I'd say, especially since it combines the expressiveness of the former with the features of the latter. It's quite impressive that the JVM has been versatile enough to let other languages plug into it directly.

GDB: Relaying Trapped OS Signals

By chance, I have managed to land myself in a situation where the bug occurs only in a signal handler. In my case, this means that it happens only when I try to kill the program using SIGINT, more commonly known as the 'ctrl-c' keystroke.

GDB usually traps this signal, and other signals such as SIGSEGV (Segmentation Fault), so that you can trace buggy behaviour that is causing your application to fail. But once in a while, the error may occur after the signal is sent, when the code failure resides within the signal handler.

However, the default behaviour of GDB is to trap these signals and subsequently consume them, in effect preventing the bug from occurring. To get around this in rare cases such as mine, you need to issue the following command:


(gdb) handle SIGINT pass

Given that GDB utilises this signal internally, it will ask you for a confirmation to change it. Say 'y' to it, and GDB will correspondingly pass the signal to the application after trapping, which will give you a chance to debug the handler code that is causing the bug.

Sunday, July 19, 2009

GDB's Conditional Breakpoints

Conditional Breakpoints for Scalar Types

Let's assume that you, the brilliant hacker, have coded up some really uber-cool stuff, like this piece of code below:


1: for ( int i = 0; i < gazillion; i++ ) {
2:   doSlightlyBuggyButUberCoolStuffs(i);
3: }
4:
5: void doSlightlyBuggyButUberCoolStuffs(int i) {
6:   // your code here that needs some
7:   // fixing before it becomes uber-cool
8: }

It is doing all the cool stuff as intended, but somehow something always goes wrong when the loop counter reaches 2147483648, which is kind of puzzling.

So what to do?

You may be tempted to breakpoint at line 5, at the start of the doSlightlyBuggyButUberCoolStuffs():


(gdb) br doSlightlyBuggyButUberCoolStuffs

And gdb dutifully does what it's told; every single time doSlightlyBuggyButUberCoolStuffs() gets executed, it stops and waits for you to act on it:


Breakpoint 1, doSlightlyBuggyButUberCoolStuffs (i=1) at test.cpp:6
6:        // start of your uber-cool code
(gdb) c

Breakpoint 1, doSlightlyBuggyButUberCoolStuffs (i=2) at test.cpp:6
6:        // start of your uber-cool code
(gdb) c

.....

Breakpoint 1, doSlightlyBuggyButUberCoolStuffs (i=100) at test.cpp:6
6:        // start of your uber-cool code
(gdb) c

After 100 iterations, you think you've had enough! So it's time to do it the smart way, by setting a conditional:


(gdb) br test.cpp:2
Breakpoint 1 at 0x1234: file test.cpp, line 2.
(gdb) cond 1 i==2147483648
(gdb) run

After the breakpoint is set, gdb only notifies you when the loop is at its 2147483648th iteration:


Breakpoint 1 at 0x5678: file test.cpp:2
2:   doSlightlyBuggyButUberCoolStuffs(i);
(gdb) s
6:         // start of your uber-cool code
(gdb) p i
$1 = 2147483648

Jackpot! You're now at the 2147483648th iteration! Very soon after, you find the offending piece of code: a numerical overflow of a signed integer. Another bug trampled, and peace returns to your realm once more.
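
If you're wondering how a signed counter misbehaves at that boundary, here is a small illustration. It is written in Java rather than C++ because Java defines 32-bit int overflow as silent wrap-around, whereas in C and C++ signed overflow is undefined behaviour, so treat it as an analogy rather than a reproduction of the bug. OverflowDemo, next and overflows are names invented for this sketch.

```java
public class OverflowDemo {
    // Increments a 32-bit signed counter the way i++ would.
    static int next(int i) {
        return i + 1;
    }

    // Reports whether a + b cannot be represented in a 32-bit signed int.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);   // throws ArithmeticException on overflow
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int i = Integer.MAX_VALUE;
        System.out.println(i);                 // 2147483647
        System.out.println(next(i));           // -2147483648
        System.out.println(overflows(i, 1));   // true
    }
}
```

A 32-bit signed int tops out at 2147483647, so one more increment wraps around to a large negative number instead of moving on to the next value.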

Conditional Breakpoints for char* Strings

But very soon after, you run into another irritating problem which is happening within another section of your uber-cool code. This time, the conditional depends on parsing a huge portion of text that comes from, um..., /dev/random :P


1: while ( true ) {
2:   char* c = getStringFromDevRandom();
3:   launchNuclearMissileIfCodeMatch(c);
4: }

Somehow, you are absolutely convinced that /dev/random will eventually provide the correct codes to launch the nuclear missile. But given that launchNuclearMissileIfCodeMatch() is really top-secret and highly obfuscated code residing in an external library called libtopsecret.so, it isn't such a good idea to debug into the call unless you want the NSA bursting through your front door...

But since you do know the launch code (it's one of those things you'd have to kill your friends over if you ever told them), you can put a conditional check on the string and break on it. That tells you whether the secret code is ever generated by /dev/random, and lets you find out if launchNuclearMissileIfCodeMatch() is really a hoax:


(gdb) br test.cpp:3
Breakpoint 1 at 0xdeadbabe: file test.cpp, line 3.
(gdb) set $secret_code = "MyUberSecretivePassword"
(gdb) cond 1 strcmp ( $secret_code, c ) == 0
(gdb) run

And then, you let your code run... (!)

Well, unfortunately, you get sick of sitting around and waiting for it to happen after a whole day. It seems like /dev/random doesn't really generate your uber-secret nuclear launch codes as frequently as you would like to think. In the meantime, the world thanks their lucky stars that you haven't caused a nuclear winter to materialise just yet... :)

Saturday, July 11, 2009

How to lose weight and be healthy

I have little patience with people who complain about their weight problems, and for a good reason - I have a really simple way for losing weight which does not involve going on a crazy exercise binge or starving yourself silly. Sounds miraculous? Well, read on.

The easy part that you already know, is to make some wise choices about your diet, exercise moderately, and be consistent at both. Yes, it is both sensible and achievable.

The second part is usually what squeezes the life out of one, when I say:

"Stop eating chocolates, cake and ice-cream!"

My advice makes me sound like I'm a Nazi Dictator violating their 8th Amendment rights. But no, and let me reassert this again, it is that simple.

I'm deadly serious.

Chocolates, cake and ice-cream belong to a broader category I classify as 'leisure food'. It's leisure because people eat these foods for their own enjoyment, not for sustenance. This applies to all other snacks (cookies, biscuits, sweets, chips and flavoured drinks) - if that is what you are usually consuming when you're not hungry, then you shouldn't be having it.

"But, but... that's impossible!"

I'll tell you a story about a friend of mine. Let's call him 'Bryan'. (Oh, that is your real name! Let's see how long before someone points you to read this entry :P) Bryan has a minor weight issue, and he once told me this amazing story about how his household fridge was like The Bag of Holding; each time he opened it, there would be a slice of chocolate cake or other sweet delights waiting for him. No surprise that he isn't able to keep his hands off the sugary stuff.

Now let me tell you the story of another friend of mine. Let's call him 'Vincent'. (Oh hey, that's me!) Vincent grew up in a poor household where there was no cake waiting for him in the fridge any time he opened it. In fact, there was nothing there for him whenever he opened it. Poor Vincent! You might even begin to think that his childhood was really deprived!

On the contrary, Vincent had never been hungry in his life; there was always food on the table, and the meals involved a simple serving of rice, vegetables, and either eggs, chicken, fish or other protein sources. Given the tight budget Vincent's mum had, there wasn't enough money to buy chocolates, snacks and fizzy drinks anyway. 20 years later, Vincent does not have much of a craving for these 'leisure foods'. To others, he just seems to have incredible self-control.

No, this isn't a tit-for-tat competition here; all I'm saying is that good, moderate choices make harder things easier in the long run.

So what do you do if you grew up in the life of Bryan?

Start small, and be consistent. Consistency is the key, my friend. In the long run, it will get easier for you. Just don't get into the delusion of wanting instant results.

There is this one other issue, like what if:

"I get hungry really easily!"

That is why you should stop eating chocolates, cake and ice-cream! And eat more meat!

Some of you might be puzzled. More meat? Let me explain why in my own personal, anecdotal example. If you ever read food labels, you'll find that a 40g bar of Cadbury's chocolate contains 205 Calories of energy. In comparison, 125g of chicken breast contains only 133 Calories of energy!

Here's what these numbers translate to in human terms. If I have more than a good portion of chicken for lunch, say 250g, I'll have consumed 266 Calories and it will keep me from being hungry till dinner. In comparison, if I had two chocolate bars for lunch, I would have consumed 410 Calories of energy AND I'd be hungry again 2 hours later!

I'm aware that there are some scientific studies out there that support my claim, but what's more important is that I know it works for me. This I want you to keep in mind. I certainly can tell that something within meat (or protein, I suspect) keeps one from feeling hungry sooner. Carbohydrates just don't do the same thing. It's no wonder some people swear by faddish regimes like the Atkins Diet.

But I do not recommend that anybody try funny diet plans, or simply go on diets at all. Firstly, if you are starving yourself, you are doing it wrong, because it'll never work - primal instinct will always override any artificial discipline that you try inflicting on yourself. Secondly, diets promote deprivation, and depriving your body of essential nutrients is certainly always harmful. All I'm saying is to cut down on unnecessary leisure-food carbohydrates, and have more meat if your body reacts to hunger the same way mine does. In addition, do stick to what the doctors recommend with regards to healthy servings of fruits and vegetables.

Eating less is good for you

Recently, it's been proven that caloric restriction will improve health and extend longevity. And this includes us primates, not just lab mice. If you're too lazy to read the links, here's a visual difference between two monkeys where one is on caloric restriction and the other isn't. Guess which one is which?

If you're cringing in horror about how miserable the rest of your life will be starving yourself, think again. You'll be surprised how it does not involve starving. There's no contradiction, and I'll use myself as an example. For the numbers to make sense, we need to know how many Calories I need - and there are plenty of calculators out there that will help to figure this out; Google it.

Now, for a 30 year-old, 1.75m, 65kg guy, my Caloric requirement is about 2500 Calories, give or take, depending on which calculator is used. A 25% Caloric reduction is what they've used in the scientific studies, so that brings my figure to 1875 Calories. Now, let's calculate what I eat on a given day, which includes breakfast, lunch and dinner with no snacks.

Breakfast
4 slices of bread, 68g - 200 Calories
Nutella, 18g spread! - 98 Calories
Tea - 2 Calories

Lunch
200g Chicken - 212 Calories
250g Rice - 867 Calories

Dinner
300g Salmon - 447 Calories
100g Broccoli - 35 Calories
133g Apple - 65 Calories

Total Calorie Count : 1926 Calories

It's slightly more than 1875, but it's not too far off. And I'm not rationing myself either - I've given generous estimates for the spreads (18g of spread to 68g of bread!), and dinner is 2 portions of salmon (supermarkets' estimates of portion sizes are ridiculous!). Give or take, that is what I normally have without feeling hungry at all; it's quite amazing to know that I'm still within the bounds of 'Caloric Reduction'!

If you've noticed, it is the carbohydrate foods that are giving an amazingly high Caloric count - look at rice at 867 Calories at 250g! And I never knew this until I looked this up myself!

More chicken from now on, please!

Supersized Nations

Being fat is a peculiar problem that exists only in industrialised nations. No surprises, since that is where all the money is. Indulgence is an economic problem if you actually think about it - it only afflicts people who can afford it. I'm just 'lucky' I grew up in circumstances where I did not have the luxury to indulge in leisure foods, although this doesn't make me the model case study for solving the fat nations problem.

It is really not a problem if you understand this is how capitalism works. Businesses are incentivised to create things you want, and for food companies, to create foods that are tasty, so that you'll buy them. Tasty foods mean that you'll want to have more. People often blame growing fat on an abundance of food and the general lack of exercise in the modern world - it is true, but it's no more correct to blame it on the companies who make tasty foods than to attribute blame to themselves. If food companies prey on the addiction of your taste buds, should we go after them like we do with hard drugs?

I like capitalism - it is your prerogative to see how best to spend your money, so it is within your rights to eat yourself silly, and then spend a fortune on treatment - that is a personal choice.

But it irks me to see how governments are socialising these costs at the expense of sensible people - it's just one of the things in our society I like to describe as 'mad'. If it's 'too big to fail', nobody should have allowed it to grow that big in the first place. So far, governments are just plain inept at dealing with both supersized companies and supersized people.

So, do not bring weight loss problems as a conversation to my dinner table. And if anybody does have the temerity to raise it, be prepared for a session of ridicule - and if he or she takes it in good humour, maybe I'll refer them to what I have to say on this subject matter.

Friday, July 03, 2009

Scottish Whisky

Feels like you've drunk a cigar.

Wednesday, June 24, 2009

I ♥ Amsterdam!

First Stop, Brussels. Ok, you must be scratching your head now, and asking "Brussels? Like did you read your map upside-down and get lost?" Well, the truth can't be any simpler: it was cheaper for me to fly into Charleroi and drive to Amsterdam! Seeing two cities for the price of one, nice!

That's the town centre of Brussels, gorgeous!

In one part of the city, there's a wall with a mural of Tintin. There's quite a bit of Tintin stuff here, no surprises, given this is the birthplace of its author, Hergé.

Belgian waffles! Quite delectable, I had the one covered with strawberries and laced with chocolate, mmmmh! And it was free! Not from the shop, but from a new friend of mine that I met while travelling out from Dublin airport, a fellow Singaporean traveller, which is as rare as hen's teeth! Surprisingly, he was from Singapore's Ministry of Foreign Affairs, and remarked that I'm quite possibly the only Singaporean living here! I feel so special already! :)

The architecture in the town centre is just incredible. Look at all the life-like statues that are part of the walls of the buildings. For lack of time, since I had a long drive ahead, that's all from Brussels. After that I had to take off and drive to my newest favourite city, Amsterdam!

My new friend had actually spent a few days in Amsterdam already, so essentially I got a free guide to show me around, which was really cool. It's really a quirk of fate that we crossed paths at all: most Singaporeans would gladly give Ireland a pass, but not him, being an avid reader and a fan of James Joyce.

Weed! You can now understand why I find this place so charming - not the soft drugs if that's what you're thinking about! It's the liberalism, dammit! People are free to do whatever they want here, but take the responsibility for their actions, of course. Still can't believe that I'm travelling with a ranked civil servant here - hopefully he picks up a few lessons and reports to his political masters that it is fine to have more liberalism! There aren't any major law and order problems here just because soft drugs are available - proves that we certainly don't need to hang people just because they have a bit of grass in their pockets. Well, even Barack Obama has inhaled it ... had he been living in Singapore rather than America, they would have snuffed the life out of him before he could ever live up to his potential to become the 44th President of the United States.

Sorry guys, if you are looking for NSFW pictures - for consolation, you can see that I'm in the 'red light district' of Amsterdam. There's an Erotic Museum where I was told you'll be able to learn all you need to know about the history of sex. Fortunately, or unfortunately, Amsterdam has cleaned up quite a fair bit in the last couple of years - I was told that half the city centre used to be areas where you could see skimpily dressed women behind glass boxes. Today, it's reduced to only a small section within one alley - and there isn't much to see either. Hell, there are probably more naked bodies lying on Bondi Beach (NSFW!) any given summer day than in Amsterdam!

The only nakedness I've witnessed was from a drunken Englishman who had decided that his penis was probably a better show than all the ladies there. The crowd were wild, clapping, cheering and cat-whistling while he's twirling his thing out of his pants and flashing it to the crowd. On a related note, the booth you see up there is a peeing booth - yes, you go in and pee on the side of the streets, pretty visible to everybody else. No prizes for guessing what that dark patch on the floor is. ;)

Amsterdam is full of canals and quaint little houses, some tilted with age, as you can see up there. Quite charming, although I wouldn't want to be living in there if there's a serious risk of it toppling over.

That's how beautiful Amsterdam is. Oh, yeah you can see that it's devoid of people in the picture, which is kind of cool - that's because I woke up at 5am on a Sunday morning to take a wander about. The only people left on the streets were the few revellers who were probably too drunk to know how to go home.

I'll leave you with a final picture of one of the canals of Amsterdam. It's an absolutely gorgeous place, and certainly one of the best cities I've been to in my life. If you ever have the opportunity to visit it, go - I assure you that you won't regret it!

Sunday, June 21, 2009

Configuring your Linux Firewall using iptables

When I first started out using Linux, I was quite daunted by 'iptables', the built-in firewall that is bundled with the Linux kernel. The general perception that configuring it is anything but easy also compounded my initial reluctance to learn it in detail - no surprises here, as the good tutorial I've referenced has 16 chapters and 10 appendices! It's little wonder some people are scared away by that.

But there is a good reason why a tutorial about iptables is that big - computer security is all about the details. Most of the time you need to know the details of the different aspects of network security and understand the whole picture before you can design a comprehensive firewall that provides all the features you want without letting malicious traffic through.

Still, if you're just setting up a simple home network + firewall, it shouldn't be that difficult. And it isn't really. I'll show you a few recipes you can use to set things up properly without too much RTFM.

For illustration, I'll use the following setup that I'm running at home as an example:

My server is an old Celeron PC which acts as the firewall. It has an ethernet card which connects to a wireless switch, where the Internet connection gets shared by all the laptops connected to my LAN. The server connects to the Internet via my Huawei E220 broadband modem. It's just convenient to have my configuration this way as well, since my old iBook G4 has no suitable drivers for the modem. The broadband device is recognized as ppp0, as shown in the diagram above. Let me now show you a few interesting things you can do with your 'iptables' firewall.

Recipe #1 Forward Internet Connections using IP Masquerading
You want to let your LAN make connections to the Internet. This is one of the cool features iptables provides that makes it more than just a firewall. Before you make changes to your firewall entries, you'll need to change your kernel's configuration to enable it to forward IP traffic. To do this dynamically, run the following command:


echo 1 > /proc/sys/net/ipv4/ip_forward

The change you've made above will be lost the next time you reboot your computer. To make it permanent, you have to edit /etc/sysctl.conf to include the following line:


net.ipv4.ip_forward=1
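
If you want to double-check that forwarding is really on (say, after a reboot), you can read the same /proc file back. Here's a minimal sketch of that; the helper name forwarding_state and its optional file argument are just my own illustrative choices, not part of any tool:

```shell
#!/bin/sh
# Report whether IPv4 forwarding is enabled by reading the kernel's flag file.
# forwarding_state is a hypothetical helper; the optional argument lets you
# point it at a different file, which is handy for trying it out safely.
forwarding_state() {
    flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
        return 1
    fi
    # The file holds a single digit: 1 (forwarding on) or 0 (forwarding off).
    if [ "$(cat "$flag_file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Print the state of this machine (prints "unknown" where /proc is absent).
forwarding_state || true
```

If it says "disabled" after a reboot, the line in /etc/sysctl.conf probably didn't take effect.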

Once that's set up, we can issue the commands to iptables to start forwarding traffic from the LAN to the Internet:


iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
iptables -A FORWARD -i eth0 -o ppp0 -j ACCEPT
iptables -A FORWARD -i ppp0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT

The formal name for this kind of forwarding is 'Network Address Translation', or NAT for short. This explains why the first iptables command has nat in it. It basically instructs the firewall to remember the connections that get forwarded out to the Internet. It needs to do this to multiplex different connections from the LAN into a single connection out to the Internet, and then smartly demultiplex the received data back to the requesters. The next two iptables directives tell the firewall to allow forwarding of packets from the LAN to the Internet, and to allow data packets from the Internet back into the LAN only if they belong to connections that were previously requested from within it. This effectively denies any unsolicited traffic from coming into the LAN unless computers within it explicitly ask for it.
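
Your interface names will probably differ from my eth0 and ppp0, so here's a small sketch that prints the same three rules for any pair of interfaces. It only prints the commands and doesn't run them; nat_rules is a made-up helper name, not something that ships with iptables:

```shell
#!/bin/sh
# Print (not execute) the three masquerading rules from this recipe for a
# given pair of interfaces. nat_rules is a hypothetical helper name.
nat_rules() {
    lan_if="$1"   # trusted side, e.g. eth0
    wan_if="$2"   # Internet side, e.g. ppp0
    echo "iptables -t nat -A POSTROUTING -o $wan_if -j MASQUERADE"
    echo "iptables -A FORWARD -i $lan_if -o $wan_if -j ACCEPT"
    echo "iptables -A FORWARD -i $wan_if -o $lan_if -m state --state RELATED,ESTABLISHED -j ACCEPT"
}

# Show the rules for the setup described in this post.
nat_rules eth0 ppp0
```

Once you're happy with what it prints, you can run the printed lines as root (for example by piping the output to sh).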

Recipe #2 Differentiating Traffic between LAN and the Internet
Often, you'll want to assign different rights to traffic from your LAN vs. the Internet. Traffic from your LAN is usually trusted, and hence is within the safe boundary, while Internet traffic is regarded as hostile, hence classified as unsafe. As in the diagram in my example above, traffic arriving from the Internet via device ppp0 is the unsafe network, and I want it to have different rules from my safe LAN traffic arriving on eth0.

Firstly we want to create the two different chains to represent traffic from eth0 and ppp0:


iptables -N ETH0_FILTER
iptables -N PPP0_FILTER

Once the chains are created, we have to tell the main INPUT traffic chain to segregate the traffic between the two networks:


iptables -A INPUT -i eth0 -j ETH0_FILTER
iptables -A INPUT -i ppp0 -j PPP0_FILTER

Once the different chains are linked to the main INPUT chain, we can now provide rules to treat the different networks separately. For example, if we want to let our LAN access everything, and only allow SSH traffic from the Internet, we can put in rules like these:


iptables -A ETH0_FILTER -j ACCEPT
iptables -A PPP0_FILTER -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A PPP0_FILTER -j DROP

This will drop all traffic on ppp0 except SSH. For other interesting ways of filtering your traffic between the different chains, keep reading the remaining tips.

Recipe #3 Logging Suspicious Traffic
How would you know if you are under attack by malicious Internet traffic? Simple, by logging these intrusions. Here's one way of logging these intrusions:


iptables -A PPP0_FILTER -p tcp -m tcp --dport 22 -m state --state NEW -m recent --set --name DEFAULT --rsource
iptables -A PPP0_FILTER -p tcp -m tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 2 --name DEFAULT --rsource -j LOG --log-prefix "DROPPED:"

The first line records the source address of every new connection to my SSH port (22) in the 'recent' list; without it the list stays empty and the second rule never matches. The second line says that if a source address reaches 2 new connections within the last 60 seconds, then LOG the message with the prefix "DROPPED:". Obviously, this only logs the connection; what I've omitted is dropping it (see Recipe #4 below).
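
Once the LOG rule has been running for a while, you'll want to know who the repeat offenders are. Here's a rough sketch that counts the logged packets per source address. It assumes the kernel log lands in /var/log/kern.log (true on my Ubuntu box, but check yours) and relies on the SRC= field that the LOG target writes; dropped_by_source is just a name I made up:

```shell
#!/bin/sh
# Count packets logged with the "DROPPED:" prefix, grouped by source address.
# dropped_by_source is a hypothetical helper; the default log path is an
# assumption and may differ on your distro.
dropped_by_source() {
    log_file="${1:-/var/log/kern.log}"
    grep 'DROPPED:' "$log_file" \
        | sed -n 's/.* SRC=\([^ ]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Only look at the real log if it exists and we are allowed to read it.
if [ -r /var/log/kern.log ]; then
    dropped_by_source /var/log/kern.log
fi
```

The output is one line per address, busiest first, with the count in the first column.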

Recipe #4 Rate Limit Spam Traffic
Bots and spammers usually rely on software that repeatedly scans and accesses your server to try to bruteforce its way in. On machines with a noisy harddisk like mine, the repeated clicking sound is a dead giveaway (not to mention the annoyance!). So to stop them from repeatedly doing so, we enact a rule that drops packets if too many new incoming connections are attempted within a short period of time:


iptables -A PPP0_FILTER -p tcp -m tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 2 --name DEFAULT --rsource -j DROP
iptables -A PPP0_FILTER -p tcp -m tcp --dport 22 -j ACCEPT

The first line checks each new incoming connection against the list of recently seen source addresses - if the hit count of 2 is reached within 60 seconds, then all the remaining connections from that address will be dropped until the 60 second period times out. Given that the default policy of my firewall is to drop connections, the second line is included to explicitly ACCEPT the connection if the first rule does not match (ie, the limit has not been reached within the last 60 seconds).

Recipe #5 Fight Back Spammers By Tarpitting
A tarpit connection is one that delays incoming network connections for as long as possible. This technique causes spam connections to slow down, limiting the number of computers that they can spam. However the iptables version of tarpit is a slightly more advanced variant: it accepts the TCP connection but immediately shrinks the TCP window to zero, so the sender is left holding open a connection over which it cannot send any data until it eventually times out, which ties up the spammer's resources at almost no cost to yours. If you like to fight back against spammers, then this tip is for you.

Enabling tarpitting requires you to patch and recompile your kernel, which is an entire post in itself, so read my more detailed post on how to enable tarpitting.

Recipe #6 Making Your Firewall Changes Permanent
After making all those nifty changes, it would be a shame if they got lost the next time your computer rebooted. So here's how you can make these firewall settings permanent. Once you are satisfied with all the changes you are making to your firewall, save it by invoking iptables-save:


iptables-save > /etc/iptables.rules

The above command writes the whole configuration into the /etc/iptables.rules file. Once you have that, you'll want to restore the configuration every single time your computer starts up. There are quite a few places where you can restore the firewall; I do it in my /etc/rc.local file, after my ppp connection is started, where I insert the following line:


iptables-restore < /etc/iptables.rules

And you're all done. Now you can sit back, relax and enjoy the security features of your firewall!
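
One thing worth guarding against is restoring from a rules file that is missing or empty (say, if the save got interrupted). Below is a minimal sketch of a more defensive restore. The helper restore_rules and the RESTORE_CMD variable are my own inventions for illustration; RESTORE_CMD only exists so you can try the logic without being root:

```shell
#!/bin/sh
# Restore saved firewall rules only if the rules file exists and is non-empty.
# restore_rules is a hypothetical helper. RESTORE_CMD defaults to
# iptables-restore and can be overridden to try the logic without root.
restore_rules() {
    rules_file="${1:-/etc/iptables.rules}"
    if [ ! -s "$rules_file" ]; then
        echo "skipping restore: $rules_file is missing or empty" >&2
        return 1
    fi
    ${RESTORE_CMD:-iptables-restore} < "$rules_file"
}

# Example (as root, e.g. from /etc/rc.local):
#   restore_rules /etc/iptables.rules
```

Setting RESTORE_CMD=cat simply prints the rules file instead of loading it, which is a harmless way to see what would be restored.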

Wednesday, June 17, 2009

Getting System Information from Linux

Here are some commands that I commonly use to find information about my system. The amount of information you can get on your computer can be vast and varied - it depends on how much detail you want on each of the subsystems of your computer. I'll try to group them in the order that is most sensible, and also, note that these commands may be Ubuntu/Debian specific.

Listing devices on your mainboard:

 
biosdecode                 # information about your BIOS
lshw                       # gets quite a bit of information on all your hardware
lspci                      # get devices on your PCI bus
lsusb                      # list devices on your USB
dmidecode                  # get device information via BIOS
fdisk -l                   # get partition info on your harddisk

Getting information on your OS:


cat /proc/cpuinfo          # get information about your processor 
cat /proc/meminfo          # shows memory usage
free                       # show available free memory
top                        # detailed memory usage by process
htop                       # a better version of top
lsof                       # shows which files are opened by which processes
lsmod                      # shows loaded kernel modules
dmesg                      # output kernel messages, including bootup information
lsb_release -a             # see which distro of OS you're using
ps aux                     # list all running processes
df --si                    # show amount of free disk space
hdparm -t harddisk_device  # show performance of harddisk
ifconfig                   # show network configuration
route                      # show network routing configuration
iwconfig                   # show wireless network information
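
Not every one of these commands is installed on every box, so if you want to collect a few of them into one report, it helps to skip the missing ones instead of erroring out. Here's a small sketch; run_if_present is a name I've made up, and the three commands at the bottom are just a sample from the lists above:

```shell
#!/bin/sh
# Run a command if it is installed, otherwise note that it was skipped.
# run_if_present is a hypothetical helper name.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        # Keep going even if the command itself fails (e.g. it needs root).
        "$@" || echo "(command exited with an error)"
    else
        echo "== $1: not installed, skipped =="
    fi
}

# A small sample report using commands from the lists above.
run_if_present lsb_release -a
run_if_present free
run_if_present df --si
```

Some of the hardware commands (dmidecode, fdisk -l) want root, so run the script with sudo if you add those.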

Sunday, June 14, 2009

Ubuntu on iBook G4

People must think I am going gaga; I have installed Ubuntu on every different CPU architecture I have laid my hands on, and now on my iBook G4!

Mac Zealots won't be pleased. But, don't you worry - the Mac OSX image is still living somewhere in the system. Unfortunately Ubuntu isn't as efficient in power utilisation as the Mac OSX is on the iBook G4: the machine gets hotter much quicker and you can hear the fan whirring at a much more regular interval.

So I've got Ubuntu/Xubuntu living in various incarnations now; on an UltraSparc, PowerPC, x86 and AMD64 (ok, I've double counted if you consider 64-bit as a variant of the x86 architecture ;)

Before I get labelled an Ubuntu zealot, I need to clear the air a little. I've installed Linux because it has plenty of development tools that a software developer needs; and Ubuntu because it's an easy distro for installation. Still I'm no less impressed by the vast amount of hardware Linux supports.

I certainly think Linux takes the crown for being a ubiquitous OS, in spite of being driven by a purely free software development movement - remember that nobody gets paid to do this, and yet people are generous enough to donate code and effort to make this all happen. The irony in this is that it is exactly Linux's free nature that makes supporting so many different kinds of hardware possible in the first place.

Related Posts: It's Alive! (Linux on UltraSparc)

Thursday, June 11, 2009

Setting up a tarpit on Ubuntu Linux

It's amazing to see how big botnets have grown these days, and they really have plenty of computing power to spare. So what do botnet owners do with this unused computing power after looting all the valuable information from the poor victim? They waste it on scanning for any potential opening, no matter how minute the chance of finding one is.

In the days when computer resources were scarce, bots didn't bother port scanning addresses when ping requests didn't get a response. But not anymore. They know that there are people out there who are slightly more tech-savvy and do not want to be annoyed - so today's bots have no qualms about trying to scan every single port on a network address, even if ping does not respond.

Well, my computer security philosophy is simple: scanning the ports on my computer constitutes aggression - if you engage in such activity, then I am free to retaliate in response to it.

Even so, I do not mean launching an attack on the infected computer; but I'll make your bots waste their resources by making connections that lead to a dead end. On the flip side, this scheme will not waste my own resources in the process. Typically, an activity like this is termed 'tarpitting'. So let's see how we can set up a tarpit to fight these bots.

Patching the Kernel
In order to perform tarpitting, we need to rely on Linux's firewall, iptables, and the 'tarpit' module. But since the 'tarpit' module for iptables isn't supported by default on Debian/Ubuntu anymore, the only way to enable it is to patch the kernel and recompile it. This may sound daunting to a novice user, but there really isn't a need to worry; all you need is some basic knowledge and patience to set things up.

Firstly, a patch to the kernel is necessary. It's currently unofficially maintained at http://enterprise.bih.harvard.edu/pub/tarpit-updates/, and marked as being 'unsupported' or 'obsolete' by the netfilter team themselves, which essentially means use at your own risk! I'm usually a risk-taker (only when it comes to computer software ;) so it's not a big issue. You should work out if this is right for you.

You'll first need to download the kernel sources, and set up the corresponding environment for recompiling your kernel. A detailed step-by-step procedure is provided in the Ubuntu Wiki. I'm just going to skim through the details from the wiki, and show you the commands that are relevant for Ubuntu Intrepid:

% apt-get install linux-kernel-devel fakeroot build-essential makedumpfile
% apt-get build-dep linux
% apt-get source linux-source

Now you need to find out what version of the kernel you're running before you can download and apply the corresponding patch. The version is shown as the directory name of the source you've downloaded, eg:

% ls -l /usr/src/
linux-source-2.6.27

What we are interested in is the version number at the end of the directory name. In my case, it's 2.6.27. We need to do a few things here: firstly we want to inherit all the old configuration that came with your currently working kernel, so that the newly compiled kernel will be the same as the original. Then we can download the patch and apply it to the Linux source, so that the only change is the addition of the tarpit feature:

% cd /usr/src/linux-source-2.6.27
% make oldconfig
% wget http://enterprise.bih.harvard.edu/pub/tarpit-updates/tarpit-2.6.27.patch
% patch -p1 < tarpit-2.6.27.patch
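
If your kernel version differs from mine, the patch file name changes with it. As a rough sketch, the version and the matching patch name can be derived from the source directory name like this. Both helper names are made up for illustration, and I'm assuming the patch files keep following the tarpit-VERSION.patch naming used above, so check that a patch for your version actually exists:

```shell
#!/bin/sh
# Derive the kernel version and the matching tarpit patch file name from the
# kernel source directory name. Both helper names are hypothetical.
kernel_version_from_dir() {
    dir_name=$(basename "$1")
    # Strip the fixed "linux-source-" prefix, leaving e.g. "2.6.27".
    echo "${dir_name#linux-source-}"
}

tarpit_patch_name() {
    # Assumes the patches are named tarpit-VERSION.patch, as in this post.
    echo "tarpit-$(kernel_version_from_dir "$1").patch"
}

# For the directory used in this post this prints: tarpit-2.6.27.patch
tarpit_patch_name /usr/src/linux-source-2.6.27
```

Before applying a patch for real, GNU patch's --dry-run option (patch -p1 --dry-run < the-patch-file) lets you check that it applies cleanly without touching the source tree.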

The patch should apply cleanly, which means you now have the tarpit feature in the kernel source. But that's not enough; you need to make sure tarpit actually gets compiled, generally as a module. To do this run:

% make menuconfig

And select 'M' for the following menu option:

Networking Support -> Network packet filtering framework (Netfilter) -> Core Netfilter Configuration -> "TARPIT" target support

Compile Time!

This is when you need to sit back, go make yourself a cup of coffee, and be patient. On my 500MHz Celeron box, it took about 6-8 hours of compilation time on a Saturday morning. Essentially, I just left it compiling while I went out to enjoy a bit of the sunshine - you should too, especially if you are compiling on a slow computer like mine.

There really isn't anything too exciting watching a computer churn out code, kind of like watching grass grow. :)

Issue the following commands to start the compiling process, and then wait:

make-kpkg clean # only needed if you want to do a "clean" build
fakeroot make-kpkg --initrd --append-to-version=-tarpit kernel-image kernel-headers

If Ubuntu complains about not finding make-kpkg, then you may have to install 'kernel-package' (apt-get install kernel-package). This will start off the compilation. Once it has completed, there should be 2 Debian packages resulting from the compilation. All that's left to do is to install them:

% ls *.deb
linux-headers-2.6.27.18-tarpit_2.6.27.18-tarpit-10.00.Custom_i386.deb
linux-image-2.6.27.18-tarpit_2.6.27.18-tarpit-10.00.Custom_i386.deb

% dpkg -i linux-image-2.6.27.18-tarpit_2.6.27.18-tarpit-10.00.Custom_i386.deb
% dpkg -i linux-headers-2.6.27.18-tarpit_2.6.27.18-tarpit-10.00.Custom_i386.deb

The installer will make modifications to the boot loader (usually GRUB these days), and add two new entries to your boot menu. If you haven't made any customised changes to it, the installation process usually will not require any intervention and should complete automatically.

Reboot your computer and you're set for setting a tarpit up!
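
After the reboot, it's worth confirming that you actually booted into the new kernel and not the old one. Since the kernel was built with --append-to-version=-tarpit, the release string reported by uname -r should contain "-tarpit". A minimal sketch of that check (is_tarpit_kernel is a helper name I made up):

```shell
#!/bin/sh
# Check whether a kernel release string carries the "-tarpit" suffix that was
# passed to make-kpkg via --append-to-version. is_tarpit_kernel is a
# hypothetical helper name.
is_tarpit_kernel() {
    case "$1" in
        *-tarpit*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_tarpit_kernel "$(uname -r)"; then
    echo "running the tarpit kernel: $(uname -r)"
else
    echo "not the tarpit kernel: $(uname -r) (check your boot menu entry)"
fi
```

If it reports the old kernel, pick the new entry from the GRUB menu on the next boot.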

Configuring 'iptables' for Tarpitting

To utilise tarpit, you need to configure the rules on your firewall (iptables) to tarpit on incoming connections. There are plenty of excellent tutorials out there explaining how to use iptables to achieve what you want to do with your firewall, and it's beyond the scope of my entry to cover it all here. I'll just give a few simple examples on how you can use it to waste the resources of bots and spammers.

To tarpit SMTP connections (assuming that you are not running an SMTP server):

iptables -A INPUT -p tcp -m tcp --dport 25 -j TARPIT

To tarpit incoming botnet bruteforce attacks on SSH:

iptables -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m recent --set --rsource
iptables -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 2 --rsource -j TARPIT

The example limits SSH attempts to 2 connections in 60 seconds. And if any connection tries to connect at a rate higher than that, then the connection is sent to the tarpit immediately. My actual configuration is even more stringent than that; given that my SSH connections are verified by keys and not by password, there is never a chance that I could have sent a wrong password and hence tarpitting myself. For an average user who accidentally connects to my server, it isn't really too much of a problem - the connection will eventually time out.
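Since the two SSH rules differ only in the port, the time window and the hit count, they can be generated rather than typed. Here's a minimal sketch, assuming a hypothetical helper named tarpit_rules (not part of iptables) that only prints the rules so you can review them before running anything as root:

```shell
#!/bin/sh
# tarpit_rules is a hypothetical helper: it prints (does not apply) the two
# rate-limiting rules for a given port, time window and hit count.
tarpit_rules() {
    port=$1 seconds=$2 hitcount=$3
    match="-A INPUT -p tcp -m tcp --dport $port -m state --state NEW -m recent"
    # Rule 1: remember the source address of every new connection.
    echo "iptables $match --set --rsource"
    # Rule 2: tarpit a source that reaches the hit count within the window.
    echo "iptables $match --update --seconds $seconds --hitcount $hitcount --rsource -j TARPIT"
}

# Reproduces the two SSH rules shown above.
tarpit_rules 22 60 2
```

Printing instead of applying is a deliberate choice here: a wrong firewall rule on a remote box can lock you out, so it's worth eyeballing the output first.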

But let's see what happens when a spambot tries to connect repeatedly. I'll simulate this by using nc to act as a spammer. Let's see what happens when I set the rule to just DROP:

# iptables -I INPUT 1 -p tcp -m tcp --dport 25 -j DROP
# nc localhost 25
^C
# nc localhost 25
^C
# nc localhost 25
^C
# nc localhost 25
^C
# nc localhost 25
^C
# netstat -apn
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      4227/sshd

DROP just does what it's told: it drops the packet, and that's the end of the story. The spambot will just shrug its shoulders and move on to find another spamming target. But see what happens when we turn tarpitting on:

# iptables -D INPUT 1
# iptables -I INPUT 1 -p tcp -m tcp --dport 25 -j TARPIT
# nc localhost 25
^C
# nc localhost 25
^C
# nc localhost 25
^C
# nc localhost 25
^C
# nc localhost 25
^C
# netstat -apn
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      4227/sshd   
tcp        0      1 127.0.0.1:36183         127.0.0.1:25            FIN_WAIT1   -           
tcp        0      1 127.0.0.1:36185         127.0.0.1:25            FIN_WAIT1   -           
tcp        0      1 127.0.0.1:36184         127.0.0.1:25            FIN_WAIT1   -           
tcp        0      1 127.0.0.1:36181         127.0.0.1:25            FIN_WAIT1   -           
tcp        0      1 127.0.0.1:36182         127.0.0.1:25            FIN_WAIT1   -

As you can see, the connections are stuck in the FIN_WAIT1 state, waiting for socket timeouts to occur. So tarpitting works like a reverse syn-flood attack, but in this case the 'damage' is self-inflicted - the more aggressive a spambot is in trying to make a connection to us, the more it exhausts its own resources. This uses up the computing resources of the spamming computer and engages it in unproductive activity, thus preventing it from spamming more targets.
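If you want a number rather than a screenful of netstat output, the stuck connections can be counted. A minimal sketch, assuming a hypothetical helper named count_state that reads netstat-style output on stdin and relies on the State column being the sixth field, as in the listing above:

```shell
#!/bin/sh
# count_state is a hypothetical helper: it reads `netstat -tn` style output on
# stdin and counts the TCP lines whose State column (field 6) equals $1.
count_state() {
    awk -v state="$1" '$1 ~ /^tcp/ && $6 == state { n++ } END { print n + 0 }'
}

# Typical use on a live system:
#   netstat -tn | count_state FIN_WAIT1
```

Run against the second listing above, this would report the five tarpitted connections.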

What if 90% of the World Tarpits?

Unfortunately, most spambot writers have wised up to these techniques and have adapted their systems to use relatively short socket timeouts, thereby minimising the impact of such a defensive system. However, if most of the computer systems in the world used such a system, it would make it prohibitively expensive for spammers to engage in such activities.

But the reality is that the majority of computer users do not understand this philosophy well enough for it to work out. In fact, tarpitting would be a good way of deterring most spam without adding more costs to paying customers like us. Imagine if 90% of all computers were adversarial like this; then spam bots would be wasting their resources 90% of the time. That should make the economics of spam a bad proposition for spammers, rather than the reverse situation we have today - the majority of spam is handled by ISPs' filtering, 90% of the Internet's email traffic is wasted on spam, email users are annoyed, and consumers are charged money to take away the problem.

If you haven't noticed it yet, in essence, we are indirectly paying for the costs these spammers incur. And that pisses me off.

As a parting note, I hate all spammers with a passion, so let this be a warning to all link-spammers on my blog - just as I dislike spammers enough to tarpit their connections, I do not take kindly to your link spam on my blog. Don't even bother to try: comments are screened, and if yours are just superficial, irrelevant stuff, you can bet your ass that they're never going to see the light of day! And don't ever let me get my hands on your IP address ... :P

Thursday, June 04, 2009

Examining binary files in Linux

A few tips, gathered in one place, for finding out information about an executable binary in Linux.

To check that the file is a binary executable (or some other file type):


file file.bin

To see the legible strings within the binary file:


strings file.bin

To do a hexdump of the file:


od -tx1 file.bin

To disassemble a raw (headerless) binary as 16-bit x86 code:


objdump -D -b binary -m i8086 file.bin

To disassemble a binary object file:


objdump -DaflSx file.bin

To list the symbols in an object file:


nm file.bin

To see which shared libraries it is linked against:


ldd file.bin

To see a trace of the system calls it makes (including the files it opens) and the library calls it makes dynamically:


strace ./file.bin
ltrace ./file.bin

To debug through its execution:


gdb file.bin

To unmangle function names if code is compiled with C++:


echo "<mangled_symbol_name>" | c++filt
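The file and od steps above can be combined into a quick scripted check. A minimal sketch, assuming a hypothetical helper named is_elf that only looks at the first four bytes of the file (the ELF magic number) and nothing else:

```shell
#!/bin/sh
# is_elf is a hypothetical helper: it succeeds if the file starts with the
# 4-byte ELF magic number (0x7f 'E' 'L' 'F').
is_elf() {
    # Dump the first 4 bytes as hex, then strip the spaces and newline.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Usage: is_elf /bin/ls && echo "ELF file"
```

This is far cruder than file, which knows many formats, but it is handy inside a script that only needs a yes/no answer.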

Monday, June 01, 2009

How to 'make' a Euro / Sterling Key In Linux

I never had to deal with the problem of handling foreign currency symbols, given that the countries I've lived in in the past all use the same term, differing only in the country name prefixed to the word 'dollar'.

But living in the Eurozone and being so near to the UK, I find that expressing money in dollars is as quaint an idea to people here as 'a quid' is to me. The difference is most visible on computer keyboards.

Keyboards for Europe, with the exception of the UK, have their currency key mapped to the '€/£' symbol by default - there are other key layout quirks which make these keyboards infuriating to use, but I'll leave them for another day.

Even though I still reflexively swap Euros for Dollars in my daily conversations, at least my 'foreign' accent helps people to contextually frame what I mean, but typing '$' signs when you mean '€' certainly confuses people. My workaround in the past was to type 'Euros' every time I meant the currency, which was becoming really tiresome.

So, the impetus aside, here's a quick tutorial to show you how to generate a Euro sign.

First, we need to find out the keycodes of the keys that we want to remap. We do this by invoking 'xev', which traps all keystrokes and mouse movements. The keys we want to trap are the currency symbol, which is usually the same key as the numerical key '4' on the alphabetical side of the keyboard, and the right 'alt' key, which I will use as the special shift key to get € and £ without losing the $ symbol. A capture of xev looks like this:

% xev
KeyPress event, serial 31, synthetic NO, window 0x2800001,
  root 0x6b, subw 0x0, time 2804155, (256,85), root:(807,409),
  state 0x0, keycode 13 (keysym 0x34, 4), same_screen YES,
  XLookupString gives 1 bytes: (34) "4"
  XmbLookupString gives 1 bytes: (34) "4"
  XFilterEvent returns: False

KeyRelease event, serial 34, synthetic NO, window 0x2800001,
  root 0x6b, subw 0x0, time 2804251, (256,85), root:(807,409),
  state 0x0, keycode 13 (keysym 0x34, 4), same_screen YES,
  XLookupString gives 1 bytes: (34) "4"
  XFilterEvent returns: False

KeyPress event, serial 34, synthetic NO, window 0x2800001,
  root 0x6b, subw 0x0, time 2807796, (256,85), root:(807,409),
  state 0x0, keycode 108 (keysym 0xff7e, Alt_R), same_screen YES,
  XLookupString gives 0 bytes:
  XmbLookupString gives 0 bytes:
  XFilterEvent returns: False

KeyRelease event, serial 34, synthetic NO, window 0x2800001,
  root 0x6b, subw 0x0, time 2807933, (256,85), root:(807,409),
  state 0x2000, keycode 108 (keysym 0xff7e, Alt_R), same_screen YES,
  XLookupString gives 0 bytes:
  XFilterEvent returns: False

ClientMessage event, serial 34, synthetic YES, window 0x2800001,
  message_type 0x11a (WM_PROTOCOLS), format 32, message 0x118 (WM_DELETE_WINDOW)

A number of other events have been trimmed so that only the relevant portions are shown. The first KeyPress/KeyRelease pair shows that the keycode for '4' is 13, and the second pair shows that my right 'alt' key has the keycode 108.
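Picking the keycodes out of the xev noise by eye gets tedious, so here's a minimal sketch of a filter that does it. The name xev_keycodes is hypothetical, and it assumes the "keycode N (keysym 0x..., Name)" line format shown in the capture above:

```shell
#!/bin/sh
# xev_keycodes is a hypothetical helper: it reads xev output on stdin and
# prints "keycode keysym-name" once for each distinct key seen.
xev_keycodes() {
    sed -n 's/.*keycode \([0-9][0-9]*\) (keysym 0x[0-9a-fA-F]*, \([^)]*\)).*/\1 \2/p' | sort -u
}

# Typical use: run `xev | xev_keycodes`, press the keys you care about,
# then close the xev window to see the summary.
```

Because of the sort -u, nothing is printed until xev exits; drop it if you'd rather see the keys as you press them, at the cost of repeated lines.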

Armed with these numbers, let's create a .xmodmaprc file in your home directory:

keycode 108 = Mode_switch
keycode 13 = 4 dollar EuroSign sterling

Once the file is created, to activate the change immediately, simply run xmodmap:

% xmodmap ~/.xmodmaprc

And voila*, pressing 'right alt' + '4' gives me '€' and 'shift' + 'right alt' + '4' gives me '£'!

* Don't even get me started on umlauts and accents ;P

Friday, May 29, 2009

Huawei E220 Modem on Linux

Unfortunately, setting up a mobile broadband Internet connection is usually difficult, and the experience is inconsistent between network providers. Getting the modem to work can be rather frustrating if things don't work straight out of the box.

Before I start, I'll let you know that some of the settings here may be specific to my provider, O2 Ireland, so you may have to make your own tweaks, and as the saying goes, "your mileage may vary".

On the good side, the Huawei E220 modem seems to be a rather popular and well supported device, and in one instance it did work straight out of the box, on a friend's computer running Ubuntu 9.04 with Network Manager. It doesn't seem to work on 8.10, neither on my machine (Ubuntu) nor on my laptop (Xubuntu), which may or may not boil down to configuration issues. On the funny side, when I tried to get the settings off my friend's computer by right-clicking on it, the machine simply froze entirely. (Windows users, insert your jibes here ;p)

This problem is probably specific to O2: I had previously borrowed a Vodafone modem dongle, which worked flawlessly when plugged into my laptop. I had assumed that would also be the case when I got O2, but it turned out not to be so.

Update: Found the reason why Network Manager works in 9.04 but not 8.10. The newer release includes Modem Manager, which has specific setup that requests a PIN, which the O2 card requires by default. By comparison, the Vodafone dongle did not require a PIN, hence Network Manager worked without a hitch.

Anyhow, maybe the information I've gleaned will help you find out what you need to get things to work.

What the lights on the modem really mean

Partially ignore the documentation that came with the subscription. From my personal observation, if the light is green and flashing, it means that the modem is active but not authenticated to the provider. That should be a sign that your modem is working.

The modem is also capable of flashing blue (which is explained nowhere in the booklet). This means that your connection is authenticated with the provider, but no ppp connection is currently established.

If the light is solid green, blue, or light blue, it means that your connection is active, at one of the different operating speeds (GPRS, 3G, HSDPA respectively) as explained in the booklet. From my observations, it seems like the connection typically stays in 3G mode (dark blue colour) when it's passive, and only switches to HSDPA mode whenever you start sending or receiving data over the network.

Modem doesn't work with Network Manager

A number of sources suggest that the E220 modem works straight away with Linux via Network Manager, but it certainly didn't work for me straight off. So I had to do some reading on the wireless broadband forums to try to find answers. Most of them are geared towards solving problems for Vodafone, and the information is really spotty when it comes to O2. Given that I had no idea where to look to find out what's actually happening inside Network Manager, I had to try some other alternatives.

Update: I didn't know where to look before, but I've since found that Network Manager logs to /var/log/daemon.log - still, the messages are not helpful enough to tell you exactly what the problem is.

The saviour - wvdial

'wvdial' is the alternative application that got the connection to work after a bit of reading. The documentation on wvdial can be confusing, and even though I've gotten it to work, I still don't fully understand the relationship between wvdial and pppd. Here's an excerpt of the config I have in '/etc/wvdial.conf'; it's a little half-baked, and sometimes still fails:


[Dialer O2]
ISDN = 0
Baud = 460800
Modem = /dev/ttyUSB0
Phone = *99#
Modem Type = Analog Modem
Stupid Mode = 1
Username = gprs
Password = gprs
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2
Init3 = AT+CPIN="1234"
Init5 = AT+CGDCONT=1,"IP","open.internet"

[Dialer O22]
ISDN = 0
Baud = 460800
Modem = /dev/ttyUSB0
Phone = *99#
Modem Type = Analog Modem
Stupid Mode = 1
Username = gprs
Password = gprs
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0 &C1 &D2
#Init3 = AT+CPIN="1234"
Init5 = AT+CGDCONT=1,"IP","open.internet"

Do replace '1234' with your actual PIN.

You can see that I have 2 entries for the connection; the only difference between the two is that the second entry doesn't have a PIN authentication command. This is intentional, because wvdial does not know whether the modem is in the authenticated state or not (i.e., whether it's flashing blue or flashing green). The ATZ command does not seem to reset this authenticated state.

If you have started your computer fresh and your modem is showing a flashing green light, invoke wvdial this way:


% wvdial O2

That should work if you started your modem cold. Sometimes the first connection gets dropped for no reason; to reestablish it, run wvdial with the O22 entry instead. If you succeed, you should see output like this:


--> WvDial: Internet dialer version 1.60
--> Cannot get information for serial port.
--> Initializing modem.
--> Sending: ATZ
ATZ
OK
--> Sending: ATQ0 V1 E1 S0=0 &C1 &D2
ATQ0 V1 E1 S0=0 &C1 &D2
OK
--> Sending: AT+CGDCONT=1,"IP","open.internet"
AT+CGDCONT=1,"IP","open.internet"
OK
--> Modem initialized.
--> Sending: ATDT*99#
--> Waiting for carrier.
ATDT*99#
CONNECT
--> Carrier detected.  Starting PPP immediately.
--> Starting pppd at Sat May 30 08:20:44 2009
--> Pid of pppd: 7312
--> Using interface ppp0
--> pppd: H�c X�c ��c
--> pppd: H�c X�c ��c
--> pppd: H�c X�c ��c
--> pppd: H�c X�c ��c
--> pppd: H�c X�c ��c
--> pppd: H�c X�c ��c
--> local  IP address 89.204.199.133
--> pppd: H�c X�c ��c
--> remote IP address 10.64.64.64
--> pppd: H�c X�c ��c
--> primary   DNS address 62.40.32.33
--> pppd: H�c X�c ��c
--> secondary DNS address 62.40.32.34
--> pppd: H�c X�c ��c
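The "try O2 first, then fall back to O22" routine described above can be wrapped in a small function. This is only a sketch: dial_o2 and the WVDIAL override variable are hypothetical names, and it assumes that wvdial exits with a non-zero status when a dial attempt fails or the connection drops, which you should verify on your own setup.

```shell
#!/bin/sh
# dial_o2 is a hypothetical wrapper around the two dialer sections above.
# WVDIAL can be set to another command (for a dry run, say); it defaults to wvdial.
# Assumption: wvdial returns a non-zero exit status when the attempt fails.
dial_o2() {
    # First try the entry that sends the PIN; if that fails, the modem may
    # already be authenticated, so retry with the entry that skips the PIN.
    "${WVDIAL:-wvdial}" O2 || "${WVDIAL:-wvdial}" O22
}

# Usage (as root): dial_o2
```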

Trying Network Manager Again

If you have established a connection successfully before, but it got dropped for some reason, the light on your modem should be flashing blue. In this case your modem is in the authenticated state, and Network Manager will work happily if you want to use it now.

As I said, Network Manager did work on one of my friends' computers - the difference being that when I tried connecting, his version of Network Manager prompted me to key in my PIN, while mine didn't. Even manually setting the PIN in the configuration won't make it work.

But at least it works indirectly, which tells us that the problem lies with authentication.

Fun things to do with the modem - AT Commands

Through using wvdial, I realised that the USB modem actually uses a variant of the AT commands of the phone modems I used to have for dialups and BBSes. It piqued my interest a bit, and it's good for reliving the good old days of fiddling around with AT commands on my modem.

To do so, we'll need to find the interface through which we can send and receive commands - 'dmesg' is helpful on these occasions:


[   21.839161] usb-storage: device found at 2
[   21.839163] usb-storage: waiting for device to settle before scanning
[   21.849151] usbserial: USB Serial support registered for GSM modem (1-port)
[   21.849175] option 4-1:1.0: GSM modem (1-port) converter detected
[   21.849429] usb 4-1: GSM modem (1-port) converter now attached to ttyUSB0
[   21.849441] option 4-1:1.1: GSM modem (1-port) converter detected
[   21.849519] usb 4-1: GSM modem (1-port) converter now attached to ttyUSB1
[   21.849535] usbcore: registered new interface driver option
[   21.849538] option: USB Driver for GSM modems: v0.7.2

So, /dev/ttyUSB0 is the interface which Network Manager/wvdial uses to connect to the mobile phone provider, and it perplexed me a little why there is an additional /dev/ttyUSB1 interface. One of the things that came up from googling was an out-of-date kernel support page for the modem.

It provided a tool to read the signal strength of the modem, so out of curiosity I downloaded the source code and waded through it. That's when I realised that /dev/ttyUSB1 is the interface to which you can issue AT commands.

Armed with that knowledge, we can now start issuing commands straight to the device! The method is primitive: start two terminal windows side by side. In one window, do:


$ cat /dev/ttyUSB1

This shows you the output of the commands issued. The other window is where you issue your commands. For example:


$ echo "AT" > /dev/ttyUSB1

You should see "OK" appear in the other window, showing that the modem has acknowledged your 'attention' command. Pretty cool eh?

Fun Things To Do #1: Disabling PIN Authentication

Remember the problem with PIN authentication that prevented Network Manager from working properly? Well you can side-step the problem by disabling the PIN authentication feature on the SIM card:


echo 'AT+CLCK="SC",0,"1234"' > /dev/ttyUSB1

Replace '1234' with the actual PIN number that you have. This should disable the need for authentication. A word of caution: do this only if you're not too concerned about the physical security of your modem, otherwise if it gets lost or stolen, others can start using your Internet connection for free!
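Since a mistyped command here ends up on your SIM card, it can be worth building the string with a sanity check first. A minimal sketch, assuming a hypothetical helper named clck_disable_cmd; it only prints the command shown above and rejects anything that isn't a 4 to 8 digit PIN:

```shell
#!/bin/sh
# clck_disable_cmd is a hypothetical helper: it prints the AT command that
# disables the SIM PIN lock, refusing anything that is not a 4 to 8 digit PIN.
clck_disable_cmd() {
    case $1 in
        ''|*[!0-9]*) echo "PIN must be digits only" >&2; return 1 ;;
    esac
    if [ "${#1}" -lt 4 ] || [ "${#1}" -gt 8 ]; then
        echo "PIN must be 4 to 8 digits long" >&2
        return 1
    fi
    printf 'AT+CLCK="SC",0,"%s"\n' "$1"
}

# Usage (with your real PIN): clck_disable_cmd 1234 > /dev/ttyUSB1
```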

Fun Things To Do #2: Get SMS Messages

We Linux users aren't provided with any GUI for reading and sending SMS messages on the SIM card. Unfortunately, the O2 site registration assumes that we are all Windoze users, since the bundled software is the only way to pull out the authentication SMS message that it sends to the modem's number.

Well fret no more, here's how we can gain access to SMS messages simply by using AT commands:


$ echo 'AT+CMGF=1' > /dev/ttyUSB1
$ echo 'AT+CMGL="ALL"' > /dev/ttyUSB1

This should switch the modem to SMS text mode and dump out all the received SMSes. From the output, you can pick out the authentication message, which looks like this:

+CMGL: 0,"REC READ","02",,"25/05/27,20:33:07+04"

Welcome to O2 Broadband! Should you have any queries, visit www.o2.ie/broadbandfaq or our interactive forum on http://forums.o2online.ie. Best wishes, O2.

+CMGL: 1,"REC READ","0000000000",,"26/05/28,21:04:54+04"

Your verification code is XXXXXXXX. Please go to o2.ie and continue your registration.

This is at least useful if you don't want the hassle of pulling out the SIM card and putting it into your phone to get the SMS message for authentication.
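If the inbox is long, the +CMGL dump can be boiled down to one line per message. A minimal sketch, assuming a hypothetical helper named cmgl_list and the text-mode layout shown above (a +CMGL header line followed by the message text); it keeps only the first line of each message:

```shell
#!/bin/sh
# cmgl_list is a hypothetical helper: it reads text-mode AT+CMGL output on stdin
# and prints one "index<TAB>sender<TAB>message" line per SMS.
cmgl_list() {
    awk '
        { sub(/\r$/, "") }                 # drop a trailing carriage return, if any
        /^\+CMGL: / {
            split($0, f, ",")              # f[1] holds the index, f[3] the quoted sender
            idx = f[1]; sub(/^\+CMGL: */, "", idx)
            sender = f[3]; gsub(/"/, "", sender)
            want = 1                       # the next non-empty line is the text
            next
        }
        want && $0 != "" && $0 != "OK" {
            printf "%s\t%s\t%s\n", idx, sender, $0
            want = 0
        }
    '
}

# Usage, on output captured to a file: cmgl_list < sms-dump.txt
```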

More References

The Wikipedia page on the Hayes command set is a good starting point for AT commands. There are also references online for the SMS AT write and read commands if you want to read up more.
