Tuesday, December 18, 2007

Regular Expression matching more than a single line in Vim

Since grep doesn't match regular expressions across multiple lines, I've found Vim to be a handy substitute whenever I don't need to pipe the result into another program's input.

The trick that lets Vim do this is the special regular expression escape '\_.' combined with a greedy quantifier ('*').

The entry for '\_.' in the Vim documentation hints at how this trick works:

\_. Matches any single character or end-of-line.
Careful: "\_.*" matches all text to the end of the buffer!

It's precisely the behaviour described in that warning that we are going to 'exploit' to let a match span multiple lines.

This kind of matching is typically useful for data formats like HTML/XML, where a data set can span multiple lines and is delimited by markup rather than by newline characters.

For example, if you wanted to match data bounded by a pair of <div> tags, you could use the following regular expression:


I often use this technique as a 'quick and dirty' way of breaking markup up into separate pieces of data, especially for a one-off job that doesn't merit writing a Perl or Awk script.
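To make the greedy behaviour concrete outside Vim, here is a small sketch using Python's re module rather than Vim itself. The assumption is that Python's re.DOTALL flag, which lets '.' also match newlines, is a close stand-in for Vim's '\_.'. The sample HTML is made up for illustration. It shows why the warning in the help text matters: a greedy '.*' runs to the last closing tag, while a non-greedy '.*?' (Vim spells this '\_.\{-}') stops at the first one.

```python
import re

# Made-up sample; each <div> block spans several lines.
html = """<div>
first block
</div>
<p>between</p>
<div>
second block
</div>"""

# re.DOTALL lets '.' match newlines too, roughly like Vim's '\_.'.
# Greedy '.*' runs all the way to the LAST </div>, swallowing both blocks.
greedy = re.search(r"<div>.*</div>", html, re.DOTALL)
print(greedy.group(0) == html)  # True

# Non-greedy '.*?' stops at the FIRST </div>, similar to Vim's '\_.\{-}'.
lazy = re.findall(r"<div>(.*?)</div>", html, re.DOTALL)
print(lazy)  # ['\nfirst block\n', '\nsecond block\n']
```

In practice the greedy form is fine when a buffer holds a single block of interest; with several blocks, the non-greedy form is usually what you want.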

Wednesday, December 12, 2007

Ada Lovelace, The First Programmer

Programming in today's IT industry is generally a male-dominated profession, but did you know that the world's first programmer was a woman?

Augusta Ada King (1815-1852), more commonly known in Computer Science as 'Ada Lovelace', is widely regarded as the first computer programmer. The surname 'Lovelace' may sound funny, but it comes from her formal title, 'The Right Honourable Augusta Ada, Countess of Lovelace', which she received through her marriage to the Earl of Lovelace. 'Ada Lovelace' is the most common way modern literature refers to her.

Ada came to know Charles Babbage, designer of the Analytical Engine, a mechanical general-purpose computer, and through him she undertook the task of translating Italian mathematician Luigi Menabrea's memoir on Babbage's machine.

To her translation she added extensive notes clarifying how the Analytical Engine worked. The last of these, Note G, contained an algorithm detailing how the engine could be programmed to compute Bernoulli numbers. Most historians recognize this as the first computer program, which makes Ada the world's first programmer.

Beyond that fame, Ada was probably one of the most learned women of her time, having studied under many prominent contemporaries. She was educated at home by a number of tutors, one of whom was Augustus De Morgan, another important figure in the history of computer science.

De Morgan contributed to one of the most important concepts that make programming computers possible today. He was a leading figure in the 19th-century reform of logic and formulated the laws that now bear his name, which became part of the boolean algebra developed by his contemporary George Boole. I'm sure 'De Morgan's Theorem' will ring a bell for most people who had to do engineering mathematics in school.

We may think little of it today, but without the logic behind the 'if-else' statements found in almost every programming language, computers wouldn't be very useful at all. It is just as remarkable that the 'modern' technology we know and use today rests on ideas worked out more than 150 years ago.
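As a small illustration of De Morgan's laws in the 'if-else' sense, the Python sketch below checks both laws against every combination of two boolean inputs. It is an exhaustive truth-table check written for this post, not taken from any historical source.

```python
from itertools import product

# Check De Morgan's laws for all four combinations of two booleans:
#   not (a and b)  ==  (not a) or (not b)
#   not (a or b)   ==  (not a) and (not b)
for a, b in product([False, True], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
    assert (not (a or b)) == ((not a) and (not b))

print("De Morgan's laws hold for all four input combinations")
```

The same identity is what lets you rewrite a condition such as `if not (x and y):` as `if not x or not y:` without changing its behaviour.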

Ada died at the early age of 36 of uterine cancer; her death may have been hastened by bloodletting, a commonly prescribed treatment in those days. Today, the name Ada is probably best known as a programming language, named in her honour by the U.S. Department of Defense.
Sunday, December 09, 2007

Adding text before and after a regular expression match in Vim

Let's say that you have a series of lines of text that you want to convert into 'System.out.println' or 'printf' statements, for example:

myapplication - an application that does something
usage: myapplication [-abch] param1 param2
 -a: does a particular feature
 -b: does another feature
 -c: does yet another feature
 -h: shows this help message

Adding all those print statements by hand is laborious, so you'll generally want a regular expression to do it for you. One way of doing this is with a POSIX-compliant regular expression that uses grouping:


(This assumes that you have already selected the block, either with the 'v' keystroke or with the mouse. The subsequent examples assume the same.)

Vim requires the parentheses to be escaped, so one way of shortening the expression is to use the '\v' ("very magic") modifier:


But the shortest way I've found so far in Vim is to use '&' in the replacement string, which stands for the whole matched text (as does '\0'), so it works even when no grouping is used:


The resulting output for the example above is shown below, if you are too lazy to try it and see for yourself:

printf("myapplication - an application that does something");
printf("usage: myapplication [-abch] param1 param2");
printf(" -a: does a particular feature");
printf(" -b: does another feature");
printf(" -c: does yet another feature");
printf(" -h: shows this help message");

It's certainly much more powerful to be able to perform replacements like that with pinpoint accuracy in a single go, something that 'search and replace' text editors without regular expression capabilities are unable to do.
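For comparison, here is a rough equivalent in Python's re module rather than Vim, covering both the grouping approach and the whole-match approach described above. The assumptions: re.MULTILINE stands in for running the substitution over each selected line, Python writes the whole-match reference as '\g<0>', and '.+' is used so that blank lines would be left alone. Lines containing double quotes are not escaped.

```python
import re

# The sample lines from this post (option lines keep their leading space).
text = "\n".join([
    "myapplication - an application that does something",
    "usage: myapplication [-abch] param1 param2",
    " -a: does a particular feature",
    " -b: does another feature",
    " -c: does yet another feature",
    " -h: shows this help message",
])

# Grouping + backreference, like a Vim \(...\) group recalled with \1.
# re.MULTILINE makes ^ and $ match at each line boundary.
grouped = re.sub(r"^(.+)$", r'printf("\1");', text, flags=re.MULTILINE)

# Whole-match reference, like '&' in a Vim replacement; Python spells it \g<0>.
whole = re.sub(r"^.+$", r'printf("\g<0>");', text, flags=re.MULTILINE)

print(grouped)
print(grouped == whole)  # True
```

Both substitutions produce the same six printf lines; the whole-match form simply saves you from writing the group.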

Thursday, December 06, 2007

What is a Ruby 'symbol'?

A symbol is something that looks like ':symbol' in Ruby code: it resembles a variable name preceded by a colon. Symbols are used extensively in Rails, which has confused me in the past and still does, since I haven't yet fully grasped their significance.

The explanation given to me was that a symbol is 'like a string you never intend to show to the outside world'. The main example I was given was their use as hash keys, where I'm told they are cheaper than strings (probably because of their immutability; I suspect each symbol also acts like a singleton object). The closest analogy I keep coming back to is the C preprocessor's '#define'.
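To make the 'singleton' suspicion concrete, here is a loose analogy in Python rather than Ruby: sys.intern() returns one canonical object per distinct string value, which is roughly the property being described for symbols. This is only an illustration of the idea under that assumption, not a description of how Ruby implements symbols; the key names are made up.

```python
import sys

# Python analogy (not Ruby): sys.intern() hands back one canonical object
# per distinct string value, loosely like a Ruby symbol such as :status.
key_a = sys.intern("status")
# Build the same text at runtime so it starts out as a separate string object.
key_b = sys.intern("".join(["sta", "tus"]))

# Both names now refer to the very same object.
print(key_a is key_b)  # True

# Used as hash (dict) keys, much as Rails code uses symbols.
options = {sys.intern("status"): "active", sys.intern("role"): "admin"}
print(options[key_b])  # active
```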

Other than its extensive use in Rails, I unfortunately don't yet see how useful it will be in everyday coding. I still have some way to go on the path to Ruby enlightenment.