Tuesday, December 18, 2007

Regular Expression matching more than a single line in Vim

As grep works line by line and doesn't readily allow a regular expression to match across multiple lines, I've found that Vim is a handy substitute, as long as I don't have to pipe the result into another program's input.

The trick that allows Vim to do so is the special regular expression escape '\_.', which also matches the end-of-line, combined with a quantifier: either the greedy '*' or the non-greedy '\{-}'.

The Vim documentation describes this escape as follows:


\_. Matches any single character or end-of-line.
Careful: "\_.*" matches all text to the end of the buffer!


It's precisely the behavior described in that warning that we are going to 'exploit' in order to allow multi-line matching to occur.
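
To make the warning concrete, here is a minimal sketch in Python rather than Vim. Python's `re` module only stands in for Vim's regex engine here: `.` with the `re.DOTALL` flag plays the role of `\_.`, and the three-line sample text is made up for illustration.

```python
import re

# Hypothetical three-line "buffer", used only for illustration.
buffer_text = "first line\nsecond line\nthird line\n"

# Without DOTALL, "." stops at a newline, much like Vim's plain ".".
single_line = re.search(r"first.*", buffer_text).group(0)

# With DOTALL, "." also matches newlines (roughly Vim's "\_."),
# so the greedy ".*" runs all the way to the end of the text.
whole_buffer = re.search(r"first.*", buffer_text, re.DOTALL).group(0)

print(repr(single_line))
print(repr(whole_buffer))
```

The second match swallows everything up to the end of the text, which is the same effect the Vim help warns about for "\_.*".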

This sort of matching is typically useful for data formats like HTML/XML, where a data set can span multiple lines and is delimited by markup rather than by newline characters.

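For example, to match data bounded by a <div> ... </div> pair, you can use the following regular expression. The non-greedy '\{-}' makes each match stop at the nearest closing tag. If you type the pattern after Vim's / search command, escape the slash in the closing tag as \/.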


<div>\_.\{-}</div>
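
As a rough analogue outside Vim, the difference between the non-greedy and greedy forms can be sketched in Python, where `.` with `re.DOTALL` stands in for `\_.` and `*?` for `\{-}`. The sample markup is invented, and Python's engine is not Vim's, so treat this purely as an illustration of the matching behavior.

```python
import re

# Hypothetical markup: two <div> blocks, the first spanning two lines.
html = "<div>one\ntwo</div>\n<p>skip</p>\n<div>three</div>\n"

# Non-greedy: each match ends at the nearest </div>.
blocks = re.findall(r"<div>.*?</div>", html, re.DOTALL)

# Greedy: a single match runs from the first <div> to the last </div>.
greedy = re.findall(r"<div>.*</div>", html, re.DOTALL)

for block in blocks:
    print(repr(block))
print(len(greedy))
```

With the non-greedy form each <div> block comes out as its own match, while the greedy form lumps both blocks, and the paragraph between them, into one.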


I often use this technique as a 'quick and dirty' way of breaking up markup into separate pieces of data, especially for a one-off job that doesn't merit writing a Perl or Awk script.
