Regular Expressions in Common Scripting Languages
I felt the need to write this post since I’m forever wishing I had this info to hand. A self-confessed scripting language whore, I’ve lost count of the times I’ve been hacking away in, say, PHP, and suddenly thought, “What’s the preg_replace() syntax in PHP?”, or “What is the best way to do a global regex match in Ruby?”
So I’ve decided to create a repository for all those miscellaneous bits ‘n’ pieces, and here it is.
For now I’ll concentrate on the infrastructure issues, such as syntax, supported functions, etc. Maybe another day I’ll examine the language differences in the regex notation itself.
1. Perl
Perl is the one I never forget, since it’s where I cut my regex teeth. So I’ll recount the basics here.
Basic Match
/pattern/gcimosx
m/pattern/gcimosx
m|pattern|gcimosx
m{pattern}gcimosx
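Here is a minimal sketch showing that the four forms above behave identically. The sample string is made up for illustration. In list context a successful match returns its capture groups. In scalar (boolean) context it just reports success, and the captures are then left in $1, $2 and so on.

```perl
use strict;
use warnings;

my $text = "Perl 5 was released in 1994";

# List context: each match returns its captures, so all four get "1994".
my ($y1) = $text =~ /(\d{4})/;
my ($y2) = $text =~ m/(\d{4})/;
my ($y3) = $text =~ m|(\d{4})|;
my ($y4) = $text =~ m{(\d{4})};

# Scalar (boolean) context: test for a match, then read the capture from $1.
if ($text =~ m/released in (\d+)/) {
    print "Found year: $1\n";
}
```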
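The modifiers are:
- g global match: return all matches in list context, or step through them one at a time in scalar context
- c continuation on global match: don’t reset the match position when a g match fails (only meaningful together with g)
- i ignore case
- m allow ^ and $ to match at the start and end of each line, i.e. next to an embedded newline
- o compile the pattern once only, even if it interpolates variables that change later. In practice this is rarely necessary, since Perl only recompiles an interpolated pattern when its text actually changes.
- s let ‘.’ match a newline
- x ignore whitespace in the pattern and allow # comments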
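The following sketch puts each modifier through its paces. The input strings are invented, and the little tokeniser at the end is just one illustrative use of gc, not the only one. o is left out because it makes no visible difference to the result.

```perl
use strict;
use warnings;

# g in list context returns every match, not just the first.
my @numbers = ("a1 b22 c333" =~ /(\d+)/g);       # ("1", "22", "333")

# i ignores case; m lets ^ and $ match at each embedded newline.
my $log = "Error: disk full\nwarning: low memory\nERROR: timeout\n";
my @errors = ($log =~ /^error: (.*)$/gim);        # ("disk full", "timeout")

# s lets '.' match newlines, so the capture can span several lines.
my ($span) = ("start\nmiddle\nend" =~ /start(.*)end/s);   # "\nmiddle\n"

# x ignores whitespace in the pattern and allows # comments.
my ($year, $month, $day) = ("2024-05-17" =~ /
    (\d{4}) -    # year
    (\d{2}) -    # month
    (\d{2})      # day
/x);

# gc in scalar context: step through the string with \G-anchored matches.
# When a match fails, c keeps pos() where it was (plain g would reset it),
# so the next alternative can try again from the same position.
my $input = "abc123";
my @tokens;
pos($input) = 0;
while (pos($input) < length($input)) {
    if    ($input =~ /\G([a-z]+)/gc) { push @tokens, "word:$1" }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "num:$1" }
    else                              { last }
}
# @tokens is now ("word:abc", "num:123")
```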
Note that, as long as you use the explicit m prefix, the delimiters can be replaced with any non-whitespace character you like (or a matching pair of brackets, such as {} or ()). The bare /pattern/ form only works with slashes. Use this sparingly because it can get hard to read, but it is particularly useful when matching e.g. pathnames, because otherwise you’ll have to escape every slash explicitly.
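To see the difference, here are two equivalent matches against a pathname. The path itself is just a made-up example.

```perl
use strict;
use warnings;

my $path = "/usr/local/lib/perl5/Foo.pm";

# With the default slashes, every '/' in the pattern has to be escaped...
my ($dir1) = $path =~ /^\/usr\/local\/lib\/(\w+)\//;

# ...whereas with m{} the slashes can stand as they are.
my ($dir2) = $path =~ m{^/usr/local/lib/(\w+)/};

# Both capture "perl5".
```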
Replace
s/regex/replacement/[modifiers]
s|regex|replacement|[modifiers]
s{regex}{replacement}[modifiers]
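A quick sketch of substitution in practice, using invented strings. The `(my $copy = $text) =~ s/…/…/` idiom copies the string first, so the original is left untouched. In scalar context s/// returns the number of replacements it made.

```perl
use strict;
use warnings;

my $text = "the cat sat on the mat";

# Without g only the first occurrence is replaced.
(my $first = $text) =~ s/at/og/;      # "the cog sat on the mat"

# With g every occurrence is replaced.
(my $all = $text) =~ s/at/og/g;       # "the cog sog on the mog"

# s///g in scalar context returns the number of substitutions made.
my $copy  = $text;
my $count = ($copy =~ s/at/og/g);     # 3

# The bracketed form avoids escaping slashes in pattern and replacement.
my $path = "/home/alice/docs";
(my $moved = $path) =~ s{^/home/alice}{/srv/backup};   # "/srv/backup/docs"
```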