Fun with regular expressions

[Image: "Regular Expression Too Complicated" error message]

I’m working on transforming HTML files containing full Shakespeare plays, cleaning up the tags and adding some semantic content we need for a customer project. I’ve been prototyping with the nice Perl-based regular expressions in good old UltraEdit, my editor of choice for a long time now.

In the course of referring to the UE web site for syntax and other tips, I discovered that UE has a scripting feature I had never paid any attention to. So now I’m gluing together my regex transforms with JavaScript and running them in batches as I figure things out.
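The idea of chaining transforms can be sketched in plain JavaScript. This is a minimal illustration, not the actual UltraEdit script: the `transforms` list, its two sample patterns, and the `applyTransforms` helper are all hypothetical, and inside UltraEdit the replacements would go through the editor's own scripting objects rather than `String.prototype.replace`.

```javascript
// Hypothetical batch of find/replace steps, applied in order.
// The patterns are illustrative only, not the ones used on the plays.
const transforms = [
  // Normalize bare <br> tags to the self-closing form.
  { find: /<br>/g, replace: '<br />' },
  // Trim whitespace just inside <h3> headings (the s flag lets . match newlines).
  { find: /<h3>\s*(.*?)\s*<\/h3>/gs, replace: '<h3>$1</h3>' },
];

// Run every step over the text, feeding each result into the next step.
function applyTransforms(text, steps) {
  return steps.reduce(
    (current, step) => current.replace(step.find, step.replace),
    text
  );
}

const sample = '<h3> Scene I </h3><br>';
console.log(applyTransforms(sample, transforms));
```

Keeping each step small and named in a list like this makes it easier to reorder or disable one transform while figuring out which one misbehaves.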

But I just blew my poor editor’s mind and got one of the better error messages.

I had already thought that a few of my expressions were getting a bit hard to understand. As in:

 strFind = '(?s)(go-scene-)(.*)(</h3>)(.*)(<a )(.*)(line-)(.*)1""';
 strReplace = '\\1\\2\\3\\r\\n<a id="go-\\80" name="line-\\80"></a><br />\\4\\5\\6\\7\\81';

BTW, when I told my daughter Becky a couple of weeks ago that I was embarking on a little regular expression project, she immediately shot back with an XKCD oldie-but-goodie that was new to me:

[Image: xkcd regex cartoon]