In my post about using regular expressions to find matches in a text file, I promised to tell you about the two applications I used to help me write my regex. By the way, Regex is what the cool kids call Regular Expressions.
Let’s state the problem to be solved first. If you have a text file where you want to change something that’s repeated throughout the file, it’s pretty easy to do a search/change all. We do it all the time in text editors. But what if you have a text file that is repeatedly generated and always has the same thing wrong with it? Maybe it’s a date in the wrong format. Or maybe an online system hasn’t been updated with your new company name. Or what if instead of changing the text, you just need to know what the text actually says? Let’s say it’s a date in a document and you want to write a script to change the name of the document to include the date. All of these examples are great places to try out regex.
To refresh your memory, regex is (are?) a sequence of characters that defines a search pattern. They use all kinds of crazy character combinations to help find a match. Like I said before, if you don’t watch the pattern as it’s created and explained, it really looks like a cat walked across your keyboard. My first reaction the first 12 times Bart showed them to me was to assume it was some kind of witchcraft to be avoided at all costs. But I have to begrudgingly admit that I’m warming to them.
The hard part is learning the syntax. Using the googles to figure out a pattern with regex is possible but it’s boring and error-prone. I didn’t like regex until I found these two tools to help me work them out.
Expressions from Apptorium
The first tool I used is called Expressions. It’s a $3 app in the Mac App Store from Apptorium.
Expressions is a multi-pane interface. In the top center, you start typing your regex. Below that you paste in an example of the text you want to search. When you have created a regex that finds a match, the text below is highlighted, giving you real-time feedback on your success (or failure).
When you’re learning regex, that real-time feedback is crucial. But how do you even know what to type for the regex? The right side of Expressions displays the basic or advanced sequences of characters you can use to match text with regex. There are SO many crazy sequences in regex, trust me you’ll be glad that there is a basic option!
Here are a few examples of some very simple regex: . matches any character (other than a line break, by default), ^ matches the beginning of a line, $ matches the end of a line, and \d matches any digit. Seems pretty simple, right? I can’t even keep track of those four simple sequences without looking at the sidebar in Expressions.
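If you'd like to poke at those four sequences outside of an app, here is a minimal sketch in Python (not the Perl I ended up using, just a handy way to try things). The sample line is invented; it only assumes a timestamp shaped like the ones in my chapter markers.

```python
import re

# Invented sample line, shaped like one chapter marker
sample = "03:15.250 Intro"

# . matches any single character, so the first match is the first character
first_char = re.search(r".", sample).group()

# ^ anchors the match to the beginning, so this asks "does it start with a digit?"
starts_with_digit = re.search(r"^\d", sample) is not None

# $ anchors the match to the end, so this asks "does it end with the letter o?"
ends_with_o = re.search(r"o$", sample) is not None

# \d matches any one digit; findall collects every digit in the sample
digits = re.findall(r"\d", sample)

print(first_char, starts_with_digit, ends_with_o, digits)
```

Swap in your own sample text and patterns to see how each sequence behaves before you combine them.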
The biggest value of Expressions (for that steep $3 I put down) is that you can save an expression. As I was working on matching the time sequences in my chapter markers, I created several different options and I was able to keep them named in the left sidebar of Expressions.
I thought I was done searching for the perfect regular expressions tool, but just like the quest for the perfect text editor, one is never enough.
Patterns from Nikolai Krill
I know we already spent $3 but I’m going to convince you to spend a whole ‘nother $2 to buy Patterns – the Regex App, available in the Mac App Store from Nikolai Krill. Nikolai wrote the fabulous CodeRunner app that we’ve all been using over in Programming By Stealth.
I’d like to say I discovered this app, but it was actually Helma who turned me on to it. I have to admit that I really didn’t grok why she thought it was so awesome until I really started to use it in anger. (To use something in anger is a British phrase I learned from Don McAllister – I think it means to use something for real, like in a production environment.)
At its simplest, Patterns does pretty much what Expressions does. It has a top pane where you type in your expression and a bottom pane where you paste in the text you’re trying to match. As you type in an expression, you can see in real time whether you’ve successfully found a match.
At first, I didn’t like it because it didn’t contain the cheat sheet I depended on in Expressions. But then I noticed a button in the single window that says Reference Sheet. That button brings up a grey window with all the expressions and character classes and more. It’s one of those windows I call Heads Up Display windows. Not sure if that’s a standard term but it’s the kind of window that disappears when you click away. Unlike Expressions, it doesn’t separate the basic sequences from the more advanced ones.
But it does so much more than Expressions. Above the regex you’re writing is a series of checkboxes: Ignore Case, Lazy, Single-line, Multi-line, and Free Spacing. When Helma first showed me Patterns, she kept talking about the checkbox for Multi-line. I had no idea what it was for so I didn’t really pay attention. However, when it came time to write my regex to match the series of timestamps in my chapter marks file, it was the key to my success.
With my regex as written, it was only finding the first match and ignoring every other line. I was afraid that I was going to have to write a loop to go through and find every line. I’m not afraid of writing a loop, but remember this is the first time I’ve ever written in Perl so it might have been a difficult task. However, when I tapped that Multi-line checkbox, boom, all of the lines in my example text were matched.
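Here's a rough sketch of what that checkbox changes, written in Python rather than Perl. The chapter text is made up, and I'm assuming the same kind of pattern that starts with ^. Without the multi-line flag, ^ only matches at the very start of the whole text. With it, ^ matches at the start of every line.

```python
import re

# Invented sample standing in for a chapter markers file
chapters = "00:00.000 Intro\n03:15.250 Expressions\n12:40.125 Patterns\n"

# A timestamp at the start of a line: two digits, colon, two digits, dot, three digits
pattern = r"^\d{2}:\d{2}\.\d{3}"

# Default mode: ^ only matches at the very start of the whole string
default_matches = re.findall(pattern, chapters)

# Multi-line mode: ^ also matches right after each line feed
multiline_matches = re.findall(pattern, chapters, flags=re.MULTILINE)

print(default_matches)    # only the first timestamp
print(multiline_matches)  # one timestamp per line
```

No loop required, which is exactly the relief I felt when I ticked that box.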
The purpose of using regex is usually to change something after you find it. In Patterns, there are two tabs: Match just for figuring out the regex, and Replace to check your replacement syntax. I also paid no attention when Helma showed it to me. I really should have because Patterns has a crazy cool option I never realized was there.
In my example, I’ve got the regex in a pane at the top. I put the replacement text in the next pane down. Below that shows the text I’m searching on the left, and the right shows the results of the entire regex including replacements. With it all working properly in there, I didn’t know how to write it all as one big fat regex. Where does that replacement text go? Where does that multi-line flag go?
I went to the web and did a big ol’ pile of searches to crack the code. (See what I did there?)
I later discovered two options in the upper right that are magical. There’s a pulldown for the language you’re going to be using and a button that says copy code. I set the language to Perl and hit copy code, and then pasted it and the entire syntax was all completely written for me!
$searchText =~ s/^(\d{2}\:\d{2}\.\d{3})/00:$1/gm;
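To unpack that line: the part between the first two slashes is the pattern, the part between the second and third is the replacement, $1 stands for whatever the parentheses captured, g means replace every match, and m is the multi-line flag. As a hedged sketch, the same substitution might look like this in Python. The variable name and sample text are mine, not something Patterns generated.

```python
import re

# Invented sample: timestamps missing their hours field
search_text = "03:15.250 Expressions\n12:40.125 Patterns\n"

# Python counterpart of the Perl s/^(\d{2}\:\d{2}\.\d{3})/00:$1/gm
#   \1 plays the role of Perl's $1 (the captured timestamp)
#   re.sub replaces every match by default, like Perl's g flag
#   re.MULTILINE is the m flag, so ^ matches at the start of each line
result = re.sub(
    r"^(\d{2}:\d{2}\.\d{3})",
    r"00:\1",
    search_text,
    flags=re.MULTILINE,
)

print(result)
```

Each line comes back with 00: added in front of the timestamp and the rest of the line untouched.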
I think Patterns is way underpriced at $2. I kept hoping as I was writing this up that it was a Black Friday sale or something but it’s still only $2. Crazy inexpensive for something so capable.
Regex101
I want to give one more tip for writing regex: it’s a site called regex101.com. As I was building up this script, I was borrowing a lot of information from other sources and I didn’t always know what it meant. Regex101 not only checks your regex to see if it finds matches, like Expressions and Patterns, but also explains what you’ve written.
For example, if you type the hat (^) symbol, in the right sidebar it will write, “asserts position at start of a line”. There’s also a little question mark that you can hover over to reveal more information. The ^ explanation says “lines are delimited by \n”. Remember the story of how Helma figured out that the chapter file export from Hindenburg didn’t have proper Unix line feeds at the end and instead had legacy macOS carriage returns? Well, the fix in regex was to swap out \r for \n. If I’d really understood what regex101 was telling me, maybe I’d have thought to go check that file.
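Here's a small sketch of why those line endings matter, again in Python with invented sample text. Python's regex engine, like the explanation in regex101, treats only \n as the end of a line, so a file delimited by carriage returns looks like one giant line. Note that this sketch converts the line endings before matching, which is one possible workaround and not necessarily the fix we used.

```python
import re

# Invented sample using carriage returns (\r) as line endings
cr_text = "00:00.000 Intro\r03:15.250 Expressions\r12:40.125 Patterns\r"

pattern = r"^\d{2}:\d{2}\.\d{3}"

# With only \r between lines, ^ finds a single line start even in multi-line mode
before = re.findall(pattern, cr_text, flags=re.MULTILINE)

# Normalize line endings to \n (handling \r\n first so it doesn't become two breaks)
lf_text = cr_text.replace("\r\n", "\n").replace("\r", "\n")

# Now ^ matches at the start of every line
after = re.findall(pattern, lf_text, flags=re.MULTILINE)

print(before)
print(after)
```

If a pattern with ^ or $ matches far less than you expect, checking the line endings in the file is a cheap first step.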
In the same bar where you enter your regex, way over on the right is a forward slash / and a little flag. When I was halfway through figuring out my regex, I noticed that it said /g and then the flag. I tapped on it and noticed the second option was multi-line and the explanation “^ and $ match start/end of line”. I know I said that I figured out that I needed multi-line using Patterns but now that I think about it, I think I actually found the flag thing in Regex101 and then looked for it in Patterns.
Regex101 also has a quick reference section where you can search for the secret regex code you desire, or you can scroll through all tokens, common tokens, general tokens and more. It’s really a pretty swell tool.
Bottom Line
If you’ve never needed to do an efficient search and replace repeatedly on text files, then these tools might not be for you. But if you’ve been curious or even intimidated by regex, then I hope I’ve given you some tools that will help you enjoy figuring them out. Now I’m excited that Bart and I have a little project we’re working on with backups of my web server that is crying out for a regex to be written.