string-match and dealing with state

(or, How I Spent a Few Hours Wrestling with Emacs)

I encountered some strange behavior and errors when calling replace-regexp-in-string in some Emacs code today.

My code was passing a function as the REP argument to replace-regexp-in-string. Here’s a similar, simplified example:

Fine, that’s what we expected. Now let’s change how mask-ssn works. It should still do the same thing, but does it?

Whoa, that’s not right at all!

What happened? Well, there’s a while loop in replace-regexp-in-string that calls string-match, then calls the REP function, and then uses the result in a call to replace-match. Since mask-ssn also calls string-match (via split-string), this messes up the next call to replace-match in the loop.

Figuring this out took a while, as my actual function was way more complicated than mask-ssn. I didn’t even suspect string-match to be the problem initially, so I went through the process of taking everything out and adding code back in line by line, chunk by chunk, call by call, until I pinned it down. In addition to weird string replacements, I also sometimes got an “args-out-of-range” error, depending on the size of the strings and the various functions using string-match that I experimented with calling.

There IS a solution, happily: save-match-data, which is better documented here than in the Emacs help. In a nutshell, the macro saves the current match state, evaluates the forms inside the body, and then restores the state afterwards.

That does the trick.

I discovered this only after going through the trouble of building a replacement to replace-regexp-in-string.

I felt pretty happy with myself after writing that. Since it doesn’t call any match functions after calling REP, all is well.

But then I thought about it some more. my-replace ITSELF is changing match data, so any callers will suffer from the same problem. D’OH. So we still need to wrap our code in save-match-data, if we’re going to be responsible.

I think save-match-data illustrates an interesting “pattern” of sorts. It deals with a problem of state, which is unavoidable, as we’re concerned with needing to track searches of text buffers. Since we HAVE to store this state, we can’t do it in a clean, purely functional way.

save-match-data handles this by saving the state of the string match in progress, probably pushing it on a stack somewhere (I didn’t dive too deep), letting the wrapped code manipulate the world, and then popping the original state off the stack to restore it afterwards. From the outside, it seems as if no state has changed, between the point at which you enter save-match-data and the point at which you exit. This allows for nested code, any number of call levels deep, to do what it needs, without interfering with other levels, as long as the relevant chunks of code are wrapped in the macro.

Pretty cool.