Coding the Impossible: Palindrome Detector with a Regular Expressions

By Tony Tonev

Tony Tonev
Photo by Andre Mouton on Unsplash

I came across a Stack Overflow question which asks how to check if a string is a palindrome using regular expressions. The top answer with 147 upvotes points out the it is impossible, so there’s no point to even try. Well, technically he’s right for arbitrary-length palindromes, but that doesn’t mean we can’t make one for palindromes up to a maximum length. As a side note, there are much easier ways to check for palindromes, such as to reverse the string and compare it to the original. So why bother? For the mental challenge and to work out my regex skills. I wanted it to work on my favorite palindrome: “A man, a plan, a canal, Panama!” so I wrote a regular expression which detects palindromes up to 22 characters ignoring tabs, spaces, commas, and quotes.

Here’s the solution

\b(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*\11?[ \t,'"]*\10|\10?)[ \t,'"]*\9|\9?)[ \t,'"]*\8|\8?)[ \t,'"]*\7|\7?)[ \t,'"]*\6|\6?)[ \t,'"]*\5|\5?)[ \t,'"]*\4|\4?)[ \t,'"]*\3|\3?)[ \t,'"]*\2|\2?))?[ \t,'"]*\1\b

You’re probably thinking that either a cat walked on my keyboard or this is the secret incantation to summon Cthulhu in some long forgotten tongue. Nope, it’s actually a valid regex. Feel free to check it using regexr.com, which is an awesome tool to instantly see the results of your regex’s and explains what each part does. I’ll walk you through how I got to it so it makes sense.

First things first, we need to detect if a character is repeated.

(.)\1
The text matching the regex is highlighted.

Easy enough. Two characters repeated? The new part is in bold.

Old: (.)     \1
New: (.)(.)\2\1

OK, this only works for palindromes that are exactly 4 characters long. We can make it work with 3 characters palindromes if we make the second look-back optional.

Old: (.)(.)\2 \1
New: (.)(.)\2?\1

Cool, now we can make it work for 2 character strings by making the inner part optional. The (?: … ) creates a non-capturing group.

Old: (.)   (.)\2?  \1
New: (.)(?:(.)\2?)?\1

We’re on a roll, let’s make it longer.

Old: (.)(?:(.)           \2?)?\1
New: (.)(?:(.)(?:(.)\3?)?\2)?\1

Uh oh… We managed to detect 5 and 6 character strings, but we lost 3 character strings. The problem is \2 is not optional, and so for example, with aba when it gets to the 3rd char, it’s looking for a second b char, and not finding it. This is still a valid palindrome since the b is exactly in the middle of the string. All odd number palindromes will have a single, non-repeated character in the very middle. We can fix it. Let’s make the \2 optional!

Old: (.)(?:(.)(?:(.)\3?)?\2 )?\1
New: (.)(?:(.)(?:(.)\3?)?\2?)?\1

Yay! Wait… now abca is a false positive 🤦. Now here is the trickiest bit of all. If you’re up for a challenge, take a moment to try to figure out how you would fix it before scrolling down to see the answer.

Photo by Juan Rumimpunu on Unsplash

We really wish we could use an if statement to replace \2 with \2? only if there is no \3, but regex has no if statements… BUT, it does have an OR operator which we can use to get the same result.

Old: (.)(?:(.)(?:(.)\3?)? \2? )?\1
New: (.)(?:(.)(?:(.)\3?\2|\2?))?\1

Jackpot! It’s a subtle change that makes all the difference. On one side of the OR I have (.)\3?\2 which means “if there’s a new inner char (or two of the same), we must repeat group two” and on the other side we have \2? which means “if there is no new inner char, optionally repeat the second group.” The second case indicates that we are currently in the very middle of the palindrome, so it’s OK to have only one of the middle char, or there could be two. That’s why we need the ?. Otherwise, group 2 must be present for it to be a valid palindrome. Can we keep going? Yup!

Old: (.)(?:(.)(?:(.)            \3? \2|\2?))?\1
New: (.)(?:(.)(?:(.)(?:(.)\4?\3|\3?)\2|\2?))?\1

Nice! I simply replaced \3? with (?:(.)\4?\3|\3?), which is the same logic as before. Now we can keep replacing the inner group to make a regex palindrome detector of any length! The only catch is the length of regex will also keep growing, since we can’t write a loop inside a regex. Let’s make one big enough to detect my target sentence.

(.)(?:(.)(?:(.)(?:(.)(?:(.)(?:(.)(?:(.)(?:(.)(?:(.)(?:(.)(?:(.)\11?\10|\10?)\9|\9?)\8|\8?)\7|\7?)\6|\6?)\5|\5?)\4|\4?)\3|\3?)\2|\2?))?\1

Even though whitespace is not ignored in regex, I’ve added new lines and indentation to make this mess a little easier to read, but remember the top expression is the one that works.

(.)(?:
(.)(?:
(.)(?:
(.)(?:
(.)(?:
(.)(?:
(.)(?:
(.)(?:
(.)(?:
(.)(?:
(.)\11\10|\10?)
\9|\9?)
\8|\8?)
\7|\7?)
\6|\6?)
\5|\5?)
\4|\4?)
\3|\3?)
\2|\2?))?
\1

It works! Now the hard part is done. For a victory lap, let’s make the palindrome character only alphanumeric and independent words. I’m using the case-insensitive flag.

\b(\w)(?:(\w)(?:(\w)(?:(\w)(?:(\w)(?:(\w)(?:(\w)(?:(\w)(?:(\w)(?:(\w)(?:(\w)\11?\10|\10?)\9|\9?)\8|\8?)\7|\7?)\6|\6?)\5|\5?)\4|\4?)\3|\3?)\2|\2?))?\1\b

I replaced every . with \w and I wrapped the expression in \b so inner parts of words will be excluded, like in Panama. To support phrases let’s make it ignore spaces, tabs, commas, and quotes.

\b(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*(?:(\w)[ \t,'"]*\11?[ \t,'"]*\10|\10?)[ \t,'"]*\9|\9?)[ \t,'"]*\8|\8?)[ \t,'"]*\7|\7?)[ \t,'"]*\6|\6?)[ \t,'"]*\5|\5?)[ \t,'"]*\4|\4?)[ \t,'"]*\3|\3?)[ \t,'"]*\2|\2?))?[ \t,'"]*\1\b

I simply inserted a bunch of [ \t,’”]* expressions between every char outside of the capture groups, so those characters can exist in the string without needing to be mirrored on the other side.

And there we have it! Mission accomplished. They said it was impossible! … and they were right, it is impossible for a finite regular expression to detect palindromes of arbitrary length, but we can still make a limited one for fun!

Your homework assignment is to write a function that takes a length, and outputs a regex that can detect palindromes up to that length.