First off, for those readers who aren’t programmers, it’s worth defining what the (in)famous regular expressions are. According to the Regex Crossword FAQ…
A regular expression (regex or regexp for short) is a special text string for describing a search pattern. (…) For example, if you wanted to find all references to the name "Casper" but using all the different ways of spelling it, you could use the regular expression [CK]asp[ae]r, which will match both "Casper", "Caspar", "Kasper" and "Kaspar".
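To try the FAQ's example for yourself, here is a minimal sketch in Python using the standard `re` module. Only the pattern comes from the quote above; the sample sentence and the variable names are made up for illustration.

```python
import re

# Pattern from the FAQ quote: C or K, then "asp", then a or e, then "r".
pattern = re.compile(r"[CK]asp[ae]r")

# Made-up sample text for illustration.
text = "Casper, Caspar, Kasper and Kaspar all match, but Jasper does not."

# findall returns every non-overlapping match, in order of appearance.
matches = pattern.findall(text)
print(matches)
```

Note that the pattern is not anchored, so it would also match inside a longer word such as "Kasparov".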
The thing is: this definition by no means reflects how insanely difficult it is to learn them. There is no other way than repeatedly using them until you eventually start memorizing them, a process that is particularly arduous and unintuitive. This is what Regex Crossword aims to ease, by hosting a series of Sudoku-like puzzles to help you master them; it still won’t be an easy process, but at least it may be more fun…
On the left you can see a typical puzzle, in which you must carefully observe the regexes that represent every row and column, and write in each cell a letter from a selected list that matches both criteria. You can always press the "Help" button at the top right of the website to visualize a diagram with all the information you need. As you can see, it’s easy to understand the rules, but hard to master them.
The website features several sections to make the levels as varied as possible. There is also another area which includes levels made by other users, along with a stats page. Also, if you check the Help and FAQ section, you will find recommendations for other tools and online resources in case you want to learn a bit more about regexes. Don’t forget to use an account so that your progress on the levels can be saved.
Finally, although this project is "something we do for fun", you can donate via PayPal or several cryptocurrencies (check the Help and FAQ section to see which ones are available) to help with hosting expenses and to keep ensuring further improvements and levels.
Visit Regex Crossword via the following link.
Weird. The definition sounds like a straightforward, fairly intuitive thing. What makes them so hard?
The infinite combinations you can get, and the fact that they aren't easy to read. IMO it is one of the most powerful tools for text search, but also a dangerous one (the day you create your RE it totally makes sense and is hyper intuitive... six weeks later you want to cry).
I see. So the cautionary tale for young programmers is, if you aren't solid in your commenting and variable names, you'll be condemned to use this stuff.
I did some commenting on a few regexes I had to write at work the other day... and two days later, when I revisited them, they wouldn't work. Turned out that in the rush to leave the office I had confused some symbols and messed them up xD
Nice. Next step - regex golf! https://xkcd.com/1313/
This might have been inspiration for the comic: https://alf.nu/RegexGolf
Oh man, finally a way to (hopefully) learn regex better!
The linked site, https://regex101.com/ is actually nice for situations like that. It parses the regular expression into components and explains what they do.
As for the crossword itself, it's actually doable and I completed almost all the official puzzles; only the hexagonal one is unfinished.
Unlike programming languages, you don’t have meaningful variable or function names to rely on. It’s just symbols and a lot of nesting of rules and patterns. So it’s about as legible as minified code, if not less.
For example, this very simple pattern will match semi-colons that appear at the start or end of a line, or are followed by another semi-colon (I use it to clean up a list after removing duplicate entries):
^;+|;(?=;)|;$
Every single character has significance, a purpose of its own. Namely:
<start of line><semi-colon><1 or more><OR><semi-colon><start of group><look ahead (without including in results)><match><semi-colon><end of group><OR><semi-colon><end of line>
It’s very hard to come up with the exact pattern you are trying to match (without getting false positives), and often as hard to remember what it’s trying to match if you come back to it after the fact. If someone else made it and you don’t even know what it’s looking for, you have to decipher it piece by piece.
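A minimal Python sketch of that clean-up, assuming the pattern is applied with `re.MULTILINE` so that `^` and `$` match at every line break. The helper name `clean_list` and the sample input are hypothetical, just there to make the breakdown above concrete.

```python
import re

# ^;+     one or more semi-colons at the start of a line
# ;(?=;)  a semi-colon immediately followed by another one (lookahead,
#         so the following semi-colon is not consumed)
# ;$      a semi-colon at the end of a line
STRAY_SEMICOLONS = re.compile(r"^;+|;(?=;)|;$", re.MULTILINE)


def clean_list(text: str) -> str:
    """Remove leading, trailing and doubled semi-colons (hypothetical helper)."""
    return STRAY_SEMICOLONS.sub("", text)


# Made-up input, as if duplicate entries had just been deleted from a list.
print(clean_list(";a;;b;;;c;"))  # prints: a;b;c
```

Because the lookahead only removes a semi-colon when another one follows, the last semi-colon of each run survives and keeps working as the separator.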
Last edited by Salvatos on 1 Apr 2020 at 7:29 pm UTC
"Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems. "
There are some things in life that are easy to understand, and there are some things that are difficult. The really tricky ones are those that you think you understand perfectly, but it turns out you couldn't be more wrong. Regexes are like that for me. The rules all make perfect logical sense, but whenever I try to follow them I come up with something that totally fails to work as expected.
Fortunately, we have things like txt2regex. (The output from which always leaves me saying, “But that's what I did! Isn't it?”. Oh, well. It works.)
This is pretty straightforward as far as regex goes, but I'm struggling with crap like this while editing a wiki. Not least because I probably need about 50 rules, not all looking like this. So yeah, a million things can go wrong. Thankfully there are excellent sites such as https://regexr.com/ (LOVE the cheatsheet on the left), that make it a bit more tolerable.
(\+?\d*\s?[–-]?\s?\d*\s?%?\s?)(\{\{Bleeding3\}\})
(\+?\d*%?)(\s?[–-]\s?)(%?\s)(\{\{\w*\}\})
The main point of regexes is that they can use wildcards: symbols which don't match a literal symbol, but any of a (possibly very large) set of (possibly combinations of) literal symbols. (Like using '*' on the command line, but much more powerful and complex.) Using them for any but the simplest tasks thus usually requires a lot of abstract, symbolic thought, and generally people aren't too good at following the combinatorially explosive number of consequences that can come from changing a single symbol in the regex.
Like, imagine scanning a webpage to find a company's mailing address. As humans, we've got a lot of in-built tolerance for what an "address" might look like—depending on where the company is in the world it might have more or fewer fields than would be the norm in our home country, maybe the street or city name contains multiple separate words, etc. But we have a mental "prototype" of roughly what an address looks like and can use fuzzy logic to recognize things which, while deviating from the strict image of that prototype, are still addresses. Now imagine trying to get a computer to do the same thing; regexes allow you to encapsulate that insane amount of flexibility via (appropriately complex) combinations of wildcards. (Actually attempting to set up a regex that could handle addresses sounds like an unutterable nightmare. :O)
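As a small illustration of how much a single symbol can change, here is a rough Python sketch comparing a greedy wildcard with its lazy counterpart. The sample string and variable names are made up; the two patterns differ only by one `?`.

```python
import re

# Made-up sample text for illustration.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".+" grabs as much as it can, so the match runs from the
# first "<" all the way to the last ">".
greedy = re.findall(r"<.+>", text)

# Lazy: ".+?" grabs as little as it can, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", text)

print(greedy)
print(lazy)
```

With the greedy version the whole string comes back as one match, while the lazy version returns the four tags separately.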
Guest Writer
February 2016 - September 2016
June 2019
December 2019 - April 2020
Contributing Editor
September 2016 - July 2017
Opinions at the moment of writing the articles were mine, though in some cases contents were edited or critical information was added by GOL Editors before approval.