Parser combinators are essentially a neat and powerful way of declaratively writing a parser, which could be used for writing your own programming language, or DSL. You make small parsers that act like building blocks and then combine them to create more complex parsers until you have one that encompasses them all. There are quite a few parser libraries, so you can probably use your programming language of choice with all its power and expressiveness to write your grammar rules.
I’ve been meaning to look at parser combinators for a while, ever since I read a Fog Creek blog post about using the fparsec library in F#. Being able to search specifying field names via free-text seems to be quite popular – Github does it too.
I’m glad I saw this, because recently at work we had the same challenge of allowing users to search in real-time through many rows of data. We wanted to allow them to specify individual fields so that searches could be more granular. We also wanted to allow them to omit the field names and search through all of the possible fields for speed.
I’ve created a sample repo that demonstrates the end effect. A list of countries are displayed with their name and capital city. The search box can be used to search via name, capital city or both. If you don’t want to clone and run the repo, this is what it looks like:
The code that actually defines the parser is reproduced in a gist:
Hopefully it’s fairly clear what the code is doing, but I’ll break it down anyway.
At the top level the parser is composed of alternatively (Parsimmon.alt) “field” matches or “any” (field being when a user specifies field names and any when they don’t care which field the value is in). First the parser tries to match at least one field name and search values, but if it can’t then it will use Parsimmon.all, which simply matches the whole input text.
The individual fields are found by matching a sequence (Parsimmon.seq) of parsers. We have optional whitespace, the field name, optional whitespace, a colon character, optional whitespace, a regular expression to match any characters apart from whitespace to make up the search value, and finally more optional whitespace.
I hope you agree that the code is more readable, terse and extendable than manually splitting the string, disregarding whitespace and pulling out the intended search values.