How to Use Column Names in QUERY - Google Sheets
2022-11-12 • 4 min read
In this article I will explain how to refer to the columns with their header when using the Google Sheets QUERY function.
We will be doing this by creating an equivalent Named Function _BETTERQUERY() which you can import from here.
Named Function
Usage example
=_BETTERQUERY(A1:B7, "select `name` where `age` > 18", 1)
Formula definition
=QUERY({range},IF(IFERROR(SPLIT(better_query,"`")=better_query,1),better_query,REGEXREPLACE(REDUCE(better_query,REGEXEXTRACT(better_query,REGEXREPLACE(REGEXREPLACE(better_query,"([()\[\]{}|\\^$.+*?])","\\$1"),"`(.*?)`","`($1)`")),LAMBDA(acc,cur,SUBSTITUTE(acc,cur,IFNA("Col"&MATCH(cur,INDEX(range,1),0),cur),1))),"`(Col\d+)`","$1")),headers)
Argument placeholders
- range - The range of cells to perform the query on.
- better_query - The query to perform, written in the Google Visualization API Query Language. The columns can either be referred to with their header or with the Col# notation.
- headers - The number of header rows at the top of data. (Note that this parameter is not optional since named functions do not currently allow optional parameters)
Explanation
The main goal of the _BETTERQUERY() Named Function is to turn a query from this:
select `name` where `age` > 18
to this:
select Col1 where Col2 > 18
So our built-in QUERY function can understand it and execute it.
We will be doing this in three simple steps.
Step 1 - Extract everything between ` backquotes `
The first step is to extract everything between a chosen delimiter. Our chosen delimiter is a backquote but the choice of the delimiter is not important as long as it is not a REGEX or QUERY special character.
In order to extract the desired substrings, we must first capture them, which means to enclose them in round brackets. The function we will be using for this is REGEXREPLACE.
=REGEXREPLACE(D1,"`(.*?)`","`($1)`")
Now we can easily extract the captured subtrings with REGEXEXTRACT.
=REGEXEXTRACT(D1,D2)
The reason we first capture the substrings and then extract them and do not extract them directly is because REGEXEXTRACT() is designed to only return the first match so if we tried to extract directly from D1 it would only return name which is not what we want.
Step 2 - Substitute the column names with their corresponding column position
This step is the most challenging as it relies on an advanced function which most people are unfamiliar with. (See REDUCE)
=REDUCE(D1,D3:E3,LAMBDA(acc,cur,SUBSTITUTE(acc,cur,"Col"&MATCH(cur,A1:B1,0),1)))
The way REDUCE works is by repeatedly applying the same formula but with a variable parameter (known as current value) to a given value (known as accumulator). The accumulator is updated at each iteration with the result of the previous computation.
In our example REDUCE is equivalent to the following two formulas:
=SUBSTITUTE(D1,D3,"Col"&MATCH(D3,A1:B1,0),1) \\ background
=SUBSTITUTE(D6,E3,"Col"&MATCH(E3,A1:B1,0),1) \\ returned value
REDUCE only returns the result of the last computation — which I highlighted in green — where the amount of computations depends on the size of its second parameter D3:E3 (two in our case).
Tip
SCAN works in a very similar way but instead of only returning the result of the last computation, it returns all of them.
Step 3 - Remove backquotes
The final step is to remove the backquotes. We will be doing this with REGEXREPLACE.
=REGEXREPLACE(D4,"`(Col\d+)`","$1")
Note that in our case this step can be skipped since `Col#` is not syntactically wrong and it's equivalent to Col#. This would not be true however if we had chosen a different delimiter.
Final formula
If we put everything together we get our final formula:
=REGEXREPLACE(REDUCE(D1,REGEXEXTRACT(D1,REGEXREPLACE(D1,"`(.*?)`","`($1)`")),LAMBDA(acc,cur,SUBSTITUTE(acc,cur,"Col"&MATCH(cur,A1:B1,0),1))),"`(Col\d+)`","$1")
Where D1 is the provided query and A1:B1 is the header row of the provided data.
Note
The formula you see in Named Function is slightly more complicated as it implements error handling.
That's all. Thanks for reading.