Dijkstra's Algorithm in Google Sheets
2023-02-13 • 11 min read
In this article I will propose a possible implementation of Dijkstra's algorithm in Google Sheets using a single formula and without resorting to Google Apps Scripts.
We will first go over how the algorithm works to understand what it is exactly that we are trying to do and then I will propose a possible implementation. If you are already familiar with how the algorithm works, you can skip straight to the implementation.
How does the algorithm work?
The problem this algorithm aims to solve is to find the shortest path between a source node and all the other nodes in a weighted graph. It works by keeping track of the shortest known distance of all the nodes from the source node along with the last node via which the shortest known distance was achieved. We will keep track of this information in a 3xN table where N represents the number of nodes and 3 are the three rows: node, shortest distance (d), previous node (π).
Let's see an example. Consider the following graph:
We have to find the shortest path from node S to all the other nodes. We start by initializing the distance to S to 0 and the distance to all the other nodes to infinity.
Now starting from S we check the distances to reach the adjacent nodes and update the table.
Now we choose A which is the node with the shortest known distance among the unvisited nodes and update the table.
Now we choose E (which is the node with the shortest known distance among the unvisited nodes) and update the table.
We keep repeating this process until we have visited all the nodes. If we find a way to reach a node with a lower distance than what we have, we update the table to reflect that, similarly, if we find a way to reach a node with a greater distance than what we have, we keep the table as it is. Let's continue by visiting B.
And then H.
Going from H to E would cost us more than 5, so we only update the distance for G and proceed with visiting it.
The shortest known distance to reach D was 20 but we can reach D through G with a distance of 12 so we update the table to reflect that and proceed with visiting D.
From D we cannot go anywhere, so we leave everything as it is and proceed with visiting the remaining nodes starting from C since it's the one with the shortest known distance.
Now we visit F which is the remaining unvisited node.
Reaching B through F would cost us more than 6, so we leave the table unchanged.
And here's our final state:
If we remove the unneeded paths, we get the so-called "shortest-path tree":
Implementation
Generic implementation
For the implementation our goal is to generate the final table which has all the information we need for us to construct the shortest-path tree.
Let's start by transferring the graph into our spreadsheet. We will represent the structure of the graph using three columns: From, To and Cost.
The order of the rows is irrelevant as any order would be representing the same graph.
Now we take the unique nodes in the From and To columns. We can do this using TOCOL in combination with UNIQUE.
=UNIQUE(TOCOL(A2:B13,3))
We will be creating a Nx3 table instead of a 3xN because I prefer having the data organized by rows rather than by columns. This is a minor detail that will not interfere with the final result.
We can now proceed with the initialization of the table by creating the d and π columns.
=ARRAYFORMULA(IF(A18:A26=A16,0,9E+99)) //distances (d) column
This formula is simply checking if the Nodes column is equal to the starting node (S), if it is, it sets the distance to 0, otherwise it sets it to 9E+99 (this is just a very high number meant to simulate infinity).
Now we can create the π column:
=ARRAYFORMULA(IF(N(A18:A26)+1,"-")) // previous nodes (π) column
N(value) will either return 0 if value is a text value or value if value is a numerical value. Since in Sheets every non-zero number is truthy, by adding 1 we ensure that whatever it returns is a truthy value; this forces the IF to return a column of -, effectively initializing all the previous nodes.
Now we visit the unvisited node with the shortest known distance from the source and update the table. We will be appending a + next to the node name to indicate that it has been visited.
We start by filtering out the visited nodes and their distances.
=FILTER(A18:B26,RIGHT(A18:A26)<>"+")
So far we haven't visited any node, so the table returned by this formula is the same as our initial table.
Now that we have a table of the unvisited nodes, we can determine which one is the closest to the source using XLOOKUP combined with MIN:
=XLOOKUP(MIN(F18:F26),F18:F26,E18:E26)
Now we can filter from the graph (A2:C13) the nodes adjacent to the closest unvisited node.
=FILTER(B2:C13,A2:A13=G18)
We now have all the information we need to update the table.
Let's start by appending a + to the node we are visiting:
=ARRAYFORMULA(A18:A26&IF(A18:A26=G18,"+",))
Now we can apply the algorithm to update the table:
0. =ARRAYFORMULA(
1. LET(nodes,A18:A26,
2. d,B18:B26,
3. π,C18:C26,
4. adj_nodes,H18:H19,
5. adj_d,I18:I19,
6. new_d,IFNA(VLOOKUP(nodes,SORT({adj_nodes,adj_d},2,1),2,0),d)+MIN(F18:F26),
7. IF(new_d<d,{new_d,IF(N(nodes)+1,G18)},{d,π})))
Let's analyze the formula:
Lines 1-5 are self explanatory, we are creating the variables nodes, d, π, adj_nodes, adj_d and assigning them respectively A18:A26, B18:B26, C18:C26, H18:H19, I18:I19.
In line 6 we are creating the variable new_d which represents the new distances. The new distances are calculated by running a VLOOKUP where the nodes are the search key, the adj_nodes are the search column and the adj_d are the result column, if the VLOOKUP doesn't find a match, we return the distances. After running the VLOOKUP we add MIN(F18:F26) to the result which represents the cost of the node that is currently being visited.
In line 7 we are checking if the new distances (the ones we have just calculated) are less than the distances we currently have, if that's the case we update the table with the new distances and the node we are currently visiting, if that's not the case we leave the table unchanged.
Now we have to keep repeating this process until all the nodes have been visited.
We start by copying the range E17:I26 and pasting it in E28:
Then we copy A28:I37 and keep pasting it below until all the Nodes have a + appended to them.
All the copy-pasting, including the final table, can be replicated by a single formula using the REDUCE function. But first we will condense each step into one formula that only depends on our initial graph and the starting node. This is easily done by replacing all the ranges with the formulas that are generating them:
=ARRAYFORMULA(
LET(nodes,UNIQUE(TOCOL(A2:B13,3)),
d,IF(nodes=A16,0,9E+99),
π,IF(N(nodes)+1,"-"),
unv_nodes,FILTER(nodes,RIGHT(nodes)<>"+"),
unv_d,FILTER(d,RIGHT(nodes)<>"+"),
unv_closest,XLOOKUP(MIN(unv_d),unv_d,unv_nodes),
adj,SORT(FILTER(B2:C13,A2:A13=unv_closest),2,1),
new_d,IFNA(VLOOKUP(nodes,SORT(adj,2,1),2,0),d)+MIN(unv_d),
{nodes&IF(nodes=unv_closest,"+",),IF(new_d<d,{new_d,IF(N(nodes)+1,unv_closest)},{d,π})}))
You may have noticed that the very first step of initializing the table is unnecessary since by knowing the starting node we can directly proceed with visiting it as it's guaranteed to be the one with the shortest distance. I have added that step for completeness.
Now that we have one formula that depends on the initial graph and the starting node, it's easy to turn it into a REDUCE formula that keeps applying the algorithm until all the nodes have been visited. Recall the syntax of REDUCE:
REDUCE(initial_value, array_or_range, LAMBDA(accumulator, current_value, ...))
Our initial_value is the initialized table that we have done in the very first step, which consists of {nodes,d,π}.
The array_or_range argument will only be used to tell REDUCE how many iterations to run. We have to visit each node once, so the number of iterations depends on the number of nodes, which means that we can just provide nodes as the second argument.
REDUCE({nodes,d,π}, nodes, LAMBDA(accumulator, current_value, ...))
And the entire formula above will be our LAMBDA, with the only difference that nodes, d and π will depend on the accumulator, specifically:
_nodes will be INDEX(acc,,1)
_d will be INDEX(acc,,2)
_π will be INDEX(acc,,3)
I have prepended an underscore _ before the variables to indicate that the _nodes, _d and _π inside our LAMBDA, apart from the first iteration, are not the same nodes, d and π that we've provided as our initial_value, rather, they depend on the result of the LAMBDA computation.
If we put everything together, the formula to generate the final table is the following:
=ARRAYFORMULA(
LET(nodes,UNIQUE(TOCOL(A2:B13,3)),
d,IF(nodes=A16,0,9E+99),
π,IF(N(nodes)+1,"-"),
REDUCE({nodes,d,π},nodes,LAMBDA(acc,i,
LET(_nodes,INDEX(acc,,1),
_d,INDEX(acc,,2),
_π,INDEX(acc,,3),
unv_nodes,FILTER(_nodes,RIGHT(_nodes)<>"+"),
unv_d,FILTER(_d,RIGHT(_nodes)<>"+"),
unv_closest,XLOOKUP(MIN(unv_d),unv_d,unv_nodes),
adj,SORT(FILTER(B2:C13,A2:A13=unv_closest),2,1),
new_d,IFNA(VLOOKUP(_nodes,SORT(adj,2,1),2,0),_d)+MIN(unv_d),
{_nodes&IF(_nodes=unv_closest,"+",),IF(new_d<_d,{new_d,IF(N(_nodes)+1,unv_closest)},{_d,_π})})))))
Now that we have the final table we can pretty much do anything with it. So let's construct the shortest-path tree.
Constructing the shortest-path tree
We know the shortest distance from S to every other node but we also know how to reach every other node, that's the purpose of the π column. Let's see an example of how we can do that.
Recall the final table:
Let's say we want to reach E, we consult the table, find E and see what it previous node (π) is, in our case it's A, so we get the path A->E. Now we see what's the previous node for A, in our case it's S, so we have S->A->E. Since S is our starting node we stop here and we conclude that the shortest path from S to E is S->A->E.
Let's translate this to Google Sheets formulas. We will again be using REDUCE.
If we call nodes the nodes column and f_π the previous nodes column of the final table, the formula that does what we have just seen is:
REGEXEXTRACT(REDUCE(target,nodes,LAMBDA(acc,_,XLOOKUP(IFNA(REGEXEXTRACT(acc,"(.+?)->"),acc),nodes,new_π&"->"&acc,acc))),"->(.+)")
This is a repeated XLOOKUP where at each iteration the search_key is everything that comes before the first occurence of ->.
To get the cost, the formula is:
=XLOOKUP(target,nodes,new_d)
And finally, to iterate across all the nodes we can use the MAP function.
MAP(nodes,LAMBDA(target,LET(
path,REGEXEXTRACT(REDUCE(target,nodes,LAMBDA(acc,_,XLOOKUP(IFNA(REGEXEXTRACT(acc,"(.+?)->"),acc),nodes,new_π&"->"&acc,acc))),"->(.+)"),
{IF((path=target)*(path<>source),source&"->"&path,path),
XLOOKUP(IFNA(REGEXEXTRACT(path,".+->(.+)"),IF(path=source,path)),nodes,new_d,9E+99)})))
If we put everything together, this is what we get:
=ARRAYFORMULA(
LET(nodes,UNIQUE(TOCOL(A2:B13,3)),
d,IF(nodes=A16,0,9E+99),
π,IF(N(nodes)+1,"-"),
table,REDUCE({nodes,d,π},nodes,LAMBDA(acc,i,
LET(_nodes,INDEX(acc,,1),
_d,INDEX(acc,,2),
_π,INDEX(acc,,3),
unv_nodes,FILTER(_nodes,RIGHT(_nodes)<>"+"),
unv_d,FILTER(_d,RIGHT(_nodes)<>"+"),
unv_closest,XLOOKUP(MIN(unv_d),unv_d,unv_nodes),
adj,SORT(FILTER(B2:C13,A2:A13=unv_closest),2,1),
new_d,IFNA(VLOOKUP(_nodes,SORT(adj,2,1),2,0),_d)+MIN(unv_d),
{_nodes&IF(_nodes=unv_closest,"+",),IF(new_d<_d,{new_d,IF(N(_nodes)+1,unv_closest)},{_d,_π})}))),
new_d,INDEX(table,,2),
new_π,INDEX(table,,3),
MAP(nodes,LAMBDA(target,LET(
path,REGEXEXTRACT(REDUCE(target,nodes,LAMBDA(acc,_,XLOOKUP(IFNA(REGEXEXTRACT(acc,"(.+?)->"),acc),nodes,new_π&"->"&acc,acc))),"->(.+)"),
{path,IF(A16=path,0,XLOOKUP(REGEXEXTRACT(path,".+->(.+)"),nodes,new_d,9E+99))})))))
We can also create a Named Function.
Named function
Calculates the shortest path from the source node to all the other nodes using the Dijkstra's algorithm.
Syntax
_DIJKSTRA(from, to, cost, source)
Usage example
=_DIJKSTRA(A2:A13, B2:B13, C2:C13, "S")
Formula definition
=SORT(IF(OR(ROWS(from)<>ROWS(to),ROWS(from)<>ROWS(cost),ROWS(to)<>ROWS(cost)),NA(),LET(nodes,UNIQUE(TOCOL({from,to},3)), d,IF(nodes=source,0,9E+99), π,IF(N(nodes)+1,"-"), table,REDUCE({nodes,d,π},nodes,LAMBDA(acc,i,LET( _nodes,INDEX(acc,,1), _d,INDEX(acc,,2), _π,INDEX(acc,,3), unv_nodes,FILTER(_nodes,RIGHT(_nodes)<>"+"), unv_d,FILTER(_d,RIGHT(_nodes)<>"+"), unv_closest,XLOOKUP(MIN(unv_d),unv_d,unv_nodes), adj,SORT(FILTER({to,cost},from=unv_closest),2,1), new_d,IFNA(VLOOKUP(_nodes,SORT(adj,2,1),2,0),_d)+MIN(unv_d), {_nodes&IF(_nodes=unv_closest,"+",),IF(new_d<_d,{new_d,IF(N(_nodes)+1,unv_closest)},{_d,_π})}))), new_d,INDEX(table,,2), new_π,INDEX(table,,3), paths,MAP(nodes,LAMBDA(target,LET( path,REGEXEXTRACT(REDUCE(target,nodes,LAMBDA(acc,_,XLOOKUP(IFNA(REGEXEXTRACT(acc,"(.+?)->"),acc),nodes,new_π&"->"&acc,acc
))),"->(.+)"), {IF((path=target)*(path<>source),source&"->"&path,path),XLOOKUP(IFNA(REGEXEXTRACT(path,".+->(.+)"),IF(path=source,path)),nodes,new_d,9E+99)}))),paths)),2,1)
Argument placeholders
- from - The column containing the nodes that the arrows start from.
- to - The column containing the nodes that the arrows point to.
- cost - The weight of the paths.
- source - The source node.
You can import the _DIJKSTRA function from here.
And that's it. Thanks for reading.