Goto Definition functionality in IDEs is essential to modern software development. It works beautifully when used inter-procedurally to navigate to functions, methods, data specifications, and other top-level declarations.
When used locally on variables, Goto Definition leaves much to be desired. Many language servers navigate to declarations of variables. If the language allows conditional definitions or definitions by first assignment, then we end up in the first or the last assignment. This is unhelpful:
- In typed languages with declarations, the information revealed at going to the declaration is redundant as the hover functionality typically already shows the relevant information without jumping around.
- In all languages, the first assignment is rarely useful unless the variable is used as a constant. The last assignment can be useful, but also misleading when the control flow is not linear due to conditionals and loops. In essence, they use a subtly wrong notions of first and last.
We propose an alternative that we shall call Goto Assignment that takes us to the most recent assignments (plural!) following the control flow of the program. Here’s the navigation in action.
The rest of this post first explores what is wrong with local Goto Definition and gradually builds up to Goto Assignment. Then we examine how this navigation is intimately related to SSA. Despite the conceptual clarity of SSA, it is not the most practical implementation strategy, so we sketch the Goto Assignment computation using a (simple) Haskell tree walk. Finally, we discuss how this feature might manifest in more sophisticated languages and some natural extensions to the idea for a more fine-grained experience.
You can find an LSP server implementation in Haskell on a simple While language as well as a VSCode extension that talks to the LSP server in this repo and try it for yourself.
Everything that is wrong
Let’s arrive at a better Goto Definition through examples. We don’t explicitly
introduce the While language. It is a standard imperative language. To aid the
presentation, we shall use GTA x
statement to point to the
definition/assignment of x
at the time the statement is used.
Consider the following conditional initialisation:
= 0
x :
if rand {
GTA x
= 42
x :}
GTA x
If you were to navigate to the definition of x
from the GTA x
at the end,
depending on the language server implementation, we’d end up either at x := 0
or x := 42
. Neither are helpful. Depending on what rand
evaluates to, x
can be either of those things.
So perhaps, the answer is to list all assignments to the variable? Not quite.
We wouldn’t want GTA x
within the if
statement to take us to x := 42
.
So how about all preceding assignments? That’d contain all the information to
understand what x
is, but it’ll generally contain too many irrelevant
assignments. Consider a linear sequence of assignments:
= 0
x := x + 1
x := x + 1
x := x + 1
x := x + 1
x := x + 1
x := x + 1
x :
GTA x
If code navigation offers all of these assignments as options, it’d be pretty annoying. While debugging, we often observe some outcome (perhaps through logs or tests) and would like to work backwards to find what could have led to that outcome. Seeing all possible assignments doesn’t help us with this at all! It’d be better to follow the chain backwards gradually.
What’s worse is this would also give us the “dead” assignments. Particularly in testing code, variables are often reassigned from scratch for each test case. For example,
= 0
given := 42
expected :
# test with given and expected
= 1
given := 43
expected :
# test with given and expected
GTA given
GTA expected
We wouldn’t want navigation results to include given := 0
and expected := 42
. As those assignments are dead.
Then what we want is a set of most recent assignments. But what is recent (and I promise I’m not being academic)? Consider the following while loop,
= 42
x :while x {
GTA x
= x - 1
x :}
In this example, the most recent assignments to x
at the point the definition
is requested include x := 42
, but it also includes x := x - 1
. So we are
not working with a spatial, but a temporal notion.
All in all, with all of these examples, we reach the conclusion that to effectively navigate assignments to a binder, we need to
- account for control flow
- potentially present multiple options
- only present the most recent options
Relationship to Single Static Assignment
Static single assignment form is an intermediate representation where each assignment is to a unique variable. This simplifies program analysis and correspondingly many optimisations.
What does SSA have anything to do with code navigation? As it turns out, if the program is in the SSA form, computing what I described in the previous section is trivial.
Let’s see what a sequence of assignment in, SSA-form looks like.
= 0
x :GTA x
= x + 1
x :GTA x
= x + 1
x :GTA x
becomes
= 0
x1 :GTA x1
= x1 + 1
x2 :GTA x2
= x2 + 1
x3 :GTA x3
Finding the most recent assignment is trivial. We simply look up the unique assignment for the target binder.
SSA deals with conditionals via phi nodes these are used to consolidate unique assignments from different branches into the same variable. For example,
if rand {
= 1
x :} else {
= 2
x :}
GTA x
becomes
if rand {
= 1
x1 :} else {
= 2
x2 :}
= phi(x1, x2)
x3 :GTA x3
The assignment involving the phi node does not have a corresponding assignment in the source code, but it provides exactly the necessary information to collect the most recent assignments. When we want to navigate to assignments,
- if there is a proper (unique) assignment, we make a note of it,
- if it is a pseudo assignment involving a phi node, then we follow the assignments for the arguments of the phi node and repeat the process
in the end, we end up with a collection of most recent assignments that can be mapped back to assignments in the source code.
The exploration of the idea through SSA makes the concept crystal clear in my mind. Although it is not a measure for “correctness”, it strenghtens my belief that navigating in this way is somehow meaningful or natural.
Sketching an implementation in Haskell
As tempting as it is to provide an implementation via conversion to the SSA form and back, it would be arduous as we’d need to render the program as a graph first, compute the dominance frontiers, do renaming based on that information, and finally compute the Goto Assignment table.
It is much simpler to maintain an environment consisting an assignments-in-effect table and a Goto-Assignment table. We then populate these while walking an abstract-syntax tree (AST). We now explore how to do that in a generic way that yields the first and last assignment implementations for free.
Let’s first look at the AST of the While language.
type Program id = Block id
type Block id = [Statement id]
data Statement id where
Assignment (Variable id) (Expression id)
| Ite (Expression id) (Block id) (Block id)
| While (Expression id) (Block id)
data Expression id =
EConst Int
| EVar (Variable id)
| EPlus (Expression id) (Expression id)
In While programs, we have assignment, if-then-else, and while statements in a list. The exressions are constants, variables, and addition. We abstract over identifiers which can be a numbering of nodes or simply positions.
Walking the AST
We now traverse a given program and update tables within the environment in the
process. The first higher-order function parameter choose
definitions and the
second one union
s them to handle join points in the control flow.
goToDefSimple :: forall id.
Ord id
=> (S.Set id -> S.Set id -> S.Set id)
-> (S.Set id -> S.Set id -> S.Set id)
-> Program id
-> GoToDef id
Skipping over block folding, let’s focus on statement handling.
We start with assignments first:
@(defsInEffect, _) (Assignment var@(Variable id str) expr) =
goSt env
(defsInEffect', goToDefTbl')where
= chooseDef str (Set.singleton id) defsInEffect
defsInEffect' = (defsInEffect', goExpr env expr)
env' = goVar env' var goToDefTbl'
We first handle the right-hand side (RHS) of the assignment using the input
environment (since the assignment hasn’t occurred yet). This extends the Goto
Assignment table by looking up the assignment in effect and adding entries for
the id
s of the expressions on the RHS.
Then we choose
between the assignment being analysed and the definition in
the environment.
Then we look at the conditionals:
@(defsInEffect, _) (Ite e th el) =
goSt env
(unionDef defsInEffectTh defsInEffectEl, goToDefTbl'')where
= goExpr env e
goToDefTbl = goBlock (defsInEffect, goToDefTbl) th
(defsInEffectTh, goToDefTbl') = goBlock (defsInEffect, goToDefTbl') el (defsInEffectEl, goToDefTbl'')
For the condition, we extend the Goto Assignment table just as we did for the
RHS of the assignment. Then we evaluate the then
and else
blocks extending
the Goto Assignment as we go along. For both blocks we use the input
assignments in effect. Note that the assignments-in-effect table is not
threaded and identical in each branch as the assignments in these blocks are
independent and not visible to each other.
Crucially, we union
the assignments-in-effect tables produced by each block
on exit to account for the join points.
Finally, we look at the most challenging construct, i.e., the while loops:
While e body) = drive env
goSt env (where
= if env' == step env'
drive env' then env'
else drive $ step env'
@(defsInEffect, _) = (defsInEffect'', goToDefTbl')
step env'where
= goExpr env' e
goToDefTbl = (defsInEffect, goToDefTbl)
env'' = goBlock env'' body
(defsInEffect', goToDefTbl') = unionDef defsInEffect defsInEffect' defsInEffect''
We compute the fixpoint of the step
function. In each step, we union
the
definitions in effect before going around the loop with the one at the end of
the body of the loop.
The attentive reader will be scratching their head for our use of fixpoints for a Goto Assignment functionality. It seems like an overkill, but it is necessary, consider the following program:
= 0
x :while x {
while x {
GTA x
= x + 1
x :}
= x + 2
x :}
The most recent assignments include all three assignments for GTA x
. Without
iterating, we would miss out on x := x + 2
. Exercise to the reader, work out
why. Instead of unbounded recursion, we can in fact statically determine the
number of iterations needed (number of nested loops). But for the sake of
simplicity, we just iterate and allow it to converge for every well-formed
program.
Choosing and unioning assignments
It is time for the magic trick, we shall get the desired Goto Assignment
implementation with the following choose
and union
choices:
choose :: S.Set id -> S.Set id -> S.Set id
= const
choose
union :: S.Set id -> S.Set id -> S.Set id
= S.union union
The choice for union
as set union is easy to explain, we want all possible
recent results. That is if there is one assignment in each branch of a
conditional, on exit, it could be either of them. Hence, both definitions will
be in effect.
To understand the choice for const
, requires us to recall the traversal for assignments:
chooseDef :: String -> S.Set id -> DefsInEffect id -> DefsInEffect id
= M.insertWith choose
chooseDef
-- ...
= chooseDef str (Set.singleton id) defsInEffect defsInEffect'
chooseDef
using const
favours the assignment that is being added to the
assignments-in-effect table over whatever might already be in the table. This
allows us to keep the most recent assignment.
The choices to compute the first and last assignments (source-code distance-wise) instead looks like the following:
chooseFirst :: S.Set id -> S.Set id -> S.Set id
= min
chooseFirst
unionFirst :: S.Set id -> S.Set id -> S.Set id
= const
unionFirst
chooseLast :: S.Set id -> S.Set id -> S.Set id
= const
chooseLast
unionLast :: S.Set id -> S.Set id -> S.Set id
= max unionLast
Once again, the reason to figure out why these choices are correct are left as exercises to the reader. Can you figure out why the choices for the last assignment might yield an unexpected result under this implementation? How would you fix it? The reader can consult the test cases of the implementation to examine the differences between the first, last, and most recent assignments differences.
Relationship to other languages
Let’s first talk about the languages where the most recent singular definition is as good as gold. The pure fragment of OCaml and the non-monadic fragment of Haskell are examples to this. Consider the following program:
let x =
if rand
then let x = 1 in x
then let x = 2 in x
in
x
In this program, we are forced to put the let binding around the if
before
referencing x
, so looking at the syntactically closest (and largest)
definition is effectively what we want. A similar example can be written in
Haskell.
How about an imperative language with pass-by-reference such as mutable references in Rust or C++? We could extend the idea of Goto Assignment to Goto Mutation. In addition to considering assignments to update the assignments-in-effect table, we’d also take a note of function and method call arguments that are passed by reference.
How about object-oriented languages such as Java or Ruby? Each property assignment can be considered as an assignment or mutation to the receiver of the property. Once again we can update the assignments-in-effect table accordingly. How about the property itself? For properties, the standard Goto Definition functionality is likely the most useful navigation in general, but one could also track Goto Assignment on properties locally separate from the receiver of the property. This would make Goto Assignment a field-sensitive navigation. That in turn introduces a number of hard problems related to aliasing that any field-sensitive analysis would face.
Conclusion
We looked at Goto Assignment navigation in detail, its relationship to SSA, a simple implementation for it, and some natural extensions to the idea with more sophisticated languages in mind.
My sincere hope is that people take this idea and implement it in all my favourite languages =)
I also wondered how one would measure the efficacy of this navigation over the standard local Goto Definition implemented in many LSP servers today. Besides qualitative assessments, I cannot think of a good way of comparing one navigation tool to the other. Please do reach out if you have ideas.