Posted on August 24, 2023 by Mistral Contrastin

Goto Assignment: A better local Goto Definition

In which we discuss a novel local code navigation more useful for exploring source code than Goto Definition. We explore its relationship to static single assignment form (SSA) intermediate representation. This post comes with a proof-of-concept Haskell LSP server and a Visual Studio Code (VSCode) extension for a simple While language.

Goto Definition functionality in IDEs is essential to modern software development. It works beautifully when used inter-procedurally to navigate to functions, methods, data specifications, and other top-level declarations.

When used locally on variables, Goto Definition leaves much to be desired. Many language servers navigate to declarations of variables. If the language allows conditional definitions or definitions by first assignment, then we end up in the first or the last assignment. This is unhelpful:

We propose an alternative that we shall call Goto Assignment that takes us to the most recent assignments (plural!) following the control flow of the program. Here’s the navigation in action.

Goto Assignment VSCode Demo

The rest of this post first explores what is wrong with local Goto Definition and gradually builds up to Goto Assignment. Then we examine how this navigation is intimately related to SSA. Despite the conceptual clarity of SSA, it is not the most practical implementation strategy, so we sketch the Goto Assignment computation using a (simple) Haskell tree walk. Finally, we discuss how this feature might manifest in more sophisticated languages and some natural extensions to the idea for a more fine-grained experience.

You can find an LSP server implementation in Haskell on a simple While language as well as a VSCode extension that talks to the LSP server in this repo and try it for yourself.

Everything that is wrong

Let’s arrive at a better Goto Definition through examples. We don’t explicitly introduce the While language. It is a standard imperative language. To aid the presentation, we shall use GTA x statement to point to the definition/assignment of x at the time the statement is used.

Consider the following conditional initialisation:

x := 0

if rand {
  GTA x
  x := 42
}
GTA x

If you were to navigate to the definition of x from the GTA x at the end, depending on the language server implementation, we’d end up either at x := 0 or x := 42. Neither are helpful. Depending on what rand evaluates to, x can be either of those things.

So perhaps, the answer is to list all assignments to the variable? Not quite. We wouldn’t want GTA x within the if statement to take us to x := 42.

So how about all preceding assignments? That’d contain all the information to understand what x is, but it’ll generally contain too many irrelevant assignments. Consider a linear sequence of assignments:

x := 0
x := x + 1
x := x + 1
x := x + 1
x := x + 1
x := x + 1
x := x + 1

GTA x

If code navigation offers all of these assignments as options, it’d be pretty annoying. While debugging, we often observe some outcome (perhaps through logs or tests) and would like to work backwards to find what could have led to that outcome. Seeing all possible assignments doesn’t help us with this at all! It’d be better to follow the chain backwards gradually.

What’s worse is this would also give us the “dead” assignments. Particularly in testing code, variables are often reassigned from scratch for each test case. For example,

given := 0
expected := 42

# test with given and expected

given := 1
expected := 43

# test with given and expected

GTA given
GTA expected

We wouldn’t want navigation results to include given := 0 and expected := 42. As those assignments are dead.

Then what we want is a set of most recent assignments. But what is recent (and I promise I’m not being academic)? Consider the following while loop,

x := 42
while x {
  GTA x
  x := x - 1
}

In this example, the most recent assignments to x at the point the definition is requested include x := 42, but it also includes x := x - 1. So we are not working with a spatial, but a temporal notion.

All in all, with all of these examples, we reach the conclusion that to effectively navigate assignments to a binder, we need to

Relationship to Single Static Assignment

Static single assignment form is an intermediate representation where each assignment is to a unique variable. This simplifies program analysis and correspondingly many optimisations.

What does SSA have anything to do with code navigation? As it turns out, if the program is in the SSA form, computing what I described in the previous section is trivial.

Let’s see what a sequence of assignment in, SSA-form looks like.

x := 0
GTA x
x := x + 1
GTA x
x := x + 1
GTA x

becomes

x1 := 0
GTA x1
x2 := x1 + 1
GTA x2
x3 := x2 + 1
GTA x3

Finding the most recent assignment is trivial. We simply look up the unique assignment for the target binder.

SSA deals with conditionals via phi nodes these are used to consolidate unique assignments from different branches into the same variable. For example,

if rand {
  x := 1
} else {
  x := 2
}
GTA x

becomes

if rand {
  x1 := 1
} else {
  x2 := 2
}
x3 := phi(x1, x2)
GTA x3

The assignment involving the phi node does not have a corresponding assignment in the source code, but it provides exactly the necessary information to collect the most recent assignments. When we want to navigate to assignments,

in the end, we end up with a collection of most recent assignments that can be mapped back to assignments in the source code.

The exploration of the idea through SSA makes the concept crystal clear in my mind. Although it is not a measure for “correctness”, it strenghtens my belief that navigating in this way is somehow meaningful or natural.

Sketching an implementation in Haskell

As tempting as it is to provide an implementation via conversion to the SSA form and back, it would be arduous as we’d need to render the program as a graph first, compute the dominance frontiers, do renaming based on that information, and finally compute the Goto Assignment table.

It is much simpler to maintain an environment consisting an assignments-in-effect table and a Goto-Assignment table. We then populate these while walking an abstract-syntax tree (AST). We now explore how to do that in a generic way that yields the first and last assignment implementations for free.

Let’s first look at the AST of the While language.

type Program id = Block id

type Block id = [Statement id]

data Statement id where
    Assignment (Variable id) (Expression id)
  | Ite (Expression id) (Block id) (Block id)
  | While (Expression id) (Block id)

data Expression id =
    EConst Int
  | EVar (Variable id)
  | EPlus (Expression id) (Expression id)

In While programs, we have assignment, if-then-else, and while statements in a list. The exressions are constants, variables, and addition. We abstract over identifiers which can be a numbering of nodes or simply positions.

Walking the AST

We now traverse a given program and update tables within the environment in the process. The first higher-order function parameter choose definitions and the second one unions them to handle join points in the control flow.

goToDefSimple :: forall id.
  Ord id
  => (S.Set id -> S.Set id -> S.Set id)
  -> (S.Set id -> S.Set id -> S.Set id)
  -> Program id
  -> GoToDef id

Skipping over block folding, let’s focus on statement handling.

We start with assignments first:

goSt env@(defsInEffect, _) (Assignment var@(Variable id str) expr) =
  (defsInEffect', goToDefTbl')
  where
    defsInEffect' = chooseDef str (Set.singleton id) defsInEffect
    env' = (defsInEffect', goExpr env expr)
    goToDefTbl' = goVar env' var

We first handle the right-hand side (RHS) of the assignment using the input environment (since the assignment hasn’t occurred yet). This extends the Goto Assignment table by looking up the assignment in effect and adding entries for the ids of the expressions on the RHS.

Then we choose between the assignment being analysed and the definition in the environment.

Then we look at the conditionals:

goSt env@(defsInEffect, _) (Ite e th el) =
  (unionDef defsInEffectTh defsInEffectEl, goToDefTbl'')
  where
    goToDefTbl = goExpr env e
    (defsInEffectTh, goToDefTbl') = goBlock (defsInEffect, goToDefTbl) th
    (defsInEffectEl, goToDefTbl'') = goBlock (defsInEffect, goToDefTbl') el

For the condition, we extend the Goto Assignment table just as we did for the RHS of the assignment. Then we evaluate the then and else blocks extending the Goto Assignment as we go along. For both blocks we use the input assignments in effect. Note that the assignments-in-effect table is not threaded and identical in each branch as the assignments in these blocks are independent and not visible to each other.

Crucially, we union the assignments-in-effect tables produced by each block on exit to account for the join points.

Finally, we look at the most challenging construct, i.e., the while loops:

goSt env (While e body) = drive env
  where
    drive env' = if env' == step env'
                 then env'
                 else drive $ step env'

    step env'@(defsInEffect, _) = (defsInEffect'', goToDefTbl')
      where
        goToDefTbl = goExpr env' e
        env'' = (defsInEffect, goToDefTbl)
        (defsInEffect', goToDefTbl') = goBlock env'' body
        defsInEffect'' = unionDef defsInEffect defsInEffect'

We compute the fixpoint of the step function. In each step, we union the definitions in effect before going around the loop with the one at the end of the body of the loop.

The attentive reader will be scratching their head for our use of fixpoints for a Goto Assignment functionality. It seems like an overkill, but it is necessary, consider the following program:

x := 0
while x {
  while x {
    GTA x
    x := x + 1
  }
  x := x + 2
}

The most recent assignments include all three assignments for GTA x. Without iterating, we would miss out on x := x + 2. Exercise to the reader, work out why. Instead of unbounded recursion, we can in fact statically determine the number of iterations needed (number of nested loops). But for the sake of simplicity, we just iterate and allow it to converge for every well-formed program.

Choosing and unioning assignments

It is time for the magic trick, we shall get the desired Goto Assignment implementation with the following choose and union choices:

choose :: S.Set id -> S.Set id -> S.Set id
choose = const

union :: S.Set id -> S.Set id -> S.Set id
union = S.union

The choice for union as set union is easy to explain, we want all possible recent results. That is if there is one assignment in each branch of a conditional, on exit, it could be either of them. Hence, both definitions will be in effect.

To understand the choice for const, requires us to recall the traversal for assignments:

chooseDef :: String -> S.Set id -> DefsInEffect id -> DefsInEffect id
chooseDef = M.insertWith choose

-- ...
defsInEffect' = chooseDef str (Set.singleton id) defsInEffect

chooseDef using const favours the assignment that is being added to the assignments-in-effect table over whatever might already be in the table. This allows us to keep the most recent assignment.

The choices to compute the first and last assignments (source-code distance-wise) instead looks like the following:

chooseFirst :: S.Set id -> S.Set id -> S.Set id
chooseFirst = min

unionFirst :: S.Set id -> S.Set id -> S.Set id
unionFirst = const

chooseLast :: S.Set id -> S.Set id -> S.Set id
chooseLast = const

unionLast :: S.Set id -> S.Set id -> S.Set id
unionLast = max

Once again, the reason to figure out why these choices are correct are left as exercises to the reader. Can you figure out why the choices for the last assignment might yield an unexpected result under this implementation? How would you fix it? The reader can consult the test cases of the implementation to examine the differences between the first, last, and most recent assignments differences.

Relationship to other languages

Let’s first talk about the languages where the most recent singular definition is as good as gold. The pure fragment of OCaml and the non-monadic fragment of Haskell are examples to this. Consider the following program:

let x =
  if rand
    then let x = 1 in x
    then let x = 2 in x
in
x

In this program, we are forced to put the let binding around the if before referencing x, so looking at the syntactically closest (and largest) definition is effectively what we want. A similar example can be written in Haskell.

How about an imperative language with pass-by-reference such as mutable references in Rust or C++? We could extend the idea of Goto Assignment to Goto Mutation. In addition to considering assignments to update the assignments-in-effect table, we’d also take a note of function and method call arguments that are passed by reference.

How about object-oriented languages such as Java or Ruby? Each property assignment can be considered as an assignment or mutation to the receiver of the property. Once again we can update the assignments-in-effect table accordingly. How about the property itself? For properties, the standard Goto Definition functionality is likely the most useful navigation in general, but one could also track Goto Assignment on properties locally separate from the receiver of the property. This would make Goto Assignment a field-sensitive navigation. That in turn introduces a number of hard problems related to aliasing that any field-sensitive analysis would face.

Conclusion

We looked at Goto Assignment navigation in detail, its relationship to SSA, a simple implementation for it, and some natural extensions to the idea with more sophisticated languages in mind.

My sincere hope is that people take this idea and implement it in all my favourite languages =)

I also wondered how one would measure the efficacy of this navigation over the standard local Goto Definition implemented in many LSP servers today. Besides qualitative assessments, I cannot think of a good way of comparing one navigation tool to the other. Please do reach out if you have ideas.