I recently presented “Automatic Reordering for Dataflow Safety of Datalog” (can be accessed freely if you follow the link from my homepage) in the 20th Symposium on Principles and Practice of Declarative Languages. I collaborated with Dr Dominic Orchard and Dr Andrew Rice for this work. A few commented that the paper is technically dense and consequently hard going. I don’t want this work to be ignored because I suck at writing. So in an attempt to make things right and also to do some PR, I will explain the major ideas of the paper without the formalism and with numerous examples.
I want this post to be understandable with ease. If you find yourself confused or bored at any point or found some terms that are not very well explained, contact me via Twitter or email (both on my homepage).
After explaining some standard terminology about Datalog, we start by describing what extralogical predicates are, why one might want them in Datalog, and what problems they bring along. Then we clarify what is meant by invocation safety of such predicates. This will lead us to the naïve, incomplete, and inefficient solution and that will lead us to a better solution. I’ll conclude by giving summarising the main points, mentioning where this work work might go and why you might like to look at the paper.
Terminology
Let’s briefly look at basic Datalog terminology through an example:
X,Z) :- parent(X,Y), parent(Y,Z). ancestor(
----------- -----------
Subgoal Subgoal
------------- ------------------------
Head Body
------------------------------------------
Rule
- This is a rule in Datalog (as it happens also in Prolog) with some parts highlighted.
- A rule consists of a head (to the left of
:-
) and a body (everything to the right of:-
). A body is a comma separated list of subgoals. - A subgoal is a predicate applied to a tuple.
- A predicate qualifies a tuple with a name (sort). Throughout the text, we
write subgoals in
monospace
and predicates in \(italic\). We sometimes say “the \(p\) subgoal” to refer to a subgoal formed by applying the predicate \(p\) to a tuple. We also say “the \(p\) rule(s)” to mean the rules with the predicate \(p\) featured in the head. - A tuple of subgoal consists of terms which are either constants like
"Dragon Fruit"
(with quotation marks) or variables likeFruit
(no quotations, always capitalised).
We’re good to go! All other terms will be defined in the text as we go along.
Motivation
Example 1
Here’s a hypothetical Datalog rule that authenticates a user:
User,Pass) :- password(User,Pass), hash(Pass,Hash), valid(User,Hash). auth(
The reading of any rule in Datalog is “if every subgoal in the body holds, then we can conclude the head.” In this specific case, we can authenticate a user if we have a known password for a that user and the hash of that user is known to be valid. One of the nicest parts of this reading is that it doesn’t matter which order the subgoals are presented, we still reach the same conclusion. For example,
User,Pass) :- hash(Pass,Hash), password(User,Pass), valid(User,Hash). auth(
would authenticate exactly the same users as the previous example.
When the predicates in a program are logical, there are no
dataflow restrictions for evaluation of a subgoal with that predicate. For
example, password("Milner",Pass)
would bind the password of Milner to
Pass
while password(User,Pass)
would generate all known user and
password pairs. There are no inherent input and output parameters to a logical
predicate.
However, the \(hash\) predicate stands out because hash functions (if they are any good) work only in one direction. It is unreasonable to expect \(hash\) to bind a value for the password given a hash. Let’s refer to predicates like \(hash\) (those with dataflow requirements) as extralogical predicates.
In light of this knowledge, the versions of the rule defining \(auth\) above are
not equivalent. Evaluation of the subgoal hash(Pass,Hash)
produces a
runtime error if the value of Pass
is unknown. Suddenly, we have to be careful
about the way we order subgoals. Not only those that are obviously extralogical
like \(hash\), but also any user-defined predicates that make use of extralogical
ones! To make the matters worse, although semantically arithmetic
relations such as \(>\) are logical, they are usually implemented as extralogical
predicates (or relations).
Example 2
Just when you think things are looking bleak, it gets worse. Extralogical predicates may also cause code duplication. Let’ look at another example:
Pass) :- weak(Pass,Hash).
check_client(Hash) :- weak(Pass,Hash).
check_server(
Pass,Hash) :- hash(Pass,Hash), rainbow(Pass,Hash). weak(
Here, we define a predicate \(weak\) that checks if a password or a hash is weak. It does so looking for the hash of the password in a rainbow function (a function that can reverse some hashes). Like \(hash\), \(rainbow\) is also extralogical but in its second argument rather than the first. Now it is beneficial for both the client and the server to check if a hash is weak, but the client shouldn’t know the hash of a given password (otherwise it can construct a rainbow table of its own!) and the server shouldn’t know the password (in case it gets stolen). Hence, there are two predicates \(check\_client\) that takes a password and \(check\_server\) that takes a hash that can be used by the client and server respectively.
This example is worse than the previous one because regardless how careful the
programmer is, she can’t rewrite the definition of \(weak\) in a single rule such
that all dataflow requirements are satisfied. Now when \(check\_client\) is used,
\(weak\) should have the subgoal hash(Pass,Hash)
first, since the
password is known (satisfying \(hash\)’s dataflow requirement) and evaluating this
subgoal produces a hash, satisfying the dataflow requirement of \(rainbow\). If
\(check\_server\) is used, we need the opposite ordering because this time the
hash value is known initially. So neither rule alone can be used to satisfy the
requirements of the whole program. The only solution is a code duplicating
rewrite:
Pass) :- weak_client(Pass,Hash).
check_client(Hash) :- weak_server(Pass,Hash).
check_server(
Pass,Hash) :- hash(Pass,Hash), rainbow(Pass,Hash).
weak_client(Pass,Hash) :- rainbow(Pass,Hash), hash(Pass,Hash). weak_server(
Example 3
If you’re still not sold on the idea, I have one final motivating example. Even in the purely logical implementations of Datalog, dataflow problems exist. Negation is indispensable in logic programming. However, there is a difficulty with it. Consider the following rule:
"Mistral").
accessed("Hattie").
accessed("Rebecca").
accessed(
"Hattie" ,"171717").
password("Rebecca","242424").
password(
User) :- not password(User,Pass).
guest(
?- accessed(User), guest(User).
This program poses the query “out of those who are known to access the system, which ones are guests?”. Being a guest is defined as not having a password recorded for the given user identifier. Most implementations would reject this program because of something called allowedness. This means we can only conclude facts about constants that are known to the system. In this example, we can only conclude positive or negative facts about Mistral, Rebecca, and Hattie, but not about Alice.
In principle, this should be fine because the values User
variable can take
are already restricted by the accessed(User)
subgoal in the query.
However, most implementations of Datalog would reject this as this subgoal does
not appear directly within the rule of \(guest\) and before the negated subgoal.
This is a conservative approximation. Notice that we have not used any
extralogical predicates, yet negated subgoals automatically require all
variables appearing inside them to be bound by the subgoals preceding them. This
is precisely the problem we have with extralogical predicates.
Overall sentiment
The chances are if you chose a declarative language such as Datalog over an imperative alternative, these sort of details are exactly what you are trying to run away from. It seems here we’re forced to choose between useful functionality and high-level programming. The work that follows allows you to have both and do so without incurring a runtime performance penalty.
Understanding invocation safety
Although the examples illustrate the problems with extralogical predicates, they do not provide us anything concrete to compute with. We need a way of representing bindings of parameters and dataflow requirements in the program. As soon as we define these, the precise definition of invocation safety is simple.
Adornments: concrete parameter binding
Adornments indicate whether a parameter is bound or not. Adornment transformation is the standard analysis that computes it. Let’s understand this using Example 1:
User,Pass) :- password(User,Pass), hash(Pass,Hash), valid(User,Hash). auth(
Is the argument Pass
in the subgoal hash(Pass,Hash)
bound? Yes, it
is because the preceding subgoal password(User,Pass)
has this
variable in it and its evaluation binds Pass
. How about the Pass
argument in
password(User,Pass)
? It is free because we haven’t even seen this
variable before1, there is no way something else could have bound
it. How about the User
argument appearing in the same subgoal? It depends on
the caller of the \(auth\) predicate.
Different queries yield different answers:
?- auth(User,Pass)
asks which user & password pairs can be authenticated. Since we do not know which user to authenticate, the variables inpassword(User,Pass)
are not bound to values.?- auth("Alice",Pass)
asks if Alice is authenticated. Since we know which user to authenticate,User
inpassword(User,Pass)
is bound to a value but notPass
.
What we learn from that example is the following:
- Adornment is a top-down procedure starting from the query, the binding information flows down.
- Constants such as
"Alice"
are ground, hence are bound. - Variables of a subgoal are bound if they appear in subgoals that precede
them like
password(User,Pass)
precedinghash(Pass,Hash)
. - Variables of a subgoal are also bound when they are bound in the head of a
rule. For example,
?- auth("Alice",Pass)
makes theUser
variable in the head of the \(user\) rule bound. Consequently, theUser
variable inpassword(User,Pass)
is bound.
These four rules define a top-down procedure for binding. Let’s invent something
concrete to compute with. Let’s annotate parameters that are bound with b
and
those that are free with f
. In the query, auth("Alice",Pass)
receives the pattern bf
due to the constant and this information flows down to
the rules with head predicate \(auth\). Then, password(User,Pass)
in
the body of \(auth\) rule receives bf
because User
is bound in the head and
Pass
is not. The subgoal hash(Pass,Hash)
would have bf
because
Pass
appears in the preceding subgoal and Hash
has never been observed
before. As an exercise you can try to figure out the binding pattern for
valid(User,Hash)
from the previous example (the answer2).
Example 4
Before moving on from adornments, let’s look at an extension of Example 2. There is something that brings us closer to the solution of the code duplication problem described above.
?- check_client("123456").
?- check_server("deadbeaf").
Pass) :- weak(Pass,Hash).
check_client(Hash) :- weak(Pass,Hash).
check_server(
Pass,Hash) :- hash(Pass,Hash), rainbow(Pass,Hash). weak(
This is the same program in Example 2 except that we have two queries one using the \(check\_client\) predicate and the other using \(check\_server\). Now let’s adorn this program but rather than noting the adornment patterns separately, let’s make them suffixes to the names of the predicates. This is how this analysis is usually conduct and is the reason why it is referred as a transformation despite being an analysis in spirit. As it is a top-down procedure we start from the queries.
?- check_client_b("123456").
?- check_server_b("deadbeaf").
Both queries receive the pattern b
because all arguments are constants. Next
we propagate this to the rules defining \(check\_client\) and \(check\_server\)
predicates.
Pass) :- weak_bf(Pass,Hash).
check_client_b(Hash) :- weak_fb(Pass,Hash). check_server_b(
The heads of each predicate receive their adornments directly from their
callers. Since both callers have the pattern b
, so do the heads. Now in the
bodies, we have \(weak\_bf\) and \(weak\_fb\) predicates because
check_client_bf(User)
binds the first parameter of
weak(Pass,Hash)
and
check_client_fb(User)
binds the second parameter of
weak(Pass,Hash)
. Since
we have two different bindings for \(weak\), we need two versions of \(weak\) body.
Pass,Hash) :- hash_bf(Pass,Hash), rainbow_bb(Pass,Hash).
weak_bf(Pass,Hash) :- hash_fb(Pass,Hash), rainbow_bb(Pass,Hash). weak_fb(
The adornment of these rules are done exactly as before, so I won’t describe
them again. The important point is the adornment patterns of subgoals are
effected by the adornments patern of the head. The binding patterns of hash
above are bf
and fb
and this difference stems only from the head of the
rules.
This looks remarkably similar to the code duplication solution we found to the
problem described in Example 2. Once again, we ended up with two
different versions of weak with suffixes are bf
and fb
instead of client
and server
. This is a superfluous difference. The important difference is that
the \(weak\_fb\) rule is not reordered while the \(weak\_check\) rule is. So it
can’t be a solution to the invocation safety on its own, but it is a step in the
right direction and we develop this further below.
Modes: concrete dataflow requirements
A mode is an indication of input/output behaviour. Each predicate logical or
otherwise have a mode pattern associated with it. We only need two modes for
Datalog +
and ?
. If the invocation a predicate is going to be safe,
parameters with mode +
must be bound and those with mode ?
are always safe
bound or not.
For example, \(hash\) from Example 1 has a +?
pattern because its
first parameter needs to be bound at the time of invocation and the second
parameter may be bound or not. If we evaluate the subgoal
hash("Hattie","42")
, "Hattie"
must be there to safely
execute \(hash\), but it doesn’t matter "42"
is there because if the
hash of "Hattie"
is "0"
, then we can compare
"0"
and "42"
safely. Let’s look at \(password\) now. That
predicate is a lot like a database table, there are no restrictions on what you
can use to look up values. Hence, it has the mode pattern ??
. In standard
Datalog, all predicates have ?
mode associated with all of their
parameters3.
Let’s adopt a scheme of amending the predicate name with mode patterns to make things syntactically obvious just as we did in the adornments example. Example 1 becomes
User,Pass) :-
auth_xx(User,Pass), hash_+?(Pass,Hash), valid_??(User,Hash). password_??(
You notice there is an xx
in place of f
s and b
s for the \(auth\) predicate.
That’s because we don’t know a way of determining modes of user-defined
predicates. Intuitively, dataflow requirements of a user-defined predicate is
some function of the subgoals that define those predicates. Indeed, finding an
effective procedure to precisely determine mode patterns of user-defined
predicates is the only hard part of our quest to ensure invocation safety. We
explore this soon in the solution.
Relating adornments to modes (aka. invocation safety)
Remember b
means the parameter is bound and +
means the parameter
needs to be bound. Modes and adornments are clearly related. Their main
differences are:
- Adornment transformation needs an entry point (a query) and dataflow information flows top-down from heads to the bodies. We don’t have a procedure for computing modes (yet), but we mentioned the predicates defined in the heads should be a function of those in the body. This suggests a bottom-up information flow.
- Binding (adornment) patterns qualify subgoals (predicates applied to tuples of variable and constants). Mode patterns, on the other hand, qualify predicates and hence don’t change from one subgoal to another.
- Related to the previous point, adornment is entirely dependent on the query, whereas modes are not. This suggests we can perhaps compute modes beforehand and consult them as queries change.
The agreement between adornments and modes be made precise: in an adorned
program, invocations of extralogical predicates are safe when all subgoals with
extralogical predicates are adorned with b
for parameters with mode +
. This
can be restated as the following almost tautological statement: if something
needs to be bound, then it is bound.
In the literature, invocation safety is called well-modedness. A well-moded query & program pair doesn’t produce invocation errors just as “well-typed programs don’t go wrong” as Robin Milner put it.
Solution
Now we start with the stupidest possible thing we can do and refine our approach.
Naïve solution
The examples so far suggest that we need a simple reordering of bodies. So why not give that a go. Recall the variation of Example 1 with mode embellishments:
User,Pass) :-
auth_xx(User,Pass), hash_+?(Pass,Hash), valid_??(User,Hash). password_??(
Let’s adorn versions of this rule with the subgoals permuted with respect to
the query ?- auth_xx(User,Pass)
. The heads of the rules are omitted
as they are always auth_xx_ff(User,Pass)
.
✖password_??_ff(User,Pass), hash_+?_bf(Pass,Hash), valid_??_bb(User,Hash).
✖password_??_ff(User,Pass), valid_??_bf(User,Hash), hash_+?_bb(Pass,Hash).
✖valid_??_ff(User,Hash), password_??_bf(User,Pass), hash_+?_bb(Pass,Hash).
✓valid_??_ff(User,Hash), hash_+?_fb(Pass,Hash), password_??_bb(User,Pass).
✓hash_+?_ff(Pass,Hash), valid_??_fb(User,Hash), password_??_bb(User,Pass).
✓hash_+?_ff(Pass,Hash), password_??_fb(User,Pass), valid_??_bb(User,Hash).
The last three are invocation safe while the first three are not. Since \(hash\)
is the only extralogical predicate with dataflow requirements, namely on its
first parameter Pass
, orderings that leads to a bound mode
(b
) at the first parameter are well-moded, while the others are not.
The obvious problem is if there are \(n\) subgoals we may need to adorn \(n\) factorial orderings to find a suitable ordering. But perhaps that’s not a big problem, if in practice programmers stick to 3 to 5 subgoals per rule. The real problem is that we cannot isolate rules.
Example 5
Let’s say we do some software engineering and factor
hash_+?(Pass,Hash)
and valid_??(User,Hash)
into a new
rule with head predicate \(check\) as follows:
?- auth_xx(User,Pass)
User,Pass) :- check_xx(User,Pass), password_??(User,Pass).
auth_xx(User,Pass) :- hash_+?(Pass,Hash), valid_??(User,Hash). check_xx(
Now let’s try adorning this program while considering different orderings of \(check\) as that is where the extralogical predicate \(hash\) occurs.
?- auth_xx_ff(User,Pass)
User,Pass) :-
auth_xx_ff(User,Pass), password_??_bb(User,Pass).
check_xx_ff(
User,Pass) :- hash_+?_ff(Pass,Hash), valid_??_fb(User,Hash).
check_xx_ff(User,Pass) :- valid_??_ff(User,Hash), hash_+?_fb(Pass,Hash). check_xx_ff(
This looks bad because when we reorder \(check\) rule alone, neither of the orderings lead to an adornment that makes the invocation of \(hash\) safe. They both lead to a free first argument.
At this point we can just say that we have a sound but incomplete analysis,
tough luck. Alternatively, we recognise that the first parameter of the subgoal
with \(hash\) is Pass
which appear in the head of the rule. As we learn in
adornment transformation rules,
if a parameter is bound in the head then it is also bound in the body. So all we
need is to turn the ff
binding pattern of the head to fb
or bb
. This is
indeed possible if we reorder the use of \(check\) within the \(auth\) rule as
follows:
?- auth_xx_ff(User,Pass)
User,Pass) :-
auth_xx_ff(_ff(User,Pass), check_xx_bb(User,Pass).
password_??
User,Pass) :- hash_+?_bf(Pass,Hash), valid_??_bb(User,Hash).
check_xx_bb(User,Pass) :- valid_??_bf(User,Hash), hash_+?_bb(Pass,Hash). check_xx_bb(
After permuting the subgoals with \(password\) and \(check\), we end up with bb
adornment for the head of \(check\) and regardless the order of subgoals in the
body \(check\). This makes invocations of \(hash\) safe in both cases.
Computationally, however, this is problematic. Search space is exponential not
only in the size of individual bodies but it is a multiplication of all possible
orderings of all rules from query down to the relevant rule. This is
intractable even for small programs. We address this not with elegance but with
some greed.
Greed is good
We can’t do better than exponential asymptotically, but we can make it exponential the way Hindley-Milner type inference is—only in degenerate cases.
The most important insight to our solution is the following: although
dataflow requirements are inherent to the predicates, they are contextual. That
is to say if a predicate parameter has mode +
(requiring it to be bound) and
the subgoal using that predicate has the variable V
at that parameter, if a
preceding subgoal already binds V
, we can safely pretend as if the predicate
has mode ?
in this context. So we can treat
:- password_??(User,Pass), hash_+?(Pass,Hash), ... ...
as
:- password_??(User,Pass), hash_??(Pass,Hash), ... ...
so \(hash\) in that context behaves as if it is a logical predicate.
Now we exploit the contextual nature of dataflow. We schedule subgoals with no dataflow requirements first such as \(password\) and \(valid\). These subgoals can be freely placed anywhere in the body as they are always safe to execute. So we adopt a greedy approach and schedule subgoals with such predicates as early as possible.
The benefits of this approach are twofold:
- Combinatorially, there are fewer orderings to consider as placement of logical subgoals are restricted.
- More importantly, variables of scheduled subgoals are propagated to the remaining alternatives to relax the dataflow constraints as described above. This would lead to further opportunities to apply the greedy scheduling principle.
Scheduling of subgoals fit nicely into a graph where the edges are labelled with subgoals to schedule and orderings are the paths from the root to vertices with no successors.
Example 6
Let’s apply this strategy to Example 1:
User,Pass) :-
auth_xx(User,Pass), hash_+?(Pass,Hash), valid_??(User,Hash). password_??(
The corresponding graph is:
{password, valid} {hash}
V0 --------------------> V1 -------------------> V2
This graph stores orderings where subgoals with predicates \(password\) and \(valid\) are scheduled first and it doesn’t matter in which order, only then the \(hash\) subgoal is scheduled.
The subgoals with predicates \(password\) and \(valid\) could be scheduled first
because they have no dataflow requirements (indicated by ??
mode pattern).
The subgoal with \(hash\) could not be scheduled at that point due to +?
mode
pattern. However, as we established earlier in this section, mode
patterns can be relaxed in the right context. So if we treat the vertex V1
in
the graph as a context, hash_+?(Pass,Hash)
behaves as
hash_??(Pass,Hash)
because on this path the subgoal
password(User,Pass)
precedes it and binds Pass
, thus we can
schedule it at that point using the greedy principle.
More concretely, this leads to the following orderings:
User,Pass), valid_??(User,Hash), hash_+?(Pass,Hash).
password_??(User,Hash), password_??(User,Pass), hash_+?(Pass,Hash). valid_??(
These are the two of the three well-moded orderings found through brute-force
search in the naïve solution described above. The ordering
we miss has valid_??(User,Hash)
at the end of the body. Greedy scheduling
prevents us from discovering this ordering.
This raises the question “are there cases in which we discard the only safe ordering?” The answer is no. If there is at least one safe ordering that can be found by reordering, greedy scheduling will also find at least one valid ordering. Surprisingly, greed is good4 and it doesn’t compromise completeness.
Before we move on to other examples of scheduling graphs, we can finally address the pending problem of determining mode patterns for user-defined predicates posed where modes are introduced.
Modes of user-defined predicates
Let’s continue with Example 6 and deduce what xx
in
auth_xx(User,Pass)
should be.
Since a mode marks a requirement of boundness, the first x
should be +
if
User
needs to be bound for the invocation to be safe and it should be ?
if
invocation of all predicates in the body are safe regardless binding status of
User
. The same reasoning applies to the second parameter. If invocation is
safe with a ?
mode, then it is also safe with a +
mode, so we are looking
for the most relaxed mode5. The orderings from the graph above
didn’t make any conditional reasoning about boundedness of User
or Pass
,
hence ??
is suitable for \(auth\). We can revise the two versions of the rule as
follows:
User,Pass) :-
auth_??(User,Pass), valid_??(User,Hash), hash_+?(Pass,Hash).
password_??(User,Pass) :-
auth_??(User,Hash), password_??(User,Pass), hash_+?(Pass,Hash). valid_??(
Here’s the beauty of this analysis result: despite the fact that we defined
auth
with the use an extralogical predicate, we manage to make it behave as a
logical predicate wherever it is invoked. We just restored some of the lost
declarative promise!
Example 7
We first look at Example 5 again:
User,Pass) :- check_xx(User,Pass), password_??(User,Pass).
auth_xx(User,Pass) :- hash_+?(Pass,Hash), valid_??(User,Hash). check_xx(
Let’s examine the interesting scheduling graph first that is the one for check
rule.
{valid} {hash}
V1 ----------> V2 ---------> V3
This graph looks similar to the previous one. We apply the greedy principle and
schedule valid_??(User,Hash)
first. The difference is the variable
bound by this subgoal (User
and Hash
) don’t include Pass
which is the
dataflow restricted argument of hash_+?(Pass,Hash)
. So the only way
we can schedule \(hash\) is if Pass
is already bound at the moment we start
evaluating the rule. This is same as saying the second parameter of \(check\)
(which is Pass
) has to be bound at the caller. For the first parameter, there
are no restrictions posed. Then, the overall mode pattern for check is ?+
.
In the description above, we kept track of which variables on the path has to be resolved by the invoker of the predicate being defined. This information naturally fits in the vertices of the graph. This is how we do it in the paper, but we elide the details here to simplify the presentation.
Now that we know the mode pattern for \(check\), we can draw a scheduling graph for the rule that defines \(auth\) and deduce its predicate (again). This time we do not give a detailed explanation. It’d be a good exercise to see why we scheduled which predicate and what the mode pattern of \(auth\) is.
{password} {check}
V1 -------------> V2 ----------> V3
We cheat here a bit and impose an order on building the graphs. In general, we
can’t find such an order because rules can refer to themselves recursively or be
mutually recursive with other rules. Proper way of handling that situation is
to assume ?
in place of x
for modes of user-defined predicates initially,
and repeatedly apply this graph-based analysis to update the mode patterns until
they stabilise (fixpoint) for all user-defined predicates. You can refer to the
paper on why this works and why it terminates at all.
Example 8
So far it looks like all the scheduling graphs are just single paths. This is not the case in general and we can use Example 2 to demonstrate it:
Pass,Hash) :- hash_+?(Pass,Hash), rainbow_?+(Pass,Hash). weak_xx(
We don’t have any logical predicates in this body. This can’t be a good start. There is only one thing we do, that is to default to brute-force search for a moment and try scheduling both subgoals.
{hash} {rainbow}
-------------> V2 -------------> V3
/
V1
\
-------------> V5 -------------> V6
{rainbow} {hash}
Finally, some more interesting ASCII
art.
Let’s inspect the branch we try scheduling hash_+?(Pass,Hash)
first.
In this path, scheduling the subgoal with \(hash\) predicate binds Pass
and
Hash
, as a result at the context of \(V2\), rainbow behaves like a logical
predicate (the dataflow constrained second parameter has Pass
which is already
bound). Since we couldn’t schedule the \(hash\) subgoal for free, it seems we need
to constraint the first parameter (Pass
) of \(weak\) leading +?
mode pattern.
But wait a second! if we inspect the other branch instead, we commit to
requiring Hash
to be bound in the head of the rule (and consequently dealt
with by the caller) instead. This suggests a mode pattern ?+
.
So which is it? In a way, it is both. Depending on the ordering of the rules
we have different dataflow requirements. This is good! Now we can just select a
representative ordering for each mode pattern. When the rule is used
differently, we use the appropriate ordering. So the overall mode pattern of
\(weak\) is +?/?+
to signify either of them can be used.
One reason we ended with two alternatives is that the mode patterns we obtained
are incomparable. If we had options +?
and ++
, instead we could have
discarded ++
because +?
is a more favourable alternative. This allows us to
store fewer orderings. We say more about that in the paper.
Multiple rules
So far we have been assuming that a predicate is defined by a single rule. Now we look at those defined by multiple rules. Let’s extend Example 8 to have a custom extralogical predicate that checks weakness of a password & hash pair and requires both of its parameters to be bound to execute safely.
Pass,Hash) :- hash_+?(Pass,Hash), rainbow_?+(Pass,Hash).
weak_xx(Pass,Hash) :- custom_weak_++(Pass,Hash). weak_xx(
We already know the first rule leads to the mode pattern +?/?+
. The second
rule is so simple that we can tell the mode pattern it leads without drawing the
relevant scheduling graph, namely ++
inherited from its single defining
subgoal.
Now what does it mean to invoke a predicate in Datalog? It means we are going to
try each of its defining rules to collect all possible conclusions. This implies
any invocation of \(weak\) is going to evaluate both rules. Therefore, the only
thing we can do to ensure safety is to satisfy the dataflow requirements of each
rule simultaneously. What this means in our example is that we generate all
possible combinations that is ++
& +?
and ++
& ?+
. In each case, ++
is
stricter for every parameter, so it shadows the other. Hence, we end up with
++/++
, but since both alternatives are identical, the overall mode pattern for
\(weak\) is ++
. Here this sounds ad hoc, but there is an algebraic method to
this madness6. The details of which are in the paper.
Generalised adornment
Now we know how to determine dataflow requirements of user-defined predicates as well as orderings of subgoals that lead to these requirements. However, this does not help very much with our definition of invocation safety. The reason is that definition rely on adorning a program in the given order of subgoals. This is already in the result of Example 4 with two different versions of the \(weak\) rule with different adornments but static ordering of subgoals.
So the final step in our quest to invocation safety is to generalise adornment procedure such that it can take reorderings into the account.
We modify the adornment procedure in a single way. We let the procedure take a reordering function that given a rule and a binding pattern produces an ordering. Of course, we compute these orderings through scheduling graphs as above. Then, right before we start adorning the body of a rule, we apply the reordering first, then adorn from left to right as usual.
Example 9
Consider the following example adapted from Example 4.
?- weak_+?/?+(Pass,"deadbeaf").
Pass,Hash) :- hash_+?(Pass,Hash), rainbow_?+(Pass,Hash). weak_+?/?+(
So the head of \(weak\) rule will have the adornment fb
due to the query. We
almost computed the ordering function for \(weak\) in Example 8. For
?+
mode pattern alternative which is the only alternative compatible with fb
binding pattern, we have an ordering in which the subgoal with \(rainbow\) comes
first and the subgoal with \(hash\) comes second. So we reorder the body first and
then perform the adornment and that leads to the following adorned program:
?- weak_+?/?+_fb(Pass,"deadbeaf")
_fb(Pass,Hash) :- rainbow_?+_fb(Pass,Hash), hash_+?_bb(Pass,Hash). weak_+?/?+
And voilà, we have a \(rainbow\) subgoal with an adornment for its second
parameter b
and \(hash\) subgoal with an adornment for its first parameter b
as required. As all extralogical predicates are used in subgoals with adornments
that match their modes, the program is safe to execute.
Checking invocation safety
Once we compute modes for all-user defined predicates including the query, we don’t have to adorn the program to see if the program is well-moded. We only need to look at the query.
Example 10
In Example 9 the query was ?- weak_+?/?+_fb(Pass,"deadbeaf")
, here the fact that fb
is consistent
with one of the mode pattern alternatives, namely ?+
immediately indicates
that we have a ordering function that satisfy all dataflow requirements.
If, on the other, hand we pose the same query to the version of weak used in explaining modes of predicates with multiple rules:
?- weak_++(Pass,"deadbeaf")
Pass,Hash) :- hash_+?(Pass,Hash), rainbow_?+(Pass,Hash).
weak_++(Pass,Hash) :- custom_weak_++(Pass,Hash). weak_++(
Just by looking at the query, we know the program is not well-moded with respect
to this query because the predicate \(weak\) in this case requires both of its
parameters to be bound but the first argument to the query is the free variable
Pass
.
Some properties
This algorithm enjoys a number of properties. I am neither going to get into their proofs nor the mathematical details, but it would be amiss to omit them entirely:
- Soundness: if the algorithm finds a reordering function for a program, the the reordered rules will be safe to invoke.
- Completeness: if there are any orderings of rules that ensure invocation safety, then the algorithm will find a (possibly different) set of orderings that ensure invocation safety.
- Incremental computation: addition of new rules don’t invalidate the old results, hence the mode patterns of existing user-defined predicates can be used to seed the analysis leading to faster execution.
- Termination: the analysis terminates on all inputs.
In particular, incremental computation is useful for making this analysis scalable. In the special case that the added rule does not extend an existing predicate, i.e., have a fresh head, we don’t have to reanalyse old rules. This means we can analyse libraries and ship the mode patterns of predicates defined in the library with them. The users of the library won’t have to reanalyse the code in the library.
Summary & concluding thoughts
The main take away is declarative programming is awesome! In the case of Datalog, we can supplement expressivity by allowing extralogical predicates. This brings the syntactic order of execution problem with it, but our analysis completely eliminates this and restores the promise of declarative programming!
The analysis relies on computing dataflow requirements (modes) of user-defined predicates and verifying their consistency with respect to the actual bindings (adornments) of the program & query pairs.
We avoid inefficiency through a greedy scheduling algorithm, but we don’t sacrifice completeness. Additionally, our analysis is incremental and thus scalable to multi module programs with relative ease.
Here are few reasons why you might like to read the paper:
- understanding the mathematical foundation for modes & bindings,
- gaining insights into about an implementation (paper is effectively executable maths),
- need more advanced examples (illustrating omitted aspects of the algorithm),
- looking for generalisation from Datalog to other contexts,
- and seeing how this algorithm behaves with common Datalog examples such as negation, aggregation, and effectful extralogical predicates.
There is a number of directions this work may go. I believe the constraint-based approach of the work can be generalised to other declarative languages which need more sophisticated modes. I firmly believe there is a fundamental link between the mode operations we defined and dataflow in general. We didn’t discuss user-level annotations as you noticed. There are interesting design questions about an annotation language for modes that might work exploring. Finally (and a bit more abstractly), extralogical predicates are inevitable in logic programming but they are afterthoughts to the semantics of the language. I’d like them to be first-class citizens, so we can reason about them better.
Pass
in this rule is an existentially quantified variable, so there is indeed nothing that could have bound it out of the rule context.↩︎It is
bb
because bothUser
andPass
appear in preceding subgoals.↩︎This seems to contradict what I said in Example 3, so let me clarify. In a subgoal like
not r(X,Y,Z)
, the predicate \(r\) still has???
mode pattern. It is the negation that imposes makesr
behave as if it has+++
pattern.↩︎This is a quote from a famous movie speech by the character Gordon Gekko in Wall Street. Highly recommended.↩︎
In the paper, we show and heavily use the fact that constraints (which are derived from modes) form a bounded partial order. We can talk about principal moding or sub moding, but they are only relevant if we have explicit annotations for modes which are not strictly necessary for safety checking and automated reordering. Perhaps another paper? Feel free to contact me if you want to discuss this more.↩︎
This is the second time we are combining mode patterns and there is a form of subsumption going on. When we consider two alternative dataflow requirements and their combination leads to a more relaxed mode pattern. When we combine two mode patterns that need to be satisfied simultaneously, the combination is more strict. You must have noticed that these two operations sound very algebraic. Indeed, to our surprise, they form Martelli’s semiring (sorry, there aren’t any non-academic resources I could find) originally used to compute cut-sets of a graph. To the best of my knowledge this hasn’t been used in dataflow context before and seems general enough to be used beyond Datalog and extralogical predicates.↩︎