Technical Report: DCC-97-7
Optimising Parallel Logic Programming Systems
for Scalable Machines
Vítor Santos Costa & Ricardo Bianchini
LIACC & COPPE/Sistemas
Universidade do Porto & Universidade Federal do Rio de Janeiro
September 1997
Abstract
Logic programs are good examples of symbolic applications that often
exhibit large amounts of implicit parallelism and that can greatly
benefit from parallel computers. Parallel logic programming (PLP)
systems have obtained excellent results for traditional bus-based
shared-memory architectures. However, the scalable multiprocessors
being developed today pose new challenges, such as the high latency
of memory accesses and the demand for scalability. Our experience
with a sophisticated PLP system, Andorra-I, demonstrates that indeed
performance suffers greatly on modern architectures. This paper
addresses the question of whether the poor performance is inherent
to the complex structure of PLP systems or can be improved through
careful analysis and tuning. In order to answer this question, we
perform a detailed analysis of the cache behaviour of all Andorra-I
data structures via execution-driven simulation of a DASH-like
multiprocessor. Based on this analysis we optimise the Andorra-I
code using 5 different techniques. We present the isolated and
combined performance improvements provided by these optimisations.
The overall results we obtained are quite impressive. A few of our
logic programs begun to approach linear speedup as a result of our
optimisations. In fact, for one of our programs the speedup of the
modified Andorra-I is a factor of 3 higher than that of the original
version of the system on 24 processors. Our main conclusion is then
that, even though PLP systems are indeed complex and sometimes
irregular, these systems can and should perform well on modern
scalable multiprocessors.
Keywords: Scalable Multiprocessors, Parallel Logic Programming,
And-Or Parallelism, Performance Evaluation.