Algorithms and Data Structures

## What the meeting is about

Algorithms and Data Structures continue to play a very important role in Computer Science research and Discrete Mathematics. This meeting will be an opportunity to focus on some recent developments and discuss open problems, old and new.

## Important Dates

Arrival: Sunday May 29, 2005
Departure: Saturday June 4, 2005

## Location

The meeting will be held in the small medieval hilltop town of Bertinoro. This town is in Emilia Romagna about 50km east of Bologna at an elevation of about 230m.  Here is a map putting it in context. It is easily reached by train and taxi from Bologna and is close to many splendid Italian locations such as Ravenna, a treasure trove of byzantine art and history, and the Republic of San Marino (all within 35km) as well as some less well-known locations like the thermal springs of Fratta Terme and the castle and monastic gardens of Monte Maggio.  Bertinoro can also be a base for visiting some of the better-known Italian locations such as Padua, Ferrara, Vicenza, Venice, Florence and Siena.

Bertinoro itself is picturesque, with many narrow streets and walkways winding around the central peak.  The meeting will be held in a redoubtable ex-Episcopal fortress that has been converted by the University of Bologna into a modern conference center with computing facilities and Internet access.  From the fortress you can enjoy a beautiful the vista that stretches from the Tuscan Apennines to the Adriatic coast and the Alps over the Po Valley.

## Talk Schedule

May 29 May 30 May 31 June 1 June 2 Pellegrini Franciosa Demetrescu Munro Marchetti Pagh Kaplan McGeoch Östlin Pagh Grandoni Roditty Sanders Italiano Sleator Thorup Brodal Goldberg Finocchi Werneck Radzik Iacono Zwick Excursion Georgiadis King to Langerman Zaroliagis Urbino Demaine

## List of confirmed participants (so far)

Scientific Organizing Committee Andrew Goldberg Microsoft Research Giuseppe F. Italiano Università di Roma "Tor Vergata" Andrea Bandini, Elena Della Godenza,, Centro Congressi di Bertinoro BICI   Bertinoro International Center for Informatics

### ABSTRACTS

Cache-Oblivious String Dictionaries

Gerth Brodal

(joint work with Rolf Fagerberg)

We present a cache-oblivious dictionary structure for strings which provides analogues of tries and suffix trees in the cache-oblivious model. Our construction takes as input either a set of strings to store, a single string for which all suffixes are to be stored, a trie, a compressed trie, or a suffix tree, and creates a cache-oblivious data structure which performs prefix queries in $O(\log_B n + |P|/B)$ I/Os, where $n$ is the number of leaves in the trie, $P$ is the query string, and $B$ is the block size. This query cost is optimal for unbounded alphabets.

On a pipelined streaming model

Camil Demetrescu

(Joint work with Irene Finocchi and Andrea Ribichini)

We study the computational power of a streaming model where at each pass one input stream is read and, differently from the classical read-only streaming model, one output stream is written. Streams are pipelined in such a way that the output stream produced during pass i is used as input stream at pass i+1. We provide several upper and lower bounds in this model, discussing different separation results with respect to classical streaming.

Algorithms resilient to memory faults: searching and lower bounds.

Irene Finocchi

(joint work with Fabrizio Grandoni and Giuseppe F. Italiano)

Some of today's applications run on computer platforms with large and inexpensive memories, which are also error-prone: a system with Terabytes of memory, such as a large cluster of computing platforms with a few Gigabytes per node, would experience an error every few minutes (this is the case, e.g., in Web search engines). Unfortunately, the appearance of even very few memory faults may jeopardize the correctness of the computational results.
An algorithm is resilient to memory faults if, despite the corruption of some memory values before or during its execution, it is nevertheless able to get a correct output at least on the set of uncorrupted values. In the talk we will investigate the design and analysis of resilient searching algorithms, presenting also some lower bounds.

Small stretch spanners on dynamic graphs.

Paolo G. Franciosa

(joint work with Giorgio Ausiello and Giuseppe F. Italiano)

We present fully dynamic algorithms for maintaining 3- and 5-spanners of undirected graphs under a sequence of update operations. For unweighted graphs we maintain a 3- or 5-spanner under insertions and deletions of edges; each operation is performed in $O(n)$ amortized time over a sequence of $\Omega(n)$ updates. The maintained 3-spanner (resp., 5-spanner) has $O(n^{3/2})$ edges (resp., $O(n^{4/3})$ edges), which is known to be optimal. On weighted graphs with $d$ different edge cost values, we maintain a 3- or 5-spanner in $O(n)$ amortized time over a sequence of $\Omega(d \cdot n)$ updates. The maintained 3-spanner (resp., 5-spanner) has $O(d \cdot n^{3/2})$ edges (resp., $O(d \cdot n^{4/3})$ edges). The same approach can be extended to graphs with real-valued edge costs in the range $[1,C]$. All our algorithms are deterministic and are substantially faster than recomputing a spanner from scratch after each update.

On the Role of Experiments and Theory: An Update on Bin Packing

Catherine McGeoch

The field of experimental algorithmics develops new methods for analyzing algorithms by simulation experiments, to produce results useful to the theoretician as well as to the practitioner. We consider a case study, with emphasis on new results, in the 20 year history of experimental/theoretical analysis of heuristics for bin packing, particularly the First Fit and First Fit Decreasing rules. New insights about the structure of packings, and some partial answers to long-standing open problems, are presented.

Title: Linear-time algorithms for independent trees and directed s-t numberings of directed graphs.

(joint work with R. E. Tarjan)

Consider a directed graph $G=(V,A,r)$, such that every vertex $v \in V$ is reachable from a distinguished root vertex $r \in V$. We say that a vertex $w$ \emph{dominates} a vertex $v$ if every path from $r$ to $v$ includes $w$. Clearly, $r$ and $v$ dominate $v$ and are called the \emph{trivial dominators} of $v$. In 1987 Whitty showed that if $G$ has only trivial dominators, then $G$ has two spanning trees such that, for any vertex $v$, their corresponding $r-v$ paths are vertex-disjoint. Trees that satisfy this property are called \emph{independent}. Whitty only claimed a polynomial running time for the algorithm implied by his proof. Simpler constructions were given independently by Plehn, and Cheriyon and Reif. Both these constructions are based on an extension of the notion of s-t numberings to digraphs, which they call \emph{directed s-t numberings}. Specifically, they showed how to construct a numbering $\pi: V \mapsto \{1,\ldots,n\}$, such that each vertex $v \neq r$ that is not a successor of $r$ has predecessors $u$ and $w$ that satisfy $\pi(u) < \pi(v) <\pi(w)$. However, they did not determine the complexity of their constructions. Later Huck gave another construction of two independent spanning trees with $O(|A|\cdot|V|)$ running time. In this talk we present an extension of Whitty's result when the vertices of $G$ may have non-trivial dominators. Specifically, we show that $G$ has two spanning trees such that for any $v$ their corresponding $r-v$ paths intersect only at the dominators of $v$. Furthermore, we give a linear-time algorithm for constructing such two spanning trees. Finally, we show how these spanning trees can be used to obtain a directed s-t numbering of $G$ in linear time.

A* Search with Triangle Inequality

Andrew Goldberg

(joint work with Chris Harrelson and Renato Werneck)

We study the problem of finding a shortest path between two given vertices in a directed graph. This is an important problem with many applications, including that of computing driving directions. We are interested in preprocessing the graph using a linear amount of extra space to store additional information, and using this information to answer shortest paths queries quickly. We use A* search in combination with new distance lower-bounding techniques based on landmarks and triangle inequality. Our experiments show that on some graph classes, such as map graphs, the new algorithm works very well. In particular, we are able to solve the problem on the graph of North America road networks with 30 million vertices on a handheld device. In this talk we describe the algorithm, its implementation, and experimental performance.

New Approaches for Virtual Private Network Design

Fabrizio Grandoni

Virtual private network design is the following problem. We are given a communication network, represented as a weighted graph with thresholds on the nodes which represent the amount of flow that a node can send to and receive from the network. The task is to reserve capacities at minimum cost and to specify paths between every ordered pair of nodes such that all valid traffic-matrices can be routed along the corresponding paths. This problem is motivated by the fact that the exact amount of flow which is exchanged between terminals is not known in advance and prediction is often illusive. In this talk: (1) Using techniques from network flow theory, we provide a new lower bound on the cost of an optimum solution. This lower bound is based on arbitrary matchings between terminals and relies on Hu's two-commodity flow theorem. Previous lower bounds were solely based on bipartite matchings between senders and receivers and were therefore considerably weaker. (2) Using the new lower bound we show that a well-known simple (2+R/S)-approximation algorithm is indeed a (1+R/S)-approximation algorithm. Here, R denotes the sum of the input thresholds and S is the sum of the output thresholds (and, w.l.o.g., R>=S). (3) Finally, we present a new approximation algorithm that is based on approximate Steiner tree computations. In contrast to earlier approaches from the literature the resulting solution does not have tree structure. This observation is of particular interest since it is known that an optimal tree solution is not an optimal (graph) solution. (4) Combining our new approximation algorithm with the simple routing scheme yields an algorithm with performance ratio 3.55 for virtual private network design. This is a considerable improvement of the previously best 5.55-approximation result (STOC'03).

Worst-Case Optimal Tree Layout in a Memory Hierarchy

John Iacono

(joint work with Stefan Langerman and Erik Demaine)

Consider laying out a fixed-topology tree of $N$ nodes into external memory with block size $B$ so as to minimize the worst-case number of block memory transfers required to traverse a path from the root to a node of depth~$D$. We prove that the optimal number of memory transfers is $$\Theta\left( \cases{ \displaystyle {D \over \lg (1{+}B)} & when D = O(\lg N) \medskip \cr \displaystyle {\lg N \over \lg \left(1{+}{B \lg N \over D}\right)} & when D = \Omega(\lg N) and D = O(B \lg N) \medskip \cr \displaystyle {D \over B} & when D = \Omega(B \lg N) } \right).$$ This bound can be achieved even when $B$ is unknown to the (cache-oblivious) layout algorithm.

Algorithms resilient to memory faults: sorting.

Giuseppe F. Italiano

(joint work with Irene Finocchi and Fabrizio Grandoni)

Some of today's applications run on computer platforms with large and inexpensive memories, which are also error-prone: a system with Terabytes of memory, such as a large cluster of computing platforms with a few Gigabytes per node, would experience an error every few minutes (this is the case, e.g., in Web search engines). Unfortunately, the appearance of even very few memory faults may jeopardize the correctness of the computational results.
An algorithm is resilient to memory faults if, despite the corruption of some memory values before or during its execution, it is nevertheless able to get a correct output at least on the set of uncorrupted values. In the talk we will investigate the design and analysis of resilient sorting algorithms.

Kinetic and Dynamic Data Structures for Closest Pairs

Haim Kaplan

(joint work with Micha Sharir)

We present a simple fully dynamic and kinetic data structure, which is a variant of a dynamic 2-dimensional range tree, for maintaining the closest pair among $n$ moving points in the plane, allowing insertions and deletions of points.

Stefan Langerman

(joint work with Karim Douïeb)

Consider a directed rooted tree $T=(V,E)$ representing a collection $V$ of $n$ web pages connected via a set $E$ of links all reachable from a source home page represented by the root of $T$. Each web page $i$ carries a weight $w_i$ representative of the frequency with which it is visited. By adding hotlinks, shortcuts from a node to one of its descendents, we are interested in minimizing the expected number of steps needed to visit pages from the home page. We give the first linear time algorithm for assigning hotlinks so that the number of steps to attain a page $i$ from the root of the tree reaches the entropy bound, i.e. is at most $O(\log(W/w_i))$ where $W= \sum_{i \in T} w_i$. The best previously known algorithm for this task runs in time $O(n^2)$. We also give the first efficient data structure for maintaining hotlinks when nodes are added, deleted or their weights modified, in amortized time $O(\log(W/w_i))$ per update. The data structure can be made adaptative, i.e. reaches the entropy bound in the amortized sense without knowing the weights $w_i$ in advance.

Latency Constrained Aggregation in Sensor Networks

Alberto Marchetti-Spaccamela

(joint work with L. Becchetti, M. Skutella, L. Stougie and A. Vitaletti)

Advancements in wireless and sensor technologies have paved the way for the development of tiny and cheap sensor devices equipped with sensors and wireless transceivers. Such devices, named sensor nodes, are able to monitor events, to process the sensed information and to communicate the sensed data to one or more base stations. Being battery powered, sensor networks are highly energy constrained. Data aggregation is a technique that consists in aggregating redundant or correlated data in order to reduce the overall size of sent data, thus decreasing the network traffic and energy consumption. In this paper we study the problem of data aggregation in a sensor network. Such problems give rise to a rich variety of interesting optimization problems. For some first results, we concentrate on sensor networks in which a routing tree connecting sensor nodes to the sink has been previously constructed and we assume that data collected at sensors have to reach the base station within a given deadline. In this paper we focus on the impact of data aggregation in terms of energy consumption reduction: we propose a combinatorial optimization problem that models the problem of designing routing algorithms that optimize the energy saving obtained by data aggregation.

Toward Optimal Multiple Selection

Ian Munro

(joint work with Kanela Kaligosi, Kurt Mehlhorn and Peter Sanders)

The multiple selection problem asks for the elements of ranks $r_1$, $r_2$, \ldots, $r_k$ from a linearly ordered set of $n$ elements. Let $B$ denote the information theoretic lower bound on the number of element comparisons needed for multiple selection. We develop a deterministic divide-and-conquer algorithm that solves the problem in $\Oh{B}$ time and $B+o(B)+\Oh{n}$ element comparisons. The algorithm is relatively simple to state, its analysis is the key aspect.

An Optimal Bloom Filter Replacement.

Anna Östlin Pagh

(joint work with Rasmus Pagh and Srinivasa Rao)

This talk considers space-efficient data structures for storing an approximation S' to a set S such that S subseteq S' and any element not in S belongs to S' with probability at most epsilon. The Bloom filter data structure, solving this problem, has found widespread use. Our main result is a new RAM data structure that improves Bloom filters in several ways:

• The time for looking up an element in S' is O(1), independent of epsilon.
• The space usage is within a lower order term of the lower bound.
• The data structure uses explicit hash function families.
• The data structure supports insertions and deletions on S in amortized expected constant time.
The main technical ingredient is a succinct representation of dynamic multisets. We also consider three recent generalizations of Bloom filters.

On Dynamic Range Reporting in One Dimension

Rasmus Pagh

(Joint work with Christian Worm Mortensen and Mihai Patrascu)

We consider the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval. Range searching is a natural and fundamental variant of integer search, and can be solved using predecessor search. However, for a RAM with w-bit words, we show how to perform updates in O(lg w) time and answer queries in O(lg lg w) time. The update time is identical to the van Emde Boas structure, but the query time is exponentially faster. Our solution is based on a new and interesting recursion idea which is "more extreme" that the van Emde Boas recursion. Whereas van Emde Boas uses a simple recursion (repeated halving) on each path in a trie, we use a nontrivial, van Emde Boas-like recursion on every such path. To achieve linear space for our data structure, we solve a problem which is of independent interest. We develop the first scheme for dynamic perfect hashing requiring sublinear space.

Efficient IP Table Lookup via Adaptive Stratified Trees with Selective Reconstructions.

Marco Pellegrini

IP address lookup is a critical operation for high bandwidth routers in packet switching networks such as Internet. The lookup is a non-trivial operation since it requires searching for the longest prefix, among those stored in a (large) given table, matching the IP address. Ever increasing routing tables size, traffic volume and links speed demand new and more efficient algorithms. Moreover, the imminent move to IPv6 128-bit addresses will soon require a rethinking of previous technical choices. This article describes a the new data structure for solving the IP table look up problem christened the Adaptive Stratified Tree (AST). The proposed solution is based on casting the problem in geometric terms and on repeated application of efficient local geometric optimization routines. Experiments with this approach have shown that in terms of storage, query time and update time the AST is at a par with state of the art algorithms based on data compression or string manipulations (and often it is better on some of the measured quantities).

Existence and construction of selective families of sets and their applications in communication algorithms in ad-hoc radio networks

(This talk is based on joint work with B. Chlebus, L. Gasieniec, R. Klasing, D. Kowalski and Q. Xin.)

A family F of subsets of set [N] = {1,2,...,N} is "k-selective" if for every non-empty subset X of [N] of size at most k, there is a set B in F which has exactly one element in common with X. Selective families, and variations of this notion, are the main tool in constructions of deterministic communication protocols for "radio networks" with unknown topology. We show in this talk how selective families are used in the currently fastest deterministic protocols for the bradcasting, gossiping and wake-up problems in such networks. We also discuss explicit constructions of the variations of selective families required in these protocols.

Replacement paths and k simple shortest paths in unweighted directed graphs

Liam Roditty

(joint work with Uri Zwick)

Let $G=(V,E)$ be a directed graph and let $P$ be a shortest path from~$s$ to~$t$ in~$G$. In the replacement paths problem we are required to find, for every edge $e$ on~$P$, a shortest path from $s$ to~$t$ in $G$ that avoids~$e$. The only known algorithm for solving the problem, even for unweighted directed graphs, is the trivial algorithm in which each edge on the path, in its turn, is excluded from the graph and a shortest paths tree is computed from $s$. The running time is $O(mn+n^2 \log n)$.
The replacement paths problem is strongly motivated by two different applications:
1. The fastest algorithm to compute the $k$ shortest paths between $s$ and $t$ in directed graphs~\cite{Yen71,Lawler72} computes the replacement paths between $s$ and $t$. Its running time is $\Ot(mnk)$.
2. The replacement paths problem is used to compute the Vickrey pricing of edges in a distributed network. It was raised as an open problem by Nisan and Ronen~\cite{NiRo01} whether it is possible to compute the Vickrey pricing faster than $n$ computations of a shortest paths tree.
In this paper we present the first non-trivial algorithm for computing replacement paths in unweighted directed graphs (and in graphs with small integer weights). Our algorithm is randomized and its running time is $\Ot(m\sqrt n)$ time. This result immediately improves the running time of the two applications mentioned above in a factor of $\sqrt n$.
We also show how to reduce the problem of computing $k$ simple shortest paths between $s$ and $t$ to $O(k)$ computations of a second simple shortest path from $s$ to $t$ each time in a different subgraph of $G$. The importance of this result is that computing a second simple shortest path may turn out to be an easier problem than computing the replacement paths, thus, we can focus our efforts to improve the $k$ simple shortest paths algorithm in obtaining a faster algorithm for the second shortest path problem.

Highway Hierarchies Hasten Exact Shortest Path Queries

Peter Sanders

(joint work with Dominik Schultes)

We present a new speedup technique for route planning that exploits the hierarchy inherent in real world road networks. Our algorithm preprocesses the eight digit number of nodes needed for maps of the USA or Western Europe in a few hours using linear space. Shortest (i.e. fastest) path queries then take around ten milliseconds to produce exact shortest paths. This is about three orders of magnitude faster than using Dijkstra's Algorithm.

Competitive Dynamic Binary Search Trees

Danny Sleator

(Joint work with Chengwen Chris Wang and Jonathan Derryberry)

We introduce the multi-splay tree (MST) data structure, which is the first binary search tree (BST) to simultaneously achieve $O(\log n)$ amortized, $O(\log^2 n)$ worst-case, and $O(\log \log n)$-competitive costs for a sequence of queries. We also prove the sequential access lemma for MSTs, which states that sequentially accessing all keys takes linear time. Furthermore, we generalize the standard framework for competitive analysis of BST algorithms to include updates (insertions and deletions) in addition to queries. We show how MSTs can be modified to support these update operations and be expected $O(\log \log n)$-competitive in the new framework, while maintaining the rest of the properties above in expectation. (The modified MST algorithm makes use of randomization.)

Sampling to estimate arbitrary subset sums.

Mikkel Thorup

(joint work with Nick Duffield and Carsten Lund)

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. Applied in Internet traffic analysis, the items could be records summarizing the flows streaming by a router, with, say, a hundred records sampled each hour. A subset could be flow records of a worm attack identified later. Our past samples now allow us to trace the history of the attack even though the worm was unknown while the samples were made. Estimation from the samples must be accurate even with heavy-tailed distributions where most of the weight is concentrated on a few heavy items. We want the sample to be weight sensitive, giving priority to heavy items. At the same time, we want sampling without replacement in order to avoid selecting heavy items multiple times. To fulfill these requirements we introduce priority sampling, which is the first weight sensitive sampling scheme without replacement that is suitable for estimating subset sums. Testing priority sampling on Internet traffic analysis, we found it to perform orders of magnitude better than previous schemes. Priority sampling is simple to define and implement: we consider a steam of items i=0,...,n-1 with weights w_i. For each item i, we generate a random number r_i in (0,1) and create a priority q_i=w_i/r_i. The sample S consists of the k highest priority items. Let t be the (k+1)th highest priority. Each sampled item i in S gets a weight estimate W_i=max{w_i,t}, while non-sampled items get weight estimate W_i=0. Magically, it turns out that the weight estimates are unbiased, that is, E[W_i]=w_i, and by linearity of expectation, we get unbiased estimators over any subset sum simply by adding the sampled weight estimates from the subset. Also, we can estimate the variance of the estimates, and surpricingly, there is no co-variance between different weight estimates W_i and W_j. We conjecture an extremely strong near-optimality; namely that for any weight sequence, there exists no specialized scheme for sampling k items with unbiased estimators that gets smaller total variance than priority sampling with k+1 items. Very recently Szegedy has settled this conjecture.

A Framework for Dynamic Trees

Renato Werneck

(joint work with R. Tarjan, S. Alstrup, J. Holm and M. Thorup)

The dynamic trees problem is that of maintaining a forest that changes over time through edge insertions and deletions. We can associate data with vertices or edges, and manipulate this data individually or in bulk, with operations that deal with whole paths or trees. Efficient solutions to this problem have numerous applications, particularly in algorithms for network flows and dynamic graphs in general. Several data structures capable of logarithmic-time dynamic tree operations have been proposed. The first was Sleator and Tarjan's ST-tree, which represents a partition of the tree into paths. Although reasonably fast in practice, adapting ST-trees to different applications is nontrivial. Fredman's Topology trees, Alstrup et al.'s top trees, and Acar et al.'s RC-trees are based on tree contractions: they progressively combine vertices or edges to obtain a hierarchical representation of the tree. This approach is more flexible in theory, but all known implementations assume the trees have bounded degree; arbitrary trees are supported only after ternarization. This talk will show how these two approaches can be combined (with very little overhead) to produce a data structure that is least as generic as any other, very easy to adapt, and as practical as ST-trees. It can be seen as a self-adjusting implementation of top trees, and provides a logarithmic bound per operation in the amortized sense. We will also discuss a pure contraction-based implementation of top trees, which tends to be slower in practice but does guarantee a logarithmic bound in the worst case.

Multiobjective Shortest Paths and QoS-aware Multicommodity Flows

Christos Zaroliagis

(joint work with George Tsaggouris)

The QoS-aware Multicommodity Flow (MCF) problem is an important generalization of the weighted MCF problem that models realistic network design scenarios where commodity demands and profits (for routing flow) are elastic to the Quality-of-Service of the paths along which the commodity is routed. We provide a FPTAS for this problem by employing a Lagrangian relaxation method, which assumes the existence of an oracle that identifies the most violated constraint of the dual. This identification is a non-trivial task, since it involves the solution of an NP-hard non-linear combinatorial optimization problem, known as non-additive shortest path, which is a variant of the multiobjective shortest path problem. Both multiobjective and non-additive shortest path problems are fundamental in multiobjective optimization and hence of independent interest. We also provide efficient FPTAS for these two problems that improve significantly upon previous approaches, especially in the case of more than two objectives. To the best of our knowledge, this is the first FPTAS for non-additive shortest paths.

Answering distance queries in directed graphs using fast matrix multiplication

Uri Zwick

(joint work with Raphael Yuster)

Let G=(V,E,w) be a weighted directed graph with integer edge weights of absolute value at most M. We show that G can be preprocessed in O*(Mn^w) time, where w<2.376 is the exponent of fast matrix multiplication, such that subsequently each distance d(u,v) in the graph can be computed exactly in O(n) time. As a very special case, we obtain an O*(Mn^w) time algorithm for the SINGLE SOURCE SHORTEST PATHS (SSSP) problem for directed graphs with integer weights of absolute value at most M. For sufficiently dense graph, with edge weights that are not too large, this improves upon the O*(mn^{1/2}log M) time algorithms of Gabow and Tarjan, and Goldberg.