ADS 2005 - 2nd Bertinoro Workshop on Algorithms and Data Structures
29 May - 4 June 2005

Arrival: Sunday, May 29, 2005
Departure: Saturday, June 4, 2005
Bertinoro itself is picturesque, with many narrow streets and walkways winding around the central peak. The meeting will be held in a redoubtable ex-Episcopal fortress that the University of Bologna has converted into a modern conference center with computing facilities and Internet access. From the fortress you can enjoy a beautiful vista that stretches from the Tuscan Apennines across the Po Valley to the Adriatic coast and the Alps.
08.00-09.00 | arrivals | breakfast
09.00-09.50 | Pellegrini | Franciosa | Demetrescu | Munro | Marchetti | departures
09.50-10.40 | Pagh | Kaplan | McGeoch | Östlin Pagh | Grandoni
10.40-11.10 | coffee | coffee | coffee | coffee | coffee
11.10-12.00 | Roditty | Sanders | Italiano | Sleator | Thorup
12.00-12.50 | Brodal | Goldberg | Finocchi | Werneck | Radzik
13.00-14.00 | lunch! | lunch! | lunch! | lunch!
15.00-15.50 | Iacono | Zwick | Georgiadis | King
15.50-16.10 | coffee | coffee | coffee | coffee
16.10-17.00 | Langerman | Zaroliagis | Demaine
Scientific Organizing Committee
Andrew Goldberg, Microsoft Research
Giuseppe F. Italiano, Università di Roma "Tor Vergata"

Local Organization
Andrea Bandini, Elena Della Godenza, Centro Congressi di Bertinoro

Sponsored by
BICI Bertinoro International Center for Informatics
We present a cache-oblivious dictionary structure for strings which provides analogues of tries and suffix trees in the cache-oblivious model. Our construction takes as input either a set of strings to store, a single string for which all suffixes are to be stored, a trie, a compressed trie, or a suffix tree, and creates a cache-oblivious data structure which performs prefix queries in $O(\log_B n + |P|/B)$ I/Os, where $n$ is the number of leaves in the trie, $P$ is the query string, and $B$ is the block size. This query cost is optimal for unbounded alphabets.
We study the computational power of a streaming model where at each pass one input stream is read and, differently from the classical read-only streaming model, one output stream is written. Streams are pipelined in such a way that the output stream produced during pass i is used as input stream at pass i+1. We provide several upper and lower bounds in this model, discussing different separation results with respect to classical streaming.
Some of today's applications run on computer platforms with large and
inexpensive memories, which are also error-prone: a system with
Terabytes of memory, such as a large cluster of computing platforms
with a few Gigabytes per node, would experience an error every few
minutes (this is the case, e.g., in Web search engines).
Unfortunately, the appearance of even very few memory faults may
jeopardize the correctness of the computational results.
An algorithm is resilient to memory faults if, despite the corruption
of some memory values before or during its execution, it is
nevertheless able to get a correct output at least on the set of
uncorrupted values. In the talk we will investigate the design and
analysis of resilient searching algorithms, also presenting some
lower bounds.
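For context only (the talk's algorithms are far more space-efficient), the trivial route to resilience replicates each value 2δ+1 times and decodes by majority vote, which tolerates up to δ corrupted copies per value. A minimal sketch of this baseline, with δ = 1:

```python
from collections import Counter

def resilient_read(copies):
    """Recover a value stored in 2*delta+1 copies by majority vote.
    With at most delta corrupted copies, the majority is the original."""
    return Counter(copies).most_common(1)[0][0]

def resilient_binary_search(table, key):
    """Binary search over a sorted array whose entries are replicated
    lists; each probe reads a majority-decoded value."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        val = resilient_read(table[mid])
        if val == key:
            return mid
        if val < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# Store each sorted key 2*delta+1 times (delta = 1 here).
delta = 1
keys = [2, 3, 5, 7, 11]
table = [[k] * (2 * delta + 1) for k in keys]
table[2][0] = 999          # simulate one memory fault
assert resilient_binary_search(table, 5) == 2
```

The Θ(δ) space blowup per value is exactly what the resilient algorithms in the talk avoid.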
We present fully dynamic algorithms for maintaining 3- and 5-spanners of undirected graphs under a sequence of update operations. For unweighted graphs we maintain a 3- or 5-spanner under insertions and deletions of edges; each operation is performed in $O(n)$ amortized time over a sequence of $\Omega(n)$ updates. The maintained 3-spanner (resp., 5-spanner) has $O(n^{3/2})$ edges (resp., $O(n^{4/3})$ edges), which is known to be optimal. On weighted graphs with $d$ different edge cost values, we maintain a 3- or 5-spanner in $O(n)$ amortized time over a sequence of $\Omega(d \cdot n)$ updates. The maintained 3-spanner (resp., 5-spanner) has $O(d \cdot n^{3/2})$ edges (resp., $O(d \cdot n^{4/3})$ edges). The same approach can be extended to graphs with real-valued edge costs in the range $[1,C]$. All our algorithms are deterministic and are substantially faster than recomputing a spanner from scratch after each update.
The field of experimental algorithmics develops new methods for analyzing algorithms by simulation experiments, to produce results useful to the theoretician as well as to the practitioner. We consider a case study, with emphasis on new results, in the 20 year history of experimental/theoretical analysis of heuristics for bin packing, particularly the First Fit and First Fit Decreasing rules. New insights about the structure of packings, and some partial answers to long-standing open problems, are presented.
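The two rules are easy to state; a short reference implementation, assuming unit-capacity bins and fractional item sizes:

```python
def first_fit(items, capacity=1.0):
    """Place each item into the first open bin with enough room,
    opening a new bin when none fits."""
    bins = []
    for size in items:
        for b in bins:
            if sum(b) + size <= capacity + 1e-12:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins

def first_fit_decreasing(items, capacity=1.0):
    """First Fit applied to the items sorted in decreasing order."""
    return first_fit(sorted(items, reverse=True), capacity)

items = [0.4, 0.4, 0.4, 0.6, 0.6, 0.6]
assert len(first_fit(items)) == 4             # FF leaves wasted space
assert len(first_fit_decreasing(items)) == 3  # FFD packs optimally here
```

The example instance already shows the classic gap between the two rules: First Fit opens a fourth bin, while sorting first lets each 0.6 pair with a 0.4.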
Consider a directed graph $G=(V,A,r)$, such that every vertex $v \in V$ is reachable from a distinguished root vertex $r \in V$. We say that a vertex $w$ \emph{dominates} a vertex $v$ if every path from $r$ to $v$ includes $w$. Clearly, $r$ and $v$ dominate $v$ and are called the \emph{trivial dominators} of $v$. In 1987 Whitty showed that if $G$ has only trivial dominators, then $G$ has two spanning trees such that, for any vertex $v$, their corresponding $r$-$v$ paths are vertex-disjoint. Trees that satisfy this property are called \emph{independent}. Whitty only claimed a polynomial running time for the algorithm implied by his proof. Simpler constructions were given independently by Plehn, and by Cheriyan and Reif. Both of these constructions are based on an extension of the notion of s-t numberings to digraphs, which they call \emph{directed s-t numberings}. Specifically, they showed how to construct a numbering $\pi: V \mapsto \{1,\ldots,n\}$, such that each vertex $v \neq r$ that is not a successor of $r$ has predecessors $u$ and $w$ that satisfy $\pi(u) < \pi(v) < \pi(w)$. However, they did not determine the complexity of their constructions. Later, Huck gave another construction of two independent spanning trees with $O(|A|\cdot|V|)$ running time. In this talk we present an extension of Whitty's result to the case where the vertices of $G$ may have non-trivial dominators. Specifically, we show that $G$ has two spanning trees such that for any $v$ their corresponding $r$-$v$ paths intersect only at the dominators of $v$. Furthermore, we give a linear-time algorithm for constructing two such spanning trees. Finally, we show how these spanning trees can be used to obtain a directed s-t numbering of $G$ in linear time.
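On small graphs, dominators can be checked directly from the definition (a naive quadratic method, not the linear-time machinery underlying the talk): $w$ dominates $v$ exactly when removing $w$ disconnects $v$ from $r$. A sketch:

```python
from collections import deque

def reachable(adj, root, removed=None):
    """Vertices reachable from root, optionally with one vertex deleted."""
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v != removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def dominators(adj, root, v):
    """All vertices w such that every root-to-v path passes through w.
    Naive O(|V| * (|V| + |A|)) check; linear-time algorithms exist."""
    doms = {root, v}
    for w in reachable(adj, root) - {root, v}:
        if v not in reachable(adj, root, removed=w):
            doms.add(w)
    return doms

# r has two paths to c, but both continue through c to reach d.
adj = {'r': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': ['d']}
assert dominators(adj, 'r', 'd') == {'r', 'c', 'd'}
```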
We study the problem of finding a shortest path between two given vertices in a directed graph. This is an important problem with many applications, including that of computing driving directions. We are interested in preprocessing the graph using a linear amount of extra space to store additional information, and in using this information to answer shortest path queries quickly. We use A* search in combination with new distance lower-bounding techniques based on landmarks and the triangle inequality. Our experiments show that on some graph classes, such as map graphs, the new algorithm works very well. In particular, we are able to solve the problem on the North American road network graph with 30 million vertices on a handheld device. In this talk we describe the algorithm, its implementation, and its experimental performance.
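The landmark bound itself fits in a few lines. A sketch, assuming an undirected graph (where the triangle inequality yields a two-sided bound; the directed case uses two one-sided bounds instead): precompute distances from each landmark L, then lower-bound d(v, t) by |d(L, t) - d(L, v)|, which an A* search can use as its heuristic.

```python
import heapq

def dijkstra(adj, source):
    """Standard Dijkstra over an adjacency dict {u: [(v, w), ...]}."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def landmark_lower_bound(dist_from_landmarks, v, t):
    """ALT-style lower bound on d(v, t): for each landmark L the
    triangle inequality gives d(v, t) >= |d(L, t) - d(L, v)|."""
    return max(abs(d.get(t, 0) - d.get(v, 0)) for d in dist_from_landmarks)
```

Function names here are illustrative; the real implementation also chooses landmarks carefully, which matters greatly in practice.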
Virtual private network design is the following problem. We are given a communication network, represented as a weighted graph with thresholds on the nodes which represent the amount of flow that a node can send to and receive from the network. The task is to reserve capacities at minimum cost and to specify paths between every ordered pair of nodes such that all valid traffic matrices can be routed along the corresponding paths. This problem is motivated by the fact that the exact amount of flow which is exchanged between terminals is not known in advance and prediction is often elusive. In this talk: (1) Using techniques from network flow theory, we provide a new lower bound on the cost of an optimum solution. This lower bound is based on arbitrary matchings between terminals and relies on Hu's two-commodity flow theorem. Previous lower bounds were solely based on bipartite matchings between senders and receivers and were therefore considerably weaker. (2) Using the new lower bound we show that a well-known simple (2+R/S)-approximation algorithm is in fact a (1+R/S)-approximation algorithm. Here, R denotes the sum of the input thresholds and S the sum of the output thresholds (and, w.l.o.g., R>=S). (3) Finally, we present a new approximation algorithm that is based on approximate Steiner tree computations. In contrast to earlier approaches from the literature, the resulting solution does not have a tree structure. This observation is of particular interest since it is known that an optimal tree solution is not necessarily an optimal (graph) solution. (4) Combining our new approximation algorithm with the simple routing scheme yields an algorithm with performance ratio 3.55 for virtual private network design. This is a considerable improvement over the previously best 5.55-approximation result (STOC'03).
Consider laying out a fixed-topology tree of $N$ nodes into external memory with block size $B$ so as to minimize the worst-case number of block memory transfers required to traverse a path from the root to a node of depth~$D$. We prove that the optimal number of memory transfers is $$ \Theta\left( \begin{cases} \dfrac{D}{\lg (1+B)} & \text{when } D = O(\lg N), \\[2ex] \dfrac{\lg N}{\lg\left(1+\frac{B \lg N}{D}\right)} & \text{when } D = \Omega(\lg N) \text{ and } D = O(B \lg N), \\[2ex] \dfrac{D}{B} & \text{when } D = \Omega(B \lg N). \end{cases} \right) $$ This bound can be achieved even when $B$ is unknown to the (cache-oblivious) layout algorithm.
Some of today's applications run on computer platforms with large and
inexpensive memories, which are also error-prone: a system with
Terabytes of memory, such as a large cluster of computing platforms
with a few Gigabytes per node, would experience an error every few
minutes (this is the case, e.g., in Web search engines).
Unfortunately, the appearance of even very few memory faults may
jeopardize the correctness of the computational results.
An algorithm is resilient to memory faults if, despite the corruption
of some memory values before or during its execution, it is
nevertheless able to get a correct output at least on the set of
uncorrupted values. In the talk we will investigate the design and
analysis of resilient sorting algorithms.
We present a simple fully dynamic and kinetic data structure, which is a variant of a dynamic 2-dimensional range tree, for maintaining the closest pair among $n$ moving points in the plane, allowing insertions and deletions of points.
Consider a directed rooted tree $T=(V,E)$ representing a collection $V$ of $n$ web pages connected via a set $E$ of links, all reachable from a source home page represented by the root of $T$. Each web page $i$ carries a weight $w_i$ representative of the frequency with which it is visited. By adding hotlinks, shortcuts from a node to one of its descendants, we are interested in minimizing the expected number of steps needed to visit pages from the home page. We give the first linear-time algorithm for assigning hotlinks so that the number of steps to reach a page $i$ from the root of the tree matches the entropy bound, i.e. is at most $O(\log(W/w_i))$ where $W= \sum_{i \in T} w_i$. The best previously known algorithm for this task runs in time $O(n^2)$. We also give the first efficient data structure for maintaining hotlinks when nodes are added, deleted or their weights are modified, in amortized time $O(\log(W/w_i))$ per update. The data structure can be made adaptive, i.e. it reaches the entropy bound in the amortized sense without knowing the weights $w_i$ in advance.
Advancements in wireless and sensor technologies have paved the way for the development of tiny and cheap sensor devices equipped with sensors and wireless transceivers. Such devices, called sensor nodes, are able to monitor events, to process the sensed information and to communicate the sensed data to one or more base stations. Being battery powered, sensor networks are highly energy constrained. Data aggregation is a technique that consists of aggregating redundant or correlated data in order to reduce the overall size of the sent data, thus decreasing the network traffic and energy consumption. In this paper we study the problem of data aggregation in a sensor network. Such problems give rise to a rich variety of interesting optimization problems. For some first results, we concentrate on sensor networks in which a routing tree connecting sensor nodes to the sink has been previously constructed, and we assume that data collected at sensors have to reach the base station within a given deadline. Specifically, we focus on the impact of data aggregation in terms of energy consumption reduction: we propose a combinatorial optimization problem that models the problem of designing routing algorithms that optimize the energy saving obtained by data aggregation.
The multiple selection problem asks for the elements of ranks $r_1$, $r_2$, \ldots, $r_k$ from a linearly ordered set of $n$ elements. Let $B$ denote the information-theoretic lower bound on the number of element comparisons needed for multiple selection. We develop a deterministic divide-and-conquer algorithm that solves the problem in $O(B)$ time and with $B+o(B)+O(n)$ element comparisons. The algorithm is relatively simple to state; its analysis is the key aspect.
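The divide-and-conquer idea can be sketched with a randomized (not deterministic, as in the talk) selection primitive, assuming distinct elements and sorted ranks: select the element of the middle requested rank, partition around it, and recurse on the ranks falling to each side.

```python
import random

def select(lst, k):
    """k-th smallest element (1-based) via randomized quickselect."""
    pivot = random.choice(lst)
    lo = [x for x in lst if x < pivot]
    eq = [x for x in lst if x == pivot]
    if k <= len(lo):
        return select(lo, k)
    if k <= len(lo) + len(eq):
        return pivot
    return select([x for x in lst if x > pivot], k - len(lo) - len(eq))

def multiselect(lst, ranks):
    """Map each requested 1-based rank to its element, assuming distinct
    elements and sorted ranks.  Recursing on the middle rank means each
    element is only partitioned against pivots relevant to its side."""
    if not ranks:
        return {}
    mid = ranks[len(ranks) // 2]
    val = select(lst, mid)
    out = {mid: val}
    out.update(multiselect([x for x in lst if x < val],
                           [r for r in ranks if r < mid]))
    shifted = multiselect([x for x in lst if x > val],
                          [r - mid for r in ranks if r > mid])
    out.update({r + mid: v for r, v in shifted.items()})
    return out

assert multiselect([9, 1, 8, 2, 7, 3, 6, 4, 5], [2, 5, 8]) == {2: 2, 5: 5, 8: 8}
```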
This talk considers space-efficient data structures for storing an approximation $S'$ to a set $S$ such that $S \subseteq S'$ and any element not in $S$ belongs to $S'$ with probability at most $\epsilon$. The Bloom filter data structure, solving this problem, has found widespread use. Our main result is a new RAM data structure that improves Bloom filters in several ways:
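For context, the textbook Bloom filter the talk improves on can be sketched as follows (the SHA-256-derived hash functions are an implementation choice for the sketch, not part of the original structure): m bits, k hash positions per item, no false negatives.

```python
import hashlib
import math

class BloomFilter:
    """Classic Bloom filter: elements of S always answer 'present';
    elements outside S answer 'present' with probability ~epsilon."""

    def __init__(self, n, epsilon):
        # Standard parameter choices for n items and error rate epsilon.
        self.m = max(1, math.ceil(-n * math.log(epsilon) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1
                   for p in self._positions(item))

bf = BloomFilter(n=1000, epsilon=0.01)
bf.add("alpha")
assert "alpha" in bf      # no false negatives, ever
```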
We consider the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval. Range searching is a natural and fundamental variant of integer search, and can be solved using predecessor search. However, for a RAM with w-bit words, we show how to perform updates in O(lg w) time and answer queries in O(lg lg w) time. The update time is identical to that of the van Emde Boas structure, but the query time is exponentially faster. Our solution is based on a new and interesting recursion idea which is "more extreme" than the van Emde Boas recursion. Whereas van Emde Boas uses a simple recursion (repeated halving) on each path in a trie, we use a nontrivial, van Emde Boas-like recursion on every such path. To achieve linear space for our data structure, we solve a problem which is of independent interest. We develop the first scheme for dynamic perfect hashing requiring sublinear space.
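The baseline reduction to predecessor search is easy to sketch (here a sorted list stands in for the predecessor structure; class and method names are illustrative): to report a point in [a, b], find the successor of a and check whether it is at most b.

```python
import bisect

class RangeReporter:
    """Baseline range reporting via successor queries on a sorted list.
    The talk's structure answers this weaker-than-predecessor query
    exponentially faster than predecessor search allows."""

    def __init__(self):
        self.keys = []

    def insert(self, x):
        i = bisect.bisect_left(self.keys, x)
        if i == len(self.keys) or self.keys[i] != x:
            self.keys.insert(i, x)

    def delete(self, x):
        i = bisect.bisect_left(self.keys, x)
        if i < len(self.keys) and self.keys[i] == x:
            self.keys.pop(i)

    def report(self, a, b):
        """Return some key in [a, b], or None if the interval is empty."""
        i = bisect.bisect_left(self.keys, a)   # successor of a
        if i < len(self.keys) and self.keys[i] <= b:
            return self.keys[i]
        return None

r = RangeReporter()
for x in [5, 10, 20]:
    r.insert(x)
assert r.report(6, 15) == 10
assert r.report(11, 19) is None
```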
IP address lookup is a critical operation for high-bandwidth routers in packet switching networks such as the Internet. The lookup is a non-trivial operation since it requires searching for the longest prefix, among those stored in a (large) given table, matching the IP address. Ever-increasing routing table sizes, traffic volumes and link speeds demand new and more efficient algorithms. Moreover, the imminent move to IPv6 128-bit addresses will soon require a rethinking of previous technical choices. This article describes a new data structure for solving the IP table lookup problem, christened the Adaptive Stratified Tree (AST). The proposed solution is based on casting the problem in geometric terms and on repeated application of efficient local geometric optimization routines. Experiments with this approach have shown that in terms of storage, query time and update time the AST is on a par with state-of-the-art algorithms based on data compression or string manipulations (and is often better on some of the measured quantities).
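A standard baseline for longest-prefix matching (not the AST itself) is a binary trie over address bits; a minimal sketch, with addresses and prefixes given as bit strings:

```python
class PrefixTrie:
    """Binary trie for longest-prefix match on IP-style bit strings."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix, next_hop):
        """Store next_hop under the given bit-string prefix."""
        node = self.root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node['hop'] = next_hop

    def lookup(self, addr):
        """Next hop for the longest stored prefix of addr, or None."""
        node, best = self.root, None
        for bit in addr:
            if 'hop' in node:
                best = node['hop']        # remember longest match so far
            if bit not in node:
                break
            node = node[bit]
        else:
            if 'hop' in node:
                best = node['hop']
        return best

t = PrefixTrie()
t.insert('10', 'A')      # e.g. a /2 route
t.insert('1011', 'B')    # a more specific /4 route
assert t.lookup('10110000') == 'B'
assert t.lookup('10010000') == 'A'
```

The lookup cost here is proportional to the address length (up to 128 bit inspections for IPv6), which is precisely the kind of cost the compressed and geometric schemes aim to beat.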
A family F of subsets of the set [N] = {1,2,...,N} is "k-selective" if for every non-empty subset X of [N] of size at most k, there is a set B in F which has exactly one element in common with X. Selective families, and variations of this notion, are the main tool in constructions of deterministic communication protocols for "radio networks" with unknown topology. We show in this talk how selective families are used in the currently fastest deterministic protocols for the broadcasting, gossiping and wake-up problems in such networks. We also discuss explicit constructions of the variations of selective families required in these protocols.
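The definition can be verified by brute force on small instances; a sketch (exponential in k, for illustration only):

```python
from itertools import combinations

def is_k_selective(family, N, k):
    """Check that for every non-empty X subset of [N] with |X| <= k,
    some set B in the family satisfies |B ∩ X| == 1."""
    universe = range(1, N + 1)
    for size in range(1, k + 1):
        for X in combinations(universe, size):
            if not any(len(B & set(X)) == 1 for B in family):
                return False
    return True

# The singletons are trivially k-selective for every k.
family = [{i} for i in range(1, 5)]
assert is_k_selective(family, N=4, k=4)
# A single set {1, 2} fails: X = {1, 2} meets it in two elements.
assert not is_k_selective([{1, 2}], N=2, k=2)
```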
Let $G=(V,E)$ be a directed graph and let $P$ be a shortest path
from~$s$
to~$t$ in~$G$. In the replacement paths problem we are
required to find, for every edge $e$ on~$P$, a shortest path from
$s$ to~$t$ in $G$ that avoids~$e$. The only known algorithm for
solving the problem, even for unweighted directed graphs, is the
trivial algorithm in which each edge on the path, in its turn, is
excluded from the graph and a shortest paths tree is computed from
$s$. The running time is $O(mn+n^2 \log n)$.
The replacement paths problem is strongly motivated by two
different applications:
1. The fastest algorithm to compute the $k$
shortest paths between $s$ and $t$ in directed
graphs~\cite{Yen71,Lawler72} computes the replacement paths
between $s$ and $t$. Its running time is $\Ot(mnk)$.
2. The replacement paths problem is used to compute the
Vickrey pricing of edges in a distributed network. It was
raised as an open problem by Nisan and Ronen~\cite{NiRo01} whether
it is possible to compute the Vickrey pricing faster than $n$
computations of a shortest paths tree.
In this paper we present the first non-trivial algorithm for
computing replacement paths in unweighted directed graphs (and in
graphs with small integer weights). Our algorithm is randomized
and its running time is $\Ot(m\sqrt n)$. This result
immediately improves the running time of the two applications
mentioned above by a factor of $\sqrt n$.
We also show how to reduce the problem of computing $k$ simple
shortest paths between $s$ and $t$ to $O(k)$ computations of a
second simple shortest path from $s$ to $t$ each time in a
different subgraph of $G$. The importance of this result is that
computing a second simple shortest path may turn out to be an
easier problem than computing the replacement paths; thus, in
order to obtain a faster $k$ simple shortest paths algorithm, we
can focus our efforts on the second shortest path problem.
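The trivial algorithm described above is straightforward to implement for unweighted graphs; a sketch using BFS (function names are illustrative):

```python
from collections import deque

def bfs_dist_path(adj, s, t, banned=None):
    """Shortest s-t path in an unweighted digraph, optionally skipping
    one banned edge; returns (distance, path) or (inf, None)."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path, x = [], t
            while x is not None:
                path.append(x)
                x = prev[x]
            return len(path) - 1, path[::-1]
        for v in adj.get(u, []):
            if (u, v) != banned and v not in prev:
                prev[v] = u
                queue.append(v)
    return float('inf'), None

def replacement_paths(adj, s, t):
    """Trivial algorithm: for each edge e on a shortest s-t path,
    recompute a shortest path avoiding e.  This is the O(mn)-style
    baseline the talk improves on for unweighted graphs."""
    _, path = bfs_dist_path(adj, s, t)
    result = {}
    for u, v in zip(path, path[1:]):
        result[(u, v)] = bfs_dist_path(adj, s, t, banned=(u, v))
    return result

adj = {'s': ['a', 'b'], 'a': ['t'], 'b': ['c'], 'c': ['t']}
reps = replacement_paths(adj, 's', 't')
assert reps[('s', 'a')] == (3, ['s', 'b', 'c', 't'])
```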
We present a new speedup technique for route planning that exploits the hierarchy inherent in real-world road networks. Our algorithm preprocesses the eight-digit number of nodes needed for maps of the USA or Western Europe in a few hours using linear space. Shortest (i.e., fastest) path queries then take around ten milliseconds to produce exact shortest paths. This is about three orders of magnitude faster than using Dijkstra's algorithm.
We introduce the multi-splay tree (MST) data structure, which is the first binary search tree (BST) to simultaneously achieve $O(\log n)$ amortized, $O(\log^2 n)$ worst-case, and $O(\log \log n)$-competitive costs for a sequence of queries. We also prove the sequential access lemma for MSTs, which states that sequentially accessing all keys takes linear time. Furthermore, we generalize the standard framework for competitive analysis of BST algorithms to include updates (insertions and deletions) in addition to queries. We show how MSTs can be modified to support these update operations and be expected $O(\log \log n)$-competitive in the new framework, while maintaining the rest of the properties above in expectation. (The modified MST algorithm makes use of randomization.)
Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. Applied in Internet traffic analysis, the items could be records summarizing the flows streaming by a router, with, say, a hundred records sampled each hour. A subset could be the flow records of a worm attack identified later. Our past samples then allow us to trace the history of the attack even though the worm was unknown while the samples were made. Estimation from the samples must be accurate even with heavy-tailed distributions where most of the weight is concentrated on a few heavy items. We want the sample to be weight sensitive, giving priority to heavy items. At the same time, we want sampling without replacement in order to avoid selecting heavy items multiple times. To fulfill these requirements we introduce priority sampling, which is the first weight-sensitive sampling scheme without replacement that is suitable for estimating subset sums. Testing priority sampling on Internet traffic analysis, we found it to perform orders of magnitude better than previous schemes.

Priority sampling is simple to define and implement: we consider a stream of items i=0,...,n-1 with weights w_i. For each item i, we generate a random number r_i in (0,1) and create a priority q_i=w_i/r_i. The sample S consists of the k highest priority items. Let t be the (k+1)-th highest priority. Each sampled item i in S gets a weight estimate W_i=max{w_i,t}, while non-sampled items get weight estimate W_i=0. Magically, it turns out that the weight estimates are unbiased, that is, E[W_i]=w_i, and by linearity of expectation, we get unbiased estimators over any subset sum simply by adding the sampled weight estimates from the subset. Also, we can estimate the variance of the estimates, and surprisingly, there is no covariance between different weight estimates W_i and W_j.
We conjecture an extremely strong near-optimality; namely that for any weight sequence, there exists no specialized scheme for sampling k items with unbiased estimators that gets smaller total variance than priority sampling with k+1 items. Very recently Szegedy has settled this conjecture.
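The scheme described above fits in a few lines; a sketch (using 1 - random() so that each r_i lands in (0, 1], avoiding division by zero):

```python
import random

def priority_sample(weights, k):
    """Priority sampling: priority q_i = w_i / r_i with r_i uniform in
    (0, 1]; keep the k highest priorities, let t be the (k+1)-th, and
    give each sampled item i the estimate max(w_i, t)."""
    pri = []
    for i, w in enumerate(weights):
        r = 1.0 - random.random()          # uniform in (0, 1]
        pri.append((w / r, i, w))
    pri.sort(reverse=True)
    t = pri[k][0] if k < len(pri) else 0.0
    return {i: max(w, t) for _, i, w in pri[:k]}

def estimate_subset_sum(sample, subset):
    """Unbiased estimator of sum(w_i for i in subset): non-sampled
    items contribute 0, sampled items their weight estimate."""
    return sum(est for i, est in sample.items() if i in subset)

weights = [100, 1, 1, 1, 1, 1, 1, 1]       # heavy-tailed example
sample = priority_sample(weights, k=3)     # the heavy item is almost
assert 0 in sample                         # always sampled
```

The last assertion is probabilistic but near-certain here: with w_0 = 100 against unit weights, item 0's priority is the maximum except on a vanishingly small fraction of draws.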
The dynamic trees problem is that of maintaining a forest that changes over time through edge insertions and deletions. We can associate data with vertices or edges, and manipulate this data individually or in bulk, with operations that deal with whole paths or trees. Efficient solutions to this problem have numerous applications, particularly in algorithms for network flows and dynamic graphs in general. Several data structures capable of logarithmic-time dynamic tree operations have been proposed. The first was Sleator and Tarjan's ST-tree, which represents a partition of the tree into paths. Although reasonably fast in practice, adapting ST-trees to different applications is nontrivial. Frederickson's topology trees, Alstrup et al.'s top trees, and Acar et al.'s RC-trees are based on tree contractions: they progressively combine vertices or edges to obtain a hierarchical representation of the tree. This approach is more flexible in theory, but all known implementations assume the trees have bounded degree; arbitrary trees are supported only after ternarization. This talk will show how these two approaches can be combined (with very little overhead) to produce a data structure that is at least as generic as any other, very easy to adapt, and as practical as ST-trees. It can be seen as a self-adjusting implementation of top trees, and provides a logarithmic bound per operation in the amortized sense. We will also discuss a pure contraction-based implementation of top trees, which tends to be slower in practice but does guarantee a logarithmic bound in the worst case.
The QoS-aware Multicommodity Flow (MCF) problem is an important generalization of the weighted MCF problem that models realistic network design scenarios where commodity demands and profits (for routing flow) are elastic to the Quality-of-Service of the paths along which the commodity is routed. We provide an FPTAS for this problem by employing a Lagrangian relaxation method, which assumes the existence of an oracle that identifies the most violated constraint of the dual. This identification is a non-trivial task, since it involves the solution of an NP-hard non-linear combinatorial optimization problem, known as non-additive shortest path, which is a variant of the multiobjective shortest path problem. Both the multiobjective and the non-additive shortest path problems are fundamental in multiobjective optimization and hence of independent interest. We also provide efficient FPTASs for these two problems that improve significantly upon previous approaches, especially in the case of more than two objectives. To the best of our knowledge, this is the first FPTAS for non-additive shortest paths.
Let G=(V,E,w) be a weighted directed graph with integer edge weights of absolute value at most M. We show that G can be preprocessed in O*(Mn^ω) time, where ω<2.376 is the exponent of fast matrix multiplication, such that subsequently each distance d(u,v) in the graph can be computed exactly in O(n) time. As a very special case, we obtain an O*(Mn^ω) time algorithm for the SINGLE SOURCE SHORTEST PATHS (SSSP) problem for directed graphs with integer weights of absolute value at most M. For sufficiently dense graphs with edge weights that are not too large, this improves upon the O*(mn^{1/2} log M) time algorithms of Gabow and Tarjan, and Goldberg.