miChord:
Decoupling Object Lookup from Placement
in DHT-Based Overlays
{kchen25,
loai, kyehia}@mit.edu
Updated 12/24/2003 1:54 AM | Draft; DO NOT
REDISTRIBUTE | For .PDF Version
Abstract –
Recently, Distributed Hash Tables (DHTs) have been a popular area of
research. Existing DHTs make implicit or
explicit assumptions about the homogeneity of a peer-to-peer system’s
participants and workload. In reality, peer-to-peer node and workload
characteristics are not homogenous. As such, applications built on DHTs may opt
to exploit this heterogeneity and prefer certain nodes over others. miChord uses
pointers to decouple object lookup from object placement in Chord, enabling
applications to influence the final location of object storage based on a
preference of the application’s choosing. This paper introduces the miChord
decoupling scheme and discusses different metrics/properties of peers that can
motivate such decoupling. We further provide simulation results showing
performance gains over Chord when object placement is skewed in favor of
different metrics.
I.
Introduction
Over the past few years, the
Internet has been witnessing the evolution of a new breed of innovative network
architectures labeled as peer-to-peer networks.
Peer-to-peer networks are characterized by direct access between peer
computers rather than through a centralized server. Such systems constitute highly dynamic
networks of peers with complex overlay topologies that are often unrelated to
the physical network that connects the different nodes. Of the many features of recent peer-to-peer
systems, efficient location of data items remains the core and most important
operation [26].
Peer-to-peer systems can be
differentiated by the degree to which they are structured: Is there a way of
directly knowing which nodes contain which data items? Is a random
search through the entire network necessary to find them? Structured networks, such as CAN [20], Chord [26], Pastry [22], Tapestry [31], and Viceroy [30], have recently emerged in an
attempt to address the scalability issues caused by the inherently unscalable
random search methods adopted by unstructured systems. In structured peer-to-peer systems, the
overlay network topology is tightly controlled and objects are placed at precisely
specified locations [2]. These systems use a distributed routing or hash
table (DHT) to provide a mapping between the object identifier and location so
that queries can be efficiently routed to the node where the desired data
object is located.
With the advent of this class
of structured systems, different research groups have explored a wide variety
of applications, services, and infrastructures built on top of a DHT
abstraction. Examples of such proposals
include systems for wide-area distributed storage, indexing, search, querying [3], [8], web caching, application layer multicast, event
notification [24], indirection services [29], and DoS attack prevention. As such, DHT systems hold
great promise for the rapid deployment of Internet scale applications and
services [28].
All of the aforementioned
structured lookup algorithms make, in essence, some unrealistic explicit or
implicit assumptions. These algorithms
assume that all peers are uniform in resources [29] (such as
storage capacity, network bandwidth, and CPU power) and that all peers are
equally desirable to store prospective objects. However, measurements have
shown that heterogeneity in deployed peer-to-peer systems is relatively extreme
(with up to 3 orders of magnitude difference in bandwidth [1]). Therefore,
the bottleneck caused by less capable peers could lead to the suboptimality of
these existing lookup algorithms depending on the needs of the higher-level
application. While these algorithms provide the basic service of data lookup,
peer-to-peer applications provide
high-level functionality using the underlying lookup algorithm. Much of the complexity in such systems arises
because the environment of peer-to-peer systems is in reality very different
from that of traditional distributed systems, such as those hosted by server
farms. A single peer-to-peer system instance might simultaneously span many
different types of constituent elements, such as dedicated servers, idle
workstations, and old desktops [9].
We claim that it is possible
to favorably separate object lookup from placement in DHT-based overlays using
pointers. This separation would improve
the performance of the overlying application according to the metric
exploited. We choose to base our study
on Chord. Performance is improved by
publishing to nodes that have some relatively desirable property either to the publisher and/or to the overall
underlying overlay. While this abstraction would only influence
the final location of the actual object, a pointer to the node holding this
object will be mapped to the node deemed responsible for it in conformance with
Chord’s regular consistent hashing process (see [26] for more details on Chord). Accordingly, locating the object is still
deterministically bound and the simplicity, provable correctness, and provable
performance (lookup via O(logN) messages in steady-state)
properties of Chord are all still preserved.
Also, controlling data placement is in direct tension with the goal of a
DHT, which is to uniformly distribute data across a system in an automated
fashion [9]. Our level of
indirection, however, is not meant to disrupt the generally desirable load
balancing characteristics of Chord [26], but to only incrementally better align it with the
publisher’s or system’s implicit desires through constrained load balancing.
In the context of a file
system (CFS [7], PAST [22], Pond [20]) or file-sharing application (KaZaa, FreeNet [5], Gnutella, Napster), the aforementioned objects would
be files and the sought-after resource would be storage. The mechanism to store published files would
choose locations which are desirable to the publisher, based on some relative
properties of the network and/or available nodes. Relevant properties of
storage nodes might include up/downtime trends, bandwidth, number and size of
files to store, duration of guaranteed storage, or latency-wise distance from
the publisher. These and other properties will be maintained individually and
communicated to fingering nodes when fingered as part of the periodic
stabilization and finger table build-up and refreshing process defined by
Chord. This would guarantee low metric
state routing overhead and that properties of the nodes are as fresh as the
finger table itself.
The rest of this paper is
organized as follows. Section II provides background information on DHT-based
object storage and lookup. Section III
describes in detail the proposed object lookup-placement decoupling scheme
using pointers and how it varies from basic Chord. In Section IV, we discuss various system
characteristics and circumstances to be considered when selecting metrics that
can be exploited by miChord. In Section
V, we further discuss two sample metrics that miChord could exploit, namely 1)
load balancing object serving requirements among peers in a miChord-enabled
peer-to-peer system, and 2) optimizing overall object read latencies in the
system. In Section VI, we introduce the
miChord simulator and simulation environment before providing the methodology
and results of our attempt to quantify the performance gain in optimizing for
the two aforementioned metrics. We conclude
in Section VII with discussing the tradeoffs incurred in divorcing object
lookup from placement and summarizing the results of our findings and areas of
future work.
A note on
terminology used throughout the rest of this paper – The following words are used interchangeably: write, place, and
publish, read, request, and download, data objects and files, and nodes and
peers.
II.
Background
Several entirely distributed
storage systems, such as CFS [7], PAST [22], and Pond [20], have been designed and implemented, however none are
widely used. On the one hand, these systems demonstrate many desirable
characteristics, including decentralized control, self organization and
adaptation, sharing of system resources, and scalability. On the other hand, in order for these systems
to become prevalent, they have to perform efficiently, both with respect to
read and write performance, and with respect to network resource consumption.
Most of the existing research
on improving DHT performance has taken one of two approaches. The first approach is to minimize lookup
latency by reducing the number of hops (one hop [9]). The idea is
that if the average number of hops until a key lookup is resolved is reduced,
then the average latency will be reduced as well. The second approach is to build the overlay
so that it better reflects the underlying network [4]. The intuition
here is that two nodes which are physically close in the underlying network
should also have close virtual IDs so that when a lookup is performed, the
actual packet does not have to cross the network unnecessarily many times.
In terms of storing pointers
in DHTs, there are two systems that use this idea: Internet Indirection
Infrastructure (i3) [27] and PAST [22]. i3 uses pointers as a
decoupling mechanism in a DHT. However,
the purpose of this decoupling in i3 is to offer a rendezvous-based
communications abstraction rather than to enable a preference for data
placement. PAST, a large-scale
peer-to-peer persistent storage utility, uses pointers
in replica diversion. To guarantee the durability of its data, PAST
replicates each file k times. Files and nodes in PAST exist in the same Id
space. A node with nodeId numerically closest to a file’s fileId is responsible for that file. Furthermore, when a node stores a file, the
node creates replicas of that file at the k-1
nodes with nodeIds
closest to the fileId. Replica diversion happens when a node A, one
of the k-1 replica nodes, does not
have enough storage capacity for the file.
In such a case, A would ask another node B to store the file and A would
store a pointer to B. miChord is not system-specific and is meant to enable a
variety of object placement preferences.
A. Chord
miChord
consists of a layer on top of an enhanced Chord. Chord is fully described and evaluated in [26]. In this
paper, we give a brief overview of Chord.
The Chord protocol uses a
consistent hashing function such as SHA-1 to hash both the nodes’ IP addresses
and files to produce node Ids and keys respectively. These identifiers are of length m-bits. The
node identifiers are mapped onto corresponding positions in an identifier
circle modulo 2m. Each hashed
key is then assigned to the node with the smallest identifier that is greater
than or equal to the key’s identifier.
This node is referred to as the key’s successor node. An example
of this assignment of keys is shown in Figure 1(a), where the file with identifier
“52” has been hashed to the node with identifier “55”. The consistent hashing function arranges the
nodes on the Chord circle independent of the underlying topology. A number of load balance simulations have
demonstrated that consistent hashing does not distribute keys evenly among all
nodes, and that this variation increases linearly with the number of keys in
the system. One of the reasons for this
imbalance is that nodes do not always cover the entire identifier space evenly [26].

Chord guarantees the
correctness of lookups for objects in the system. A basic version of Chord maintains that each
node need only know its successor in order to properly route queries; an
enhanced version that resolves queries faster is one in which nodes store
additional routing information in finger
tables. Each ith entry in a node n’s
finger table corresponds to the address of the first node that succeeds n’s identifier by at least 2i-1. As a result, earlier entries in the table
tend to be addresses of nearby nodes, whereas later entries correspond to nodes
further on along the circle. This
structure allows nodes to forward queries to other nodes in the remaining half
of the identifier circle and cut down on forwarding time. Lookup is resolved
with O(logN) forwardings with high probability [3]. This lookup mechanism is illustrated in Figure
1(a). Node N8 has been asked to lookup
the document with hashed key “52”. It looks in its finger table and forwards
the query to the node having the greatest identifier that is less than or equal
to “52”, which in this case is the node N42. Thereafter, the lookup is refined
by succeeding nodes.
In addition to specifying
algorithms that maintain the network in the face of node joins and leaves, the
Chord protocol implements a stabilization
routine that keeps each node’s successor pointers up-to-date. This ensures
the correctness and speed of lookups when nodes join and leave the system, even
if these dynamics occur concurrently.
Stabilization is carried out periodically – it essentially verifies each
node’s current predecessor and successor, and refreshes the node finger tables.
In order to withstand the
implications of node failure, each node keeps a “successor-list” of its r nearest successors. A node will use this list to replace its
successor if it fails (i.e. leaves the network in an ungraceful manner). This
is important because it ensures correct lookups before the network has stabilized,
and equally important, it allows nodes to store replicas of the object in the r nodes to guarantee the existence of
the object in the network at all times.
III.
Design
The miChord protocol seeks to
place documents at locations based on some preference. It does so without compromising Chord’s
robust lookup mechanism. To influence
placement of objects, a metric is associated with each node. Essentially, a node’s metric is a
quantification of how desirable the node is to store the object. Typically the metric is not constant and will
evolve with the network dynamics.
Sections IV and V present a further discussion on the subject of
metrics.
In miChord, each node n on the identifier circle is aware of
its own metric as well as the metrics of the nodes in n’s finger table. To support
metrics, Chord finger tables have been enhanced to store metrics in addition to
the identifiers and addresses of successor nodes. A node needs a mechanism
whereby it “learns” the metrics of some other nodes. This learning is achieved
as part of the stabilization routine. During stabilization, in addition to
refreshing each node in the table’s successor addresses, the routine also
refreshes their metrics. Each node is responsible for maintaining and updating
its own metric.
In addition to Chord
enhancements, miChord consists of a layer on top of Chord called the DataPlacer (DP). When a user wants to publish an object with
Id k, the user tells the DP at some
node n about the object (its Id) and
the user’s (or application’s) publishing
preferences. The DP then searches the
set of nodes that n knows about
through n’s finger table for the preferred node p that has the optimum metric in n’s known node set. Once p is found, DP sends the object to p for storage and places a pointer
(pointing to p) at the actual
successor node, s, of the object’s
Id, k. Thus, preference is granted
based on a limited view of the world: whatever n sees of it through its finger table window. This is what we refer
to as constrained load balancing.
When a request for k is made, the lookup query first yields
k’s successor, s, in conformance with Chord, which will respond to the requestor
with providing p’s address, now
deemed responsible for k by miChord.
An example of this mechanism is shown in Figure 1(b). Lookup first proceeds as it would with
Chord. Then, upon reaching the successor
node (node N55), a pointer directs the requester to node N27, where the object
has been placed by N8’s DP. Also shown
in Figure 1(b) is the modified miChord finger table that has an additional
column to store node metrics.
IV.
Motivating Metrics
While DHT-based peer-to-peer
overlays load-balance for keys and provide good key-lookup time guarantees,
they do so at the expense of randomly controlling where data is placed. We believe the adverse side-effects of random
data placement are very much application-specific. That is, depending on the
higher-level application’s requirements, peer-to-peer overlays can be performing
suboptimally in different regards and different metrics can be exploited to
alleviate this suboptimality. One clear disadvantage, for example, is that data
may be undesirably placed on a node that is far from its owner/publisher or
user. We see that the challenge of
peer-to-peer computing goes beyond simple file-sharing [16], and hence we try to make our decoupling scheme
metric-/property-agnostic and capable of propagating and promoting any
node-specific property that is valuable to the overlying application and
capable of being tracked, maintained, propagated, and chosen in accordance with
the basic scheme described in Section III.
In [13] the authors assert that peer-to-peer applications
form their dynamic user-level networks based on available bandwidth between
peers and that overlay networks can configure their routing tables based on the
bandwidth of overlay links. As much as
available bandwidth is a sought-after resource in file-sharing applications and
can be a metric miChord optimizes for, it is one that is relatively expensive
to accurately and honestly track, mostly due to the intrusiveness of current
available bandwidth measurement tools.
Therefore the cost incurred in maintaining a given metric is a very
important consideration that should be evaluated when using miChord to divorce
information from location based on a specific metric.
Optimizing for bandwidth is
not the only way to enhance performance.
Although bandwidth in the core of the internet has been doubling at an
incredible rate, latency has not been improving as quickly [13]. For small
objects, latency, not bandwidth, is the dominating factor. If we consider distributed event notification
[24], chatting, or gaming peer-to-peer systems, the
sought-after resources could be significantly smaller and constantly interacted
on by the publisher/users making them very latency-sensitive. Depending on the application, the original
publisher may, for example, be the most important (if not only) benefactor
miChord could serve. In a file-sharing
application there are many others involved, but in a distributed file-system
application, the publisher will be the most
one interacting with his/her objects, so why not have them close? In a
secure-backup system, the publisher will be the only one interacting with them. To serve the publisher best, one
metric miChord could exploit is the distance between the publisher and the
different nodes it is aware about in its finger table (or in that of a randomly
contacted proxy node in the ring if the publisher is not a peer). The node
yielding the smallest distance, measured by RTT, with respect to the publisher
could now serve as the best object storage option, based on the publisher’s or
proxy node’s limited view of the world. In a collaborative file-system
application, members of the workgroup (the collaborators)
could be part of the same organization, department, or research group and are
most probably within the same geographic location, hence making a storage node
that is desirable (or close) to the publisher relatively desirable to the rest
of the collocated collaborators. All the more, the object owner/publisher may
choose to serve the object from his/her own node. In this case only a pointer to the
publisher/owner is mapped to the node responsible for the object’s key
according to Chord.
Nodes
can also easily keep track of their up/downtime or availability trends making
highly available nodes better contenders for objects deemed critical by their
publishers. Highly available nodes are
also optimal locations for replicating data, as they tend to fail or leave the
overlay less frequently causing less replication traffic in the system.
While
peer-to-peer systems have been proposed as the solution to a diverse set of
problems, many peer-to-peer systems will be used to present services to end
users. End users are often skeptical of
services that consume local resources in order to support anonymous outside
users. User acceptance is often
predicated on the extent to which end users feel they have fine-grained control
over the intrusiveness of the service [2]. Accordingly,
this metric-based abstraction can also be used to propagate nodes’ different
storage policies where relevant: How many files a node is willing to store? How
big are the files permitted to be? How long is their storage guaranteed?
Ideally,
miChord aims at supporting more than a single-metric-based decision (as in RON [1]) through cleverly taking into account all that a
given node can say about itself (up/downtime, closeness to activity, available
bandwidth, storage policy, lon./lat. coordinates, etc...). The node would then
feed this information to some function either calculated by the publisher at
the time of publishing (if it contains relative parameters, such as RTT between
publisher and contending nodes) or readily calculated and propagated by the
node itself when fingered. Again, this
would be very dependent on the higher-level application and whether metric
exploitation is built within the application logic or is at the subjective
discretion of its users, namely object publishers.
V.
Sample Metrics
To
demonstrate the usefulness of miChord and relevant considerations when picking
a metric, we created two sample metrics.
Each improves read performance in a different way. The first metric aims to load-balance the
amount of data each node serves. The
second metric aims to publish data to the
activity center (to be defined later).
In this section, we describe how the metrics are measured, calculated,
and updated. In Section VI, we describe
how we evaluated them and discuss the evaluation results.
A.
Metric-1: Load Balancing the Amount of Data Served
The distribution of demand for
real-data items is often skewed, leading to potentially poor load balancing,
swamped nodes, and discarded requests [10]. Metric-1 aims to improve the overall read time by
load-balancing the amount of data each node serves. The overall read time is
the time from which a read is requested to the time at which all of the data is
received. Improvement is most apparent
in large files whose read time is bandwidth dominated. By load-balancing the amount of data each
node is serving, we are optimizing the amount of bandwidth that is available to
each request.
We made the following
assumptions:
·
Files
have significantly different popularity, which is justified by real world
observation [11].
·
Writes
are not concentrated at fewer than log2
N nodes, which remain unchanged over time. This assumption is also
justified by real world observation [25]. Since each node only has a partial view of the
world, if writes are concentrated, there may be a set of nodes which have no
data published to them because the publishing nodes do not know about them.
·
Each
node has the same uploading bandwidth.
For nodes with different bandwidth capacities, multiple virtual
instances can be instantiated on nodes with higher bandwidth to make this
assumption appear to be true [7].
For this metric, each node
keeps track of the amount of data D
that it served in the last W
seconds. W is the same for all nodes.
Then the node’s metric value is D. The best value for W depends on the frequency of stabilization. We want W to be small enough to reflect the
current activity at a node, but big enough to compensate for the fact that the
system will not react instantaneously to a change in a node’s activity. As mentioned in Section III, a node’s metric
is propagated to some other nodes when stabilization occurs. Since a node’s knowledge of other nodes’
metrics is only as up-to-date as the last stabilization, the best value for W is directly related to the time
between stabilizations.
B.
Metric-2: Writing to the
Metric-2 aims to improve the
read performance by decreasing the overall latency in the system for read
requests. Improvement is most apparent
for small data objects whose read time is dominated by latency. The idea with metric-2 is that publishing
data at or near the center of activity
optimizes the read latency for the active users. We define activity
to be the reads in the system. A node’s
metric-2 reflects how close it is to the activity center. The smaller its metric-2, the closer it is to
the activity center. We measure distance
using round trip times (RTTs) between nodes.
We made the following
assumptions:
·
Geographically
speaking, activity is unevenly distributed.
This characteristic is true in a system where all nodes are equally
active, but unevenly distributed. For
instance, some geographic areas are denser with nodes than others, which is
true in the real world.
·
All
nodes are equally likely to request the same object.
For this metric, each node’s
metric value is an exponential weighted average (EWMA). Upon receiving a read request from node j, node i updates i’s metric
value M:
Mnew =
(k x RTTi-j) + (1 – k) x Mold
A node’s initial metric value
M0 is initialized to an estimate of the average latency between all
nodes. M0 should be the same
for all nodes.
We expect
that the propagation delay of the nodes’ metrics will not be a significant
performance penalty since we do not expect the location of activity to shift
abruptly. Using the EWMA to keep track
of the distance to the activity center, if the activity center shifts over time
as happens in a global network with nodes in different time zones joining and
leaving the system, each node’s metric value will reflect the change
accordingly.
There are limitations to this metric. By nature of trying to place object at or
near the activity center, its performance is heavily dependent on the network
of nodes and the activity pattern.
Furthermore, placing all the data at a few central nodes may overwhelm
those nodes. There are two factors that
reduce this effect of overwhelming a few nodes.
The first factor is that each node only knows about a subset of the
nodes in the system. Thus, as long as
writes are not concentrated at a few nodes, writes will be somewhat spread
out. The second factor is that this
metric is optimizing read performance for small data objects since those are
the ones whose performance are dominated by latency.
Also, we
would like a node’s metric value to stabilize around a value that truly
reflects a node’s closeness to the activity center. When a node first joins, it is possible that
one or two metric value updates with very long RTTs makes the node seem far
from the center even though the node is actually close to the center. One way to prevent good nodes from being
stuck with bad metric values is to introduce randomness. Use the metric for data placement 80% of the
time (i.e 4 out of 5 times). The other
20% of the time, place the data randomly.
A good node’s metric value will recover when given the chance by
randomness. Furthermore, randomness will
also help in keeping data from concentrating at a few nodes.
VI.
Evaluation
A.
Experimental Test Setup
The miChord simulator has been
developed in C# (C-Sharp) over Microsoft’s .NET framework and consists of 10
classes and around 1100 LOC. C# is a new type-safe and garbage collected
language with a very rich class library for Rapid Application Development
(RAD). The simulator simulates the DataPlacer layer described in Section III and the
basic Chord lookup, node join, finger table generation, and stabilization
operations specified in [26] and enhanced to support metrics. The simulator’s
implementation included a DataRequestor layer, which upon honoring an object
request, updates the relevant metric on the server node through: 1)
incrementing the object serving counter of the node in the case of Section V-A,
and 2) adjusting the RTT metric through an EWMA implementation as described in
Section V-B. The up-to-date metrics are
propagated to fingering nodes as they periodically update their finger
tables.
Performance tracers are also
inserted where relevant to keep track of the difference performance
characteristics and results discussed later in this section. A separate EventGenerator module is responsible for generating the
different events (node joins, file publishing, and file downloads) and their
transitional delays. Events are probabilistically
skewed to result in more file requests than writes, some nodes being more
active than others, and some files being more popular than others. This makes our simulations better resembling
of real-life peer-to-peer usage and traffic trends.
The generated event file is
then fed simultaneously into 3 DHT-implementations: 1) basic Chord, 2) miChord
attempting to load balance node’s object serving requirements (see Section
V-A), and 3) miChord optimizing overall object read request latencies (see Section
V-B). The output constitutes the
performance results contrasted and discussed in depth below.
B.
Simulation Environment
We use the PlanetLab testbed [6] as the underlying network topology in our
simulations. The topology consists of
the 207 production node list for which all-pairs-pings information is
available. A matrix of the pair-wise
latencies between all nodes (the average of which is around 150ms) is fed to
the simulator and consulted whenever the RTT between two nodes is required to
track performance during lookup and object publishing/retrieval. While the
Chord Simulator developed in parallel with [26] keeps track of the total number of hops during a
lookup operation, the miChord simulator also tracks the cumulative lookup
latency incurred during lookup and object publishing/retrieval.
A total of
80,000 publish and read request events with varying in-between delays were
generated. Of the 80,000 events, 16,000
were publish
events, populating the simulation with 16,000 files of the same size with Ids
in a 15-bit id space. The events
generated reflect the assumptions stated in Section V. The results discussed
hereunder are based on an EWMA constant k of value 0.9 (see Section V-B).
C.
Experimental Results for Metric-1
Our
simulations for metric-1 demonstrate that using metric-1 to place objects will
result in better load balancing for the amount of data that each node
serves. Numbering the nodes from 1 to
207, with the same node getting the same number in the Chord and miChord
simulation, Figure 2 shows the number of read requests that each node served by
the end of our simulation. As expected,
miChord achieves much better load balancing when files have different
popularities. The number of requests
served by miChord are tightly clustered around 309, the number of requests each
node would serve if the system load-balanced perfectly. The standard deviation for the number of
files served is 383 and 93 for Chord and miChord respectively (75%
improvement).

Metric-1 is
only aimed to load-balance the nodes’ serving load, but we wanted to check how
load-balanced the nodes’ storage was.
Figure 3 ranks the nodes in each simulation in order from those storing
the most files to those with the least files.
The number of files each miChord node stores is more even, with a
standard deviation of 40 compared to Chord’s 80 (50% improvement).

D.
Experimental Results for Metric-2
As metric-2
is a more complicated metric than metric-1, we do a sanity check before
proceeding to evaluate its performance.
Once again numbering the nodes from 1 to 207, with the same node getting
the same number in the Chord and miChord simulation, Figure 4 shows how much
data each node stored by the end of the simulation. We see a few miChord nodes that store many
files (nodes near the activity center) while the rest store almost no
files. Since metric-2 aims to minimize
read latency, and object’s whose read performance is dominated by latency tend
to be small, we are less concerned about load-balancing the amount of data each
node stores. However, by the end of our
simulation results, the worst miChord node, where worst is defined by storing
the most objects, stored 3,189 (or 18%) of the approximately 16,000 data
objects. The worst Chord node stored 570
objects, one sixth that of the worst miChord node. Although ending up with data highly
concentrated at a few nodes supports that metric-2 places data at or near the
activity center, it also brings up scalability concerns. Before we propose possible solutions, let us
discuss the performance gains.

Taking the
average latency over all read requests, there was an 8% improvement for miChord
requests over Chord requests. Our first
observation is that 8% is small enough that it might be statistically
insignificant. However, the large number
of events in our simulation in combination with Figure 5, the cumulative
distribution functions of latency in Chord vs miChord, and Figure 6, the
average read latency experienced by each node in Chord vs miChord, leads us to
believe that there is in fact a small performance gain from placing data using
metric-2. For Figure 6, each point below
the x=y line is a node request that had better average read latency in miChord;
almost all points are below the x=y line.
Moreover, the
total request RTT in the above simulations is defined as the sum of the latency
accumulated during the different hops required to resolve a certain lookup and the latency between the requestor
and object server incurred when retrieving the object. miChord’s
intervention attempts to improve only this last retrieval hop between active
requestors and servers. This does not directly affect the bulk of the total
request RTT incurred during lookup; hence the marginal, yet consistent, 8%
improvement.


As stated in
Section V, the amount of improvement in read latency depends heavily upon
topology. Should the topology merit the use of this metric, there are ways to
work around data being too concentrated at a few nodes. One possible workaround idea is to place data
according to two metrics. The first
metric which reflects how much more data a node has than other nodes it knows
about is used to determine whether a node is available for object placement. Of
the available nodes, metric-2 is used to determine object placement. From
another perspective, one might come to conclude that the need for optimizing a
system by way of metric-2 can be easily alleviated by a simple caching scheme
that guarantees a cached copy of popular objects in different active clusters.
The interplay between caching, pointers, and replication is further discussed
in the next final section.
VII.
Conclusion and Future Work
We have proposed the use of
pointers to decouple object lookup from placement in DHT-based overlays. Since existing DHTs falsely assume
homogeneity of peer-to-peer nodes and workloads, we claim and demonstrate that
being able to influence the placement of an object in favor of the overlying
application’s needs, and thus account for heterogeneity in the system, can lead
to performance gains depending on the metric exploited.
One cost that comes with this
scheme is that, in the face of a high churn rate, both the objects and the
pointers need to be constantly replicated and maintained to ensure data
availability and persistence. With pointers, there is now an additional point
of failure during object retrieval besides its unavailability: the
obsolescence/unavailability of the crucial pointer(s) that lead to it.
Furthermore, to preserve the
preferences during initial object placement, a replication scheme more
elaborate than the simple scheme of replicating to the r successor nodes (implemented in CFS [7] on top of Chord [26]) is required. This is because the r successors
are unlikely to have the same desirable characteristics that made the original
object holder the optimal placement choice. One possible workaround is to start
with an r-successors replication scheme, and when the original holder
fails/leaves the system, the secondary copy is migrated to a similarly preferable
node and is elevated to primary copy status. Additionally, pointers to the
primary copy need to be updated. This is an area of future work and a current
limitation of the proposed scheme, especially if the initial placement
preference need be maintained throughout the object’s lifetime.
Just like any meaningful
application using Chord is responsible for providing its desired caching and
replication needs [26], applications layered on top of miChord are expected
to complement and/or use the pointer scheme in implementing their caching and
replication schemes. In fact, miChord eases the implementation of these
features. Since pointers are relatively much cheaper (storage-wise) than the
data objects they are pointing to, extensive caching of pointers can be made
and less caching of the expensive objects – hence consuming less storage space
and network traffic while guaranteeing more cache availability. Moreover, by
replicating objects on nodes advertising higher availability guarantees (see
Section IV), less replication traffic will be incurred since such nodes tend to
fail/leave the system less often.
Finally,
we would like to reiterate that miChord aims at eventually supporting more than
a single-metric-based decision through cleverly taking into account a lot of
what nodes can track and advertise about themselves. This information can be
fed into some function either calculated by the publisher at the time of
publishing (if it contains relative parameters, such as RTT between publisher
and contending nodes) or readily calculated and propagated by the maintaining
nodes when fingered. Again, this would
be very dependent on the higher-level application’s needs to control data placement.
Accordingly, miChord is keen to remain generic and implementation-agnostic. It
imposes as few constraints as possible on its adopting application, which can
easily reduce it back to basic Chord by restoring data objects in place of
their pointers.
VIII.
Acknowledgements
We would like to thank Jeremy Stribling for
kindly providing the PlanetLab latency data and Mike Walfish for his
clarifications about the Chord Simulator. This work would have also not been
possible without the continuous support and insight from
IX.
References
[1] D. Andersen, H. Balakrishnan,
F. Kaashoek, R. Morris, "Resilient Overlay Networks", In Proceedings
of SOSP '01, October 2001.
[3] H. Balakrishnan, F. Kaashoek,
D. Karger, R. Morris,
[4] M. Castro, P. Druschel, et.
al., "Exploiting Network Proximity in Distributed Hash Tables", In
Proceedings of the International Workshop on Future Directions in Distributed
Computing (FuDiCo), 2002.
[7] F. Dabek, M. F. Kaashoek, D.
Karger, R. Morris, and
[9] L. Garcés-Erice, E.W.
Biersack, P.A. Felber, K.W. Ross, G. Urvoy-Keller, "Hierarchical
Peer-to-peer Systems", In Proceedings of ACM/IFIP International Conference
on Parallel and Distributed Computing (Euro-Par), 2003.