Back to current essay
(published in November, 2003)
Techniques for Peer Organization and Search in P2P Networks
The emergence of decentralized and dynamic file-sharing applications such as
Napster1 in 1999, and Gnutella2 in 2000 provided the catalyst that drew a lot of
attention to a new breed of distributed systems called peer-to-peer (P2P)
systems. A P2P system is a distributed system in which network-addressable computing
elements, called peers, have comparable roles and responsibilities, communicate
information, and, share or consume services and resources between them. P2P
systems have the ability to harness vast amounts of resources from a scalable
collection of autonomous peers and especially emphasize on de-centralization and
lack of a central authority. As a result these systems are particularly
attractive to everyday home computer users, who seem empowered by the potential
to independently select and change their own policies, roles, and
responsibilities. By allowing peers to share a portion of the authority, these
systems also possess other interesting technical characteristics such as
self-organization and adaptation.
The recent popularity of peer-to-peer systems has fueled numerous research
proposals and commercial ventures to organize P2P networks, efficiently search
for files, secure information, and provide new applications. By far, one of the
biggest challenges of P2P systems is the ability to locate information like
files or resources. Without a central server to index the content of peers,
searching for information in P2P systems becomes a potentially costly operation.
Centralized searching (as used by Internet search engines such as Google3) has
the downside that the central authority controls the indexing and presentation
of the information. P2P networks, having no such central regulation, require
more innovative search techniques to tackle the scale and irregularity.
Current P2P applications employ search algorithms that attempt to bring down
the cost of searching in terms of number of hops/messages while still covering
the largest possible number of peers in the system. Of these approaches, some
propose unstructured p2p networks, where there is no coupling between network
topology and the location of data. In fact, the topology of the network is
allowed to adapt, grow, or shrink dynamically as new nodes join or leave the
network based on actions of their users. Search techniques in such systems
therefore have loose guarantees. Existing techniques for searching include the
naïve flooding; propagating search queries to all neighboring peers who in turn
forward the query until the query has been forwarded a pre-defined number of
times (Gnutella); forwarding a search query with a pre-defined hop limit to only
one neighboring peer at a time (Freenet4); employing highly connected peers or
“superpeers” to propagate or broker the search query (Adamic5, KaZaA6); peer
gossiping to maintain accurate local copies of membership directories and
summaries of shared content (PlanetP7); and so on.
In structured P2P networks, the reverse is true, i.e. there is a close
coupling between the topology of the network and the location of data. Often
structured approaches employ Distributed Hash Tables (DHT) to organize P2P
networks. DHT based systems require participating peers to store either entire
files or file locations when the identity of the peer corresponds to the hash of
the filename published by another peer. Therefore, DHT based systems can be used
to perform efficient filename searches because they guarantee the location of
the data if it exists within the system at the cost of a data insertion overhead
(i.e. the process of updating tables at the peer whose identity matches the hash
of the filename). Example algorithms include Chord8, which uses a distributed
hash lookup primitive to build a scalable and robust P2P system; and Pastry9,
which is a large-scale, peer-to-peer archival storage utility that provides
scalability, availability, security and cooperative resource sharing.
Another interesting approach to organize and search the P2P network employs
communities10 of peers. Peer communities are a generalization of the notion of a
peer group to a multiplicity of groups (possibly overlapping). While a group is
a physical collection of objects, a community is a set of active members, who
are involved in sharing, communicating and promoting a common interest. The
concept of peer communities is loosely based on the idea of “interest groups”,
such as Yahoo Groups11 or Usenet Newsgroups. The user of a peer in the system
claims to have some interests and depending upon the claims of all the peers’
users, communities are implicitly formed (made up of peers with the same or
similar interests). Note that communities are formed implicitly, i.e. they are
self organizing. A peer may belong to many different communities and communities
Interest-based communities of peers help in pruning the search space12 and in
content-based searches within the P2P network. Earlier peer-to-peer search
techniques had a major drawback that information located farther away from a
peer can be found only at a considerable search expense. A community-based
search query propagation scheme provides more efficient searching by targeting
one or more communities, irrespective of the current membership of the searching
peer. The technique follows the innate method of searching that human beings use
in the analogous social network, where queries are asked off “those that know.”
The community-based search technique also allows search operations to be based
on content rather than just filename searches employed by many existing
peer-to-peer search techniques.
P2P networks are autonomously created, self-organizing, decentralized
systems. However before P2P systems can be built for everyday use, it is
important to address some challenges of the paradigm. Efficient search for
information is one of the mainstays of P2P networks. Since search algorithms and
their performance closely depend on network topology, one finds a proliferation
of proposals to organize peers into unstructured, structured, and
semi-structured networks. Each organization is best suited for specific types of
applications. For instance, structured DHT-based networks make the assumption
that peers (especially those that store hash tables) will not chaotically
join/leave the network. In contrast, unstructured P2P systems do not make this
assumption and therefore usually incur a higher communication cost during
searching. In conclusion, addressing the challenge of organizing peers and
search in P2P networks is important and will take many more creative solutions,
analyses, prototypes, and commercial ventures before we have a clearer
understanding of the applicability of the P2P paradigm.
I. Clarke, T.W. Hong, S.G. Miller,
O. Sandberg and B. Wiley. “Protecting Free Expression Online with Freenet,” IEEE
Internet Computing, IEEE Press, vol. 6, no. 1, 2002, pp.40–49.
L. A. Adamic, R. M. Lukose, A. R.
Puniyani, and B. A. Huberman, “Search in power-law networks,” Physical
Review E, vol. 64, no. 4, 046135, 2001.
F. M. Cuenca-Acuna, C. Peery, R. P.
Martin, and T. D. Nguyen, “PlanetP: Using Gossiping to Build Content
Addressable Peer-to-Peer Information Sharing Communities,” Proc. 12th
Int’l. Symp. High Performance Distributed Computing, June 2003.
I. Stoica, R. Morris, D. Karger, F.
Kaashoek, H. Balakrishnan. “Chord: A scalable peer-to-peer lookup service for
internet applications,” Proc. ACM SIGCOMM, 2001, pp.149-160.
A. Rowstron and P. Druschel,
“Pastry: Scalable, distributed object location and routing for largescale
peer-to-peer systems,” in Proc. 18th IFIP/ACM Int’l. Conf. Distributed
Systems Platforms, 2001, pp. 329-350.
M. Khambatti, K. Ryu, and P.
Dasgupta, “Efficient discovery of implicitly formed peer-to-peer communities,”
Int’l. J. Parallel and Distr. Sys. and Networks, vol. 5, no. 4, 2002,
Yahoo Groups. http://groups.yahoo.com
M. Khambatti, K. Ryu, and P.
Dasgupta. “Structuring peer-to-peer networks using interest-based
communities,” Int’l Work. on Databases, Info. Sys. and Peer-to-Peer
Mujtaba Khambatti (email@example.com) is a PhD candidate in computer
science and engineering at the Arizona State University. His research interests
include peer-to-peer systems, security, distributed operating systems, and
mobile networks. Visit his website at http://www.public.asu.edu/~mujtaba.