Student Essays

Operating Systems

Back to current essay

(published in November, 2003)

Techniques for Peer Organization and Search in P2P Networks

The emergence of decentralized and dynamic file-sharing applications such as Napster¹ in 1999, and Gnutella² in 2000 provided the catalyst that drew a lot of attention to a new breed of distributed systems called peer-to-peer (P2P) systems. A P2P system is a distributed system in which network-addressable computing elements, called peers, have comparable roles and responsibilities, communicate information, and, share or consume services and resources between them. P2P systems have the ability to harness vast amounts of resources from a scalable collection of autonomous peers and especially emphasize on de-centralization and lack of a central authority. As a result these systems are particularly attractive to everyday home computer users, who seem empowered by the potential to independently select and change their own policies, roles, and responsibilities. By allowing peers to share a portion of the authority, these systems also possess other interesting technical characteristics such as self-organization and adaptation.

The recent popularity of peer-to-peer systems has fueled numerous research proposals and commercial ventures to organize P2P networks, efficiently search for files, secure information, and provide new applications. By far, one of the biggest challenges of P2P systems is the ability to locate information like files or resources. Without a central server to index the content of peers, searching for information in P2P systems becomes a potentially costly operation. Centralized searching (as used by Internet search engines such as Google³) has the downside that the central authority controls the indexing and presentation of the information. P2P networks, having no such central regulation, require more innovative search techniques to tackle the scale and irregularity.

Current P2P applications employ search algorithms that attempt to bring down the cost of searching in terms of number of hops/messages while still covering the largest possible number of peers in the system. Of these approaches, some propose unstructured p2p networks, where there is no coupling between network topology and the location of data. In fact, the topology of the network is allowed to adapt, grow, or shrink dynamically as new nodes join or leave the network based on actions of their users. Search techniques in such systems therefore have loose guarantees. Existing techniques for searching include the naïve flooding; propagating search queries to all neighboring peers who in turn forward the query until the query has been forwarded a pre-defined number of times (Gnutella); forwarding a search query with a pre-defined hop limit to only one neighboring peer at a time (Freenet⁴); employing highly connected peers or “superpeers” to propagate or broker the search query (Adamic⁵, KaZaA⁶); peer gossiping to maintain accurate local copies of membership directories and summaries of shared content (PlanetP⁷); and so on.

In structured P2P networks, the reverse is true, i.e. there is a close coupling between the topology of the network and the location of data. Often structured approaches employ Distributed Hash Tables (DHT) to organize P2P networks. DHT based systems require participating peers to store either entire files or file locations when the identity of the peer corresponds to the hash of the filename published by another peer. Therefore, DHT based systems can be used to perform efficient filename searches because they guarantee the location of the data if it exists within the system at the cost of a data insertion overhead (i.e. the process of updating tables at the peer whose identity matches the hash of the filename). Example algorithms include Chord⁸, which uses a distributed hash lookup primitive to build a scalable and robust P2P system; and Pastry⁹, which is a large-scale, peer-to-peer archival storage utility that provides scalability, availability, security and cooperative resource sharing.

Another interesting approach to organize and search the P2P network employs communities¹⁰ of peers. Peer communities are a generalization of the notion of a peer group to a multiplicity of groups (possibly overlapping). While a group is a physical collection of objects, a community is a set of active members, who are involved in sharing, communicating and promoting a common interest. The concept of peer communities is loosely based on the idea of “interest groups”, such as Yahoo Groups¹¹ or Usenet Newsgroups. The user of a peer in the system claims to have some interests and depending upon the claims of all the peers’ users, communities are implicitly formed (made up of peers with the same or similar interests). Note that communities are formed implicitly, i.e. they are self organizing. A peer may belong to many different communities and communities may overlap.

Interest-based communities of peers help in pruning the search space¹² and in content-based searches within the P2P network. Earlier peer-to-peer search techniques had a major drawback that information located farther away from a peer can be found only at a considerable search expense. A community-based search query propagation scheme provides more efficient searching by targeting one or more communities, irrespective of the current membership of the searching peer. The technique follows the innate method of searching that human beings use in the analogous social network, where queries are asked off “those that know.” The community-based search technique also allows search operations to be based on content rather than just filename searches employed by many existing peer-to-peer search techniques.

P2P networks are autonomously created, self-organizing, decentralized systems. However before P2P systems can be built for everyday use, it is important to address some challenges of the paradigm. Efficient search for information is one of the mainstays of P2P networks. Since search algorithms and their performance closely depend on network topology, one finds a proliferation of proposals to organize peers into unstructured, structured, and semi-structured networks. Each organization is best suited for specific types of applications. For instance, structured DHT-based networks make the assumption that peers (especially those that store hash tables) will not chaotically join/leave the network. In contrast, unstructured P2P systems do not make this assumption and therefore usually incur a higher communication cost during searching. In conclusion, addressing the challenge of organizing peers and search in P2P networks is important and will take many more creative solutions, analyses, prototypes, and commercial ventures before we have a clearer understanding of the applicability of the P2P paradigm.

References

Napster. http://www.napster.com
Gnutella. http://www.gnutelliums.com
Google. http://www.google.com
I. Clarke, T.W. Hong, S.G. Miller, O. Sandberg and B. Wiley. “Protecting Free Expression Online with Freenet,” IEEE Internet Computing, IEEE Press, vol. 6, no. 1, 2002, pp.40–49.
L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, “Search in power-law networks,” Physical Review E, vol. 64, no. 4, 046135, 2001.
KaZaA, http://www.kazaa.com
F. M. Cuenca-Acuna, C. Peery, R. P. Martin, and T. D. Nguyen, “PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities,” Proc. 12th Int’l. Symp. High Performance Distributed Computing, June 2003.
I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan. “Chord: A scalable peer-to-peer lookup service for internet applications,” Proc. ACM SIGCOMM, 2001, pp.149-160.
A. Rowstron and P. Druschel, “Pastry: Scalable, distributed object location and routing for largescale peer-to-peer systems,” in Proc. 18th IFIP/ACM Int’l. Conf. Distributed Systems Platforms, 2001, pp. 329-350.
M. Khambatti, K. Ryu, and P. Dasgupta, “Efficient discovery of implicitly formed peer-to-peer communities,” Int’l. J. Parallel and Distr. Sys. and Networks, vol. 5, no. 4, 2002, pp. 155-164.
Yahoo Groups. http://groups.yahoo.com
M. Khambatti, K. Ryu, and P. Dasgupta. “Structuring peer-to-peer networks using interest-based communities,” Int’l Work. on Databases, Info. Sys. and Peer-to-Peer Computing, 2003.

Mujtaba Khambatti (mujtaba@asu.edu) is a PhD candidate in computer science and engineering at the Arizona State University. His research interests include peer-to-peer systems, security, distributed operating systems, and mobile networks. Visit his website at http://www.public.asu.edu/~mujtaba.

DS Online ISSN: 1541-4922 • Feedback? Send comments to