Tuesday, 13 May 2008

BitTorrent (Protocol)


BitTorrent is a protocol designed for transferring files. It is peer-to-peer in nature, as users connect to each other directly to send and receive portions of the file.

BitTorrent is a method of distributing large amounts of data widely without the original distributor incurring the entire costs of hardware, hosting, and bandwidth resources.

However, there is a central server (called a tracker) which coordinates the action of all such peers. The tracker only manages connections, it does not have any knowledge of the contents of the files being distributed, and therefore a large number of users can be supported with relatively limited tracker bandwidth. The key philosophy of BitTorrent is that users should upload (transmit outbound) at the same time they are downloading (receiving inbound.) In this manner, network bandwidth is utilized as efficiently as possible. BitTorrent is designed to work better as the number of people interested in a certain file increases, in contrast to other file transfer protocols.

Instead, when data is distributed using the BitTorrent protocol, each recipient supplies pieces of the data to newer recipients, reducing the cost and burden on any given individual source, providing redundancy against system problems, and reducing dependence on the original distributor.

The most common method by which files are transferred on the Internet is the client-server model. A central server sends the entire file to each client that requests it -- this is how both http and ftp work. The clients only speak to the server, and never to each other. The main advantages of this method are that it's simple to set up, and the files are usually always available since the servers tend to be dedicated to the task of serving, and are always on and connected to the Internet. However, this model has a significant problem with files that are large or very popular, or both.

Namely, it takes a great deal of bandwidth and server resources to distribute such a file, since the server must transmit the entire file to each client. Perhaps you may have tried to download a demo of a new game just released, or CD images of a new Linux distribution, and found that all the servers report "too many users," or there is a long queue that you have to wait through. The concept of mirrors partially addresses this shortcoming by distributing the load across multiple servers. But it requires a lot of coordination and effort to set up an efficient network of mirrors, and it's usually only feasible for the busiest of sites.

Another method of transferring files has become popular recently: the peer-to-peer network, systems such as Kazaa, eDonkey, Gnutella, Direct Connect, etc. In most of these networks, ordinary Internet users trade files by directly connecting one-to-one. The advantage here is that files can be shared without having access to a proper server, and because of this there is little accountability for the contents of the files. Hence, these networks tend to be very popular for illicit files such as music, movies, pirated software, etc. Typically, a downloader receives a file from a single source, however the newest version of some clients allow downloading a single file from multiple sources for higher speeds.

A BitTorrent client is any program that implements the BitTorrent protocol. Each client is capable of preparing, requesting, and transmitting any type of computer file over a network, using the protocol. A peer is any computer running an instance of a client.
To share a file or group of files, a peer first creates a small file called a "torrent" (e.g. MyFile.torrent). This file contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. Peers that want to download the file first obtain a torrent file for it, and connect to the specified tracker, which tells them from which other peers to download the pieces of the file.

Though both ultimately transfer files over a network, a BitTorrent download differs from a classic full-file HTTP request in several fundamental ways:

BitTorrent makes many small data requests over different TCP sockets, while web-browsers typically make a single HTTP GET request over a single TCP socket. BitTorrent downloads in a random or in a "rarest-first"[2] approach that ensures high availability, while HTTP downloads in a sequential manner. Taken together, these differences allow BitTorrent to achieve much lower cost, much higher redundancy, and much greater resistance to abuse or to "flash crowds" than a regular HTTP server. However, this protection comes at a cost: downloads can take time to rise to full speed because it may take time for enough peer connections to be established, and it takes time for a node to receive sufficient data to become an effective uploader. As such, a typical BitTorrent download will gradually rise to very high speeds, and then slowly fall back down toward the end of the download. This contrasts with an HTTP server that, while more vulnerable to overload and abuse, rises to full speed very quickly and maintains this speed throughout.

No comments: