Author: Arvid Norberg, arvid@libtorrent.org
Version: 1.1.2

Bencoding

Bencoding is a common representation in BitTorrent, used for hierarchies of dictionaries, lists, integers and strings. It's used to encode .torrent files and some messages in the network protocol. libtorrent also uses it to store settings, resume data and other state between sessions.

Strings in bencoded structures do not necessarily represent text. They are raw byte buffers of a certain length. If a string is meant to be interpreted as text, it is required to be UTF-8 encoded. See BEP 3.
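As a rough illustration of the encoding described in BEP 3, the following stand-alone sketch builds bencoded output by hand. It does not use libtorrent; encode_int(), encode_string(), example_dict() and check_examples() are hypothetical helpers introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, not part of libtorrent. They only illustrate the
// wire format from BEP 3.

// integers are written as 'i' <base-10 digits> 'e'
std::string encode_int(long long v)
{
    return "i" + std::to_string(v) + "e";
}

// strings are written as <length> ':' <raw bytes>. The payload is not
// interpreted, so it may hold any byte value, including '\0'
std::string encode_string(std::string const& s)
{
    return std::to_string(s.size()) + ":" + s;
}

// the dictionary {"announce": "http://x", "length": 42}. 'd' and 'e'
// delimit a dictionary (lists use 'l' and 'e') and keys are sorted
std::string example_dict()
{
    return "d" + encode_string("announce") + encode_string("http://x")
        + encode_string("length") + encode_int(42) + "e";
}

void check_examples()
{
    assert(encode_int(-3) == "i-3e");
    assert(encode_string("spam") == "4:spam");
    assert(example_dict() == "d8:announce8:http://x6:lengthi42ee");
}
```

Because every string carries an explicit length prefix, a decoder never has to scan the payload for a terminator, which is what allows arbitrary binary data inside strings.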

There are two mechanisms to decode bencoded buffers in libtorrent.

The most flexible one is bdecode() (see bdecode() bencode()), which returns a structure represented by entry. Once a buffer has been decoded with this function, it can be discarded. The entry does not contain any references back to it. This means that bdecode() copies all the data out of the buffer and into its own hierarchy. This makes the function expensive, which might matter if you're parsing large amounts of data.

Another consideration is that this bdecode() is a recursive parser. To avoid DoS attacks that trigger a stack overflow, it enforces a recursion limit. This limit is a sanity check to make sure parsing doesn't run the risk of exhausting the stack.
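The idea of a recursion limit can be sketched independently of libtorrent. The function below is a hypothetical, simplified validator (skip_value() and its depth_limit parameter are names invented for this example, and it is not libtorrent's parser). It recurses once per nested list or dictionary and gives up when the depth budget is used up. It is deliberately lax: it does not check that integers contain only digits or that dictionary keys are strings.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-alone sketch, not libtorrent's parser.
// Skips one bencoded value starting at `pos` and advances `pos` past it.
// Returns false on malformed input or when containers are nested more than
// `depth_limit` levels below the top-level value.
bool skip_value(std::string const& buf, std::size_t& pos, int depth_limit)
{
    if (depth_limit < 0 || pos >= buf.size()) return false;
    char const c = buf[pos];
    if (c == 'i')
    {
        // integer: i<digits>e (digits are not validated in this sketch)
        std::size_t const end = buf.find('e', pos);
        if (end == std::string::npos) return false;
        pos = end + 1;
        return true;
    }
    if (c == 'l' || c == 'd')
    {
        // list or dictionary: every nested value costs one level of budget
        ++pos;
        while (pos < buf.size() && buf[pos] != 'e')
        {
            if (!skip_value(buf, pos, depth_limit - 1)) return false;
        }
        if (pos >= buf.size()) return false; // missing terminator
        ++pos; // consume the terminating 'e'
        return true;
    }
    if (std::isdigit(static_cast<unsigned char>(c)))
    {
        // string: <length>:<bytes>
        std::size_t len = 0;
        while (pos < buf.size()
            && std::isdigit(static_cast<unsigned char>(buf[pos])))
        {
            len = len * 10 + std::size_t(buf[pos++] - '0');
        }
        if (pos >= buf.size() || buf[pos] != ':') return false;
        ++pos;
        if (len > buf.size() - pos) return false; // payload is truncated
        pos += len;
        return true;
    }
    return false;
}

void example_depth_limit()
{
    std::string const nested = "lllleeee"; // four lists inside each other
    std::size_t pos = 0;
    assert(skip_value(nested, pos, 3));  // three levels below the top: accepted
    pos = 0;
    assert(!skip_value(nested, pos, 2)); // budget exhausted: rejected
}
```

Without the depth_limit check, an input consisting of a long run of 'l' characters would drive the recursion as deep as the input is long.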

The second mechanism is the decode function for bdecode_node. This function builds a tree that points back into the original buffer. The returned bdecode_node will not be valid once the buffer it was parsed out of is discarded.
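The difference between the two ownership models can be shown with a small stand-alone sketch. The types and functions here (byte_view, view_string(), copy_string()) are hypothetical and are not libtorrent's. They only contrast a result that points into the caller's buffer with one that owns a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical illustration, not libtorrent's types.

// non-owning: refers to bytes that live in the caller's buffer
struct byte_view
{
    char const* ptr;
    std::size_t len;
};

// Parses "<length>:<bytes>" at the start of [buf, buf + size) and returns a
// view of the payload. Returns {nullptr, 0} on malformed input.
byte_view view_string(char const* buf, std::size_t size)
{
    std::size_t pos = 0;
    std::size_t len = 0;
    while (pos < size && buf[pos] >= '0' && buf[pos] <= '9')
        len = len * 10 + std::size_t(buf[pos++] - '0');
    if (pos == 0 || pos >= size || buf[pos] != ':' || len > size - pos - 1)
        return byte_view{nullptr, 0};
    return byte_view{buf + pos + 1, len};
}

// owning: copies the payload, so the source buffer may be discarded
std::string copy_string(char const* buf, std::size_t size)
{
    byte_view const v = view_string(buf, size);
    return v.ptr ? std::string(v.ptr, v.len) : std::string();
}

void example_ownership()
{
    std::string buffer = "4:spam";
    byte_view const v = view_string(buffer.data(), buffer.size());
    std::string const owned = copy_string(buffer.data(), buffer.size());
    assert(v.ptr == buffer.data() + 2); // points into `buffer`, nothing copied
    buffer.clear(); // from here on `v` must not be used
    assert(owned == "spam"); // the copy is unaffected
}
```

The view costs no allocation, but its validity is tied to the lifetime of the buffer, which is the same trade-off described above.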

Not only is this function more efficient because it allocates less memory and copies less data, its parser is also not recursive. This means it probably performs a little bit better and can allow a higher depth limit on the structures it parses.

entry

Declared in "libtorrent/entry.hpp"

The entry class represents one node in a bencoded hierarchy. It works as a variant type: it can be either a list, a dictionary (std::map), an integer or a string.

class entry
{
   data_type type () const;
   entry (list_type const&);
   entry (integer_type const&);
   entry (dictionary_type const&);
   entry (string_type const&);
   entry (preformatted_type const&);
   entry (data_type t);
   void operator= (string_type const&);
   void operator= (integer_type const&);
   void operator= (entry const&);
   void operator= (dictionary_type const&);
   void operator= (list_type const&);
   void operator= (preformatted_type const&);
   void operator= (bdecode_node const&);
   preformatted_type& preformatted ();
   const integer_type& integer () const;
   const string_type& string () const;
   const preformatted_type& preformatted () const;
   const dictionary_type& dict () const;
   string_type& string ();
   list_type& list ();
   dictionary_type& dict ();
   integer_type& integer ();
   const list_type& list () const;
   void swap (entry& e);
   entry& operator[] (std::string const& key);
   const entry& operator[] (std::string const& key) const;
   entry& operator[] (char const* key);
   const entry& operator[] (char const* key) const;
   entry const* find_key (char const* key) const;
   entry* find_key (char const* key);
   entry* find_key (std::string const& key);
   entry const* find_key (std::string const& key) const;
   std::string to_string () const;

   enum data_type
   {
      int_t,
      string_t,
      list_t,
      dictionary_t,
      undefined_t,
      preformatted_t,
   };

   mutable boost::uint8_t m_type_queried:1;
};

type()

data_type type () const;

returns the concrete type of the entry

entry()

entry (list_type const&);
entry (integer_type const&);
entry (dictionary_type const&);
entry (string_type const&);
entry (preformatted_type const&);

constructors directly from a specific type. The content of the argument is copied into the newly constructed entry

entry()

entry (data_type t);

construct an empty entry of the specified type. see data_type enum.

operator=()

void operator= (string_type const&);
void operator= (integer_type const&);
void operator= (entry const&);
void operator= (dictionary_type const&);
void operator= (list_type const&);
void operator= (preformatted_type const&);
void operator= (bdecode_node const&);

copies the structure of the right hand side into this entry.

string() integer() dict() preformatted() list()

preformatted_type& preformatted ();
const integer_type& integer () const;
const string_type& string () const;
const preformatted_type& preformatted () const;
const dictionary_type& dict () const;
string_type& string ();
list_type& list ();
dictionary_type& dict ();
integer_type& integer ();
const list_type& list () const;

The integer(), string(), list() and dict() functions are accessors that return the respective type. If the entry object isn't of the type you request, the accessor will throw libtorrent_exception (which derives from std::runtime_error). You can ask an entry for its type through the type() function.

To create an entry, pass the type you want it to have to its constructor, then use one of the non-const accessors to get a reference to which you can assign the value you want it to have.

The typical code to get info from a torrent file will then look like this:

entry torrent_file;
// ...

// throws if this is not a dictionary
entry::dictionary_type const& dict = torrent_file.dict();
entry::dictionary_type::const_iterator i;
i = dict.find("announce");
if (i != dict.end())
{
        std::string tracker_url = i->second.string();
        std::cout << tracker_url << "\n";
}

The following code is equivalent, but a little bit shorter:

entry torrent_file;
// ...

// throws if this is not a dictionary
if (entry* i = torrent_file.find_key("announce"))
{
        std::string tracker_url = i->string();
        std::cout << tracker_url << "\n";
}

To make it easier to extract information from a torrent file, the class torrent_info exists.

swap()

void swap (entry& e);

swaps the content of this with e.

operator[]()

entry& operator[] (std::string const& key);
const entry& operator[] (std::string const& key) const;
entry& operator[] (char const* key);
const entry& operator[] (char const* key) const;

All of these functions require the entry to be a dictionary. If it isn't, they will throw libtorrent::type_error.

The non-const versions of the operator[] will return a reference to either the existing element at the given key or, if there is no element with the given key, a reference to a newly inserted element at that key.

The const version of operator[] will only return a reference to an existing element at the given key. If the key is not found, it will throw libtorrent::type_error.
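Since the dictionary type is a std::map, the two behaviours described above can be sketched with a plain map. This is an analogy only: int_dict, get_or_insert() and get_existing() are hypothetical names, and the sketch throws std::out_of_range where the text above names libtorrent::type_error.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// stand-in for a dictionary of entries
typedef std::map<std::string, int> int_dict;

// like the non-const operator[]: returns the existing element, or inserts a
// default-constructed one when the key is missing
int& get_or_insert(int_dict& d, std::string const& key)
{
    return d[key];
}

// like the const operator[]: never inserts, throws when the key is missing
int const& get_existing(int_dict const& d, std::string const& key)
{
    int_dict::const_iterator i = d.find(key);
    if (i == d.end()) throw std::out_of_range("key not found: " + key);
    return i->second;
}

void example_lookup()
{
    int_dict d;
    get_or_insert(d, "length") = 42; // inserts "length"
    assert(d.size() == 1);
    assert(get_existing(d, "length") == 42);
}
```

A practical consequence is that reading through a non-const entry with operator[] can silently add keys, so find_key() or a const reference is the safer choice for lookups.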

find_key()

entry const* find_key (char const* key) const;
entry* find_key (char const* key);
entry* find_key (std::string const& key);
entry const* find_key (std::string const& key) const;

These functions require the entry to be a dictionary. If it isn't, they will throw libtorrent::type_error.

They will look for an element at the given key in the dictionary. If the element cannot be found, they will return 0. If an element with the given key is found, they return a pointer to it.
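The pointer-or-null convention can be sketched over a plain std::map. The names string_dict and find_value() are hypothetical and stand in for the dictionary and find_key(). The sketch is not libtorrent code.

```cpp
#include <cassert>
#include <map>
#include <string>

// stand-in for a dictionary of entries
typedef std::map<std::string, std::string> string_dict;

// returns a pointer to the value stored at `key`, or a null pointer if the
// key is absent. Nothing is inserted and nothing is thrown for a missing key
std::string const* find_value(string_dict const& d, std::string const& key)
{
    string_dict::const_iterator i = d.find(key);
    return i == d.end() ? nullptr : &i->second;
}

void example_find()
{
    string_dict d;
    d["announce"] = "http://x";

    // the lookup and the null check fit in one condition
    if (std::string const* url = find_value(d, "announce"))
        assert(*url == "http://x");

    assert(find_value(d, "comment") == nullptr);
}
```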

to_string()

std::string to_string () const;

returns a pretty-printed string representation of the bencoded structure, with JSON-style syntax

enum data_type

Declared in "libtorrent/entry.hpp"

name            value  description
int_t           0
string_t        1
list_t          2
dictionary_t    3
undefined_t     4
preformatted_t  5

m_type_queried
in debug mode this is set to false by bdecode to indicate that the program has not yet queried the type of this entry, and should not assume that it has a certain type. This is asserted in the accessor functions. This does not apply if exceptions are used.

bdecode() bencode()

Declared in "libtorrent/bencode.hpp"

template<class InIt> entry bdecode (InIt start, InIt end);
template<class OutIt> int bencode (OutIt out, const entry& e);
template<class InIt> entry bdecode (InIt start, InIt end, int& len);

These functions encode data into bencoded form or decode bencoded data.

If possible, bdecode() producing a bdecode_node should be preferred over this function.

The entry class is the internal representation of the bencoded data and it can be used to retrieve information. An entry can also be built by the program and given to bencode() to encode it into the OutIt iterator.

OutIt and InIt are iterator types (OutputIterator and InputIterator respectively). They are template parameters and are usually instantiated as ostream_iterator, back_insert_iterator or istream_iterator. These functions assume that the iterator refers to a character (char). So, if you want to encode entry e into a buffer in memory, you can do it like this:

std::vector<char> buffer;
bencode(std::back_inserter(buffer), e);
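Taking the output as an iterator means the same call can target different sinks. The sketch below shows this with a hypothetical write_string() helper that emits a single bencoded string. It is not libtorrent's bencode() and only illustrates the iterator-based interface.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not libtorrent's bencode(): writes one bencoded
// string ("<length>:<bytes>") through any output iterator over char and
// returns the advanced iterator.
template <class OutIt>
OutIt write_string(OutIt out, std::string const& s)
{
    std::string const prefix = std::to_string(s.size()) + ":";
    out = std::copy(prefix.begin(), prefix.end(), out);
    return std::copy(s.begin(), s.end(), out);
}

void example_sinks()
{
    // sink 1: a growing in-memory buffer
    std::vector<char> buffer;
    write_string(std::back_inserter(buffer), "spam");
    assert(std::string(buffer.begin(), buffer.end()) == "4:spam");

    // sink 2: an output stream
    std::ostringstream os;
    write_string(std::ostream_iterator<char>(os), "spam");
    assert(os.str() == "4:spam");
}
```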

If you want to decode a torrent file from a buffer in memory, you can do it like this:

std::vector<char> buffer;
// ...
entry e = bdecode(buffer.begin(), buffer.end());

Or, if you have a raw char buffer:

const char* buf;
// ...
entry e = bdecode(buf, buf + data_size);

Now we just need to know how to retrieve information from the entry.

If bdecode() encounters invalid encoded data in the range given to it, it will return a default constructed entry object.

operator<<()

Declared in "libtorrent/entry.hpp"

inline std::ostream& operator<< (std::ostream& os, const entry& e);

prints the bencoded structure to the ostream as a JSON-style structure.