btree_seq container

Links

Code: https://github.com/alexkupri/array/tree/master/trunk

Documentation of the class: http://alexkupri.github.io/array/annotated.html


Table of Contents

Using the btree_seq container

What is the purpose of the btree_seq container?

When should I use std::vector and when should I use btree_seq?

Additional features: concatenation and splitting.

Additional features: visiting.

Guarantees: like normal containers.

Implementation overview

Implementation overview: tree structure for the sequence.

Implementation overview: accessing the element.

Implementation overview: keeping the tree up-to-date.

Discussion: why fat leaves and fat branches are efficient.

Implementation details

Implementation details: only one leaf, if the container is small.

Implementation details of short insertion: one-pass operation.

Implementation details of large insertions: inserting multiple leaves at a time.

Implementation details of erasing, cleaning and visiting: one-pass, recursion only if necessary.

Implementation details of iterator: lazy iterator and dereference operation.



Using the btree_seq container

What is the purpose of the btree_seq container?

This container implements a sequence; in other words, it mimics the std::vector interface. You can insert and delete elements at random positions, as well as access them randomly or iteratively. Its major benefit over std::vector is that it inserts/deletes elements at random positions in O(log(N)) time. Use it if your application is insert/delete intensive.
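
For illustration, here is a minimal sketch of vector-style usage. The header name is an assumption, and the exact set of supported members and template parameters should be checked against the documentation linked above:

	#include "btree_seq.h"   // assumed header name; see the repository for the real layout
	#include <iostream>

	int main()
	{
		btree_seq<int> a;                    // default fanout parameters L=30, M=60
		for(int i=0;i<10;i++)
			a.insert(a.end(),i);             // vector-style insert at the end
		a.insert(a.begin()+5,100);           // random-position insertion, O(log(N))
		a.erase(a.begin()+2,a.begin()+4);    // random-position deletion, O(log(N))
		for(size_t i=0;i<a.size();i++)
			std::cout<<a[i]<<' ';            // random access, O(log(N))
		std::cout<<'\n';
		return 0;
	}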


When should I use std::vector and when should I use btree_seq?

Let us compare the complexities of different operations for these two containers:


Operation                   vector          btree_seq           btree_seq/vector
Random insertion/deletion   O(n)            O(log(n))           log(n)/n
Iterative access            O(1)            O(1) amortized *    roughly 1.2-1.5
Random access               O(1)            O(log(n))           log(n)
Concatenating/splitting     not supported   O(log(n))           -

* Amortized, for real applications; see the discussion in the iterator chapter.


What conclusions can be drawn? If you are sure that your application uses mostly iterative/random access (i.e., you call container.resize() and then use [] or iterators), use std::vector. But if you perform random insertions/deletions (erase_if-style removal, for example), you should prefer btree_seq.

btree_seq performs random insertions/deletions better at all container sizes.

If you use btree_seq instead of vector by mistake, you lose at most a factor of log(n), which is not crucial. However, if you use vector instead of btree_seq by mistake, you can lose a factor of n/log(n), and that is crucial. So, if you are not sure, use btree_seq.


Additional features: concatenation and splitting.

Two additional operations are supported: concatenation and splitting. These operations take only O(log(N)) time and do not copy all the elements.

If you have containers A={0,1,2} and B={3,4,5} and perform A.concatenate_right(B), then after the operation A={0,1,2,3,4,5} and B={}.

If you have a container A={0,1,2,3,4} and perform A.split_right(B,2), then after the operation A={0,1} and B={2,3,4} (the previous contents of B are discarded).

From a logical point of view, elements are moved (not copied) from one container to the other.

From a physical point of view, most elements remain at the same memory locations, which is why only O(log(N)) time is necessary.
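
A short sketch of the two operations, using the same values as above. concatenate_right and split_right are the names used in this document; the header name and the way the containers are filled are assumptions:

	#include "btree_seq.h"   // assumed header name
	#include <cassert>

	int main()
	{
		btree_seq<int> a,b;
		for(int i=0;i<3;i++) a.insert(a.end(),i);   // A={0,1,2}
		for(int i=3;i<6;i++) b.insert(b.end(),i);   // B={3,4,5}

		a.concatenate_right(b);            // A={0,1,2,3,4,5}, B={}
		assert(a.size()==6&&b.size()==0);

		a.split_right(b,2);                // A={0,1}, B={2,3,4,5}
		assert(a.size()==2&&b.size()==4);
		return 0;
	}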


Additional features: visiting.

There is a method of sequential access that is even faster than access through iterators (by a constant factor, roughly 1.25-1.3): visiting.

You just call container.visit(interval_start,interval_end,visitor), where the visitor provides "bool operator()(const element &e)". It runs until the end of the interval is reached or until the visitor returns true for the first time. This technique can be used for summing a sequence or for finding the first element that matches a given criterion. However, if you don't want to implement a visitor, or if you operate on two or more sequences at once, use iterators.


Guarantees: like normal containers.

btree_seq provides the same guarantees as the other standard containers.

You can use iterators as long as you do not perform insert/delete/split/concatenate operations. Any of these operations invalidates all iterators.

You can read or modify different elements from different threads, but insert/delete operations may be performed from only one thread at a time.

If an element throws an exception during insertion, the container is correctly rolled back to its previous state and the exception is propagated. If you run out of memory during an operation, the container(s) involved are likewise rolled back to the previous state and an exception is thrown. (However, elements must not throw exceptions from their destructors.)
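
A small sketch illustrating the rollback behaviour described above. The throwing element type is purely illustrative, and the vector-style calls are the same assumptions as in the earlier sketches:

	#include "btree_seq.h"   // assumed header name
	#include <cassert>
	#include <stdexcept>

	struct may_throw
	{
		int value;
		bool poisoned;                         // copying a poisoned object throws
		may_throw(int v=0,bool p=false):value(v),poisoned(p){}
		may_throw(const may_throw &o):value(o.value),poisoned(o.poisoned)
		{
			if(o.poisoned) throw std::runtime_error("copy failed");
		}
	};

	int main()
	{
		btree_seq<may_throw> a;
		for(int i=0;i<10;i++)
			a.insert(a.end(),may_throw(i));            // normal elements
		try{
			a.insert(a.begin()+5,may_throw(100,true)); // this copy throws inside the container
		}catch(const std::runtime_error&){}
		assert(a.size()==10);                          // the previous state is restored
		return 0;
	}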


Implementation overview

Implementation overview: tree structure for the sequence.

There is a tree inside. It consists of Branches and Leaves. The hierarchy is as follows:

	struct Branch;
	struct Node
	{
		Branch *parent;
	};
	struct Branch:public Node
	{
		Node* children[L];
		size_type nums[L];
		size_type fillament;
	};

In a branch, each subtree is described by two values: a pointer to the subtree (children) and the total number of elements in that subtree (nums). The number of children can vary; it is stored in fillament.

	struct Leaf:public Node
	{
		value_type elements[M];
		size_type fillament;
	};

A Leaf contains the elements and their number.

Both Branch and Leaf are descendants of Node. Node contains only a pointer to the parent branch (or null, if the node is the root).

The container has three data members:

Node *root;

size_type depth,count;

root is a pointer to the root node; count is the total number of elements in the container. The distance from the root to the leaves is the same across the whole tree and is held in the depth variable.

Node itself is never instantiated. L and M are template parameters. In the picture below L=M=4; the default values are L=30, M=60.

The tree can look like this:

                             +-----------+
                             | 7   6   6 |
                             +-|---|---|-+
                               |   |   |
            +------------------+   |   +------------------+
            |                      |                      |
            v                      v                      v
      +-----------+            +-------+          +--------+
      | 2   2   3 |            | 3   3 |          | 2   4  |
      +-|---|---|-+            +-|---|-+          +-|---|--+
        |   |   |                |   |              |   |       
   +----+  ++   +---+         +--+   +--+       +---+ +-+   
   |       |        |         |         |       |     |     
   v       v        v         v         v       v     v     
+-----+ +-----+ +-------+ +-------+ +-------+ +-----+ +---------+
| A B | | C D | | E F G | | H I J | | K L M | | N O | | P Q R S |
+-----+ +-----+ +-------+ +-------+ +-------+ +-----+ +---------+

Implementation overview: accessing the element.

Let us review how the find_leaf function works. It takes pos (the position) as a parameter. It returns the position of the requested element within its leaf and stores a pointer to that leaf in the variable l.


size_type btree_seq::find_leaf(Leaf *&l,size_type pos)
{
	Node *node=root;
	Branch *br;
	size_type j=depth,k,val;
	while(j){
		k=0;
		br=static_cast<Branch*>(node);
		val=br->nums[0];
		while(pos>=val){
			pos-=val;
			k++;
			val=br->nums[k];
		}
		node=br->children[k];
		j--;
	}
	l=static_cast<Leaf*>(node);
	return pos;
}

There are two loops. The outer loop iterates over the depth of the tree. The class member depth determines how many levels of branches lie between the root and the leaves (it is the same for all leaves, because the tree is balanced) and determines how to interpret children: as Leaf* or as Branch*. If depth==0, the tree contains only one leaf (or no elements and no nodes at all); if depth==1, the tree contains one branch whose children are leaves; and so on.

The inner loop iterates over the element counts of the subtrees and enters the appropriate one. The variable pos holds the number of elements remaining before the desired element.

Let us look at a little example: we need element #11.

Outer loop - iteration #1 (node=branch 7-6-6, j=depth=2, pos=11). We start from the root and iterate over the branch 7-6-6. The first subtree contains 7 elements, so element #11 is not there. We update pos=11-7=4: four elements still precede the desired element. The next subtree contains 6 elements, and we enter it, because 6>4.

Outer loop - iteration #2 (node=branch 3-3, j=1, pos=4). We skip the subtree HIJ, because it contains only 3 elements, and update pos=4-3=1. We enter the next subtree, because 3>1.

Final statements of the function: we exit the outer loop because j=0. Now node points to the leaf KLM and pos=1. We return these values to the caller, and the resulting element is L (element #1 in the leaf KLM).

The complexity of the access is obviously O(log(N)).


Implementation overview: keeping the tree up-to-date.

If you need to delete or insert some elements, you only need to update the corresponding leaf and increment or decrement a few values in the branches. For example, to erase element L, you update the leaf KLM so that it becomes KM, the branch 3-3 so that it becomes 3-2, and the root branch so that it changes from 7-6-6 to 7-5-6. If you insert or delete one element, you typically update only one value at each level (unless an overflow or underflow occurs). That is why all modifications take O(log(N)) time.
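
As an illustration in terms of the Branch/Node/Leaf sketches above (this is not the library's actual code, which merges the update into a single downward pass, as described later), propagating the removal of delta elements from a leaf up to the root could look like this:

	// Sketch: after removing `delta` elements from `leaf`, walk up the
	// parent chain and decrement the per-subtree counters on the path.
	void decrement_counts(Leaf *leaf,size_type delta)
	{
		Node *child=leaf;
		Branch *parent=leaf->parent;
		while(parent){
			size_type k=0;
			while(parent->children[k]!=child)   // find which slot points to us
				k++;
			parent->nums[k]-=delta;             // e.g. 3-3 becomes 3-2, 7-6-6 becomes 7-5-6
			child=parent;
			parent=parent->parent;
		}
	}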

The following restrictions apply to the numbers of elements and children:

1) each leaf must have from M/2 to M elements, unless it is the only leaf in the tree;

2) if the whole tree consists of only one leaf, that leaf can contain from 1 to M elements;

3) all branches except the root have from L/2 to L children;

4) the root branch contains from 2 to L children.

If these rules are violated during a modification, leaves and branches are rebalanced, merged or split, and the tree depth is incremented or decremented.

These rules also guarantee that a sufficiently large tree does not waste much memory, since (apart from the root) every leaf and branch is at least half full.
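
As a sketch only (again in terms of the structures above, with L and M the template parameters; the library does not necessarily contain such a function), these rules could be verified recursively like this:

	// Sketch: verify rules 1-4 for the subtree rooted at `node`.
	// `levels` is the number of branch levels below `node` (depth at the root);
	// `is_root` selects the relaxed root rules (2 and 4). Assumes a non-empty tree.
	bool check_invariants(Node *node,size_type levels,bool is_root)
	{
		if(levels==0){
			Leaf *l=static_cast<Leaf*>(node);
			size_type min=is_root?1:M/2;                 // rules 1 and 2
			return l->fillament>=min&&l->fillament<=M;
		}
		Branch *b=static_cast<Branch*>(node);
		size_type min=is_root?2:L/2;                     // rules 3 and 4
		if(b->fillament<min||b->fillament>L)
			return false;
		for(size_type k=0;k<b->fillament;k++)
			if(!check_invariants(b->children[k],levels-1,false))
				return false;
		return true;
	}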


Discussion: why fat leaves and fat branches are efficient.

The simplest tree is a binary tree: each branch has up to two children, and each node holds only one element. (Various balancing algorithms exist, such as the red-black tree.)

Now suppose we have a binary tree of 1000 elements. To access an element, we need to visit about 10 nodes. If the nodes are scattered randomly in memory, we perform about 10 accesses to random memory locations.

Modern computers load data from main memory into the cache in chunks; a typical cache line is 64-128 bytes long. When you access a value, the memory near that value is loaded as well. If your tree has fat leaves and fat branches, each branch is stored contiguously and occupies only a few consecutive cache lines, so descending one level costs roughly one random access. Suppose you have a tree of 1000 elements. With L=30 and M=60, the tree has only two levels: 1000 elements fit into 17 full leaves, and 17 leaves fit under a single root branch, since 17<=L. Accessing an element then touches memory at only two random locations.

The discussion above holds for both sequence trees and map trees. An example of a map based on a binary tree is the red-black tree; maps based on fat trees are called B-trees. There is an implementation of a sequence tree with fat leaves but binary branches, called __gnu_cxx::rope. The implementation described here uses both fat leaves and fat branches and outperforms rope roughly 10 times.

However, leaves and branches must not be too fat: if they are, too much time is spent moving elements when inserting or deleting at a random position.

You can choose the parameters L and M depending on the target platform and on the operations your application performs most often.

The parameters are restricted to L>=4 and M>=4. (The default values are L=30, M=60.)


Implementation details

Implementation details: only one leaf, if the container is small.

We have discussed the general organisation of the container. Now we discuss some special cases and particular optimizations of the algorithm.

One special case is when the container holds only a few elements and therefore consists of a single leaf. In this case it basically behaves like a vector: elements are moved when an insertion or deletion is needed, and random access is very fast because there is no tree to walk down. Insertions and deletions are even faster than in a vector here, because no reallocation is necessary. However, a whole leaf is allocated even for a single element, so if your application uses many small containers, set L and M to small values.


Implementation details of short insertion: one-pass operation.

A short insertion is performed when the precise number of elements is known a priori and this number is less than or equal to M/2.

In this case, the elements are either inserted into an existing leaf, or one leaf is split and the elements are inserted into one of the new leaves.

The algorithm travels down the tree and does two things at the same time: it updates the nums values in the branches and finds the required leaf. (Alternatives would be a two-pass algorithm - find the leaf first, then travel back up and increment the nums values - or, even worse, a recursive implementation, which requires many calls and returns.) Then, if necessary, the leaf is split, the elements are inserted into the proper place, and, if a split occurred, the overflow procedure is performed.
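
A simplified sketch of the one-pass idea, written as a variation of the find_leaf routine above (this is an illustration, not the library's actual code, which also handles splits and overflow): the descent adds the number of inserted elements to every nums entry it passes through.

	// Sketch: find the leaf for insertion position `pos` and, on the way down,
	// add `inserted` to the per-subtree counter on the chosen path.
	size_type find_leaf_and_add(Node *root,size_type depth,
	                            Leaf *&l,size_type pos,size_type inserted)
	{
		Node *node=root;
		size_type j=depth;
		while(j){
			Branch *br=static_cast<Branch*>(node);
			size_type k=0,val=br->nums[0];
			while(pos>val){                 // note: > rather than >=, so that inserting
				pos-=val;                   // at a boundary stays in the left subtree
				k++;
				val=br->nums[k];
			}
			br->nums[k]+=inserted;          // one-pass update of the counters
			node=br->children[k];
			j--;
		}
		l=static_cast<Leaf*>(node);
		return pos;
	}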


Implementation details of large insertions: inserting multiple leaves at a time.

A large insertion is performed when the precise number of elements is unknown (for example, when inserting from non-random-access iterators) or when the number of elements is greater than M/2.

We assume that many leaves will be inserted. (As a special case, if the tree is empty, we initialize it so that it has at least one leaf.) First, if the insertion point falls in the middle of a leaf, we split that leaf. Then we fill the left leaf at the insertion point until it is full. Next, we fill and insert into the tree bundles of L-1 leaves at a time. Finally, we check for the underflow condition if the last inserted leaf, the leaf next to it, or both have fewer elements than required.


Implementation details of erasing, cleaning and visiting: one-pass, recursion only if necessary.

I implemented the erase operation and the visit operation using the same engine. The destructor and the cleaning operation use erase(0,count) as well. Here I describe only the erase operation; the visit operation behaves similarly. The erase operation involves recursion only when necessary: if one element is erased, no recursion happens at all; if the whole contents are erased, recursion runs over the whole tree.

The erase operation is implemented as follows. First, while the range of deletion lies completely within one subtree, we descend into that subtree, decreasing its element count on the way. (If all erased elements are in one leaf, neither recursion nor a two-pass operation is performed.) Then, if the range of deletion spans two or more subtrees, we loop over them and call the erase procedure recursively. Finally, when we reach a leaf, we erase the elements of the leaf, and the leaf itself if necessary. After all elements are deleted, the underflow condition is checked.
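
In schematic form, the control flow just described could look like this. This is a sketch only, again in terms of the structures above; rebalancing, deallocation of emptied nodes and the underflow check are omitted.

	// Sketch: erase `count` elements starting at position `pos` inside the
	// subtree `node`, which has `level` branch levels below it.
	void erase_range(Node *node,size_type level,size_type pos,size_type count)
	{
		while(level){
			Branch *br=static_cast<Branch*>(node);
			size_type k=0;
			while(pos>=br->nums[k]){ pos-=br->nums[k]; k++; }  // first affected child
			if(pos+count<=br->nums[k]){
				br->nums[k]-=count;            // whole range inside one child: plain descent
				node=br->children[k];
				level--;
				continue;
			}
			while(count){                      // range spans several children: recurse
				size_type here=br->nums[k]-pos;
				if(here>count) here=count;
				br->nums[k]-=here;
				erase_range(br->children[k],level-1,pos,here);
				count-=here;
				pos=0;
				k++;
			}
			return;
		}
		// level==0: `node` is a leaf; destroy elements [pos, pos+count)
		// of static_cast<Leaf*>(node) and shift the remaining ones.
	}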


Implementation details of iterator: lazy iterator and dereference operation.

The most complicated dilemma I met was in implementing the iterator.

Most applications will probably use the old std::vector style of insertion/deletion, like container.erase(container.begin()+3,container.begin()+6), instead of the native interface container.erase(3,6), and will use an iterator even when it is not really necessary. If I implemented the iterator in a non-lazy way, the iterator container.begin()+3 would actually point at the third element; building it would require a pass down the tree, and the whole call would require at least three passes, no matter how I implemented them.

Instead, I implemented lazy iterators. They do not actually point at elements of the container until the moment they are dereferenced. If they are used only as parameters to erase or insert calls, they never touch the actual elements and serve merely as wrappers around a size_t.

An iterator contains a value called abs_idx (the absolute position in the container), a value called rel_idx (the relative index within the current leaf), and some other data. Operations like incrementing or changing the position just change abs_idx and rel_idx, without checking whether rel_idx really points at actual data. Comparison and subtraction operate on abs_idx. Dereferencing checks whether rel_idx points at the correct value and, if necessary, updates the remaining fields of the iterator so that it points at the required element.
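
A schematic sketch of this idea follows. The field names mirror the description above; everything else is simplified and is not the library's actual code, and descend() is a hypothetical helper standing in for a find_leaf-style walk down the tree.

	// Sketch: a lazy iterator keeps its absolute position always correct and
	// revalidates the cached leaf only when dereferenced.
	struct lazy_iterator
	{
		btree_seq<int> *container;
		size_type abs_idx;     // absolute position, always kept up to date
		Leaf *leaf;            // cached leaf, may be stale or null
		size_type rel_idx;     // index inside the cached leaf

		lazy_iterator &operator++(){ ++abs_idx; ++rel_idx; return *this; }  // no tree access
		bool operator==(const lazy_iterator &o)const{ return abs_idx==o.abs_idx; }
		bool operator<(const lazy_iterator &o)const{ return abs_idx<o.abs_idx; }

		int &operator*()
		{
			if(leaf==0||rel_idx>=leaf->fillament){
				// Cached position is not trustworthy: walk down the tree again,
				// as find_leaf does above, to refresh leaf and rel_idx.
				// (The real validity check is more careful than this one.)
				rel_idx=descend(container,abs_idx,leaf);   // hypothetical helper
			}
			return leaf->elements[rel_idx];
		}
	};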

The iterator update is written in such a way that it takes almost constant amortized time when the iterator is used for sequential access. I say "almost constant time" because it takes C1*log(N)+C2, and the constants are such that C2 is much larger than C1*log(N) even for a container of 2**64 elements. I could have implemented the iterator so that it required honestly constant time, but in practice it would be slower.