Last Updated: 10/3/2001
A_BTree.rb
Purpose
License
Description
Dependencies
Details
Limitations
Performance Considerations
To Do
Usage
Class Methods
Instance Methods
Testing
A_Debug Usage
Purpose:
This is an n'ary tree or b*tree written in Ruby. This class quickly stores and retrieves lots of key/value pairs (also called records). ArunaDB uses this class for the A_Table and A_Index classes to read and write records. ArunaDB is a database server written in Ruby. Records are stored in pages (see a_buffer.html for a description of the class used for storing btree pages). Each page can hold many records. Each page is sorted for fast binary searching. The pages are written to disk such that 1 row can be found in 1 million by reading as few as 3 pages in the btree (* the actual number of pages read may vary depending how big the key is and how large the page size is). A good discussion about n'ary trees or b*trees can be found at: http://perl.plover.com/BTree/article.html
Description:
This class is thread safe. Many threads can read and write to the same btree at the same time. Only one process may access a btree at a time. This class uses A_Catalog.rb to remember information about the btree so it can be reopened by name only. This class uses A_FileStore.rb for all file I/O. A filestore allows many btrees to reside in one file and one btree to span many files. This allows you to spread a large btree across many physically different disk drives. Note: A record, a row, and a key/value pair all describe the same thing.
Here are the main features:
- You can iterate over the rows in the btree and find/update/delete rows (using update_row and delete_row) while you are iterating.
- It class is thread safe such that many threads can be accessing the btree at the same time (iterating, reading and writing). You can see what threads own or are waiting for btree and filestore locks by calling Thread.a_show. For more information, take a look at the extensions to the Thread class made in a_debug.rb.
- Any Ruby object can be stored in the btree as either a key or a value. The key must support the == and <=> methods. Marshal.dump/Marshal.load must be supported for both the key and the value.
- You can export and import btrees for backing up your btree data and changing any of the parameters in the btree.
- The show() method allows you to print the content in your btree in three different ways. Two of the ways provide a nice graphical representation of the btree. The third is a one liner that describes the btree.
Details:
The btree uses stacks when iterating. When you start iterating over rows in the tree, a stack is created and populated with one page for each level in the tree. This is a working stack and reused as much as possible. The stack is always memory resident and is used is such that you can traverse forward or backward through the rows in the btree at any time. As you iterate over the rows in the btree, the stack is updated by reading new pages only as needed. This is technique very flexible and very efficient. There are two stacks for each btree. One stack is used by the iterator methods such as each(), each_reverse(), and each_duplicate(), and another stack is used by methods that use iterators such as find(), insert(), update(), and delete(). This approach allows you to find() rows in the btree while you are iterating over rows in the btree.
This btree class provides a mostly balanced btree. By mostly balanced, I mean, I tried to get the best performance while keeping the unused space in the btree to a minimum. As a result, here are rules that were followed:
- when a page drops below 45% full:
- try to merge with the left or right sibling. The smallest sibling is chosen and merging is done only if all rows can be moved and 10% of the page is still empty. If the rows can be moved then this page is deleted from the btree.
- if all rows can't be moved to the smallest sibling, then rows are merged with the largest sibling to even out the rows in these two pages.
- when root contains only one element, a level is removed from the btree
- when a page grows too large:
- try to merge with the left or right sibling. The smallest sibling is chosen and merging is done only if both pages will be at least 10% free after moving the rows. Rows are moved until the two pages are equal in size and a split is avoided.
- if merging does not work, then the page is split in two.
- when root fills up and splits a new level is added to the tree
Locking Strategies:
- The primary goal to lock the btree as little as possible. Locks are acquired and released on a per page basis. Basically, when a page is read, the read lock is grabbed and released, when a page is updated, the write lock is grabbed before the first update and released when the updated page is written to disk. The locking strategy is different when the key is unique verses not unique. Here are the details when the key is unique:
- There can be many concurrent reader and writer threads. When iterating over rows in the btree, the filestore's read lock is grabbed and immediately released as each page is read into the stack. Technically, only one thread can read from the tree at the exact same time, but the period of time that the tree is actually locked is very small. This technique allows many reader threads to iterate over rows in the same btree at the same time and update or delete rows while iterating.
- When you call delete_row() or udpate_row() the write lock is grabbed and held until the updated page is written to disk. This occurs when the iterator is done processing each row on the page and is ready to read the next page in the btree. When the newly updated page is written to disk, the write lock is released. This means the write lock could be held for longer periods of time but also means that delete_row() and update_row() are very fast and efficient for updating and deleting lots of rows in the btree. In other words, the write lock is acquired and held from the moment the first update occurs until that page is written to disk. This iterators ensure this update occurs even if you stop iterating in the middle of a page.
- When delete_row() or update_row() changes the btree, all readers are quickly informed of the changes to the tree. Readers will reread any altered pages as needed. This works great when the keys are unique because it is easy to reread the stack to exactly the same position with out missing a row.
- When the key is not unique:
- There can be many concurrent reader and one writer thread. When iterating over rows in the btree, the filestore's read lock is grabbed and immediately released as each page is read into the stack. Technically, only one thread can read from the tree at the exact same time, but the period of time that the tree is actually locked is very small. This technique allows many reader threads to iterate over rows in the same btree at the same time and update or delete rows while iterating.
- In order to call delete_row() or udpate_row() you must set the grab_write_lock parameter in the iterator as you start iterating. This grabs the write lock before you start iterating. The write lock is release when you are done iterating. This is necessary because I can't correctly reposition the stack if another thread update the btree. If the key is unique, then I can reread and reposition the stack as needed when the btree changes. When the key is not unique I can't do this. Therefore, only one iterator can update the btree.
- One more point, the writer can block readers in certain situations. If you delete_rows() from the btree while you are iterating and this caused a page to be deleted or you update_rows() and this causes a page to split, all new readers will be blocked and the writer waits until all readers are done before the page delete or split can be completed. This is necessary because readers have no way to reread or reposition their stacks when the btree changes.
- The net effect of the above limitations with non-unique key btrees is that non-unique keys are less efficient in a multi user environment than unique key btrees.
- You can use a class called
a_tempfile to allow you to iterate and update() btrees that have non unique keys.
The following apply to both unique and non-unique indexes:
- When you insert() or update() the btree, the filestore's read lock and write lock are grabbed until the insert() or update() completes. These two methods operate on a single key and should complete quickly. These will block readers until the update/insert is completed. This is necessary because the insert or update may need to split or delete a page that is currently in the stack of other reader threads. When this happens, each active reader thread will re-read it's stack as necessary. This creates an interesting and unusual condition regarding the persistence of the data in the btree, while one thread is iterating over the rows in a btree it will see changes that were made by other threads as soon as the writer thread releases the write lock. In other words, the data in the btree can change while you are iterating over the rows in the btree. This was done to reduce thrashing (rereading of pages) in the btree. FYI, If your btree is not unique, the insert() and update() will wait for the write lock if there is an iterator holding the write lock.
- Loading the btree, importing, exporting, dropping the btree, and deleting all rows from the btree grabs filestore's the read and write lock and blocks all readers and writers until done.
Dependencies:
a_debug.rb - this provides debugging messages for all of my Ruby code. See
a_debug.html for more details.
a_catalog.rb - this class is used to store and retrieve information about the btree. This allows you open btrees by name only after they have been created. See a_catalog.html for more details.
a_filestore.rb - the btree reads and writes to a filestore. A filestore consists of one or more physical files and allows you to delete and reuse space. See a_filestore.html for more details. Benefits of using a FileStore:
- allows a btree to span upto 256 files
- allows many btrees to reside in a single file
- allows you to easily create btrees that can span disk drives
- allows you to easily regulate or limit how much disk space is used
- file space can be added to at any time without impacting existing btrees
aruna - aruna refers to aruna.c and a_buffer.c. These are c extensions for Ruby containing the A_Buffer class and the apack/aunpack methods. These modules (more specifically a_buffer.c) also contain two methods in that provide support for directly writing an object of the A_Buffer class to a filestore. This was done to enhance the performance of the A_BTree class which stores data in the A_Buffer class and writes that data to disk using the A_FileStore class.
Limitations:
A single btree can span as many as 256 files, see a_filestore.html for more details.
A btree page must be <= 64K in size. This is a limitation in the a_buffer class. I plan to fix this soon.
All I/O to each file is written in blocks. If a block is 1000 bytes then each file could be as large as 4 TERA bytes (4,294,967,296,000 bytes) in size (assuming your operating system will support files > 2Gig in size). Add 256 files to the filestore and you can have a single btree that is 1,095,216,660,480,000 bytes or 1,095,216 Gig in size (assuming your operating system will support files > 2Gig in size). If your operating system has a 2 Gig file size limit, then you could have a single btree as large as 512 Gig (256 files * 2 Gig per file).
The btree is multi-thread safe but not multi-process safe. Two or more threads can access the same btree at the same time. Two separate ruby programs can't access the same btree at the same time. A_FileStore.rb should raise an exception if you try. ArunaDB will have a database server constantly running. Each program/user/client application will connect to the ArunaDB server as a separate thread and the server (not the clients) will perform all reads/writes to the btrees.
The btree is not persistent. In other words, other threads can update the btree while you are iterating over the rows in the btree. You will see the updated rows as you are iterating.
It you used apack/aunpack for the key or value by providing a key pack string or data pack string, the key or value can't be nil.
You can iterate over the rows in the btree with the following limitations:
- If the keys are not unique, a single thread can't insert rows while it is iterating over the btree.
- If the keys are not unique, a single thread can't call update(), it can call update_row() while iterating.
- If the keys are not unique, a single thread can't call delete(), it can call delete_row() while iterating.
Performance Considerations:
Set your node and leaf page sizes such that approximately 50-100 rows are stored in each page. This seems to maximize performance for most btrees. This may require you set the block size in the filestore to 512 rather than 1K and set the page size to 1 (this is a 512 byte page size).
Keep the delete nodes in a filestore to a minimum. This is done by making the page size of all btrees in the same filestore similar in size. A lot of delete nodes with varying sizes will hurt the performance of all btree in the filestore. See the section below called "BTree and FileStore Considerations" for more details.
Use packed strings (apack/aunpack) when possible. In most cases packed string are faster and use less disk space.
BTree and FileStore Considerations:
- The primary performance consideration is keeping the delete nodes in a filestore to a minimum. A lot of delete nodes with varying sizes will hurt the performance of all btrees in the filestore. The method A_FileStore.show('FileStoreName') will report how many delete nodes a filestore has.
- When putting many btrees in the same filestore, keep the leaf and node pages size similar for all of the btrees in the filestore. For example:
- block sizes of 1, 2, 4, 8 are acceptable.
- block sizes of 5, 10, 20 are acceptable.
- block sizes of 2, 3, 4 are not acceptable. This may create too many unusable delete nodes and impact performance.
- explaination 1 (how delete nodes work): suppose there are two btrees
- btree1 - page size = 2.
- btree2 - page size = 4.
- Insert 10 rows into each btree.
- Delete a row from btree2, this creates a delete node of 4.
- Insert into btree1, this will use 2 bytes of the delete node and change the delete node from 4 to 2.
- Insert another row into btree2, the delete node is not used unused.
- Insert another row into btree1, the last delete node is used, and the all delete nodes are eliminated.
- explaination 2 (why the above example is bad): suppose there are two btrees
- btree1 - page size = 3.
- btree2 - page size = 4.
- Insert 10 rows into each btree.
- Delete a row from btree2, this creates a delete node of 4.
- Insert into btree1, this will use 3 bytes of the delete node and change the delete node from 4 bytes to 1 byte.
- There is now a delete node in the filestore of 1 block that will never be used. Get a lot of these and performance will suffer.
- Put any very large btrees in a filestore by themselves. This will allow you to easily drop the filestore before reloading the btree. This will eliminate all delete nodes and optimize performance.
- Put any btrees that truncated and reloaded regularly in a filestore by themselves. This will allow you to truncate the filestore before reloading and keep the delete nodes to minimum.
- To rebuild a filestore, so you can eliminate all the delete nodes, you must export all objects (btrees) in the filestore, truncate the filestore, and reload all the objects (btrees) back into the filestore.
- See a_filestore.html limitations for limitations on the A_FileStore class.
- See a_buffer.html limitations for limitations on the A_Buffer class.
- See apack.html limitations for limitation on the apack/aunpack methods.
- For ArunaDB, I used the following guidelines:
- I created a filestore called Tables to contain most general use tables.
- I created a filestore called Indexes to store the indexes for all general use tables.
- I created a filestore called HeavyUse to contain tables that are frequently updated and deleted.
- I created a filestore called HeavyUse_i to store the indexes for tables that are frequently updated and deleted.
- I created a filestore called Daily to contain tables are truncated and loaded daily.
- I created a filestore called Daily_i to store tables that are truncated and loaded daily.
- I create a separate filestore with the same name as the btree for every very large btrees or btrees that take a vary large amount of updates and deletes. This makes it easy to truncate the filestore before reloading the tables. Truncating the filestore will eliminate all delete nodes.
- I create a filestore with the same name as the btree + "_i" for the indexes of each very large btree. Sometimes I put the indexes in the same filestore as the very large table depending on how large the table is and if the block sizes for the indexes and the table are similar. Sometimes I put each index in a separate filestore if the block sizes are different for each index or the table is very, very large.
To Do:
drop/add a column - add support for a block in the import to allow caller to change the imported row while importing
Usage:
require "A_FileStore"
# we need a system catalog so let's create one
A_Catalog.use('./test.ctl')
# create a filestore
A_FileStore.create('test', 1024, './filename.fs)
# create a btree
btree = A_BTree.new('btree_test2', 'test', false, 1, 1)
# insert a few rows, the keys are integers, the values are an array
btree.insert(10, [0, 0])
btree.insert(14, [4444, 3333])
btree.insert(11, [1111, 1111])
#Print the rows
btree.each {|key, value| print key, value.a_show(), "\n"}
# delete some row (10, [0, 0]) while iterating
btree.each(10, 10, true) {|key, value| btree.delete_row() if (value==[0, 0])}
# drop the btree
A_BTree.drop('btree_test2')
A_FileStore.drop('test') # drop the filestore and delete it's files
Class Methods:
A_BTree.connect(name)
Connects you to an existing btree. Return an A_BTree object for you to use.
A_BTree.drop(name)
Drops the btree from the default system catalog (the catalog currently in use).
A_BTree.exists?(name)
Returns true if this btree exists in the default system catalog (the catalog currently in use), otherwise returns false.
A_BTree.new (name, file_store_name=nil, unique=nil, node_page_size=1, leaf_page_size=1, key_pack_string=nil, value_pack_string=nil, check_values=nil)
- name - the name of this btree
- file_store_name - the name of the filestore to put this btree in. This filestore must have been previously created.
- unique - when == true the keys in this btree must be unique. Unique keys have several advantages over non-unique keys so use this unique feature when possible. See the locking strategies section for details on the differences between unique and non-unique keys.
- node_page_size - the size of each node page in blocks. A node is a page that points to other pages.
- leaf_page_size - the size of each leaf page in blocks. A leaf is a page that contains key/value pairs. The node page size and leaf page size can be the same size. I make the distinction between node and leaf page sizes because node pages only stores key/A_Pos pairs while leaf pages store key/value pairs. If the size of the value is much larger than the size of A_Pos (5 bytes) then a larger leaf_page_size would be valuable.
- key_pack_string - when keys are stores in the btree, they must be stored such that they can be written to disk. This paramater allows you specify how this happens. There are basically two options: Marshal.dump/load and apack/aunpack. If you leave this parameter nil or use '*', then Marshal.load is called on the key as the key is stored in the btree. If you provide a valid apack string for the key, then apack is called. See apack documentation for details on valid apack strings. Marhsal.load will allow you store almost any Ruby object but is more inefficient that apack. Apack is very limited in it's usage but is much more efficient than Marshal.load is. If the key is a single value, then the key_pack_string will contain one value. If the key is an array, then the key_pack_string should contain one value per value in the array. If you use apack, then the key can't contain nil. Apack/aunpack works very similar to Array.pack and String.unpack.
- value_pack_string - this work exactly like the key_pack_string above for the values stored in the btree. It allows you specify how values are stored in the btree. There are basically two options: Marshal.dump/load and apack/aunpack. If you leave this parameter nil or use '*', then Marshal.load is called on the value as the value is stored in the btree. If you provide a valid apack string for the data, then apack is called. See apack documentation for details on valid apack strings. Marhsal.load will allow you store almost any Ruby object but is more inefficient that apack. Apack is very limited in it's usage but is much more efficient than Marshal.load is. If the value is a single value, then the key_pack_string will contain one value. If the value is an array, then the key_pack_string should contain one value per value in the array. If you use apack, then the value can't contain nil. Apack/aunpack works very similar to Array.pack and String.unpack.
- check_values - nil will disable key checking. True will enable key checking. When true, a lot of extra key checking is performed in the btree. This makes sure the keys are ordered correctly after inserting, updating, and deleting rows. Setting this to true will hurt the performance of the btree. This is only used when I in the scripts that I use to test the btree.
A_BTree.rename(oldname, newname)
Renames a btree.
A_BTree.version()
Returns the current version of the btree. The most current version is 0.90.
Instance Methods:
[](key)
Alias for find(). Returns the value for the key.
[]=(key, value)
If key exists in the btree, it's value is updated. If key does not exist, it is inserted.
bof()
Short for Beginning Of File. Returns true if any iterator method or read_prev is at the beginning of the file (you read the first key/value pair), otherwise returns nil.
bof?()
Alias for bof.
clear()
Alias for delete_all.
close()
Close this btree.
check_values()
Returns true if the extra key checking is turned on. Otherwise returns nil.
delete(key, value=nil)
Delete this key/value pair from this btree. If value == nil, then all rows matching the key are deleted. Using value allows you to delete one row when the key is not unique.
delete_all()
Delete all elements in this B-Tree.
delete_row()
Delete the current row in the btree. This works in conjunction with each(), each_key(), each_value(), or each_duplicate(). For example:
Each(){|key, value| delete_row() if (value == 22)}
disconnect()
Alias for close.
drop()
Drop this btree. All elements are deleted and the btree is remove from the system catalog.
each(min_key=nil, max_key=nil, grab_write_lock=nil)
Yield(key, value) each key/value pair in the btree. The min_key and max_key parameters are a bit unusual. The btree uses these values to narrow down how many pages have to be read to retrieve the values you are looking for. If both values are nil, then all rows in the table are iterated over. If you are looking to iterate over a range of keys, then set the min_key to smallest key value and the max_key the largest. Using min_key and max_key values could really improve your performance.
- min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
each_duplicate(min_key=nil, max_key=nil, grab_write_lock=nil)
Yield(key, value, counter) each key/value pair in the btree where the key is duplicated (exists more than once). Yielding "counter" allows you to easily delete duplicate by using "delete_row() if (counter > 0)" as you are iterating over each_duplicate. The min_key and max_key parameters are a bit unusual. The btree uses these values to narrow down how many pages have to be read to retrieve the values you are looking for. If both values are nil, then all rows in the table are iterated over. If you are looking to iterate over a range of keys, then set the min_key to smallest key value and the max_key the largest. Using min_key and max_key values could really improve your performance.
- min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
each_key(min_key=nil, max_key=nil, grab_write_lock=nil)
Yield(key) each key in the btree. The min_key and max_key parameters are a bit unusual. The btree uses these values to narrow down how many pages have to be read to retrieve the values you are looking for. If both values are nil, then all rows in the table are iterated over. If you are looking to iterate over a range of keys, then set the min_key to smallest key value and the max_key the largest. Using min_key and max_key values could really improve your performance.
min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
each_leaf(min_key=nil, max_key=nil)
Yield(page_pos, i) each leaf page in the tree. A stack is used to make iterating easier, faster, and allow parent pages to be updated when needed. The min_key and max_key parameters are a bit unusual. The btree uses these values to narrow down how many pages have to be read to retrieve the values you are looking for. If both values are nil, then all rows in the table are iterated over. If you are looking to iterate over a range of keys, then set the min_key to smallest key value and the max_key the largest. Using min_key and max_key values could really improve your performance.
- min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
each_value(min_key=nil, max_key=nil, grab_write_lock=nil)
Yield(value) each value in the btree. The min_key and max_key parameters are a bit unusual. The btree uses these values to narrow down how many pages have to be read to retrieve the values you are looking for. If both values are nil, then all rows in the table are iterated over. If you are looking to iterate over a range of keys, then set the min_key to smallest key value and the max_key the largest. Using min_key and max_key values could really improve your performance.
- min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
eof()
Short for End Of File. Returns true if any iterator method or read_next is at the end of the file (you read the last key/value pair), otherwise returns nil.
eof?()
Alias for eof.
exists?(key, value=nil)
Returns true if this key exists in the btree, returns false when the key does not exist. If value is provided, then returns true if this key/value pair exists in the btree. Return false when this key/value pair does not exist.
export(fp)
Writes all key/value pairs in the tree to data_fp. Also wraps the btree data with strings so the import process knows when to stop reading key/value pairs. This allows you to put more than one btree in the file. This process Marshal.dumps all the key/value pairs to the file.
fp - the File pointer of an opened file. You must open the file first.
fetch(key, default=nil)
Alias for find().
fetch_last(key, default=nil)
Alias for find_last().
filestore_name()
Returns the name of the filestore where this btree resides.
find(key, default=nil)
Returns the value for the first element in this btree that matches key. Returns default if key is not found.
find_last(key, default=nil)
Returns the value for the last element in this btree that matches key. Returns default if key is not found.
flush_updates()
If there is a pending update, this will write the updated page to the disk. Under normal circumstances, you don't need to call this method because the iterators will automatically call this when needed. Call this only if you are iterating and call update_row() or delete_row() and you want the update or delete written immediately to disk.
has_key?(key, value=nil)
Alias for exists?().
import(fp, percent_full=95)
Load the data into the btree from fp. The data must have been previously exported.
fp - the File pointer of an opened file. You must open the file first.
include?(key, value=nil)
Alias for exists?().
insert(key, value)
Insert a new record (key/value pair) into this Btree.
key?(key, value=nil)
Alias for exists?().
leaf_page_size()
Returns the size of leaf pages in blocks.
load_start()
Prepare for loading. Loading is a quick way to load data into a btree. The data must be sorted by the key before you start loading. There are three steps, load_start(), load_row(for each row), load_finish(). This grabs the read and write locks.
load_finish()
This completes the load process by writing all unwritten pages. I releases the read and write locks.
load_row(key, value, percent_full=95)
This is called for each key/value pair you want to add to the btree. The data must be sorted by the key before you start calling this method. This is 3-4 times faster than calling insert.
Percent_full - this indicates how full the tree will be after all rows are loaded. The larger the number the less unused space. The smaller the number the more unused space. If there will be a lot of inserts to the btree, then a smaller number will reduce the number of page splits. 95% works well for trees with few inserts/updated/deletes. 80% works well for btree that heavily updated and inserted into.
lock()
Grab the lock on this btree. There is a unique lock for each btree. This is unique to the each btree not each object used for this btee.
name()
Returns the name of this btree.
nitems()
Returns the number of key/value pairs in the tree.
node_page_size()
Returns the size of node pages in blocks.
pages_accessed()
This allows you to see how efficient the tree is. It returns the # of pages read after calling any of the methods that iterate over rows such as delete(), update(), each(), each_value(), each_duplicate, etc.
read_end(release_write_lock=nil)
Read_first(), read_last(), read_next(), read_prev(), read_end() allow you move forward and backward through the key/value pairs in the btree. All iterator methods call these methods. If you call read_first() or read_last(), you must call read_end(). Example:
btree.read_first()
begin
while(!btree.eof)
btree.read_next()
end
ensure
btree.read_end()
end
read_first(min_key=nil, max_key=nil, grab_write_lock=nil)
Read_first(), read_last(), read_next(), read_prev(), read_end() allow you move forward and backward through the key/value pairs in the btree. All iterator methods call these methods. If you call read_first() or read_last(), you must call read_end(). See read_end for an example.
read_last(min_key=nil, max_key=nil, grab_write_lock=nil)
Read_first(), read_last(), read_next(), read_prev(), read_end() allow you move forward and backward through the key/value pairs in the btree. All iterator methods call these methods. If you call read_first() or read_last(), you must call read_end(). See read_end for an example.
read_next(max_key=nil)
Read_first(), read_last(), read_next(), read_prev(), read_end() allow you move forward and backward through the key/value pairs in the btree. All iterator methods call these methods. If you call read_first() or read_last(), you must call read_end(). See read_end for an example.
read_prev(min_key=nil)
Read_first(), read_last(), read_next(), read_prev(), read_end() allow you move forward and backward through the key/value pairs in the btree. All iterator methods call these methods. If you call read_first() or read_last(), you must call read_end(). See read_end for an example.
refresh()
This will force a re-read of all pages used by this btree. You should never need to call this method. This method is called internally when an error occurs to make sure pending inserts and updates are cleared.
rename(newname)
Rename this btree to newname.
reverse_each(min_key=nil, max_key=nil, grab_write_lock=nil)
Yield(key, value) each key/value pair in the btree.
- min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
reverse_each_key(min_key=nil, max_key=nil, grab_write_lock=nil)
Yield(key) each key in the btree.
- min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
reverse_each_value(min_key=nil, max_key=nil, grab_write_lock=nil)
Yield(value) each value in the btree.
- min_key - if you provide a min key, iterating starts where key >= min_key. This allows you reduce the number of pages that have to be read get the specific key(s) you are looking for.
- max_key - if you provide a max key, iterating start when key <= max_key. This allows you reduce the number of pages that have to be read get the specific key you are looking for.
- grab_write_lock - set this to true and this method will grab the write lock before it begins iterator and releases the write lock when iterating completes. This is required when the key is not unique and you want to call delete_row()r update_row(). If the key is unique then setting this to true will keep other threads from iterating and updateing. Don't use this if the key is unique. Use this if the key is not unique and you want to update while iterating.
set_check_values=(value)
To turn on extra key checking on in the btree, set this to true (i.e. set_check_values = true). To turn off extra key checking in the btree, set this to nil. When set to true, a lot of extra checking is performed on the btree to make sure that inserts, updates, and deletes are working properly. I only use this when testing the btree. Setting this to true will hurt the performance of the btree.
show(report_type='A', prefix='')
Prints information about the btree to standard out.
- report_type specifies the type of report you want to see. There are three possible reports:
'A' will print a one line summary of the btree and its contents
'S' will print 'A' plus information on each node in the tree. All leafs under the same node are all printed together as one page. This is a good condensed view of the btree.
'L' will print a very detailed view of the tree. The details of each leaf are printed. This will produce a lot of output when the tree has a lot of key/value pairs.
- prefix - this allows you easily indent the report
size()
Returns the number of blocks used by this btree.
synchronize()
Works just like Mutex.synchronize for the lock on this btree. There is a unique lock for each btree. This is unique to the each btree not each object used for this btee.
to_s()
Returns the content of show('A').
truncate()
Alias for delete_all(). Removes all key/value pairs from the btree.
unique()
Returns true if the keys in this btree are unique, otherwise returns nil.
unique=(value)
Set to true if the keys in this btree are unique. Set to false or nil if the keys in this btree are not unique. This value is also provided when you create the btee. This method allows you to change the unique value. You can turn this on and off as needed. Keep in mind that if this value is changed from false to true you could have duplicate keys in the btree. In other words, you can set this to true even if there are duplicate keys in the btree. You need to manually remove any duplicate values.
unlock()
Release the lock on this btree. There is a unique lock for each btree. This is unique to the each btree not each object used for this btee.
update(key, new_value, old_value=nil)
Update the value for key in this btree with new_value. If the key is not unique, you can provide and old_value to update only one key/value pair in the tree. If old_value == nil, then all rows matching key will be updated with new_value.
update_row(value)
Update the value for the current row in the btree. This works in conjunction with each(), each_key(), each_value(), or each_duplicate(). For example:
Each(){|key, value| update_row() if (value == 22)}
Testing:
a_btree_tst.rb - this script tests the basic functionality of the btree. It's great for seeing this class in action.
a_btree_tst_detail.rb - this script tests the as much of the functionality as possible. It makes sure everything works.
a_btree_tst_perf.rb - this script runs performance tests.
a_btree_tst_threads.rb - this script makes sure this class is thread safe.
a_btree_tst_merge.rb - this script runs a lot of detailed test to make sure merging and splitting page works correctly in all conditions. These tests can take days to complete.
A_Debug Usage:
20 - prints high level stuff like create, open, drop, etc
21 - loading rows - global btree locking
22 -
23 - insert/update/delete/find/exists
24 - stack related stuff
25 - create/drop btree pages
26 - not used
27 - read/write btree pages
28 - btree read/write locking
29 - searching
See a_debug for more details.