Seva Software

 


Last Updated: 7/13/2001

A_FileStore.rb

Purpose

License

Description

Dependencies

Limitations

Performance Considerations

To Do

Usage

Class Methods

Instance Methods

Testing

A_Debug Usage

 

Purpose:

This contains all I/O methods used by ArunaDB via the A_BTree class. Basically, it is an abstraction layer over the File class. I wrote this class to allow a btree to span multiple files and to allow multiple btrees to reside in a single file. A_FileStore also allows me to free (delete) blocks of file space when btree pages are deleted, so that the space can be reused when new btree pages are created. I recommend that you suffix all filestore files with '.fs' so they can be easily identified. This works great when the items you are writing to the A_FileStore are similar in size, persistent, and you want to access the data randomly. This works poorly for writing many small items that vary in size, or when you only need sequential access to the data.

 

Description:

Provides an abstraction layer to the File class that allows one or more (up to 256) physical files to belong to a filestore. A_FileStore allows you to delete a block of file space and reuse that deleted file space with future writes. Deleted file space is cached in memory to keep performance good. This module is not thread safe. I have provided methods that allow you to lock and unlock an A_FileStore. If you are running multiple threads, you should lock the filestore before accessing or updating it and unlock it when you are finished. A synchronize method has been provided to make this easy and consistent with Mutex usage.
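
The delete-and-reuse behavior described above can be pictured with a toy in-memory model. ToyBlockStore is a hypothetical class written only for illustration. It is not the A_FileStore implementation, it ignores files and block sizes entirely, and its "reuse the oldest freed block first" policy is an assumption.

```ruby
# Toy model of block reuse; hypothetical, not the A_FileStore implementation.
class ToyBlockStore
  attr_reader :free_list

  def initialize
    @blocks = []       # each element stands for one block of data
    @free_list = []    # block numbers released by delete()
  end

  # Store data in a previously deleted block if one exists, else append.
  # Returns the block number that was used.
  def write(data)
    pos = @free_list.shift || @blocks.size
    @blocks[pos] = data
    pos
  end

  def read(pos)
    @blocks[pos]
  end

  # Mark a block as reusable; its old contents are dropped.
  def delete(pos)
    @blocks[pos] = nil
    @free_list << pos
  end
end

store = ToyBlockStore.new
a = store.write('first')
b = store.write('second')
store.delete(a)
c = store.write('third')   # reuses the block freed above
puts c == a
puts store.read(b)
```

The point of the sketch is only that a write after a delete lands in the freed slot instead of growing the store.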

You can specify the block size for each filestore. All I/O is done in terms of the block size. For example, if your block size is 100 and you write 2 blocks to the filestore, the actual File.write will be 200 bytes in size. By setting the block size to 1K, you can have files that are much bigger than 4G (1000 times larger), assuming you have a disk drive with more than 4G free space and the operating system can support files larger than 4G. Setting the block size to 1 is the same as writing directly to a file with fread and fwrite.
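
The block-size arithmetic above can be sketched in plain Ruby. The helpers bytes_for and max_bytes are hypothetical names and are not part of A_FileStore. The idea that positions are counted in blocks and limited to 2**32 block numbers is an assumption inferred from the 4G figure, not something this document states.

```ruby
# Hypothetical helpers illustrating block-based sizing; not part of A_FileStore.

# Bytes moved by one I/O call that covers `blocks` blocks.
def bytes_for(blocks, block_size)
  blocks * block_size
end

# Largest addressable size in bytes, ASSUMING positions are counted in
# blocks and limited to 2**32 distinct block numbers.
def max_bytes(block_size, max_blocks = 2**32)
  max_blocks * block_size
end

puts bytes_for(2, 100)             # 2 blocks with a block size of 100
puts(max_bytes(1) / 2**30)         # block size 1, result in units of 2**30 bytes
puts(max_bytes(1024) / 2**30)      # block size 1K, result in units of 2**30 bytes
```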

This uses A_Catalog.rb to remember information about the filestore so it can be reopened by using only its name. This also uses the A_Pos class for randomly reading and writing blocks to and from disk.

A new error called FileStoreIsFull is raised whenever you call write(), use_space(), or make_avail() and there is not enough free space in the filestore.

There are two classes used by A_FileStore that can be found in the C module a_buffer.c: A_Pos and A_DeleteNode. The A_Pos class is used to represent the position of an object in the filestore. It is similar to File.pos. The write(), use_space(), and read() methods use and return an A_Pos object. The A_DeleteNode class is used internally by A_FileStore to track deleted file space in a filestore. Since this class is never returned or used outside of a filestore, there is no documentation on the A_DeleteNode class at this time.

Several C methods were added to the A_FileStore class in a module called a_buffer.c (see A_Buffer.html for a complete description of a_buffer.c) for performance reasons. These are used internally by the A_FileStore class to read and write A_Buffer objects directly to disk. They are:

 

Dependencies:

 

Limitations:

  • A filestore can contain a max of 256 files.
  • You must drop and recreate a filestore to change the block_size of any of its files. Make sure you export ALL of the objects in the filestore before you drop it.
  • If you have a large disk drive (30 Gig or more) and your operating system has a 2 Gig limit per file, you can add 15 files (each with a max size of 2 Gig) to the filestore that all reside on the 30 Gig drive. Since a single filestore can have 256 files, you could have a single filestore that is 512 Gig (256 * 2 Gig) in size (assuming you have that much free disk space).
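
The capacity figures in the last item can be checked with a few lines of Ruby. The helper names files_needed and max_filestore_gig are hypothetical. The 2 Gig per-file limit and the 256-file limit are taken from the text above.

```ruby
# Hypothetical capacity arithmetic; all sizes are in Gig for readability.

# Number of files needed to cover a drive of `drive_gig` Gig when the
# operating system limits each file to `per_file_gig` Gig.
def files_needed(drive_gig, per_file_gig = 2)
  (drive_gig.to_f / per_file_gig).ceil
end

# Largest possible filestore when every one of its files is at the limit.
def max_filestore_gig(per_file_gig = 2, max_files = 256)
  per_file_gig * max_files
end

puts files_needed(30)
puts max_filestore_gig
```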

 

Performance Considerations:

 

To Do:

 

Usage:

require "a_filestore"

# we need a system catalog so let's create one

A_Catalog.use('./test.ctl')

A_FileStore.drop('test') if (A_FileStore.exists?('test') == true) # delete the filestore if it exists

 

# create a filestore

A_FileStore.create('test', 1024, './filename.fs')

# add a file to the filestore and limit the max file space in the new file to 50K (50 blocks)

A_FileStore.add_file('test', './filename2.fs', 50)

fs=A_FileStore.connect('test') # use the filestore

pos1 = fs.write('This is a test'.ljust(1024), nil, 1) # write to the end of the filestore, the buffer must be 1K

pos2 = fs.write('This is a test2'.ljust(1024), nil, 1) # write to the end of the filestore, the buffer must be 1K

print 'pos2: ', fs.read(pos2, 1).strip, "\n"

print 'pos1: ', fs.read(pos1, 1).strip, "\n"

print "There should be no delete blocks\n"

fs.show()

fs.delete(pos1, 1) # delete position 1

print "There should be one delete block\n"

fs.show() # you should see one delete block

pos3 = fs.write('This is a test3'.ljust(1024), nil, 1) # apos is nil, so this write reuses the deleted block

pos3 = fs.write('This is a test4', pos3, 1) # padding is only needed when writing to the end of the filestore

print "There should be no delete blocks\n"

fs.show() # you should see no delete blocks

fs.disconnect() # disconnect from the file store

A_FileStore.drop('test') # drop the filestore and delete its files

 

Class Methods:

A_FileStore.add_file(name, filename, max_size=nil, delete_cache_size=200)

Add a file to the filestore. New writes are evenly distributed across all files. When a file becomes full, or the device the file resides on becomes full, the file is no longer written to.

A_FileStore.close(name)

Close all open files associated with the filestore 'name'.

A_FileStore.close_all(force=nil)

Close all open file stores - all callers must disconnect() first unless you set force=true.

A_FileStore.connect(name)

Use or open a previously created file store.

A_FileStore.create(name, block_size=1K, filename=nil, max_size=nil, delete_cache_size=200)

Creates the filestore and returns its object. This is the same as A_FileStore.new() except that create() will fail if the filestore already exists, whereas new() will reopen the filestore if it already exists. See A_FileStore.new() for a description of the parameters.

A_FileStore.delete_cache_size(name)=value

Change the delete_cache_size for all files in a filestore. Value is the number of delete nodes to store in memory.

A_FileStore.drop(name, force=nil)

Drop a filestore and delete all of its files. You will lose all data currently stored in the filestore. You should drop all A_BTree objects stored in the filestore before dropping the filestore. You should also consider exporting the objects stored in the filestore before dropping it. All callers must disconnect() before you can drop a filestore (unless you set force=true).

force==true will drop the filestore even if other threads are connected.

A_FileStore.drop_file(name, filename)

Drop a file from a filestore. This is very dangerous: you will lose data if a write has occurred in this file since you added the file. You should never drop a file. You can use move_file() to move this file to a different disk drive, folder, or directory.

A_FileStore.exists?(name)

Returns true if this filestore exists, otherwise returns false. Example:

A_Catalog.use('catalog_name')

if (A_FileStore.exists?('filestore name') == true)

A_FileStore.show('filestore name')

end

A_FileStore.max_size(name, filename, new_max_size)

Change the max size of a file. If you are reducing the max_size and the new max_size is less than the current file size, then the current file size will become the new max_size. In other words, you can't set the max_size to a value that is smaller than the current file size. Note that new_max_size is a number of blocks, not a number of bytes.
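
The rule above amounts to taking the larger of the two values. The following sketch uses a hypothetical helper, effective_max_size, with both arguments in blocks. It does not model a nil (unlimited) max size.

```ruby
# Hypothetical helper: the max size, in blocks, that ends up in effect
# when new_max_size is requested for a file of current_size blocks.
def effective_max_size(current_size, new_max_size)
  [current_size, new_max_size].max
end

puts effective_max_size(4, 10)   # growing the limit is taken as given
puts effective_max_size(4, 2)    # shrinking below the file size is clamped
```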

A_FileStore.move_file(name, old_filename, new_filename)

Move (mv) a file to a different name and/or path. This will rename old_filename to new_filename. You should never do this directly using the mv command because the filestore will fail. This method will update the system catalog to properly remember this change.

A_FileStore.new(name, block_size=1K, filename=nil, max_size=nil, delete_cache_size=200)

If name exists as a filestore then it is opened using the values (parameters) from the system catalog (A_Catalog). If the filestore does not exist then it is created and the parameters are recorded in the system catalog. All connections to FileStores are cached in memory. As a result, when you connect to the same filestore twice, both connections will share the same file pointers. For example, the A_BTree class connects to a filestore every time you open a btree. If there are many btrees in the same filestore, then each filestore could get opened many times. By caching and sharing FileStores, connecting to A_BTree objects should be fast and efficient.
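
The connection caching described above could look roughly like the following. ToyStore, its class-level cache, and the opened counter are all hypothetical. The sketch only shows the general name-keyed caching pattern and does not reflect how A_FileStore is actually written.

```ruby
# Hypothetical sketch of a name-keyed connection cache.
class ToyStore
  attr_reader :name, :opened

  @cache = {}   # class-level cache shared by every caller of connect

  # Return the cached object for `name`, creating it on first use.
  def self.connect(name)
    store = (@cache[name] ||= new(name))
    store.opened_again
    store
  end

  def initialize(name)
    @name = name
    @opened = 0
  end

  # Count one more connection to this shared object.
  def opened_again
    @opened += 1
  end
end

a = ToyStore.connect('test')
b = ToyStore.connect('test')
puts a.equal?(b)   # both callers hold the same object
puts a.opened
```

Because both callers share one object, anything that object holds (such as open file pointers) is shared as well, which is why closing it while others are connected is a problem.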

A_FileStore.open(name)

Alias for connect(name)

A_FileStore.open_all()

Open all existing (previously created) file stores.

A_FileStore.show(name=nil, prefix='')

Print info about the file store called name, see show() below for details. If name is nil then information about all filestores is printed. Prefix is printed at the front of the info for each filestore. This allows you to indent or prefix each line that is printed for nicer formatting.

A_FileStore.sync(name, true/false)

Turns the sync flag on or off for each file in the file store. False should be faster but more risky if the server crashes.

A_FileStore.truncate(name)

Empties each file in the filestore. This will quickly erase all data in a filestore. You will lose all data currently stored in the filestore. You should consider exporting the objects stored in the filestore before truncating it. You should never, never truncate a filestore that contains a btree. You should drop all btrees in this filestore before you truncate it.

 

Instance Methods:

add_file(filename, max_size=nil, delete_cache_size=200)

Add a file to the filestore. New writes are evenly distributed across all files. When a file becomes full, or the device the file resides on becomes full, the file is no longer written to.

delete_cache_size=value

Change the delete_cache_size for all files in a filestore. Value is the number of delete nodes to store in memory.

disconnect()

Disconnect from the file store. This just tells the filestore you are finished with it. It still remains loaded in memory and ready for use.

close()

If you are the only user of the filestore, then it and all of its files are closed. If you are not the only caller, then this calls disconnect(). Closing the files when there are other callers will cause the other callers to die.

delete(apos, size)

Deletes the buffer at apos so that it can be reused later.

drop_file(filename)

Drop a file from this filestore. This is very dangerous: you will lose data if a write has occurred in this file since you added the file. You should never drop a file. You can use move_file() to move this file to a different disk drive, folder, or directory.

free_space()

Returns the amount of unused space (in blocks not bytes) in the filestore.

get_max_size(filename=nil)

Returns the max size of filename. It will return nil if the max size is nil. If filename is nil, then the sum of the max size of all files in the filestore is returned.

lock()

Works exactly like Mutex.lock. This locks the Mutex for this filestore.

make_avail(size, count)

Makes sure that count regions of the given size are available in this filestore. If there are enough deleted blocks, then this returns a number > 0. If there aren't enough deleted blocks, then space is created and deleted by writing empty blocks to the file. This is used by A_BTree to make sure there is enough free space to successfully complete a page split before starting the page split. Raises FileStoreIsFull if count * size blocks can't be written to this filestore.

max_size(filename, new_max_size)

Change the max size for filename in this filestore.

move_file(old_filename, new_filename)

Move (mv) a file to a different name and/or path. This will rename old_filename to new_filename. You should never do this directly using the mv command because the filestore will fail. This method will update the system catalog to properly remember this change.

read(apos, size)

Reads and returns the String object at apos.

read_buffer(abuffer, apos)

Reads from the filestore at apos into abuffer. This supports direct reading of A_Buffer objects and is used by A_BTree for performance reasons. write() can write either a String object or an A_Buffer object.

show()

Print info about this filestore. The following example has two files:

FileStore:'tst_a_filestore',blocksize:128,#files=2, size:12,free=3,opened=1:

File:0,'tst_a_filestore1.txt',size=4,free:1,max=10,sync:T{DeleteCache:size=200,cached=1,counts=1@1 block}

File:1,'tst_a_filestore3.txt',size=2,free:2,max=2,sync:T{DeleteCache:size=200,cached=2,counts=2@1 block}

Explanation:

The block size of each read/write is 128 bytes; there are 2 files in this filestore; the total physical file size of all files in the filestore is 6 blocks; there are 3 free or unused blocks; and it has been opened 1 time. The actual file size of this filestore is 6 * 128 bytes. The max size this filestore could get is 12 blocks (max=10 + max=2).

File:0 has a file name of 'tst_a_filestore1.txt'; its physical size is 4 blocks; it has 1 free or unused block that is 1 block in size; it has a max size of 10 blocks; the File.sync flag is true; the delete cache is 200 nodes large (this is the default and should be adequate for most applications); there is currently one delete node cached in memory; the one delete node has a size of 1 block. If there were 3 delete nodes with different sizes, the counts could look like this: counts=2@1 block,1@2 blocks.

File:1 has a file name of 'tst_a_filestore3.txt'; its physical size is 2 blocks; it has 2 free or unused blocks that are 1 block in size each; it has a max size of 2 blocks; the File.sync flag is true; the delete cache is 200 nodes large; there are currently two delete nodes cached in memory; the two delete nodes have a size of 1 block each. When the two unused blocks become used, this file will be full and all future writes will go to File:0.
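
To make the counts field easier to read, here is a small sketch that totals the blocks it describes. The helper free_blocks_from_counts is hypothetical, and the "count@size block(s)" format is assumed from the single example output shown above. The real show() output may vary in ways this does not handle.

```ruby
# Hypothetical parser for the counts field printed by show(). Each entry
# is assumed to read "<number of delete nodes>@<size in blocks> block(s)".
def free_blocks_from_counts(counts)
  counts.scan(/(\d+)@(\d+) blocks?/).sum do |count, size|
    count.to_i * size.to_i
  end
end

puts free_blocks_from_counts('1@1 block')              # File:0 in the example
puts free_blocks_from_counts('2@1 block')              # File:1 in the example
puts free_blocks_from_counts('2@1 block,1@2 blocks')   # the mixed-size case
```

For the two files in the example, the totals match the free values printed on the same lines.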

size()

Returns the number of blocks used by all files in the filestore. This is the total amount of disk space, in blocks, used by this filestore. This includes unused space or deleted blocks.

sync = true/false

Turns the sync flag on or off for each file in the file store. False should be faster but more risky if the server crashes.

synchronize()

Works exactly like Mutex.synchronize. A Mutex is created for each filestore. This locks the Mutex for this filestore, executes the block, and releases the Mutex. For example:

fs = A_FileStore.connect('test_filestore')

fs.synchronize do

# perform your filestore updates here

end
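
Since synchronize(), lock(), and unlock() are described as working like their Mutex counterparts, the pattern can be tried with Ruby's own Mutex. In this sketch the counter variable is a hypothetical stand-in for a shared filestore, and no A_FileStore calls are made.

```ruby
# Plain-Ruby illustration of the locking pattern; `counter` stands in for
# a shared filestore and is purely hypothetical.
lock = Mutex.new
counter = 0

threads = 4.times.map do
  Thread.new do
    100.times do
      # synchronize locks, runs the block, and always unlocks afterwards.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

# The manual form needs ensure so the lock is released even on error.
lock.lock
begin
  counter += 1
ensure
  lock.unlock
end

puts counter
```

The synchronize form is usually the safer choice because it cannot leave the lock held when the block raises.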

truncate()

Empties each file in the filestore. This will quickly erase all data in a filestore. You will lose all data currently stored in the filestore. You should consider exporting the objects stored in the filestore before truncating it. Make sure you truncate all btrees in the filestore before you truncate a filestore, otherwise you will have to manually remove the btrees from the catalog and recreate them.

unlock()

Works exactly like Mutex.unlock. This unlocks the Mutex for this filestore.

use_space(size, skip_deletes=nil)

Returns the apos (see A_Pos) where this size can be safely written - this is automatically called by write() when apos==nil. Raises FileStoreIsFull if the filestore is full.

write(buffer, apos, size, skip_deletes=nil)

Write buffer at apos and return the apos (see A_Pos). If apos == nil, then the write goes to the first deleted block, or to EOF if there are no deleted blocks. Raises an error if apos == nil and buffer.size != size * block_size. If apos != nil, then an error is raised only if buffer.size > size * block_size. Raises FileStoreIsFull if the filestore is full.
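
The two size rules above can be written out as a small check. The helper check_write_size is hypothetical, the use of ArgumentError is an assumption (this document does not say which error write() raises for a bad size), and a plain integer stands in for an A_Pos object.

```ruby
# Hypothetical check mirroring the size rules described for write().
def check_write_size(buffer, apos, size, block_size)
  limit = size * block_size
  if apos.nil?
    # New space: the buffer must fill the requested blocks exactly.
    raise ArgumentError, "buffer must be #{limit} bytes" unless buffer.size == limit
  elsif buffer.size > limit
    # Existing space: a shorter buffer is fine, a longer one is not.
    raise ArgumentError, "buffer exceeds #{limit} bytes"
  end
  true
end

puts check_write_size('This is a test'.ljust(1024), nil, 1, 1024)   # padded, apos nil
puts check_write_size('This is a test4', 0, 1, 1024)                # short, apos given

begin
  check_write_size('This is a test', nil, 1, 1024)                  # unpadded, apos nil
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```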

 

Testing: 

a_filestore_tst.rb - this script tests the basic functionality of the filestore. It's great for seeing this class in action.

a_filestore_tst_detail.rb - this script tests as much of the functionality as possible. It makes sure everything works.

a_filestore_tst_perf.rb - this script runs performance tests.

a_filestore_tst_threads.rb - this script is an example of multiple threads accessing a filestore at the same time.

 

A_Debug Usage:

3 - prints high level stuff like creating and opening an A_FileStore object

31 - global locking related to filestores

32 - not used

33 - not used

34 - not used

35 - not used

36 - finding/reading/writing delete nodes

37 - reading/writing blocks

38 - lock/unlock when reading/writing in a filestore

39 - not used

See a_debug for more details.