
API Documentation: Class OrderedSetData

Package: com.mckoi.data
extends java.util.AbstractSet<com.mckoi.data.ByteArray>
implements java.util.SortedSet<com.mckoi.data.ByteArray>

An ordered set of variable-length data strings mapped over a single DataFile object. Set modifications are immediately reflected in the underlying data file. This object grows and shrinks the underlying data file as values are inserted into and removed from the set. This is a convenient way to manage a set of arbitrary variable-length data objects in a single DataFile.

For value lookup, this class implements a binary search algorithm over the address space of all bytes of all strings stored in the file. The byte string items are encoded in the DataFile such that each string item is followed by a 0x0FFF8 sequence. Any 0x0FFF8 sequence inside a data item is encoded as a doubled 0x0FFF8 sequence.

Note that every array read from or written to the file must pass through an encoding/decoding step that escapes the data correctly. In the worst case (a string consisting entirely of 0x0FFF8 sequences), the encoded size is double the input size. The 0x0FFF8 sequence was chosen because it never appears in valid UTF-8 encoded text.
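
The escaping rule can be illustrated with a small, self-contained sketch. It assumes that "0x0FFF8" denotes the two-byte sequence 0xFF 0xF8, and it only handles a single item whose start and end are already known; locating item boundaries during a search is not shown. The class EscapeSketch and its methods are hypothetical and are not the Mckoi implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch of the escaping rule (not Mckoi code).
 * Assumption: the reserved sequence is the byte pair 0xFF 0xF8.
 */
public class EscapeSketch {
    static final byte B0 = (byte) 0xFF;
    static final byte B1 = (byte) 0xF8;

    /** Encodes one item: doubles every 0xFF 0xF8 pair, then appends the terminator. */
    public static byte[] encode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length + 2);
        for (int i = 0; i < data.length; i++) {
            if (data[i] == B0 && i + 1 < data.length && data[i + 1] == B1) {
                // Escape: write the reserved pair twice and skip its second byte.
                out.write(B0); out.write(B1); out.write(B0); out.write(B1);
                i++;
            } else {
                out.write(data[i]);
            }
        }
        // Terminator marking the end of the item.
        out.write(B0); out.write(B1);
        return out.toByteArray();
    }

    /** Decodes one encoded item whose exact bounds are known (terminator included). */
    public static byte[] decode(byte[] enc) {
        if (enc.length < 2 || enc[enc.length - 2] != B0 || enc[enc.length - 1] != B1) {
            throw new IllegalArgumentException("missing terminator");
        }
        int end = enc.length - 2; // the body excludes the terminator
        ByteArrayOutputStream out = new ByteArrayOutputStream(end);
        for (int i = 0; i < end; i++) {
            if (enc[i] == B0 && i + 1 < end && enc[i + 1] == B1) {
                // Inside the body, a reserved pair must be immediately duplicated.
                if (i + 3 >= end || enc[i + 2] != B0 || enc[i + 3] != B1) {
                    throw new IllegalArgumentException("unescaped 0xFF 0xF8 at " + i);
                }
                out.write(B0); out.write(B1);
                i += 3; // skip the rest of the doubled pair
            } else {
                out.write(enc[i]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = { 'a', (byte) 0xFF, (byte) 0xF8, 'b' };
        byte[] enc = encode(data);
        System.out.println(enc.length);                       // prints 8 (4 data + 2 escape + 2 terminator)
        System.out.println(Arrays.equals(data, decode(enc))); // prints true
    }
}
```

The doubling is unambiguous within an item because the two bytes of the reserved pair differ, so two occurrences of the pair can never overlap.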

Although this structure can store data strings of any length, the search algorithm reads each item it visits during a query fully into memory. Excessively large data strings should therefore not be stored in this structure if good search performance and low memory usage are desired.

OrderedSetData stores 64 bits of meta information (a static magic value) at the start of the DataFile whenever the set contains at least one string. This meta information is intended to help identify DataFile structures that are formatted by this object.

This object implements java.util.SortedSet<ByteArray>.

PERFORMANCE: While the data string search and iteration functions are efficient, the size() query requires a full scan of all the data in the file.

Constructors Summary

OrderedSetData(DataFile data, Comparator<ByteArray> collator)
OrderedSetData(DataFile data)

Methods Summary

int size()
boolean isEmpty()
Iterator<ByteArray> iterator()
boolean contains(Object str)
boolean add(ByteArray value)
boolean replace(ByteArray value)
void replaceOrAdd(ByteArray value)
boolean remove(Object value)
void clear()
Comparator<ByteArray> comparator()
OrderedSetData subSet(ByteArray from_element, ByteArray to_element)
OrderedSetData headSet(ByteArray to_element)
OrderedSetData tailSet(ByteArray from_element)
ByteArray first()
ByteArray last()

Constructor Details

OrderedSetData(DataFile data, Comparator<ByteArray> collator)

Creates this structure mapped over the given DataFile object. 'collator' describes the collation of strings in the set, or null if the order of strings should be lexicographical.

Note that the collator's behavior must be consistent across every OrderedSetData instance used on the same DataFile. An OrderedSetData that has managed a DataFile under one collation will not work correctly if the collation is later changed; in that case the behavior of this class is undefined.

OrderedSetData(DataFile data)

Creates this structure mapped over the given DataFile object. The order of strings in this string set is lexicographical.
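
As a rough illustration of what "lexicographical" order may mean for byte strings, the following sketch compares bytes as unsigned values and sorts a proper prefix before any longer string that extends it. Whether OrderedSetData treats bytes as signed or unsigned is an assumption here, not something this page states. The class LexicographicSketch is hypothetical, and java.util.TreeSet<byte[]> stands in for the set.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

public class LexicographicSketch {
    /** Assumed default order: unsigned byte-by-byte comparison; a proper prefix sorts first. */
    public static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so that bytes >= 0x80 sort after ASCII bytes.
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    };

    public static void main(String[] args) {
        TreeSet<byte[]> set = new TreeSet<>(LEXICOGRAPHIC);
        set.add(new byte[] { 'b' });
        set.add(new byte[] { 'a', 'z' });
        set.add(new byte[] { 'a' });
        set.add(new byte[] { (byte) 0xC3 }); // high byte: sorts last when compared unsigned
        // prints [97], [97, 122], [98], [-61] on separate lines
        for (byte[] s : set) System.out.println(Arrays.toString(s));
    }
}
```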

Method Details

int size()

Returns the total number of elements in the set or Integer.MAX_VALUE if the set contains Integer.MAX_VALUE or more values.

PERFORMANCE: This operation scans the entire set to count its elements, so its cost grows linearly with the amount of stored data. Avoid calling it on large sets.
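
Because the count is computed by a full scan, it can only be obtained by visiting every item. A minimal sketch of such a count, assuming nothing more than an Iterator over the set, shows how the result saturates at Integer.MAX_VALUE (the same convention java.util.Collection.size() uses). The helper saturatingSize is hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

public class SaturatingCount {
    /** Counts elements by walking the iterator, capping the result at Integer.MAX_VALUE. */
    public static int saturatingSize(Iterator<?> it) {
        long count = 0;
        while (it.hasNext()) {
            it.next(); // each step may require reading an item from storage
            if (++count >= Integer.MAX_VALUE) {
                return Integer.MAX_VALUE; // stop early: the answer cannot grow further
            }
        }
        return (int) count;
    }

    public static void main(String[] args) {
        System.out.println(saturatingSize(Arrays.asList("a", "b", "c").iterator())); // prints 3
    }
}
```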

boolean isEmpty()

Returns true if the set is empty. This is a low complexity query.

Iterator<ByteArray> iterator()

Returns an Iterator over all the strings stored in this set in collation order.

boolean contains(Object str)

Returns true if the set contains the given string. Assumes the set is ordered by the collator.

boolean add(ByteArray value)

Adds a string to the set in the sorted order defined by the collator specified when the object was created. Returns true if the string was not already in the set and was added, or false if the set already contains the value.

boolean replace(ByteArray value)

Finds a data string in the set that the comparator considers equal to the given value and replaces its content with the given value. This is intended for custom comparators that consider only part of the data element when determining order. For example, a record may contain a key and a variable value: the key determines the collation order, and the value may be changed using this method.

Returns true if a value was found and replaced. Returns false if the value was not found and nothing was replaced.
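
The key-and-value pattern described above can be sketched with java.util.TreeSet standing in for OrderedSetData and "key=value" strings standing in for ByteArray records. The comparator BY_KEY and the helper replace are hypothetical illustrations of the documented semantics, not the library's code.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

public class KeyedReplaceSketch {
    /** Extracts the key part of a "key=value" record; the value is ignored for ordering. */
    static String key(String record) {
        int eq = record.indexOf('=');
        return eq < 0 ? record : record.substring(0, eq);
    }

    /** Hypothetical comparator that orders records by key only. */
    public static final Comparator<String> BY_KEY = Comparator.comparing(KeyedReplaceSketch::key);

    /** Mimics replace(): swaps in 'value' only if an entry with an equal key exists. */
    public static boolean replace(NavigableSet<String> set, String value) {
        String existing = set.ceiling(value); // least entry >= value under the comparator
        if (existing == null || set.comparator().compare(existing, value) != 0) {
            return false; // no entry compares as equal; nothing is changed
        }
        // TreeSet.add() keeps the old element when an equal one exists, so remove it first.
        set.remove(existing);
        set.add(value);
        return true;
    }

    public static void main(String[] args) {
        NavigableSet<String> set = new TreeSet<>(BY_KEY);
        set.add("alice=1");
        set.add("bob=2");
        System.out.println(replace(set, "bob=99"));  // prints true
        System.out.println(replace(set, "carol=3")); // prints false
        System.out.println(set);                     // prints [alice=1, bob=99]
    }
}
```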

void replaceOrAdd(ByteArray value)

Replaces or adds a data string to the set depending on whether an entry is found in the set. If an entry that compares equally (as determined by the comparator) is found in the set then it is replaced. If no entry is found that compares equally then the entry is added in the correct sorted location in the set.
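
A minimal sketch of the replaceOrAdd() semantics over any java.util.NavigableSet, using a case-insensitive comparator so that two different strings can compare as equal. The helper is hypothetical and only models the documented behavior; it is not how OrderedSetData performs the operation on its DataFile.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

public class ReplaceOrAddSketch {
    /** Mimics replaceOrAdd(): replaces an entry that compares as equal, otherwise inserts. */
    public static <T> void replaceOrAdd(NavigableSet<T> set, T value) {
        // remove() locates an equal entry via the set's comparator (a no-op if absent).
        set.remove(value);
        // add() then places the new value at its sorted position.
        set.add(value);
    }

    public static void main(String[] args) {
        NavigableSet<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("Apple");
        replaceOrAdd(set, "APPLE");  // replaces "Apple", which compares as equal
        replaceOrAdd(set, "banana"); // no equal entry, so it is added
        System.out.println(set);     // prints [APPLE, banana]
    }
}
```

Removing first matters because TreeSet.add() leaves an existing equal element in place rather than overwriting it.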

boolean remove(Object value)

Removes the value from the set if it is present. Assumes the set is ordered by the collator.

void clear()

Clears the set of all string items.

Comparator<ByteArray> comparator()

Returns the comparator for this set.

OrderedSetData subSet(ByteArray from_element, ByteArray to_element)

Returns the sorted subset of string items from this set between the string 'from_element' (inclusive) and 'to_element' (exclusive), as ordered by the collation definition. The behavior of this method follows the contract defined by java.util.SortedSet.

OrderedSetData headSet(ByteArray to_element)

Returns the sorted subset of string items from the start of this set up to 'to_element' (exclusive), as ordered by the collation definition. The behavior of this method follows the contract defined by java.util.SortedSet.

OrderedSetData tailSet(ByteArray from_element)

Returns the sorted subset of string items from the string 'from_element' (inclusive) to the end of this set, as ordered by the collation definition. The behavior of this method follows the contract defined by java.util.SortedSet.
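
The range methods above follow the half-open convention of java.util.SortedSet: the lower bound is included and the upper bound is excluded. The sketch below demonstrates that convention with java.util.TreeSet<String> standing in for OrderedSetData; it shows the general SortedSet contract, not output captured from OrderedSetData itself. The class RangeViewSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class RangeViewSketch {
    /** Builds a small sample set of strings in natural (lexicographic) order. */
    public static SortedSet<String> sample() {
        return new TreeSet<>(Arrays.asList("ant", "bee", "cat", "dog", "eel"));
    }

    public static void main(String[] args) {
        SortedSet<String> set = sample();
        System.out.println(set.subSet("bee", "dog")); // prints [bee, cat]
        System.out.println(set.headSet("cat"));       // prints [ant, bee]
        System.out.println(set.tailSet("cat"));       // prints [cat, dog, eel]
    }
}
```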

ByteArray first()

Returns the first (lowest) string item currently in this set.

ByteArray last()

Returns the last (highest) string item currently in this set.

The text on this page is licensed under the Creative Commons Attribution 3.0 License. Java is a registered trademark of Oracle and/or its affiliates.
Mckoi is Copyright © 2000 - 2020 Diehl and Associates, Inc. All rights reserved.