Involved Source Fileshash_funcs.gohash_string.go Package hashing provides utilities for and an implementation of a hash
table which is more performant than the default go map implementation
by leveraging xxh3 and some custom hash functions.xxh3_memo_table_types.go
BinaryMemoTable is our hashtable for binary data using the BinaryBuilder
to construct the actual data in an easy to pass around way with minimal copies
while using a hash table to keep track of the indexes into the dictionary that
is created as we go. CopyFixedWidthValues exists to cope with the fact that the table doesn't keep
track of the fixed width when inserting the null value the databuffer holds a
zero length byte slice for the null value (if found) CopyLargeOffsets copies the list of offsets into the passed in slice, the offsets
being the start and end values of the underlying allocated bytes in the builder
for the individual values of the table. out should be at least sized to Size()+1 CopyLargeOffsetsSubset is like CopyOffsets but instead of copying all of the offsets,
it gets a subset of the offsets in the table starting at the index provided by "start". CopyOffsets copies the list of offsets into the passed in slice, the offsets
being the start and end values of the underlying allocated bytes in the builder
for the individual values of the table. out should be at least sized to Size()+1 CopyOffsetsSubset is like CopyOffsets but instead of copying all of the offsets,
it gets a subset of the offsets in the table starting at the index provided by "start". CopyValues copies the raw binary data bytes out, out should be a []byte
with at least ValuesSize bytes allocated to copy into. CopyValuesSubset copies the raw binary data bytes out starting with the value
at the index start, out should be a []byte with at least ValuesSize bytes allocated(*BinaryMemoTable) Exists(val []byte) bool Get returns the index of the specified value in the table or KeyNotFound,
and a boolean indicating whether it was found in the table. GetNull returns the index of a null that has been inserted into the table or
KeyNotFound. The bool returned will be true if there was a null inserted into
the table, and false otherwise.(*BinaryMemoTable) GetOrInsert(val interface{}) (idx int, found bool, err error) GetOrInsertBytes returns the index of the given value in the table, if not found
it is inserted into the table. The return value 'found' indicates whether the value
was found in the table (true) or inserted (false) along with any possible error. GetOrInsertNull retrieves the index of a null in the table or inserts
null into the table, returning the index and a boolean indicating if it was
found in the table (true) or was inserted (false). GetOrInsert returns the index of the given value in the table, if not found
it is inserted into the table. The return value 'found' indicates whether the value
was found in the table (true) or inserted (false) along with any possible error. Release is used to tell the underlying builder that it can release the memory allocated
when the reference count reaches 0, this is safe to be called from multiple goroutines
simultaneously Reset dumps all of the data in the table allowing it to be reutilized. Retain increases the ref count, it is safe to call it from multiple goroutines
simultaneously. Size returns the current size of the memo table including the null value
if one has been inserted.( BinaryMemoTable) TypeTraits() TypeTraits(*BinaryMemoTable) Value(i int) []byte ValuesSize returns the current total size of all the raw bytes that have been inserted
into the memotable so far. VisitValues exists to run the visitFn on each value currently in the hash table.(*BinaryMemoTable) WriteOut(out []byte)(*BinaryMemoTable) WriteOutSubset(start int, out []byte)
*BinaryMemoTable : MemoTable[T]
*BinaryMemoTable : github.com/apache/arrow-go/v18/arrow/scalar.Releasable
*BinaryMemoTable : github.com/gogo/protobuf/proto.Sizer
func NewBinaryMemoTable(initial, valuesize int, bldr BinaryBuilderIFace) *BinaryMemoTable
MemoTable interface for hash tables and dictionary encoding.
Values will remember the order they are inserted to generate a valid
dictionary. CopyValues populates out with the values currently in the table, out must
be a slice of the appropriate type for the table type. CopyValuesSubset is like CopyValues but only copies a subset of values starting
at the indicated index. Get returns the index of the table the specified value is, and a boolean indicating
whether or not the value was found in the table. Will panic if val is not the appropriate
type for the underlying table. GetNull returns the index of the null value in the table, but does not
insert one if it doesn't already exist. Will return -1 if it doesn't exist
indicated by a false value for the boolean. GetOrInsert returns the index of the table the specified value is,
and a boolean indicating whether or not the value was found in
the table (if false, the value was inserted). An error is returned
if val is not the appropriate type for the table. GetOrInsertBytes returns the index of the table the specified value is,
and a boolean indicating whether or not the value was found in
the table (if false, the value was inserted). An error is returned
if val is not the appropriate type for the table. This function is intended to be used by
the BinaryMemoTable to prevent unnecessary allocations of the data when converting from a []byte to interface{}. GetOrInsertNull returns the index of the null value in the table,
inserting one if it hasn't already been inserted. It returns a boolean
indicating if the null value already existed or not in the table. Reset drops everything in the table allowing it to be reused Size returns the current number of unique values stored in
the table, including whether or not a null value has been
inserted via GetOrInsertNull.( MemoTable) TypeTraits() TypeTraits WriteOut copies the unique values of the memotable out to the byte slice
provided. Must have allocated enough bytes for all the values. WriteOutSubset is like WriteOut, but only writes a subset of values
starting with the index offset.
*BinaryMemoTableNumericMemoTable(interface)
*Table[...]TypedMemoTable[...] (interface)
MemoTable : github.com/gogo/protobuf/proto.Sizer
func github.com/apache/arrow-go/v18/arrow/array.GetDictArrayData(mem memory.Allocator, valueType arrow.DataType, memoTable MemoTable, startOffset int) (*array.Data, error)
CopyValues populates out with the values currently in the table, out must
be a slice of the appropriate type for the table type. CopyValuesSubset is like CopyValues but only copies a subset of values starting
at the indicated index. Get returns the index of the table the specified value is, and a boolean indicating
whether or not the value was found in the table. Will panic if val is not the appropriate
type for the underlying table. GetNull returns the index of the null value in the table, but does not
insert one if it doesn't already exist. Will return -1 if it doesn't exist
indicated by a false value for the boolean. GetOrInsert returns the index of the table the specified value is,
and a boolean indicating whether or not the value was found in
the table (if false, the value was inserted). An error is returned
if val is not the appropriate type for the table. GetOrInsertBytes returns the index of the table the specified value is,
and a boolean indicating whether or not the value was found in
the table (if false, the value was inserted). An error is returned
if val is not the appropriate type for the table. This function is intended to be used by
the BinaryMemoTable to prevent unnecessary allocations of the data when converting from a []byte to interface{}. GetOrInsertNull returns the index of the null value in the table,
inserting one if it hasn't already been inserted. It returns a boolean
indicating if the null value already existed or not in the table. Reset drops everything in the table allowing it to be reused Size returns the current number of unique values stored in
the table, including whether or not a null value has been
inserted via GetOrInsertNull.( NumericMemoTable) TypeTraits() TypeTraits WriteOut copies the unique values of the memotable out to the byte slice
provided. Must have allocated enough bytes for all the values.( NumericMemoTable) WriteOutLE(out []byte) WriteOutSubset is like WriteOut, but only writes a subset of values
starting with the index offset.( NumericMemoTable) WriteOutSubsetLE(offset int, out []byte)
*Table[...]
NumericMemoTable : MemoTable[T]
NumericMemoTable : github.com/gogo/protobuf/proto.Sizer
Type Parameters:
T: MemoTypes CopyValues populates out with the values currently in the table, out must
be a slice of the appropriate type for the table type. CopyValuesSubset is like CopyValues but only copies a subset of values starting
at the indicated index.( TypedMemoTable[T]) Exists(T) bool Get returns the index of the table the specified value is, and a boolean indicating
whether or not the value was found in the table. Will panic if val is not the appropriate
type for the underlying table. GetNull returns the index of the null value in the table, but does not
insert one if it doesn't already exist. Will return -1 if it doesn't exist
indicated by a false value for the boolean. GetOrInsert returns the index of the table the specified value is,
and a boolean indicating whether or not the value was found in
the table (if false, the value was inserted). An error is returned
if val is not the appropriate type for the table. GetOrInsertBytes returns the index of the table the specified value is,
and a boolean indicating whether or not the value was found in
the table (if false, the value was inserted). An error is returned
if val is not the appropriate type for the table. This function is intended to be used by
the BinaryMemoTable to prevent unnecessary allocations of the data when converting from a []byte to interface{}. GetOrInsertNull returns the index of the null value in the table,
inserting one if it hasn't already been inserted. It returns a boolean
indicating if the null value already existed or not in the table.( TypedMemoTable[T]) InsertOrGet(val T) (idx int, found bool, err error) Reset drops everything in the table allowing it to be reused Size returns the current number of unique values stored in
the table, including whether or not a null value has been
inserted via GetOrInsertNull.( TypedMemoTable[T]) TypeTraits() TypeTraits WriteOut copies the unique values of the memotable out to the byte slice
provided. Must have allocated enough bytes for all the values. WriteOutSubset is like WriteOut, but only writes a subset of values
starting with the index offset.
TypedMemoTable : MemoTable[T]
TypedMemoTable : github.com/gogo/protobuf/proto.Sizer
for smaller amounts of bytes this is faster than even calling into
xxh3 to do the Hash, so we specialize in order to get the benefits
of that performance.
NewBinaryMemoTable returns a hash table for Binary data, the passed in allocator will
be utilized for the BinaryBuilder, if nil then memory.DefaultAllocator will be used.
initial and valuesize can be used to pre-allocate the table to reduce allocations. With
initial being the initial number of entries to allocate for and valuesize being the starting
amount of space allocated for writing the actual binary data.
KeyNotFound is the constant returned by memo table functions when a key isn't found in the table
The pages are generated with Goldsv0.8.2. (GOOS=linux GOARCH=amd64)
Golds is a Go 101 project developed by Tapir Liu.
PR and bug reports are welcome and can be submitted to the issue list.
Please follow @zigo_101 (reachable from the left QR code) to get the latest news of Golds.