package hashing

Import Path
	github.com/apache/arrow-go/v18/internal/hashing (on go.dev)

Dependency Relation
	imports 9 packages, and imported by 3 packages

Involved Source Files hash_funcs.go hash_string.go Package hashing provides utilities for and an implementation of a hash table which is more performant than the default go map implementation by leveraging xxh3 and some custom hash functions. xxh3_memo_table_types.go
Package-Level Type Names (total 20)
/* sort by: | */
( BinaryBuilderIFace) Append([]byte) ( BinaryBuilderIFace) AppendNull() ( BinaryBuilderIFace) AppendString(string) ( BinaryBuilderIFace) DataLen() int ( BinaryBuilderIFace) Len() int ( BinaryBuilderIFace) Release() ( BinaryBuilderIFace) Reserve(int) ( BinaryBuilderIFace) ReserveData(int) ( BinaryBuilderIFace) Resize(int) ( BinaryBuilderIFace) ResizeData(int) ( BinaryBuilderIFace) Retain() ( BinaryBuilderIFace) Value(int) []byte *github.com/apache/arrow-go/v18/arrow/array.BinaryBuilder BinaryBuilderIFace : github.com/apache/arrow-go/v18/arrow/scalar.Releasable func NewBinaryMemoTable(initial, valuesize int, bldr BinaryBuilderIFace) *BinaryMemoTable
BinaryMemoTable is our hashtable for binary data using the BinaryBuilder to construct the actual data in an easy to pass around way with minimal copies while using a hash table to keep track of the indexes into the dictionary that is created as we go. CopyFixedWidthValues exists to cope with the fact that the table doesn't keep track of the fixed width when inserting the null value the databuffer holds a zero length byte slice for the null value (if found) CopyLargeOffsets copies the list of offsets into the passed in slice, the offsets being the start and end values of the underlying allocated bytes in the builder for the individual values of the table. out should be at least sized to Size()+1 CopyLargeOffsetsSubset is like CopyOffsets but instead of copying all of the offsets, it gets a subset of the offsets in the table starting at the index provided by "start". CopyOffsets copies the list of offsets into the passed in slice, the offsets being the start and end values of the underlying allocated bytes in the builder for the individual values of the table. out should be at least sized to Size()+1 CopyOffsetsSubset is like CopyOffsets but instead of copying all of the offsets, it gets a subset of the offsets in the table starting at the index provided by "start". CopyValues copies the raw binary data bytes out, out should be a []byte with at least ValuesSize bytes allocated to copy into. CopyValuesSubset copies the raw binary data bytes out starting with the value at the index start, out should be a []byte with at least ValuesSize bytes allocated (*BinaryMemoTable) Exists(val []byte) bool Get returns the index of the specified value in the table or KeyNotFound, and a boolean indicating whether it was found in the table. GetNull returns the index of a null that has been inserted into the table or KeyNotFound. The bool returned will be true if there was a null inserted into the table, and false otherwise. (*BinaryMemoTable) GetOrInsert(val interface{}) (idx int, found bool, err error) GetOrInsertBytes returns the index of the given value in the table, if not found it is inserted into the table. The return value 'found' indicates whether the value was found in the table (true) or inserted (false) along with any possible error. GetOrInsertNull retrieves the index of a null in the table or inserts null into the table, returning the index and a boolean indicating if it was found in the table (true) or was inserted (false). GetOrInsert returns the index of the given value in the table, if not found it is inserted into the table. The return value 'found' indicates whether the value was found in the table (true) or inserted (false) along with any possible error. Release is used to tell the underlying builder that it can release the memory allocated when the reference count reaches 0, this is safe to be called from multiple goroutines simultaneously Reset dumps all of the data in the table allowing it to be reutilized. Retain increases the ref count, it is safe to call it from multiple goroutines simultaneously. Size returns the current size of the memo table including the null value if one has been inserted. ( BinaryMemoTable) TypeTraits() TypeTraits (*BinaryMemoTable) Value(i int) []byte ValuesSize returns the current total size of all the raw bytes that have been inserted into the memotable so far. VisitValues exists to run the visitFn on each value currently in the hash table. (*BinaryMemoTable) WriteOut(out []byte) (*BinaryMemoTable) WriteOutSubset(start int, out []byte) *BinaryMemoTable : MemoTable[T] *BinaryMemoTable : github.com/apache/arrow-go/v18/arrow/scalar.Releasable *BinaryMemoTable : github.com/gogo/protobuf/proto.Sizer func NewBinaryMemoTable(initial, valuesize int, bldr BinaryBuilderIFace) *BinaryMemoTable
( ByteSlice) Bytes() []byte *github.com/apache/arrow-go/v18/arrow/memory.Buffer github.com/apache/arrow-go/v18/parquet.ByteArray github.com/apache/arrow-go/v18/parquet.FixedLenByteArray github.com/apache/arrow-go/v18/parquet/variant.Metadata github.com/apache/arrow-go/v18/parquet/variant.Value github.com/apache/thrift/lib/go/thrift.TMemoryBuffer github.com/dop251/goja.ArrayBuffer *github.com/gobwas/httphead.Scanner *github.com/gogo/protobuf/proto.Buffer *github.com/golang/protobuf/proto.Buffer github.com/ipfs/go-cid.Cid github.com/ipfs/go-cid.Prefix *github.com/klauspost/compress/s2.Dict *github.com/libp2p/go-buffer-pool.Buffer *github.com/mailru/easyjson/jlexer.Lexer *github.com/multiformats/go-multiaddr.Component github.com/multiformats/go-multiaddr.Multiaddr github.com/multiformats/go-multiaddr/x/meg.Matchable (interface) github.com/oklog/ulid/v2.ULID github.com/parquet-go/parquet-go.Value *github.com/parquet-go/parquet-go/bloom.Block github.com/parquet-go/parquet-go/bloom.SplitBlockFilter github.com/quic-go/quic-go/internal/protocol.ArbitraryLenConnectionID github.com/quic-go/quic-go/internal/protocol.ConnectionID *github.com/rivo/uniseg.Graphemes github.com/sethvargo/go-envconfig.Base64Bytes github.com/sethvargo/go-envconfig.HexBytes *github.com/vmihailenco/tagparser/v2/internal/parser.Parser *github.com/yuin/goldmark/util.CopyOnWriteBuffer *bufio.Scanner *bytes.Buffer *crypto/ecdh.PrivateKey *crypto/ecdh.PublicKey *crypto/internal/boring.PublicKeyECDH crypto/internal/fips140/ecdh.Point[...] (interface) *crypto/internal/fips140/ecdh.PrivateKey *crypto/internal/fips140/ecdh.PublicKey crypto/internal/fips140/ecdsa.Point[...] (interface) *crypto/internal/fips140/ecdsa.PrivateKey *crypto/internal/fips140/ecdsa.PublicKey *crypto/internal/fips140/ed25519.PrivateKey *crypto/internal/fips140/ed25519.PublicKey *crypto/internal/fips140/edwards25519.Point *crypto/internal/fips140/edwards25519.Scalar *crypto/internal/fips140/edwards25519/field.Element *crypto/internal/fips140/mlkem.DecapsulationKey1024 *crypto/internal/fips140/mlkem.DecapsulationKey768 *crypto/internal/fips140/mlkem.EncapsulationKey1024 *crypto/internal/fips140/mlkem.EncapsulationKey768 *crypto/internal/fips140/nistec.P224Point *crypto/internal/fips140/nistec.P256Point *crypto/internal/fips140/nistec.P384Point *crypto/internal/fips140/nistec.P521Point *crypto/internal/fips140/nistec/fiat.P224Element *crypto/internal/fips140/nistec/fiat.P256Element *crypto/internal/fips140/nistec/fiat.P384Element *crypto/internal/fips140/nistec/fiat.P521Element *crypto/mlkem.DecapsulationKey1024 *crypto/mlkem.DecapsulationKey768 *crypto/mlkem.EncapsulationKey1024 *crypto/mlkem.EncapsulationKey768 *filippo.io/edwards25519.Point *filippo.io/edwards25519.Scalar *filippo.io/edwards25519/field.Element *go.uber.org/zap/buffer.Buffer *golang.org/x/sys/unix.FileHandle *golang.org/x/text/unicode/bidi.Run *google.golang.org/protobuf/internal/encoding/json.Encoder *google.golang.org/protobuf/internal/encoding/text.Encoder google.golang.org/protobuf/reflect/protoreflect.Value gorm.io/datatypes.BinUUID *math/big.Int reflect.Value *vendor/golang.org/x/text/unicode/bidi.Run
type Float32MemoTable = (struct)
type Float64MemoTable = (struct)
Type Parameters: T: fixedLenMemoTypes (*HashTable[T]) CopyValues(out []T) (*HashTable[T]) CopyValuesSubset(start int, out []T) (*HashTable[T]) Insert(e *entry[T], v uint64, val T, memoIdx int32) error (*HashTable[T]) Lookup(v uint64, cmp func(T) bool) (*entry[T], bool) (*HashTable[T]) Reset(cap uint64) (*HashTable[T]) VisitEntries(visit func(*entry[T])) (*HashTable[T]) WriteOut(out []byte) (*HashTable[T]) WriteOutSubset(start int, out []byte) func NewHashTable[T](cap uint64) *HashTable[T]
type Int16MemoTable = (struct)
type Int32MemoTable = (struct)
type Int64MemoTable = (struct)
type Int8MemoTable = (struct)
MemoTable interface for hash tables and dictionary encoding. Values will remember the order they are inserted to generate a valid dictionary. CopyValues populates out with the values currently in the table, out must be a slice of the appropriate type for the table type. CopyValuesSubset is like CopyValues but only copies a subset of values starting at the indicated index. Get returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table. Will panic if val is not the appropriate type for the underlying table. GetNull returns the index of the null value in the table, but does not insert one if it doesn't already exist. Will return -1 if it doesn't exist indicated by a false value for the boolean. GetOrInsert returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table (if false, the value was inserted). An error is returned if val is not the appropriate type for the table. GetOrInsertBytes returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table (if false, the value was inserted). An error is returned if val is not the appropriate type for the table. This function is intended to be used by the BinaryMemoTable to prevent unnecessary allocations of the data when converting from a []byte to interface{}. GetOrInsertNull returns the index of the null value in the table, inserting one if it hasn't already been inserted. It returns a boolean indicating if the null value already existed or not in the table. Reset drops everything in the table allowing it to be reused Size returns the current number of unique values stored in the table, including whether or not a null value has been inserted via GetOrInsertNull. ( MemoTable) TypeTraits() TypeTraits WriteOut copies the unique values of the memotable out to the byte slice provided. Must have allocated enough bytes for all the values. WriteOutSubset is like WriteOut, but only writes a subset of values starting with the index offset. *BinaryMemoTable NumericMemoTable (interface) *Table[...] TypedMemoTable[...] (interface) MemoTable : github.com/gogo/protobuf/proto.Sizer func github.com/apache/arrow-go/v18/arrow/array.GetDictArrayData(mem memory.Allocator, valueType arrow.DataType, memoTable MemoTable, startOffset int) (*array.Data, error)
type MemoTypes (interface)
CopyValues populates out with the values currently in the table, out must be a slice of the appropriate type for the table type. CopyValuesSubset is like CopyValues but only copies a subset of values starting at the indicated index. Get returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table. Will panic if val is not the appropriate type for the underlying table. GetNull returns the index of the null value in the table, but does not insert one if it doesn't already exist. Will return -1 if it doesn't exist indicated by a false value for the boolean. GetOrInsert returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table (if false, the value was inserted). An error is returned if val is not the appropriate type for the table. GetOrInsertBytes returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table (if false, the value was inserted). An error is returned if val is not the appropriate type for the table. This function is intended to be used by the BinaryMemoTable to prevent unnecessary allocations of the data when converting from a []byte to interface{}. GetOrInsertNull returns the index of the null value in the table, inserting one if it hasn't already been inserted. It returns a boolean indicating if the null value already existed or not in the table. Reset drops everything in the table allowing it to be reused Size returns the current number of unique values stored in the table, including whether or not a null value has been inserted via GetOrInsertNull. ( NumericMemoTable) TypeTraits() TypeTraits WriteOut copies the unique values of the memotable out to the byte slice provided. Must have allocated enough bytes for all the values. ( NumericMemoTable) WriteOutLE(out []byte) WriteOutSubset is like WriteOut, but only writes a subset of values starting with the index offset. ( NumericMemoTable) WriteOutSubsetLE(offset int, out []byte) *Table[...] NumericMemoTable : MemoTable[T] NumericMemoTable : github.com/gogo/protobuf/proto.Sizer
Type Parameters: T: fixedLenMemoTypes (*Table[T]) CopyValues(out any) (*Table[T]) CopyValuesSubset(start int, out any) (*Table[T]) Exists(val T) bool (*Table[T]) Get(val any) (int, bool) (*Table[T]) GetNull() (int, bool) (*Table[T]) GetOrInsert(val any) (idx int, found bool, err error) (*Table[T]) GetOrInsertBytes(val []byte) (idx int, found bool, err error) (*Table[T]) GetOrInsertNull() (idx int, found bool) (*Table[T]) InsertOrGet(val T) (idx int, found bool, err error) (*Table[T]) Reset() (*Table[T]) Size() int (*Table[T]) TypeTraits() TypeTraits (*Table[T]) WriteOut(out []byte) (*Table[T]) WriteOutLE(out []byte) (*Table[T]) WriteOutSubset(start int, out []byte) (*Table[T]) WriteOutSubsetLE(start int, out []byte) *Table : MemoTable[T] *Table : NumericMemoTable *Table : github.com/gogo/protobuf/proto.Sizer func NewMemoTable[T](num int64) *Table[T]
Type Parameters: T: MemoTypes CopyValues populates out with the values currently in the table, out must be a slice of the appropriate type for the table type. CopyValuesSubset is like CopyValues but only copies a subset of values starting at the indicated index. ( TypedMemoTable[T]) Exists(T) bool Get returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table. Will panic if val is not the appropriate type for the underlying table. GetNull returns the index of the null value in the table, but does not insert one if it doesn't already exist. Will return -1 if it doesn't exist indicated by a false value for the boolean. GetOrInsert returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table (if false, the value was inserted). An error is returned if val is not the appropriate type for the table. GetOrInsertBytes returns the index of the table the specified value is, and a boolean indicating whether or not the value was found in the table (if false, the value was inserted). An error is returned if val is not the appropriate type for the table. This function is intended to be used by the BinaryMemoTable to prevent unnecessary allocations of the data when converting from a []byte to interface{}. GetOrInsertNull returns the index of the null value in the table, inserting one if it hasn't already been inserted. It returns a boolean indicating if the null value already existed or not in the table. ( TypedMemoTable[T]) InsertOrGet(val T) (idx int, found bool, err error) Reset drops everything in the table allowing it to be reused Size returns the current number of unique values stored in the table, including whether or not a null value has been inserted via GetOrInsertNull. ( TypedMemoTable[T]) TypeTraits() TypeTraits WriteOut copies the unique values of the memotable out to the byte slice provided. Must have allocated enough bytes for all the values. WriteOutSubset is like WriteOut, but only writes a subset of values starting with the index offset. TypedMemoTable : MemoTable[T] TypedMemoTable : github.com/gogo/protobuf/proto.Sizer
BytesRequired returns the number of bytes required to be allocated in order to hold the passed in number of elements of this type. github.com/apache/arrow-go/v18/arrow.OffsetTraits (interface) github.com/apache/arrow-go/v18/arrow/decimal.Traits[...] (interface) TypeTraits : github.com/apache/arrow-go/v18/arrow.OffsetTraits func BinaryMemoTable.TypeTraits() TypeTraits func MemoTable.TypeTraits() TypeTraits func NumericMemoTable.TypeTraits() TypeTraits func (*Table)[T].TypeTraits() TypeTraits func TypedMemoTable.TypeTraits() TypeTraits
type Uint16MemoTable = (struct)
type Uint32MemoTable = (struct)
type Uint64MemoTable = (struct)
type Uint8MemoTable = (struct)
Package-Level Functions (total 4)
for smaller amounts of bytes this is faster than even calling into xxh3 to do the Hash, so we specialize in order to get the benefits of that performance.
NewBinaryMemoTable returns a hash table for Binary data, the passed in allocator will be utilized for the BinaryBuilder, if nil then memory.DefaultAllocator will be used. initial and valuesize can be used to pre-allocate the table to reduce allocations. With initial being the initial number of entries to allocate for and valuesize being the starting amount of space allocated for writing the actual binary data.
Type Parameters: T: fixedLenMemoTypes
Type Parameters: T: fixedLenMemoTypes
Package-Level Constants (only one)
KeyNotFound is the constant returned by memo table functions when a key isn't found in the table