package parquet
Import Path
github.com/parquet-go/parquet-go (on go.dev)
Dependency Relation
imports 56 packages, and is imported by 13 packages
Involved Source Files
allocator.go
array.go
bitmap.go
bloom.go
bloom_le.go
buffer.go
buffer_pool.go
column.go
column_buffer.go
column_buffer_amd64.go
column_chunk.go
column_index_le.go
column_mapping.go
column_path.go
compare.go
compress.go
config.go
convert.go
dedupe.go
dictionary.go
dictionary_amd64.go
encoding.go
errors.go
file.go
filter.go
level.go
limits.go
merge.go
multi_row_group.go
node.go
null.go
null_amd64.go
offset_index.go
order.go
order_amd64.go
page.go
page_bounds.go
page_bounds_amd64.go
page_header.go
page_max.go
page_max_amd64.go
page_min.go
page_min_amd64.go
page_values.go
parquet.go
	Package parquet is a library for working with parquet files. For an overview
	of Parquet's qualities as a storage format, see this blog post:
	https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet
	Or see the Parquet documentation: https://parquet.apache.org/docs/
parquet_amd64.go
print.go
reader.go
row.go
row_buffer.go
row_builder.go
row_group.go
scan.go
schema.go
search.go
sorting.go
transform.go
type.go
value.go
value_amd64.go
value_le.go
writer.go
column_buffer_amd64.s
dictionary_amd64.s
null_amd64.s
order_amd64.s
page_bounds_amd64.s
page_max_amd64.s
page_min_amd64.s
value_amd64.s
Code Examples
{
	// parquet-go uses the same struct-tag definition style as JSON and XML
	type Contact struct {
		Name string `parquet:"name"`
		// "zstd" specifies the compression for this column
		PhoneNumber string `parquet:"phoneNumber,optional,zstd"`
	}
	type AddressBook struct {
		Owner             string    `parquet:"owner,zstd"`
		OwnerPhoneNumbers []string  `parquet:"ownerPhoneNumbers,gzip"`
		Contacts          []Contact `parquet:"contacts"`
	}

	f, err := os.CreateTemp("", "parquet-example-")
	if err != nil {
		log.Fatal(err)
	}
	writer := parquet.NewWriter(f)
	rows := []AddressBook{
		{Owner: "UserA", Contacts: []Contact{
			{Name: "Alice", PhoneNumber: "+15505551234"},
			{Name: "Bob"},
		}},
	}
	for _, row := range rows {
		if err := writer.Write(row); err != nil {
			log.Fatal(err)
		}
	}
	if err := writer.Close(); err != nil {
		log.Fatal(err)
	}
	if err := f.Close(); err != nil {
		log.Fatal(err)
	}

	rf, err := os.Open(f.Name())
	if err != nil {
		log.Fatal(err)
	}
	reader := parquet.NewReader(rf)
	addrs := make([]AddressBook, 0)
	for {
		var addr AddressBook
		if err := reader.Read(&addr); err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		addrs = append(addrs, addr)
	}
	fmt.Println(addrs[0].Owner)
}
Package-Level Type Names (total 90)
BloomFilter is an interface allowing applications to test whether a key
exists in a bloom filter.
Tests whether the given value is present in the filter.
A non-nil error may be returned if reading the filter failed. This may
happen if the filter was lazily loaded from a storage medium during the
call to Check for example. Applications that can guarantee that the
filter was in memory at the time Check was called can safely ignore the
error, which would always be nil in this case.
( BloomFilter) ReadAt(p []byte, off int64) (n int, err error)
Returns the size of the bloom filter (in bytes).
BloomFilter : io.ReaderAt
func ColumnBuffer.BloomFilter() BloomFilter
func ColumnChunk.BloomFilter() BloomFilter
func github.com/polarsignals/frostdb/dynparquet.(*NilColumnChunk).BloomFilter() BloomFilter
The BloomFilterColumn interface is a declarative representation of bloom filters
used when configuring filters on a parquet writer.
Returns an encoding which can be used to write columns of values to the
filter.
Returns the hashing algorithm used when inserting values into a bloom
filter.
Returns the path of the column that the filter applies to.
Returns the size of the filter needed to encode values in the filter,
assuming each value will be encoded with the given number of bits.
func SplitBlockFilter(bitsPerValue uint, path ...string) BloomFilterColumn
func BloomFilters(filters ...BloomFilterColumn) WriterOption
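For example, a minimal sketch of attaching a split-block bloom filter to a writer using the functions above; output is an assumed io.Writer, and the column name "phoneNumber" and the 10 bits-per-value setting are illustrative choices, not library defaults:

	writer := parquet.NewWriter(output, parquet.BloomFilters(
		// build a bloom filter for the leaf column named "phoneNumber",
		// using 10 bits per value as the space/accuracy trade-off
		parquet.SplitBlockFilter(10, "phoneNumber"),
	))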
BooleanReader is an interface implemented by ValueReader instances which
expose the content of a column of boolean values.
Read boolean values into the buffer passed as argument.
The method returns io.EOF when all values have been read.
BooleanWriter is an interface implemented by ValueWriter instances which
support writing columns of boolean values.
Write boolean values.
The method returns the number of values written, and any error that
occurred while writing the values.
Buffer represents an in-memory group of parquet rows.
The main purpose of the Buffer type is to provide a way to sort rows before
writing them to a parquet file. Buffer implements sort.Interface as a way
to support reordering the rows that have been written to it.
ColumnBuffer returns the buffer columns.
This method is similar to ColumnChunks, but returns a list of ColumnBuffer
values instead of ColumnChunk values (the latter being read-only); calling
ColumnBuffers or ColumnChunks with the same index returns the same underlying
objects, but with different types, which removes the need for a type
assertion if the program needs to write directly to the column buffers.
The presence of the ColumnChunks method is still required to satisfy the
RowGroup interface.
ColumnChunks returns the buffer columns.
Len returns the number of rows written to the buffer.
Less returns true if row[i] < row[j] in the buffer.
NumRows returns the number of rows written to the buffer.
Reset clears the content of the buffer, allowing it to be reused.
Rows returns a reader exposing the current content of the buffer.
The buffer and the returned reader share memory. Mutating the buffer
concurrently to reading rows may result in non-deterministic behavior.
Schema returns the schema of the buffer.
The schema is either configured by passing a Schema in the option list when
constructing the buffer, or lazily discovered when the first row is written.
Size returns the estimated size of the buffer in memory (in bytes).
SortingColumns returns the list of columns by which the buffer will be
sorted.
The sorting order is configured by passing a SortingColumns option when
constructing the buffer.
Swap exchanges the rows at indexes i and j.
Write writes a row held in a Go value to the buffer.
WriteRowGroup satisfies the RowGroupWriter interface.
WriteRows writes parquet rows to the buffer.
*Buffer : RowGroup
*Buffer : RowGroupWriter
*Buffer : RowWriter
*Buffer : RowWriterWithSchema
*Buffer : github.com/polarsignals/frostdb/query/expr.Particulate
*Buffer : sort.Interface
func NewBuffer(options ...RowGroupOption) *Buffer
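A minimal sketch of the sorting workflow described above, assuming rows is a slice of Go values matching the buffer's schema and output is an io.Writer; the sorting option names used here (SortingRowGroupConfig, SortingColumns, Ascending) follow the current parquet-go API and may differ between versions:

	buf := parquet.NewBuffer(
		// configure the columns that Less/Swap will order by
		parquet.SortingRowGroupConfig(
			parquet.SortingColumns(parquet.Ascending("owner")),
		),
	)
	for _, row := range rows {
		if err := buf.Write(row); err != nil {
			log.Fatal(err)
		}
	}
	sort.Sort(buf) // Buffer implements sort.Interface
	writer := parquet.NewWriter(output)
	if _, err := writer.WriteRowGroup(buf); err != nil {
		log.Fatal(err)
	}
	if err := writer.Close(); err != nil {
		log.Fatal(err)
	}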
BufferPool is an interface abstracting the underlying implementation of
page buffer pools.
The parquet-go package provides two implementations of this interface, one
backed by in-memory buffers (on the Go heap), and the other using temporary
files on disk.
Applications which need finer grain control over the allocation and retention
of page buffers may choose to provide their own implementation and install it
via the parquet.ColumnPageBuffers writer option.
BufferPool implementations must be safe to use concurrently from multiple
goroutines.
GetBuffer is called when a parquet writer needs to acquire a new
page buffer from the pool.
PutBuffer is called when a parquet writer releases a page buffer to
the pool.
The parquet.Writer type guarantees that the buffers it calls this method
with were previously acquired by a call to GetBuffer on the same
pool, and that it will not use them anymore after the call.
func NewBufferPool() BufferPool
func NewChunkBufferPool(chunkSize int) BufferPool
func NewFileBufferPool(tempdir, pattern string) BufferPool
func ColumnPageBuffers(buffers BufferPool) WriterOption
func SortingBuffers(buffers BufferPool) SortingOption
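As a sketch, a writer can be configured to spill page buffers to temporary files instead of the Go heap; output is an assumed io.Writer, and the empty tempdir argument selects the OS default temporary directory:

	writer := parquet.NewWriter(output, parquet.ColumnPageBuffers(
		// page buffers are written to temporary files matching the pattern
		parquet.NewFileBufferPool("", "parquet-buffers.*"),
	))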
ByteArrayReader is an interface implemented by ValueReader instances which
expose the content of a column of variable length byte array values.
Read values into the byte buffer passed as argument, returning the number
of values written to the buffer (not the number of bytes). Values are
written using the PLAIN encoding, each byte array prefixed with its
length encoded as a 4 bytes little endian unsigned integer.
The method returns io.EOF when all values have been read.
If the buffer was not empty, but too small to hold at least one value,
io.ErrShortBuffer is returned.
ByteArrayWriter is an interface implemented by ValueWriter instances which
support writing columns of variable length byte array values.
Write variable length byte array values.
The values passed as input must be laid out using the PLAIN encoding,
with each byte array prefixed with the four bytes little endian unsigned
integer length.
The method returns the number of values written to the underlying column
(not the number of bytes), or any error that occurred while attempting to
write the values.
Column represents a column in a parquet file.
Methods of Column values are safe to call concurrently from multiple
goroutines.
Column instances satisfy the Node interface.
Column returns the child column matching the given name.
Columns returns the list of child columns.
The method returns the same slice across multiple calls; the program must
treat it as a read-only value.
Compression returns the compression codecs used by this column.
DecodeDataPageV1 decodes a data page from the header, compressed data, and
optional dictionary passed as arguments.
DecodeDataPageV2 decodes a data page from the header, compressed data, and
optional dictionary passed as arguments.
DecodeDictionary decodes a data page from the header and compressed data
passed as arguments.
Depth returns the position of the column relative to the root.
Encoding returns the encodings used by this column.
Fields returns the list of fields on the column.
GoType returns the Go type that best represents the parquet column.
ID returns the column field id.
Index returns the position of the column in a row. Only leaf columns have a
column index, the method returns -1 when called on non-leaf columns.
Leaf returns true if c is a leaf column.
MaxDefinitionLevel returns the maximum value of definition levels on this
column.
MaxRepetitionLevel returns the maximum value of repetition levels on this
column.
Name returns the column name.
Optional returns true if the column is optional.
Pages returns a reader exposing all pages in this column, across row groups.
Path of the column in the parquet schema.
Repeated returns true if the column may repeat.
Required returns true if the column is required.
String returns a human-readable string representation of the column.
Type returns the type of the column.
The returned value is unspecified if c is not a leaf column.
Value returns the sub-value in base for the child column at the given
index.
*Column : Field
*Column : Node
*Column : github.com/polarsignals/frostdb/query/logicalplan.Named
*Column : expvar.Var
*Column : fmt.Stringer
func (*Column).Column(name string) *Column
func (*Column).Columns() []*Column
func (*File).Root() *Column
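A short sketch of walking the column tree of an open file, assuming f is a *File returned by OpenFile:

	// recursively print the column tree, indenting children under their parent
	var walk func(col *parquet.Column, depth int)
	walk = func(col *parquet.Column, depth int) {
		fmt.Printf("%s%s (leaf=%v)\n", strings.Repeat("  ", depth), col.Name(), col.Leaf())
		for _, child := range col.Columns() {
			walk(child, depth+1)
		}
	}
	walk(f.Root(), 0)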
ColumnBuffer is an interface representing columns of a row group.
ColumnBuffer implements sort.Interface as a way to support reordering the
rows that have been written to it.
The current implementation has a limitation which prevents applications from
providing custom versions of this interface because it contains unexported
methods. The only way to create ColumnBuffer values is to call the
NewColumnBuffer method of Type instances. This limitation may be lifted in future
releases.
( ColumnBuffer) BloomFilter() BloomFilter
Returns the current capacity of the column (rows).
Returns a copy of the column. The returned copy shares no memory with
the original, mutations of either column will not modify the other.
Returns the index of this column in its parent row group.
Returns the components of the page index for this column chunk,
containing details about the content and location of pages within the
chunk.
Note that the returned value may be the same across calls to these
methods, programs must treat those as read-only.
If the column chunk does not have a column or offset index, the methods return
ErrMissingColumnIndex or ErrMissingOffsetIndex respectively.
Prior to v0.20, these methods did not return an error because the page index
for a file was either fully read when the file was opened, or skipped
completely using the parquet.SkipPageIndex option. Version v0.20 introduced a
change that the page index can be read on-demand at any time, even if a file
was opened with the parquet.SkipPageIndex option. Since reading the page index
can fail, these methods now return an error.
For indexed columns, returns the underlying dictionary holding the column
values. If the column is not indexed, nil is returned.
Returns the number of rows currently written to the column.
Compares rows at index i and j and reports whether i < j.
Returns the number of values in the column chunk.
This quantity may differ from the number of rows in the parent row group
because repeated columns may hold zero or more values per row.
( ColumnBuffer) OffsetIndex() (OffsetIndex, error)
Returns the column as a Page.
Returns a reader exposing the pages of the column.
( ColumnBuffer) ReadValuesAt([]Value, int64) (int, error)
Clears all rows written to the column.
Returns the size of the column buffer in bytes.
Swaps rows at index i and j.
Returns the column type.
Write values from the buffer passed as argument and returns the number
of values written.
ColumnBuffer : ColumnChunk
ColumnBuffer : ValueReaderAt
ColumnBuffer : ValueWriter
ColumnBuffer : sort.Interface
func (*Buffer).ColumnBuffers() []ColumnBuffer
func ColumnBuffer.Clone() ColumnBuffer
func (*GenericBuffer)[T].ColumnBuffers() []ColumnBuffer
func Type.NewColumnBuffer(columnIndex, numValues int) ColumnBuffer
The ColumnChunk interface represents individual columns of a row group.
( ColumnChunk) BloomFilter() BloomFilter
Returns the index of this column in its parent row group.
Returns the components of the page index for this column chunk,
containing details about the content and location of pages within the
chunk.
Note that the returned value may be the same across calls to these
methods, programs must treat those as read-only.
If the column chunk does not have a column or offset index, the methods return
ErrMissingColumnIndex or ErrMissingOffsetIndex respectively.
Prior to v0.20, these methods did not return an error because the page index
for a file was either fully read when the file was opened, or skipped
completely using the parquet.SkipPageIndex option. Version v0.20 introduced a
change that the page index can be read on-demand at any time, even if a file
was opened with the parquet.SkipPageIndex option. Since reading the page index
can fail, these methods now return an error.
Returns the number of values in the column chunk.
This quantity may differ from the number of rows in the parent row group
because repeated columns may hold zero or more values per row.
( ColumnChunk) OffsetIndex() (OffsetIndex, error)
Returns a reader exposing the pages of the column.
Returns the column type.
ColumnBuffer (interface)
*github.com/polarsignals/frostdb/dynparquet.NilColumnChunk
func (*Buffer).ColumnChunks() []ColumnChunk
func (*GenericBuffer)[T].ColumnChunks() []ColumnChunk
func (*RowBuffer)[T].ColumnChunks() []ColumnChunk
func RowGroup.ColumnChunks() []ColumnChunk
func github.com/polarsignals/frostdb/dynparquet.(*Buffer).ColumnChunks() []ColumnChunk
func github.com/polarsignals/frostdb/dynparquet.DynamicRowGroup.ColumnChunks() []ColumnChunk
func github.com/polarsignals/frostdb/dynparquet.(*DynamicRowGroupMergeAdapter).ColumnChunks() []ColumnChunk
func github.com/polarsignals/frostdb/index.ReleaseableRowGroup.ColumnChunks() []ColumnChunk
func github.com/polarsignals/frostdb/query/expr.(*ColumnRef).Column(p expr.Particulate) (ColumnChunk, bool, error)
func github.com/polarsignals/frostdb/query/expr.Particulate.ColumnChunks() []ColumnChunk
func PrintColumnChunk(w io.Writer, columnChunk ColumnChunk) error
func github.com/polarsignals/frostdb/query/expr.BinaryScalarOperation(left ColumnChunk, right Value, operator logicalplan.Op) (bool, error)
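A sketch of scanning every page of every column chunk in a file, assuming f is a *File returned by OpenFile:

	for _, rowGroup := range f.RowGroups() {
		for _, chunk := range rowGroup.ColumnChunks() {
			pages := chunk.Pages()
			for {
				page, err := pages.ReadPage()
				if err == io.EOF {
					break
				}
				if err != nil {
					log.Fatal(err)
				}
				// print the column index along with per-page row/value/null counts
				fmt.Println(chunk.Column(), page.NumRows(), page.NumValues(), page.NumNulls())
			}
			pages.Close()
		}
	}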
IsAscending returns true if the column index min/max values are sorted
in ascending order (based on the ordering rules of the column's logical
type).
IsDescending returns true if the column index min/max values are sorted
in descending order (based on the ordering rules of the column's logical
type).
( ColumnIndex) MaxValue(int) Value
PageIndex returns the min/max bounds for the page at the given index in the
column.
Returns the number of null values in the page at the given index.
Tells whether the page at the given index contains null values only.
NumPages returns the number of pages in the column index.
func NewColumnIndex(kind Kind, index *format.ColumnIndex) ColumnIndex
func ColumnBuffer.ColumnIndex() (ColumnIndex, error)
func ColumnChunk.ColumnIndex() (ColumnIndex, error)
func github.com/polarsignals/frostdb/dynparquet.(*NilColumnChunk).ColumnIndex() (ColumnIndex, error)
func Find(index ColumnIndex, value Value, cmp func(Value, Value) int) int
func Search(index ColumnIndex, value Value, typ Type) int
func github.com/polarsignals/frostdb/query/expr.Max(columnIndex ColumnIndex) Value
func github.com/polarsignals/frostdb/query/expr.Min(columnIndex ColumnIndex) Value
func github.com/polarsignals/frostdb/query/expr.NullCount(columnIndex ColumnIndex) int64
The ColumnIndexer interface is implemented by types that support generating
parquet column indexes.
The package does not export any types that implement this interface, programs
must call NewColumnIndexer on a Type instance to construct column indexers.
Generates a format.ColumnIndex value from the current state of the
column indexer.
The returned value may reference internal buffers, in which case the
values remain valid until the next call to IndexPage or Reset on the
column indexer.
Add a page to the column indexer.
Resets the column indexer state.
func Type.NewColumnIndexer(sizeLimit int) ColumnIndexer
Conversion is an interface implemented by types that provide conversion of
parquet rows from one schema to another.
Conversion instances must be safe to use concurrently from multiple goroutines.
Converts the given column index in the target schema to the original
column index in the source schema of the conversion.
Applies the conversion logic on the src row, returning the result
appended to dst.
Returns the target schema of the conversion.
func Convert(to, from Node) (conv Conversion, err error)
func ConvertRowGroup(rowGroup RowGroup, conv Conversion) RowGroup
func ConvertRowReader(rows RowReader, conv Conversion) RowReaderWithSchema
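A sketch of converting rows between two schemas with the functions above; the V1 and V2 types and the rowGroup value (any RowGroup holding V1 rows) are illustrative assumptions:

	type V1 struct{ Name, Email string }
	type V2 struct{ Name string }

	conv, err := parquet.Convert(parquet.SchemaOf(V2{}), parquet.SchemaOf(V1{}))
	if err != nil {
		log.Fatal(err)
	}
	// present the existing row group as if it contained V2 rows
	converted := parquet.ConvertRowGroup(rowGroup, conv)
	_ = converted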
ConvertError is an error type returned by calls to Convert when the conversion
of parquet schemas is impossible or the input row for the conversion is
malformed.
From Node
Path []string
To Node
Error satisfies the error interface.
*ConvertError : error
DataPageHeader is a specialization of the PageHeader interface implemented by
data pages.
Returns the encoding of the definition level section.
Returns the page encoding.
Returns the maximum value in the page based on the ordering rules of the
column's logical type.
As an optimization, the method may return the same slice across multiple
calls. Programs must treat the returned value as immutable to prevent
unpredictable behaviors.
If the page contains only null values, an empty slice is returned.
Returns the minimum value in the page based on the ordering rules of the
column's logical type.
As an optimization, the method may return the same slice across multiple
calls. Programs must treat the returned value as immutable to prevent
unpredictable behaviors.
If the page contains only null values, an empty slice is returned.
Returns the number of null values in the page.
Returns the number of values in the page (including nulls).
Returns the parquet format page type.
Returns the encoding of the repetition level section.
DataPageHeaderV1
DataPageHeaderV2
DataPageHeader : PageHeader
DataPageHeaderV1 is an implementation of the DataPageHeader interface
representing data pages version 1.
( DataPageHeaderV1) DefinitionLevelEncoding() format.Encoding
( DataPageHeaderV1) Encoding() format.Encoding
( DataPageHeaderV1) MaxValue() []byte
( DataPageHeaderV1) MinValue() []byte
( DataPageHeaderV1) NullCount() int64
( DataPageHeaderV1) NumValues() int64
( DataPageHeaderV1) PageType() format.PageType
( DataPageHeaderV1) RepetitionLevelEncoding() format.Encoding
( DataPageHeaderV1) String() string
DataPageHeaderV1 : DataPageHeader
DataPageHeaderV1 : PageHeader
DataPageHeaderV1 : expvar.Var
DataPageHeaderV1 : fmt.Stringer
func (*Column).DecodeDataPageV1(header DataPageHeaderV1, page []byte, dict Dictionary) (Page, error)
DataPageHeaderV2 is an implementation of the DataPageHeader interface
representing data pages version 2.
( DataPageHeaderV2) DefinitionLevelEncoding() format.Encoding
( DataPageHeaderV2) DefinitionLevelsByteLength() int64
( DataPageHeaderV2) Encoding() format.Encoding
( DataPageHeaderV2) IsCompressed() bool
( DataPageHeaderV2) MaxValue() []byte
( DataPageHeaderV2) MinValue() []byte
( DataPageHeaderV2) NullCount() int64
( DataPageHeaderV2) NumNulls() int64
( DataPageHeaderV2) NumRows() int64
( DataPageHeaderV2) NumValues() int64
( DataPageHeaderV2) PageType() format.PageType
( DataPageHeaderV2) RepetitionLevelEncoding() format.Encoding
( DataPageHeaderV2) RepetitionLevelsByteLength() int64
( DataPageHeaderV2) String() string
DataPageHeaderV2 : DataPageHeader
DataPageHeaderV2 : PageHeader
DataPageHeaderV2 : expvar.Var
DataPageHeaderV2 : fmt.Stringer
func (*Column).DecodeDataPageV2(header DataPageHeaderV2, page []byte, dict Dictionary) (Page, error)
The Dictionary interface represents type-specific implementations of parquet
dictionaries.
Programs can instantiate dictionaries by calling the NewDictionary method of a
Type object.
The current implementation has a limitation which prevents applications from
providing custom versions of this interface because it contains unexported
methods. The only way to create Dictionary values is to call the
NewDictionary method of Type instances. This limitation may be lifted in future
releases.
Returns the min and max values found in the given indexes.
Returns the dictionary value at the given index.
Inserts values from the second slice to the dictionary and writes the
indexes at which each value was inserted to the first slice.
The method panics if the length of the indexes slice is smaller than the
length of the values slice.
Returns the number of values indexed in the dictionary.
Given an array of dictionary indexes, looks up the corresponding values into
the array of values passed as second argument.
The method panics if len(indexes) > len(values), or one of the indexes
is negative or greater than the highest index in the dictionary.
Returns a Page representing the content of the dictionary.
The returned page shares the underlying memory of the buffer; it remains
valid to use until the dictionary's Reset method is called.
Resets the dictionary to its initial state, removing all values.
Returns the type that the dictionary was created from.
func (*Column).DecodeDictionary(header DictionaryPageHeader, page []byte) (Dictionary, error)
func ColumnBuffer.Dictionary() Dictionary
func Page.Dictionary() Dictionary
func Type.NewDictionary(columnIndex, numValues int, data encoding.Values) Dictionary
func (*Column).DecodeDataPageV1(header DataPageHeaderV1, page []byte, dict Dictionary) (Page, error)
func (*Column).DecodeDataPageV2(header DataPageHeaderV2, page []byte, dict Dictionary) (Page, error)
DictionaryPageHeader is an implementation of the PageHeader interface
representing dictionary pages.
( DictionaryPageHeader) Encoding() format.Encoding
( DictionaryPageHeader) IsSorted() bool
( DictionaryPageHeader) NumValues() int64
( DictionaryPageHeader) PageType() format.PageType
( DictionaryPageHeader) String() string
DictionaryPageHeader : PageHeader
DictionaryPageHeader : expvar.Var
DictionaryPageHeader : fmt.Stringer
func (*Column).DecodeDictionary(header DictionaryPageHeader, page []byte) (Dictionary, error)
DoubleReader is an interface implemented by ValueReader instances which
expose the content of a column of double-precision floating point values.
Read double-precision floating point values into the buffer passed as
argument.
The method returns io.EOF when all values have been read.
DoubleWriter is an interface implemented by ValueWriter instances which
support writing columns of double-precision floating point values.
Write double-precision floating point values.
The method returns the number of values written, and any error that
occurred while writing the values.
Field instances represent fields of a parquet node, which associate a node to
their name in their parent node.
Returns compression codec used by the node.
The method may return nil to indicate that no specific compression codec
was configured on the node, in which case a default compression might be
used.
Returns the encoding used by the node.
The method may return nil to indicate that no specific encoding was
configured on the node, in which case a default encoding might be used.
Returns a mapping of the node's fields.
As an optimization, the same slices may be returned by multiple calls to
this method, programs must treat the returned values as immutable.
This method returns an empty mapping when called on leaf nodes.
Returns the Go type that best represents the parquet node.
For leaf nodes, this will be one of bool, int32, int64, deprecated.Int96,
float32, float64, string, []byte, or [N]byte.
For groups, the method returns a struct type.
If the method is called on a repeated node, the method returns a slice of
the underlying type.
For optional nodes, the method returns a pointer of the underlying type.
For nodes that were constructed from Go values (e.g. using SchemaOf), the
method returns the original Go type.
The id of this node in its parent node. A zero value is treated as the id not
being set. The ID only needs to be unique within its parent context.
This is the same as the parquet field_id.
Returns true if this is a leaf node.
Returns the name of this field in its parent node.
Returns whether the parquet column is optional.
Returns whether the parquet column is repeated.
Returns whether the parquet column is required.
Returns a human-readable representation of the parquet node.
For leaf nodes, returns the type of values of the parquet column.
Calling this method on non-leaf nodes will panic.
Given a reference to the Go value matching the structure of the parent
node, returns the Go value of the field.
*Column
Field : Node
Field : github.com/polarsignals/frostdb/query/logicalplan.Named
Field : expvar.Var
Field : fmt.Stringer
func (*Column).Fields() []Field
func Field.Fields() []Field
func Group.Fields() []Field
func Node.Fields() []Field
func (*Schema).Fields() []Field
func github.com/polarsignals/frostdb/dynparquet.FieldByName(fields []Field, name string) Field
func github.com/polarsignals/frostdb/dynparquet.Concat(fields []Field, drg ...dynparquet.DynamicRowGroup) dynparquet.DynamicRowGroup
func github.com/polarsignals/frostdb/dynparquet.FieldByName(fields []Field, name string) Field
func github.com/polarsignals/frostdb/dynparquet.FindChildIndex(fields []Field, name string) int
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRow(row Row, schema *Schema, dyncols map[string][]string, fields []Field) *dynparquet.DynamicRow
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRows(rows []Row, schema *Schema, dynamicColumns map[string][]string, fields []Field) *dynparquet.DynamicRows
func github.com/polarsignals/frostdb/pqarrow.SingleMatchingColumn(distinctColumns []logicalplan.Expr, fields []Field) bool
func github.com/polarsignals/frostdb/pqarrow/convert.ParquetFieldToArrowField(pf Field) (arrow.Field, error)
File represents a parquet file. The layout of a Parquet file can be found
here: https://github.com/apache/parquet-format#file-format
ColumnIndexes returns the page index of the parquet file f.
If the file did not contain a column index, the method returns an empty slice
and nil error.
Lookup returns the value associated with the given key in the file key/value
metadata.
The ok boolean will be true if the key was found, false otherwise.
Metadata returns the metadata of f.
NumRows returns the number of rows in the file.
OffsetIndexes returns the page index of the parquet file f.
If the file did not contain an offset index, the method returns an empty
slice and nil error.
ReadAt reads bytes into b from f at the given offset.
The method satisfies the io.ReaderAt interface.
ReadPageIndex reads the page index section of the parquet file f.
If the file did not contain a page index, the method returns two empty slices
and a nil error.
Only leaf columns have indexes; the returned indexes are arranged using the
following layout:
------------------
| col 0: chunk 0 |
------------------
| col 1: chunk 0 |
------------------
| ... |
------------------
| col 0: chunk 1 |
------------------
| col 1: chunk 1 |
------------------
| ... |
------------------
This method is useful in combination with the SkipPageIndex option to delay
reading the page index section until after the file has been opened. Note that
in this case the page index is not cached within the file; programs are
expected to make use of it independently of the parquet package.
Root returns the root column of f.
RowGroups returns the list of row groups in the file.
Schema returns the schema of f.
Size returns the size of f (in bytes).
*File : github.com/sethvargo/go-envconfig.Lookuper
*File : golang.org/x/text/internal/catmsg.Dictionary
*File : golang.org/x/text/message/catalog.Dictionary
*File : io.ReaderAt
func OpenFile(r io.ReaderAt, size int64, options ...FileOption) (*File, error)
func github.com/polarsignals/frostdb/dynparquet.(*SerializedBuffer).ParquetFile() *File
func github.com/polarsignals/frostdb/dynparquet.DefinitionFromParquetFile(file *File) (*schemapb.Schema, error)
func github.com/polarsignals/frostdb/dynparquet.NewSerializedBuffer(f *File) (*dynparquet.SerializedBuffer, error)
func github.com/polarsignals/frostdb/dynparquet.SchemaFromParquetFile(file *File) (*dynparquet.Schema, error)
The FileConfig type carries configuration options for parquet files.
FileConfig implements the FileOption interface so it can be used directly
as argument to the OpenFile function when needed, for example:
f, err := parquet.OpenFile(reader, size, &parquet.FileConfig{
SkipPageIndex: true,
SkipBloomFilters: true,
ReadMode: ReadModeAsync,
})
ReadBufferSize int
ReadMode ReadMode
Schema *Schema
SkipBloomFilters bool
SkipPageIndex bool
Apply applies the given list of options to c.
ConfigureFile applies configuration options from c to config.
Validate returns a non-nil error if the configuration of c is invalid.
*FileConfig : FileOption
func DefaultFileConfig() *FileConfig
func NewFileConfig(options ...FileOption) (*FileConfig, error)
func (*FileConfig).ConfigureFile(config *FileConfig)
func FileOption.ConfigureFile(*FileConfig)
FileOption is an interface implemented by types that carry configuration
options for parquet files.
( FileOption) ConfigureFile(*FileConfig)
*FileConfig
func FileReadMode(mode ReadMode) FileOption
func FileSchema(schema *Schema) FileOption
func ReadBufferSize(size int) FileOption
func SkipBloomFilters(skip bool) FileOption
func SkipPageIndex(skip bool) FileOption
func NewFileConfig(options ...FileOption) (*FileConfig, error)
func OpenFile(r io.ReaderAt, size int64, options ...FileOption) (*File, error)
func (*FileConfig).Apply(options ...FileOption)
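For example, a sketch of opening a file with a few of these options; the file name is illustrative:

	osFile, err := os.Open("example.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer osFile.Close()
	stat, err := osFile.Stat()
	if err != nil {
		log.Fatal(err)
	}
	f, err := parquet.OpenFile(osFile, stat.Size(),
		parquet.SkipPageIndex(true),    // read the page index on demand instead of eagerly
		parquet.SkipBloomFilters(true), // do not load bloom filters when opening
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(f.NumRows())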
FixedLenByteArrayReader is an interface implemented by ValueReader instances
which expose the content of a column of fixed length byte array values.
Read values into the byte buffer passed as argument, returning the number
of values written to the buffer (not the number of bytes).
The method returns io.EOF when all values have been read.
If the buffer was not empty, but too small to hold at least one value,
io.ErrShortBuffer is returned.
FixedLenByteArrayWriter is an interface implemented by ValueWriter instances
which support writing columns of fixed length byte array values.
Writes the fixed length byte array values.
The size of the values is assumed to be the same as the expected size of
items in the column. The method errors if the length of the input values
is not a multiple of the expected item size.
FloatReader is an interface implemented by ValueReader instances which expose
the content of a column of single-precision floating point values.
Read single-precision floating point values into the buffer passed as
argument.
The method returns io.EOF when all values have been read.
FloatWriter is an interface implemented by ValueWriter instances which
support writing columns of single-precision floating point values.
Write single-precision floating point values.
The method returns the number of values written, and any error that
occurred while writing the values.
Type Parameters:
T: any
GenericBuffer is similar to a Buffer but uses a type parameter to define the
Go type representing the schema of rows in the buffer.
See GenericWriter for details about the benefits over the classic Buffer API.
(*GenericBuffer[T]) ColumnBuffers() []ColumnBuffer
(*GenericBuffer[T]) ColumnChunks() []ColumnChunk
(*GenericBuffer[T]) Len() int
(*GenericBuffer[T]) Less(i, j int) bool
(*GenericBuffer[T]) NumRows() int64
(*GenericBuffer[T]) Reset()
(*GenericBuffer[T]) Rows() Rows
(*GenericBuffer[T]) Schema() *Schema
(*GenericBuffer[T]) Size() int64
(*GenericBuffer[T]) SortingColumns() []SortingColumn
(*GenericBuffer[T]) Swap(i, j int)
(*GenericBuffer[T]) Write(rows []T) (int, error)
(*GenericBuffer[T]) WriteRowGroup(rowGroup RowGroup) (int64, error)
(*GenericBuffer[T]) WriteRows(rows []Row) (int, error)
*GenericBuffer : RowGroup
*GenericBuffer : RowGroupWriter
*GenericBuffer : RowWriter
*GenericBuffer : RowWriterWithSchema
*GenericBuffer : github.com/polarsignals/frostdb/query/expr.Particulate
*GenericBuffer : sort.Interface
func NewGenericBuffer[T](options ...RowGroupOption) *GenericBuffer[T]
Type Parameters:
T: any
GenericReader is similar to a Reader but uses a type parameter to define the
Go type representing the schema of rows being read.
See GenericWriter for details about the benefits over the classic Reader API.
(*GenericReader[T]) Close() error
(*GenericReader[T]) NumRows() int64
Read reads the next rows from the reader into the given rows slice up to len(rows).
The returned values are safe to reuse across Read calls and do not share
memory with the reader's underlying page buffers.
The method returns the number of rows read and io.EOF when no more rows
can be read from the reader.
(*GenericReader[T]) ReadRows(rows []Row) (int, error)
(*GenericReader[T]) Reset()
(*GenericReader[T]) Schema() *Schema
(*GenericReader[T]) SeekToRow(rowIndex int64) error
*GenericReader : RowReader
*GenericReader : RowReaderWithSchema
*GenericReader : RowReadSeeker
*GenericReader : RowSeeker
*GenericReader : Rows
*GenericReader : github.com/prometheus/common/expfmt.Closer
*GenericReader : io.Closer
func NewGenericReader[T](input io.ReaderAt, options ...ReaderOption) *GenericReader[T]
func NewGenericRowGroupReader[T](rowGroup RowGroup, options ...ReaderOption) *GenericReader[T]
func github.com/polarsignals/frostdb/dynparquet.(*SerializedBuffer).Reader() *GenericReader[any]
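A sketch of a typical read loop with the generic API, assuming file is an io.ReaderAt (for example an *os.File) and AddressBook is the struct from the package example:

	reader := parquet.NewGenericReader[AddressBook](file)
	defer reader.Close()
	rows := make([]AddressBook, 16)
	for {
		// read up to len(rows) values per call
		n, err := reader.Read(rows)
		for _, row := range rows[:n] {
			fmt.Println(row.Owner)
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
	}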
Type Parameters:
T: any
GenericWriter is similar to a Writer but uses a type parameter to define the
Go type representing the schema of rows being written.
Using this type over Writer has multiple advantages:
- By leveraging type information, the Go compiler can provide greater
guarantees that the code is correct. For example, the parquet.Writer.Write
method accepts an argument of type interface{}, which delays type checking
until runtime. The parquet.GenericWriter[T].Write method ensures at
compile time that the values it receives will be of type T, reducing the
risk of introducing errors.
- Since type information is known at compile time, the implementation of
parquet.GenericWriter[T] can make safe assumptions, removing the need for
runtime validation of how the parameters are passed to its methods.
Optimizations relying on type information are more effective, some of the
writer's state can be precomputed at initialization, which was not possible
with parquet.Writer.
- The parquet.GenericWriter[T].Write method uses a data-oriented design,
accepting a slice of T instead of a single value, creating more
opportunities to amortize the runtime cost of abstractions.
This optimization is not available for parquet.Writer because its Write
method's argument would be of type []interface{}, which would require
conversions back and forth from concrete types to empty interfaces (since
a []T cannot be interpreted as []interface{} in Go), making the API
more difficult to use and wasting compute resources on the type conversions,
defeating the purpose of the optimization in the first place.
Note that this type is only available when compiling with Go 1.18 or later.
(*GenericWriter[T]) Close() error
(*GenericWriter[T]) Flush() error
(*GenericWriter[T]) ReadRowsFrom(rows RowReader) (int64, error)
(*GenericWriter[T]) Reset(output io.Writer)
(*GenericWriter[T]) Schema() *Schema
SetKeyValueMetadata sets a key/value pair in the Parquet file metadata.
Keys are assumed to be unique, if the same key is repeated multiple times the
last value is retained. While the parquet format does not require unique keys,
this design decision was made to optimize for the most common use case where
applications leverage this extension mechanism to associate single values to
keys. This may create incompatibilities with other parquet libraries, or may
cause some key/value pairs to be lost when opening parquet files written with
repeated keys. We can revisit this decision if it ever becomes a blocker.
(*GenericWriter[T]) Write(rows []T) (int, error)
(*GenericWriter[T]) WriteRowGroup(rowGroup RowGroup) (int64, error)
(*GenericWriter[T]) WriteRows(rows []Row) (int, error)
*GenericWriter : RowGroupWriter
*GenericWriter : RowReaderFrom
*GenericWriter : RowWriter
*GenericWriter : RowWriterWithSchema
*GenericWriter : github.com/apache/thrift/lib/go/thrift.Flusher
*GenericWriter : github.com/polarsignals/frostdb.ParquetWriter
*GenericWriter : github.com/prometheus/common/expfmt.Closer
*GenericWriter : io.Closer
func NewGenericWriter[T](output io.Writer, options ...WriterOption) *GenericWriter[T]
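A sketch of writing a batch of values with the generic API, assuming output is an io.Writer and AddressBook is the struct from the package example:

	writer := parquet.NewGenericWriter[AddressBook](output)
	if _, err := writer.Write([]AddressBook{
		{Owner: "UserA"},
		{Owner: "UserB"},
	}); err != nil {
		log.Fatal(err)
	}
	// Close flushes buffered pages and writes the file footer
	if err := writer.Close(); err != nil {
		log.Fatal(err)
	}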
( Group) Compression() compress.Codec
( Group) Encoding() encoding.Encoding
( Group) Fields() []Field
( Group) GoType() reflect.Type
( Group) ID() int
( Group) Leaf() bool
( Group) Optional() bool
( Group) Repeated() bool
( Group) Required() bool
( Group) String() string
( Group) Type() Type
Group : Node
Group : expvar.Var
Group : fmt.Stringer
Int32Reader is an interface implemented by ValueReader instances which expose
the content of a column of int32 values.
Read 32 bits integer values into the buffer passed as argument.
The method returns io.EOF when all values have been read.
Int32Writer is an interface implemented by ValueWriter instances which
support writing columns of 32 bits signed integer values.
Write 32 bits signed integer values.
The method returns the number of values written, and any error that
occurred while writing the values.
Int64Reader is an interface implemented by ValueReader instances which expose
the content of a column of int64 values.
Read 64 bits integer values into the buffer passed as argument.
The method returns io.EOF when all values have been read.
Int64Writer is an interface implemented by ValueWriter instances which
support writing columns of 64 bits signed integer values.
Write 64 bits signed integer values.
The method returns the number of values written, and any error that
occurred while writing the values.
Int96Reader is an interface implemented by ValueReader instances which expose
the content of a column of int96 values.
Read 96 bits integer values into the buffer passed as argument.
The method returns io.EOF when all values have been read.
Int96Writer is an interface implemented by ValueWriter instances which
support writing columns of 96 bits signed integer values.
Write 96 bits signed integer values.
The method returns the number of values written, and any error that
occurred while writing the values.
Kind is an enumeration type representing the physical types supported by the
parquet type system.
String returns a human-readable representation of the physical type.
Value constructs a value from k and v.
The method panics if the data is not a valid representation of the value
kind; for example, if the kind is Int32 but the data is not 4 bytes long.
Kind : expvar.Var
Kind : fmt.Stringer
func Type.Kind() Kind
func Value.Kind() Kind
func NewColumnIndex(kind Kind, index *format.ColumnIndex) ColumnIndex
func ZeroValue(kind Kind) Value
const Boolean
const ByteArray
const Double
const FixedLenByteArray
const Float
const Int32
const Int64
const Int96
LeafColumn is a struct type representing leaf columns of a parquet schema.
ColumnIndex int
MaxDefinitionLevel int
MaxRepetitionLevel int
Node Node
Path []string
func (*Schema).Lookup(path ...string) (LeafColumn, bool)
Node values represent nodes of a parquet schema.
Nodes carry the type of values, as well as properties like whether the values
are optional or repeat. Nodes with one or more children represent parquet
groups and therefore do not have a logical type.
Nodes are immutable values and therefore safe to use concurrently from
multiple goroutines.
Returns compression codec used by the node.
The method may return nil to indicate that no specific compression codec
was configured on the node, in which case a default compression might be
used.
Returns the encoding used by the node.
The method may return nil to indicate that no specific encoding was
configured on the node, in which case a default encoding might be used.
Returns a mapping of the node's fields.
As an optimization, the same slices may be returned by multiple calls to
this method, programs must treat the returned values as immutable.
This method returns an empty mapping when called on leaf nodes.
Returns the Go type that best represents the parquet node.
For leaf nodes, this will be one of bool, int32, int64, deprecated.Int96,
float32, float64, string, []byte, or [N]byte.
For groups, the method returns a struct type.
If the method is called on a repeated node, the method returns a slice of
the underlying type.
For optional nodes, the method returns a pointer of the underlying type.
For nodes that were constructed from Go values (e.g. using SchemaOf), the
method returns the original Go type.
The id of this node in its parent node. A zero value is treated as the id not
being set. The ID only needs to be unique within its parent context.
This is the same as the parquet field_id.
Returns true if this is a leaf node.
Returns whether the parquet column is optional.
Returns whether the parquet column is repeated.
Returns whether the parquet column is required.
Returns a human-readable representation of the parquet node.
For leaf nodes, returns the type of values of the parquet column.
Calling this method on non-leaf nodes will panic.
*Column
Field (interface)
Group
*Schema
Node : expvar.Var
Node : fmt.Stringer
func BSON() Node
func Compressed(node Node, codec compress.Codec) Node
func Date() Node
func Decimal(scale, precision int, typ Type) Node
func Encoded(node Node, encoding encoding.Encoding) Node
func Enum() Node
func FieldID(node Node, id int) Node
func Int(bitWidth int) Node
func JSON() Node
func Leaf(typ Type) Node
func List(of Node) Node
func Map(key, value Node) Node
func Optional(node Node) Node
func Repeated(node Node) Node
func Required(node Node) Node
func String() Node
func Time(unit TimeUnit) Node
func Timestamp(unit TimeUnit) Node
func Uint(bitWidth int) Node
func UUID() Node
func Compressed(node Node, codec compress.Codec) Node
func Convert(to, from Node) (conv Conversion, err error)
func Encoded(node Node, encoding encoding.Encoding) Node
func FieldID(node Node, id int) Node
func List(of Node) Node
func Map(key, value Node) Node
func NewRowBuilder(schema Node) *RowBuilder
func NewSchema(name string, root Node) *Schema
func Optional(node Node) Node
func PrintSchema(w io.Writer, name string, node Node) error
func PrintSchemaIndent(w io.Writer, name string, node Node, pattern, newline string) error
func Repeated(node Node) Node
func Required(node Node) Node
func github.com/polarsignals/frostdb/pqarrow/convert.GetWriter(offset int, n Node) (writer.NewWriterFunc, error)
func github.com/polarsignals/frostdb/pqarrow/convert.ParquetNodeToType(n Node) (arrow.DataType, error)
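A sketch of building a schema programmatically from these node constructors, assuming Group is a map of field names to nodes as in parquet-go:

	schema := parquet.NewSchema("AddressBook", parquet.Group{
		"owner":             parquet.String(),
		"ownerPhoneNumbers": parquet.Repeated(parquet.String()),
		"contacts": parquet.Repeated(parquet.Group{
			"name":        parquet.String(),
			"phoneNumber": parquet.Optional(parquet.String()),
		}),
	})
	// prints the schema in the parquet textual format
	fmt.Println(schema)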
CompressedPageSize returns the size of the page at the given index
(in bytes).
FirstRowIndex returns the index of the first row in the page at the given
index. The returned row index is relative to the row group that the page
belongs to; the first row has index zero.
NumPages returns the number of pages in the offset index.
Offset returns the offset starting from the beginning of the file for the
page at the given index.
func ColumnBuffer.OffsetIndex() (OffsetIndex, error)
func ColumnChunk.OffsetIndex() (OffsetIndex, error)
func github.com/polarsignals/frostdb/dynparquet.(*NilColumnChunk).OffsetIndex() (OffsetIndex, error)
Page values represent sequences of parquet values. From the Parquet
documentation: "Column chunks are a chunk of the data for a particular
column. They live in a particular row group and are guaranteed to be
contiguous in the file. Column chunks are divided up into pages. A page is
conceptually an indivisible unit (in terms of compression and encoding).
There can be multiple page types which are interleaved in a column chunk."
https://github.com/apache/parquet-format#glossary
Returns the page's min and max values.
The third value is a boolean indicating whether the page bounds were
available. Page bounds may not be known if the page contained no values
or only nulls, or if they were read from a parquet file which had neither
page statistics nor a page index.
Returns the column index that this page belongs to.
Returns the in-memory buffer holding the page values.
The intent is for the returned value to be used as input parameter when
calling the Encode method of the associated Type.
The slices referenced by the encoding.Values may be the same across
multiple calls to this method, applications must treat the content as
immutable.
( Page) DefinitionLevels() []byte
If the page contains indexed values, calling this method returns the
dictionary in which the values are looked up. Otherwise, the method
returns nil.
( Page) NumNulls() int64
Returns the number of rows, values, and nulls in the page. The number of
rows may be less than the number of values in the page if the page is
part of a repeated column.
( Page) NumValues() int64
Expose the lists of repetition and definition levels of the page.
The returned slices may be empty when the page has no repetition or
definition levels.
Returns the size of the page in bytes (uncompressed).
Returns a new page which is a slice of the receiver between row indexes
i and j.
Returns the type of values read from this page.
The returned type can be used to encode the page data, in the case of
an indexed page (which has a dictionary), the type is configured to
encode the indexes stored in the page rather than the plain values.
Returns a reader exposing the values contained in the page.
Depending on the underlying implementation, the returned reader may
support reading an array of typed Go values by implementing interfaces
like parquet.Int32Reader. Applications should use type assertions on
the returned reader to determine whether those optimizations are
available.
func (*Column).DecodeDataPageV1(header DataPageHeaderV1, page []byte, dict Dictionary) (Page, error)
func (*Column).DecodeDataPageV2(header DataPageHeaderV2, page []byte, dict Dictionary) (Page, error)
func ColumnBuffer.Page() Page
func Dictionary.Page() Page
func Page.Slice(i, j int64) Page
func PageReader.ReadPage() (Page, error)
func Pages.ReadPage() (Page, error)
func Type.NewPage(columnIndex, numValues int, data encoding.Values) Page
func PrintPage(w io.Writer, page Page) error
func Release(page Page)
func Retain(page Page)
func PageWriter.WritePage(Page) (int64, error)
func github.com/polarsignals/frostdb/pqarrow/writer.PageWriter.WritePage(Page) error
PageHeader is an interface implemented by parquet page headers.
Returns the page encoding.
Returns the number of values in the page (including nulls).
Returns the parquet format page type.
DataPageHeader (interface)
DataPageHeaderV1
DataPageHeaderV2
DictionaryPageHeader
PageReader is an interface implemented by types that support producing a
sequence of pages.
Reads and returns the next page from the sequence. When all pages have
been read, or if the sequence was closed, the method returns io.EOF.
Pages (interface)
func CopyPages(dst PageWriter, src PageReader) (numValues int64, err error)
Pages is an interface implemented by page readers returned by calling the
Pages method of ColumnChunk instances.
( Pages) Close() error
Reads and returns the next page from the sequence. When all pages have
been read, or if the sequence was closed, the method returns io.EOF.
Positions the stream on the given row index.
Some implementations of the interface may only allow seeking forward.
The method returns io.ErrClosedPipe if the stream had already been closed.
Pages : PageReader
Pages : RowSeeker
Pages : github.com/prometheus/common/expfmt.Closer
Pages : io.Closer
func AsyncPages(pages Pages) Pages
func (*Column).Pages() Pages
func ColumnBuffer.Pages() Pages
func ColumnChunk.Pages() Pages
func github.com/polarsignals/frostdb/dynparquet.(*NilColumnChunk).Pages() Pages
func AsyncPages(pages Pages) Pages
PageWriter is an interface implemented by types that support writing pages
to an underlying storage medium.
( PageWriter) WritePage(Page) (int64, error)
func CopyPages(dst PageWriter, src PageReader) (numValues int64, err error)
Deprecated: A Reader reads Go values from parquet files.
This example showcases a typical use of parquet readers:
reader := parquet.NewReader(file)
rows := []RowType{}
for {
row := RowType{}
err := reader.Read(&row)
if err != nil {
if err == io.EOF {
break
}
...
}
rows = append(rows, row)
}
if err := reader.Close(); err != nil {
...
}
For programs building with Go 1.18 or later, the GenericReader[T] type
supersedes this one.
Close closes the reader, preventing more rows from being read.
NumRows returns the number of rows that can be read from r.
Read reads the next row from r. The type of the row must match the schema
of the underlying parquet file or an error will be returned.
The method returns io.EOF when no more rows can be read from r.
ReadRows reads the next rows from r into the given Row buffer.
The returned values are laid out in the order expected by the
parquet.(*Schema).Reconstruct method.
The method returns io.EOF when no more rows can be read from r.
Reset repositions the reader at the beginning of the underlying parquet file.
Schema returns the schema of rows read by r.
SeekToRow positions r at the given row index.
*Reader : RowReader
*Reader : RowReaderWithSchema
*Reader : RowReadSeeker
*Reader : RowSeeker
*Reader : Rows
*Reader : github.com/prometheus/common/expfmt.Closer
*Reader : io.Closer
func NewReader(input io.ReaderAt, options ...ReaderOption) *Reader
func NewRowGroupReader(rowGroup RowGroup, options ...ReaderOption) *Reader
The ReaderConfig type carries configuration options for parquet readers.
ReaderConfig implements the ReaderOption interface so it can be used directly
as argument to the NewReader function when needed, for example:
reader := parquet.NewReader(output, schema, &parquet.ReaderConfig{
// ...
})
Schema *Schema
Apply applies the given list of options to c.
ConfigureReader applies configuration options from c to config.
Validate returns a non-nil error if the configuration of c is invalid.
*ReaderConfig : ReaderOption
func DefaultReaderConfig() *ReaderConfig
func NewReaderConfig(options ...ReaderOption) (*ReaderConfig, error)
func (*ReaderConfig).ConfigureReader(config *ReaderConfig)
func ReaderOption.ConfigureReader(*ReaderConfig)
func (*Schema).ConfigureReader(config *ReaderConfig)
ReaderOption is an interface implemented by types that carry configuration
options for parquet readers.
( ReaderOption) ConfigureReader(*ReaderConfig)
*ReaderConfig
*Schema
func NewGenericReader[T](input io.ReaderAt, options ...ReaderOption) *GenericReader[T]
func NewGenericRowGroupReader[T](rowGroup RowGroup, options ...ReaderOption) *GenericReader[T]
func NewReader(input io.ReaderAt, options ...ReaderOption) *Reader
func NewReaderConfig(options ...ReaderOption) (*ReaderConfig, error)
func NewRowGroupReader(rowGroup RowGroup, options ...ReaderOption) *Reader
func Read[T](r io.ReaderAt, size int64, options ...ReaderOption) (rows []T, err error)
func ReadFile[T](path string, options ...ReaderOption) (rows []T, err error)
func (*ReaderConfig).Apply(options ...ReaderOption)
ReadMode is an enum that is used to configure the way that a File reads pages.
func FileReadMode(mode ReadMode) FileOption
const DefaultReadMode
const ReadModeAsync
const ReadModeSync
Row represents a parquet row as a slice of values.
Each value should embed a column index, repetition level, and definition
level allowing the program to determine how to reconstruct the original
object from the row.
Clone creates a copy of the row which shares no pointers.
This method is useful to capture rows after a call to RowReader.ReadRows when
values need to be retained before the next call to ReadRows or after the lifespan
of the reader.
Equal returns true if row and other contain the same sequence of values.
Range calls f for each column of row.
func AppendRow(row Row, columns ...[]Value) Row
func MakeRow(columns ...[]Value) Row
func Row.Clone() Row
func (*RowBuilder).AppendRow(row Row) Row
func (*RowBuilder).Row() Row
func (*Schema).Deconstruct(row Row, value interface{}) Row
func github.com/polarsignals/frostdb/pqarrow.RecordToRow(final *Schema, record arrow.Record, index int) (Row, error)
func github.com/polarsignals/frostdb/samples.Sample.ToParquetRow(labelNames []string) Row
func AppendRow(row Row, columns ...[]Value) Row
func (*Buffer).WriteRows(rows []Row) (int, error)
func Conversion.Convert(rows []Row) (int, error)
func (*GenericBuffer)[T].WriteRows(rows []Row) (int, error)
func (*GenericReader)[T].ReadRows(rows []Row) (int, error)
func (*GenericWriter)[T].WriteRows(rows []Row) (int, error)
func (*Reader).ReadRows(rows []Row) (int, error)
func Row.Equal(other Row) bool
func (*RowBuffer)[T].WriteRows(rows []Row) (int, error)
func (*RowBuilder).AppendRow(row Row) Row
func RowReader.ReadRows([]Row) (int, error)
func RowReaderFunc.ReadRows(rows []Row) (int, error)
func RowReaderWithSchema.ReadRows([]Row) (int, error)
func RowReadSeeker.ReadRows([]Row) (int, error)
func Rows.ReadRows([]Row) (int, error)
func RowWriter.WriteRows([]Row) (int, error)
func RowWriterFunc.WriteRows(rows []Row) (int, error)
func RowWriterWithSchema.WriteRows([]Row) (int, error)
func (*Schema).Deconstruct(row Row, value interface{}) Row
func (*Schema).Reconstruct(value interface{}, row Row) error
func (*SortingWriter)[T].WriteRows(rows []Row) (int, error)
func (*Writer).WriteRows(rows []Row) (int, error)
func github.com/polarsignals/frostdb.ParquetWriter.WriteRows([]Row) (int, error)
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRow(row Row, schema *Schema, dyncols map[string][]string, fields []Field) *dynparquet.DynamicRow
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRows(rows []Row, schema *Schema, dynamicColumns map[string][]string, fields []Field) *dynparquet.DynamicRows
func github.com/polarsignals/frostdb/dynparquet.ValuesForIndex(row Row, index int) []Value
func github.com/polarsignals/frostdb/dynparquet.(*Buffer).WriteRows(rows []Row) (int, error)
func github.com/polarsignals/frostdb/dynparquet.ParquetWriter.WriteRows(rows []Row) (int, error)
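A sketch of round-tripping a Go value through a Row using the Schema methods listed above, assuming AddressBook is the struct from the package example:

	schema := parquet.SchemaOf(AddressBook{})
	// Deconstruct flattens the Go value into a Row (a slice of parquet.Value)
	row := schema.Deconstruct(nil, AddressBook{Owner: "UserA"})
	// Reconstruct rebuilds a Go value from the Row
	var book AddressBook
	if err := schema.Reconstruct(&book, row); err != nil {
		log.Fatal(err)
	}
	fmt.Println(book.Owner)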
Type Parameters:
T: any
RowBuffer is an implementation of the RowGroup interface which stores parquet
rows in memory.
Unlike GenericBuffer which uses a column layout to store values in memory
buffers, RowBuffer uses a row layout. The use of row layout provides greater
efficiency when sorting the buffer, which is the primary use case for the
RowBuffer type. Applications which intend to sort rows prior to writing them
to a parquet file will often see lower CPU utilization from using a RowBuffer
than a GenericBuffer.
RowBuffer values are not safe to use concurrently from multiple goroutines.
ColumnChunks returns a view of the buffer's columns.
Note that reading columns of a RowBuffer will be less efficient than reading
columns of a GenericBuffer since the latter uses a column layout. This method
is mainly exposed to satisfy the RowGroup interface; applications which need
compute-efficient column scans on in-memory buffers should likely use a
GenericBuffer instead.
The returned column chunks are snapshots at the time the method is called;
they remain valid until the next call to Reset on the buffer.
Len returns the number of rows in the buffer.
The method contributes to satisfying sort.Interface.
Less compares the rows at index i and j according to the sorting columns
configured on the buffer.
The method contributes to satisfying sort.Interface.
NumRows returns the number of rows currently written to the buffer.
Reset clears the content of the buffer without releasing its memory.
Rows returns a Rows instance exposing rows stored in the buffer.
The rows returned are a snapshot at the time the method is called.
The returned rows and values read from it remain valid until the next call
to Reset on the buffer.
Schema returns the schema of rows in the buffer.
SortingColumns returns the list of columns that rows are expected to be
sorted by.
The list of sorting columns is configured when the buffer is created and used
when it is sorted.
Note that unless the buffer is explicitly sorted, there are no guarantees
that the rows it contains will be in the order specified by the sorting
columns.
Swap exchanges the rows at index i and j in the buffer.
The method contributes to satisfying sort.Interface.
Write writes rows to the buffer, returning the number of rows written.
WriteRows writes parquet rows to the buffer, returning the number of rows
written.
*RowBuffer : RowGroup
*RowBuffer : RowWriter
*RowBuffer : RowWriterWithSchema
*RowBuffer : github.com/polarsignals/frostdb/query/expr.Particulate
*RowBuffer : sort.Interface
func NewRowBuffer[T](options ...RowGroupOption) *RowBuffer[T]
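A minimal sketch of the intended sort-then-write workflow; the event type,
events slice, and output writer are illustrative assumptions:
type event struct {
	Timestamp int64  `parquet:"timestamp"`
	Message   string `parquet:"message"`
}

buf := parquet.NewRowBuffer[event](
	parquet.SortingRowGroupConfig(
		parquet.SortingColumns(parquet.Ascending("timestamp")),
	),
)
if _, err := buf.Write(events); err != nil {
	// handle error
}
sort.Sort(buf) // *RowBuffer implements sort.Interface

writer := parquet.NewGenericWriter[event](output)
if _, err := writer.WriteRowGroup(buf); err != nil {
	// handle error
}
if err := writer.Close(); err != nil {
	// handle error
}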
RowBuilder is a type which helps build parquet rows incrementally by adding
values to columns.
Add adds columnValue to the column at columnIndex.
AppendRow appends the current state of b to row and returns it.
Next must be called to indicate the start of a new repeated record for the
column at the given index.
If the column index is part of a repeated group, the builder automatically
starts a new record for all adjacent columns; the application does not need
to call this method for each column of the repeated group.
Next must be called after adding a sequence of records.
Reset clears the internal state of b, making it possible to reuse while
retaining the internal buffers.
Row materializes the current state of b into a parquet row.
func NewRowBuilder(schema Node) *RowBuilder
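A minimal sketch of incremental row construction; the schema and values are
illustrative, and column indexes follow the order of the schema's leaf columns:
builder := parquet.NewRowBuilder(parquet.Group{
	"firstName": parquet.String(),
	"lastName":  parquet.String(),
})

builder.Add(0, parquet.ByteArrayValue([]byte("Luke")))
builder.Add(1, parquet.ByteArrayValue([]byte("Skywalker")))

row := builder.Row()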
RowGroup is an interface representing a parquet row group. From the Parquet
docs, a RowGroup is "a logical horizontal partitioning of the data into rows.
There is no physical structure that is guaranteed for a row group. A row
group consists of a column chunk for each column in the dataset."
https://github.com/apache/parquet-format#glossary
Returns the list of column chunks in this row group. The chunks are
ordered in the order of leaf columns from the row group's schema.
If the underlying implementation is not read-only, the returned
parquet.ColumnChunk may implement other interfaces: for example,
parquet.ColumnBuffer if the chunk is backed by an in-memory buffer,
or typed writer interfaces like parquet.Int32Writer depending on the
underlying type of values that can be written to the chunk.
As an optimization, the row group may return the same slice across
multiple calls to this method. Applications should treat the returned
slice as read-only.
Returns the number of rows in the group.
Returns a reader exposing the rows of the row group.
As an optimization, the returned parquet.Rows object may implement
parquet.RowWriterTo, and test the RowWriter it receives for an
implementation of the parquet.RowGroupWriter interface.
This optimization mechanism is leveraged by the parquet.CopyRows function
to skip the generic row-by-row copy algorithm and delegate the copy logic
to the parquet.Rows object.
Returns the schema of rows in the group.
Returns the list of sorting columns describing how rows are sorted in the
group.
The method will return an empty slice if the rows are not sorted.
*Buffer
*GenericBuffer[...]
*RowBuffer[...]
*github.com/polarsignals/frostdb/dynparquet.Buffer
github.com/polarsignals/frostdb/dynparquet.DynamicRowGroup (interface)
*github.com/polarsignals/frostdb/dynparquet.DynamicRowGroupMergeAdapter
github.com/polarsignals/frostdb/dynparquet.MergedRowGroup
github.com/polarsignals/frostdb/dynparquet.PooledBuffer
github.com/polarsignals/frostdb/index.ReleaseableRowGroup (interface)
RowGroup : github.com/polarsignals/frostdb/query/expr.Particulate
func ConvertRowGroup(rowGroup RowGroup, conv Conversion) RowGroup
func MergeRowGroups(rowGroups []RowGroup, options ...RowGroupOption) (RowGroup, error)
func MultiRowGroup(rowGroups ...RowGroup) RowGroup
func (*File).RowGroups() []RowGroup
func RowGroupReader.ReadRowGroup() (RowGroup, error)
func ConvertRowGroup(rowGroup RowGroup, conv Conversion) RowGroup
func MergeRowGroups(rowGroups []RowGroup, options ...RowGroupOption) (RowGroup, error)
func MultiRowGroup(rowGroups ...RowGroup) RowGroup
func NewGenericRowGroupReader[T](rowGroup RowGroup, options ...ReaderOption) *GenericReader[T]
func NewRowGroupReader(rowGroup RowGroup, options ...ReaderOption) *Reader
func NewRowGroupRowReader(rowGroup RowGroup) Rows
func PrintRowGroup(w io.Writer, rowGroup RowGroup) error
func (*Buffer).WriteRowGroup(rowGroup RowGroup) (int64, error)
func (*GenericBuffer)[T].WriteRowGroup(rowGroup RowGroup) (int64, error)
func (*GenericWriter)[T].WriteRowGroup(rowGroup RowGroup) (int64, error)
func RowGroupWriter.WriteRowGroup(RowGroup) (int64, error)
func (*Writer).WriteRowGroup(rowGroup RowGroup) (int64, error)
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRowGroupMergeAdapter(schema *Schema, sortingColumns []SortingColumn, mergedDynamicColumns map[string][]string, originalRowGroup RowGroup) *dynparquet.DynamicRowGroupMergeAdapter
func github.com/polarsignals/frostdb/dynparquet.(*Buffer).WriteRowGroup(rg RowGroup) (int64, error)
func github.com/polarsignals/frostdb/pqarrow.ParquetRowGroupToArrowSchema(ctx context.Context, rg RowGroup, s *dynparquet.Schema, options logicalplan.IterOptions) (*arrow.Schema, error)
func github.com/polarsignals/frostdb/pqarrow.(*ParquetConverter).Convert(ctx context.Context, rg RowGroup, s *dynparquet.Schema) error
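A minimal sketch of iterating the row groups of a file and reading their rows;
file and size are assumed to be an io.ReaderAt and its length:
f, err := parquet.OpenFile(file, size)
if err != nil {
	// handle error
}
buf := make([]parquet.Row, 128)
for _, rowGroup := range f.RowGroups() {
	rows := rowGroup.Rows()
	for {
		n, err := rows.ReadRows(buf)
		// process buf[:n] ...
		if err != nil { // io.EOF marks the end of the row group
			break
		}
	}
	rows.Close()
}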
The RowGroupConfig type carries configuration options for parquet row groups.
RowGroupConfig implements the RowGroupOption interface so it can be used
directly as argument to the NewBuffer function when needed, for example:
buffer := parquet.NewBuffer(&parquet.RowGroupConfig{
	ColumnBufferCapacity: 10_000,
})
ColumnBufferCapacity int
Schema *Schema
Sorting SortingConfig
(*RowGroupConfig) Apply(options ...RowGroupOption)
(*RowGroupConfig) ConfigureRowGroup(config *RowGroupConfig)
Validate returns a non-nil error if the configuration of c is invalid.
*RowGroupConfig : RowGroupOption
func DefaultRowGroupConfig() *RowGroupConfig
func NewRowGroupConfig(options ...RowGroupOption) (*RowGroupConfig, error)
func (*RowGroupConfig).ConfigureRowGroup(config *RowGroupConfig)
func RowGroupOption.ConfigureRowGroup(*RowGroupConfig)
func (*Schema).ConfigureRowGroup(config *RowGroupConfig)
RowGroupOption is an interface implemented by types that carry configuration
options for parquet row groups.
( RowGroupOption) ConfigureRowGroup(*RowGroupConfig)
*RowGroupConfig
*Schema
func ColumnBufferCapacity(size int) RowGroupOption
func SortingRowGroupConfig(options ...SortingOption) RowGroupOption
func MergeRowGroups(rowGroups []RowGroup, options ...RowGroupOption) (RowGroup, error)
func NewBuffer(options ...RowGroupOption) *Buffer
func NewGenericBuffer[T](options ...RowGroupOption) *GenericBuffer[T]
func NewRowBuffer[T](options ...RowGroupOption) *RowBuffer[T]
func NewRowGroupConfig(options ...RowGroupOption) (*RowGroupConfig, error)
func (*RowGroupConfig).Apply(options ...RowGroupOption)
RowGroupReader is an interface implemented by types that expose sequences of
row groups to the application.
( RowGroupReader) ReadRowGroup() (RowGroup, error)
RowGroupWriter is an interface implemented by types that allow the program
to write row groups.
( RowGroupWriter) WriteRowGroup(RowGroup) (int64, error)
*Buffer
*GenericBuffer[...]
*GenericWriter[...]
*Writer
*github.com/polarsignals/frostdb/dynparquet.Buffer
github.com/polarsignals/frostdb/dynparquet.PooledBuffer
RowReader reads a sequence of parquet rows.
ReadRows reads rows from the reader, returning the number of rows read
into the buffer, and any error that occurred. Note that the rows read
into the buffer are not safe for reuse after a subsequent call to
ReadRows. Callers that want to reuse rows must copy the rows using Clone.
When all rows have been read, the reader returns io.EOF to indicate the
end of the sequence. It is valid for the reader to return both a non-zero
number of rows and a non-nil error (including io.EOF).
The buffer of rows passed as argument will be used to store values of
each row read from the reader. If the rows are not nil, the backing array
of the slices will be used as an optimization to avoid re-allocating new
arrays.
The application is expected to handle the case where ReadRows returns
fewer rows than requested and no error, by looking at the first returned
value from ReadRows, which is the number of rows that were read.
*GenericReader[...]
*Reader
RowReaderFunc
RowReaderWithSchema (interface)
RowReadSeeker (interface)
Rows (interface)
func DedupeRowReader(reader RowReader, compare func(Row, Row) int) RowReader
func FilterRowReader(reader RowReader, predicate func(Row) bool) RowReader
func MergeRowReaders(readers []RowReader, compare func(Row, Row) int) RowReader
func ScanRowReader(reader RowReader, predicate func(Row, int64) bool) RowReader
func TransformRowReader(reader RowReader, transform func(dst, src Row) (Row, error)) RowReader
func ConvertRowReader(rows RowReader, conv Conversion) RowReaderWithSchema
func CopyRows(dst RowWriter, src RowReader) (int64, error)
func DedupeRowReader(reader RowReader, compare func(Row, Row) int) RowReader
func FilterRowReader(reader RowReader, predicate func(Row) bool) RowReader
func MergeRowReaders(readers []RowReader, compare func(Row, Row) int) RowReader
func ScanRowReader(reader RowReader, predicate func(Row, int64) bool) RowReader
func TransformRowReader(reader RowReader, transform func(dst, src Row) (Row, error)) RowReader
func (*GenericWriter)[T].ReadRowsFrom(rows RowReader) (int64, error)
func RowReaderFrom.ReadRowsFrom(RowReader) (int64, error)
func (*Writer).ReadRowsFrom(rows RowReader) (written int64, err error)
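The functions listed above compose row readers. A hedged sketch, assuming
rowsA and rowsB are sorted readers, writer is a RowWriter, and schema is the
shared *Schema, which merges the two readers and de-duplicates the result:
compare := schema.Comparator(parquet.Ascending("timestamp"))
merged := parquet.MergeRowReaders([]parquet.RowReader{rowsA, rowsB}, compare)
deduped := parquet.DedupeRowReader(merged, compare)
if _, err := parquet.CopyRows(writer, deduped); err != nil {
	// handle error
}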
RowReaderFrom reads parquet rows from reader.
( RowReaderFrom) ReadRowsFrom(RowReader) (int64, error)
*GenericWriter[...]
*Writer
RowReaderFunc is a function type implementing the RowReader interface.
( RowReaderFunc) ReadRows(rows []Row) (int, error)
RowReaderFunc : RowReader
RowReaderWithSchema is an extension of the RowReader interface which
advertises the schema of rows returned by ReadRows calls.
ReadRows reads rows from the reader, returning the number of rows read
into the buffer, and any error that occurred. Note that the rows read
into the buffer are not safe for reuse after a subsequent call to
ReadRows. Callers that want to reuse rows must copy the rows using Clone.
When all rows have been read, the reader returns io.EOF to indicate the
end of the sequence. It is valid for the reader to return both a non-zero
number of rows and a non-nil error (including io.EOF).
The buffer of rows passed as argument will be used to store values of
each row read from the reader. If the rows are not nil, the backing array
of the slices will be used as an optimization to avoid re-allocating new
arrays.
The application is expected to handle the case where ReadRows returns
fewer rows than requested and no error, by looking at the first returned
value from ReadRows, which is the number of rows that were read.
( RowReaderWithSchema) Schema() *Schema
*GenericReader[...]
*Reader
Rows (interface)
RowReaderWithSchema : RowReader
func ConvertRowReader(rows RowReader, conv Conversion) RowReaderWithSchema
RowReadSeeker is an interface implemented by row readers which support
seeking to arbitrary row positions.
ReadRows reads rows from the reader, returning the number of rows read
into the buffer, and any error that occurred. Note that the rows read
into the buffer are not safe for reuse after a subsequent call to
ReadRows. Callers that want to reuse rows must copy the rows using Clone.
When all rows have been read, the reader returns io.EOF to indicate the
end of the sequence. It is valid for the reader to return both a non-zero
number of rows and a non-nil error (including io.EOF).
The buffer of rows passed as argument will be used to store values of
each row read from the reader. If the rows are not nil, the backing array
of the slices will be used as an optimization to avoid re-allocating new
arrays.
The application is expected to handle the case where ReadRows returns
fewer rows than requested and no error, by looking at the first returned
value from ReadRows, which is the number of rows that were read.
Positions the stream on the given row index.
Some implementations of the interface may only allow seeking forward.
The method returns io.ErrClosedPipe if the stream had already been closed.
*GenericReader[...]
*Reader
Rows (interface)
RowReadSeeker : RowReader
RowReadSeeker : RowSeeker
Rows is an interface implemented by row readers returned by calling the Rows
method of RowGroup instances.
Applications should call Close when they are done using a Rows instance in
order to release the underlying resources held by the row sequence.
After calling Close, all attempts to read more rows will return io.EOF.
( Rows) Close() error
ReadRows reads rows from the reader, returning the number of rows read
into the buffer, and any error that occurred. Note that the rows read
into the buffer are not safe for reuse after a subsequent call to
ReadRows. Callers that want to reuse rows must copy the rows using Clone.
When all rows have been read, the reader returns io.EOF to indicate the
end of the sequence. It is valid for the reader to return both a non-zero
number of rows and a non-nil error (including io.EOF).
The buffer of rows passed as argument will be used to store values of
each row read from the reader. If the rows are not nil, the backing array
of the slices will be used as an optimization to avoid re-allocating new
arrays.
The application is expected to handle the case where ReadRows returns
fewer rows than requested and no error, by looking at the first returned
value from ReadRows, which is the number of rows that were read.
( Rows) Schema() *Schema
Positions the stream on the given row index.
Some implementations of the interface may only allow seeking forward.
The method returns io.ErrClosedPipe if the stream had already been closed.
*GenericReader[...]
*Reader
Rows : RowReader
Rows : RowReaderWithSchema
Rows : RowReadSeeker
Rows : RowSeeker
Rows : github.com/prometheus/common/expfmt.Closer
Rows : io.Closer
func NewRowGroupRowReader(rowGroup RowGroup) Rows
func (*Buffer).Rows() Rows
func (*GenericBuffer)[T].Rows() Rows
func (*RowBuffer)[T].Rows() Rows
func RowGroup.Rows() Rows
func github.com/polarsignals/frostdb/dynparquet.(*Buffer).Rows() Rows
func github.com/polarsignals/frostdb/dynparquet.DynamicRowGroup.Rows() Rows
func github.com/polarsignals/frostdb/dynparquet.(*DynamicRowGroupMergeAdapter).Rows() Rows
func github.com/polarsignals/frostdb/index.ReleaseableRowGroup.Rows() Rows
RowSeeker is an interface implemented by readers of parquet rows which can be
positioned at a specific row index.
Positions the stream on the given row index.
Some implementations of the interface may only allow seeking forward.
The method returns io.ErrClosedPipe if the stream had already been closed.
*GenericReader[...]
Pages (interface)
*Reader
RowReadSeeker (interface)
Rows (interface)
github.com/polarsignals/frostdb/dynparquet.DynamicRowReader (interface)
RowWriter writes parquet rows to an underlying medium.
Writes rows to the writer, returning the number of rows written and any
error that occurred.
Because columnar operations operate on independent columns of values,
writes of rows may not be atomic operations, and could result in some
rows being partially written. The method returns the number of rows that
were successfully written, but if an error occurs, values of the row(s)
that failed to be written may have been partially committed to their
columns. For that reason, applications should consider a write error as
fatal and assume that they need to discard the state; they cannot retry
the write nor recover the underlying file.
*Buffer
*GenericBuffer[...]
*GenericWriter[...]
*RowBuffer[...]
RowWriterFunc
RowWriterWithSchema (interface)
*SortingWriter[...]
*Writer
github.com/polarsignals/frostdb.ParquetWriter (interface)
*github.com/polarsignals/frostdb/dynparquet.Buffer
github.com/polarsignals/frostdb/dynparquet.ParquetWriter (interface)
github.com/polarsignals/frostdb/dynparquet.PooledBuffer
github.com/polarsignals/frostdb/dynparquet.PooledWriter
func DedupeRowWriter(writer RowWriter, compare func(Row, Row) int) RowWriter
func FilterRowWriter(writer RowWriter, predicate func(Row) bool) RowWriter
func MultiRowWriter(writers ...RowWriter) RowWriter
func TransformRowWriter(writer RowWriter, transform func(dst, src Row) (Row, error)) RowWriter
func CopyRows(dst RowWriter, src RowReader) (int64, error)
func DedupeRowWriter(writer RowWriter, compare func(Row, Row) int) RowWriter
func FilterRowWriter(writer RowWriter, predicate func(Row) bool) RowWriter
func MultiRowWriter(writers ...RowWriter) RowWriter
func TransformRowWriter(writer RowWriter, transform func(dst, src Row) (Row, error)) RowWriter
func RowWriterTo.WriteRowsTo(RowWriter) (int64, error)
RowWriterFunc is a function type implementing the RowWriter interface.
( RowWriterFunc) WriteRows(rows []Row) (int, error)
RowWriterFunc : RowWriter
RowWriterTo writes parquet rows to a writer.
( RowWriterTo) WriteRowsTo(RowWriter) (int64, error)
RowWriterWithSchema is an extension of the RowWriter interface which
advertises the schema of rows expected to be passed to WriteRows calls.
( RowWriterWithSchema) Schema() *Schema
Writes rows to the writer, returning the number of rows written and any
error that occurred.
Because columnar operations operate on independent columns of values,
writes of rows may not be atomic operations, and could result in some
rows being partially written. The method returns the number of rows that
were successfully written, but if an error occurs, values of the row(s)
that failed to be written may have been partially committed to their
columns. For that reason, applications should consider a write error as
fatal and assume that they need to discard the state; they cannot retry
the write nor recover the underlying file.
*Buffer
*GenericBuffer[...]
*GenericWriter[...]
*RowBuffer[...]
*SortingWriter[...]
*Writer
*github.com/polarsignals/frostdb/dynparquet.Buffer
github.com/polarsignals/frostdb/dynparquet.ParquetWriter (interface)
github.com/polarsignals/frostdb/dynparquet.PooledBuffer
github.com/polarsignals/frostdb/dynparquet.PooledWriter
RowWriterWithSchema : RowWriter
Schema represents a parquet schema created from a Go value.
Schema implements the Node interface to represent the root node of a parquet
schema.
Columns returns the list of column paths available in the schema.
The method always returns the same slice value across calls to Columns;
applications should treat it as immutable.
Comparator constructs a comparator function which orders rows according to
the list of sorting columns passed as arguments.
Compression returns the compression codec set on the root node of the parquet
schema.
ConfigureReader satisfies the ReaderOption interface, allowing Schema
instances to be passed to NewReader to pre-declare the schema of rows
read from the reader.
ConfigureRowGroup satisfies the RowGroupOption interface, allowing Schema
instances to be passed to row group constructors to pre-declare the schema of
the output parquet file.
ConfigureWriter satisfies the WriterOption interface, allowing Schema
instances to be passed to NewWriter to pre-declare the schema of the
output parquet file.
Deconstruct deconstructs a Go value and appends it to a row.
The method panics if the structure of the Go value does not match the
parquet schema.
Encoding returns the encoding set on the root node of the parquet schema.
Fields returns the list of fields on the root node of the parquet schema.
GoType returns the Go type that best represents the schema.
ID returns field id of the root node.
Leaf returns true if the root node of the parquet schema is a leaf column.
Lookup returns the leaf column at the given path.
The path is the sequence of column names identifying a leaf column (not
including the root).
If the path was not found in the mapping, or if it did not represent a
leaf column of the parquet schema, the boolean will be false.
Name returns the name of s.
Optional returns false since the root node of a parquet schema is always required.
Reconstruct reconstructs a Go value from a row.
The Go value passed as first argument must be a non-nil pointer for the
row to be decoded into.
The method panics if the structure of the Go value and parquet row do not
match.
Repeated returns false since the root node of a parquet schema is always required.
Required returns true since the root node of a parquet schema is always required.
String returns a parquet schema representation of s.
Type returns the parquet type of s.
*Schema : Node
*Schema : ReaderOption
*Schema : RowGroupOption
*Schema : WriterOption
*Schema : github.com/polarsignals/frostdb/query/logicalplan.Named
*Schema : expvar.Var
*Schema : fmt.Stringer
func NewSchema(name string, root Node) *Schema
func SchemaOf(model interface{}) *Schema
func (*Buffer).Schema() *Schema
func Conversion.Schema() *Schema
func (*File).Schema() *Schema
func (*GenericBuffer)[T].Schema() *Schema
func (*GenericReader)[T].Schema() *Schema
func (*GenericWriter)[T].Schema() *Schema
func (*Reader).Schema() *Schema
func (*RowBuffer)[T].Schema() *Schema
func RowGroup.Schema() *Schema
func RowReaderWithSchema.Schema() *Schema
func Rows.Schema() *Schema
func RowWriterWithSchema.Schema() *Schema
func (*SortingWriter)[T].Schema() *Schema
func (*Writer).Schema() *Schema
func github.com/polarsignals/frostdb/dynparquet.ParquetSchemaFromV2Definition(def *schemav2pb.Schema) *Schema
func github.com/polarsignals/frostdb/dynparquet.(*Buffer).Schema() *Schema
func github.com/polarsignals/frostdb/dynparquet.DynamicRowGroup.Schema() *Schema
func github.com/polarsignals/frostdb/dynparquet.(*DynamicRowGroupMergeAdapter).Schema() *Schema
func github.com/polarsignals/frostdb/dynparquet.ParquetWriter.Schema() *Schema
func github.com/polarsignals/frostdb/dynparquet.(*Schema).ParquetSchema() *Schema
func github.com/polarsignals/frostdb/index.ReleaseableRowGroup.Schema() *Schema
func github.com/polarsignals/frostdb/query/expr.Particulate.Schema() *Schema
func FileSchema(schema *Schema) FileOption
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRow(row Row, schema *Schema, dyncols map[string][]string, fields []Field) *dynparquet.DynamicRow
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRowGroupMergeAdapter(schema *Schema, sortingColumns []SortingColumn, mergedDynamicColumns map[string][]string, originalRowGroup RowGroup) *dynparquet.DynamicRowGroupMergeAdapter
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRows(rows []Row, schema *Schema, dynamicColumns map[string][]string, fields []Field) *dynparquet.DynamicRows
func github.com/polarsignals/frostdb/pqarrow.ParquetSchemaToArrowSchema(ctx context.Context, schema *Schema, s *dynparquet.Schema, options logicalplan.IterOptions) (*arrow.Schema, error)
func github.com/polarsignals/frostdb/pqarrow.RecordToDynamicRow(pqSchema *Schema, record arrow.Record, dyncols map[string][]string, index int) (*dynparquet.DynamicRow, error)
func github.com/polarsignals/frostdb/pqarrow.RecordToRow(final *Schema, record arrow.Record, index int) (Row, error)
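A minimal sketch of going back and forth between Go values and parquet rows
with Deconstruct and Reconstruct; the Point type is illustrative:
type Point struct {
	X float64 `parquet:"x"`
	Y float64 `parquet:"y"`
}

schema := parquet.SchemaOf(Point{})

row := schema.Deconstruct(nil, Point{X: 1, Y: 2}) // Go value -> parquet.Row

var p Point
if err := schema.Reconstruct(&p, row); err != nil { // parquet.Row -> Go value
	// handle error
}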
SortingColumn represents a column by which a row group is sorted.
Returns true if the column will sort values in descending order.
Returns true if the column will put null values at the beginning.
Returns the path of the column in the row group schema, omitting the name
of the root node.
github.com/polarsignals/frostdb/dynparquet.SortingColumn (interface)
func Ascending(path ...string) SortingColumn
func Descending(path ...string) SortingColumn
func NullsFirst(sortingColumn SortingColumn) SortingColumn
func (*Buffer).SortingColumns() []SortingColumn
func (*GenericBuffer)[T].SortingColumns() []SortingColumn
func (*RowBuffer)[T].SortingColumns() []SortingColumn
func RowGroup.SortingColumns() []SortingColumn
func github.com/polarsignals/frostdb/dynparquet.SortingColumnsFromDef(def *schemav2pb.Schema) ([]SortingColumn, error)
func github.com/polarsignals/frostdb/dynparquet.(*Buffer).SortingColumns() []SortingColumn
func github.com/polarsignals/frostdb/dynparquet.DynamicRowGroup.SortingColumns() []SortingColumn
func github.com/polarsignals/frostdb/dynparquet.(*DynamicRowGroupMergeAdapter).SortingColumns() []SortingColumn
func github.com/polarsignals/frostdb/dynparquet.Schema.ParquetSortingColumns(dynamicColumns map[string][]string) []SortingColumn
func github.com/polarsignals/frostdb/index.ReleaseableRowGroup.SortingColumns() []SortingColumn
func NullsFirst(sortingColumn SortingColumn) SortingColumn
func SortingColumns(columns ...SortingColumn) SortingOption
func (*Schema).Comparator(sortingColumns ...SortingColumn) func(Row, Row) int
func github.com/polarsignals/frostdb/dynparquet.NewDynamicRowGroupMergeAdapter(schema *Schema, sortingColumns []SortingColumn, mergedDynamicColumns map[string][]string, originalRowGroup RowGroup) *dynparquet.DynamicRowGroupMergeAdapter
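A hedged sketch of combining sorting columns with a schema comparator to sort
rows in memory; schema and rows are assumptions:
sortBy := []parquet.SortingColumn{
	parquet.Ascending("timestamp"),
	parquet.NullsFirst(parquet.Descending("severity")),
}

compare := schema.Comparator(sortBy...)
sort.Slice(rows, func(i, j int) bool { return compare(rows[i], rows[j]) < 0 })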
The SortingConfig type carries configuration options for parquet row groups.
SortingConfig implements the SortingOption interface so it can be used
directly as argument to the NewSortingWriter function when needed,
for example:
writer := parquet.NewSortingWriter[Row](output, sortRowCount,
	parquet.SortingWriterConfig(
		parquet.DropDuplicatedRows(true),
	),
)
DropDuplicatedRows bool
SortingBuffers BufferPool
SortingColumns []SortingColumn
(*SortingConfig) Apply(options ...SortingOption)
(*SortingConfig) ConfigureSorting(config *SortingConfig)
(*SortingConfig) Validate() error
*SortingConfig : SortingOption
func DefaultSortingConfig() *SortingConfig
func NewSortingConfig(options ...SortingOption) (*SortingConfig, error)
func (*SortingConfig).ConfigureSorting(config *SortingConfig)
func SortingOption.ConfigureSorting(*SortingConfig)
SortingOption is an interface implemented by types that carry configuration
options for parquet sorting writers.
( SortingOption) ConfigureSorting(*SortingConfig)
*SortingConfig
func DropDuplicatedRows(drop bool) SortingOption
func SortingBuffers(buffers BufferPool) SortingOption
func SortingColumns(columns ...SortingColumn) SortingOption
func NewSortingConfig(options ...SortingOption) (*SortingConfig, error)
func SortingRowGroupConfig(options ...SortingOption) RowGroupOption
func SortingWriterConfig(options ...SortingOption) WriterOption
func (*SortingConfig).Apply(options ...SortingOption)
Type Parameters:
T: any
SortingWriter is a type similar to GenericWriter but it ensures that rows
are sorted according to the sorting columns configured on the writer.
The writer accumulates rows in an in-memory buffer which is sorted when it
reaches the target number of rows, then written to a temporary row group.
When the writer is flushed or closed, the temporary row groups are merged
into a row group in the output file, ensuring that rows remain sorted in the
final row group.
Because row groups get encoded and compressed, they hold a lot less memory
than if all rows were retained in memory. Sorting then merging chunks of rows
also tends to be more efficient than sorting all rows in memory, as it
results in better CPU cache utilization; sorting multi-megabyte arrays causes
many cache misses because the data set cannot be held in CPU caches.
(*SortingWriter[T]) Close() error
(*SortingWriter[T]) Flush() error
(*SortingWriter[T]) Reset(output io.Writer)
(*SortingWriter[T]) Schema() *Schema
(*SortingWriter[T]) SetKeyValueMetadata(key, value string)
(*SortingWriter[T]) Write(rows []T) (int, error)
(*SortingWriter[T]) WriteRows(rows []Row) (int, error)
*SortingWriter : RowWriter
*SortingWriter : RowWriterWithSchema
*SortingWriter : github.com/apache/thrift/lib/go/thrift.Flusher
*SortingWriter : github.com/polarsignals/frostdb.ParquetWriter
*SortingWriter : github.com/prometheus/common/expfmt.Closer
*SortingWriter : io.Closer
func NewSortingWriter[T](output io.Writer, sortRowCount int64, options ...WriterOption) *SortingWriter[T]
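A minimal sketch of a sorting writer; the event type, events slice, and output
writer are illustrative assumptions, and 10_000 is the in-memory sort
threshold:
writer := parquet.NewSortingWriter[event](output, 10_000,
	parquet.SortingWriterConfig(
		parquet.SortingColumns(parquet.Ascending("timestamp")),
	),
)
if _, err := writer.Write(events); err != nil {
	// handle error
}
if err := writer.Close(); err != nil {
	// handle error
}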
TimeUnit represents units of time in the parquet type system.
Returns the precision of the time unit as a time.Duration value.
Converts the TimeUnit value to its representation in the parquet thrift
format.
func Time(unit TimeUnit) Node
func Timestamp(unit TimeUnit) Node
var Microsecond
var Millisecond
var Nanosecond
The Type interface represents logical types of the parquet type system.
Types are immutable and therefore safe to access from multiple goroutines.
Assigns a Parquet value to a Go value. Returns an error if assignment is
not possible. The source Value must be an expected logical type for the
receiver. This can be accomplished using ConvertValue.
ColumnOrder returns the type's column order. For group types, this method
returns nil.
The order describes the comparison logic implemented by the Less method.
As an optimization, the method may return the same pointer across
multiple calls. Applications must treat the returned value as immutable,
mutating the value will result in undefined behavior.
Compares two values and returns a negative integer if a < b, positive if
a > b, or zero if a == b.
The values' Kind must match the type, otherwise the result is undefined.
The method panics if it is called on a group type.
Convert a Parquet Value of the given Type into a Parquet Value that is
compatible with the receiver. The returned Value is suitable to be passed
to AssignValue.
Returns the logical type's equivalent converted type. When there is
no equivalent converted type, the method returns nil.
As an optimization, the method may return the same pointer across
multiple calls. Applications must treat the returned value as immutable,
mutating the value will result in undefined behavior.
Assuming the src buffer contains values encoded in the given encoding,
decodes the input and produces the decoded values into the dst output
buffer passed as first argument by dispatching the call to one of the
encoding methods.
Assuming the src buffer contains PLAIN encoded values of the type it is
called on, applies the given encoding and produces the output to the dst
buffer passed as first argument by dispatching the call to one of the
encoding methods.
Returns an estimation of the output size after decoding the values passed
as first argument with the given encoding.
For most types, this is similar to calling EstimateSize with the known
number of encoded values. For variable size types, using this method may
provide a more precise result since it can inspect the input buffer.
Returns an estimation of the number of values of this type that can be
held in the given byte size.
The method returns zero for group types.
Returns an estimation of the number of bytes required to hold the given
number of values of this type in memory.
The method returns zero for group types.
Returns the Kind value representing the underlying physical type.
The method panics if it is called on a group type.
For integer and floating point physical types, the method returns the
size of values in bits.
For fixed-length byte arrays, the method returns the size of elements
in bytes.
For other types, the value is zero.
Returns the logical type as a *format.LogicalType value. When the logical
type is unknown, the method returns nil.
As an optimization, the method may return the same pointer across
multiple calls. Applications must treat the returned value as immutable,
mutating the value will result in undefined behavior.
Creates a row group buffer column for values of this type.
Column buffers are created using the index of the column they are
accumulating values in memory for (relative to the parent schema),
and the size of their memory buffer.
The application may give an estimate of the number of values it expects
to write to the buffer as second argument. This estimate helps set the
initial buffer capacity but is not a hard limit; the underlying memory
buffer will grow as needed to allow more values to be written. Programs
may use the Size method of the column buffer (or the parent row group,
when relevant) to determine how many bytes are being used, and perform a
flush of the buffers to a storage layer.
The method panics if it is called on a group type.
Creates a column indexer for values of this type.
The size limit is a hint to the column indexer that it is allowed to
truncate the page boundaries to the given size. Only BYTE_ARRAY and
FIXED_LEN_BYTE_ARRAY types currently take this value into account.
A value of zero or less means no limits.
The method panics if it is called on a group type.
Creates a dictionary holding values of this type.
The dictionary retains the data buffer, it does not make a copy of it.
If the application needs to share ownership of the memory buffer, it must
ensure that it will not be modified while the page is in use, or it must
make a copy of it prior to creating the dictionary.
The method panics if the data type does not correspond to the parquet
type it is called on.
Creates a page belonging to a column at the given index, backed by the
data buffer.
The page retains the data buffer, it does not make a copy of it. If the
application needs to share ownership of the memory buffer, it must ensure
that it will not be modified while the page is in use, or it must make a
copy of it prior to creating the page.
The method panics if the data type does not correspond to the parquet
type it is called on.
Creates an encoding.Values instance backed by the given buffers.
The offsets buffer is only used by BYTE_ARRAY types, where it represents the
positions of each variable length value in the values buffer.
The following expression creates an empty instance for any type:
values := typ.NewValues(nil, nil)
The method panics if it is called on group types.
Returns the physical type as a *format.Type value. For group types, this
method returns nil.
As an optimization, the method may return the same pointer across
multiple calls. Applications must treat the returned value as immutable,
mutating the value will result in undefined behavior.
Returns a human-readable representation of the parquet type.
Type : expvar.Var
Type : fmt.Stringer
func FixedLenByteArrayType(length int) Type
func (*Column).Type() Type
func ColumnBuffer.Type() Type
func ColumnChunk.Type() Type
func Dictionary.Type() Type
func Field.Type() Type
func Group.Type() Type
func Node.Type() Type
func Page.Type() Type
func (*Schema).Type() Type
func github.com/polarsignals/frostdb/dynparquet.(*NilColumnChunk).Type() Type
func Decimal(scale, precision int, typ Type) Node
func Leaf(typ Type) Node
func Search(index ColumnIndex, value Value, typ Type) int
func Type.ConvertValue(val Value, typ Type) (Value, error)
func github.com/polarsignals/frostdb/dynparquet.NewNilColumnChunk(typ Type, columnIndex, numValues int) *dynparquet.NilColumnChunk
var BooleanType
var ByteArrayType
var DoubleType
var FloatType
var Int32Type
var Int64Type
var Int96Type
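For example, the package-level type values can be used to compare raw values
with the column order they document (a hedged, illustrative sketch):
a := parquet.Int64Value(1)
b := parquet.Int64Value(2)
if parquet.Int64Type.Compare(a, b) < 0 {
	// a sorts before b under the INT64 column order
}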
The Value type is similar to the reflect.Value abstraction of Go values, but
for parquet values. Value instances wrap underlying Go values mapped to one
of the parquet physical types.
Value instances are small, immutable objects, and usually passed by value
between function calls.
The zero-value of Value represents the null parquet value.
AppendBytes appends the binary representation of v to b.
If v is the null value, b is returned unchanged.
Boolean returns v as a bool, assuming the underlying type is BOOLEAN.
Byte returns v as a byte, which may truncate the underlying value.
ByteArray returns v as a []byte, assuming the underlying type is either
BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY.
The application must treat the returned byte slice as a read-only value,
mutating the content will result in undefined behaviors.
Bytes returns the binary representation of v.
If v is the null value, a nil byte slice is returned.
Clone returns a copy of v which does not share any pointers with it.
Column returns the column index within the row that v was created from.
Returns -1 if the value does not carry a column index.
DefinitionLevel returns the definition level of v.
Double returns v as a float64, assuming the underlying type is DOUBLE.
Float returns v as a float32, assuming the underlying type is FLOAT.
Format outputs a human-readable representation of v to w, using r as the
formatting verb to describe how the value should be printed.
The following formatting options are supported:
%c prints the column index
%+c prints the column index, prefixed with "C:"
%d prints the definition level
%+d prints the definition level, prefixed with "D:"
%r prints the repetition level
%+r prints the repetition level, prefixed with "R:"
%q prints the quoted representation of v
%+q prints the quoted representation of v, prefixed with "V:"
%s prints the string representation of v
%+s prints the string representation of v, prefixed with "V:"
%v same as %s
%+v prints a verbose representation of v
%#v prints a Go value representation of v
Format satisfies the fmt.Formatter interface.
GoString returns a Go value string representation of v.
Int32 returns v as an int32, assuming the underlying type is INT32.
Int64 returns v as an int64, assuming the underlying type is INT64.
Int96 returns v as an int96, assuming the underlying type is INT96.
IsNull returns true if v is the null value.
Kind returns the kind of v, which represents its parquet physical type.
Level returns v with the repetition level, definition level, and column index
set to the values passed as arguments.
The method panics if either argument is negative.
RepetitionLevel returns the repetition level of v.
String returns a string representation of v.
Uint32 returns v as a uint32, assuming the underlying type is INT32.
Uint64 returns v as a uint64, assuming the underlying type is INT64.
Value : github.com/apache/arrow-go/v18/internal/hashing.ByteSlice
Value : expvar.Var
Value : fmt.Formatter
Value : fmt.GoStringer
Value : fmt.Stringer
Value : math/rand/v2.Source
func BooleanValue(value bool) Value
func ByteArrayValue(value []byte) Value
func DoubleValue(value float64) Value
func FixedLenByteArrayValue(value []byte) Value
func FloatValue(value float32) Value
func Int32Value(value int32) Value
func Int64Value(value int64) Value
func Int96Value(value deprecated.Int96) Value
func NullValue() Value
func ValueOf(v interface{}) Value
func ZeroValue(kind Kind) Value
func ColumnIndex.MaxValue(int) Value
func ColumnIndex.MinValue(int) Value
func Dictionary.Bounds(indexes []int32) (min, max Value)
func Dictionary.Index(index int32) Value
func Kind.Value(v []byte) Value
func Page.Bounds() (min, max Value, ok bool)
func Type.ConvertValue(val Value, typ Type) (Value, error)
func Value.Clone() Value
func Value.Level(repetitionLevel, definitionLevel, columnIndex int) Value
func github.com/polarsignals/frostdb/dynparquet.ValuesForIndex(row Row, index int) []Value
func github.com/polarsignals/frostdb/pqarrow.ArrowScalarToParquetValue(sc scalar.Scalar) (Value, error)
func github.com/polarsignals/frostdb/query/expr.Max(columnIndex ColumnIndex) Value
func github.com/polarsignals/frostdb/query/expr.Min(columnIndex ColumnIndex) Value
func AppendRow(row Row, columns ...[]Value) Row
func DeepEqual(v1, v2 Value) bool
func Equal(v1, v2 Value) bool
func Find(index ColumnIndex, value Value, cmp func(Value, Value) int) int
func MakeRow(columns ...[]Value) Row
func Search(index ColumnIndex, value Value, typ Type) int
func BloomFilter.Check(value Value) (bool, error)
func ColumnBuffer.ReadValuesAt([]Value, int64) (int, error)
func ColumnBuffer.WriteValues([]Value) (int, error)
func ColumnIndexer.IndexPage(numValues, numNulls int64, min, max Value)
func Dictionary.Insert(indexes []int32, values []Value)
func Dictionary.Lookup(indexes []int32, values []Value)
func (*RowBuilder).Add(columnIndex int, columnValue Value)
func Type.AssignValue(dst reflect.Value, src Value) error
func Type.Compare(a, b Value) int
func Type.ConvertValue(val Value, typ Type) (Value, error)
func ValueReader.ReadValues([]Value) (int, error)
func ValueReaderAt.ReadValuesAt([]Value, int64) (int, error)
func ValueReaderFunc.ReadValues(values []Value) (int, error)
func ValueWriter.WriteValues([]Value) (int, error)
func ValueWriterFunc.WriteValues(values []Value) (int, error)
func github.com/polarsignals/frostdb/pqarrow/builder.(*OptBinaryBuilder).AppendParquetValues(values []Value) error
func github.com/polarsignals/frostdb/pqarrow/builder.(*OptBooleanBuilder).AppendParquetValues(values []Value)
func github.com/polarsignals/frostdb/pqarrow/builder.(*OptFloat64Builder).AppendParquetValues(values []Value)
func github.com/polarsignals/frostdb/pqarrow/builder.(*OptInt32Builder).AppendParquetValues(values []Value)
func github.com/polarsignals/frostdb/pqarrow/builder.(*OptInt64Builder).AppendParquetValues(values []Value)
func github.com/polarsignals/frostdb/pqarrow/writer.PageWriter.Write([]Value)
func github.com/polarsignals/frostdb/pqarrow/writer.ValueWriter.Write([]Value)
func github.com/polarsignals/frostdb/query/expr.BinaryScalarOperation(left ColumnChunk, right Value, operator logicalplan.Op) (bool, error)
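A minimal sketch of constructing and inspecting values; the string and the
column index 2 are arbitrary, illustrative choices:
v := parquet.ValueOf("hello")       // BYTE_ARRAY value wrapping the Go string
fmt.Println(v.String(), v.IsNull()) // prints: hello false

// Attach repetition level 0, definition level 1, and column index 2 before
// placing the value into a Row.
v = v.Level(0, 1, 2)

fmt.Println(parquet.NullValue().IsNull()) // prints: true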
ValueReader is an interface implemented by types that support reading
batches of values.
Read values into the buffer passed as argument and return the number of
values read. When all values have been read, the error will be io.EOF.
ValueReaderFunc
func Page.Values() ValueReader
func CopyValues(dst ValueWriter, src ValueReader) (int64, error)
func ValueReaderFrom.ReadValuesFrom(ValueReader) (int64, error)
ValueReaderAt is an interface implemented by types that support reading
values at offsets specified by the application.
( ValueReaderAt) ReadValuesAt([]Value, int64) (int, error)
ColumnBuffer (interface)
ValueReaderFrom is an interface implemented by value writers to read values
from a reader.
( ValueReaderFrom) ReadValuesFrom(ValueReader) (int64, error)
ValueReaderFunc is a function type implementing the ValueReader interface.
( ValueReaderFunc) ReadValues(values []Value) (int, error)
ValueReaderFunc : ValueReader
ValueWriter is an interface implemented by types that support writing
batches of values.
Write values from the buffer passed as argument and return the number
of values written.
ColumnBuffer (interface)
ValueWriterFunc
func CopyValues(dst ValueWriter, src ValueReader) (int64, error)
func ValueWriterTo.WriteValuesTo(ValueWriter) (int64, error)
ValueWriterFunc is a function type implementing the ValueWriter interface.
( ValueWriterFunc) WriteValues(values []Value) (int, error)
ValueWriterFunc : ValueWriter
ValueWriterTo is an interface implemented by value readers to write values to
a writer.
( ValueWriterTo) WriteValuesTo(ValueWriter) (int64, error)
Deprecated: A Writer uses a parquet schema and sequence of Go values to
produce a parquet file to an io.Writer.
This example showcases a typical use of parquet writers:
writer := parquet.NewWriter(output)
for _, row := range rows {
	if err := writer.Write(row); err != nil {
		...
	}
}
if err := writer.Close(); err != nil {
	...
}
The Writer type optimizes for minimal memory usage: each page is written as
soon as it has been filled, so only a single page per column needs to be held
in memory; as a result, there are no opportunities to sort rows within an
entire row group. Programs that need to produce parquet files with sorted
row groups should use the Buffer type to buffer and sort the rows prior to
writing them to a Writer.
For programs building with Go 1.18 or later, the GenericWriter[T] type
supersedes this one.
Close must be called after all values were produced to the writer in order to
flush all buffers and write the parquet footer.
Flush flushes all buffers into a row group to the underlying io.Writer.
Flush is called automatically on Close; it is only useful to call explicitly
if the application needs to limit the size of row groups or wants to produce
multiple row groups per file.
If the writer attempts to create more than MaxRowGroups row groups the method
returns ErrTooManyRowGroups.
ReadRowsFrom reads rows from the reader passed as arguments and writes them
to w.
This is similar to calling WriteRows repeatedly, but will be more efficient
if optimizations are supported by the reader.
Reset clears the state of the writer without flushing any of the buffers,
and sets the output to the io.Writer passed as argument, allowing the
writer to be reused to produce another parquet file.
Reset may be called at any time, including after a writer was closed.
Schema returns the schema of rows written by w.
The returned value will be nil if no schema has yet been configured on w.
SetKeyValueMetadata sets a key/value pair in the Parquet file metadata.
Keys are assumed to be unique; if the same key is repeated multiple times, the
last value is retained. While the parquet format does not require unique keys,
this design decision was made to optimize for the most common use case where
applications leverage this extension mechanism to associate single values to
keys. This may create incompatibilities with other parquet libraries, or may
cause some key/value pairs to be lost when opening parquet files written with
repeated keys. We can revisit this decision if it ever becomes a blocker.
Write is called to write another row to the parquet file.
The method uses the parquet schema configured on w to traverse the Go value
and decompose it into a set of columns and values. If no schema was passed
to NewWriter, it is deduced from the Go type of the row, which then has to
be a struct or pointer to struct.
WriteRowGroup writes a row group to the parquet file.
Buffered rows will be flushed prior to writing rows from the group, unless
the row group was empty in which case nothing is written to the file.
The content of the row group is flushed to the writer; after the method
returns successfully, the row group will be empty and ready to be reused.
WriteRows is called to write rows to the parquet file.
The Writer must have been given a schema when NewWriter was called; otherwise
the structure of the parquet file cannot be determined from the rows alone.
Each row is expected to contain values for every column of the writer's schema,
in the order produced by the parquet.(*Schema).Deconstruct method.
*Writer : RowGroupWriter
*Writer : RowReaderFrom
*Writer : RowWriter
*Writer : RowWriterWithSchema
*Writer : github.com/apache/thrift/lib/go/thrift.Flusher
*Writer : github.com/polarsignals/frostdb.ParquetWriter
*Writer : github.com/prometheus/common/expfmt.Closer
*Writer : io.Closer
func NewWriter(output io.Writer, options ...WriterOption) *Writer
The WriterConfig type carries configuration options for parquet writers.
WriterConfig implements the WriterOption interface so it can be used directly
as argument to the NewWriter function when needed, for example:
writer := parquet.NewWriter(output, schema, &parquet.WriterConfig{
	CreatedBy: "my test program",
})
BloomFilters []BloomFilterColumn
ColumnIndexSizeLimit int
ColumnPageBuffers BufferPool
Compression compress.Codec
CreatedBy string
DataPageStatistics bool
DataPageVersion int
KeyValueMetadata map[string]string
MaxRowsPerRowGroup int64
PageBufferSize int
Schema *Schema
SkipPageBounds [][]string
Sorting SortingConfig
WriteBufferSize int
Apply applies the given list of options to c.
ConfigureWriter applies configuration options from c to config.
Validate returns a non-nil error if the configuration of c is invalid.
*WriterConfig : WriterOption
func DefaultWriterConfig() *WriterConfig
func NewWriterConfig(options ...WriterOption) (*WriterConfig, error)
func (*Schema).ConfigureWriter(config *WriterConfig)
func (*WriterConfig).ConfigureWriter(config *WriterConfig)
func WriterOption.ConfigureWriter(*WriterConfig)
WriterOption is an interface implemented by types that carry configuration
options for parquet writers.
( WriterOption) ConfigureWriter(*WriterConfig)
*Schema
*WriterConfig
func BloomFilters(filters ...BloomFilterColumn) WriterOption
func ColumnIndexSizeLimit(sizeLimit int) WriterOption
func ColumnPageBuffers(buffers BufferPool) WriterOption
func Compression(codec compress.Codec) WriterOption
func CreatedBy(application, version, build string) WriterOption
func DataPageStatistics(enabled bool) WriterOption
func DataPageVersion(version int) WriterOption
func KeyValueMetadata(key, value string) WriterOption
func MaxRowsPerRowGroup(numRows int64) WriterOption
func PageBufferSize(size int) WriterOption
func SkipPageBounds(path ...string) WriterOption
func SortingWriterConfig(options ...SortingOption) WriterOption
func WriteBufferSize(size int) WriterOption
func NewGenericWriter[T](output io.Writer, options ...WriterOption) *GenericWriter[T]
func NewSortingWriter[T](output io.Writer, sortRowCount int64, options ...WriterOption) *SortingWriter[T]
func NewWriter(output io.Writer, options ...WriterOption) *Writer
func NewWriterConfig(options ...WriterOption) (*WriterConfig, error)
func Write[T](w io.Writer, rows []T, options ...WriterOption) error
func WriteFile[T](path string, rows []T, options ...WriterOption) error
func (*WriterConfig).Apply(options ...WriterOption)
func github.com/polarsignals/frostdb/dynparquet.(*Schema).NewWriter(w io.Writer, dynamicColumns map[string][]string, sorting bool, options ...WriterOption) (dynparquet.ParquetWriter, error)
Package-Level Functions (total 128)
AppendRow appends to row the given list of column values.
AppendRow can be used to construct a Row value from columns, while retaining
the underlying memory buffer to avoid reallocation; for example:
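(Illustrative sketch; row, ts, and msg are assumed to be in scope, and the
column indexes set with Level must match each argument's position.)
row = parquet.AppendRow(row[:0],
	[]parquet.Value{parquet.Int64Value(ts).Level(0, 0, 0)},      // column 0
	[]parquet.Value{parquet.ByteArrayValue(msg).Level(0, 0, 1)}, // column 1
)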
The function panics if the column indexes of values in each column do not
match their position in the argument list.
Ascending constructs a SortingColumn value which dictates to sort the column
at the path given as argument in ascending order.
AsyncPages wraps the given Pages instance to perform page reads
asynchronously in a separate goroutine.
Performing page reads asynchronously is important when the application may
be reading pages from a high latency backend; it allows the last page read
to be processed while the read of the next page is initiated.
BloomFilters creates a configuration option which defines the bloom filters
that parquet writers should generate.
The compute and memory footprint of generating bloom filters for all columns
of a parquet schema can be significant, so by default no filters are created
and applications need to explicitly declare the columns that they want to
create filters for.
BooleanValue constructs a BOOLEAN parquet value from the bool passed as
argument.
BSON constructs a leaf node of BSON logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#bson
ByteArrayValue constructs a BYTE_ARRAY parquet value from the byte slice
passed as argument.
ColumnBufferCapacity creates a configuration option which defines the size of
row group column buffers.
Defaults to 16384.
ColumnIndexSizeLimit creates a configuration option to customize the size
limit of page boundaries recorded in column indexes.
Defaults to 16.
ColumnPageBuffers creates a configuration option to customize the buffer pool
used when constructing row groups. This can be used to provide on-disk buffers
as swap space to ensure that the parquet file creation will not be bottlenecked
on the amount of memory available.
Defaults to using in-memory buffers.
CompareDescending constructs a comparison function which inverses the order
of values.
CompareNullsFirst constructs a comparison function which assumes that null
values are smaller than all other values.
CompareNullsLast constructs a comparison function which assumes that null
values are greater than all other values.
Compressed wraps the node passed as argument to use the given compression
codec.
If the codec is nil, the node's compression is left unchanged.
The function panics if it is called on a non-leaf node.
Compression creates a configuration option which sets the default compression
codec used by a writer for columns where none were defined.
Convert constructs a conversion function from one parquet schema to another.
The function supports converting between schemas where the source or target
have extra columns; if there are more columns in the source, they will be
stripped out of the rows. Extra columns in the target schema will be set to
null or zero values.
The returned function is intended to be used to append the converted source
row to the destination buffer.
ConvertRowGroup constructs a wrapper of the given row group which applies
the given schema conversion to its rows.
ConvertRowReader constructs a wrapper of the given row reader which applies
the given schema conversion to the rows.
CopyPages copies pages from src to dst, returning the number of values that
were copied.
The function returns any error it encounters reading or writing pages, except
for io.EOF from the reader which indicates that there were no more pages to
read.
CopyRows copies rows from src to dst.
The underlying types of src and dst are tested to determine if they expose
information about the schema of rows that are read and expected to be
written. If the schema information are available but do not match, the
function will attempt to automatically convert the rows from the source
schema to the destination.
As an optimization, the src argument may implement RowWriterTo to bypass
the default row copy logic and provide its own. The dst argument may also
implement RowReaderFrom for the same purpose.
The function returns the number of rows written, or any error encountered
other than io.EOF.
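A minimal sketch of copying all rows of a row group into any RowWriter;
rowGroup and writer are assumptions:
rows := rowGroup.Rows()
defer rows.Close()
if _, err := parquet.CopyRows(writer, rows); err != nil {
	// handle error
}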
CopyValues copies values from src to dst, returning the number of values
that were written.
As an optimization, the reader and writer may choose to implement
ValueReaderFrom and ValueWriterTo to provide their own copy logic.
The function returns any error it encounters reading or writing pages, except
for io.EOF from the reader which indicates that there were no more values to
read.
CreatedBy creates a configuration option which sets the name of the
application that created a parquet file.
The option formats the "CreatedBy" file metadata according to the convention
described by the parquet spec:
"<application> version <version> (build <build>)"
By default, the option is set to the parquet-go module name, version, and
build hash.
DataPageStatistics creates a configuration option which defines whether data
page statistics are emitted. This option is useful when generating parquet
files intended to be backward compatible with older readers which may not
have the ability to load page statistics from the column index.
Defaults to false.
DataPageVersion creates a configuration option which configures the version of
data pages used when creating a parquet file.
Defaults to version 2.
Date constructs a leaf node of DATE logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#date
Decimal constructs a leaf node of decimal logical type with the given
scale, precision, and underlying type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
DedupeRowReader constructs a row reader which drops duplicated consecutive
rows, according to the comparator function passed as argument.
If the underlying reader produces a sequence of rows sorted by the same
comparison predicate, the output is guaranteed to produce unique rows only.
DedupeRowWriter constructs a row writer which drops duplicated consecutive
rows, according to the comparator function passed as argument.
If the writer is given a sequence of rows sorted by the same comparison
predicate, the output is guaranteed to contain unique rows only.
DeepEqual returns true if v1 and v2 are equal, including their repetition
levels, definition levels, and column indexes.
See Equal for details about how value equality is determined.
DefaultFileConfig returns a new FileConfig value initialized with the
default file configuration.
DefaultReaderConfig returns a new ReaderConfig value initialized with the
default reader configuration.
DefaultRowGroupConfig returns a new RowGroupConfig value initialized with the
default row group configuration.
DefaultSortingConfig returns a new SortingConfig value initialized with the
default sorting configuration.
DefaultWriterConfig returns a new WriterConfig value initialized with the
default writer configuration.
Descending constructs a SortingColumn value which dictates to sort the column
at the path given as argument in descending order.
DoubleValue constructs a DOUBLE parquet value from the float64 passed as
argument.
DropDuplicatedRows configures whether a sorting writer will keep or remove
duplicated rows.
Two rows are considered duplicates if the values of all their sorting columns
are equal.
Defaults to false.
Encoded wraps the node passed as argument to use the given encoding.
The function panics if it is called on a non-leaf node, or if the
encoding does not support the node type.
Enum constructs a leaf node with a logical type representing enumerations.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#enum
Equal returns true if v1 and v2 are equal.
Values are considered equal if they are of the same physical type and hold
the same Go values. For BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY, the content of
the underlying byte arrays are tested for equality.
Note that the repetition levels, definition levels, and column indexes are
not compared by this function, use DeepEqual instead.
FieldID wraps a node to assign it the given field id.
FileReadMode is a file configuration option which controls the way pages
are read. Currently the only two options are ReadModeAsync and ReadModeSync
which control whether or not pages are loaded asynchronously. It can be
advantageous to use ReadModeAsync if your reader is backed by network
storage.
Defaults to ReadModeSync.
FileSchema is used to pass a known schema in while opening a Parquet file.
This optimization is only useful if your application is currently opening
an extremely large number of parquet files with the same, known schema.
Defaults to nil.
FilterRowReader constructs a RowReader which exposes rows from reader for
which the predicate has returned true.
FilterRowWriter constructs a RowWriter which writes rows to writer for which
the predicate has returned true.
Find uses the ColumnIndex passed as argument to find the page in a column
chunk (determined by the given ColumnIndex) that the given value is expected
to be found in.
The function returns the index of the first page that might contain the
value. If the function determines that the value does not exist in the
index, NumPages is returned.
To search an entire parquet file, iterate over its RowGroups and search each
one individually. A file contains multiple row groups when writer.Flush was
called before closing the file; otherwise a single row group is flushed on
Close.
The comparison function passed as last argument is used to determine the
relative order of values. This should generally be the Compare method of
the column type, but can sometimes be customized to modify how null values
are interpreted, for example:
pageIndex := parquet.Find(columnIndex, value,
parquet.CompareNullsFirst(typ.Compare),
)
FixedLenByteArrayType constructs a type for fixed-length values of the given
size (in bytes).
FixedLenByteArrayValue constructs a FIXED_LEN_BYTE_ARRAY parquet value from
the byte slice passed as argument.
FloatValue constructs a FLOAT parquet value from the float32 passed as
argument.
Int constructs a leaf node of signed integer logical type of the given bit
width.
The bit width must be one of 8, 16, 32, 64, or the function will panic.
Int32Value constructs an INT32 parquet value from the int32 passed as
argument.
Int64Value constructs an INT64 parquet value from the int64 passed as
argument.
Int96Value constructs an INT96 parquet value from the deprecated.Int96 passed
as argument.
JSON constructs a leaf node of JSON logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json
KeyValueMetadata creates a configuration option which adds key/value metadata
to add to the metadata of parquet files.
This option is additive, it may be used multiple times to add more than one
key/value pair.
Keys are assumed to be unique, if the same key is repeated multiple times the
last value is retained. While the parquet format does not require unique keys,
this design decision was made to optimize for the most common use case where
applications leverage this extension mechanism to associate single values to
keys. This may create incompatibilities with other parquet libraries, or may
cause some key/value pairs to be lost when opening parquet files written with
repeated keys. We can revisit this decision if it ever becomes a blocker.
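For example, a sketch where output is an assumed io.Writer and the keys and
values are placeholders:
    writer := parquet.NewWriter(output,
        parquet.KeyValueMetadata("source", "events-service"),
        parquet.KeyValueMetadata("region", "us-east-1"),
    )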
Leaf returns a leaf node of the given type.
List constructs a node of LIST logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists
LookupCompressionCodec returns the compression codec associated with the
given code.
The function never returns nil. If the encoding is not supported,
an "unsupported" codec is returned.
LookupEncoding returns the parquet encoding associated with the given code.
The function never returns nil. If the encoding is not supported,
encoding.NotSupported is returned.
MakeRow constructs a Row from a list of column values.
The function panics if the column indexes of values in each column do not
match their position in the argument list.
Map constructs a node of MAP logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps
MaxRowsPerRowGroup configures the maximum number of rows that a writer will
produce in each row group.
This limit is useful to control the size of row groups in both number of rows and
byte size. While controlling the byte size of a row group is difficult to
achieve with parquet due to column encoding and compression, the number of
rows remains a useful proxy.
Defaults to unlimited.
MergeRowGroups constructs a row group which is a merged view of rowGroups. If
rowGroups are sorted and the passed options include sorting, the merged row
group will also be sorted.
The function validates the input to ensure that the merge operation is
possible, ensuring that the schemas match or can be converted to an
optionally configured target schema passed as argument in the option list.
The sorting columns of each row group are also consulted to determine whether
the output can be represented. If sorting columns are configured on the merge
they must be a prefix of sorting columns of all row groups being merged.
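A sketch of merging pre-sorted row groups and writing the result (rowGroups
and writer are assumed to exist; the "timestamp" column name is a placeholder):
    merged, err := parquet.MergeRowGroups(rowGroups,
        parquet.SortingRowGroupConfig(
            parquet.SortingColumns(parquet.Ascending("timestamp")),
        ),
    )
    if err != nil {
        // handle the merge error
    }
    rows := merged.Rows()
    defer rows.Close()
    if _, err := parquet.CopyRows(writer, rows); err != nil {
        // handle the copy error
    }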
MergeRowReader constructs a RowReader which creates an ordered sequence of
all the readers using the given compare function as the ordering predicate.
MultiRowGroup wraps multiple row groups so that they appear as a single
RowGroup. All the row groups must have the same schema, otherwise an error is
returned.
MultiRowWriter constructs a RowWriter which dispatches writes to all the
writers passed as arguments.
When writing rows, if any of the writers returns an error, the operation is
aborted and the error returned. If one of the writers did not error, but did
not write all the rows, the operation is aborted and io.ErrShortWrite is
returned.
Rows are written sequentially to each writer in the order they are given to
this function.
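For example, a sketch duplicating a stream of rows to two writers (rows,
primary, and mirror are assumed to be existing RowReader and RowWriter values):
    both := parquet.MultiRowWriter(primary, mirror)
    if _, err := parquet.CopyRows(both, rows); err != nil {
        // handle the copy error
    }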
NewBuffer constructs a new buffer, using the given list of buffer options
to configure the buffer returned by the function.
The function panics if the buffer configuration is invalid. Programs that
cannot guarantee the validity of the options passed to NewBuffer should
construct the buffer configuration independently prior to calling this
function:
config, err := parquet.NewRowGroupConfig(options...)
if err != nil {
// handle the configuration error
...
} else {
// this call to create a buffer is guaranteed not to panic
buffer := parquet.NewBuffer(config)
...
}
NewBufferPool creates a new in-memory page buffer pool.
The implementation is backed by sync.Pool and allocates memory buffers on the
Go heap.
NewChunkBufferPool creates a new in-memory page buffer pool.
The implementation is backed by sync.Pool and allocates memory buffers on the
Go heap in fixed-size chunks.
NewColumnIndex constructs a ColumnIndex instance from the given parquet
format column index. The kind argument configures the type of values in the
column index.
NewFileBufferPool creates a new on-disk page buffer pool.
NewFileConfig constructs a new file configuration applying the options passed
as arguments.
The function returns a non-nil error if some of the options carried invalid
configuration values.
Type Parameters:
T: any
NewGenericBuffer is like NewBuffer but returns a GenericBuffer[T] suited to write
rows of Go type T.
The type parameter T should be a map, struct, or any. Any other types will
cause a panic at runtime. Type checking is a lot more effective when the
generic parameter is a struct type, using map and interface types is somewhat
similar to using a Writer. If using an interface type for the type parameter,
then providing a schema at instantiation is required.
If the option list explicitly declares a schema, it must be compatible
with the schema generated from T.
Type Parameters:
T: any
NewGenericReader is like NewReader but returns a GenericReader[T] suited to read
rows of Go type T.
The type parameter T should be a map, struct, or any. Any other types will
cause a panic at runtime. Type checking is a lot more effective when the
generic parameter is a struct type, using map and interface types is somewhat
similar to using a Reader.
If the option list explicitly declares a schema, it must be compatible
with the schema generated from T.
Type Parameters:
T: any
Type Parameters:
T: any
NewGenericWriter is like NewWriter but returns a GenericWriter[T] suited to
write rows of Go type T.
The type parameter T should be a map, struct, or any. Any other types will
cause a panic at runtime. Type checking is a lot more effective when the
generic parameter is a struct type, using map and interface types is somewhat
similar to using a Writer.
If the option list explicitly declares a schema, it must be compatible
with the schema generated from T.
Sorting columns may be set on the writer to configure the generated row
groups metadata. However, rows are always written in the order they were
seen, no reordering is performed, the writer expects the application to
ensure proper correlation between the order of rows and the list of sorting
columns. See SortingWriter[T] for a writer which handles reordering rows
based on the configured sorting columns.
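For example, a sketch where output is an assumed io.Writer and the row type
and field names are illustrative:
    type Contact struct {
        Name  string `parquet:"name"`
        Email string `parquet:"email,optional"`
    }
    writer := parquet.NewGenericWriter[Contact](output)
    if _, err := writer.Write([]Contact{{Name: "Alice", Email: "alice@example.com"}}); err != nil {
        // handle the write error
    }
    if err := writer.Close(); err != nil {
        // handle the close error
    }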
NewReader constructs a parquet reader reading rows from the given
io.ReaderAt.
In order to read parquet rows, the io.ReaderAt must be converted to a
parquet.File. If r is already a parquet.File it is used directly; otherwise,
the io.ReaderAt value is expected to either have a `Size() int64` method or
implement io.Seeker in order to determine its size.
The function panics if the reader configuration is invalid. Programs that
cannot guarantee the validity of the options passed to NewReader should
construct the reader configuration independently prior to calling this
function:
config, err := parquet.NewReaderConfig(options...)
if err != nil {
// handle the configuration error
...
} else {
// this call to create a reader is guaranteed not to panic
reader := parquet.NewReader(input, config)
...
}
NewReaderConfig constructs a new reader configuration applying the options
passed as arguments.
The function returns a non-nil error if some of the options carried invalid
configuration values.
Type Parameters:
T: any
NewRowBuffer constructs a new row buffer.
NewRowBuilder constructs a RowBuilder which builds rows for the parquet
schema passed as argument.
NewRowGroupConfig constructs a new row group configuration applying the
options passed as arguments.
The function returns a non-nil error if some of the options carried invalid
configuration values.
NewRowGroupReader constructs a new Reader which reads rows from the RowGroup
passed as argument.
func NewRowGroupRowReader(rowGroup RowGroup) Rows
NewSchema constructs a new Schema object with the given name and root node.
The function panics if Node contains more leaf columns than supported by the
package (see parquet.MaxColumnIndex).
NewSortingConfig constructs a new sorting configuration applying the
options passed as arguments.
The function returns a non-nil error if some of the options carried invalid
configuration values.
Type Parameters:
T: any
NewSortingWriter constructs a new sorting writer which writes a parquet file
where rows of each row group are ordered according to the sorting columns
configured on the writer.
The sortRowCount argument defines the target number of rows that will be
sorted in memory before being written to temporary row groups. The greater
this value the more memory is needed to buffer rows in memory. Choosing a
value that is too small limits the maximum number of rows that can exist in
the output file since the writer cannot create more than 32K temporary row
groups to hold the sorted row chunks.
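For example, a sketch buffering up to 100_000 rows in memory and producing row
groups sorted by a timestamp column (output is an assumed io.Writer, the row
type and column name are illustrative):
    type Event struct {
        Timestamp int64 `parquet:"timestamp,timestamp"`
    }
    writer := parquet.NewSortingWriter[Event](output, 100_000,
        parquet.SortingWriterConfig(
            parquet.SortingColumns(parquet.Ascending("timestamp")),
        ),
    )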
NewWriter constructs a parquet writer writing a file to the given io.Writer.
The function panics if the writer configuration is invalid. Programs that
cannot guarantee the validity of the options passed to NewWriter should
construct the writer configuration independently prior to calling this
function:
config, err := parquet.NewWriterConfig(options...)
if err != nil {
// handle the configuration error
...
} else {
// this call to create a writer is guaranteed not to panic
writer := parquet.NewWriter(output, config)
...
}
NewWriterConfig constructs a new writer configuration applying the options
passed as arguments.
The function returns a non-nil error if some of the options carried invalid
configuration values.
NullsFirst wraps the SortingColumn passed as argument so that it instructs
the row group to place null values first in the column.
NullValue constructs a null value, which is the zero-value of the Value type.
OpenFile opens a parquet file and reads the content between offset 0 and the given
size in r.
Only the parquet magic bytes and footer are read, column chunks and other
parts of the file are left untouched; this means that successfully opening
a file does not validate that the pages have valid checksums.
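For example, a sketch where the file name is a placeholder and the errors from
os.Open and Stat are elided for brevity:
    osFile, _ := os.Open("data.parquet")
    defer osFile.Close()
    stat, _ := osFile.Stat()
    pqFile, err := parquet.OpenFile(osFile, stat.Size())
    if err != nil {
        // handle the open error
    }
    for _, rowGroup := range pqFile.RowGroups() {
        fmt.Println(rowGroup.NumRows())
    }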
Optional wraps the given node to make it optional.
PageBufferSize configures the size of column page buffers on parquet writers.
Note that the page buffer size refers to the in-memory buffers where pages
are generated, not the size of pages after encoding and compression.
This design choice was made to help control the amount of memory needed to
read and write pages rather than controlling the space used by the encoded
representation on disk.
Defaults to 256KiB.
func PrintColumnChunk(w io.Writer, columnChunk ColumnChunk) error
func PrintRowGroup(w io.Writer, rowGroup RowGroup) error
Type Parameters:
T: any
Read reads and returns rows from the parquet file in the given reader.
The type T defines the type of rows read from r. T must be compatible with
the file's schema or an error will be returned. The row type might represent
a subset of the full schema, in which case only a subset of the columns will
be loaded from r.
This function is provided for convenience to facilitate reading of parquet
files from arbitrary locations in cases where the data set fits in memory.
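For example, a sketch where RowType is assumed to be a struct matching the
file's schema, the file name is a placeholder, and the errors from os.Open and
Stat are elided for brevity:
    f, _ := os.Open("data.parquet")
    defer f.Close()
    stat, _ := f.Stat()
    rows, err := parquet.Read[RowType](f, stat.Size())
    if err != nil {
        // handle the read error
    }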
ReadBufferSize is a file configuration option which controls the default
buffer sizes for reads made to the provided io.Reader. The default of 4096
is appropriate for disk based access but if your reader is backed by network
storage it can be advantageous to increase this value to something more like
4 MiB.
Defaults to 4096.
Type Parameters:
T: any
ReadFile reads rows of the parquet file at the given path.
The type T defines the type of rows read from r. T must be compatible with
the file's schema or an error will be returned. The row type might represent
a subset of the full schema, in which case only a subset of the columns will
be loaded from the file.
This function is provided for convenience to facilitate reading of parquet
files from the file system in cases where the data set fits in memory.
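For example, a sketch where RowType is assumed to be a struct matching the
file's schema and the path is a placeholder:
    rows, err := parquet.ReadFile[RowType]("data.parquet")
    if err != nil {
        // handle the read error
    }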
Release is a helper function to decrement the reference counter of pages
backed by memory which can be granularly managed by the application.
Usage of this is optional and with Retain, is intended to allow finer grained
memory management in the application, at the expense of potentially causing
panics if the page is used after its reference count has reached zero. Most
programs should be able to rely on automated memory management provided by
the Go garbage collector instead.
The function should be called to return a page to the internal buffer pool,
when a goroutine "releases ownership" it acquired either by being the single
owner (e.g. capturing the return value from a ReadPage call) or having gotten
shared ownership by calling Retain.
Calling this function on pages that do not embed a reference counter does
nothing.
Repeated wraps the given node to make it repeated.
Required wraps the given node to make it required.
Retain is a helper function to increment the reference counter of pages
backed by memory which can be granularly managed by the application.
Usage of this function is optional and with Release, is intended to allow
finer-grained memory management in the application. Most programs should be
able to rely on automated memory management provided by the Go garbage
collector instead.
The function should be called when a page lifetime is about to be shared
between multiple goroutines or layers of an application, and the program
wants to express "sharing ownership" of the page.
Calling this function on pages that do not embed a reference counter does
nothing.
ScanRowReader constructs a RowReader which exposes rows from reader until
the predicate returns false for one of the rows, or EOF is reached.
SchemaOf constructs a parquet schema from a Go value.
The function can construct parquet schemas from struct or pointer-to-struct
values only. A panic is raised if a Go value of a different type is passed
to this function.
When creating a parquet Schema from a Go value, the struct fields may contain
a "parquet" tag to describe properties of the parquet node. The "parquet" tag
follows the conventional format of Go struct tags: a comma-separated list of
values describing the options, with the first one defining the name of the
parquet column.
The following options are also supported in the "parquet" struct tag:
optional | make the parquet column optional
snappy | sets the parquet column compression codec to snappy
gzip | sets the parquet column compression codec to gzip
brotli | sets the parquet column compression codec to brotli
lz4 | sets the parquet column compression codec to lz4
zstd | sets the parquet column compression codec to zstd
plain | enables the plain encoding (no-op default)
dict | enables dictionary encoding on the parquet column
delta | enables delta encoding on the parquet column
list | for slice types, use the parquet LIST logical type
enum | for string types, use the parquet ENUM logical type
uuid | for string and [16]byte types, use the parquet UUID logical type
decimal | for int32, int64 and [n]byte types, use the parquet DECIMAL logical type
date | for int32 types use the DATE logical type
timestamp | for int64 types use the TIMESTAMP logical type with, by default, millisecond precision
split | for float32/float64, use the BYTE_STREAM_SPLIT encoding
id(n) | where n is int denoting a column field id. Example id(2) for a column with field id of 2
# The date logical type is an int32 value of the number of days since the unix epoch
The timestamp precision can be changed by defining which precision to use as an argument.
Supported precisions are: nanosecond, millisecond and microsecond. Example:
type Message struct {
TimestampMicros int64 `parquet:"timestamp_micros,timestamp(microsecond)"`
}
The decimal tag must be followed by two integer parameters, the first integer
representing the scale and the second the precision; for example:
type Item struct {
Cost int64 `parquet:"cost,decimal(0:3)"`
}
Invalid combination of struct tags and Go types, or repeating options will
cause the function to panic.
As a special case, if the field tag is "-", the field is omitted from the schema
and the data will not be written into the parquet file(s).
Note that a field with name "-" can still be generated using the tag "-,".
The configuration of Parquet maps is done via two tags:
- The `parquet-key` tag configures the key of a map.
- The `parquet-value` tag configures a map's values, for example to declare their native Parquet types.
When configuring a Parquet map, the `parquet` tag will configure the map itself.
For example, the following will set the int64 key of the map to be a timestamp:
type Actions struct {
Action map[int64]string `parquet:"," parquet-key:",timestamp"`
}
The schema name is the Go type name of the value.
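For example, a sketch where the struct and its tags are illustrative:
    type Event struct {
        Name string `parquet:"name,zstd"`
        Time int64  `parquet:"time,timestamp(millisecond)"`
    }
    schema := parquet.SchemaOf(new(Event))
    fmt.Println(schema) // prints the generated parquet schema, named "Event"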
Search is like Find, but uses the default ordering of the given type. Search
and Find are scoped to a given ColumnChunk and find the pages within a
ColumnChunk which might contain the result. See Find for more details.
SkipBloomFilters is a file configuration option which prevents automatically
reading the bloom filters when opening a parquet file, when set to true.
This is useful as an optimization when programs know that they will not need
to consume the bloom filters.
Defaults to false.
SkipPageBounds lists the path to a column that shouldn't have bounds written to the
footer of the parquet file. This is useful for data blobs, like a raw html file,
where the bounds are not meaningful.
This option is additive, it may be used multiple times to skip multiple columns.
SkipPageIndex is a file configuration option which prevents automatically
reading the page index when opening a parquet file, when set to true. This is
useful as an optimization when programs know that they will not need to
consume the page index.
Defaults to false.
SortingBuffers creates a configuration option which sets the pool of buffers
used to hold intermediary state when sorting parquet rows.
Defaults to using in-memory buffers.
SortingColumns creates a configuration option which defines the sorting order
of columns in a row group.
The order of sorting columns passed as argument defines the ordering
hierarchy; when elements are equal in the first column, the second column is
used to order rows, etc...
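For example, a sketch configuring a buffer that orders rows by last name, then
by descending age (the column names are illustrative):
    buffer := parquet.NewBuffer(
        parquet.SortingRowGroupConfig(
            parquet.SortingColumns(
                parquet.Ascending("last_name"),
                parquet.Descending("age"),
            ),
        ),
    )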
SortingRowGroupConfig is a row group option which applies configuration
specific to sorting row groups.
SortingWriterConfig is a writer option which applies configuration specific
to sorting writers.
SplitBlockFilter constructs a split block bloom filter object for the column
at the given path, with the given bitsPerValue.
If you are unsure what number of bitsPerValue to use, 10 is a reasonable
tradeoff between size and error rate for common datasets.
For more information on the tradeoff between size and error rate, consult
this website: https://hur.st/bloomfilter/?n=4000&p=0.1&m=&k=1
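For example, a sketch attaching a bloom filter to one column of a writer
(output is an assumed io.Writer and the "owner" column name is a placeholder):
    writer := parquet.NewWriter(output,
        parquet.BloomFilters(
            parquet.SplitBlockFilter(10, "owner"),
        ),
    )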
String constructs a leaf node of UTF8 logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#string
Time constructs a leaf node of TIME logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time
Timestamp constructs a leaf node of TIMESTAMP logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
TransformRowReader constructs a RowReader which applies the given transform
to each row read from reader.
The transformation function appends the transformed src row to dst, returning
dst and any error that occurred during the transformation. If dst is returned
unchanged, the row is skipped.
TransformRowWriter constructs a RowWriter which applies the given transform
to each row written to writer.
The transformation function appends the transformed src row to dst, returning
dst and any error that occurred during the transformation. If dst is returned
unchanged, the row is skipped.
Uint constructs a leaf node of unsigned integer logical type of the given
bit width.
The bit width must be one of 8, 16, 32, 64, or the function will panic.
UUID constructs a leaf node of UUID logical type.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#uuid
ValueOf constructs a parquet value from a Go value v.
The physical type of the value is assumed from the Go type of v using the
following conversion table:
Go type | Parquet physical type
------- | ---------------------
nil | NULL
bool | BOOLEAN
int8 | INT32
int16 | INT32
int32 | INT32
int64 | INT64
int | INT64
uint8 | INT32
uint16 | INT32
uint32 | INT32
uint64 | INT64
uintptr | INT64
float32 | FLOAT
float64 | DOUBLE
string | BYTE_ARRAY
[]byte | BYTE_ARRAY
[*]byte | FIXED_LEN_BYTE_ARRAY
When converting a []byte or [*]byte value, the underlying byte array is not
copied; instead, the returned parquet value holds a reference to it.
The repetition and definition levels of the returned value are both zero.
The function panics if the Go value cannot be represented in parquet.
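For example, a minimal sketch:
    v := parquet.ValueOf("hello")
    fmt.Println(v.Kind())   // the physical type, BYTE_ARRAY for a Go string
    fmt.Println(v.String()) // hello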
Type Parameters:
T: any
Write writes the given list of rows to a parquet file written to w.
This function is provided for convenience to facilitate the creation of
parquet files.
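For example, a sketch where output is an assumed io.Writer and the row type is
illustrative:
    type Contact struct {
        Name string `parquet:"name"`
    }
    if err := parquet.Write(output, []Contact{{Name: "Alice"}}); err != nil {
        // handle the write error
    }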
WriteBufferSize configures the size of the write buffer.
Setting the writer buffer size to zero deactivates buffering, all writes are
immediately sent to the output io.Writer.
Defaults to 32KiB.
Type Parameters:
T: any
WriteFile writes the given list of rows to a parquet file at the given path.
This function is provided for convenience to facilitate writing parquet
files to the file system.
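For example, a sketch reusing the illustrative Contact type from the previous
sketch; the path is a placeholder:
    if err := parquet.WriteFile("contacts.parquet", []Contact{{Name: "Alice"}}); err != nil {
        // handle the write error
    }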
ZeroValue constructs a zero value of the given kind.
Package-Level Variables (total 39)
BitPacked is the deprecated bit-packed encoding for repetition and
definition levels.
var BooleanType Type
Brotli is the BROTLI parquet compression codec.
var ByteArrayType Type
ByteStreamSplit is an encoding for floating-point data.
DeltaBinaryPacked is the delta binary packed parquet encoding.
DeltaByteArray is the delta byte array parquet encoding.
DeltaLengthByteArray is the delta length byte array parquet encoding.
var DoubleType Type
ErrCorrupted is an error returned by the Err method of ColumnPages
instances when they encountered a mismatch between the CRC checksum
recorded in a page header and the one computed while reading the page
data.
ErrConversion is used to indicate that a conversion between two values
cannot be done because there are no rules to translate between their
physical types.
ErrMissingPageHeader is an error returned when a page reader encounters
a malformed page header which is missing page-type-specific information.
ErrMissingRootColumn is an error returned when opening an invalid parquet
file which does not have a root column.
ErrRowGroupSchemaMismatch is an error returned when attempting to write a
row group but the source and destination schemas differ.
ErrRowGroupSchemaMissing is an error returned when attempting to write a
row group but the source has no schema.
ErrRowGroupSortingColumnsMismatch is an error returned when attempting to
write a row group but the sorting columns differ in the source and
destination.
ErrSeekOutOfRange is an error returned when seeking to a row index which
is less than the first row of a page.
ErrTooManyRowGroups is returned when attempting to generate a parquet
file with more than MaxRowGroups row groups.
ErrUnexpectedDefinitionLevels is an error returned when attempting to
decode definition levels into a page which is part of a required column.
ErrUnexpectedDictionaryPage is an error returned when a page reader
encounters a dictionary page after the first page, or in a column
which does not use a dictionary encoding.
ErrUnexpectedRepetitionLevels is an error returned when attempting to
decode repetition levels into a page which is not part of a repeated
column.
Gzip is the GZIP parquet compression codec.
Lz4Raw is the LZ4_RAW parquet compression codec.
var Microsecond TimeUnit
var Millisecond TimeUnit
var Nanosecond TimeUnit
Plain is the default parquet encoding.
PlainDictionary is the plain dictionary parquet encoding.
This encoding should not be used anymore in parquet 2.0 and later,
it is implemented for backwards compatibility to support reading
files that were encoded with older parquet libraries.
RLE is the hybrid bit-pack/run-length parquet encoding.
RLEDictionary is the RLE dictionary parquet encoding.
Snappy is the SNAPPY parquet compression codec.
Uncompressed is a parquet compression codec representing uncompressed
pages.
Zstd is the ZSTD parquet compression codec.
Package-Level Constants (total 25)
const DefaultColumnBufferCapacity = 16384
const DefaultColumnIndexSizeLimit = 16
const DefaultDataPageStatistics = false
const DefaultDataPageVersion = 2
const DefaultMaxRowsPerRowGroup = 9223372036854775807
const DefaultPageBufferSize = 262144
const DefaultReadMode ReadMode = 0
const DefaultSkipBloomFilters = false
const DefaultSkipPageIndex = false
const DefaultWriteBufferSize = 32768
const FixedLenByteArray Kind = 7
MaxColumnDepth is the maximum column depth supported by this package.
MaxColumnIndex is the maximum column index supported by this package.
MaxDefinitionLevel is the maximum definition level supported by this
package.
MaxRepetitionLevel is the maximum repetition level supported by this
package.
MaxRowGroups is the maximum number of row groups which can be contained
in a single parquet file.
This limit is enforced by the use of 16 bits signed integers in the file
metadata footer of parquet files. It is part of the parquet specification
and therefore cannot be changed.
const ReadModeAsync ReadMode = 1 // ReadModeAsync reads pages asynchronously in the background.
const ReadModeSync ReadMode = 0 // ReadModeSync reads pages synchronously on demand (Default).