package parquet
Import Path
github.com/apache/arrow-go/v18/parquet (on go.dev)
Dependency Relation
imports 15 packages, and is imported by one package
Involved Source Files
Package parquet provides an implementation of Apache Parquet for Go.
Apache Parquet is an open-source columnar data storage format using the record
shredding and assembly algorithm to accommodate complex data structures which
can then be used to efficiently store the data.
While the go.mod states go1.18, everything here should be compatible
with go versions 1.17 and 1.16.
This implementation is a native go implementation for reading and writing the
parquet file format.
# Install
You can download the library and CLI utilities via:
go get -u github.com/apache/arrow-go/v18/parquet
go install github.com/apache/arrow-go/v18/parquet/cmd/parquet_reader@latest
go install github.com/apache/arrow-go/v18/parquet/cmd/parquet_schema@latest
# Modules
This top level parquet package contains the basic common types and reader/writer
properties along with some utilities that are used throughout the other modules.
The file module contains the functions for directly reading/writing parquet files
including Column Readers and Column Writers.
The metadata module contains the types for managing the lower level file/rowgroup/column
metadata inside of a ParquetFile including inspecting the statistics.
The pqarrow module contains helper functions and types for converting directly
between Parquet and Apache Arrow formats.
The schema module contains the types for manipulating / inspecting / creating
parquet file schemas.
# Primitive Types
The Parquet Primitive Types and their corresponding Go types are Boolean (bool),
Int32 (int32), Int64 (int64), Int96 (parquet.Int96), Float (float32), Double (float64),
ByteArray (parquet.ByteArray) and FixedLenByteArray (parquet.FixedLenByteArray).
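As a quick orientation, the generic GetColumnType function (documented below) maps each of these Go types to its parquet.Type constant. A minimal sketch using only identifiers listed on this page:

	package main

	import (
		"fmt"

		"github.com/apache/arrow-go/v18/parquet"
	)

	func main() {
		// Each supported Go type corresponds to exactly one Parquet physical
		// type; the printed names come from parquet.Type's String method.
		fmt.Println(parquet.GetColumnType[int32]())
		fmt.Println(parquet.GetColumnType[float64]())
		fmt.Println(parquet.GetColumnType[parquet.ByteArray]())
		fmt.Println(parquet.GetColumnType[parquet.Int96]())
	}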
# Encodings
The encoding types supported in this package are:
- Plain
- Plain/RLE Dictionary
- Delta Binary Packed (only integer types)
- Delta Byte Array (only ByteArray)
- Delta Length Byte Array (only ByteArray)
- Byte Stream Split (Float, Double, Int32, Int64, FixedLenByteArray)
Tip: Not every Parquet implementation supports all of these encodings. If you're not
sure what to use, stick with Plain and Dictionary encoding.
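A hedged sketch of selecting encodings through writer properties; the field names on the Encodings variable (such as Plain and DeltaBinaryPacked) are assumed to match the encoding names above:

	package main

	import "github.com/apache/arrow-go/v18/parquet"

	func main() {
		props := parquet.NewWriterProperties(
			// Dictionary encoding on by default, falling back to Plain.
			parquet.WithDictionaryDefault(true),
			parquet.WithEncoding(parquet.Encodings.Plain),
			// For one integer column, disable the dictionary and use
			// Delta Binary Packed instead.
			parquet.WithDictionaryFor("intField", false),
			parquet.WithEncodingFor("intField", parquet.Encodings.DeltaBinaryPacked),
		)
		_ = props.EncodingFor("intField")
	}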
encryption_properties.go
reader_properties.go
types.go
version_string.go
writer_properties.go
Code Examples
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/memory"
	"github.com/apache/arrow-go/v18/parquet"
	"github.com/apache/arrow-go/v18/parquet/compress"
	"github.com/apache/arrow-go/v18/parquet/file"
	"github.com/apache/arrow-go/v18/parquet/pqarrow"
)

// In a real project, this should be tuned based on memory usage and performance needs
const batchSize = 2

// List of fields to read
var colNames = []string{"intField", "stringField", "listField"}

func main() {
	// --- Phase 1: Writing parquet file ---

	// Create an in-memory buffer to simulate a file.
	// For writing a real file to disk, use os.Create instead.
	buffer := &bytes.Buffer{}

	// Create a schema with three fields
	fields := []arrow.Field{
		{Name: "intField", Type: arrow.PrimitiveTypes.Int32, Nullable: false},
		{Name: "stringField", Type: arrow.BinaryTypes.String, Nullable: false},
		{Name: "listField", Type: arrow.ListOf(arrow.PrimitiveTypes.Float32), Nullable: false},
	}
	schema := arrow.NewSchema(fields, nil)

	// Create parquet writer props with snappy compression
	writerProps := parquet.NewWriterProperties(
		parquet.WithCompression(compress.Codecs.Snappy),
	)

	// WithStoreSchema embeds the original Arrow schema into the Parquet file metadata,
	// allowing it to be accurately restored when reading. This ensures correct handling
	// of advanced types like dictionaries and improves cross-language type consistency
	// in libraries that support the "ARROW:schema" metadata.
	arrowWriterProps := pqarrow.NewArrowWriterProperties(
		pqarrow.WithStoreSchema(),
	)

	// Create a parquet writer
	writer, err := pqarrow.NewFileWriter(schema, buffer, writerProps, arrowWriterProps)
	if err != nil {
		log.Fatalf("Failed to create parquet writer: %v", err)
	}

	// Create a record builder
	recordBuilder := array.NewRecordBuilder(memory.DefaultAllocator, schema)

	// Create a builder for each field
	intFieldBuilder := recordBuilder.Field(0).(*array.Int32Builder)
	stringFieldBuilder := recordBuilder.Field(1).(*array.StringBuilder)
	listFieldBuilder := recordBuilder.Field(2).(*array.ListBuilder)

	// Get the builder for the list's values (Float32)
	fl32Builder := listFieldBuilder.ValueBuilder().(*array.Float32Builder)

	// Append values for each field
	intFieldBuilder.AppendValues([]int32{38, 13, 53, 93, 66}, nil)
	stringFieldBuilder.AppendValues([]string{"val1", "val2", "val3", "val4", "val5"}, nil)

	// Append five lists, each containing the same float32 values
	for i := 0; i < 5; i++ {
		listFieldBuilder.Append(true)
		fl32Builder.AppendValues([]float32{1.0, 2.0, 4.0, 8.0}, nil)
	}

	// Create a record
	record := recordBuilder.NewRecord()
	if err := writer.Write(record); err != nil {
		log.Fatalf("Failed to write record: %v", err)
	}
	record.Release()
	recordBuilder.Release()

	// IMPORTANT: Close the writer to finalize the file
	if err := writer.Close(); err != nil {
		log.Fatalf("Failed to close parquet writer: %v", err)
	}

	// --- Phase 2: Reading parquet file ---

	// Create a Parquet reader from the in-memory buffer.
	// For reading a real file from disk, use file.OpenParquetFile() instead.
	fileReader, err := file.NewParquetReader(bytes.NewReader(buffer.Bytes()))
	if err != nil {
		log.Fatalf("Failed to create parquet reader: %v", err)
	}
	defer func() {
		if err := fileReader.Close(); err != nil {
			log.Printf("Failed to close file reader: %v", err)
		}
	}()

	// Create arrow read props, specifying the batch size
	arrowReadProps := pqarrow.ArrowReadProperties{BatchSize: batchSize}

	// Create an arrow reader for the parquet file
	arrowReader, err := pqarrow.NewFileReader(fileReader, arrowReadProps, memory.DefaultAllocator)
	if err != nil {
		log.Fatalf("Failed to create arrow reader: %v", err)
	}

	// Get the arrow schema from the file reader
	schema, err = arrowReader.Schema()
	if err != nil {
		log.Fatalf("Failed to get schema: %v", err)
	}

	// colIndices can be nil to read all columns. Here, we specify which columns to read
	colIndices := make([]int, len(colNames))
	for idx := range colNames {
		colIndices[idx] = schema.FieldIndices(colNames[idx])[0]
	}

	// Get a record reader from the file to iterate over
	recordReader, err := arrowReader.GetRecordReader(context.TODO(), colIndices, nil)
	if err != nil {
		log.Fatalf("Failed to get record reader: %v", err)
	}
	defer recordReader.Release()

	for recordReader.Next() {
		// Get the current record
		record := recordReader.Record()

		// Get columns
		intCol := record.Column(0).(*array.Int32)
		stringCol := record.Column(1).(*array.String)
		listCol := record.Column(2).(*array.List)
		listValueCol := listCol.ListValues().(*array.Float32)

		// Iterate over the rows within the current record
		for idx := range int(record.NumRows()) {
			// For the list column, get the start and end offsets for the current row
			start, end := listCol.ValueOffsets(idx)
			fmt.Printf("%d %s %v\n", intCol.Value(idx), stringCol.Value(idx), listValueCol.Float32Values()[start:end])
		}
	}
}
Package-Level Type Names (total 32)
AADPrefixVerifier is an interface for any object that can be used to verify the identity of the file being decrypted.
It should panic if the provided AAD identity is bad.
In a data set, AAD Prefixes should be collected, and then checked for missing files.
Verify verifies the identity of the file; it panics if the AAD identity is bad.
func WithPrefixVerifier(verifier AADPrefixVerifier) FileDecryptionOption
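A minimal sketch of implementing this interface; the single Verify(string) method signature is an assumption based on the description above:

	package main

	import "github.com/apache/arrow-go/v18/parquet"

	// allowedPrefixVerifier panics unless the file's AAD prefix matches the
	// expected one. The Verify(string) signature is assumed here.
	type allowedPrefixVerifier struct {
		expected string
	}

	func (v allowedPrefixVerifier) Verify(aadPrefix string) {
		if aadPrefix != v.expected {
			panic("unexpected AAD prefix: " + aadPrefix)
		}
	}

	// Compile-time check that the assumed signature satisfies the interface.
	var _ parquet.AADPrefixVerifier = allowedPrefixVerifier{}

	func main() {
		_ = parquet.WithPrefixVerifier(allowedPrefixVerifier{expected: "my-dataset"})
	}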
Algorithm describes how something was encrypted, representing the EncryptionAlgorithm object from the
parquet.thrift file.
Aad struct{AadPrefix []byte; AadFileUnique []byte; SupplyAadPrefix bool}
Algo Cipher
ToThrift returns an instance to be used for serializing when writing a file.
func AlgorithmFromThrift(enc *format.EncryptionAlgorithm) (ret Algorithm)
func (*FileEncryptionProperties).Algorithm() Algorithm
( BufferedReader) BufferSize() int
( BufferedReader) Discard(int) (int, error)
( BufferedReader) Outer() utils.Reader
( BufferedReader) Peek(int) ([]byte, error)
( BufferedReader) Read(p []byte) (n int, err error)
( BufferedReader) Reset(utils.Reader)
BufferedReader : io.Reader
func (*ReaderProperties).GetStream(source io.ReaderAt, start, nbytes int64) (BufferedReader, error)
ByteArray is a type to be utilized for representing the Parquet ByteArray physical type, represented as a byte slice
( ByteArray) Bytes() []byte
Len returns the current length of the ByteArray, equivalent to len(bytearray)
String returns a string representation of the ByteArray
ByteArray : github.com/apache/arrow-go/v18/internal/hashing.ByteSlice
ByteArray : expvar.Var
ByteArray : fmt.Stringer
Cipher is the parquet Cipher Algorithms
func WithAlg(cipher Cipher) EncryptOption
const AesCtr
const AesGcm
const DefaultEncryptionAlgorithm
ColumnDecryptionProperties are the specifications for how to decrypt a given column.
Clone returns a new instance of ColumnDecryptionProperties with the same key and column
ColumnPath returns which column these properties describe how to decrypt
IsUtilized returns whether or not these properties have been used for decryption already
Key returns the key specified to decrypt this column, or is empty if the Footer Key should be used.
SetUtilized is used by the reader to specify when we've decrypted the column and have used the key so we know
to wipe out the keys.
WipeOutDecryptionKey is called after decryption to ensure the key doesn't stick around and get re-used.
func NewColumnDecryptionProperties(column string, opts ...ColumnDecryptOption) *ColumnDecryptionProperties
func (*ColumnDecryptionProperties).Clone() *ColumnDecryptionProperties
ColumnDecryptOption is the type of the options passed for constructing Decryption Properties
func WithDecryptKey(key string) ColumnDecryptOption
func NewColumnDecryptionProperties(column string, opts ...ColumnDecryptOption) *ColumnDecryptionProperties
ColumnEncryptionProperties specifies how to encrypt a given column
Clone returns an instance of ColumnEncryptionProperties with the same key and metadata
ColumnPath returns which column these properties are for
IsEncrypted returns true if this column is encrypted.
IsUtilized returns whether or not these properties have already been used, if the key is empty
then this is always false
Key returns the key used for encrypting this column if it isn't encrypted by the footer key
KeyMetadata returns the key identifier which is used with a KeyRetriever to get the key for this column if it is not
encrypted using the footer key
SetUtilized is used for marking it as utilized once it is used in FileEncryptionProperties
as the encryption key will be wiped out on completion of writing
WipeOutEncryptionKey Clears the encryption key, used after completion of file writing
func NewColumnEncryptionProperties(name string, opts ...ColumnEncryptOption) *ColumnEncryptionProperties
func (*ColumnEncryptionProperties).Clone() *ColumnEncryptionProperties
func (*FileEncryptionProperties).ColumnEncryptionProperties(path string) *ColumnEncryptionProperties
func (*WriterProperties).ColumnEncryptionProperties(path string) *ColumnEncryptionProperties
ColumnEncryptOption how to specify options to the NewColumnEncryptionProperties function.
func WithKey(key string) ColumnEncryptOption
func WithKeyID(keyID string) ColumnEncryptOption
func WithKeyMetadata(keyMeta string) ColumnEncryptOption
func NewColumnEncryptionProperties(name string, opts ...ColumnEncryptOption) *ColumnEncryptionProperties
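A minimal sketch of constructing per-column encryption properties; the column name and the 16-byte key are placeholders:

	package main

	import "github.com/apache/arrow-go/v18/parquet"

	func main() {
		// Keys must be 16, 24, or 32 bytes; this one is a placeholder.
		colProps := parquet.NewColumnEncryptionProperties("secret_col",
			parquet.WithKey("0123456789012345"),
			parquet.WithKeyID("kc1"),
		)
		_ = colProps.ColumnPath() // "secret_col"
	}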
ColumnOrder is the Column Order from the parquet.thrift
func github.com/apache/arrow-go/v18/parquet/schema.(*Column).ColumnOrder() ColumnOrder
func github.com/apache/arrow-go/v18/parquet/schema.(*Schema).UpdateColumnOrders(orders []ColumnOrder) error
var DefaultColumnOrder
ColumnPath is the path from the root of the schema to a given column
Extend creates a new ColumnPath from an existing one, with the new ColumnPath having s appended to the end.
( ColumnPath) String() string
ColumnPath : expvar.Var
ColumnPath : fmt.Stringer
func ColumnPathFromString(s string) ColumnPath
func ColumnPath.Extend(s string) ColumnPath
func github.com/apache/arrow-go/v18/parquet/schema.ColumnPathFromNode(n schema.Node) ColumnPath
func github.com/apache/arrow-go/v18/parquet/schema.(*Column).ColumnPath() ColumnPath
func WithAdaptiveBloomFilterEnabledPath(path ColumnPath, enabled bool) WriterProperty
func WithBloomFilterCandidatesPath(path ColumnPath, candidates int) WriterProperty
func WithBloomFilterEnabledPath(path ColumnPath, enabled bool) WriterProperty
func WithBloomFilterFPPPath(path ColumnPath, fpp float64) WriterProperty
func WithBloomFilterNDVPath(path ColumnPath, ndv int64) WriterProperty
func WithCompressionLevelPath(path ColumnPath, level int) WriterProperty
func WithCompressionPath(path ColumnPath, codec compress.Compression) WriterProperty
func WithDictionaryPath(path ColumnPath, dict bool) WriterProperty
func WithEncodingPath(path ColumnPath, encoding Encoding) WriterProperty
func WithPageIndexEnabledPath(path ColumnPath, enabled bool) WriterProperty
func WithStatsPath(path ColumnPath, enabled bool) WriterProperty
func (*WriterProperties).AdaptiveBloomFilterEnabledPath(path ColumnPath) bool
func (*WriterProperties).BloomFilterCandidatesPath(path ColumnPath) int
func (*WriterProperties).BloomFilterEnabledPath(path ColumnPath) bool
func (*WriterProperties).BloomFilterFPPPath(path ColumnPath) float64
func (*WriterProperties).BloomFilterNDVPath(path ColumnPath) int64
func (*WriterProperties).CompressionLevelPath(path ColumnPath) int
func (*WriterProperties).CompressionPath(path ColumnPath) compress.Compression
func (*WriterProperties).DictionaryEnabledPath(path ColumnPath) bool
func (*WriterProperties).EncodingPath(path ColumnPath) Encoding
func (*WriterProperties).MaxStatsSizePath(path ColumnPath) int64
func (*WriterProperties).PageIndexEnabledPath(path ColumnPath) bool
func (*WriterProperties).StatisticsEnabledPath(path ColumnPath) bool
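A minimal sketch of building a ColumnPath and handing it to one of the *Path writer options:

	package main

	import (
		"fmt"

		"github.com/apache/arrow-go/v18/parquet"
	)

	func main() {
		// Build a path from a dot separated string, then extend it by one level.
		path := parquet.ColumnPathFromString("outer.inner")
		leaf := path.Extend("leaf")
		fmt.Println(leaf) // outer.inner.leaf

		// The *Path option variants take the ColumnPath directly.
		props := parquet.NewWriterProperties(
			parquet.WithDictionaryPath(leaf, false),
		)
		fmt.Println(props.DictionaryEnabledPath(leaf)) // false
	}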
ColumnPathToDecryptionPropsMap maps column paths to decryption properties
func WithColumnKeys(decrypt ColumnPathToDecryptionPropsMap) FileDecryptionOption
ColumnPathToEncryptionPropsMap maps column paths to encryption properties
func (*FileEncryptionProperties).EncryptedColumns() ColumnPathToEncryptionPropsMap
func WithEncryptedColumns(encrypted ColumnPathToEncryptionPropsMap) EncryptOption
ColumnProperties defines the encoding, codec, and so on for a given column.
AdaptiveBloomFilterEnabled bool
BloomFilterCandidates int
BloomFilterEnabled bool
BloomFilterFPP float64
BloomFilterNDV int64
Codec compress.Compression
CompressionLevel int
DictionaryEnabled bool
Encoding Encoding
MaxStatsSize int64
PageIndexEnabled bool
StatsEnabled bool
func DefaultColumnProperties() ColumnProperties
type ColumnTypes (interface)
DataPageVersion is the version of the Parquet Data Pages
func (*WriterProperties).DataPageVersion() DataPageVersion
func WithDataPageVersion(version DataPageVersion) WriterProperty
const DataPageV1
const DataPageV2
DecryptionKeyRetriever is an interface for getting the desired key for decryption from metadata. It should take in
some metadata identifier and return the actual Key to use for decryption.
( DecryptionKeyRetriever) GetKey(keyMetadata []byte) string
func WithKeyRetriever(retriever DecryptionKeyRetriever) FileDecryptionOption
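A minimal sketch of a retriever backed by a static map, treating the key metadata as a string identifier:

	package main

	import "github.com/apache/arrow-go/v18/parquet"

	// mapKeyRetriever resolves key metadata (treated as a string ID) to key material.
	type mapKeyRetriever struct {
		keys map[string]string
	}

	func (r mapKeyRetriever) GetKey(keyMetadata []byte) string {
		return r.keys[string(keyMetadata)]
	}

	var _ parquet.DecryptionKeyRetriever = mapKeyRetriever{}

	func main() {
		_ = parquet.WithKeyRetriever(mapKeyRetriever{keys: map[string]string{
			"kf": "0123456789012345", // placeholder 16-byte key
		}})
	}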
Encoding is the parquet Encoding type
( Encoding) String() string
Encoding : expvar.Var
Encoding : fmt.Stringer
func (*WriterProperties).DictionaryIndexEncoding() Encoding
func (*WriterProperties).DictionaryPageEncoding() Encoding
func (*WriterProperties).Encoding() Encoding
func (*WriterProperties).EncodingFor(path string) Encoding
func (*WriterProperties).EncodingPath(path ColumnPath) Encoding
func WithEncoding(encoding Encoding) WriterProperty
func WithEncodingFor(path string, encoding Encoding) WriterProperty
func WithEncodingPath(path ColumnPath, encoding Encoding) WriterProperty
EncryptOption is used for specifying values when building FileEncryptionProperties
func DisableAadPrefixStorage() EncryptOption
func WithAadPrefix(aadPrefix string) EncryptOption
func WithAlg(cipher Cipher) EncryptOption
func WithEncryptedColumns(encrypted ColumnPathToEncryptionPropsMap) EncryptOption
func WithFooterKeyID(key string) EncryptOption
func WithFooterKeyMetadata(keyMeta string) EncryptOption
func WithPlaintextFooter() EncryptOption
func NewFileEncryptionProperties(footerKey string, opts ...EncryptOption) *FileEncryptionProperties
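A minimal sketch of file-level encryption: the footer key encrypts the footer and any column without its own properties. The map is assumed to be keyed by the dotted column path string, and all keys are placeholders:

	package main

	import "github.com/apache/arrow-go/v18/parquet"

	func main() {
		encCols := parquet.ColumnPathToEncryptionPropsMap{
			"secret_col": parquet.NewColumnEncryptionProperties("secret_col",
				parquet.WithKey("5432109876543210"), // placeholder 16-byte column key
				parquet.WithKeyID("kc1"),
			),
		}
		encProps := parquet.NewFileEncryptionProperties("0123456789012345", // placeholder footer key
			parquet.WithFooterKeyID("kf"),
			parquet.WithEncryptedColumns(encCols),
		)
		// Attach to the writer properties used when creating a file writer.
		_ = parquet.NewWriterProperties(parquet.WithEncryptionProperties(encProps))
	}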
FileDecryptionOption is how to supply options to constructing a new FileDecryptionProperties instance.
func DisableFooterSignatureVerification() FileDecryptionOption
func WithColumnKeys(decrypt ColumnPathToDecryptionPropsMap) FileDecryptionOption
func WithDecryptAadPrefix(prefix string) FileDecryptionOption
func WithFooterKey(key string) FileDecryptionOption
func WithKeyRetriever(retriever DecryptionKeyRetriever) FileDecryptionOption
func WithPlaintextAllowed() FileDecryptionOption
func WithPrefixVerifier(verifier AADPrefixVerifier) FileDecryptionOption
func NewFileDecryptionProperties(opts ...FileDecryptionOption) *FileDecryptionProperties
FileDecryptionProperties define the File Level configuration for decrypting a parquet file. Once constructed they are
read only.
KeyRetriever DecryptionKeyRetriever
Verifier AADPrefixVerifier
AadPrefix returns the prefix to be supplied for constructing the identification strings when decrypting
Clone returns a new instance of these properties, changing the prefix if set (keeping the same prefix if left empty)
ColumnKey returns the key to be used for decrypting the provided column.
IsUtilized returns whether or not this instance has been used to decrypt a file. If the footer key and prefix are
empty and there are no column decryption properties, then this is always false.
PlaintextFilesAllowed returns whether or not this instance of decryption properties are allowed on a plaintext file.
SetUtilized is called to mark this instance as utilized once it is used to read a file. A single instance
can be used for reading one file only. Setting this ensures the keys will be wiped out upon completion of file reading.
WipeOutDecryptionKeys will clear all the keys for this instance including the column level ones, this will be called
after this instance has been utilized.
func NewFileDecryptionProperties(opts ...FileDecryptionOption) *FileDecryptionProperties
func (*FileDecryptionProperties).Clone(newAadPrefix string) *FileDecryptionProperties
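A minimal sketch of decryption properties matching the encryption sketch above; the FileDecryptProps field name on ReaderProperties is an assumption:

	package main

	import "github.com/apache/arrow-go/v18/parquet"

	func main() {
		decCols := parquet.ColumnPathToDecryptionPropsMap{
			"secret_col": parquet.NewColumnDecryptionProperties("secret_col",
				parquet.WithDecryptKey("5432109876543210"), // placeholder column key
			),
		}
		decProps := parquet.NewFileDecryptionProperties(
			parquet.WithFooterKey("0123456789012345"), // placeholder footer key
			parquet.WithColumnKeys(decCols),
		)
		// Attach to reader properties before opening the file; the field name
		// FileDecryptProps is an assumption here.
		readProps := parquet.NewReaderProperties(nil)
		readProps.FileDecryptProps = decProps
		_ = readProps
	}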
FileEncryptionProperties describe how to encrypt a parquet file when writing data.
Algorithm returns the description of how we will perform the encryption, the algorithm, prefixes, and so on.
Clone allows returning an identical property setup for another file with the option to update the aadPrefix,
(if given the empty string, the current aad prefix will be used) since a single instance can only be used
to encrypt one file before wiping out the keys.
ColumnEncryptionProperties returns the properties for encrypting a given column.
This may be nil for columns that aren't encrypted or may be default properties.
EncryptedColumns returns the mapping of column paths to column encryption properties
FileAad returns the aad identification to be used at the file level which gets concatenated with the row and column
information for encrypting data.
IsUtilized returns whether or not this instance has been used to encrypt a file
SetUtilized is called after writing a file. A FileEncryptionProperties object can be used for writing one file only,
the encryption keys will be wiped out upon completion of writing the file.
WipeOutEncryptionKeys clears all of the encryption keys for this and the columns
func NewFileEncryptionProperties(footerKey string, opts ...EncryptOption) *FileEncryptionProperties
func (*FileEncryptionProperties).Clone(newAadPrefix string) *FileEncryptionProperties
func (*WriterProperties).FileEncryptionProperties() *FileEncryptionProperties
func WithEncryptionProperties(props *FileEncryptionProperties) WriterProperty
FixedLenByteArray is a go type to represent a FixedLengthByteArray as a byte slice
( FixedLenByteArray) Bytes() []byte
Len returns the current length of this FixedLengthByteArray, equivalent to len(fixedlenbytearray)
String returns a string representation of the FixedLenByteArray
FixedLenByteArray : github.com/apache/arrow-go/v18/internal/hashing.ByteSlice
FixedLenByteArray : expvar.Var
FixedLenByteArray : fmt.Stringer
Int96 is a 12 byte integer value utilized for representing timestamps as a 64 bit integer and a 32 bit
integer.
SetNanoSeconds sets the Nanosecond field of the Int96 timestamp to the provided value
String provides the string representation as a timestamp via converting to a time.Time
and then calling String
ToTime returns a go time.Time object that represents the same time instant as the given Int96 value
Int96 : expvar.Var
Int96 : fmt.Stringer
func NewInt96(v [3]uint32) (out Int96)
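A minimal sketch of constructing an Int96 and converting it back to a time.Time; the word layout (a 64-bit nanoseconds-of-day value followed by a 32-bit Julian day number) is an assumption based on the standard Parquet Int96 timestamp encoding:

	package main

	import (
		"fmt"

		"github.com/apache/arrow-go/v18/parquet"
	)

	func main() {
		// Words 0 and 1 are assumed to hold the 64-bit nanoseconds within the
		// day; word 2 the Julian day number (2451545 = 2000-01-01).
		v := parquet.NewInt96([3]uint32{0, 0, 2451545})
		fmt.Println(v.ToTime().UTC())
	}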
ReaderAtSeeker is a combination of the ReaderAt and ReadSeeker interfaces
from the io package defining the only functionality that is required
in order for a parquet file to be read by the file functions. We just need
to be able to call ReadAt, Read, and Seek
( ReaderAtSeeker) ReadAt(p []byte, off int64) (n int, err error)
( ReaderAtSeeker) Seek(offset int64, whence int) (int64, error)
github.com/apache/arrow-go/v18/arrow/ipc.ReadAtSeeker (interface)
github.com/apache/arrow-go/v18/internal/utils.Reader (interface)
github.com/coreos/etcd/pkg/fileutil.LockedFile
*github.com/klauspost/compress/s2.ReadSeeker
*github.com/polarsignals/wal/fs.File
*bytes.Reader
*io.SectionReader
mime/multipart.File (interface)
*os.File
*strings.Reader
ReaderAtSeeker : io.ReaderAt
ReaderAtSeeker : io.Seeker
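Since *os.File is listed above as an implementation, a compile-time conformance check is a cheap way to confirm that a source can be passed to the file reading functions:

	package main

	import (
		"os"

		"github.com/apache/arrow-go/v18/parquet"
	)

	// Compile-time check that *os.File satisfies parquet.ReaderAtSeeker.
	var _ parquet.ReaderAtSeeker = (*os.File)(nil)

	func main() {}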
ReaderProperties are used to define how the file reader will handle buffering and allocating buffers
Default buffer size to utilize when reading chunks, when reading page
headers or other metadata, this buffer may be increased if necessary
to read in the necessary metadata. The value here is simply the default
initial BufferSize when reading a new chunk.
If this is set to true, then the reader will use SectionReader to
just use the read stream when reading data. Otherwise we will buffer
the data we're going to read into memory first and then read that buffer.
When accessing data from IO sources with higher latency, like S3, setting this
to false may improve performance by reading the entire row group at once rather
than sending multiple smaller IO requests. For IO streams with low latency, setting
this to true can optimize memory usage for the reader. Additionally, this can decrease
the amount of data retrieved when only small portions of the parquet file need to be accessed.
create with NewFileDecryptionProperties if dealing with an encrypted file
Allocator returns the allocator that the properties were initialized with
GetStream returns a section of the underlying reader based on whether or not BufferedStream is enabled.
If BufferedStreamEnabled is true, it creates an io.SectionReader, otherwise it will read the entire section
into a buffer in memory and return a bytes.NewReader for that buffer.
func NewReaderProperties(alloc memory.Allocator) *ReaderProperties
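A minimal sketch of tuning reader properties; the exported field names (BufferSize, BufferedStreamEnabled) are assumptions based on the descriptions above:

	package main

	import "github.com/apache/arrow-go/v18/parquet"

	func main() {
		// Passing nil uses memory.DefaultAllocator.
		props := parquet.NewReaderProperties(nil)
		// Stream sections via io.SectionReader instead of buffering them in
		// memory first (field names are assumptions, see the lead-in).
		props.BufferedStreamEnabled = true
		// Initial buffer size when reading a new chunk.
		props.BufferSize = 1 << 20 // 1 MiB
		_ = props
	}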
Repetition is the underlying parquet field repetition type as in parquet.thrift
( Repetition) String() string
Repetition : expvar.Var
Repetition : fmt.Stringer
func (*WriterProperties).RootRepetition() Repetition
func github.com/apache/arrow-go/v18/parquet/schema.Node.RepetitionType() Repetition
func WithRootRepetition(repetition Repetition) WriterProperty
func github.com/apache/arrow-go/v18/parquet/schema.ListOf(n schema.Node, rep Repetition, fieldID int32) (*schema.GroupNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.ListOfWithName(listName string, element schema.Node, rep Repetition, fieldID int32) (*schema.GroupNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.MapOf(name string, key schema.Node, value schema.Node, mapRep Repetition, fieldID int32) (*schema.GroupNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.NewBooleanNode(name string, rep Repetition, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewByteArrayNode(name string, rep Repetition, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewFixedLenByteArrayNode(name string, rep Repetition, length int32, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewFloat32Node(name string, rep Repetition, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewFloat64Node(name string, rep Repetition, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewGroupNode(name string, repetition Repetition, fields schema.FieldList, fieldID int32) (*schema.GroupNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.NewGroupNodeConverted(name string, repetition Repetition, fields schema.FieldList, converted schema.ConvertedType, id int32) (n *schema.GroupNode, err error)
func github.com/apache/arrow-go/v18/parquet/schema.NewGroupNodeLogical(name string, repetition Repetition, fields schema.FieldList, logical schema.LogicalType, id int32) (n *schema.GroupNode, err error)
func github.com/apache/arrow-go/v18/parquet/schema.NewInt32Node(name string, rep Repetition, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewInt64Node(name string, rep Repetition, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewInt96Node(name string, rep Repetition, fieldID int32) *schema.PrimitiveNode
func github.com/apache/arrow-go/v18/parquet/schema.NewPrimitiveNode(name string, repetition Repetition, typ Type, fieldID, typeLength int32) (*schema.PrimitiveNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.NewPrimitiveNodeConverted(name string, repetition Repetition, typ Type, converted schema.ConvertedType, typeLen, precision, scale int, id int32) (*schema.PrimitiveNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.NewPrimitiveNodeLogical(name string, repetition Repetition, logicalType schema.LogicalType, physicalType Type, typeLen int, id int32) (*schema.PrimitiveNode, error)
SortingColumn specifies a sort order within a rowgroup of a specific leaf column.
Type is the physical type as in parquet.thrift
ByteSize returns the number of bytes required to store a single value of
the given parquet.Type in memory.
( Type) String() string
Type : expvar.Var
Type : fmt.Stringer
func GetColumnType[T]() Type
func github.com/apache/arrow-go/v18/parquet/schema.(*Column).PhysicalType() Type
func github.com/apache/arrow-go/v18/parquet/schema.(*PrimitiveNode).PhysicalType() Type
func github.com/apache/arrow-go/v18/parquet/schema.NewPrimitiveNode(name string, repetition Repetition, typ Type, fieldID, typeLength int32) (*schema.PrimitiveNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.NewPrimitiveNodeConverted(name string, repetition Repetition, typ Type, converted schema.ConvertedType, typeLen, precision, scale int, id int32) (*schema.PrimitiveNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.NewPrimitiveNodeLogical(name string, repetition Repetition, logicalType schema.LogicalType, physicalType Type, typeLen int, id int32) (*schema.PrimitiveNode, error)
func github.com/apache/arrow-go/v18/parquet/schema.BSONLogicalType.IsApplicable(t Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.DateLogicalType.IsApplicable(t Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.DecimalLogicalType.IsApplicable(typ Type, tlen int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.EnumLogicalType.IsApplicable(t Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.Float16LogicalType.IsApplicable(t Type, tlen int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.IntervalLogicalType.IsApplicable(t Type, tlen int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.IntLogicalType.IsApplicable(typ Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.JSONLogicalType.IsApplicable(t Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.ListLogicalType.IsApplicable(Type, int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.LogicalType.IsApplicable(t Type, tlen int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.MapLogicalType.IsApplicable(Type, int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.NoLogicalType.IsApplicable(Type, int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.NullLogicalType.IsApplicable(Type, int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.StringLogicalType.IsApplicable(t Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.TemporalLogicalType.IsApplicable(t Type, tlen int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.TimeLogicalType.IsApplicable(typ Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.TimestampLogicalType.IsApplicable(t Type, _ int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.UnknownLogicalType.IsApplicable(Type, int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.UUIDLogicalType.IsApplicable(t Type, tlen int32) bool
func github.com/apache/arrow-go/v18/parquet/schema.VariantLogicalType.IsApplicable(Type, int32) bool
Version is the parquet version type
( Version) String() string
Version : expvar.Var
Version : fmt.Stringer
func (*WriterProperties).Version() Version
func WithVersion(version Version) WriterProperty
const V1_0
const V2_4
const V2_6
const V2_LATEST
WriterProperties is the collection of properties to use for writing a parquet file. The values are
read only once it has been constructed.
AdaptiveBloomFilterEnabled returns the default value for whether or not adaptive bloom filters are enabled
AdaptiveBloomFilterEnabledFor returns whether or not adaptive bloom filters are enabled for the given column path
AdaptiveBloomFilterEnabledPath is the same as AdaptiveBloomFilterEnabledFor but takes a ColumnPath
(*WriterProperties) Allocator() memory.Allocator
BloomFilterCandidates returns the default number of candidates to use for bloom filters
BloomFilterCandidatesFor returns the number of candidates to use for the given column path
BloomFilterCandidatesPath is the same as BloomFilterCandidatesFor but takes a ColumnPath
BloomFilterEnabled returns the default value for whether or not bloom filters are enabled
BloomFilterEnabledFor returns whether or not bloom filters are enabled for the given column path
BloomFilterEnabledPath is the same as BloomFilterEnabledFor but takes a ColumnPath
BloomFilterFPP returns the default false positive probability for bloom filters
BloomFilterFPPFor returns the false positive probability for the given column path
BloomFilterFPPPath is the same as BloomFilterFPPFor but takes a ColumnPath
BloomFilterNDV returns the default number of distinct values to use for bloom filters
BloomFilterNDVFor returns the number of distinct values to use for the given column path
BloomFilterNDVPath is the same as BloomFilterNDVFor but takes a ColumnPath
ColumnEncryptionProperties returns the specific properties for encryption that will be used for the given column path
Compression returns the default compression type that will be used for any columns that don't
have a specific compression defined.
CompressionFor will return the compression type that is specified for the given column path, or
the default compression codec if there isn't one specific to this column.
CompressionLevel returns the default compression level that will be used for any column
that doesn't have a compression level specified for it.
CompressionLevelFor returns the compression level that will be utilized for the given column,
or the default compression level if the column doesn't have a specific level specified.
CompressionLevelPath is the same as CompressionLevelFor but takes a ColumnPath object
CompressionPath is the same as CompressionFor but takes a ColumnPath
(*WriterProperties) CreatedBy() string
(*WriterProperties) DataPageSize() int64
(*WriterProperties) DataPageVersion() DataPageVersion
DictionaryEnabled returns the default value for whether or not dictionary encoding will be utilized for columns
that aren't separately specified.
DictionaryEnabledFor returns whether or not dictionary encoding will be used for the specified column when writing
or the default value if the column was not separately specified.
DictionaryEnabledPath is the same as DictionaryEnabledFor but takes a ColumnPath object.
DictionaryIndexEncoding returns which encoding will be used for the Dictionary Index values based on the
parquet version. V1 uses PlainDict and V2 uses RLEDict
DictionaryPageEncoding returns the encoding that will be utilized for the DictionaryPage itself based on the parquet
version. V1 uses PlainDict, V2 uses Plain
(*WriterProperties) DictionaryPageSizeLimit() int64
Encoding returns the default encoding that will be utilized for any columns which don't have a different value
specified.
EncodingFor returns the encoding that will be used for the given column path, or the default encoding if there
isn't one specified for this column.
EncodingPath is the same as EncodingFor but takes a ColumnPath object
FileEncryptionProperties returns the current encryption properties that were
used to create the writer properties.
MaxBloomFilterBytes returns the maximum number of bytes that a bloom filter can use
(*WriterProperties) MaxRowGroupLength() int64
MaxStatsSize returns the default maximum size for stats
MaxStatsSizeFor returns the maximum stat size for the given column path
MaxStatsSizePath is the same as MaxStatsSizeFor but takes a ColumnPath
PageIndexEnabled returns the default value for whether or not page indexes will be written
PageIndexEnabledFor returns whether page index writing is enabled for the given column path, or
the default value if it wasn't specified separately.
PageIndexEnabledPath is the same as PageIndexEnabledFor but takes a ColumnPath object
(*WriterProperties) RootName() string
(*WriterProperties) RootRepetition() Repetition
(*WriterProperties) SortingColumns() []SortingColumn
StatisticsEnabled returns the default value for whether or not stats are enabled to be written for columns
that aren't separately specified.
StatisticsEnabledFor returns whether stats will be written for the given column path, or the default value if
it wasn't separately specified.
StatisticsEnabledPath is the same as StatisticsEnabledFor but takes a ColumnPath object.
StoreDecimalAsInteger returns the config option controlling whether or not
to try storing decimal data as an integer type if the precision is low enough
(1 <= prec <= 18 can be stored as an int), otherwise it will be stored as
a fixed len byte array.
(*WriterProperties) Version() Version
(*WriterProperties) WriteBatchSize() int64
func NewWriterProperties(opts ...WriterProperty) *WriterProperties
WriterProperty is used as the options for building a writer properties instance
func WithAdaptiveBloomFilterEnabled(enabled bool) WriterProperty
func WithAdaptiveBloomFilterEnabledFor(path string, enabled bool) WriterProperty
func WithAdaptiveBloomFilterEnabledPath(path ColumnPath, enabled bool) WriterProperty
func WithAllocator(mem memory.Allocator) WriterProperty
func WithBatchSize(batch int64) WriterProperty
func WithBloomFilterCandidates(candidates int) WriterProperty
func WithBloomFilterCandidatesFor(path string, candidates int) WriterProperty
func WithBloomFilterCandidatesPath(path ColumnPath, candidates int) WriterProperty
func WithBloomFilterEnabled(enabled bool) WriterProperty
func WithBloomFilterEnabledFor(path string, enabled bool) WriterProperty
func WithBloomFilterEnabledPath(path ColumnPath, enabled bool) WriterProperty
func WithBloomFilterFPP(fpp float64) WriterProperty
func WithBloomFilterFPPFor(path string, fpp float64) WriterProperty
func WithBloomFilterFPPPath(path ColumnPath, fpp float64) WriterProperty
func WithBloomFilterNDV(ndv int64) WriterProperty
func WithBloomFilterNDVFor(path string, ndv int64) WriterProperty
func WithBloomFilterNDVPath(path ColumnPath, ndv int64) WriterProperty
func WithCompression(codec compress.Compression) WriterProperty
func WithCompressionFor(path string, codec compress.Compression) WriterProperty
func WithCompressionLevel(level int) WriterProperty
func WithCompressionLevelFor(path string, level int) WriterProperty
func WithCompressionLevelPath(path ColumnPath, level int) WriterProperty
func WithCompressionPath(path ColumnPath, codec compress.Compression) WriterProperty
func WithCreatedBy(createdby string) WriterProperty
func WithDataPageSize(pgsize int64) WriterProperty
func WithDataPageVersion(version DataPageVersion) WriterProperty
func WithDictionaryDefault(dict bool) WriterProperty
func WithDictionaryFor(path string, dict bool) WriterProperty
func WithDictionaryPageSizeLimit(limit int64) WriterProperty
func WithDictionaryPath(path ColumnPath, dict bool) WriterProperty
func WithEncoding(encoding Encoding) WriterProperty
func WithEncodingFor(path string, encoding Encoding) WriterProperty
func WithEncodingPath(path ColumnPath, encoding Encoding) WriterProperty
func WithEncryptionProperties(props *FileEncryptionProperties) WriterProperty
func WithMaxBloomFilterBytes(nbytes int64) WriterProperty
func WithMaxRowGroupLength(nrows int64) WriterProperty
func WithMaxStatsSize(maxStatsSize int64) WriterProperty
func WithPageIndexEnabled(enabled bool) WriterProperty
func WithPageIndexEnabledFor(path string, enabled bool) WriterProperty
func WithPageIndexEnabledPath(path ColumnPath, enabled bool) WriterProperty
func WithRootName(name string) WriterProperty
func WithRootRepetition(repetition Repetition) WriterProperty
func WithSortingColumns(cols []SortingColumn) WriterProperty
func WithStats(enabled bool) WriterProperty
func WithStatsFor(path string, enabled bool) WriterProperty
func WithStatsPath(path ColumnPath, enabled bool) WriterProperty
func WithStoreDecimalAsInteger(enabled bool) WriterProperty
func WithVersion(version Version) WriterProperty
func NewWriterProperties(opts ...WriterProperty) *WriterProperties
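A minimal sketch combining several of the options above; when options conflict, the last one wins:

	package main

	import (
		"github.com/apache/arrow-go/v18/parquet"
		"github.com/apache/arrow-go/v18/parquet/compress"
	)

	func main() {
		props := parquet.NewWriterProperties(
			parquet.WithVersion(parquet.V2_LATEST),
			parquet.WithCompression(compress.Codecs.Snappy),
			parquet.WithDictionaryDefault(true),
			parquet.WithStats(true),
			parquet.WithMaxRowGroupLength(64*1024),
			// Per-column override: leave one column uncompressed.
			parquet.WithCompressionFor("raw_col", compress.Codecs.Uncompressed),
		)
		_ = props.Compression()             // default codec (Snappy)
		_ = props.CompressionFor("raw_col") // per-column override (Uncompressed)
	}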
Package-Level Functions (total 77)
AlgorithmFromThrift converts the thrift object to the Algorithm struct for easier usage.
ColumnPathFromString constructs a ColumnPath from a dot separated string
DefaultColumnProperties returns the default properties which get utilized for writing.
The default column properties are the following constants:
Encoding: Encodings.Plain
Codec: compress.Codecs.Uncompressed
DictionaryEnabled: DefaultDictionaryEnabled
StatsEnabled: DefaultStatsEnabled
PageIndexEnabled: DefaultPageIndexEnabled
MaxStatsSize: DefaultMaxStatsSize
CompressionLevel: compress.DefaultCompressionLevel
BloomFilterEnabled: DefaultBloomFilterEnabled
BloomFilterFPP: DefaultBloomFilterFPP
AdaptiveBloomFilterEnabled: DefaultAdaptiveBloomFilterEnabled
BloomFilterCandidates: DefaultBloomFilterCandidates
DisableAadPrefixStorage will set the properties to not store the AadPrefix in the file. If this isn't called
and the AadPrefix is set, then it will be stored. This needs to be in the options *after* WithAadPrefix to have an effect.
Type Parameters:
T: ColumnTypes
NewColumnDecryptionProperties constructs a new ColumnDecryptionProperties for the given column path, modified by
the provided options
NewColumnEncryptionProperties constructs properties for the provided column path, modified by the options provided
NewFileDecryptionProperties takes in the options for constructing a new FileDecryptionProperties object; otherwise
it will use the default configuration, which will check the footer integrity of a plaintext footer for an encrypted file.
For unencrypted parquet files, the decryption properties should not be set.
NewFileEncryptionProperties returns a new File Encryption description object using the options provided.
NewInt96 creates a new Int96 from the given 3 uint32 values.
NewReaderProperties returns the default Reader Properties using the provided allocator.
If nil is passed for the allocator, then memory.DefaultAllocator will be used.
NewWriterProperties takes a list of options for building the properties. If multiple options are used which conflict
then the last option is the one which will take effect. If no WriterProperty options are provided, then the default
properties will be utilized for writing.
The Default properties use the following constants:
Allocator: memory.DefaultAllocator
DictionaryPageSize: DefaultDictionaryPageSizeLimit
BatchSize: DefaultWriteBatchSize
MaxRowGroupLength: DefaultMaxRowGroupLen
PageSize: DefaultDataPageSize
ParquetVersion: V2_LATEST
DataPageVersion: DataPageV1
CreatedBy: DefaultCreatedBy
WithAadPrefix sets the AAD prefix to use for encryption and by default will store it in the file
WithAdaptiveBloomFilterEnabled sets the default value for whether to enable writing
adaptive bloom filters for columns. This is the default value for all columns,
but can be overridden by using WithAdaptiveBloomFilterEnabledFor or
WithAdaptiveBloomFilterEnabledPath.
Using an Adaptive Bloom filter will attempt to use multiple candidate bloom filters
when building the column, with different expected distinct values. It will attempt
to use the smallest candidate bloom filter that achieves the desired false positive
probability, dropping candidate bloom filters that are no longer viable.
WithAdaptiveBloomFilterEnabledFor specifies a per column value as to enable or disable writing
adaptive bloom filters for the column.
WithAdaptiveBloomFilterEnabledPath is like WithAdaptiveBloomFilterEnabledFor, but takes a ColumnPath
WithAlg sets the encryption algorithm to utilize. (default is AesGcm)
WithAllocator specifies the writer to use the given allocator
WithBatchSize specifies the number of rows to use for batch writes to columns
WithBloomFilterCandidates sets the number of candidate bloom filters to use when building
an adaptive bloom filter.
WithBloomFilterCandidatesFor specifies a per column value for the number of candidate
bloom filters to use when building an adaptive bloom filter.
WithBloomFilterCandidatesPath is like WithBloomFilterCandidatesFor, but takes a ColumnPath
WithBloomFilterEnabled sets the default value for whether to enable writing bloom
filters for columns. This is the default value for all columns, but can be overridden
by using WithBloomFilterEnabledFor or WithBloomFilterEnabledPath.
WithBloomFilterEnabledFor specifies a per column value as to enable or disable writing
bloom filters for the column.
WithBloomFilterEnabledPath is like WithBloomFilterEnabledFor, but takes a ColumnPath
WithBloomFilterFPP sets the default value for the false positive probability for writing
bloom filters.
WithBloomFilterFPPFor specifies a per column value for the false positive probability
for writing bloom filters.
WithBloomFilterFPPPath is like WithBloomFilterFPPFor, but takes a ColumnPath
WithBloomFilterNDV sets the default value for the expected number of distinct values
to be written for the column. This is ignored when using adaptive bloom filters.
WithBloomFilterNDVFor specifies a per column value for the expected number of distinct values
to be written for the column. This is ignored when using adaptive bloom filters.
WithBloomFilterNDVPath is like WithBloomFilterNDVFor, but takes a ColumnPath
WithColumnKeys sets explicit column keys.
It's also possible to set a key retriever on this property object.
Upon file decryption, availability of explicit keys is checked before invocation
of the retriever callback.
If an explicit key is available for a footer or a column, its key metadata will be ignored.
WithCompression specifies the default compression type to use for column writing.
WithCompressionFor specifies the compression type for the given column.
WithCompressionLevel specifies the default compression level for the compressor in every column.
The provided compression level is compressor specific. The user would have to know what the available
levels are for the selected compressor. If the compressor does not allow for selecting different
compression levels, then this function will have no effect. Parquet and Arrow will not validate the
passed compression level. If no level is selected by the user or if the special compress.DefaultCompressionLevel
value is used, then parquet will select the compression level.
WithCompressionLevelFor is like WithCompressionLevel but only for the given column path.
WithCompressionLevelPath is the same as WithCompressionLevelFor but takes a ColumnPath
WithCompressionPath is the same as WithCompressionFor but takes a ColumnPath directly.
WithCreatedBy specifies the "created by" string to use for the writer
WithDataPageSize specifies the size to use for splitting data pages for column writing.
WithDataPageVersion specifies whether to use Version 1 or Version 2 of the DataPage spec
WithDecryptAadPrefix explicitly supplies the file aad prefix.
This is required when a prefix is used for file encryption but not stored in the file.
WithDecryptKey specifies the key to utilize for decryption
WithDictionaryDefault sets the default value for whether to enable dictionary encoding
WithDictionaryFor allows enabling or disabling dictionary encoding for a given column path string
WithDictionaryPageSizeLimit is the limit of the dictionary at which the writer
will fallback to plain encoding instead
WithDictionaryPath is like WithDictionaryFor, but takes a ColumnPath type
WithEncoding defines the encoding that is used when we aren't using dictionary encoding.
This is either applied if dictionary encoding is disabled, or if we fallback if the dictionary
grew too large.
WithEncodingFor is for defining the encoding only for a specific column path. This encoding will be used
if dictionary encoding is disabled for the column or if we fallback because the dictionary grew too large
WithEncodingPath is the same as WithEncodingFor but takes a ColumnPath directly.
WithEncryptedColumns sets the map of columns and their properties (keys etc.) If not called, then all columns will
be encrypted with the footer key. If called, then columns not in the map will be left unencrypted.
WithEncryptionProperties specifies the file level encryption handling for writing the file.
WithKey sets a column specific key.
If key is not set on an encrypted column, the column will be encrypted with the footer key.
The key length must be either 16, 24, or 32 bytes.
The key is cloned and will be wiped out (array values set to 0) upon completion of file writing.
The caller is responsible for wiping out the input key array.
WithKeyID is a convenience function to set the key metadata using a string ID.
It sets the key retrieval metadata (converted from the string); use either KeyMetadata or KeyID, not both.
The KeyID will be converted to metadata (a UTF-8 array).
WithKeyMetadata sets the key retrieval metadata, use either KeyMetadata or KeyID but not both
WithKeyRetriever sets a key retriever callback. It's also possible to set explicit footer or column keys.
WithMaxBloomFilterBytes sets the maximum size for a bloom filter, after which
it is abandoned and not written to the file.
WithMaxRowGroupLength specifies the number of rows as the maximum number of rows for a given row group in the writer.
WithMaxStatsSize sets a maximum size for the statistics before we decide not to include them.
WithPageIndexEnabled specifies the default value for whether or not to write page indexes for columns
WithPageIndexEnabledFor specifies a per column value as to enable or disable writing page indexes for the column
WithPageIndexEnabledPath is like WithPageIndexEnabledFor, but takes a ColumnPath
WithPlaintextAllowed allows reading plaintext files.
By default, reading plaintext (unencrypted) files is not allowed when using
a decryptor, in order to detect files that were left unencrypted by mistake.
This method overrides that default behavior.
WithPrefixVerifier supplies a verifier object to use for verifying the AAD Prefixes stored in the file.
WithRootName enables customization of the name used for the root schema node. This is required
to maintain compatibility with other tools.
WithRootRepetition enables customization of the repetition used for the root schema node.
This is required to maintain compatibility with other tools.
WithSortingColumns allow specifying the sorting columns in the written metadata.
If this is set, the user should ensure that records are sorted by these columns,
otherwise the sorting data will be inconsistent with the sorting_columns metadata.
WithStats specifies a default for whether or not to enable column statistics.
WithStatsFor specifies a per column value as to enable or disable statistics in the resulting file.
WithStatsPath is the same as WithStatsFor but takes a ColumnPath
WithStoreDecimalAsInteger specifies whether to try using an int32/int64 for storing
decimal data rather than fixed len byte arrays if the precision is low enough.
WithVersion specifies which Parquet Spec version to utilize for writing.
Package-Level Variables (total 10)
ByteArraySizeBytes is the number of bytes returned by reflect.TypeOf(ByteArray{}).Size()
ByteArrayTraits provides information about the ByteArray type, which is just an []byte
ColumnOrders contains constants for the Column Ordering fields
DefaultColumnOrder is to use TypeDefinedOrder
Encodings contains constants for the encoding types of the column data
The values used all correspond to the values in parquet.thrift for the
corresponding encoding type.
FixedLenByteArraySizeBytes is the number of bytes returned by reflect.TypeOf(FixedLenByteArray{}).Size()
FixedLenByteArrayTraits provides information about the FixedLenByteArray type which is just an []byte
Int96Traits provides information about the Int96 type
Repetitions contains the constants for Field Repetition Types
Types contains constants for the Physical Types that are used in the Parquet Spec
They can be specified when needed as such: `parquet.Types.Int32` etc. The values
all correspond to the values in parquet.thrift
Package-Level Constants (total 31)
Constants that will be used as the default values with encryption/decryption
constants for choosing the Aes Algorithm to use for encryption/decryption
constants for choosing the Aes Algorithm to use for encryption/decryption
constants for the parquet DataPage Version to use
constants for the parquet DataPage Version to use
Constants for default property values used for the default reader, writer and column props.
by default if you set the file decryption properties, we will error
on any plaintext files unless otherwise specified.
Constants for default property values used for the default reader, writer and column props.
Constants for default property values used for the default reader, writer and column props.
Constants for default property values used for the default reader, writer and column props.
Default Buffer size used for the Reader
Constants that will be used as the default values with encryption/decryption
Constants for default property values used for the default reader, writer and column props.
The default data page size limit is 1K; it's not guaranteed, but we will try to
cut data pages off at this size where possible.
Default is for dictionary encoding to be turned on, use WithDictionaryDefault
writer property to change that.
If the dictionary reaches the size of this limitation, the writer will use
the fallback encoding (usually plain) instead of continuing to build the
dictionary index.
By default we'll use AesGCM as our encryption algorithm
Constants for default property values used for the default reader, writer and column props.
Default maximum number of rows for a single row group
If the stats are larger than 4K, the writer will skip writing them out anyway.
Default is to not write page indexes for columns
Constants for default property values used for the default reader, writer and column props.
Default is to have stats enabled for all columns, use writer properties to
change the default, or to enable/disable for specific columns.
In order to attempt to facilitate data page size limits for writing,
data is written in batches. Increasing the batch size may improve performance
but the larger the batch size, the easier it is to overshoot the datapage limit.
Int96SizeBytes is the number of bytes that make up an Int96
Constants that will be used as the default values with encryption/decryption
Enable only pre-2.2 parquet format features when writing.
This is useful for maximum compatibility with legacy readers.
Note that logical types may still be emitted, as long as they have
a corresponding converted type.
Enable parquet format 2.4 and earlier features when writing.
This enables uint32 as well as logical types which don't have a
corresponding converted type.
Note: Parquet format 2.4.0 was released in October 2017
Enable Parquet format 2.6 and earlier features when writing.
This enables the nanos time unit in addition to the V2_4 features.
Note: Parquet format 2.6.0 was released in September 2018
Enable the latest parquet format 2.x features.
This is equal to the greatest 2.x version supported by this library.