package colltab
Import Path
golang.org/x/text/internal/colltab
Dependency Relation
imports 6 packages, and imported by one package
Involved Source Files
collelem.go
Package colltab contains functionality related to collation tables.
It is only to be used by the collate and search packages.
contract.go
iter.go
numeric.go
table.go
trie.go
weighter.go
Package-Level Type Names (total 7)
type ContractTrieSet ([])
Elem is a representation of a collation element. This API provides ways to encode
and decode Elems. Implementations of collation tables may use values greater
than or equal to PrivateUse for their own purposes. However, these should never be
returned by AppendNext.
CCC returns the canonical combining class associated with the underlying character,
if applicable, or 0 otherwise.
Mask sets weights for any level smaller than l to 0.
The resulting Elem can be used to test for equality with
other Elems to which the same mask has been applied.
Primary returns the primary collation weight for ce.
Quaternary returns the quaternary value if explicitly specified,
0 if ce == Ignore, or MaxQuaternary otherwise.
Quaternary values are used only for shifted variants.
Secondary returns the secondary collation weight for ce.
Tertiary returns the tertiary collation weight for ce.
Weight returns the collation weight for the given level.
func MakeElem(primary, secondary, tertiary int, ccc uint8) (Elem, error)
func MakeQuaternary(v int) Elem
func (*Table).AppendNext(w []Elem, b []byte) (res []Elem, n int)
func (*Table).AppendNextString(w []Elem, s string) (res []Elem, n int)
func Weighter.AppendNext(buf []Elem, s []byte) (ce []Elem, n int)
func Weighter.AppendNextString(buf []Elem, s string) (ce []Elem, n int)
An Iter incrementally converts chunks of the input text to collation
elements, while ensuring that the collation elements are in normalized order
(that is, they are in the order as if the input text were normalized first).
Elems []Elem
N is the number of elements in Elems that will not be reordered on
subsequent iterations, N <= len(Elems).
Weighter Weighter
Discard removes the collation elements up to N.
End returns the end position of the input text for which Next has returned
results.
Len returns the length of the input text.
Next appends Elems to the internal array. On each iteration, it will either
add starters or modifiers. In the majority of cases, an Elem with a primary
value > 0 will have a CCC of 0. The CCC values of collation elements are also
used to detect if the input string was not normalized and to adjust the
result accordingly.
Reset sets the position in the current input text to p and discards any
results obtained so far.
SetInput resets i to input s.
SetInputString resets i to input s.
*Iter : github.com/apache/arrow-go/v18/arrow/compute/exec.ArrayIter[bool]
Level identifies the collation comparison level.
The primary level corresponds to the basic sorting of text.
The secondary level corresponds to accents and related linguistic elements.
The tertiary level corresponds to casing and related concepts.
The quaternary level is derived from the other levels by the
various algorithms for handling variable elements.
func Elem.Mask(l Level) uint32
func Elem.Weight(l Level) int
const Identity
const NumLevels
const Primary
const Quaternary
const Secondary
const Tertiary
Table holds all collation data for a given collation ordering.
ContractElem []uint32
contraction info
expansion info
// main trie
MaxContractLen int
VariableTop uint32
(*Table) AppendNext(w []Elem, b []byte) (res []Elem, n int)
(*Table) AppendNextString(w []Elem, s string) (res []Elem, n int)
(*Table) Domain() []string
(*Table) Start(p int, b []byte) int
(*Table) StartString(p int, s string) int
(*Table) Top() uint32
*Table : Weighter
Index []uint16
// index for first byte (0xC0-0xFF)
Values []uint32
// index for first byte (0x00-0x7F)
A Weighter can be used as a source for Collator and Searcher.
AppendNext appends Elems to buf corresponding to the longest match
of a single character or contraction from the start of s.
It returns the new buf and the number of bytes consumed.
AppendNextString appends Elems to buf corresponding to the longest match
of a single character or contraction from the start of s.
It returns the new buf and the number of bytes consumed.
Domain returns a slice of all single characters and contractions for which
collation elements are defined in this table.
Start finds the start of the segment that includes position p.
StartString finds the start of the segment that includes position p.
Top returns the highest variable primary value.
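The AppendNext contract described above (append the Elems for the longest match, then report how many bytes were consumed) implies a simple draining loop. The following is a minimal sketch of that loop only. Because colltab is an internal package and cannot be imported from outside golang.org/x/text, the Elem type, the appendNexter interface, and the byteWeighter implementation below are hypothetical stand-ins that merely mirror the documented signature; a real Weighter would also handle contractions and multi-byte characters.

```go
package main

import "fmt"

// Elem is a hypothetical stand-in for colltab.Elem.
type Elem uint32

// appendNexter is a hypothetical one-method subset of Weighter,
// copied from the documented AppendNext signature.
type appendNexter interface {
	AppendNext(buf []Elem, s []byte) (ce []Elem, n int)
}

// byteWeighter is a toy implementation: it emits one Elem per byte
// and knows nothing about contractions or UTF-8.
type byteWeighter struct{}

func (byteWeighter) AppendNext(buf []Elem, s []byte) ([]Elem, int) {
	if len(s) == 0 {
		return buf, 0
	}
	return append(buf, Elem(s[0])), 1
}

// weigh drains s by calling AppendNext repeatedly and advancing
// the input by the number of bytes each call reports as consumed.
func weigh(w appendNexter, s []byte) []Elem {
	var buf []Elem
	for len(s) > 0 {
		var n int
		buf, n = w.AppendNext(buf, s)
		if n == 0 {
			break // guard against a weighter that consumes nothing
		}
		s = s[n:]
	}
	return buf
}

func main() {
	fmt.Println(weigh(byteWeighter{}, []byte("abc")))
}
```

Passing the same buf back into each call lets the weighter reuse the slice's capacity, which appears to be the reason the method takes and returns the buffer.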
*Table
func NewNumericWeighter(w Weighter) Weighter
func golang.org/x/text/collate.NewFromTable(w Weighter, o ...collate.Option) *collate.Collator
Package-Level Functions (total 4)
MakeElem returns an Elem for the given values. It will return an error
if the given combination of values is invalid.
MakeQuaternary returns an Elem with the given quaternary value.
MatchLang finds the index of t in tags, using a matching algorithm used for
collation and search. tags[0] must be language.Und, the remaining tags should
be sorted alphabetically.
Language matching for collation and search is different from the matching
defined by language.Matcher: the (inferred) base language must be an exact
match for the relevant fields. For example, "gsw" should not match "de".
Also the parent relation is different, as a parent may have a different
script. So usually the parent of zh-Hant is und, whereas for MatchLang it is
zh.
NewNumericWeighter wraps w, replacing the weights of individual digits so that
they sort based on their numeric value.
Weighter w must have a free primary weight after the primary weight for 9.
If this is not the case, numeric values will sort at the same primary level
as the first primary sorting after 9.
Package-Level Constants (total 9)
For normal collation elements, we assume that a collation element either has
a primary or non-default secondary value, not both.
Collation elements with a primary value are of the form
01pppppp pppppppp ppppppp0 ssssssss
- p* is primary collation value
- s* is the secondary collation value
00pppppp pppppppp ppppppps sssttttt, where
- p* is primary collation value
- s* offset of secondary from default value.
- t* is the tertiary collation value
100ttttt cccccccc pppppppp pppppppp
- t* is the tertiary collation value
- c* is the canonical combining class
- p* is the primary collation value
Collation elements with a secondary value are of the form
1010cccc ccccssss ssssssss tttttttt, where
- c* is the canonical combining class
- s* is the secondary collation value
- t* is the tertiary collation value
11qqqqqq qqqqqqqq qqqqqqq0 00000000
- q* quaternary value
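The bit diagrams above can be read as shift-and-mask operations. The sketch below packs and unpacks two of the documented forms, the "01" primary form and the "11" quaternary form, using only what the diagrams state: a 21-bit field that starts above the low 9 bits, plus an 8-bit secondary in the low byte of the primary form. The constant and function names are hypothetical and are not taken from the package source; the package's own encoding is built with MakeElem and MakeQuaternary.

```go
package main

import "fmt"

// Hypothetical constants derived from the bit diagrams only.
const (
	fieldShift    = 9          // the 21-bit field sits above the low 9 bits
	fieldMask     = 1<<21 - 1  // 21 bits, the same width as MaxQuaternary
	tagPrimary    = 0x40000000 // leading bits 01
	tagQuaternary = 0xC0000000 // leading bits 11
)

// packPrimary builds 01pppppp pppppppp ppppppp0 ssssssss.
func packPrimary(primary, secondary uint32) uint32 {
	return tagPrimary | (primary&fieldMask)<<fieldShift | secondary&0xFF
}

// unpackPrimary extracts the primary and secondary fields of that form.
func unpackPrimary(ce uint32) (primary, secondary uint32) {
	return (ce >> fieldShift) & fieldMask, ce & 0xFF
}

// packQuaternary builds 11qqqqqq qqqqqqqq qqqqqqq0 00000000.
func packQuaternary(q uint32) uint32 {
	return tagQuaternary | (q&fieldMask)<<fieldShift
}

// unpackQuaternary extracts the 21-bit quaternary field.
func unpackQuaternary(ce uint32) uint32 {
	return (ce >> fieldShift) & fieldMask
}

func main() {
	ce := packPrimary(0x1234, 0x20)
	p, s := unpackPrimary(ce)
	fmt.Printf("%032b primary=%#x secondary=%#x\n", ce, p, s)

	q := packQuaternary(fieldMask)
	fmt.Printf("%032b quaternary=%d\n", q, unpackQuaternary(q))
}
```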
const MaxQuaternary = 2097151 // 21 bits.
const PrivateUse = 3221225472
const Quaternary Level = 3
The pages are generated with Golds v0.8.2. (GOOS=linux GOARCH=amd64)