package graphemes
Import Path
github.com/clipperhouse/uax29/v2/graphemes
Dependency Relation
imports 3 packages, and imported by one package
Involved Source Files
iterator.go
Package graphemes implements Unicode grapheme cluster boundaries: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
splitfunc.go
trie.go
Package-Level Type Names (total 2)
Type Parameters:
T: iterators.Stringish
Iterator *iterators.Iterator[T]
End returns the byte position after the current token in the original data.
Next advances the iterator to the next token. It returns false when there
are no remaining tokens or an error occurred.
Reset resets the iterator to the beginning of the data.
SetText sets the text for the iterator to operate on, and resets all state.
Split sets the SplitFunc for the Iterator.
Start returns the byte position of the current token in the original data.
Value returns the current token.
Iterator : github.com/apache/arrow-go/v18/arrow/compute/exec.ArrayIter[bool]
func FromBytes(b []byte) Iterator[[]byte]
func FromString(s string) Iterator[string]
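The methods listed above suggest a simple loop: call Next until it returns false, and read Value, Start, and End for each token. The sketch below illustrates how those methods relate, assuming a hypothetical stand-in type (runeIterator is not part of this package, and it yields one rune per token rather than one grapheme cluster). It uses only the standard library so that it runs on its own.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeIterator is a hypothetical stand-in with the same method shape as the
// documented Iterator. It yields one rune per token, NOT a grapheme cluster.
type runeIterator struct {
	data       string
	start, end int
}

// Next advances to the next token and reports whether one was found.
func (it *runeIterator) Next() bool {
	if it.end >= len(it.data) {
		return false
	}
	it.start = it.end
	_, size := utf8.DecodeRuneInString(it.data[it.start:])
	it.end = it.start + size
	return true
}

// Value returns the current token.
func (it *runeIterator) Value() string { return it.data[it.start:it.end] }

// Start returns the byte position of the current token.
func (it *runeIterator) Start() int { return it.start }

// End returns the byte position just after the current token.
func (it *runeIterator) End() int { return it.end }

// Reset moves the iterator back to the beginning of the data.
func (it *runeIterator) Reset() { it.start, it.end = 0, 0 }

func main() {
	s := "h\u00e9y" // U+00E9 occupies two bytes in UTF-8
	it := &runeIterator{data: s}
	for it.Next() {
		// For every token, Value() equals s[Start():End()].
		fmt.Printf("%q [%d,%d)\n", it.Value(), it.Start(), it.End())
	}
}
```

With the real package the loop has the same shape, starting from graphemes.FromString(s) or graphemes.FromBytes(b) instead of the stand-in.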
Scanner *bufio.Scanner
Buffer controls memory allocation by the Scanner.
It sets the initial buffer to use when scanning
and the maximum size of buffer that may be allocated during scanning.
The contents of the buffer are ignored.
The maximum token size must be less than the larger of max and cap(buf).
If max <= cap(buf), [Scanner.Scan] will use this buffer only and do no allocation.
By default, [Scanner.Scan] uses an internal buffer and sets the
maximum token size to [MaxScanTokenSize].
Buffer panics if it is called after scanning has started.
Bytes returns the most recent token generated by a call to [Scanner.Scan].
The underlying array may point to data that will be overwritten
by a subsequent call to Scan. It does no allocation.
Err returns the first non-EOF error that was encountered by the [Scanner].
Scan advances the [Scanner] to the next token, which will then be
available through the [Scanner.Bytes] or [Scanner.Text] method. It returns false when
there are no more tokens, either by reaching the end of the input or an error.
After Scan returns false, the [Scanner.Err] method will return any error that
occurred during scanning, except that if it was [io.EOF], [Scanner.Err]
will return nil.
Scan panics if the split function returns too many empty
tokens without advancing the input. This is a common error mode for
scanners.
Split sets the split function for the [Scanner].
The default split function is [ScanLines].
Split panics if it is called after scanning has started.
Text returns the most recent token generated by a call to [Scanner.Scan]
as a newly allocated string holding its bytes.
Scanner : github.com/apache/arrow-go/v18/internal/hashing.ByteSlice
func FromReader(r io.Reader) *Scanner
Package-Level Functions (total 3)
FromBytes returns an iterator for the grapheme clusters in the input bytes.
Iterate while Next() is true, and access the grapheme via Value().
FromReader returns a Scanner that splits the reader's input into grapheme clusters per
https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries.
It embeds a [bufio.Scanner], so you can use its methods.
Iterate through graphemes by calling Scan() until it returns false, then check Err().
FromString returns an iterator for the grapheme clusters in the input string.
Iterate while Next() is true, and access the grapheme via Value().
Package-Level Variables (only one)
SplitFunc is a bufio.SplitFunc implementation of Unicode grapheme cluster segmentation, for use with bufio.Scanner.
See https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries.
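To illustrate how a bufio.SplitFunc plugs into bufio.Scanner, here is a deliberately simplified sketch. The function splitBaseAndMarks is hypothetical and is not this package's SplitFunc: it only keeps nonspacing combining marks (Unicode category Mn) attached to the preceding rune, which approximates a small part of the UAX #29 rules. Real grapheme segmentation handles many more cases.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// splitBaseAndMarks is a hypothetical, heavily simplified bufio.SplitFunc.
// Each token is one rune plus any nonspacing marks (category Mn) after it.
func splitBaseAndMarks(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if len(data) == 0 {
		return 0, nil, nil // nothing buffered: ask for more, or stop at EOF
	}
	if !atEOF && !utf8.FullRune(data) {
		return 0, nil, nil // first rune is incomplete: ask for more data
	}
	_, pos := utf8.DecodeRune(data)
	for pos < len(data) {
		if !atEOF && !utf8.FullRune(data[pos:]) {
			return 0, nil, nil // next rune is incomplete: ask for more data
		}
		r, size := utf8.DecodeRune(data[pos:])
		if !unicode.Is(unicode.Mn, r) {
			return pos, data[:pos], nil // boundary before a non-mark rune
		}
		pos += size
	}
	if !atEOF {
		return 0, nil, nil // more marks could still follow
	}
	return pos, data[:pos], nil
}

// clusters collects the tokens produced by splitBaseAndMarks.
func clusters(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(splitBaseAndMarks)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	// "e" followed by U+0301 (combining acute accent), then "a".
	got := clusters("e\u0301a")
	fmt.Printf("%d tokens\n", len(got)) // prints "2 tokens"
}
```

The package's own SplitFunc would be passed to Split in the same way, or used implicitly through FromReader.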
The pages are generated with Golds v0.8.2. (GOOS=linux GOARCH=amd64)