Charmap is a structure for setting up encodings for 8-bit character sets,
for transforming between UTF-8 and that character set. It has some
ideas borrowed from golang.org/x/text/encoding/charmap, but it uses a
different implementation. This implementation uses maps, and supports
user-defined maps.
We assume that a character map has a reasonable substitution character,
and that valid encodings are stable (exactly a 1:1 map) and stateless
(that is, there is no shift character or anything like that). Hence this
approach will not work for many East Asian character sets.
Measurement shows little or no measurable difference in the performance of
the two approaches. The difference was down to a couple of nsec/op, and
no consistent pattern as to which ran faster. With the conversion to
UTF-8 the code takes about 25 nsec/op. The conversion in the reverse
direction takes about 100 nsec/op. (The larger cost for conversion
from UTF-8 is most likely due to the need to convert the UTF-8 byte stream
to a rune before conversion.)
The map between bytes and runes. To indicate that a specific
byte value is invalid for a character set, use the rune
utf8.RuneError. Values that are absent from this map will
be assumed to have the identity mapping -- that is the default
is to assume ISO8859-1, where all 8-bit characters have the same
numeric value as their Unicode runes. (Not to be confused with
the UTF-8 values, which *will* be different for non-ASCII runes.)
If no values less than RuneSelf are changed (or have non-identity
mappings), then the character set is assumed to be an ASCII
superset, and certain assumptions and optimizations become
available for ASCII bytes.
transform.NopResetter
The ReplacementChar is the byte value to use for substitution.
It should normally be ASCIISub for ASCII encodings. This may be
unset (left to zero) for mappings that are strictly ASCII supersets.
In that case ASCIISub will be assumed instead.
Init initializes internal values of a character map. This should
be done early, to minimize the cost of allocation of transforms
later. It is not strictly necessary however, as the allocation
functions will arrange to call it if it has not already been done.
NewDecoder returns a Decoder that converts from the 8-bit
character set to UTF-8. Unknown mappings, if any, are mapped
to '\uFFFD'.
NewEncoder returns a Transformer that converts from UTF-8 to the
8-bit character set. Unknown mappings are mapped to 0x1A.
Reset implements the Reset method of the Transformer interface.
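The substitution behaviour of the encoding direction can be sketched as below. This is a hedged illustration, assuming the reverse lookup is a map from rune to byte derived by inverting the decode table; buildEncodeMap, encodeRune and asciiSub are hypothetical names, and the package's real internals may differ.

```go
package main

import "fmt"

// asciiSub mirrors the 0x1A substitution byte named in the text.
const asciiSub = 0x1A

// buildEncodeMap inverts a 256-entry decode table.
// Entries equal to U+FFFD mark invalid bytes and are skipped.
func buildEncodeMap(table [256]rune) map[rune]byte {
	enc := make(map[rune]byte, len(table))
	for i, r := range table {
		if r != '\uFFFD' {
			enc[r] = byte(i)
		}
	}
	return enc
}

// encodeRune returns the byte for r, or sub when r has no mapping.
func encodeRune(enc map[rune]byte, r rune, sub byte) byte {
	if b, ok := enc[r]; ok {
		return b
	}
	return sub
}

func main() {
	var table [256]rune
	for i := range table {
		table[i] = rune(i) // identity (ISO8859-1) table
	}
	enc := buildEncodeMap(table)
	// U+00E9 has a mapping; U+20AC does not and falls back to asciiSub.
	fmt.Printf("%#x %#x\n", encodeRune(enc, '\u00e9', asciiSub), encodeRune(enc, '\u20ac', asciiSub))
}
```

Because the mapping is assumed to be exactly 1:1, inverting the table never produces conflicting entries for valid bytes.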
*Charmap : golang.org/x/text/encoding.Encoding
Package-Level Variables (total 5)
ASCII represents the 7-bit US-ASCII scheme. It decodes directly to
UTF-8 without change, as all ASCII values are legal UTF-8.
Unicode values less than 128 (i.e. 7 bits) map 1:1 with ASCII.
It encodes runes outside of that to 0x1A, the ASCII substitution character.
EBCDIC represents the 8-bit EBCDIC scheme, found in some mainframe
environments. If you don't know what this is, consider yourself lucky.
ISO8859_1 represents the 8-bit ISO8859-1 scheme. It decodes to UTF-8
without a lookup table, as every ISO8859-1 byte has the same numeric value
as its Unicode rune (bytes of 0x80 and above still become two-byte UTF-8).
Unicode values less than 256 (i.e. 8 bits) map 1:1 with 8859-1.
It encodes runes outside of that to 0x1A, the ASCII substitution character.
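As a small illustration of the 1:1 relationship between ISO8859-1 bytes and Unicode values, the sketch below decodes by converting each byte to the rune with the same number. The function latin1ToUTF8 is a hypothetical helper written for this example, not part of the package.

```go
package main

import "fmt"

// latin1ToUTF8 decodes ISO8859-1 bytes by turning each byte into the
// rune with the same numeric value, then encoding the runes as UTF-8.
func latin1ToUTF8(src []byte) string {
	runes := make([]rune, len(src))
	for i, b := range src {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	s := latin1ToUTF8([]byte{'c', 'a', 'f', 0xE9})
	// Print the text and its UTF-8 bytes in hex.
	fmt.Printf("%s % x\n", s, s)
}
```

The rune values match the input bytes, but the UTF-8 output is one byte longer than the input because 0xE9 is encoded as two bytes.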
ISO8859_9 represents the 8-bit ISO8859-9 scheme.
UTF8 is an encoding for UTF-8. All it does is verify that the UTF-8
input is valid. The main reason for its existence is that it will detect
and report ErrShortSrc or ErrShortDst, whereas the Nop encoding just
passes every byte, blithely.
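The difference between a truncated and an invalid sequence can be sketched with the standard unicode/utf8 package. This is an assumption-laden illustration, not the package's code: checkUTF8, errShort and errInvalid are hypothetical stand-ins, and the destination-buffer side is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Hypothetical stand-ins for the errors a real transformer would report.
var (
	errShort   = errors.New("source ends inside a UTF-8 sequence")
	errInvalid = errors.New("invalid UTF-8")
)

// checkUTF8 returns the number of leading bytes that form complete,
// valid UTF-8, plus an error describing why scanning stopped early.
func checkUTF8(src []byte) (n int, err error) {
	for n < len(src) {
		if !utf8.FullRune(src[n:]) {
			return n, errShort // more input might complete the rune
		}
		r, size := utf8.DecodeRune(src[n:])
		if r == utf8.RuneError && size == 1 {
			return n, errInvalid // a byte that can never be valid here
		}
		n += size
	}
	return n, nil
}

func main() {
	for _, src := range [][]byte{[]byte("ok"), {0xE2, 0x82}, {'a', 0xFF}} {
		n, err := checkUTF8(src)
		fmt.Println(n, err)
	}
}
```

A pass-through encoding would copy all three inputs unchanged, while a validating one can ask the caller for more bytes in the second case and reject the third.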
Package-Level Constants (total 3)
ASCIISub is the ASCII substitution character.
RuneError is an alias for the UTF-8 replacement rune, '\uFFFD'.
RuneSelf is the rune below which UTF-8 and the Unicode values are
identical. It's also the limit for ASCII.