On Mon, 21 Nov 2011, Guo Xuesong wrote:
> [root@brain brain4]$cat test_charset.pike
[...]
> [root@brain brain4]$pike test_charset.pike
> Segmentation fault
Thanks, fixed in Pike 7.9. The bug was that the EUC decoder assumed that
the second and third tables were present.
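To illustrate the class of bug (this is not the actual Pike code, just a minimal Python sketch): an EUC decoder selects code set 2 or 3 when it sees the single-shift bytes 0x8E (SS2) or 0x8F (SS3). If the charset defines no tables for those code sets, the decoder must check for that before indexing. The function name `decode_euc`, the dict-based tables and the byte counts after SS2/SS3 (one and two, as in EUC-JP) are assumptions made for the example.

```python
SS2, SS3 = 0x8E, 0x8F  # single-shift bytes selecting code sets 2 and 3
REPLACEMENT = "\ufffd"  # emitted for anything that cannot be decoded


def decode_euc(data, g1, g2=None, g3=None):
    """Toy EUC decoder. Tables are dicts keyed by the raw trailing bytes.

    g2 and g3 are optional: a charset such as EUC-CN only defines g1,
    so a missing table must yield a replacement character, not a crash.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Code set 0: plain ASCII.
            out.append(chr(b))
            i += 1
        elif b == SS2:
            # Code set 2: assumed to be one byte after SS2.
            key = data[i + 1:i + 2]
            out.append(g2.get(key, REPLACEMENT) if g2 is not None else REPLACEMENT)
            i += 2
        elif b == SS3:
            # Code set 3: assumed to be two bytes after SS3.
            key = data[i + 1:i + 3]
            out.append(g3.get(key, REPLACEMENT) if g3 is not None else REPLACEMENT)
            i += 3
        elif 0xA1 <= b <= 0xFE:
            # Code set 1: two bytes, both with the high bit set.
            key = data[i:i + 2]
            out.append(g1.get(key, REPLACEMENT))
            i += 2
        else:
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


# Hypothetical one-entry table standing in for the real 94x94 set.
g1 = {b"\xd6\xd0": "\u4e2d"}
result_plain = decode_euc(b"A\xd6\xd0", g1)
result_shift = decode_euc(b"\x8e\xb1\x8f\xa1\xa1", g1)
```

With only `g1` supplied, input containing SS2 or SS3 decodes to replacement characters instead of dereferencing a table that is not there.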
> As I know pike Locale.Charset has other bugs. for example MIME.Message
> use Locale.Charset.decoder("gb2312")
> to decode message that has a charset of gb2312,
> Locale.Charset.decoder("gb2312") not accept any latin1 chars,
> but a gb2312 message DO has latin1 chars mixed with gb2312 chars.
Actually, according to the ISO registrar for ISO-IR standards, GB 2312-80
(aka ISO-IR 58) is just the 94x94 character set. It seems, however, that
GB 2312-80 is often used as an alias for EUC-CN (i.e. GB 2312-80 encoded
according to EUC). This is also what http://en.wikipedia.org/wiki/GB_2312
claims.
I've now changed GB 2312-80 to an alias for EUC-CN. Thanks.
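For anyone following along, the difference can be sketched in a few lines of Python (used here only because it is easy to check; this is not how Locale.Charset is implemented). The bare 94x94 set addresses a character by row and column, while EUC-CN adds 0xA0 to each and leaves bytes below 0x80 as ASCII. The helper `kuten_to_euc_cn` is a name made up for the example. Python's own "gb2312" codec happens to behave as EUC-CN, which matches the alias described above.

```python
def kuten_to_euc_cn(row, col):
    """Map a GB 2312-80 row/column pair (1..94 each) to its EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    # EUC puts the 94x94 set in the high half: both bytes get 0xA0 added.
    return bytes((0xA0 + row, 0xA0 + col))


# U+4E2D sits at row 54, column 48 of GB 2312-80.
raw = kuten_to_euc_cn(54, 48)

# ASCII bytes and double-byte characters can be mixed freely in EUC-CN.
mixed = b"Pike " + raw
text = mixed.decode("gb2312")  # Python's "gb2312" codec accepts ASCII too
```

Here `raw` is the byte pair 0xD6 0xD0 and `text` is the string "Pike " followed by U+4E2D, which is the mixed case from the original report.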
> I wonder why not use iconv to implement Locale.Charset ?
AFAIK because iconv wasn't available at the time the module was written.
Having our own implementation also means that we know that support for
certain encodings is present.
> iconv is reliable
>
> Guo Xuesong
Thanks,
--
Henrik Grubbström
Roxen Internet Software AB