euc_cn decoder Segmentation fault

郭雪松-2
[root@brain brain4]$cat test_charset.pike
void main()
{
   for(int i=0;i<256;i++){
        for(int j=0;j<256;j++){
                catch{
                        Locale.Charset.decoder("euc_cn")->feed(sprintf("%c%c",i,j))->drain();
                };
        }
   }
}
[root@brain brain4]$pike test_charset.pike
Segmentation fault
[root@brain brain4]$pike --version
Pike v7.8 release 352 Copyright © 1994-2009 Linköping University
Pike comes with ABSOLUTELY NO WARRANTY; This is free software and you are
welcome to redistribute it under certain conditions; read the files COPYING and COPYRIGHT in the Pike distribution for more details.
[root@brain brain4]$uname -a
Linux brain 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

As far as I know, Pike's Locale.Charset has other bugs. For example, MIME.Message uses Locale.Charset.decoder("gb2312")
to decode a message whose charset is gb2312, but Locale.Charset.decoder("gb2312") does not accept any Latin-1 characters,
even though gb2312 messages do mix Latin-1 characters with GB 2312 characters.
 
I wonder why iconv is not used to implement Locale.Charset?

iconv is reliable.
 
Guo Xuesong

Re: euc_cn decoder Segmentation fault

Henrik Grubbström-2
On Mon, 21 Nov 2011, 郭雪松 wrote:

> [root@brain brain4]$cat test_charset.pike
[...]
> [root@brain brain4]$pike test_charset.pike
> Segmentation fault

Thanks, fixed in Pike 7.9. The bug was that the EUC decoder assumed that
the second and third tables were present.
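
The actual change to Pike is not shown in this thread. As a rough illustration only, here is a toy EUC decoder in Python (not Pike, and not the Pike implementation) with the kind of guard described above. The function name, the dict-based tables and the code widths after the single-shift bytes (one byte after 0x8E, two after 0x8F, as in EUC-JP) are assumptions made for this sketch.

```python
from typing import Dict, Optional

# Hypothetical table type: maps the raw code bytes to one character.
Table = Optional[Dict[bytes, str]]

SS2, SS3 = 0x8E, 0x8F   # EUC single-shift bytes selecting code sets 2 and 3
REPLACEMENT = "\ufffd"


def decode_euc(data: bytes, g1: Table, g2: Table = None, g3: Table = None) -> str:
    """Toy EUC decoder; g2 and g3 may be absent (as for EUC-CN)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # code set 0: plain ASCII
            out.append(chr(b))
            i += 1
            continue
        if b == SS2:                 # assumed width: 1 byte after SS2
            table, start, width = g2, i + 1, 1
        elif b == SS3:               # assumed width: 2 bytes after SS3
            table, start, width = g3, i + 1, 2
        else:                        # code set 1: two bytes, no prefix
            table, start, width = g1, i, 2
        if table is None:
            # The guard: never index a table that this encoding lacks.
            out.append(REPLACEMENT)
            i += 1
            continue
        ch = table.get(data[start:start + width])
        if ch is None:               # unmapped or truncated code
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(ch)
            i = start + width
    return "".join(out)


# An EUC-CN-like setup: only the first table exists.
g1_only = {b"\xd6\xd0": "\u4e2d"}
result = decode_euc(b"A\xd6\xd0\x8eB", g1_only)
```

With only the first table supplied, the 0x8E byte yields a replacement character instead of a lookup in a missing table.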

> As far as I know, Pike's Locale.Charset has other bugs. For example,
> MIME.Message uses Locale.Charset.decoder("gb2312") to decode a message
> whose charset is gb2312, but Locale.Charset.decoder("gb2312") does not
> accept any Latin-1 characters, even though gb2312 messages do mix
> Latin-1 characters with GB 2312 characters.

Actually, according to the ISO registrar for ISO-IR standards, GB 2312-80
(aka ISO-IR 58) is just the 94x94 character set. It seems, however, that
"GB 2312-80" is often used as an alias for EUC-CN (i.e. GB 2312-80 encoded
according to EUC). This is also what http://en.wikipedia.org/wiki/GB_2312 claims.

I've now changed GB 2312-80 to an alias for EUC-CN. Thanks.
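
To make the distinction concrete, here is a small Python sketch (not Pike, and not how Locale.Charset is implemented). In the 94x94 set a character is identified by a row and a cell, each in 1..94. EUC-CN stores it as the two bytes 0xA0 + row and 0xA0 + cell, which leaves bytes below 0x80 free for ASCII. The helper name kuten_to_euc is made up for this example; "euc_cn" and "gb2312" are both names Python accepts for the same built-in codec.

```python
def kuten_to_euc(row: int, cell: int) -> bytes:
    """Encode a GB 2312 row/cell pair (each 1..94) as two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0xA0 + row, 0xA0 + cell])


# Row 54, cell 48 of GB 2312 is the character U+4E2D.
zhong = kuten_to_euc(54, 48)

# ASCII bytes pass through unchanged next to two-byte GB 2312 characters.
mixed = b"Pike " + zhong + kuten_to_euc(46, 36)
text = mixed.decode("euc_cn")
```

Decoding the same bytes under the name "gb2312" gives the same string in Python, which matches the common usage where the two names refer to one encoding.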

> I wonder why iconv is not used to implement Locale.Charset?

AFAIK: Because iconv wasn't available at the time the module was written.
It is also useful to know that support for certain encodings is present.

> iconv is reliable
>
> Guo Xuesong

Thanks,

--
Henrik Grubbström
Roxen Internet Software AB