Maybe a bug or compatibility issue with MIME.decode_words_text_remapped


郭雪松-2
I received an email with the following headers:
Subject: =?utf-8?B?6Z+p5b2p6Iux5LiK5ryU5aSn5bC65bqm5bqK5oiPIOS9n+Wkp+S4uuiHquab?=
=?utf-8?B?neeOqeaequiAjeW4heinhumikQ==?=
X-Mailer: PHPMailer [version 1.73]
...
When decoding the subject, MIME.decode_words_text_remapped complains: Error decoding [0x9d]"\347\216\251\346\236\252\350\200\215\345\270\205\350\247\206\351\242\221" using utf8: Invalid byte.
The reason is that PHPMailer split a UTF-8 character into two parts, and MIME.encode_words_text inserted a space between them.
I don't know which side (Pike MIME or PHPMailer) conforms to the standard, but hotmail.com can decode this mail correctly.
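The diagnosis above can be checked outside Pike. The following is a minimal Python sketch (Python is used only for illustration, and `word_payload` is a hypothetical helper, not part of Pike's MIME module) that base64-decodes each encoded-word of the Subject header on its own and attempts strict UTF-8 decoding of each.

```python
import base64

# The two encoded-words from the Subject header above (B encoding, UTF-8).
WORDS = [
    "=?utf-8?B?6Z+p5b2p6Iux5LiK5ryU5aSn5bC65bqm5bqK5oiPIOS9n+Wkp+S4uuiHquab?=",
    "=?utf-8?B?neeOqeaequiAjeW4heinhumikQ==?=",
]


def word_payload(word: str) -> tuple[str, bytes]:
    """Return (charset, raw bytes) for a B-encoded word; Q encoding is not handled."""
    charset, encoding, text = word[2:-2].split("?")  # strip "=?" and "?="
    if encoding.upper() != "B":
        raise ValueError("this sketch only handles B (base64) encoding")
    return charset.lower(), base64.b64decode(text)


for word in WORDS:
    charset, raw = word_payload(word)
    try:
        raw.decode(charset)  # strict decoding of this word in isolation
        print("decodes alone:", raw.hex(" "))
    except UnicodeDecodeError as exc:
        print(f"fails alone ({exc.reason}) at byte {raw[exc.start]:#04x}")
```

Neither word decodes on its own: the first word's payload ends with the first two bytes (0xe6 0x9b) of a three-byte character, and the second word's payload starts with that character's last byte, 0x9d, which is the byte reported in the Pike error. Concatenating the two payloads yields valid UTF-8.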


Guo Xuesong

Re: Maybe a bug or compatibility issue with MIME.decode_words_text_remapped

Henrik Grubbström-2
On Tue, 13 Dec 2011, 郭雪松 wrote:

> I received an email with the following headers:
> Subject: =?utf-8?B?6Z+p5b2p6Iux5LiK5ryU5aSn5bC65bqm5bqK5oiPIOS9n+Wkp+S4uuiHquab?=
> =?utf-8?B?neeOqeaequiAjeW4heinhumikQ==?=
> X-Mailer: PHPMailer [version 1.73]
> ...
> When decoding the subject, MIME.decode_words_text_remapped complains: Error decoding [0x9d]"\347\216\251\346\236\252\350\200\215\345\270\205\350\247\206\351\242\221" using utf8: Invalid byte.
> The reason is that PHPMailer split a UTF-8 character into two parts, and MIME.encode_words_text inserted a space between them.
> I don't know which side (Pike MIME or PHPMailer) conforms to the standard, but hotmail.com can decode this mail correctly.

As far as I can see it's an invalid split, since each encoded-word should
be individually decodable (you can e.g. change character set between
encoded-words). This is also what RFC 2047 says:

    Each 'encoded-word' MUST represent an integral number of characters.
    A multi-octet character may not be split across adjacent 'encoded-
    word's.

The reason it works for most people is probably that they
stay in UTF-8 space, and thus don't see the broken character.
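On the sending side, the RFC 2047 rule quoted above means an encoder has to cut encoded-words at character boundaries. A minimal Python sketch of such a splitter follows, assuming UTF-8 with B encoding only and the 75-character limit RFC 2047 places on a single encoded-word; `encode_words_utf8` is a hypothetical name, not PHPMailer's or Pike's API.

```python
import base64

MAX_WORD_LEN = 75  # RFC 2047: an encoded-word may be at most 75 characters long


def encode_words_utf8(text: str, max_len: int = MAX_WORD_LEN) -> list[str]:
    """B-encode text as UTF-8 encoded-words, breaking only between characters."""
    prefix, suffix = "=?utf-8?B?", "?="
    budget = max_len - len(prefix) - len(suffix)  # base64 characters available
    words: list[str] = []
    chunk = b""
    for ch in text:
        piece = ch.encode("utf-8")  # all bytes of one character stay together
        candidate = chunk + piece
        # base64 output length for n bytes is 4 * ceil(n / 3)
        if chunk and 4 * -(-len(candidate) // 3) > budget:
            words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
            candidate = piece  # start the next word with this whole character
        chunk = candidate
    if chunk:
        words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
    return words


for word in encode_words_utf8("韩彩英上演大尺度床戏 佟大为自曝玩枪耍帅视频"):
    print(word)
```

With the 75-character limit, at most 45 bytes fit in one word (12 characters for `=?utf-8?B?` and `?=`, plus 60 base64 characters). The first word PHPMailer produced carries exactly 45 bytes but ends in the middle of a character; this sketch instead closes the word after the last whole character that fits, so for this subject the first word ends after 自 and the second begins with 曝.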

I've now adjusted MIME.decode_words_text() (and thus also
MIME.decode_words_text_remapped()) to join adjacent decoded
words that have the same character set. This seems to fix
at least your example case.

Fixed in Pike 7.8 and 7.9.
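The joining approach described above can be sketched in Python as follows. This illustrates the idea and is not the actual Pike implementation: `decode_header_joined` is a hypothetical name, charset handling is simplified, and the sketch concatenates the raw bytes of adjacent encoded-words that share a charset (dropping the whitespace between them, which RFC 2047 says is ignored) before converting bytes to text.

```python
import base64
import binascii
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_header_joined(value: str) -> str:
    """Decode encoded-words, joining adjacent same-charset words before decoding."""
    segments = []  # (charset, bytes) for encoded-words, (None, str) for plain text
    pos = 0
    for m in ENCODED_WORD.finditer(value):
        between = value[pos:m.start()]
        # Whitespace between two adjacent encoded-words is ignored (RFC 2047, 6.2).
        if between and not (between.isspace() and segments and segments[-1][0]):
            segments.append((None, between))
        charset, enc, text = m.group(1).lower(), m.group(2).upper(), m.group(3)
        if enc == "B":
            raw = base64.b64decode(text)
        else:
            raw = binascii.a2b_qp(text, header=True)  # Q: "_" means space
        if segments and segments[-1][0] == charset:
            # Same charset as the previous encoded-word: append its bytes.
            segments[-1] = (charset, segments[-1][1] + raw)
        else:
            segments.append((charset, raw))
        pos = m.end()
    if value[pos:]:
        segments.append((None, value[pos:]))
    # Only now convert bytes to text, so split characters are reassembled first.
    return "".join(part if cs is None else part.decode(cs) for cs, part in segments)


subject = ("=?utf-8?B?6Z+p5b2p6Iux5LiK5ryU5aSn5bC65bqm5bqK5oiPIOS9n+Wkp+S4uuiHquab?=\r\n"
           " =?utf-8?B?neeOqeaequiAjeW4heinhumikQ==?=")
print(decode_header_joined(subject))
```

Because the bytes are joined first, the character split across the two words in the original report is reassembled before UTF-8 decoding. Text outside encoded-words, including whitespace next to it, is kept as-is.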

> Guo Xuesong

Thanks for the report,

--
Henrik Grubbström
Roxen Internet Software AB

RE: Maybe a bug or compatibility issue with MIME.decode_words_text_remapped

郭雪松-2
Fetching from git, I got Pike-v7.9.5.
Where can I get the fixed Pike-v7.8?

Guo Xuesong
 
> Date: Tue, 13 Dec 2011 23:24:42 +0100
> Subject: Re: Maybe a bug or compatiable issue about MIME.decode_words_text_remapped

RE: Maybe a bug or compatibility issue with MIME.decode_words_text_remapped

Henrik Grubbström-2
On Wed, 14 Dec 2011, Guo Xuesong wrote:

> Fetching from git, I got Pike-v7.9.5.
> Where can I get the fixed Pike-v7.8?

It's in the same git repository. Try checking out the 7.8 branch:

   git checkout 7.8

--
Henrik Grubbström
Roxen Internet Software AB