So, what do those =?utf-8? and ?= sequences mean? Are they part of
the string or are they wrapped around the string on output as a way to
show that it's utf-8 encoded?
I'm having a real hard time trying to do anything to a string (?)
returned by mailbox.MaildirMessage.get().
On Sat, 6 May 2023 14:50:40 +0100, Chris Green <cl@isbd.net> wrote:
[snip]
> So, what do those =?utf-8? and ?= sequences mean? Are they part of
> the string or are they wrapped around the string on output as a way to
> show that it's utf-8 encoded?
Yes, "=?utf-8?" signals "MIME header encoding" (RFC 2047). The markers
are part of the string that get() returns, not something added on output.
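To make the wrapper concrete, here is a minimal sketch of what the standard library does with one such value. The sample header is made up for illustration; the general shape is =?charset?encoding?encoded-text?=, where the encoding is "q" (quoted-printable style) or "b" (base64).

```python
import email.header

# Hypothetical sample value: "Caf=C3=A9" is the Q-encoded UTF-8 form of "Café".
raw = "=?utf-8?q?Caf=C3=A9?="

# decode_header() strips the =?...?= wrapper and returns (bytes, charset) pairs.
pairs = email.header.decode_header(raw)
payload, charset = pairs[0]

# The payload is still bytes; decoding with the named charset gives a str.
text = payload.decode(charset)
print(pairs)
print(text)
```

Note that decode_header() only undoes the Q or base64 layer; turning the bytes into text is a separate step.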
I've only blundered about briefly in this area, but I think you
need to make sure that all header values you work with have been
decoded into ordinary Python strings before proceeding.
Here's the code that seemed to work for me:
import email.header

def mime_decode_single(pair):
    """Decode a single (bytestring, charset) pair.
    """
    b, charset = pair
    # decode_header() yields str when the header has no encoded words,
    # and bytes otherwise; fall back to UTF-8 when no charset is given.
    result = b if isinstance(b, str) else b.decode(
        charset if charset else "utf-8")
    return result

def mime_decode(s):
    """Decode a MIME-header-encoded character string.
    """
    decoded_pairs = email.header.decode_header(s)
    return "".join(mime_decode_single(d) for d in decoded_pairs)
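For comparison, the standard library can also do the joining step itself. This is only a sketch, not a replacement for the functions above: it builds a MaildirMessage from a made-up message string (so no real maildir is needed) and passes the decoded pairs to email.header.make_header(). The helper name decode_header_value is hypothetical.

```python
import email.header
import mailbox

# Hypothetical message text; in practice the message would come from a Maildir.
msg = mailbox.MaildirMessage("Subject: =?utf-8?q?Caf=C3=A9_menu?=\n\nbody\n")

def decode_header_value(value):
    """Decode any RFC 2047 encoded words in a header value (helper name is made up)."""
    # make_header() rebuilds a Header object from the (bytes, charset) pairs;
    # str() on that object returns the decoded text.
    return str(email.header.make_header(email.header.decode_header(value)))

raw_subject = msg.get("Subject")   # still contains the =?utf-8?...?= wrapper
subject = decode_header_value(raw_subject)
print(raw_subject)
print(subject)
```

One difference to be aware of: make_header() raises on a charset name Python does not recognise, so real mail may need a try/except around the call.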