• What do these '=?utf-8?' sequences mean in Python?

    From Chris Green@21:1/5 to All on Sat May 6 14:50:40 2023
    I'm having a real hard time trying to do anything to a string (?)
    returned by mailbox.MaildirMessage.get().

    I'm extracting the Subject: header from a message and, if I write what
    it returns to a log file using the Python logging module, what I see
    in the log file (when the Subject: has non-ASCII characters in it) is:-

    =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=

    Whatever I try I am unable to change the underscore characters in the
    above string back to spaces.


    So, what do those =?utf-8? and ?= sequences mean? Are they part of
    the string or are they wrapped around the string on output as a way to
    show that it's utf-8 encoded?

    If I have the string in a variable how do I replace the underscores
    with spaces? Simply doing "subject.replace('_', ' ')" doesn't work,
    nothing happens at all.

    All I really want to do is throw the non-ASCII characters away as the
    string I'm trying to match in the subject is guaranteed to be ASCII.

    --
    Chris Green

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter Pearson@21:1/5 to Chris Green on Sat May 6 15:10:05 2023
    On Sat, 6 May 2023 14:50:40 +0100, Chris Green <cl@isbd.net> wrote:
    > [snip]
    > So, what do those =?utf-8? and ?= sequences mean? Are they part of
    > the string or are they wrapped around the string on output as a way to
    > show that it's utf-8 encoded?

    Yes, "=?utf-8?" signals "MIME header encoding".

    I've only blundered about briefly in this area, but I think you
    need to make sure that all header values you work with have been
    converted to UTF-8 before proceeding.
    Here's the code that seemed to work for me:

    import email.header

    def mime_decode_single(pair):
        """Decode a single (bytestring, charset) pair."""
        b, charset = pair
        # decode_header() yields str for unencoded parts, bytes otherwise.
        return b if isinstance(b, str) else b.decode(charset or "utf-8")

    def mime_decode(s):
        """Decode a MIME-header-encoded character string."""
        decoded_pairs = email.header.decode_header(s)
        return "".join(mime_decode_single(d) for d in decoded_pairs)
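
To make the structure concrete: an RFC 2047 "encoded word" has the form =?charset?encoding?text?=, where the encoding is Q (similar to quoted-printable, with "_" standing for a space and =XX for a hex byte) or B (base64). The markers are therefore part of the header value itself, not something added on output. Below is a minimal sketch of what email.header.decode_header() returns for the subject from the original post; the variable names are only illustrative.

```python
from email.header import decode_header

subject = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="

# decode_header() returns a list of (payload, charset) pairs.
pairs = decode_header(subject)
payload, charset = pairs[0]

# For an encoded word the payload is bytes, so it still has to be
# decoded with the declared charset to obtain a str.
text = payload.decode(charset)
print(text)
```

Note that the underscores are already turned back into spaces at the bytes stage, so no manual replace() is needed once the header has been decoded.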



    --
    To email me, substitute nowhere->runbox, invalid->com.

  • From Chris Green@21:1/5 to Chris Green on Sat May 6 15:58:15 2023
    Chris Green <cl@isbd.net> wrote:
    > I'm having a real hard time trying to do anything to a string (?)
    > returned by mailbox.MaildirMessage.get().

    What a twit I am :-)

    Strings are immutable, I have to do:-

    newstring = oldstring.replace("_", " ")

    Job done!
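
A short sketch of both points, assuming the goal stated earlier in the thread (matching a plain ASCII string in the subject). str.replace() returns a new string and leaves the original untouched, but replacing underscores alone still leaves the =C3=A0 style escapes in place. Decoding the header first and then discarding non-ASCII characters handles both. The helper name ascii_only is hypothetical, and falling back to ASCII when no charset is given is an assumption.

```python
from email.header import decode_header

raw = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="

# str.replace() never modifies raw; it returns a new string.
replaced = raw.replace("_", " ")

def ascii_only(header_value):
    """Decode a MIME-encoded header, then drop any non-ASCII characters."""
    parts = []
    for payload, charset in decode_header(header_value):
        if isinstance(payload, bytes):
            # Assumption: treat a missing charset as plain ASCII.
            payload = payload.decode(charset or "ascii", errors="replace")
        parts.append(payload)
    decoded = "".join(parts)
    # Encoding with errors="ignore" silently discards non-ASCII characters.
    return decoded.encode("ascii", "ignore").decode("ascii")

cleaned = ascii_only(raw)
print("Waterways Continental Europe" in cleaned)
```

Dropping characters this way can leave doubled spaces or shortened words, which is harmless for substring matching but not suitable for display.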

    --
    Chris Green
  • From jak@21:1/5 to All on Sat May 6 18:27:23 2023
    Peter Pearson wrote:
    > [snip]
    > Here's the code that seemed to work for me:
    > [snip]

    Hi,
    You could also use make_header:

    from email.header import decode_header, make_header

    print(make_header(decode_header(subject)))
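
One detail worth spelling out: make_header() returns an email.header.Header object rather than a str. print() works because it calls str() on the object, but string methods and substring tests need an explicit str() conversion first. A minimal sketch using the subject from the original post; the variable names are illustrative only.

```python
from email.header import Header, decode_header, make_header

raw_subject = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="

# make_header() combines the (payload, charset) pairs into one Header object.
header = make_header(decode_header(raw_subject))

# Convert to a plain str before using string methods on it.
subject = str(header)

print(subject)
print("Waterways" in subject)
```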
