• Python 3.7+ cannot print unicode characters when output is redirected t

    From Jessica Smith@21:1/5 to All on Sun Nov 13 08:49:40 2022
    Consider the following code run in PowerShell or cmd.exe:

    $ python -c "print('└')"


    $ python -c "print('└')" > test_file.txt
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>

    Is this a known limitation of Windows + Unicode? I understand that
    using -X utf8 would fix this, or modifying various environment
    variables. But is this expected for a standard Python installation on
    Windows?

    Jessica

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Jessica Smith on Sun Nov 13 15:49:10 2022
    Jessica Smith <12jessicasmith34@gmail.com> writes:
    $ python -c "print('└')"

    $ python -c "print('└')" > test_file.txt
    Traceback (most recent call last):

    I can't answer your questions, but here's an observation
    from a Microsoft® Windows installation:

    (Sorry, I was not able to limit the length of some of the
    following lines to 72.)

    |C:\>chcp 65001
    |Active code page: 65001

    The user has set the code page of his cmd session to "65001".

    |C:\>py -c "import os,sys; print( sys.stdout.isatty(), os.device_encoding( sys.stdout.fileno() ), file=sys.stderr )"
    |True cp65001

    Python sees "cp65001" as the encoding of the standard output
    device when the console is used for standard output.

    |C:\>py -c "import os,sys; print( sys.stdout.isatty(), os.device_encoding( sys.stdout.fileno() ), file=sys.stderr )" >test_file.txt
    |False None

    The user has redirected stdout to a file, and now Python sees
    "None" as the encoding of the standard output device.

    Python might not know the encoding of the standard output
    device now. It might fall back to the preferred encoding
    of the current locale.

    |C:\Users\s>py -c "import locale, sys; print( locale.getpreferredencoding( False ), file=sys.stderr )" >test_file.txt
    |cp1252

    (The preferred encoding of the current locale might depend
    upon current settings of the Microsoft® Windows operating
    system installation, so the behavior of those Python programs
    might not always be the same.)
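
    The observations above can be gathered in one place. The following
    is only a minimal sketch: describe_stream is a hypothetical helper
    name, not something from the standard library, and the values it
    reports depend on the system's settings.

```python
import io
import locale
import os
import sys


def describe_stream(stream):
    """Collect the facts that influence which encoding a text stream gets."""
    try:
        # os.device_encoding() returns None unless the descriptor is a terminal.
        device = os.device_encoding(stream.fileno())
    except (AttributeError, OSError, ValueError):
        device = None  # the stream has no real file descriptor
    return {
        "isatty": stream.isatty(),
        "device_encoding": device,
        "stream_encoding": getattr(stream, "encoding", None),
        "locale_encoding": locale.getpreferredencoding(False),
    }


# Report to stderr so the result stays visible when stdout is redirected.
print(describe_stream(sys.stdout), file=sys.stderr)

# An in-memory stream is never a terminal, much like a redirected stdout.
in_memory = describe_stream(io.StringIO())
```

    Running it once normally and once with ">test_file.txt" should show
    "isatty" and "device_encoding" change in the way described above.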

  • From Barry@21:1/5 to All on Sun Nov 13 15:37:39 2022
    On 13 Nov 2022, at 14:52, Jessica Smith <12jessicasmith34@gmail.com> wrote:

    Consider the following code run in PowerShell or cmd.exe:

    $ python -c "print('└')"


    $ python -c "print('└')" > test_file.txt
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>

    Is this a known limitation of Windows + Unicode? I understand that
    using -X utf8 would fix this, or modifying various environment
    variables. But is this expected for a standard Python installation on Windows?

    Your other thread has a reply that explained this.
    It is a problem with windows and character sets.
    You have to set things up to allow Unicode to work.
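
    One way to "set things up" is Python's UTF-8 mode, enabled with
    -X utf8 or the PYTHONUTF8=1 environment variable (Python 3.7+). The
    sketch below is an illustration only: run_child is a hypothetical
    helper that starts a child interpreter with piped, non-console
    stdout and passes the character as an escape so the command line
    itself stays ASCII.

```python
import subprocess
import sys


def run_child(extra_args):
    """Run a child interpreter that prints U+2514 with stdout piped."""
    # The child sees the literal text \u2514 and decodes the escape itself.
    cmd = [sys.executable, *extra_args, "-c", "print('\\u2514')"]
    return subprocess.run(cmd, capture_output=True)


# With UTF-8 mode on, the child encodes its piped stdout as UTF-8.
result = run_child(["-X", "utf8"])
utf8_bytes = result.stdout.strip()
print(result.returncode, utf8_bytes)
```

    Calling run_child([]) instead uses the default encoding, which on a
    Windows system with a cp1252 locale would be expected to fail with
    the UnicodeEncodeError from the original post.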

    Barry



  • From Thomas Passin@21:1/5 to Jessica Smith on Sun Nov 13 10:45:57 2022
    On 11/13/2022 9:49 AM, Jessica Smith wrote:
    Consider the following code run in PowerShell or cmd.exe:

    $ python -c "print('└')"


    $ python -c "print('└')" > test_file.txt
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>

    Is this a known limitation of Windows + Unicode? I understand that
    using -X utf8 would fix this, or modifying various environment
    variables. But is this expected for a standard Python installation on Windows?

    Jessica


    This also fails with the same error:

    $ python -c "print('└')" |clip

  • From Stefan Ram@21:1/5 to Stefan Ram on Sun Nov 13 16:50:45 2022
    ram@zedat.fu-berlin.de (Stefan Ram) writes:
    |C:\Users\s>py -c "import locale, sys; print( locale.getpreferredencoding( False ), file=sys.stderr )" >test_file.txt
    |cp1252

    It seems that only in some newer Windows versions one can
    change this setting from the Microsoft® Windows operating
    system's "Control Panel" to "UTF-8".

    To "repair" this for a specific Python program, one can
    monkey patch some method like locale.getpreferredencoding
    to return "utf-8" at the start of the program.
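
    As an alternative to monkey patching (this is a different technique,
    not the one described above), a text stream can be switched to UTF-8
    with TextIOWrapper.reconfigure(), available since Python 3.7. A
    minimal sketch follows; force_utf8 is a hypothetical helper name, and
    the in-memory stream only imitates a stdout redirected under cp1252.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False  # e.g. a replaced stdout object without this method
    reconfigure(encoding="utf-8")
    return True


# Imitate a redirected stdout that was opened with the cp1252 codec.
raw = io.BytesIO()
redirected = io.TextIOWrapper(raw, encoding="cp1252")
try:
    redirected.write("\u2514")
    failed = False
except UnicodeEncodeError:
    failed = True  # cp1252 has no mapping for U+2514

# After reconfiguring, the same character is written as UTF-8 bytes.
force_utf8(redirected)
redirected.write("\u2514")
redirected.flush()
data = raw.getvalue()
```

    In a real program the call would be force_utf8(sys.stdout) near the
    start, before anything is printed.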

  • From Eryk Sun@21:1/5 to Jessica Smith on Sun Nov 13 16:39:55 2022
    On 11/13/22, Jessica Smith <12jessicasmith34@gmail.com> wrote:
    Consider the following code run in PowerShell or cmd.exe:

    $ python -c "print('└')"


    $ python -c "print('└')" > test_file.txt
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>

    If your applications and existing data files are compatible with using
    UTF-8, then in Windows 10+ you can modify the administrative regional
    settings in the control panel to force using UTF-8. In this case,
    GetACP() and GetOEMCP() will return CP_UTF8 (65001), and the reserved
    code page constants CP_ACP (0), CP_OEMCP (1), CP_MACCP (2), and
    CP_THREAD_ACP (3) will use CP_UTF8.
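
    To see what these functions return on a given machine, they can be
    called through ctypes. This is only a sketch: windows_code_pages is
    a hypothetical helper name, and it returns None on any platform
    other than Windows.

```python
import ctypes
import sys


def windows_code_pages():
    """Return the process and console code pages, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return {
        "ansi": kernel32.GetACP(),       # 65001 when the system uses UTF-8
        "oem": kernel32.GetOEMCP(),
        "console_input": kernel32.GetConsoleCP(),   # 0 if no console attached
        "console_output": kernel32.GetConsoleOutputCP(),
    }


pages = windows_code_pages()
print(pages)
```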

    You can override this on a per-application basis via the
    ActiveCodePage setting in the manifest:

    https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activecodepage

    In Windows 10, this setting only supports "UTF-8". In Windows 11, it
    also supports "legacy" to allow old applications to run on a system
    that's configured to use UTF-8. Setting an explicit locale is also
    supported in Windows 11, such as "en-US", with fallback to UTF-8 if
    the given locale has no legacy code page.

    Note that setting the system to use UTF-8 also affects the host
    process for console sessions (i.e. conhost.exe or openconsole.exe),
    since it defaults to using the OEM code page (UTF-8 in this case).
    Unfortunately, a legacy read from the console host does not support
    reading non-ASCII text as UTF-8. For example:

    >>> os.read(0, 6)
    SPĀM
    b'SP\x00M\r\n'

    This is a trivial bug in the console host, which stems from the fact
    that UTF-8 is a multibyte encoding (1-4 bytes per code point), but for some
    reason the console team at Microsoft still hasn't fixed it. You can
    use chcp.com to set the console's input and output code pages to
    something other than UTF-8 if you have to read non-ASCII input in a
    legacy console app. By default, this problem doesn't affect Python's
    sys.stdin, which internally uses wide-character ReadConsoleW() with
    the system's native text encoding, UTF-16LE.
