• Need help with custom string formatter

    From Robert Latest@21:1/5 to All on Fri Oct 21 12:34:16 2022
    Hi all,

    I would like to modify the standard str.format() in a way that when the
    input field is of type str, there is some character replacement, and the
    string gets padded or truncated to the given field width. Basically like
    this:

    fmt = MagicString('<{s:6}>')
    print(fmt.format(s='Äußerst'))

    Output:
    <Aeusse>

    I've written a function fix_format() which, given a string and a field width, does just that. However, I find myself unable to implement a Formatter that uses this function in the intended way. See the example below, I hope I sprinkled it with enough comments to make my intents clear. Thanks for any enlightenment. The interesting part starts somewhere in the middle.

    ### Self contained example
    import re
    from string import Formatter

    _replacements = [(re.compile(rx), repl) for rx, repl in (
        ('Ä', 'Ae'),
        ('ä', 'ae'),
        ('Ö', 'Oe'),
        ('ö', 'oe'),
        ('Ü', 'Ue'),
        ('ü', 'ue'),
        ('ß', 'ss'))]

    def fix_format(text, width):

        # Seven regex passes seems awfully inefficient. I can't think of a
        # better way. Besides the point though.
        for rx, repl in _replacements:
            text = re.sub(rx, repl, text)

        # return truncated / padded version of string
        return text[:width] + ' ' * max(0, width - len(text))

    class Obj():
        """I'm just an object with some attributes"""
        def __init__(self, **kw):
            self.__dict__.update(kw)

    o = Obj(x="I am X, and I'm too long",
            y="ÄÖÜ Ich bin auch zu lang")
    z = 'Pad me!'

    format_spec = '<{o.x:6}>\n<{o.y:6}>\n<{z:10}>'

    # Standard string formatting
    print('Standard string formatting:')
    print(format_spec.format(o=o, z=z))

    # Demonstrate fix_format()
    print('\nWanted output:')
    print('<' + fix_format(o.x, 6) + '>')
    print('<' + fix_format(o.y, 6) + '>')
    print('<' + fix_format(z, 10) + '>')

    ##### This is where my struggle begins. #####

    class MagicString(Formatter):
        def __init__(self, format_spec):
            self.spec = format_spec
            super().__init__()

        def format(self, **kw):
            return(self.vformat(self.spec, [], kw))

        def get_field(self, name, a, kw):
            # Compound fields have a dot:
            obj_name, _, key = name.partition('.')
            obj = getattr(kw[obj_name], key) if key else kw[obj_name]
            if isinstance(obj, str):
                # Here I would like to call fix_format(), but I don't know where
                # to get the field width.
                print('get_field(): <' + obj + '>')
            else:
                # Here I'd like to use the "native" formatter of whatever type
                # the field is.
                pass
            return obj, key

        def get_value(self, key, a, kw):
            '''I don't understand what this method is for, it never gets called'''
            raise NotImplementedError

    fmt = MagicString(format_spec)
    print('\nReal output:')
    print(fmt.format(o=o, z=z))

    # Weirdly, somewhere on the way the standard formatting kicks in, too, as
    # the 'Pad me!' string does get padded (which must be some postprocessing,
    # as the string is still unpadded when passed into get_field())
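
A note on the observation above, as a minimal tracing sketch that assumes only the documented string.Formatter hooks. TracingFormatter is a hypothetical name, not part of the original example. It records the order in which the hooks are reached: the field width ('10') never reaches get_field(); it is handed to format_field(), which is where the padding happens. In CPython it is the base get_field() that calls get_value(), so an override that does not delegate to it would never trigger get_value().

```python
from string import Formatter

class TracingFormatter(Formatter):
    """Hypothetical helper: records which hooks are called, in order."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def get_value(self, key, args, kwargs):
        self.calls.append(('get_value', key))
        return super().get_value(key, args, kwargs)

    def get_field(self, field_name, args, kwargs):
        self.calls.append(('get_field', field_name))
        return super().get_field(field_name, args, kwargs)

    def format_field(self, value, format_spec):
        # The part after the colon in '{z:10}' arrives here as format_spec.
        self.calls.append(('format_field', value, format_spec))
        return super().format_field(value, format_spec)

tracer = TracingFormatter()
result = tracer.format('<{z:10}>', z='Pad me!')
print(result)
for call in tracer.calls:
    print(call)
```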

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Robert Latest on Fri Oct 21 14:55:30 2022
    Robert Latest <boblatest@yahoo.com> writes:
    > fmt = MagicString('<{s:6}>')
    > print(fmt.format(s='Äußerst'))
    > Output:
    > <Aeusse>

    Here's a quick attempt:

    import re
    import string

    def formatted( value, format_string ):
        length = int( format_string )
        result = value.translate( { ord( 'Ä' ): 'Ae', ord( 'ß' ): 'ss', })
        result = result[ :length ]
        result += ' ' *( length - len( result ))
        return result

    class MagicString( string.Formatter ):
        def __init__( self ):
            super().__init__()
        def format_field( self, value, format_string ):
            if re.match( r'\d+', format_string )and type( value )== str:
                return formatted( value, format_string )
            else:
                return super().format_field( value, format_string )

    fmt = MagicString()
    print( fmt.format( '<{s:6}>', s='Äußerst' ))
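
For comparison, here is a sketch of the same format_field() approach extended to all seven replacement pairs from the original question. It is an illustration under stated assumptions, not Stefan's code: the names UMLAUTS and MagicFormatter are hypothetical, and the use of str.maketrans(), str.isdecimal() and str.ljust() is a choice made here.

```python
import string
from types import SimpleNamespace

# Assumed replacement table: the seven pairs listed in the original question.
UMLAUTS = str.maketrans({
    'Ä': 'Ae', 'ä': 'ae',
    'Ö': 'Oe', 'ö': 'oe',
    'Ü': 'Ue', 'ü': 'ue',
    'ß': 'ss',
})

class MagicFormatter(string.Formatter):
    """Hypothetical variant: fixed-width, transliterated string fields."""

    def format_field(self, value, format_spec):
        # Only plain-width specs such as '6' on str values get special
        # treatment; everything else uses the standard formatting.
        if isinstance(value, str) and format_spec.isdecimal():
            width = int(format_spec)
            return value.translate(UMLAUTS)[:width].ljust(width)
        return super().format_field(value, format_spec)

o = SimpleNamespace(x="I am X, and I'm too long",
                    y="ÄÖÜ Ich bin auch zu lang")
fmt = MagicFormatter()
out = fmt.format('<{o.x:6}>\n<{o.y:6}>\n<{z:10}>\n<{n:5d}>',
                 o=o, z='Pad me!', n=42)
print(out)
```

Compound fields such as o.x are resolved by the inherited get_field(), so nothing besides format_field() needs overriding in this sketch.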


  • From Robert Latest@21:1/5 to All on Fri Oct 21 16:06:32 2022
    Stefan Ram wrote:

    [the solution]

    Thanks, right on the spot. I had already figured out that format_field() is the one method I need, and thanks for the str.translate method. I knew that raking seven REs across the same string HAD to be stupid.

    Have a nice weekend!

  • From Stefan Ram@21:1/5 to Robert Latest on Fri Oct 21 16:55:01 2022
    Robert Latest <boblatest@yahoo.com> writes:
    >> def __init__( self ):
    >>     super().__init__()
    > Isn't this a no-op? Probably a leftover from my stuff.

    I did not know whether this was required.

    >> def format_field( self, value, format_string ):
    >>     if re.match( r'\d+', format_string )and type( value )== str:
    > Why do you prefer re.match(r'\d+', x) over x.isdigit()?

    I was not aware of "isdigit".

    >> return super().format_field( value, format_string )
    > Why do you prefer super().format_field() over plain format()? The doc says:
    > "format_field() simply calls format()." So I figured I might do the same.

    I am not aware of any reason not to call "format" directly.
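
One practical difference between the two checks discussed above, shown as a small sketch: re.match() only anchors at the start of the string, so a spec that merely begins with digits passes, whereas str.isdigit() (like re.fullmatch()) looks at the whole string. The sample specs are made up for illustration.

```python
import re

specs = ('6', '10', '6.3', '6x', '')
rows = {}
for spec in specs:
    rows[spec] = (
        bool(re.match(r'\d+', spec)),      # digits at the start are enough
        bool(re.fullmatch(r'\d+', spec)),  # the whole string must be digits
        spec.isdigit(),                    # the whole string must be digits
    )
    print(repr(spec), rows[spec])
```

With a spec such as '6.3', the re.match() test succeeds, and a subsequent int('6.3') would raise ValueError.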

  • From Robert Latest@21:1/5 to All on Fri Oct 21 16:37:23 2022
    Hi Stefan,

    I have now implemented a version of this, works nicely. I have a few minor questions / remarks:

    > result += ' ' *( length - len( result ))

    Nice, I didn't know that one could multiply strings by negative numbers without error.
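
A quick check of that behaviour, as a sketch: multiplying a str by zero or by a negative integer yields the empty string, so the max(0, ...) guard from the original fix_format() is not needed. The two helper names are hypothetical. Note that when the string is truncated first, as in the quoted line, the count is never negative for a non-negative width anyway.

```python
def fix_width_max(text, width):
    # Variant with the explicit guard, as in the original question.
    return text[:width] + ' ' * max(0, width - len(text))

def fix_width_plain(text, width):
    # Same result without the guard: ' ' * negative is simply ''.
    return text[:width] + ' ' * (width - len(text))

print(repr(' ' * 3))
print(repr(' ' * 0))
print(repr(' ' * -3))

for sample in ('abc', 'abcdef', 'abcdefgh'):
    print(repr(fix_width_max(sample, 6)), repr(fix_width_plain(sample, 6)))
```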

    > def __init__( self ):
    >     super().__init__()

    Isn't this a no-op? Probably a leftover from my stuff.

    > def format_field( self, value, format_string ):
    >     if re.match( r'\d+', format_string )and type( value )== str:

    Why do you prefer re.match(r'\d+', x) over x.isdigit()?

    > return super().format_field( value, format_string )

    Why do you prefer super().format_field() over plain format()? The doc says: "format_field() simply calls format()." So I figured I might do the same.

    Thanks!

  • From Cameron Simpson@21:1/5 to Stefan Ram on Sat Oct 22 10:04:18 2022
    On 21Oct2022 16:55, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    > Robert Latest <boblatest@yahoo.com> writes:
    >>> return super().format_field( value, format_string )
    >> Why do you prefer super().format_field() over plain format()? The doc says:
    >> "format_field() simply calls format()." So I figured I might do the same.
    >
    > I am not aware of any reason not to call "format" directly.

    Stefan's code implements its own format_field() and falls back to the
    original format_field(). That's standard subclassing practice, and worth
    doing reflexively more of the time - it avoids _knowing_ that
    format_field() just calls format().

    So I'd take Stefan's statement above to imply that calling format()
    directly should work.

    My own habit would be to stick with the original, if only for semantic reasons. Supposing you switched from subclassing Formatter to some other
    class which itself subclasses Formatter. (Yes, that is probably
    something you will never do.) Bypassing the call to
    super().format_field(...) prevents shoehorning some custom format_fields
    from the changed superclass. Which you'd then need to debug.

    That's all.

    Cheers,
    Cameron Simpson <cs@cskk.id.au>
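
The scenario described above can be sketched as follows. ShoutingFormatter is a hypothetical intermediate class that customises format_field(); ViaSuper and ViaBuiltin are likewise made-up names for the two fallback styles under discussion.

```python
import string

class ShoutingFormatter(string.Formatter):
    """Hypothetical superclass with its own format_field() behaviour."""

    def format_field(self, value, format_spec):
        return super().format_field(value, format_spec).upper()

def fixed_width(value, format_spec):
    # Shared special case: truncate or pad str values to a plain width.
    width = int(format_spec)
    return value[:width].ljust(width)

class ViaSuper(ShoutingFormatter):
    def format_field(self, value, format_spec):
        if isinstance(value, str) and format_spec.isdigit():
            return fixed_width(value, format_spec)
        # Falls back through the class hierarchy.
        return super().format_field(value, format_spec)

class ViaBuiltin(ShoutingFormatter):
    def format_field(self, value, format_spec):
        if isinstance(value, str) and format_spec.isdigit():
            return fixed_width(value, format_spec)
        # Bypasses whatever the superclass does in format_field().
        return format(value, format_spec)

via_super = ViaSuper().format('{n:x}', n=255)
via_builtin = ViaBuiltin().format('{n:x}', n=255)
print(via_super, via_builtin)
```

With string.Formatter as the direct base class both styles give the same result; they only diverge once a superclass in between overrides format_field().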

  • From Cameron Simpson@21:1/5 to Stefan Ram on Sat Oct 22 10:05:11 2022
    On 21Oct2022 16:55, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    > I was not aware of "isdigit".

    There's also "isdecimal" and "isnumeric". They all have subtly different meanings :-)

    Cheers,
    Cameron Simpson <cs@cskk.id.au>
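
A small illustration of how the three predicates differ, using three sample characters chosen here for the purpose: an ASCII digit, SUPERSCRIPT TWO (U+00B2) and VULGAR FRACTION ONE HALF (U+00BD). The helper accepted_by_int() is hypothetical and only reports whether int() takes the string.

```python
samples = ('5', '\u00b2', '\u00bd')

# For each character: (isdecimal, isdigit, isnumeric)
table = {ch: (ch.isdecimal(), ch.isdigit(), ch.isnumeric()) for ch in samples}
for ch, flags in table.items():
    print(repr(ch), flags)

def accepted_by_int(text):
    # Hypothetical helper: True if int() can convert the string.
    try:
        int(text)
    except ValueError:
        return False
    return True

for ch in samples:
    print(repr(ch), accepted_by_int(ch))
```

For guarding an int() call on a format spec, isdecimal() is therefore the tighter of the three tests.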

  • From Robert Latest@21:1/5 to Cameron Simpson on Sat Oct 22 14:10:49 2022
    Cameron Simpson wrote:
    > Stefan's code implements its own format_field() and falls back to the
    > original format_field(). That's standard subclassing practice, and worth
    > doing reflexively more of the time - it avoids _knowing_ that
    > format_field() just calls format().
    >
    > So I'd take Stefan's statement above to imply that calling format()
    > directly should work.

    Yup, makes sense.

  • From Stefan Ram@21:1/5 to Robert Latest on Sat Nov 5 11:07:46 2022
    Robert Latest <boblatest@yahoo.com> writes:
    >> result += ' ' *( length - len( result ))
    > Nice, I didn't know that one could multiply strings by negative numbers
    > without error.

    Thanks, but today I thought that maybe there might
    be a solution for getting a field of a fixed length
    that is even shorter. It uses Python's string
    formatting, here in the form of an "f" string.

    main.py

    def f( result, length ):
        # get a field with the given length from the string "result"
        # - as used in postings from october
        result = result[ :length ]
        result += ' ' *( length - len( result ))
        return result

    def g( result, length ):
        # get a field with the given length from the string "result"
        # - new version using Python's formatting
        result = f"{result:{length}.{length}s}"
        return result

    # a small main program to try both the functions
    length = 10
    text_tuple =( 'abc', 'abcdefghij', 'abcdefghijklmnop' )
    for text in text_tuple:
        print( ":" + f( text, length )+ ":" )
        print( ":" + g( text, length )+ ":" )

    output

    :abc       :
    :abc       :
    :abcdefghij:
    :abcdefghij:
    :abcdefghij:
    :abcdefghij:
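
Why the nested spec works, as a short sketch: in a spec of the form width.precision, the width sets the minimum field size (strings are left-aligned and padded with spaces by default), and for strings the precision caps how many characters are used. Setting both to the same number gives a fixed-length field. The function name fixed_field is hypothetical.

```python
def fixed_field(text, length):
    # Build the spec explicitly, e.g. "10.10s", then apply it with format().
    spec = f"{length}.{length}s"
    return format(text, spec)

long_text = 'abcdefghijklmnop'
print(repr(format(long_text, '10')))     # width only: no truncation
print(repr(format(long_text, '.10')))    # precision only: truncation
print(repr(format('abc', '.10')))        # precision only: no padding
print(repr(fixed_field('abc', 10)))      # both: padded to 10
print(repr(fixed_field(long_text, 10)))  # both: cut to 10
```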

  • From MRAB@21:1/5 to Stefan Ram on Sat Nov 5 17:15:52 2022
    On 2022-11-05 11:07, Stefan Ram wrote:
    > Robert Latest <boblatest@yahoo.com> writes:
    >>> result += ' ' *( length - len( result ))
    >> Nice, I didn't know that one could multiply strings by negative numbers
    >> without error.
    >
    > Thanks, but today I thought that maybe there might
    > be a solution for getting a field of a fixed length
    > that is even shorter. It uses Python's string
    > formatting, here in the form of an "f" string.
    >
    > main.py
    >
    > def f( result, length ):
    >     # get a field with the given length from the string "result"
    >     # - as used in postings from october
    >     result = result[ :length ]
    >     result += ' ' *( length - len( result ))

    That can be done more compactly as:

    result = result[ : length].ljust(length)

    >     return result

    [snip]
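
A quick equivalence check for the compact form suggested above, as a sketch with hypothetical function names: slicing first and then calling str.ljust() gives the same result as the manual pad for strings that are shorter than, equal to, or longer than the target length.

```python
def pad_manual(text, length):
    # Truncate, then append the missing spaces by hand.
    text = text[:length]
    return text + ' ' * (length - len(text))

def pad_ljust(text, length):
    # Truncate, then let str.ljust() do the padding.
    return text[:length].ljust(length)

samples = ('abc', 'abcdefghij', 'abcdefghijklmnop')
for sample in samples:
    print(repr(pad_manual(sample, 10)), repr(pad_ljust(sample, 10)))

# ljust() also takes an optional fill character.
print(repr('abc'.ljust(6, '.')))
```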
