• substring manipulation

    From Rainer Weikusat@21:1/5 to All on Sun Feb 26 22:23:27 2023
    Problem: I have a string of an unknown length accessible as $_ and need
    to collect a string whose length is stored in a variable. This length
    may be <, > or == the length of the current work string. The string or
    substring I need from the current work string has to be removed from it
    and its actual length subtracted from the length in the variable.

    Initially, I tried to do this with ($c_sz being the length variable)

    s/^(.{1,$c_sz})//s;
    $$self[BODY] .= $1;
    $c_sz -= length($1);

    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.

    OTOH, the substr-operator returns a so-called lvalue which means that
    the following works:

    for (substr($_, 0, $c_sz)) {
        $$self[BODY] .= $_;
        $c_sz -= length();

        $_ = '';
    }
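
    For comparison, here is a minimal sketch of the same "take up to N
    characters off the front" step using 4-argument substr, which is a
    different technique from the lvalue loop above. The helper name
    take_chunk and its reference-passing interface are assumptions made
    for illustration; they do not appear in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove up to $$want_ref
# characters from the front of $$buf_ref, append them to $$body_ref,
# and decrease $$want_ref by the number of characters actually taken.
sub take_chunk {
    my ($buf_ref, $body_ref, $want_ref) = @_;

    # 4-argument substr returns the selected part and replaces it in
    # place (here with the empty string). A length that reaches past
    # the end of the string is clipped to what is available.
    my $chunk = substr($$buf_ref, 0, $$want_ref, '');

    $$body_ref .= $chunk;
    $$want_ref -= length($chunk);
    return length($chunk);
}

my ($buf, $body, $c_sz) = ('x' x 100_000, '', 70_000);
take_chunk(\$buf, \$body, \$c_sz);
print length($body), " taken, ", length($buf), " left, $c_sz still wanted\n";
```

    Since no regex is involved, the quantifier limit does not come into
    play with this variant.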

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eric Pozharski@21:1/5 to Weikusat on Mon Feb 27 08:44:50 2023
    with <87o7pg3w2o.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:

    *SKIP*
    s/^(.{1,$c_sz})//s;
    $$self[BODY] .= $1;
    $c_sz -= length($1);

    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.

    Your perl is 32bit. Get over it.

    % perl -wle '
    $aa = "x" x 90_000;
    $aa =~ m[(.{1,80000})];
    print length $1 '
    Quantifier in {,} bigger than 65534 in regex; marked by <-- HERE
    in m/(.{1,80000 <-- HERE })/ at -e line 3.

    OTOH, the substr-operator returns a so-called lvalue which means that
    the following works:

    for (substr($_, 0, $c_sz)) {
        $$self[BODY] .= $_;
        $c_sz -= length();
        $_ = '';
    }

    It's a pity that pseudo-looping is the only way to get aliasing. This
    is as ugly:

    % perl -wle '
    $_ = "kflv-kvla-oprt";
    ( $ab, substr( $_, 0, 5 ) ) = ( substr( $_, 0, 5 ), "" );
    print "($ab)";
    print "($_)" '
    (kflv-)
    (kvla-oprt)

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom

  • From Ben Bacarisse@21:1/5 to Rainer Weikusat on Mon Feb 27 11:58:47 2023
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    Problem: I have a string of an unknown length accessible as $_ and need
    to collect a string whose length is stored in a variable. This length
    may be <, > or == the length of the current work string. The string or
    substring I need from the current work string has to be removed from it
    and its actual length subtracted from the length in the variable.

    Initially, I tried to do this with ($c_sz being the length variable)

    s/^(.{1,$c_sz})//s;
    $$self[BODY] .= $1;
    $c_sz -= length($1);

    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.

    This should work:

    s/^(${\(".?" x $c_sz)})//s;

    but I suggest it only as a Perl joke!

    --
    Ben.

  • From Rainer Weikusat@21:1/5 to Eric Pozharski on Mon Feb 27 16:13:43 2023
    Eric Pozharski <whynot@pozharski.name> writes:
    with <87o7pg3w2o.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:

    *SKIP*
    s/^(.{1,$c_sz})//s;
    $$self[BODY] .= $1;
    $c_sz -= length($1);

    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.

    Your perl is 32bit. Get over it.

    It's not.

    rw@brushfire:~/work/mad-http$ perl -e 'print ((1 << 63) + 15, "\n")'
    9223372036854775823

    This is just an arbitrary limit compiled into it.


    [...]

    OTOH, the substr-operator returns a so-called lvalue which means that
    the following works:

    for (substr($_, 0, $c_sz)) {
        $$self[BODY] .= $_;
        $c_sz -= length();
        $_ = '';
    }

    It's a pity that pseudo-looping is the only way to get aliasing.

    Not really. This works as well:

    --------
    my $a = 'abcdefghijklmnopqrstuvwxyz';
    my $b;

    {
        local *_ = \substr($a, 0, 5);
        $b = $_;
        $_ = 'emilia';
    }

    print("$a\n$b\n");
    -------

    It's also not really pseudo-anything

    for (<list>) {
        <stmt>;
        <stmt>;
        <stmt>;
    }

    aliases $_ to each element of the list and then executes whatever is in
    the associated block. A list with one element is as good a list as any
    other.
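
    The aliasing described above can be sketched as follows. The variable
    names and sample strings are made up for illustration; the point is
    only that assignments to $_ inside the block reach the aliased list
    element, whether the list has several elements or just one.

```perl
use strict;
use warnings;

# Several elements: each array element is modified through the alias.
my @words = ('alpha', 'beta');
$_ = uc($_) for @words;

# One element: $_ aliases the substr lvalue, so assigning to $_
# replaces the first three characters of $text.
my $text = 'abcdefghij';
for (substr($text, 0, 3)) {
    $_ = 'XY';
}

print "@words\n";   # ALPHA BETA
print "$text\n";    # XYdefghij
```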

  • From Rainer Weikusat@21:1/5 to Ben Bacarisse on Mon Feb 27 15:59:42 2023
    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
    Rainer Weikusat <rweikusat@talktalk.net> writes:

    Problem: I have a string of an unknown length accessible as $_ and need
    to collect a string whose length is stored in a variable. This length
    may be <, > or == the length of the current work string. The string or
    substring I need from the current work string has to be removed from it
    and its actual length subtracted from the length in the variable.

    Initially, I tried to do this with ($c_sz being the length variable)

    s/^(.{1,$c_sz})//s;
    $$self[BODY] .= $1;
    $c_sz -= length($1);

    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.

    This should work:

    s/^(${\(".?" x $c_sz)})//s;

    but I suggest it only as a Perl joke!

    s/^((??{".?" x $c_sz}))//s

    also works as a less contorted way of creating a pattern at match time
    without storing it in an intermediate variable.

    In case someone also wants to know and doesn't want to pick it apart
    himself:

    ".?" x $c_sz

    is an expression returning a string of $c_sz optional (zero-or-one)
    any-character matchers.

    \(".?" x $c_sz)

    creates a reference to a read-only scalar whose value is the string
    returned by the expression in brackets.

    ${\(".?" x $c_sz)}

    dereferences this reference, which yields the string. It's interpolated
    into the s/// before matching because ordinary string interpolation is
    done on the pattern part.
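
    A small sketch of the steps just described, with a count of 3 and a
    made-up sample string (both are assumptions for illustration). It
    first shows the pattern text produced by the interpolation and then
    applies the substitution. Note that, unlike .{1,$c_sz}, a run of .?
    can also match the empty string.

```perl
use strict;
use warnings;

my $n = 3;

# ${\ EXPR } inside an interpolating string: take a reference to the
# value of EXPR, then dereference it so the value is interpolated.
my $built = qq{^(${\(".?" x $n)})};
print "$built\n";            # ^(.?.?.?)

# The same construct used directly in the pattern part of s///.
my $work = 'abcdefgh';
$work =~ s/^(${\(".?" x $n)})//s;
my $taken = $1;
print "$taken|$work\n";      # abc|defgh
```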

    While this was an entertaining puzzle, I don't think this kind of
    semantic terrorism has a place in actual code.

  • From Eric Pozharski@21:1/5 to Weikusat on Wed Mar 1 09:13:02 2023
    with <87fsardr2g.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:
    Eric Pozharski <whynot@pozharski.name> writes:
    with <87o7pg3w2o.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:

    *SKIP*
    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.
    Your perl is 32bit. Get over it.
    It's not.
    rw@brushfire:~/work/mad-http$ perl -e 'print ((1 << 63) + 15, "\n")'
    9223372036854775823
    This is just an arbitrary limit compiled into it.

    This implies that whoever built your perl has explicitly set this
    "arbitrary limit". Somehow I doubt it. May I see output of this:

    % perl -MConfig -wE 'say $Config{use64bitall} // "foo"'
    foo

    I haven't looked at how the upper limit on quantifiers is set at build
    time. Should I?

    *SKIP*
    It's also not really pseudo-anything
    for (<list>) {
        <stmt>;
        <stmt>;
        <stmt>;
    }
    *SKIP*

    Well, how would you identify this construct then:

    for ( $aa=42 ) { $_*=2 }


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom

  • From Rainer Weikusat@21:1/5 to Eric Pozharski on Wed Mar 1 15:18:49 2023
    Eric Pozharski <whynot@pozharski.name> writes:
    with <87fsardr2g.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:
    Eric Pozharski <whynot@pozharski.name> writes:
    with <87o7pg3w2o.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:

    *SKIP*
    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.
    Your perl is 32bit. Get over it.
    It's not.
    rw@brushfire:~/work/mad-http$ perl -e 'print ((1 << 63) + 15, "\n")'
    9223372036854775823
    This is just an arbitrary limit compiled into it.

    This implies that whoever built your perl has explicitly set this
    "arbitrary limit". Somehow I doubt it.

    It's documented as such:

    n and m are limited to non-negative integral values less than a
    preset limit defined when perl is built. This is usually 32766
    on the most common platforms.
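
    Since the limit is fixed when perl is built, it can be probed at run
    time instead of being guessed. The following is only a sketch: the
    helper name quantifier_ok and the list of sizes tried are assumptions
    for illustration, and which sizes are accepted depends on the perl
    binary in use.

```perl
use strict;
use warnings;

# Hypothetical helper: true if this perl accepts {1,$n} as a quantifier.
sub quantifier_ok {
    my ($n) = @_;

    # The pattern is compiled at run time because $n is interpolated,
    # so a "Quantifier in {,} bigger than ..." error can be trapped
    # with eval instead of aborting the program.
    return eval { my $re = qr/.{1,$n}/s; 1 } ? 1 : 0;
}

printf "%7d: %s\n", $_, quantifier_ok($_) ? 'accepted' : 'rejected'
    for 1_000, 32_766, 65_534, 100_000;
```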

    [...]

    It's also not really pseudo-anything
    for (<list>) {
        <stmt>;
        <stmt>;
        <stmt>;
    }
    *SKIP*

    Well, how would you identify this construct then:

    for ( $aa=42 ) { $_*=2 }

    I already wrote that. foreach is something like mapc in Lisp: It takes a
    block and a list as arguments. It then aliases $_ to each element of the
    list in turn and executes the block once every time a new list element
    has been aliased. A list of one element is just a list. Even a
    list of 0 elements could be used for something:

    ----------
    This is a funky comment for ();

  • From Eric Pozharski@21:1/5 to Weikusat on Thu Mar 2 18:33:18 2023
    with <87y1ogfqjq.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:
    Eric Pozharski <whynot@pozharski.name> writes:
    with <87fsardr2g.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:
    Eric Pozharski <whynot@pozharski.name> writes:
    with <87o7pg3w2o.fsf@doppelsaurus.mobileactivedefense.com> Rainer
    Weikusat wrote:

    *SKIP*
    Unfortunately, this doesn't work because regex quantifiers are rather
    annoyingly (It's 2023, folks. RAM is cheap) limited to what can be
    represented as a positive, signed 16-bit integer - 1, i.e., 32766.
    Your perl is 32bit. Get over it.
    It's not.
    *SKIP*
    It's documented as such:

    n and m are limited to non-negative integral values less than a
    preset limit defined when perl is built. This is usually 32766
    on the most common platforms.

    Well, I've done some research, this is what v5.34.0 has to offer:

    *n* and *m* are limited to non-negative integral values less than a
    preset limit defined when perl is built. This is usually 65534 on the
    most common platforms.

    And in spite of being definitely 32bit

    % perl -MConfig -wE 'say $Config{archname} // "foo"'
    i586-linux-thread-multi

    it really does that many

    % perl -wE '/.{200000}/'
    Quantifier in {,} bigger than 65534 in regex; marked by <-- HERE
    in m/.{200000 <-- HERE }/ at -e line 1.

    I stand corrected -- s/32bit/really old/

    It's also not really pseudo-anything
    for (<list>) {
        <stmt>;
        <stmt>;
        <stmt>;
    }
    Well, how would you identify this construct then:

    for ( $aa=42 ) { $_*=2 }
    I already wrote that. foreach is something like mapc in Lisp: It takes
    a block and a list as arguments. It then aliases $_ to each element of
    the list in turn and executes the block once every time a new list
    element has been aliased. A list of one element is just a list. Even a
    list of 0 elements could be used for something:
    ----------
    This is a funky comment for ();

    Well. Turns out, looping over an *explicit* one-element list has no
    meaning whatsoever. Thank you for this insight.

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
