I have a function that checks the occurrence of characters in a string:
cw-sort ()
{
echo $1 | sed 's/./&\n/g' | sort | uniq -ic | sort -rn
}
This is all fine and dandy as long as there are no funny characters:
$ cw-sort uyrwqoequwyroqiuwey8378429537uriewr
[...]
When you throw single quotes into the mix, you have to wrap the string in
double quotes:
$ cw-sort "uyrwqoequwyroqiuwey83'''''78429537uriewr"
[...]
Likewise, if you throw double quotes into the mix, you have to wrap the
string in single quotes:
$ cw-sort 'uyrwqoequwyroqiuwey83"""""""""""78429537uriewr'
[...]
But what if you have both single and double quotes?
$ cw-sort 'uyrwqoequwyroqiuwey83""""'''''78429537uriewr'
$
$ cw-sort "uyrwqoequwyroqiuwey83""""'''''78429537uriewr"
[...]
Wrapping the string in single quotes makes the function choke. Wrapping
the string in double quotes makes the double quotes disappear from the
count.
How do I get out of this?
Hi,
I have a function that checks the occurrence of characters in a string:
cw-sort ()
{
echo $1 | sed 's/./&\n/g' | sort | uniq -ic | sort -rn
}
$ cw-sort "uyrwqoequwyroqiuwey83""""'''''78429537uriewr"
5 '
4 w
4 u
4 r
3 y
3 q
3 e
2 o
2 i
2 8
2 7
2 3
1 9
1 5
1 4
1 2
1
Wrapping the string in single quotes makes the function choke. Wrapping
the string in double quotes makes the double quotes disappear from the count.
How do I get out of this?
How do I escape a string that contains both single and double quotes?
On 3/17/2022 5:16 AM, Ottavio Caruso wrote:
Hi,
I have a function that checks the occurrence of characters in a string:
cw-sort ()
{
     echo $1 | sed 's/./&\n/g' | sort | uniq -ic | sort -rn
}
Always quote your variables (see https://mywiki.wooledge.org/Quotes)
unless you have a specific need not to, which you don't (try it
without quotes with `*` as your input string, or anything else that
matches files in your directory), and use printf instead of echo for
robustness and portability. Also, the above echo+sed adds a presumably
undesirable newline to the list of characters seen by sort (note the
standalone `1` at the end of your posted output), so for portability
and robustness the above should really be:
   cw-sort () {
       printf '%s\n' "$1" | grep -o . | sort | uniq -ic | sort -rn
   }
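To see the `*` problem mentioned above, here is a small sketch that runs the OP's `echo` with and without quotes inside a scratch directory. The directory and the file names `a` and `b` are made up for the demonstration, and it assumes `mktemp -d` is available, as it is on common Linux and BSD systems.

```shell
# Demo with hypothetical scratch files: why "$1" should be quoted.
old_pwd=$PWD
demo_dir=$(mktemp -d)          # empty scratch directory
cd "$demo_dir" || exit 1
touch a b                      # two files the glob can match

set -- '*'                     # as if the function were called as: cw-sort '*'
unquoted=$(echo $1)            # unquoted: the shell expands * to the file names
quoted=$(echo "$1")            # quoted: the literal character *

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

cd "$old_pwd" || exit 1
rm -r "$demo_dir"              # clean up the scratch directory
```

The unquoted version prints `a b`, so cw-sort would have counted the characters of the file names instead of the `*` the user typed.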
If your grep version doesn't have a `-o` option then you could instead
use either of:
   printf ... | fold -w1 | sort ...
   printf ... | awk '{for (i=1;i<=length();i++) print substr($0,i,1)}' | sort
AFAIK fold isn't a mandatory POSIX tool but awk is and so WILL be
available on your system.
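Put together as a complete function, the awk alternative might look like the sketch below. Two deviations from the thread's version are assumptions of this sketch: it is named `cw_sort` because strictly POSIX shells such as dash reject `-` in function names (bash accepts `cw-sort`), and it uses plain `uniq -c` because `-i` is not a POSIX `uniq` option. Whether multibyte characters such as ä come out whole may depend on the awk implementation and the locale.

```shell
# Sketch: character-frequency count using only POSIX tools (no grep -o).
# awk prints each character of the line on its own line, uniq -c prefixes
# each distinct character with its count, sort -rn puts the largest first.
cw_sort() {
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i <= length($0); i++) print substr($0, i, 1) }' |
        sort | uniq -c | sort -rn
}

cw_sort 'hello'
```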
<snip>
$ cw-sort "uyrwqoequwyroqiuwey83""""'''''78429537uriewr"
       5 '
       4 w
       4 u
       4 r
       3 y
       3 q
       3 e
       2 o
       2 i
       2 8
       2 7
       2 3
       1 9
       1 5
       1 4
       1 2
       1
Wrapping the string in single quotes makes the function choke. Wrapping the string in double quotes makes the double quotes disappear from the count.
How do I get out of this?
In your mind split:
   uyrwqoequwyroqiuwey83""""'''''78429537uriewr
into segments with each type of quote:
   uyrwqoequwyroqiuwey83""""  '''''  78429537uriewr
then quote those segments as you usually do:
   'uyrwqoequwyroqiuwey83""""'  "'''''"  '78429537uriewr'
then glue them back together to get:
   'uyrwqoequwyroqiuwey83""""'"'''''"'78429537uriewr'
e.g.:
   $ cw-sort 'uyrwqoequwyroqiuwey83""""'"'''''"'78429537uriewr'
         5 '
         4 w
         4 u
         4 r
         4 "
         3 y
         3 q
         3 e
         2 o
         2 i
         2 8
         2 7
         2 3
         1 9
         1 5
         1 4
         1 2
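The gluing works because the shell concatenates adjacent quoted segments into a single word. A quick sketch to confirm that no quote character is lost, counting each kind with `tr -cd` (which deletes every character not in the given set); the variable names are only for illustration.

```shell
# The glued string: '...""""' + "'''''" + '78429537uriewr' form ONE word.
s='uyrwqoequwyroqiuwey83""""'"'''''"'78429537uriewr'

dq=$(printf '%s' "$s" | tr -cd '"')    # keep only the double quotes
sq=$(printf '%s' "$s" | tr -cd "'")    # keep only the single quotes

printf '%s\n' "$s"
printf 'double quotes: %s, single quotes: %s\n' "${#dq}" "${#sq}"
```

This prints the original text unchanged, followed by `double quotes: 4, single quotes: 5`.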
Regards,
    Ed.
On 17.03.2022 15:06, Ed Morton wrote:
printf ... | fold -w1 | sort ...
[...]
AFAIK fold isn't a mandatory POSIX tool but awk is and so WILL be
available on your system.
The fold command is at least as old as UNIX Release 7, so it would
be surprising if it weren't specified in POSIX (where you can indeed
find it).
On 3/17/2022 9:19 AM, Janis Papanagnou wrote:
On 17.03.2022 15:06, Ed Morton wrote:
printf ... | fold -w1 | sort ...
[...]
AFAIK fold isn't a mandatory POSIX tool but awk is and so WILL be
available on your system.
The fold command is at least as old as UNIX Release 7, so it would
be surprising if it weren't specified in POSIX (where you can indeed
find it).
It's specified by POSIX; I just don't know whether it's *mandatory* in POSIX.
On 17.03.2022 15:58, Ed Morton wrote:
On 3/17/2022 9:19 AM, Janis Papanagnou wrote:
On 17.03.2022 15:06, Ed Morton wrote:
printf ... | fold -w1 | sort ...
[...]
AFAIK fold isn't a mandatory POSIX tool but awk is and so WILL be
available on your system.
The fold command is at least as old as UNIX Release 7, so it would
be surprising if it weren't specified in POSIX (where you can indeed
find it).
It's specified by POSIX, I just don't know if it's *mandatory* by POSIX.
And where would that distinction be made? (I can't see anything.) https://pubs.opengroup.org/onlinepubs/9699919799/utilities/fold.html
And what would be the rationale for not declaring such a basic tool
mandatory when it has been in the Unix toolbox for more than four
decades?
On 17.03.2022 15:06, Ed Morton wrote:
printf ... | fold -w1 | sort ...
[...]
AFAIK fold isn't a mandatory POSIX tool but awk is and so WILL be
available on your system.
The fold command is at least as old as UNIX Release 7, so it would
be surprising if it weren't specified in POSIX (where you can indeed
find it).
Yes, I meant Version 7. The source I looked at is my first (German)
book about Unix; it is based on Version 7 and was published 1984.
There the fold command syntax is defined as fold [-width] {files}
Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
On 17.03.2022 15:06, Ed Morton wrote:
printf ... | fold -w1 | sort ...
[...]
AFAIK fold isn't a mandatory POSIX tool but awk is and so WILL be
available on your system.
The fold command is at least as old as UNIX Release 7, so it would
be surprising if it weren't specified in POSIX (where you can indeed
find it).
Is Release 7 the same as version 7 (i.e. the first widely distributed
version from the late 70s)? If so, I can't find any record of fold being part of it.
On Thu, 17 Mar 2022 19:28:52 -0400, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
Yes, I meant Version 7. The source I looked at is my first (German)
book about Unix; it is based on Version 7 and was published 1984.
There the fold command syntax is defined as fold [-width] {files}
https://www.gnu.org/software/coreutils/manual/coreutils.html#fold-invocation
The -width option is considered obsolete.
On Thu, 2022-03-17, Ottavio Caruso wrote:
Hi,
I have a function that checks the occurrence of characters in a string:
cw-sort ()
{
echo $1 | sed 's/./&\n/g' | sort | uniq -ic | sort -rn
}
(Nitpick: the name of the function and the description don't fit. The
sorting seems incidental rather than essential.)
This is all fine and dandy as long as there are no funny characters:
$ cw-sort uyrwqoequwyroqiuwey8378429537uriewr
I think your problem stems from putting data in variables instead of
in pipes/streams.
The very first thing your function does is turn the input back into a
stream (via echo), so the function itself doesn't need the data in a
variable at all.
If your cw-sort function read from stdin instead, could you still fit
it into your workflow? I'm assuming you're normally calling it from
some larger script.
/Jorgen
On 2022-03-17, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
Is Release 7 the same as version 7 (i.e. the first widely distributed
version from the late 70s)? If so, I can't find any record of fold being part of it.
Yes, I meant Version 7. The source I looked at is my first (German)
book about Unix; it is based on Version 7 and was published 1984.
There the fold command syntax is defined as fold [-width] {files}
I don't think that book is a reliable source in this respect.
You can browse the source tree of UNIX V7 here: https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7
There is no fold(1) command.
fold.c did however ship with 1BSD. (It's in s6/cont.a.)
On 17.03.2022 18:33, Ben Bacarisse wrote:
Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
On 17.03.2022 15:06, Ed Morton wrote:
printf ... | fold -w1 | sort ...
[...]
AFAIK fold isn't a mandatory POSIX tool but awk is and so WILL be
available on your system.
The fold command is at least as old as UNIX Release 7, so it would
be surprising if it weren't specified in POSIX (where you can indeed
find it).
Is Release 7 the same as version 7 (i.e. the first widely distributed
version from the late 70s)? If so, I can't find any record of fold being
part of it.
Yes, I meant Version 7. The source I looked at is my first (German)
book about Unix; it is based on Version 7 and was published 1984.
There the fold command syntax is defined as fold [-width] {files}
if (($#))
Notes: For simplicity I used non-standard ((...)).
Janis Papanagnou <janis_papanagnou@hotmail.com>:
if (($#))
[…]
Notes: For simplicity I used non-standard ((...)).
If one wants to do it the standard way one can do
if ${1+:} false
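The trick works because `${1+:}` expands to `:` when `$1` is set (even to an empty string) and to nothing otherwise, so the shell runs either `: false`, which succeeds because `:` ignores its arguments, or plain `false`. Janis's full function is elided above; the following is only a hypothetical sketch of how such a test could let cw-sort take its text either as an argument or from stdin. It assumes `grep -o` (a GNU/BSD extension, not POSIX) and uses the name `cw_sort`.

```shell
# Hypothetical sketch: count the characters of "$1" if given, else of stdin.
cw_sort() {
    # ${1+:} -> ':' if $1 is set, so the test is ': false' (true);
    # if $1 is unset the test is just 'false'.
    if ${1+:} false; then
        printf '%s\n' "$1"
    else
        cat
    fi | grep -o . | sort | uniq -c | sort -rn
}

cw_sort 'abb'              # text as an argument
printf 'xyy\n' | cw_sort   # text on stdin
```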
On 2022-03-17, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
...
fold.c did however ship with 1BSD. (It's in s6/cont.a.)
Is that the same fold command though?
There is no fold(1) command.
fold.c did however ship with 1BSD. (It's in s6/cont.a.)
When was that? (Must have been before 1984, I'd have to suppose.)
On 18/03/2022 08:28, Jorgen Grahn wrote:
If your cw-sort function read from stdin instead, could you still fit
it into your workflow? I'm assuming you're normally calling it from
some larger script.
No, I call it from the shell.
This is supposed to check the frequency of mistakes I make while
learning Morse code on lcwo.net.
[...]
If your grep version doesn't have a `-o` option then you could instead
use either of:
printf ... | fold -w1 | sort ...
printf ... | awk '{for (i=1;i<=length();i++) print substr($0,i,1)}' | sort
I just noticed that my fold command doesn't respect the locale settings.
$ echo Säge | LC_ALL=de_DE.UTF-8 fold -w 1
S
�
g
e
$ echo Säge | LC_ALL=C fold -w 1
S
�
g
e
In a de_DE.UTF-8 locale the umlauts aren't handled properly, which makes
it pretty useless. :-(
Or is my fold version too old?
$ fold --version
fold (GNU coreutils) 8.13
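One way to check whether a particular `fold` splits by bytes or by characters is to count its output lines for `Säge`, which is four characters but five bytes in UTF-8. This sketch assumes the text is saved as UTF-8 and that the `de_DE.UTF-8` locale from the post is installed. In the C locale every byte counts as a character, so any fold should produce 5 lines there.

```shell
# Count the pieces fold -w 1 produces for "Säge" (UTF-8 bytes: 53 c3 a4 67 65).
word='Säge'

# C locale: one byte = one character, so the two bytes of ä are split apart.
c_lines=$(printf '%s\n' "$word" | LC_ALL=C fold -w 1 | wc -l)

# UTF-8 locale: 4 lines if fold is multibyte-aware, 5 if it works on bytes.
utf8_lines=$(printf '%s\n' "$word" | LC_ALL=de_DE.UTF-8 fold -w 1 | wc -l)

echo "C locale: $c_lines lines; de_DE.UTF-8: $utf8_lines lines"
```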
On 17.03.2022 15:06, Ed Morton wrote:
[...]
If your grep version doesn't have a `-o` option then you could instead
use either of:
printf ... | fold -w1 | sort ...
I just noticed that my fold command doesn't respect the locale settings.
$ echo Säge | LC_ALL=de_DE.UTF-8 fold -w 1
S
�
g
e
$ echo Säge | LC_ALL=C fold -w 1
S
�
g
e
In a de_DE.UTF-8 locale the umlauts aren't handled properly, which makes
it pretty useless. :-(
Or is my fold version too old?
$ fold --version
fold (GNU coreutils) 8.13
printf ... | awk '{for (i=1;i<=length();i++) print substr($0,i,1)}' | sort
BTW, with GNU awk you can set FS="" and then use $i instead of substr(...).
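A sketch of that tip: with an empty FS, GNU awk makes every character of the record its own field, so a loop over `$1`..`$NF` replaces the `substr()` calls. POSIX leaves the effect of an empty FS unspecified, so this assumes your `awk` behaves like GNU awk here (call `gawk` explicitly if in doubt). The function name `split_chars` is hypothetical.

```shell
# Hypothetical helper: print each character of each input line on its own
# line, relying on awk splitting a record into single characters when FS
# is empty (GNU awk behaviour; unspecified by POSIX).
split_chars() {
    awk 'BEGIN { FS = "" } { for (i = 1; i <= NF; i++) print $i }'
}

printf '%s\n' 'ab"c' | split_chars
```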
Janis
On Sat, 19 Mar 2022 12:51:20 -0400, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
I just noticed that my fold command doesn't respect the locale settings.
$ echo Säge | LC_ALL=de_DE.UTF-8 fold -w 1
[...]
In a de_DE.UTF-8 locale the umlauts aren't handled properly, which makes
it pretty useless. :-(
Or is my fold version too old?
$ fold --version
fold (GNU coreutils) 8.13
$ echo Säge | LC_ALL=de_DE.UTF-8 fold -w 1
S
ä
g
e
$ fold --version
fold (GNU coreutils) 8.32
[...]
On 19.03.2022 18:56, David W. Hodgins wrote:
On Sat, 19 Mar 2022 12:51:20 -0400, Janis Papanagnou
<janis_papanagnou@hotmail.com> wrote:
I just noticed that my fold command doesn't respect the locale settings.
$ echo Säge | LC_ALL=de_DE.UTF-8 fold -w 1
[...]
In a de_DE.UTF-8 locale the umlauts aren't handled properly, which makes it pretty useless. :-(
Or is my fold version too old?
$ fold --version
fold (GNU coreutils) 8.13
$ echo Säge | LC_ALL=de_DE.UTF-8 fold -w 1
S
ä
g
e
$ fold --version
fold (GNU coreutils) 8.32
[...]
Ah, fine, so that seems to have gotten fixed. Thanks!
Janis
Hmm... I have 8.32 but
$ echo Säge | LC_ALL=en_GB.UTF-8 fold -w 1
S
�
�
g
e
$ echo Säge | LC_ALL=en_GB.UTF-8 wc -m
5
$
(That last one is just to show that some utilities do get UTF-8 characters right.)
I suspect it's due to a missing locale or (more likely) font package.
$ echo Säge | LC_ALL=en_GB.UTF-8 fold -w 1
S
ä
g
e
I don't see how it could be a font issue. The character displays fine,
and even if it did not, that should have no effect on whether fold
counts them correctly.
Looking at the source of fold in coreutils 8.32 I can't see any code
that could possibly count the UTF-8 encoding of ä as a single character.
Are you using a single-byte character encoding? (Setting
LC_ALL=en_GB.UTF-8 won't change that since fold appears to ignore the encoding.) What does echo Säge | hd show?
On Sun, 20 Mar 2022 13:37:41 -0400, David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
I suspect it's due to a missing locale or (more likely) font package.
$ echo Säge | LC_ALL=en_GB.UTF-8 fold -w 1
S
ä
g
e
As it's bash in a terminal, is the terminus-font package installed?
On Sun, 20 Mar 2022 16:30:00 -0400, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
I don't see how it could be a font issue. The character displays fine,
and even if it did not, that should have no effect on whether fold
counts them correctly.
Looking at the source of fold in coreutils 8.32 I can't see any code
that could possibly count the UTF-8 encoding of ä as a single character.
Are you using a single-byte character encoding? (Setting
LC_ALL=en_GB.UTF-8 won't change that since fold appears to ignore the
encoding.) What does echo Säge | hd show?
$ man echo
[dave@x3 ~]$ echo -n Säge|hexdump -x
0000000 c353 67a4 0065
0000005
"David W. Hodgins" <dwhodgins@nomail.afraid.org> writes:
On Sun, 20 Mar 2022 16:30:00 -0400, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
I don't see how it could be a font issue. The character displays fine,
and even if it did not, that should have no effect on whether fold
counts them correctly.
Looking at the source of fold in coreutils 8.32 I can't see any code
that could possibly count the UTF-8 encoding of ä as a single character.
Are you using a single-byte character encoding? (Setting
LC_ALL=en_GB.UTF-8 won't change that since fold appears to ignore the
encoding.) What does echo Säge | hd show?
$ man echo
[dave@x3 ~]$ echo -n Säge|hexdump -x
0000000 c353 67a4 0065
0000005
Thanks for that. hexdump -c (or -b) makes UTF-8 a bit easier to read, especially when there's an odd number of bytes, but I can see what's
going on.
c3 a4 is indeed the UTF-8 encoding of ä, so I am at a loss to see how
the source code for fold that I saw could produce the correct output!
Maybe I missed something.
I don't think I did miss anything, but there is a bug report about this
that suggests there is a Fedora patch that fixes it. Your distro probably has that patch applied. That would be the simplest
explanation.
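For a byte-by-byte view without the swapping seen above (`hexdump -x` prints 16-bit words in host byte order, so on a little-endian machine the bytes 53 c3 a4 67 65 appear as `c353 67a4 0065`), the POSIX `od` utility can print single bytes. A minimal sketch, assuming the text is saved as UTF-8:

```shell
# Print each byte of "Säge" (no trailing newline) as two hex digits.
printf '%s' 'Säge' | od -An -tx1

# wc -c counts bytes regardless of locale: 5 for "Säge" in UTF-8.
printf '%s' 'Säge' | wc -c
```

The `c3 a4` pair in the od output is the UTF-8 encoding of ä that a byte-oriented fold splits in two.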
On Fri, 2022-03-18, Ottavio Caruso wrote:
On 18/03/2022 08:28, Jorgen Grahn wrote:
If your cw-sort function read from stdin instead, could you still fit
it into your workflow? I'm assuming you're normally calling it from
some larger script.
No, I call it from the shell.
This is supposed to check the frequency of mistakes I make while
learning Morse code on lcwo.net.
So, do you then paste the input from some web page? In that case I'd personally prefer:
% cw-sort
*paste*
<EOF>
to
% cw-sort *paste*
followed by manual escaping and quoting.
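A minimal sketch of that stdin-reading variant, assuming `grep -o` is available as in Ed's version and using the name `cw_sort`, since some shells reject `-` in function names. When the text is pasted interactively, nothing needs escaping. In a script, a here-document whose delimiter is quoted (`'EOF'`) likewise passes both kinds of quotes through untouched.

```shell
# Sketch: read the text from stdin instead of from "$1".
cw_sort() {
    grep -o . | sort | uniq -c | sort -rn
}

# Interactive use: run cw_sort, paste the text, then press Ctrl-D.
# Scripted use: the quoted delimiter 'EOF' disables all expansion,
# so single and double quotes in the body are taken literally.
cw_sort <<'EOF'
uyrwqoequwyroqiuwey83""""'''''78429537uriewr
EOF
```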
But oh, maybe that's what you were asking for: a simple
one-size-fits-all way you can do that without thinking. I have always assumed
there is no such thing, so I didn't even read the responses which went
into quoting.
If there was such a thing, I would consider using it.
/Jorgen