I know that's what awk does, but I don't think I would have expected
it if I didn't know about it.
$0 is the current input line.
If you don't change anything, or if you modify $0 itself, whitespace
between fields is preserved.
If you modify any of the fields, $0 is recomputed and whitespace
between tokens is collapsed.
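A minimal sketch of that difference, assuming a made-up input line with three spaces between the words (the variable names are only for illustration):

```shell
line='one   two   three'

# $0 untouched: the original spacing is printed as-is
untouched=$(printf '%s\n' "$line" | awk '{ print $0 }')

# Assigning to any field (even $2 = $2) makes awk rebuild $0,
# joining the fields with OFS (a single space by default)
rebuilt=$(printf '%s\n' "$line" | awk '{ $2 = $2; print $0 }')

printf '[%s]\n[%s]\n' "$untouched" "$rebuilt"
```

The brackets in the final printf are only there to make the spacing visible.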
awk *could* have been defined to preserve inter-field whitespace even
when you modify individual fields,
and I think I would have found that more intuitive.
(And ideally there would be a way to refer to that inter-field
whitespace.)
The fact that modifying a field has the side effect of messing up $0
seems counterintuitive.
Perhaps the behavior matches your intuition better than it matches
mine.
(And perhaps this should be moved to comp.lang.awk if it doesn't die
out soon.
Though sed and awk are both languages in their own right
and tools that can be used from the shell, so I'd argue there's a
topicality overlap.)
But awk doesn't work with fixed-width data. The length of each field,
and the length of $0, is variable.
If awk *purely* dealt with input lines only as lists of tokens, then
this:
echo 'one   two   three' | awk '{print $0}'
would print "one two three" rather than "one   two   three" (and awk would
lose the ability to deal with arbitrarily formatted input). The fact
that the inter-field whitespace is reset only when individual fields are touched feels arbitrary to me.
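One possible workaround, sketched here under the assumption that fields are separated by blanks or tabs: edit $0 directly instead of assigning to a field, so the original spacing survives. The input line and the replacement text "TWO" are invented for the example.

```shell
kept=$(printf 'one   two   three\n' | awk '{
    # Match the first field together with the whitespace that follows it
    if (match($0, /^[ \t]*[^ \t]+[ \t]+/)) {
        head = substr($0, 1, RLENGTH)     # first field + separator
        rest = substr($0, RLENGTH + 1)    # starts at the second field
        sub(/^[^ \t]+/, "TWO", rest)      # replace only the second field
        $0 = head rest                    # assigning $0 keeps the spacing
    }
    print
}')

printf '[%s]\n' "$kept"
```

Assigning to $0 re-splits the fields but leaves the text of the line exactly as assigned, which is why the separators are not collapsed here.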
Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.
The original Awk doesn't support regular expressions, right?
Because regex was not yet talked about back then??
Awk without regexps makes little sense;
mind that the basic syntax of Awk programs is described as
pattern { action }
On 8/3/2024 10:46 pm, Janis Papanagnou wrote:
Stable Awk (1985) was released 1987. The (initial) old Awk (1977) was
released 1979. Before that tool we had Sed (1974), and before that we
had Ed and Grep (1973). My perception is that regexps were there as a
basic concept of UNIX in all these tools, so why should Awk be exempt.
According to the authors Awk was designed to see how Sed and Grep could
be generalized.
That part of history is beyond me. Sorry... my fault for not doing a check.
On 3/8/24 08:46, Janis Papanagnou wrote:
Awk without regexps makes little sense;
I think this comes down to what is a regular expression and what is not
a regular expression.
mind that the basic syntax of Awk programs is described as
pattern { action }
I'm guessing that 40-60% of the awk that I use doesn't use what I would consider to be regular expressions.
[...]
Maybe I have an imprecise understanding / definition.
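To illustrate the distinction, here is a small sketch of `pattern { action }` programs in which only the first pattern is a regular expression; the sample data and variable names are assumptions made for the example.

```shell
data=$(printf 'one 1\ntwo 2\nthree 3\n')

# Regular-expression pattern: lines containing "tw"
by_regex=$(printf '%s\n' "$data" | awk '/tw/')

# Comparison pattern, no regexp involved: second field greater than 1
by_compare=$(printf '%s\n' "$data" | awk '$2 > 1')

# Pattern on a built-in variable: act on the first record only
by_nr=$(printf '%s\n' "$data" | awk 'NR == 1 { print $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$by_regex" "$by_compare" "$by_nr"
```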
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test >
newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
On 2024-03-06, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
$ awk '{print $1, "1-1"}' newsrc-news.eternal-september.org-test >
newsrc-news.eternal-september.org
In this specific case of regular data you can simplify that to
awk '$2="1-1"' sourcefile > targetfile
That had me scratching my head. You can't have an action without
enclosing braces. But it's still legal syntax because... it's an
expression serving as a pattern. The assignment itself is a side
effect.
Without braces, the default action takes place, which is ``{print}''.
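A short sketch of that equivalence, using a hypothetical newsrc-style line (the group name and article range are invented for illustration):

```shell
# Hypothetical sample line, not taken from a real newsrc file
sample='comp.lang.awk: 1-4711'

# Bare expression used as a pattern; the default action { print } runs
short=$(printf '%s\n' "$sample" | awk '$2="1-1"')

# The same program with pattern and action both spelled out
spelled_out=$(printf '%s\n' "$sample" | awk '($2 = "1-1") { print }')

printf '%s\n%s\n' "$short" "$spelled_out"
```

Both forms print the line only because the assigned value "1-1" is a non-empty string and therefore counts as true.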
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
instead of:
$2="1-1"
Care needs to be taken when using this shortcut so the expression
doesn't evaluate as false:
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=4'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=0'
$
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2="4"'
one 4
two 4
three 4
$ printf 'one 1\ntwo 2\nthree 3\n' | awk '$2=""'
$
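The empty output arises because the value of the assignment doubles as the pattern, and numeric zero and the empty string both count as false. A sketch of one way around it, assuming every line should be printed: move the assignment into an action and print unconditionally.

```shell
# Assignment inside an action; the always-true pattern 1 prints every line
zeroed=$(printf 'one 1\ntwo 2\nthree 3\n' | awk '{ $2 = 0 } 1')

printf '%s\n' "$zeroed"
```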
On 09.03.2024 17:52, Ed Morton wrote:
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
I don't recall such a "consensus". If you want to avoid cryptic code
you'd rather write
'{$2="1-1"; print}'
Don't you think?
Do Linux and Unix have ONE AND ONLY ONE STANDARD regex library?
It seems that tools and programming languages have their own
implementations, let alone different versions among them.
POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
In article <tv26ck-3qt.ln1@ID-313840.user.individual.net>,
Geoff Clare <netnews@gclare.org.uk> wrote:
POSIX requires that awk uses extended regular expressions (i.e. the
same as regcomp() with REG_EXTENDED).
There is the additional requirement that \ inside [....] can
be used to escape characters,
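As a rough illustration of both points, the sketch below uses extended-regexp operators and a backslash-escaped "]" inside a bracket expression. The sample inputs are made up, and behaviour is assumed to follow POSIX awk; individual implementations may differ.

```shell
# Alternation and "+" are extended-regexp features; no backslashes needed
ere=$(printf 'one\ntwo\nthree\n' | awk '/^(one|two)+$/')

# In awk, \] inside [...] is an escaped "]", so this class holds "]" and "x"
bracket=$(printf 'a]b\nabc\n' | awk '/[\]x]/')

printf '%s\n---\n%s\n' "$ere" "$bracket"
```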
On 3/9/2024 2:07 PM, Janis Papanagnou wrote:
On 09.03.2024 17:52, Ed Morton wrote:
About 20 or so years ago we had a discussion in this NG (which I'm not
going to search for now) and, shockingly, a consensus was reached that
we should encourage people to always write:
'{$2="1-1"} 1'
I don't recall such a "consensus".
I do, I have no reason to lie about it, but I can't be bothered
searching through 20-year-old usenet archives for it (I did take a very
quick shot at it but I don't even know how to write a good search for it
- you can't just google "awk '1'" and I'm not even sure if it was in comp.lang.awk or comp.unix.shell).
If you want to avoid cryptic code you'd rather write
'{$2="1-1"; print}'
Don't you think?
If I'm writing a multi-line script I use an explicit `print` but it just doesn't matter for a tiny one-line script like that.
Everyone using awk
needs to know the `1` idiom as it's so common and once you've seen it
once it's not hard to figure out what `{$2="1-1"} 1` does.
By changing `condition` to `{condition}1` we just add 3 chars to remove
the guesswork from anyone reading it in future and protect against unconsidered values so we don't just make it less cryptic but also less fragile.
For example, let's say someone wants to copy the $1 value into $3 and
print every line:
$ printf '1 2 3\n4 5 7\n' | awk '{$3=$1}1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '{$3=$1}1'
1 2 1
0 5 0
$ printf '1 2 3\n4 5 7\n' | awk '$3=$1'
1 2 1
4 5 4
$ printf '1 2 3\n0 5 7\n' | awk '$3=$1'
1 2 1
Note the 2nd line is undesirably (because I wrote the requirements)
missing from that last output.
It happens ALL the time that people don't consider all possible input
values so it's safer to just write the code that reflects your intent
and if you intend for every line to be printed then write code that will print every line.
Ed.
And of course add more measures in case the data is not as regular as
the sample data suggests. (See my other postings on what may count
as data: lines with missing or spurious blanks, comment lines
or empty lines that have to be preserved, etc.)
Janis