• Combine multiple awk calls into one.

    From hongyi.zhao@gmail.com@21:1/5 to All on Wed Aug 2 06:19:59 2023
    Hi here,

    I've written the following awk script:

    --- begin here ---
    #!/bin/bash

    if [ -z "$1" ]; then
    echo "Usage: $0 <suffix>"
    exit 1
    fi

    suffix=$1


    awk 'BEGIN{i=0} /<dielectricfunction>/,\
    /<\/dielectricfunction>/ \
    {if ($1=="<r>") {a[i]=$2 ; b[i]=($3+$4+$5)/3 ; c[i]=$4 ; d[i]=$5 ; i=i+1}} \
    END{for (j=0;j<i/2;j++) print a[j],b[j],b[j+i/2]}' vasprun.xml > optics-$1.dat

    awk 'BEGIN{i=0} /<dielectricfunction comment="density-density">/,\
    /<\/dielectricfunction>/ \
    {if ($1=="<r>") {a[i]=$2 ; b[i]=($3+$4+$5)/3 ; c[i]=$4 ; d[i]=$5 ; i=i+1}} \
    END{for (j=0;j<i/2;j++) print a[j],b[j],b[j+i/2]}' vasprun.xml > optics-density-$1.dat

    awk 'BEGIN{i=0} /<dielectricfunction comment="current-current">/,\
    /<\/dielectricfunction>/ \
    {if ($1=="<r>") {a[i]=$2 ; b[i]=($3+$4+$5)/3 ; c[i]=$4 ; d[i]=$5 ; i=i+1}} \
    END{for (j=0;j<i/2;j++) print a[j],b[j],b[j+i/2]}' vasprun.xml > optics-current-$1.dat
    --- end here ---

    But the above
  • From Kees Nuyt@21:1/5 to hongyi.zhao@gmail.com on Wed Aug 2 17:58:33 2023
    On Wed, 2 Aug 2023 06:19:59 -0700 (PDT), "hongy...@gmail.com" <hongyi.zhao@gmail.com> wrote:

    1. It will generate empty out files.
    2. It uses 3 awk calls.

    1) Read the awk manual about redirection
    2) Rewrite the bash script with one awk invocation
    3) Pass the suffix as a variable to awk with
    -v suffix="$suffix"
    4) BEGIN{} and the body are the same as your current awk scripts
    5) In one END{}, use the existing loop three times, with
    redirections to each of the files. One example:
    END{
    ...
    ...
    for (j=0;j<i/2;j++){
    print a[j],b[j],b[j+i/2] >"optics-current-" suffix ".dat"
    }
    }
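
Steps 3 and 5 above can be sketched as follows. This is only an illustration under assumptions: the input file `demo.txt`, its two-column layout and the output names `out-x-<suffix>.dat` and `out-y-<suffix>.dat` are invented for the demo and are not part of the original script. It shows one awk invocation that receives the suffix through `-v` and writes two files from a single `END` block.

```shell
#!/bin/sh
# Hypothetical demo: one awk invocation writes two output files from END.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
suffix=demo                        # stands in for "$1" in the real script

printf '%s\n' 'x 1' 'y 2' 'x 3' 'y 4' > demo.txt

awk -v suffix="$suffix" '
    $1 == "x" { xs[++nx] = $2 }    # collect the rows of each kind while reading
    $1 == "y" { ys[++ny] = $2 }
    END {
        # Parenthesise the concatenated file name; not every awk parses
        # an unparenthesised concatenation after ">" the same way.
        for (j = 1; j <= nx; j++) print xs[j] > ("out-x-" suffix ".dat")
        for (j = 1; j <= ny; j++) print ys[j] > ("out-y-" suffix ".dat")
    }
' demo.txt

out=$(cat out-x-demo.dat out-y-demo.dat)
printf '%s\n' "$out"
```

Because the file names are built inside awk, the shell-level `> file` redirection of the original script is no longer needed.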
    --
    HTH
    Kees Nuyt

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From hongyi.zhao@gmail.com@21:1/5 to Kees Nuyt on Wed Aug 2 16:59:05 2023
    On Wednesday, August 2, 2023 at 11:58:51 PM UTC+8, Kees Nuyt wrote:
    On Wed, 2 Aug 2023 06:19:59 -0700 (PDT), "hongy...@gmail.com" <hongy...@gmail.com> wrote:

    1. It will generate empty out files.
    2. It uses 3 awk calls.
    1) Read the awk manual about redirection
    2) Rewrite the bash script with one awk invocation
    3) Pass the suffix as a variable to awk with
    -v suffix=$suffix
    4)
    BEGIN{} and the body are the same as your current awk scripts
    5) In one END{}, use the existing loop three times, with
    redirections to each of the files. One example:
    END{
    ...
    ...
    for (j=0;j<i/2;j++){
    print a[j],b[j],b[j+i/2]} >"optics-current-" suffix ".dat"

    But how can I know that this corresponds to the matched XML tag `<dielectricfunction comment="current-current">'?

    More specifically, I want to derive the corresponding part of the filename from the XML tag programmatically here.

    }
    }
  • From Janis Papanagnou@21:1/5 to hongy...@gmail.com on Thu Aug 3 10:24:48 2023
    On 03.08.2023 01:59, hongy...@gmail.com wrote:
    On Wednesday, August 2, 2023 at 11:58:51 PM UTC+8, Kees Nuyt wrote:
    On Wed, 2 Aug 2023 06:19:59 -0700 (PDT), "hongy...@gmail.com"
    <hongy...@gmail.com> wrote:

    1. It will generate empty out files.
    2. It uses 3 awk calls.
    1) Read the awk manual about redirection
    2) Rewrite the bash script with one awk invocation
    3) Pass the suffix as a variable to awk with
    -v suffix=$suffix
    4)
    BEGIN{} and the body are the same as your current awk scripts
    5) In one END{}, use the existing loop three times, with
    redirections to each of the files. One example:
    END{
    ...
    ...
    for (j=0;j<i/2;j++){
    print a[j],b[j],b[j+i/2]} >"optics-current-" suffix ".dat"

    But how can I know this is corresponding to the case of matched xml tag `<dielectricfunction comment="current-current">'?

    You can, for example, simply store the three data types in different
    array sets: a1[], b1[], ..., a2[], b2[], ..., a3[], b3[], ...
    (and then don't forget to use separate index variables i1, i2, i3).

    (Also note that the /<dielectricfunction>/ data also contains the other
    two data sets, in case that's not intentional.)


    More specifically, I want to use the corresponding filename part based on the xml tag programmatically here.

    This has already been answered by Kees; pass the suffix to awk...

    awk -v suffix="$1" '
    ...
    print a[j],b[j],b[j+i/2]} >"optics-current-" suffix ".dat"
    ...
    '

    Janis


  • From hongyi.zhao@gmail.com@21:1/5 to Janis Papanagnou on Thu Aug 3 06:43:20 2023
    On Thursday, August 3, 2023 at 4:24:55 PM UTC+8, Janis Papanagnou wrote:
    On 03.08.2023 01:59, hongy...@gmail.com wrote:
    On Wednesday, August 2, 2023 at 11:58:51 PM UTC+8, Kees Nuyt wrote:
    On Wed, 2 Aug 2023 06:19:59 -0700 (PDT), "hongy...@gmail.com"
    <hongy...@gmail.com> wrote:

    1. It will generate empty out files.
    2. It uses 3 awk calls.
    1) Read the awk manual about redirection
    2) Rewrite the bash script with one awk invocation
    3) Pass the suffix as a variable to awk with
    -v suffix=$suffix
    4)
    BEGIN{} and the body are the same as your current awk scripts
    5) In one END{}, use the existing loop three times, with
    redirections to each of the files. One example:
    END{
    ...
    ...
    for (j=0;j<i/2;j++){
    print a[j],b[j],b[j+i/2]} >"optics-current-" suffix ".dat"

    But how can I know this is corresponding to the case of matched xml tag `<dielectricfunction comment="current-current">'?
    You can, for example, simply store the three data types in different
    array sets; a1[], b1[], ..., a2[], b2[], ..., a3[], b3[], ...
    (and then don't forget to use own index variables i1, i2, i3).

    (Also note that the /<dielectricfunction>/ data contains also the other
    two data sets; in case that it's not intentional.)

    More specifically, I want to use the corresponding filename part based on the xml tag programmatically here.
    This has already been answered by Kees; pass the suffix to awk...

    awk -v suffix="$1" '
    ...
    print a[j],b[j],b[j+i/2]} >"optics-current-" suffix ".dat"
    ...
    '

    Do you mean something as follows?

    #!/bin/bash

    if [ -z "$1" ]; then
    echo "Usage: $0 <suffix>"
    exit 1
    fi

    suffix=$1

    awk -v suffix="$suffix" '
    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5
    i++
    }
    }
    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    a1[i1] = $2
    b1[i1] = ($3 + $4 + $5) / 3
    c1[i1] = $4
    d1[i1] = $5
    i1++
    }
    }
    /<dielectricfunction comment="current-current">/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    a2[i2] = $2
    b2[i2] = ($3 + $4 + $5) / 3
    c2[i2] = $4
    d2[i2] = $5
    i2++
    }
    }
    END {
    for (j = 0; j < i / 2; j++) {
    print a[j], b[j], b[j + i / 2] > "optics-" suffix ".dat"
    }
    for (j = 0; j < i1 / 2; j++) {
    print a1[j], b1[j], b1[j + i1 / 2] > "optics-density-" suffix ".dat"
    }
    for (j = 0; j < i2 / 2; j++) {
    print a2[j], b2[j], b2[j + i2 / 2] > "optics-current-" suffix ".dat"
    }
    }
    ' vasprun.xml


    Janis

    Regards,
    Zhao


  • From Janis Papanagnou@21:1/5 to hongy...@gmail.com on Thu Aug 3 17:44:04 2023
    On 03.08.2023 15:43, hongy...@gmail.com wrote:
    Do you mean something as follows?

    #!/bin/bash
    [...]


    At a quick first glance I'd say yes, something like that. Does it do
    the job as you expect? If it does, then I'd also consider using
    a function for the whole if-statement and passing the arrays as arguments
    (and of course you'd also need to handle the indexes differently).
    Maybe something like...

    function assign (a, b, c, d, i)
    {
    i++
    if ($1 == "<r>") {
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5
    }
    return i
    }

    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    i = assign(a, b, c, d, i)
    }

    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    i1 = assign(a1, b1, c1, d1, i1)
    }

    ...etc...

    Note: Arrays are passed by reference but scalars are not, so you must
    pass the index value and return it.
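
The note above can be checked with a small experiment. This is a minimal sketch; the function name `bump` and the variables `data` and `count` are invented for the demo. It shows that a change to an array element inside a function is visible to the caller, while a change to a scalar parameter is lost unless the function returns the value and the caller assigns it.

```shell
out=$(awk '
function bump(arr, n) {
    arr["seen"] = "yes"   # visible to the caller: arrays are passed by reference
    n++                   # lost on return: n is a local copy of the scalar
    return n
}
BEGIN {
    count = 0
    bump(data, count)             # return value ignored, count stays 0
    print "ignored:", count, data["seen"]
    count = bump(data, count)     # return value assigned back, count becomes 1
    print "assigned:", count, data["seen"]
}')
printf '%s\n' "$out"
```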

    Janis

  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Thu Aug 3 18:04:07 2023
    On 03.08.2023 17:44, Janis Papanagnou wrote:
    On 03.08.2023 15:43, hongy...@gmail.com wrote:
    Do you mean something as follows?

    #!/bin/bash
    [...]


    At a quick first glance I'd say yes, something like that. Does it do
    the job as you expect it? - If it does then I'd also consider to use
    a function for the whole if-statement and pass the array as argument
    (and of course you'd also need to handle the indexes differently).
    Maybe something like...

    function assign (a, b, c, d, i)
    {

    Argh! - I just noticed you start at 0, so remove that i++ here and...

    i++
    if ($1 == "<r>") {
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5
    }
    return i

    ...add the '++' here...

    return ++i

    }

    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    i = assign(a, b, c, d, i)
    }

    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    i1 = assign(a1, b1, c1, d1, i1)
    }

    ...etc...

    Note: Arrays are passed by reference but scalars not, so you must pass
    the index value and return it.

    Janis


  • From hongyi.zhao@gmail.com@21:1/5 to Janis Papanagnou on Thu Aug 3 19:53:29 2023
    On Friday, August 4, 2023 at 12:04:14 AM UTC+8, Janis Papanagnou wrote:
    On 03.08.2023 17:44, Janis Papanagnou wrote:
    On 03.08.2023 15:43, hongy...@gmail.com wrote:
    Do you mean something as follows?

    #!/bin/bash
    [...]


    At a quick first glance I'd say yes, something like that. Does it do
    the job as you expect it? - If it does then I'd also consider to use
    a function for the whole if-statement and pass the array as argument
    (and of course you'd also need to handle the indexes differently).
    Maybe something like...

    function assign (a, b, c, d, i)
    {
    Argh! - I just noticed you start by 0, so remove that i++ here and...
    i++
    if ($1 == "<r>") {
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5
    }
    return i
    ...add the '++' here...

    return ++i
    }

    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    i = assign(a, b, c, d, i)
    }

    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    i1 = assign(a1, b1, c1, d1, i1)
    }

    ...etc...

    I ran the following tests, but it seems that neither my approach nor yours works correctly:

    $ cat test.xml
    <dielectricfunction comment="density-density">
    <r> 0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 </r>
    <r> 0.0200 0.0030 0.0030 0.0030 0.0000 0.0000 0.0000 </r>
    <r> 0.0400 0.0059 0.0059 0.0059 0.0000 0.0000 0.0000 </r>
    <r> 0.0601 0.0089 0.0089 0.0089 0.0000 0.0000 0.0000 </r>
    <r> 0.0801 0.0119 0.0119 0.0119 0.0000 0.0000 0.0000 </r>
    <r> 0.1001 0.0149 0.0149 0.0149 0.0000 0.0000 0.0000 </r>
    </dielectricfunction>

    <dielectricfunction comment="current-current">
    <r> 0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 </r>
    <r> 0.0200 0.0030 0.0030 0.0030 0.0000 0.0000 0.0000 </r>
    <r> 0.0400 0.0059 0.0059 0.0059 0.0000 0.0000 0.0000 </r>
    <r> 0.0601 0.0089 0.0089 0.0089 0.0000 0.0000 0.0000 </r>
    <r> 0.0801 0.0119 0.0119 0.0119 0.0000 0.0000 0.0000 </r>
    <r> 0.1001 0.0149 0.0149 0.0149 0.0000 0.0000 0.0000 </r>
    </dielectricfunction>

    <dielectricfunction>
    <r> 0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 </r>
    <r> 0.0200 0.0030 0.0030 0.0030 0.0000 0.0000 0.0000 </r>
    <r> 0.0400 0.0059 0.0059 0.0059 0.0000 0.0000 0.0000 </r>
    <r> 0.0601 0.0089 0.0089 0.0089 0.0000 0.0000 0.0000 </r>
    <r> 0.0801 0.0119 0.0119 0.0119 0.0000 0.0000 0.0000 </r>
    <r> 0.1001 0.0149 0.0149 0.0149 0.0000 0.0000 0.0000 </r>
    </dielectricfunction>

    $ cat awk.1
    #!/bin/bash

    if [ -z "$1" ]; then
    echo "Usage: $0 <suffix>"
    exit 1
    fi

    suffix=$1

    awk -v suffix="$suffix" '
    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5
    i++
    }
    }
    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    a1[i1] = $2
    b1[i1] = ($3 + $4 + $5) / 3
    c1[i1] = $4
    d1[i1] = $5
    i1++
    }
    }
    /<dielectricfunction comment="current-current">/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    a2[i2] = $2
    b2[i2] = ($3 + $4 + $5) / 3
    c2[i2] = $4
    d2[i2] = $5
    i2++
    }
    }
    END {
    for (j = 0; j < i / 2; j++) {
    print a[j], b[j], b[j + i / 2] > "optics-" suffix ".dat"
    }
    for (j = 0; j < i1 / 2; j++) {
    print a1[j], b1[j], b1[j + i1 / 2] > "optics-density-" suffix ".dat"
    }
    for (j = 0; j < i2 / 2; j++) {
    print a2[j], b2[j], b2[j + i2 / 2] > "optics-current-" suffix ".dat"
    }
    }
    ' test.xml


    $ cat awk.2
    #!/bin/bash

    if [ -z "$1" ]; then
    echo "Usage: $0 <suffix>"
    exit 1
    fi

    suffix=$1

    awk -v suffix="$suffix" '

    function assign (a, b, c, d, i)
    {
    if ($1 == "<r>") {
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5
    }
    return i
    i++
    }

    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    i = assign(a, b, c, d, i)
    }

    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    i1 = assign(a1, b1, c1, d1, i1)
    }

    /<dielectricfunction comment="current-current">/, /<\/dielectricfunction>/ {
    i1 = assign(a1, b1, c1, d1, i1)
    }

    END {
    for (j = 0; j < i / 2; j++) {
    print a[j], b[j], b[j + i / 2] > "optics-" suffix ".dat"
    }
    for (j = 0; j < i1 / 2; j++) {
    print a1[j], b1[j], b1[j + i1 / 2] > "optics-density-" suffix ".dat"
    }
    for (j = 0; j < i2 / 2; j++) {
    print a2[j], b2[j], b2[j + i2 / 2] > "optics-current-" suffix ".dat"
    }
    }
    ' test.xml

    The tests are as follows:

    First test my original script:

    $ bash awk.1 dft
    $ cat optics-dft.dat
    0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149
    $ cat optics-current-dft.dat
    0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149
    $ cat optics-density-dft.dat
    0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149

    As you can see, the first line in the result only has one column.
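
A likely cause of the one-column first line, which the thread does not spell out: the merged script no longer has `BEGIN{i=0}`, so `i` is uninitialized when the first `<r>` row arrives. awk subscripts are strings, and an uninitialized variable converts to the empty string, so the first row is stored under `a[""]` rather than `a[0]`. The `END` loop starts at `j = 0` and finds nothing there. A minimal sketch of that behaviour (the array contents and labels are invented for the demo):

```shell
out=$(awk '
BEGIN {
    a[i] = "first"     # i is uninitialized here, so this sets a[""], not a[0]
    i++                # i becomes 1
    a[i] = "second"    # this sets a[1]
    print "zero=[" a[0] "]"
    print "empty=[" a[""] "]"
    print "one=[" a[1] "]"
}')
printf '%s\n' "$out"
```

Either initializing the counters in `BEGIN`, or incrementing before the assignment and looping from 1, avoids the problem.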

    Then I will test the version based on your suggested improvements:

    $ rm *dat
    $ bash awk.2 dft
    $ ls *dat
    ls: cannot access '*dat': No such file or directory

    As you can see, nothing is generated at all.


    Note: Arrays are passed by reference but scalars not, so you must pass
    the index value and return it.

    Janis

    Zhao

  • From hongyi.zhao@gmail.com@21:1/5 to hongy...@gmail.com on Thu Aug 3 23:39:48 2023
    On Friday, August 4, 2023 at 10:53:33 AM UTC+8, hongy...@gmail.com wrote:
    [...]

    Changing the scripts as follows does the trick:

    $ cat awk.1
    #!/bin/bash

    if [ -z "$1" ]; then
    echo "Usage: $0 <suffix>"
    exit 1
    fi

    suffix=$1

    awk -v suffix="$suffix" '
    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    i++
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5
    }
    }
    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    i1++
    a1[i1] = $2
    b1[i1] = ($3 + $4 + $5) / 3
    c1[i1] = $4
    d1[i1] = $5
    }
    }
    /<dielectricfunction comment="current-current">/, /<\/dielectricfunction>/ {
    if ($1 == "<r>") {
    i2++
    a2[i2] = $2
    b2[i2] = ($3 + $4 + $5) / 3
    c2[i2] = $4
    d2[i2] = $5
    }
    }
    END {
    for (j = 1; j <= i / 2; j++) {
    print a[j], b[j], b[j + i / 2] > "optics-" suffix ".dat"
    }
    for (j = 1; j <= i1 / 2; j++) {
    print a1[j], b1[j], b1[j + i1 / 2] > "optics-density-" suffix ".dat"
    }
    for (j = 1; j <= i2 / 2; j++) {
    print a2[j], b2[j], b2[j + i2 / 2] > "optics-current-" suffix ".dat"
    }
    }
    ' test.xml


    $ cat awk.2
    #!/bin/bash

    if [ -z "$1" ]; then
    echo "Usage: $0 <suffix>"
    exit 1
    fi

    suffix=$1

    awk -v suffix="$suffix" '
    function assign (a, b, c, d, i) {
    if ($1 == "<r>") {
    i++
    a[i] = $2
    b[i] = ($3 + $4 + $5) / 3
    c[i] = $4
    d[i] = $5

    }
    return i
    }

    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    i = assign(a, b, c, d, i)
    }

    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    i1 = assign(a1, b1, c1, d1, i1)
    }

    /<dielectricfunction comment="current-current">/, /<\/dielectricfunction>/ {
    i2 = assign(a2, b2, c2, d2, i2)
    }

    END {
    for (j = 1; j <= i / 2; j++) {
    print a[j], b[j], b[j + i / 2] > "optics-" suffix ".dat"
    }
    for (j = 1; j <= i1 / 2; j++) {
    print a1[j], b1[j], b1[j + i1 / 2] > "optics-density-" suffix ".dat"
    }
    for (j = 1; j <= i2 / 2; j++) {
    print a2[j], b2[j], b2[j + i2 / 2] > "optics-current-" suffix ".dat"
    }
    }
    ' test.xml


    Then, test as follows:

    werner@X10DAi:~/awk-vasprun$ ./awk.1 dft
    werner@X10DAi:~/awk-vasprun$ ls *dat | while read -r filename; do echo "File: $filename"; cat "$filename"; echo; done
    File: optics-current-dft.dat
    0.0000 0 0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149

    File: optics-density-dft.dat
    0.0000 0 0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149

    File: optics-dft.dat
    0.0000 0 0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149

    werner@X10DAi:~/awk-vasprun$ ./awk.2 dft
    werner@X10DAi:~/awk-vasprun$ ls *dat | while read -r filename; do echo "File: $filename"; cat "$filename"; echo; done
    File: optics-current-dft.dat
    0.0000 0 0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149

    File: optics-density-dft.dat
    0.0000 0 0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149

    File: optics-dft.dat
    0.0000 0 0.0089
    0.0200 0.003 0.0119
    0.0400 0.0059 0.0149

    Best,
    Zhao


  • From Janis Papanagnou@21:1/5 to hongy...@gmail.com on Fri Aug 4 12:55:05 2023
    On 04.08.2023 08:39, hongy...@gmail.com wrote:
    On Friday, August 4, 2023 at 10:53:33 AM UTC+8, hongy...@gmail.com wrote:
    I ran the following tests, but it seems that neither my nor your approach works perfectly:

    I merely tried to show you the direction. I haven't analyzed
    the code, just transcribed it a bit.

    return i
    i++

    (The increment will not be reached. I meant to use 'return ++i'.)


    Change to the following does the trick:
    [...]

    Fine. So I suppose it now works for you.

    if ($1 == "<r>") {
    i++

    Well spotted; the i++ must indeed be inside the 'if' block.
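
The difference between the two placements of the increment can be seen in a small sketch. The function names `always` and `guarded` and the four-line input are invented for the demo. `always` mimics `return ++i`, which counts every line in the range including the opening and closing tags; `guarded` increments only for `<r>` rows.

```shell
out=$(printf '%s\n' '<set>' '<r> 1 </r>' '<r> 2 </r>' '</set>' | awk '
function always(n)  { return ++n }                      # bumps on every input line
function guarded(n) { if ($1 == "<r>") n++; return n }  # bumps on data rows only
{ x = always(x); y = guarded(y) }
END { print "always:", x; print "guarded:", y }')
printf '%s\n' "$out"
```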

    Janis

  • From Kenny McCormack@21:1/5 to janis_papanagnou+ng@hotmail.com on Fri Aug 4 11:14:57 2023
    In article <uailea$18aba$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 04.08.2023 08:39, hongy...@gmail.com wrote:
    On Friday, August 4, 2023 at 10:53:33 AM UTC+8, hongy...@gmail.com wrote:
    I ran the following tests, but it seems that neither my nor your approach works perfectly:

    I merely tried to show you the direction. I haven't analyzed
    the code, just transcribed it a bit.

    return i
    i++

    (The increment will not be reached. I meant to use 'return ++i'.)

    I note that your general approach is that you have multiple arrays, running multiple counters (i, i1, etc) - one counter for each array.

    In situations like this, I often use the trick of storing the counter in
    the zero element of each array. So, you end up with something like:

    A[++A[0]] = "something"
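
A minimal sketch of the zero-element counter, assuming a made-up two-column input where the first field selects the array. Each array carries its own length in element 0, so no separate `i`, `i1`, `i2` variables are needed.

```shell
out=$(printf '%s\n' 'a 10' 'b 20' 'a 30' | awk '
$1 == "a" { A[++A[0]] = $2 }   # A[0] counts the entries stored in A
$1 == "b" { B[++B[0]] = $2 }   # B[0] counts the entries stored in B
END {
    for (j = 1; j <= A[0]; j++) print "A", j, A[j]
    for (j = 1; j <= B[0]; j++) print "B", j, B[j]
}')
printf '%s\n' "$out"
```

The data then occupies indices 1 to `A[0]`, so loops over it start at 1.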

    Also, OP might consider using a single, multi-dimensional array, instead of multiple arrays (a,b,c,d). I often find this works better. Of course, you have to be running a version of AWK that supports real multi-dimensional
    arrays (i.e., TAWK or GAWK).
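
As a rough illustration of the single-array idea: the post refers to true multi-dimensional arrays, which in GAWK are written `d[set][n]`. The sketch below deliberately swaps in the portable POSIX emulation `d[set, n]` (subscripts joined with `SUBSEP`) so that it runs in any awk. The input layout and the array names `count`, `energy` and `value` are invented for the demo.

```shell
out=$(printf '%s\n' 'density 0.0 1.5' 'current 0.0 2.5' 'density 0.2 3.5' | awk '
{
    n = ++count[$1]            # per-set row counter, keyed by the set name
    energy[$1, n] = $2         # "set, row" is joined with SUBSEP into one key
    value[$1, n]  = $3
}
END {
    for (j = 1; j <= count["density"]; j++)
        print "density", energy["density", j], value["density", j]
    for (j = 1; j <= count["current"]; j++)
        print "current", energy["current", j], value["current", j]
}')
printf '%s\n' "$out"
```

With the set name as part of the key, adding a further data set needs no new arrays or counters.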

    I'd be willing to bet that OP either is or should be running GAWK.

    --
    I love the poorly educated.

  • From Janis Papanagnou@21:1/5 to Kenny McCormack on Fri Aug 4 13:37:43 2023
    On 04.08.2023 13:14, Kenny McCormack wrote:

    I note that your general approach is that you have multiple arrays, running multiple counters (i, i1, etc) - one counter for each array.

    (The OP's approach. I just transcribed the arrays from the individual
    awk instances into one awk instance.)


    In situations like this, I often use the trick of storing the counter in
    the zero element of each array. So, you end up with something like:

    A[++A[0]] = "something"

    Also, OP might consider using a single, multi-dimensional array, instead of multiple arrays (a,b,c,d). I often find this works better. Of course, you have to be running a version of AWK that supports real multi-dimensional arrays (i.e., TAWK or GAWK).

    I considered both of your ideas (the first one in a slightly different
    way; yours is interesting because it couples the index with the data
    in the array), but I refrained from suggesting them for reasons of
    (presumed) simplicity.

    Good points anyway.


    I'd be willing to bet that OP either is or should be running GAWK.

    I suppose he does.

    Janis

  • From hongyi.zhao@gmail.com@21:1/5 to Kenny McCormack on Fri Aug 4 06:04:34 2023
    On Friday, August 4, 2023 at 7:15:03 PM UTC+8, Kenny McCormack wrote:
    In article <uailea$18aba$1...@dont-email.me>,
    Janis Papanagnou <janis_pap...@hotmail.com> wrote:
    On 04.08.2023 08:39, hongy...@gmail.com wrote:
    On Friday, August 4, 2023 at 10:53:33 AM UTC+8, hongy...@gmail.com wrote:
    I ran the following tests, but it seems that neither my nor your approach works perfectly:

    I merely tried to show you the direction. I haven't analyzed
    the code, just transcribed it a bit.

    return i
    i++

    (The increment will not be reached. I meant to use 'return ++i'.)
    I note that your general approach is that you have multiple arrays, running multiple counters (i, i1, etc) - one counter for each array.

    In situations like this, I often use the trick of storing the counter in
    the zero element of each array. So, you end up with something like:

    A[++A[0]] = "something"

    Also, OP might consider using a single, multi-dimensional array, instead of multiple arrays (a,b,c,d). I often find this works better. Of course, you have to be running a version of AWK that supports real multi-dimensional arrays (i.e., TAWK or GAWK).

    Do you mean something like the following?

    #!/bin/bash

    if [ -z "$1" ]; then
    echo "Usage: $0 <suffix>"
    exit 1
    fi

    suffix=$1

    awk -v suffix="$suffix" '
    function assign(arr, idx, val1, val2, val3, val4) {
    if ($1 == "<r>") {
    idx = ++arr[0]   # the row counter lives in element 0 of the first array
    val1[idx] = $2
    val2[idx] = ($3 + $4 + $5) / 3
    val3[idx] = $4
    val4[idx] = $5
    }
    return idx
    }

    function print_to_file(arr1, arr2, filename,    j) {
    for (j = 1; j <= arr1[0] / 2; j++) {
    print arr1[j], arr2[j], arr2[j + arr1[0] / 2] > filename
    }
    }

    /<dielectricfunction>/, /<\/dielectricfunction>/ {
    i = assign(a, i, a, b, c, d)
    }

    /<dielectricfunction comment="density-density">/, /<\/dielectricfunction>/ {
    i1 = assign(a1, i1, a1, b1, c1, d1)
    }

    /<dielectricfunction comment="current-current">/, /<\/dielectricfunction>/ {
    i2 = assign(a2, i2, a2, b2, c2, d2)
    }

    END {
    print_to_file(a, b, "optics-" suffix ".dat")
    print_to_file(a1, b1, "optics-density-" suffix ".dat")
    print_to_file(a2, b2, "optics-current-" suffix ".dat")
    }
    ' test.xml

    I'd be willing to bet that OP either is or should be running GAWK.

    Yes. GAWK.

    Zhao
  • From Ben Bacarisse@21:1/5 to Kenny McCormack on Fri Aug 4 14:53:17 2023
    gazelle@shell.xmission.com (Kenny McCormack) writes:

    I note that your general approach is that you have multiple arrays, running multiple counters (i, i1, etc) - one counter for each array.

    In situations like this, I often use the trick of storing the counter in
    the zero element of each array. So, you end up with something like:

    A[++A[0]] = "something"

    This should not be necessary. Some languages allow

    A[] = "something"

    as a way to append to an array, but not AWK. One might think that

    A[length(A)] = "something"

    would do the trick, but calling length before using A as an array turns
    A into a scalar so you get a run-time error when A is indexed.

    You can, however, do this:

    BEGIN { delete A }
    ...
    A[length(A)] = "something"

    which is rather oblique, but works on all the awks I have installed
    (gawk, nawk, mawk and original-awk). (You can write 'delete A[0]' if
    you don't want to use the newer POSIX syntax.)
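
A minimal sketch of the `delete`-then-`length` idiom described above, assuming an awk in which `length()` accepts an array argument, as the implementations listed in the post reportedly do. Note that the first element lands at index 0, because the array is empty when `length(A)` is first evaluated.

```shell
out=$(awk '
BEGIN {
    delete A                      # makes A an (empty) array before length() sees it
    A[length(A)] = "first"        # length is 0, so this fills A[0]
    A[length(A)] = "second"       # length is 1, so this fills A[1]
    print length(A), A[0], A[1]
}')
printf '%s\n' "$out"
```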

    --
    Ben.

  • From hongyi.zhao@gmail.com@21:1/5 to Ben Bacarisse on Fri Aug 4 22:06:11 2023
    On Friday, August 4, 2023 at 9:53:25 PM UTC+8, Ben Bacarisse wrote:
    gaz...@shell.xmission.com (Kenny McCormack) writes:

    I note that your general approach is that you have multiple arrays, running
    multiple counters (i, i1, etc) - one counter for each array.

    In situations like this, I often use the trick of storing the counter in the zero element of each array. So, you end up with something like:

    A[++A[0]] = "something"

    This should not be necessary. Some languages allow

    A[] = "something"

    as a way to append to an array, but not AWK. One might think that

    You first say "but not AWK" here.

    A[length(A)] = "something"

    would do the trick, but calling length before using A as an array turns
    A into a scalar so you get a run-time error when A is indexed.

    You can, however, do this:

    BEGIN { delete A }
    ...
    A[length(A)] = "something"

    which is rather oblique, but works on all the awks I have installed
    (gawk, nawk, mawk and original-awk). (You can write 'delete A[0]' if
    you don't want to use the newer POSIX syntax.)

    Then you say your suggestion is for AWK here.

    So, I'm confused about what you really mean.

    --
    Ben.

    Zhao

  • From Janis Papanagnou@21:1/5 to hongy...@gmail.com on Sat Aug 5 11:08:44 2023
    On 05.08.2023 07:06, hongy...@gmail.com wrote:
    On Friday, August 4, 2023 at 9:53:25 PM UTC+8, Ben Bacarisse wrote:
    gaz...@shell.xmission.com (Kenny McCormack) writes:

    I note that your general approach is that you have multiple arrays, running
    multiple counters (i, i1, etc) - one counter for each array.

    In situations like this, I often use the trick of storing the counter in
    the zero element of each array. So, you end up with something like:

    A[++A[0]] = "something"

    This should not be necessary. Some languages allow

    A[] = "something"

    as a way to append to an array, but not AWK. One might think that

    You first say "but not AWK" here.

    A[length(A)] = "something"

    would do the trick, but calling length before using A as an array turns
    A into a scalar so you get a run-time error when A is indexed.

    You can, however, do this:

    BEGIN { delete A }
    ...
    A[length(A)] = "something"

    which is rather oblique, but works on all the awks I have installed
    (gawk, nawk, mawk and original-awk). (You can write 'delete A[0]' if
    you don't want to use the newer POSIX syntax.)

    Then you say your suggestion is for AWK here.

    So, I'm confused about what you really mean.

    It's a standard code pattern that you find (e.g.) in C/C++ context
    (or other languages where you regularly organize your data in arrays
    and start indexes counting at 0) but in Awk the ad hoc form of that
    pattern is not possible because of its defaults with its variable
    types, where you initially need some way to "declare" a variable as
    array (to not get an error). So the code pattern is usable, but it
    needs - once (in the BEGIN section) - the array type coercion. The
    code pattern is generally usable (also in Awk). But note that in
    some languages it might be costly (e.g. in C) when the complexity
    of length() is not O(1) but O(N). In any case the code pattern makes
    explicit and separate index variables unnecessary.
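
    As a hedged illustration of that last point, here is a small sketch in
    which two arrays are filled with the same idiom and no counters (i, i1,
    ...) are kept. It assumes an awk whose length() accepts an array
    argument. The function name two_arrays_demo, the array names dens and
    curr, and the "d"/"c" tagged sample input are hypothetical and stand in
    for the real vasprun.xml data.

```shell
#!/bin/sh
# Hypothetical demo: two arrays appended to independently, no index variables.
two_arrays_demo() {
    printf 'd 1\nc 2\nd 3\n' | awk '
        BEGIN { delete dens; delete curr }       # coerce both names to arrays once
        $1 == "d" { dens[length(dens)] = $2 }    # next free index is the array size
        $1 == "c" { curr[length(curr)] = $2 }
        END {
            printf "dens:"
            for (k = 0; k < length(dens); k++) printf " %s", dens[k]
            printf "\ncurr:"
            for (k = 0; k < length(curr); k++) printf " %s", curr[k]
            print ""
        }
    '
}
two_arrays_demo
```

    Each array carries its own size, so adding a third array needs only one
    more delete in BEGIN and one more pattern line.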

    Janis


    --
    Ben.

    Zhao


  • From Kenny McCormack@21:1/5 to ben.usenet@bsb.me.uk on Sat Aug 5 22:12:04 2023
    In article <87zg35rwo7.fsf@bsb.me.uk>,
    Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    ...
    Unfortunately I am confused about what is confusing you. I have
    obviously not been clear, but I don't know what I need to clarify.

    You're going on and on about a bunch of irrelevant stuff that has nothing
    to do with anything in this thread.

    Neither OP nor I have any idea as to why you're doing this, unless you
    just got confused about which newsgroup and/or which thread you are posting
    to. I understand that people who use "Thunderbird" to read and post to
    Usenet often get confused in this way. Apparently, "Thunderbird" gives the user little information about where their text is going, be it Usenet,
    email, or whatever. Maybe this is your problem?

    (Yours in Christ...)

    --
    You are a dreadful man, Kenny, for all your ways are the ways of death.
    - Rick C Hodgin -

    (P.S. -> https://www.youtube.com/watch?v=sMmTkKz60W8)

  • From Ben Bacarisse@21:1/5 to hongy...@gmail.com on Sat Aug 5 22:44:56 2023
    "hongy...@gmail.com" <hongyi.zhao@gmail.com> writes:

    On Friday, August 4, 2023 at 9:53:25 PM UTC+8, Ben Bacarisse wrote:
    gaz...@shell.xmission.com (Kenny McCormack) writes:

    I note that your general approach is that you have multiple arrays, running
    multiple counters (i, i1, etc) - one counter for each array.

    In situations like this, I often use the trick of storing the counter in
    the zero element of each array. So, you end up with something like:

    A[++A[0]] = "something"

    This should not be necessary. Some languages allow

    A[] = "something"

    as a way to append to an array, but not AWK. One might think that

    You first say "but not AWK" here.

    Yes. AWK does not have the above syntax. Several other scripting
    languages do, but not AWK.

    A[length(A)] = "something"

    would do the trick, but calling length before using A as an array turns
    A into a scalar so you get a run-time error when A is indexed.

    You can, however, do this:

    BEGIN { delete A }
    ...
    A[length(A)] = "something"

    which is rather oblique, but works on all the awks I have installed
    (gawk, nawk, mawk and original-awk). (You can write 'delete A[0]' if
    you don't want to use the newer POSIX syntax.)

    Then you say your suggestion is for AWK here.

    Yes. I gave a sketch of how to append to an array in AWK, given that
    AWK has no simple way to do it like some other scripting languages.

    So, I'm confused about what you really mean.

    Unfortunately I am confused about what is confusing you. I have
    obviously not been clear, but I don't know what I need to clarify.

    --
    Ben.
