• [ANN] (g)awk-csvio simple awk library

    From Manuel Collado@21:1/5 to All on Thu Apr 27 13:13:30 2023
    "csvio.awk" is a pure awk library that provides CSV support for awk. It
    is available in two variants. "gawk-csvio" uses some specific gawk
    features. "awk-csvio" uses only POSIX awk features.

    They are available at http://mcollado.z15.es/xgawk/.

    Version 0.x.x is intended as a preliminary issue, mostly to get feedback
    from interested users. Suggestions, comments, bug reports, etc. are welcome.

    The goal of csvio is to process CSV records as if they were regular awk records, delimited by some FS/OFS of your choice.

    HTH. Enjoy.
    --
    Manuel Collado - http://mcollado.z15.es

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Manuel Collado on Thu Apr 27 14:20:49 2023
    Hi Manuel,

    thanks for this interesting information and for providing the library!

    Since I currently don't have any XML projects and don't know whether
    I'll find time for a thorough examination, one question ahead...

    Does that library work similarly/exactly like xgawk's implementation
    or are there some or substantial differences?

    I am asking because I basically considered the xgawk concept very
    good, but I recall finding (at that time) some strange behavior
    (with blanks and RS, IIRC, but I have only faint memories after so
    many years).

    Janis


  • From Manuel Collado@21:1/5 to All on Thu Apr 27 18:09:29 2023
    I'm afraid there is some misunderstanding. The announced csvio library
    supports CSV files, not XML ones.

    The xgawk name is just the name of the directory on my website. It is
    not related at all to the xgawk project.

    Perhaps I should rename the web directory to avoid confusion.

    Thanks.

  • From Manuel Collado@21:1/5 to All on Thu Apr 27 18:24:12 2023
    Re-posted - URL changed.

    "csvio.awk" is a pure awk library that provides CSV support for awk. It
    is available in two variants. "gawk-csvio" uses some specific gawk
    features. "awk-csvio" uses only POSIX awk features.

    They are available at http://mcollado.z15.es/gawk-extras/.

    Version 0.x.x is intended as a preliminary issue, mostly to get feedback
    from interested users. Suggestions, comments, bug reports, etc. are welcome.

    The goal of csvio is to process CSV records as if they were regular awk records, delimited by some FS/OFS of your choice.

    HTH. Enjoy.
    --
    Manuel Collado - http://mcollado.z15.es

  • From Janis Papanagnou@21:1/5 to Manuel Collado on Thu Apr 27 18:17:46 2023
    On 27.04.2023 18:09, Manuel Collado wrote:
    > I'm afraid there is some misunderstanding. The announced csvio library
    > supports CSV files, not XML ones.

    Oh! Yes. Sorry for my inattentiveness (and the posting-noise). My fault!

    Janis

  • From J Naman@21:1/5 to Manuel Collado on Thu Apr 27 21:36:44 2023
    Manuel, thank you for the library code. I tested it on a number of different CSV and TSV files and saw no errors or bugs.
    I would like to share something I learned the hard way. I use

    RS = @/[\n\f\r]+/

    because of \f weirdness in the middle of some Fidelity Brokerage download CSVs, and LF-only line endings (no \r) in some TSVs, etc.
    I am not 100% sure, but I think you might extend the "# Strip unwanted CRs" step to include \f (unwanted!) also.
    Best, John Naman
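
A minimal sketch of the idea above, assuming plain POSIX awk with the default newline RS: rather than folding \f into RS, each record is stripped of stray CR and FF characters after it has been read. The helper name strip_cr_ff is hypothetical, and this is not csvio's actual code.

```shell
# Sketch only: remove stray CR and FF characters from every record.
# strip_cr_ff is a hypothetical helper name, not part of csvio.
strip_cr_ff() {
    awk '{ gsub(/[\r\f]/, ""); print }'
}

# Sample input: one CRLF line, one LF line, one line starting with a form feed.
printf 'a,b\r\nc,d\n\fe,f\n' | strip_cr_ff
```

Unlike the RS = @/[\n\f\r]+/ approach, this treats \f as junk rather than as a record terminator, and it also removes CR or FF characters that sit inside quoted fields, which may or may not be wanted.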

  • From Manuel Collado@21:1/5 to All on Fri Apr 28 11:05:57 2023
    On 28/4/23 at 6:36, J Naman wrote:
    > Manuel, thank you for the library code. I tested it on a number of different CSV and TSV files and saw no errors or bugs.
    > I would like to share something I learned the hard way. I use
    >
    > RS = @/[\n\f\r]+/
    >
    > because of \f weirdness in the middle of some Fidelity Brokerage download CSVs, and LF-only line endings (no \r) in some TSVs, etc.
    > I am not 100% sure, but I think you might extend the "# Strip unwanted CRs" step to include \f (unwanted!) also.
    > Best, John Naman

    Thanks a lot. I will implement your suggestion.

    Regards.

  • From Bruce Horrocks@21:1/5 to Manuel Collado on Fri Apr 28 10:44:31 2023

    [re-send as first attempt has vanished]

    Thanks Manuel, very handy.

    Some thoughts based purely on a reading of your code (rather than actual testing), so this may be wrong, but I'm sure the regulars here will
    correct me if so. :-)

    1) The while loop in csvimport()

    You append to $0 each time through the loop. This causes $1, $2, etc.
    to be regenerated each time, so it would be more efficient to build the
    record in a temporary variable and assign that to $0 only once, after the
    loop completes. That way $1, $2, etc. are rebuilt only once.
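
A rough sketch of the temporary-variable idea, under stated assumptions: a record is taken to be incomplete while it contains an odd number of double quotes (which holds for CSV that escapes quotes by doubling them). The names join_quoted_lines, nquotes and rec are hypothetical. Only the variable name "more" comes from the getline statement mentioned in this post, and none of this is csvio's actual code.

```shell
# Sketch only, not the library code: join physical lines into one logical
# record in a temporary variable, then assign to $0 exactly once.
join_quoted_lines() {
    awk '
    # Number of double-quote characters in s (counted on a copy).
    function nquotes(s,    t) { t = s; return gsub(/"/, "", t) }
    {
        rec = $0                          # temporary buffer
        # An odd quote count means a quoted field is still open.
        while (nquotes(rec) % 2 == 1 && (getline more) > 0)
            rec = rec "\n" more
        $0 = rec                          # single assignment to $0
        print "[" $0 "]"
    }'
}

# The first record spans two physical lines.
printf 'a,"x\ny",b\nc,d\n' | join_quoted_lines
```

Because "getline more" reads into a variable, $0 and the fields are left alone inside the loop, and the only assignment to $0 happens after the loop.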

    2) The "getline more" statement in csvimport()

    A malformed file might end in the middle of a multi-line field. It looks
    as though your code doesn't detect this whereas it might be better to
    abort with an error.
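
One possible way to detect this, sketched under the same odd-quote-count assumption as before: getline returns 1 on success, 0 at end of input and -1 on error, so a return value of 0 or less while a quoted field is still open can be treated as a fatal error. The names read_complete_records, nquotes, rec and start are hypothetical, and this is not the library's actual code.

```shell
# Sketch only: abort when the input ends inside a quoted field.
read_complete_records() {
    awk '
    # Number of double-quote characters in s (counted on a copy).
    function nquotes(s,    t) { t = s; return gsub(/"/, "", t) }
    {
        rec = $0
        start = NR                        # line where this record begins
        while (nquotes(rec) % 2 == 1) {
            # getline returns 1 on success, 0 at end of input, -1 on error.
            if ((getline more) <= 0) {
                print "error: unterminated quoted field in record starting at line " start | "cat 1>&2"
                exit 1
            }
            rec = rec "\n" more
        }
        print rec
    }'
}

# The sample input stops in the middle of a quoted field.
printf 'a,b\nc,"x\ny\n' | read_complete_records || echo "malformed input rejected"
```

The message goes to standard error through a pipe to cat, which avoids relying on "/dev/stderr" being available in every awk.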

    3) The for loop in csvimport()

    The statements

    for (k=1; k in af; k++) {
    fk = af[k]

    can be replaced with the single statement

    for (fk in af) {

    4) The for loop in csvimport()

    The line

    $0 = $0 ofs fk # Concatenate fields, delimited by OFS

    only works because you set FS=OFS='|' in your examples. If they are not
    the same then after $0 has been built the values of $1, $2 etc won't
    match the fields extracted.

    A simple solution would be to build $0 by setting $1, $2 etc using the statement

    $k = fk

    so that $1, $2 are set as expected and you don't have to worry about
    whatever value the user has used for FS.

    Making this change negates my point 3, of course. It also allows for the slightly esoteric scenario where the user might want to change the value
    of OFS mid-output.
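
A small sketch of the "$k = fk" idea, with FS deliberately left at its default and OFS set to "|". The call to split() is only a stand-in for a real CSV parser, and the function name assign_fields is hypothetical. The array name af and the loop variable k follow the snippets quoted above.

```shell
# Sketch only: set the fields one by one instead of concatenating into $0.
assign_fields() {
    awk 'BEGIN { OFS = "|" }              # FS stays at its default (whitespace)
    {
        n = split($0, af, ",")            # stand-in for a real CSV parser
        $0 = ""                           # clear the record; NF becomes 0
        for (k = 1; k <= n; k++)
            $k = af[k]                    # sets field k; $0 is rebuilt with OFS
        print NF ": " $0 " (second field: " $2 ")"
    }'
}

printf 'John Smith,42,New York\n' | assign_fields
```

Assigning to a field never re-splits the record, so "John Smith" stays a single field even though it contains the default field separator. A field value that itself contains OFS would still make the printed $0 ambiguous, but $1, $2, etc. remain correct.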

    Hope this helps. Comments, corrections and improvements from others
    are welcome.

    --
    Bruce Horrocks
    Surrey, England

  • From Janis Papanagnou@21:1/5 to Bruce Horrocks on Fri Apr 28 13:41:54 2023
    On 28.04.2023 11:44, Bruce Horrocks wrote:
    > [...]
    >
    > 3) The for loop in csvimport()
    >
    > The statements
    >
    > for (k=1; k in af; k++) {
    > fk = af[k]
    >
    > can be replaced with the single statement
    >
    > for (fk in af) {


    I don't know the context, so I cannot tell whether a sequential
    indexed traversal (as in the first for-loop) is necessary or not.
    The second for-loop wouldn't guarantee a traversal ordered by the
    numerical index.
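
To illustrate the difference, here is a minimal sketch (the function name show_traversal and the sample data are made up). The counting loop visits af[1], af[2], ... in numeric order, while a for-in loop hands out each index exactly once in an order that POSIX awk leaves unspecified. Note also that the for-in loop variable receives the index, not the stored value.

```shell
# Sketch only: counting loop versus for-in loop over a split() array.
show_traversal() {
    awk 'BEGIN {
        n = split("a,b,c,d,e,f,g,h,i,j,k,l", af, ",")

        # Counting loop: numeric order is guaranteed.
        for (k = 1; k in af; k++)
            out = out af[k]
        print out

        # for-in loop: idx is an index; every index is seen once,
        # but the visiting order is implementation-defined.
        cnt = 0
        for (idx in af)
            cnt++
        print cnt
    }'
}

show_traversal
```

In gawk specifically, setting PROCINFO["sorted_in"] = "@ind_num_asc" makes for-in traverse in ascending numeric index order, but that is a gawk extension and would not help the POSIX-only variant.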

    Janis.

    [...]

  • From Bruce Horrocks@21:1/5 to Janis Papanagnou on Fri Apr 28 17:36:05 2023
    On 28/04/2023 12:41, Janis Papanagnou wrote:
    > On 28.04.2023 11:44, Bruce Horrocks wrote:
    > > [...]
    > >
    > > 3) The for loop in csvimport()
    > >
    > > The statements
    > >
    > > for (k=1; k in af; k++) {
    > > fk = af[k]
    > >
    > > can be replaced with the single statement
    > >
    > > for (fk in af) {
    >
    > I don't know the context, so I cannot tell whether a sequential
    > indexed traversal (as in the first for-loop) is necessary or not.
    > The second for-loop wouldn't guarantee a traversal ordered by the
    > numerical index.

    Good point. The traversal order is important - the code is linked to in
    the original post so you can see the context for yourself.

    --
    Bruce Horrocks
    Surrey, England
