Forum: >>> Magnum BBS <<<

Useful tools for shell scripting/text mangling (slightly OT)?

From Axel Reichert@21:1/5 to All on Thu Jun 8 11:34:27 2023

Hello,

normally in my shell scripts I use the classics, such as grep, sed, awk,
cut, sort, uniq, a bit of Perl etc.

I recently learned about the existence of "mlr"

https://miller.readthedocs.io/en/6.8.0/

, "datamash",

https://www.gnu.org/software/datamash/manual/datamash.html/

"jq",

https://jqlang.github.io/jq/

"q",

https://harelba.github.io/q/

and "sqlet"

https://www.sqlet.com/

which intrigued me and made me think about other tools that I
might be missing.

Any obvious candidates for text mangling?

Pointers much appreciated!

Thanks and best regards,

Axel

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Kaz Kylheku@21:1/5 to Axel Reichert on Thu Jun 8 19:30:42 2023

On 2023-06-08, Axel Reichert <mail@axel-reichert.de> wrote:

Hello,

normally in my shell scripts I use the classics, such as grep, sed, awk,
cut, sort, uniq, a bit of Perl etc.

I recently learned about the existence of "mlr"

https://miller.readthedocs.io/en/6.8.0/

, "datamash",

https://www.gnu.org/software/datamash/manual/datamash.html/

"jq",

https://jqlang.github.io/jq/

"q",

https://harelba.github.io/q/

and "sqlet"

https://www.sqlet.com/

which intrigued me and made me think about other tools that I
might be missing.

Any obvious candidates for text mangling?

"txr"

http://nongnu.org/txr

:)

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


From Axel Reichert@21:1/5 to Kaz Kylheku on Sat Jun 10 22:27:30 2023

Kaz Kylheku <864-117-4973@kylheku.com> writes:

On 2023-06-08, Axel Reichert <mail@axel-reichert.de> wrote:

Any obvious candidates for text mangling?

"txr"

http://nongnu.org/txr

:)

It is (and has been for some time) on my radar for sure. I find the
blend with Lisp intriguing (you occasionally answered my stupid
questions next door in comp.lang.lisp), but the 900 pages of "man" deter
me.

Are there any other resources for learning, more along the lines of a
tutorial or a classic textbook?

Best regards

Axel


From Kaz Kylheku@21:1/5 to Axel Reichert on Sat Jun 10 22:56:26 2023

On 2023-06-10, Axel Reichert <mail@axel-reichert.de> wrote:

Kaz Kylheku <864-117-4973@kylheku.com> writes:

On 2023-06-08, Axel Reichert <mail@axel-reichert.de> wrote:

Any obvious candidates for text mangling?

"txr"

http://nongnu.org/txr

:)

It is (and has been for some time) on my radar for sure. I find the
blend with Lisp intriguing (you occasionally answered my stupid
questions next door in comp.lang.lisp), but the 900 pages of "man" deter
me.

Yes; nobody is going to read a huge man page from top to bottom; but
it's very useful to have it installed for a quick search.

To help with the manual, there is a HTML-ized version with internal
hyperlinks and a navigable, collapsible table of contents.

There is also a library function doc which fires off a request
to open a browser to a document section, by symbol. e.g.

(doc 'cons)

By default that goes to the online one, but can be pointed
to a local installation of the manual.

Are there any other resources for learning, more along the lines of a tutorial or a classic textbook?

Unfortunately, no. There should be decent knowledge/skill transfer from
some basic Lisp tutorials like Touretzky and whatnot, but beyond that
experimentation and the manual is all there is. Plus help: mailing list,
IRC and so on.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


From Javier@21:1/5 to Kaz Kylheku on Sat Jun 10 23:37:05 2023

Kaz Kylheku <864-117-4973@kylheku.com> wrote:

Yes; nobody is going to read a huge man page from top to bottom; but
it's very useful to have it installed for a quick search.

To help with the manual, there is a HTML-ized version with internal hyperlinks and a navigable, collapsible table of contents.

For that kind of manual the best format is texinfo.


From Kaz Kylheku@21:1/5 to Javier on Sun Jun 11 04:58:26 2023

On 2023-06-10, Javier <invalid@invalid.invalid> wrote:

Kaz Kylheku <864-117-4973@kylheku.com> wrote:

Yes; nobody is going to read a huge man page from top to bottom; but
it's very useful to have it installed for a quick search.

To help with the manual, there is a HTML-ized version with internal
hyperlinks and a navigable, collapsible table of contents.

For that kind of manual the best format is texinfo.

Texinfo doesn't have a good terminal-based interface, and what it does
have is not as widely available as a man program.

Almost every time I use some GNU program's info, I end up going on the
web to find the "all in one HTML page" version of the doc.

Documentation systems that don't have a good terminal interface (or any
at all) are a dime a dozen.

It doesn't seem to be capable of authoring man pages out of the box.
(If so, all those GNU maintainers who maintain separate man pages and
texinfo documentation didn't get the memo: GCC, Bash, Gawk, ...) I'm not
going to maintain parallel documents! Now I'm sure I could get good man
pages and texinfo documentation from a single source document if I
worked at it, but what would be the point.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


From Janis Papanagnou@21:1/5 to Kaz Kylheku on Sun Jun 11 13:19:26 2023

On 11.06.23 06:58, Kaz Kylheku wrote:

Texinfo doesn't have a good terminal-based interface, and what it does
have is not as widely available as a man program.

Almost every time I use some GNU program's info, I end up going on the
web to find the "all in one HTML page" version of the doc.

Documentation systems that don't have a good terminal interface (or any
at all) are a dime a dozen.

It doesn't seem to be capable of authoring man pages out of the box.
(If so, all those GNU maintainers who maintain separate man pages and
texinfo documentation didn't get the memo: GCC, Bash, Gawk, ...) I'm not
going to maintain parallel documents! Now I'm sure I could get good man
pages and texinfo documentation from a single source document if I
worked at it, but what would be the point.

Feeling myself tortured with 'info'-pages and with hints in some
'man'-page(-stub) that relevant information is only available in
'info'-pages I wonder whether there's some tool that extracts or
dumps (structured or not) the 'info' page information to create
some "all in one" page (in *roff, HTML, or plain text format)?
(With all the hypertext links it's probably a desperate wish but
maybe someone had a good idea implemented.)

Janis


From Kenny McCormack@21:1/5 to janis_papanagnou@hotmail.com on Sun Jun 11 11:31:03 2023

In article <u64ak0$2meqj$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
...

Feeling myself tortured with 'info'-pages and with hints in some
'man'-page(-stub) that relevant information is only available in
'info'-pages I wonder whether there's some tool that extracts or
dumps (structured or not) the 'info' page information to create
some "all in one" page (in *roff, HTML, or plain text format)?
(With all the hypertext links it's probably a desperate wish but
maybe someone had a good idea implemented.)

man texi2any

looks interesting.
--
You are again heaping damnation upon your own head by your statements.

- Rick C Hodgin -


From Janis Papanagnou@21:1/5 to Spiros Bousbouras on Sun Jun 11 15:54:48 2023

On 11.06.23 15:43, Spiros Bousbouras wrote:

On Sun, 11 Jun 2023 13:19:26 +0200
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

Feeling myself tortured with 'info'-pages and with hints in some
'man'-page(-stub) that relevant information is only available in
'info'-pages I wonder whether there's some tool that extracts or
dumps (structured or not) the 'info' page information to create
some "all in one" page (in *roff, HTML, or plain text format)?
(With all the hypertext links it's probably a desperate wish but
maybe someone had a good idea implemented.)

Anything wrong with less info-page ? If there are several files
then cat <files-pattern> | less also works. You will get the
occasional control character but it's perfectly readable.

Yeah. Even (for example) 'info ls | less' seems to work pretty well.
Will also have a look into 'texi2any', that Kenny suggested. Thanks.

BTW, playing with 'info'...
$ info info
info: No menu item 'info-stnd' in node '(dir)Top'

Erm.. - okaaay.

Janis


From Spiros Bousbouras@21:1/5 to Janis Papanagnou on Sun Jun 11 13:43:54 2023

On Sun, 11 Jun 2023 13:19:26 +0200
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

Feeling myself tortured with 'info'-pages and with hints in some
'man'-page(-stub) that relevant information is only available in
'info'-pages I wonder whether there's some tool that extracts or
dumps (structured or not) the 'info' page information to create
some "all in one" page (in *roff, HTML, or plain text format)?
(With all the hypertext links it's probably a desperate wish but
maybe someone had a good idea implemented.)

Anything wrong with less info-page ? If there are several files
then cat <files-pattern> | less also works. You will get the
occasional control character but it's perfectly readable.


From Kaz Kylheku@21:1/5 to Eric on Sun Jun 11 15:04:05 2023

On 2023-06-11, Eric <eric@deptj.eu> wrote:

On 2023-06-11, Kaz Kylheku <864-117-4973@kylheku.com> wrote:

< --------

It doesn't seem to be capable of authoring man pages out of the box.
(If so, all those GNU maintainers who maintain separate man pages and
texinfo documentation didn't get the memo: GCC, Bash, Gawk, ...) I'm not
going to maintain parallel documents! Now I'm sure I could get good man
pages and texinfo documentation from a single source document if I
worked at it, but what would be the point.

Halibut.

FAQ:

] Why on earth ‘Halibut’? What relevance does the name have to anything?
]
] Historical reasons. It's probably better not to ask.

For those versed in childish word games of the English language,
the name clearly communicates why the author started the project:
"just for the hell of it".

:)

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


From Kaz Kylheku@21:1/5 to Janis Papanagnou on Sun Jun 11 14:39:29 2023

On 2023-06-11, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

On 11.06.23 15:43, Spiros Bousbouras wrote:

On Sun, 11 Jun 2023 13:19:26 +0200
Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

Feeling myself tortured with 'info'-pages and with hints in some
'man'-page(-stub) that relevant information is only available in
'info'-pages I wonder whether there's some tool that extracts or
dumps (structured or not) the 'info' page information to create
some "all in one" page (in *roff, HTML, or plain text format)?
(With all the hypertext links it's probably a desperate wish but
maybe someone had a good idea implemented.)

Anything wrong with less info-page ? If there are several files
then cat <files-pattern> | less also works. You will get the
occasional control character but it's perfectly readable.

Yeah. Even (for example) 'info ls | less' seems to work pretty well.
Will also have a look into 'texi2any', that Kenny suggested. Thanks.

It works on all of coreutils. I get about 19,000 lines of output,
so you can read everything at once.

I noticed there is an index in the back, which has small line
numbers. These are node-relative.

For example:

* --padding: numfmt invocation. (line 87)

When I find the start of the numfmt node far above that, which looks
like:

File: coreutils.info, Node: numfmt invocation, Next: seq [...]

Then if I pretend that the blank line immediately above it is 1,
then line 87 from that lands on the --padding option.

A filter could easily be written to turn these relative references into absolute.
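A rough sketch of such a filter, in sh and awk, under several assumptions: the info output has been saved to a regular file, every node starts with a "File: ..., Node: NAME," header line, index entries end in "(line N)", and node names contain no commas. The function name absref is made up for illustration.

```sh
# absref FILE: print FILE with node-relative "(line N)" index references
# rewritten as absolute line numbers within FILE.
absref() {
    # The file is read twice, so it has to be a regular file, not a pipe.
    awk '
        # Pass 1: record the line number of every node header.
        NR == FNR {
            if ($0 ~ /^File: .*Node: /) {
                name = $0
                sub(/^.*Node: */, "", name)   # drop everything before the name
                sub(/,.*$/, "", name)         # drop ", Next: ..." and the rest
                start[name] = FNR
            }
            next
        }
        # Pass 2: rewrite index entries that point into a known node.
        /^\* / && match($0, /\(line +[0-9]+\)$/) {
            head = substr($0, 1, RSTART - 1)  # "* ENTRY: NODE. "
            rel = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", rel)           # keep only the digits
            node = head
            sub(/^\* [^:]*: */, "", node)     # strip "* ENTRY: "
            sub(/\. *$/, "", node)            # strip the trailing ". "
            if (node in start) {
                # The blank line above the header counts as relative line 1.
                absline = start[node] + rel - 2
                $0 = head "(line " absline ")"
            }
        }
        { print }
    ' "$1" "$1"
}
```

Typical use would be something like "info coreutils > coreutils.txt" followed by "absref coreutils.txt | less". Index entries whose text itself contains a colon are not handled by this sketch.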

It could be mildly useful to have a program (called, say, 'boy') so you
could just type

$ boy coreutils

and have nicely reformatted output automatically piped into your pager,
with no extraneous cruft in it, and styled like a man page with
highlighting, and any index-like line number references being correct.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


From Eric@21:1/5 to Kaz Kylheku on Sun Jun 11 16:14:10 2023

On 2023-06-11, Kaz Kylheku <864-117-4973@kylheku.com> wrote:

< --------

It doesn't seem to be capable of authoring man pages out of the box.
(If so, all those GNU maintainers who maintain separate man pages and
texinfo documentation didn't get the memo: GCC, Bash, Gawk, ...) I'm not
going to maintain parallel documents! Now I'm sure I could get good man
pages and texinfo documentation from a single source document if I
worked at it, but what would be the point.

Halibut.

E.
--
ms fnd in a lbry


From Spiros Bousbouras@21:1/5 to Kaz Kylheku on Sun Jun 11 14:56:39 2023

On Sun, 11 Jun 2023 14:39:29 -0000 (UTC)
Kaz Kylheku <864-117-4973@kylheku.com> wrote:

On 2023-06-11, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

Yeah. Even (for example) 'info ls | less' seems to work pretty well.
Will also have a look into 'texi2any', that Kenny suggested. Thanks.

It works on all of coreutils. I get about 19,000 lines of output,
so you can read everything at once.

I noticed there is an index in the back, which has small line
numbers. These are node-relative.

For example:

* --padding: numfmt invocation. (line 87)

When I find the start of the numfmt node far above that, which looks
like:

File: coreutils.info, Node: numfmt invocation, Next: seq [...]

Then if I pretend that the blank line immediately above it is 1,
then line 87 from that lands on the --padding option.

A filter could easily be written to turn these relative references into absolute.

A TXR job ? ;-)


From Javier@21:1/5 to Kaz Kylheku on Sun Jun 11 16:03:20 2023

Kaz Kylheku <864-117-4973@kylheku.com> wrote:

Texinfo doesn't have a good terminal-based interface, and what it does
have is not as widely available as a man program.

The good terminal reader for info files is 'emacs -nw'. The
standalone info tool is better replaced by this:

emacs-info(){ emacs -nw --exec '(info "'"$(/usr/bin/info -w ${1})"'")'; }


From Ivan Shmakov@21:1/5 to All on Sun Jun 11 19:17:40 2023

On 2023-06-10, Javier wrote:
Kaz Kylheku <864-117-4973@kylheku.com> wrote:

Yes; nobody is going to read a huge man page from top to bottom;
but it's very useful to have it installed for a quick search.

To help with the manual, there is a HTML-ized version with internal
hyperlinks and a navigable, collapsible table of contents.

For that kind of manual the best format is texinfo.

I'd be curious of the reasoning behind such a conclusion?

I've had a somewhat related discussion elsewhere recently.
My preference these days is HTML, though it can be argued
I don't write /that/ much documentation.

Consider, e. g., http://am-1.org/~ivan/qinp-2021/096.sys.en.xhtml
(neither a manual nor reference, but IMO close enough.)

a. It downloads quickly and renders adequately on desktop
and mobile computers, and can be printed as well.

b. It forces no page breaks. Those make little sense on
screen anyway.

c. It can be read on a 'device' just bought new, /and/ it can
be read with Lynx on a 30 year old 386-class computer.

d. You can edit it and send me patches.

With document preparation systems, such as Texinfo, LaTeX,
Man, I'd need an HTML copy for 'online' reading, likely
a separate PDF copy for printing, and the source code for
patches.

It might have changed in a decade or so I haven't had an
active interest in TeX implementations, but back then getting
HTML output out of one wasn't possible, thus forcing one to
use PDF, which is neither screen- nor 386-friendly. Though
there were workarounds.

Of course, Texinfo and Man /do/ have HTML formatters. Why,
for months now, my preferred way to deal with manpages is to
format them into HTML with mandoc(1) (and read the resulting
HTMLs with Lynx), along the lines of:

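#!/bin/sh
## Usage: $0 TARGET-DIRECTORY /usr/share/manX/PAGE.X.gz
set -e
set -C -u
d=${1}
shift
for f ; do
    ## b: file name; q: the same without a trailing .gz
    b=${f##*/} ; q=${b%.gz}
    ## s1: full section suffix (e. g., 3p); s: its first character
    s1=${q##*.} ; s2=${s1#?} ; s=${s1%${s2}}
    g=${d}/html${s}/${q%.*}.html
    ## skip empty sources and pages that were already converted
    test -s "$f" -a ! -e "$g" \
        || continue
    case "$f" in (*.gz) zcat ;; (*) cat ;; esac < "$f" \
        | mandoc -T html -O man=../html%S/%N.html > "$g"
done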
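To make the parameter expansions in that loop easier to follow, here is the same sequence applied to one sample path. Both the input path and the target directory are made up for illustration; nothing is read from or written to disk.

```sh
f=/usr/share/man/man3/printf.3p.gz   # hypothetical input page
d=/tmp/manhtml                       # hypothetical target directory

b=${f##*/}       # strip the directory part:  printf.3p.gz
q=${b%.gz}       # strip a trailing .gz:      printf.3p
s1=${q##*.}      # full section suffix:       3p
s2=${s1#?}       # suffix minus first char:   p
s=${s1%${s2}}    # first character only:      3
g=${d}/html${s}/${q%.*}.html

printf '%s\n' "$g"   # prints /tmp/manhtml/html3/printf.html
```

So pages from sections such as 3p or 3ssl all end up under html3, next to the plain section 3 pages.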

Texinfo is a tad problematic in this regard as precompiled
software packages typically don't include source .texi files
but only the resulting .info ones. Depending on the packager's
preferences, .pdf, .html and others might also be provided, but
that's far from a universal practice IME.

While I understand that historically, using either a *roff
(for -man, -mdoc) or TeX (for Texinfo) implementation allowed
for high-quality printing, now that modern browsers are pretty
ubiquitous /and/ offer formatting capabilities on par with
*roff (and in specific areas arguably exceeding those of TeX),
I'm afraid the advantages of recommending them in the not so
uncommon "hey guys; I want to learn a good doc format, which
would you recommend?" scenario are not obvious to me.

(And given that a lot of free software distributions now come
with README.md files, which I believe are primarily useful as
a source for HTML, I'm going to guess I'm not the only one.)

--
FSF associate member #7257 http://am-1.org/~ivan/


From Javier@21:1/5 to Jerry Peters on Sun Jun 11 21:10:42 2023

Jerry Peters <jerry@example.invalid> wrote:

For those of us not into emacs I'd recommend pinfo. It's a curses
based info reader with keyboard navigation. Very easy to use and
customize.

https://github.com/baszoetekouw/pinfo

Yes, pinfo is another choice for people who are not used to emacs, and
its keybindings are very easy to learn. The interface is very similar to
lynx.

What I miss in pinfo is a keybinding to search in the index of
important keywords (it's the 'I' binding in emacs and standalone
info). In pinfo you can do a full text search, but AFAIK it's not
possible to do a search of indexed keywords.


From Jerry Peters@21:1/5 to Javier on Sun Jun 11 20:51:22 2023

Javier <invalid@invalid.invalid> wrote:

Kaz Kylheku <864-117-4973@kylheku.com> wrote:

Texinfo doesn't have a good terminal-based interface, and what it does
have is not as widely available as a man program.

The good terminal reader for info files is 'emacs -nw'. The
standalone info tool is better replaced by this:

emacs-info(){ emacs -nw --exec '(info "'"$(/usr/bin/info -w ${1})"'")'; }

For those of us not into emacs I'd recommend pinfo. It's a curses
based info reader with keyboard navigation. Very easy to use and
customize.


From Jerry Peters@21:1/5 to Jerry Peters on Sun Jun 11 20:55:05 2023

Jerry Peters <jerry@example.invalid> wrote:

Javier <invalid@invalid.invalid> wrote:

Kaz Kylheku <864-117-4973@kylheku.com> wrote:

Texinfo doesn't have a good terminal-based interface, and what it does
have is not as widely available as a man program.

The good terminal reader for info files is 'emacs -nw'. The
standalone info tool is better replaced by this:

emacs-info(){ emacs -nw --exec '(info "'"$(/usr/bin/info -w ${1})"'")'; }

For those of us not into emacs I'd recommend pinfo. It's a curses
based info reader with keyboard navigation. Very easy to use and
customize.

https://github.com/baszoetekouw/pinfo


From Keith Thompson@21:1/5 to Javier on Sun Jun 11 15:14:28 2023

Javier <invalid@invalid.invalid> writes:

Jerry Peters <jerry@example.invalid> wrote:

For those of us not into emacs I'd recommend pinfo. It's a curses
based info reader with keyboard navigation. Very easy to use and
customize.

https://github.com/baszoetekouw/pinfo

Yes, pinfo is another choice for people who are not used to emacs, and
its keybindings are very easy to learn. The interface is very similar to
lynx.

What I miss in pinfo is a keybinding to search in the index of
important keywords (it's the 'I' binding in emacs and standalone
info). In pinfo you can do a full text search, but AFAIK it's not
possible to do a search of indexed keywords.

Another nice feature of `info` is that you can specify an index entry on
the command line (something I only learned recently).

For example, I've spent a lot of time in `info bash` searching for the
section on parameter expansion (and sometimes failing to remember that
it's "expansion", not "replacement"). But I can just type `info bash
param` and it jumps there directly. There are 5 or so index entries
starting with "param"; "parameter expansion" happens to be the first.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */


From Javier@21:1/5 to Ivan Shmakov on Sun Jun 11 23:10:38 2023

Ivan Shmakov <ivan@siamics.netnospam.invalid> wrote:

For that kind of manual the best format is texinfo.

I'd be curious of the reasoning behind such a conclusion?

Because info is better for documents that have structure. I am
referring to technical documents the size of a book.

In the browser you only have back and forward, which in GNU info correspond
to 'l' (last) and 'r' (revisit).

In info you have 'p' (previous) and 'n' (next), for moving within a
level and 'u' for going up one level. You don't have that in any
browser. IIRC the HTML standard mentions that kind of structure of a
document, but no web browser ever has bothered to implement
keybindings or buttons for those actions.

Perhaps the most interesting feature of info is the keyword search
('I' binding). In a full text search in a browser with Ctrl-f you are
going to get a lot of false positives. Info docs highlight important
words and put them in an index. It's the same idea as the
alphabetical index that appears at the end of dead-tree books, but in
emacs you are not restricted to looking up those keywords alphabetically,
and that is useful because many times those 'keywords' are composed of
several words.

b. It forces no page breaks. Those make little sense on
screen anyway.

They do make sense. When you are reading a section on screen you don't
want to be distracted by the text in the next section.

(And given that a lot of free software distributions now come
with README.md files, which I believe are primarily useful as
a source for HTML, I'm going to guess I'm not the only one.)

From what I have seen those .md documents are the size of a small
manpage. I don't think the .md format is well suited to write a
document the size of a book. Does md support making an alphabetical
index of keywords?

Texinfo is a tad problematic in this regard as precompiled
software packages typically don't include source .texi files
but only the resulting .info ones. Depending on the packager's
preferences, .pdf, .html and others might also be provided, but
that's far from a universal practice IME.

Yes, that's a problem with packagers, but you can always get the
source and recompile the docs.

While I understand that historically, using either a *roff
(for -man, -mdoc) or TeX (for Texinfo) implementation allowed
for high-quality printing, now that modern browsers are pretty
ubiquitous /and/ offer formatting capabilities on par with
*roff (and in specific areas arguably exceeding those of TeX),
I'm afraid the advantages of recommending them in the not so
uncommon "hey guys; I want to learn a good doc format, which
would you recommend?" scenario are not obvious to me.

There is no need to force people to learn arcane things.
Sphinx and the reST format is quite fashionable nowadays (the Python
foundation uses it), and it can output the doc in texinfo format.

https://docs.python.org/3.11/about.html
https://stackoverflow.com/questions/1054903/how-do-you-get-python-documentation-in-texinfo-info-format

The same tool is used for other languages like Erlang and Julia.
Most of what is hosted on https://readthedocs.org is convertible to info,
although very few projects nowadays bother to publish the docs in info
format and if you want the info docs you need to download the source
and compile the docs with sphinx yourself.

Debian helps a lot by providing packages with info docs.

https://packages.debian.org/search?searchon=contents&keywords=info.gz&mode=path&suite=stable&arch=any

The Perl documentation is also convertible to info, but the problem
with the perl docs I have seen is that they lack the keyword index.

The Texinfo format is not dead, and will continue to live, albeit with
an extremely reduced userbase. From the side of the GNU project some
improvements could be made by providing alternative ways to do
things. For example, the directory format used for indexing info docs
accessible in emacs is awful, and I have to manually edit
${HOME}/local/info/dir to be able to read docs in emacs.


From Kaz Kylheku@21:1/5 to Javier on Mon Jun 12 01:46:14 2023

On 2023-06-11, Javier <invalid@invalid.invalid> wrote:

Ivan Shmakov <ivan@siamics.netnospam.invalid> wrote:

For that kind of manual the best format is texinfo.

I'd be curious of the reasoning behind such a conclusion?

Because info is better for documents that have structure. I am
referring to technical documents the size of a book.

In the browser you only have back and forward, which in GNU info correspond to 'l' (last) and 'r' (revisit).

In info you have 'p' (previous) and 'n' (next), for moving within a
level and 'u' for going up one level. You don't have that in any
browser. IIRC the HTML standard mentions that kind of structure of a
document, but no web browser ever has bothered to implement
keybindings or buttons for those actions.

A web document will have navigation links or buttons.
Javascripted hotkeys are a thing.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


From Dr Eberhard W Lisse@21:1/5 to Axel Reichert on Mon Jun 12 06:55:35 2023

You are missing QSV, CSVQ, CSVIEW, CSVLENS and TYPS.

el

On 08/06/2023 11:34, Axel Reichert wrote:

Hello,

normally in my shell scripts I use the classics, such as grep, sed, awk,
cut, sort, uniq, a bit of Perl etc.

I recently learned about the existence of "mlr"

https://miller.readthedocs.io/en/6.8.0/

, "datamash",

https://www.gnu.org/software/datamash/manual/datamash.html/

"jq",

https://jqlang.github.io/jq/

"q",

https://harelba.github.io/q/

and "sqlet"

https://www.sqlet.com/

which intrigued me and made me think about other tools that I
might be missing.

Any obvious candidates for text mangling?

Pointers much appreciated!

Thanks and best regards,

Axel


From Axel Reichert@21:1/5 to Kaz Kylheku on Mon Jun 12 08:59:27 2023

Kaz Kylheku <864-117-4973@kylheku.com> writes:

On 2023-06-10, Axel Reichert <mail@axel-reichert.de> wrote:

To help with the manual, there is a HTML-ized version with internal hyperlinks and a navigable, collapsible table of contents.

Yes, I have seen it.

There is also a library function doc which fires off a request
to open a browser to a document section, by symbol. e.g.

(doc 'cons)

... in the vein of classical lispy reference documentation. Good to
know.

Are there any other resources for learning, more along the lines of a
tutorial or a classic textbook?

Unfortunately, no.

I was suspecting this.

There should be decent knowledge/skill transfer from some basic Lisp tutorials

Sure.

Thanks!

Axel


From Axel Reichert@21:1/5 to Ivan Shmakov on Mon Jun 12 08:52:56 2023

Ivan Shmakov <ivan@siamics.netNOSPAM.invalid> writes:

On 2023-06-10, Javier wrote:

For that kind of manual the best format is texinfo.

I'd be curious of the reasoning behind such a conclusion?

Back in my early Linux days in the mid-90s, HTML was not as common as it
is now. The Eternal September had only been in 1993. Also, texi was most
prominent in the GNU world, which was small and kind of a nerd
elite. The similarities between texi and TeX/LaTeX were helpful in this
community, many people had done their theses in LaTeX anyway, since at
that time Word was incapable of handling more than a dozen
equations. Theses are also highly structured.

With the influx of less nerdy content creators easier formats were
needed. No, Docbook, SGML, XML were not filling that need. Look at how
common light markup languages (Markdown, reStructured text,
asciidoc(tor), Emacs org-mode) are nowadays, even for websites. Most
(all?) of them allow for cross-target publishing from a single source
that is readable even in raw format. Asciidoc allows for index
creation. But many websites are much less structured than a scientific
thesis, so for me, after using asciidoc for a while, it has become
org-mode.

While this is some historical context, I think there are technical reasons
as well: Texinfo AFAIK does not allow for pictures and nicely rendered
equations, HTML and PDF do. The toolchain for HTML is vast, for PDF it
is still much larger (due to the professional publishing business) than
for Texinfo. The latter to me nowadays seems like an ancient HTML
predecessor, the comparison with Lynx for HTML is fitting.

Best regards

Axel


From Axel Reichert@21:1/5 to Dr Eberhard W Lisse on Mon Jun 12 09:08:08 2023

Dr Eberhard W Lisse <nospam@lisse.NA> writes:

You are missing QSV, CSVQ, CSVIEW, CSVLENS and TYPS.

All look great from a quick glance, except for TYPS, which I could not
find easily. URL?

Thanks, very helpful!

Axel


From Benjamin Esham@21:1/5 to Axel Reichert on Mon Jun 12 11:31:43 2023

Axel Reichert wrote:

Ivan Shmakov <ivan@siamics.netNOSPAM.invalid> writes:

On 2023-06-10, Javier wrote:

For that kind of manual the best format is texinfo.

I'd be curious of the reasoning behind such a conclusion?

[snip]

With the influx of less nerdy content creators [in the mid-90s] easier
formats were needed. No, Docbook, SGML, XML were not filling that need.
Look at how common light markup languages (Markdown, reStructured text,
asciidoc(tor), Emacs org-mode) are nowadays, even for websites. Most
(all?) of them allow for cross-target publishing from a single source that
is readable even in raw format. Asciidoc allows for index creation. But
many websites are much less structured than a scientific thesis, so for
me, after using asciidoc for a while, it has become org-mode.

While this is some historical context, I think there are technical reasons
as well: Texinfo AFAIK does not allow for pictures and nicely rendered
equations, HTML and PDF do. The toolchain for HTML is vast, for PDF it
is still much larger (due to the professional publishing business) than
for Texinfo. The latter to me nowadays seems like an ancient HTML
predecessor; the comparison with Lynx for HTML is fitting.

Just in case anyone is unaware of Pandoc [1], I highly recommend it if you
need to convert between document formats. It can read and write HTML,
Markdown, org-mode, reStructuredText, LaTeX, DocBook, and even docx (among
many others). Unfortunately it only supports *writing* Texinfo, not reading
it, or else I would have mentioned it earlier in the thread.

Like you say, tools like Pandoc make it practical to keep the "canonical"
documentation of a tool in a newer, more lightweight format like Markdown,
and then convert it to Texinfo or whatever as needed. I maintain a small
command-line tool [2] that has a man page, and this is the approach I use:
I author in Markdown and then use a one-line Pandoc invocation to convert it
to man format for consumption. I could also convert to HTML or PDF or
whatever, although the man-style formatting conventions would probably
look weird in those contexts.
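
For illustration, a minimal sketch of such a Markdown-to-man conversion.
This is not necessarily the exact command used for the tool mentioned
above; the function name md_to_man and the file names are placeholders
made up for this example.

```shell
# Convert a Markdown source file into a man page with Pandoc.
# Usage: md_to_man INPUT.md OUTPUT.1   (both names are placeholders)
md_to_man() {
    # -s (--standalone) emits a complete man page including its header;
    # -f and -t select the input and output formats.
    pandoc -s -f markdown -t man "$1" -o "$2"
}
```

Something like "md_to_man mytool.1.md mytool.1 && man ./mytool.1" should
then show the result, assuming pandoc and man are installed.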

Benjamin

[1]: https://pandoc.org/

[2]: https://github.com/bdesham/pinboard-notes-backup

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Eric@21:1/5 to Kaz Kylheku on Mon Jun 12 21:26:22 2023

On 2023-06-11, Kaz Kylheku <864-117-4973@kylheku.com> wrote:

On 2023-06-11, Eric <eric@deptj.eu> wrote:

< --------

Halibut.

FAQ:

] Why on earth ‘Halibut’? What relevance does the name have to anything?
]
] Historical reasons. It's probably better not to ask.

For those versed in childish word games of the English language,
the name clearly communicates why the author started the project:
"just for the hell of it".

:)

I've wondered about the name from time to time, but never thought of
that. Plausible though.

E
--
ms fnd in a lbry

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Eric@21:1/5 to Axel Reichert on Mon Jun 12 21:28:43 2023

On 2023-06-12, Axel Reichert <mail@axel-reichert.de> wrote:

Dr Eberhard W Lisse <nospam@lisse.NA> writes:

You are missing QSV, CSVQ, CSVIEW, CSVLENS and TYPS.

All look great from a quick glance, except for TYPS, which I could not
find easily. URL?

Thanks, very helpful!

Axel

TYPST I suspect - https://github.com/typst/typst

E
--
ms fnd in a lbry

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Axel Reichert@21:1/5 to Eric on Wed Jun 14 07:50:28 2023

Eric <eric@deptj.eu> writes:

On 2023-06-12, Axel Reichert <mail@axel-reichert.de> wrote:

All look great from a quick glance, except for TYPS, which I could not
find easily. URL?

TYPST I suspect - https://github.com/typst/typst

Thanks, also interesting, even if not for text mangling (IIRC Eberhard
mentioned it already some time ago, maybe in a different forum).

Axel

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From bozo user@21:1/5 to Axel Reichert on Thu Jun 15 14:35:14 2023

On 2023-06-08, Axel Reichert <mail@axel-reichert.de> wrote:

Hello,

normally in my shell scripts I use the classics, such as grep, sed, awk,
cut, sort, uniq, a bit of Perl etc.

I recently learned about the existence of "mlr"

https://miller.readthedocs.io/en/6.8.0/

, "datamash",

https://www.gnu.org/software/datamash/manual/datamash.html/

"jq",

https://jqlang.github.io/jq/

"q",

https://harelba.github.io/q/

and "sqlet"

https://www.sqlet.com/

which intrigued me and made me think about other tools that I
might be missing.

Any obvious candidates for text mangling?

Pointers much appreciated!

Thanks and best regards,

Axel

Perl does sort/cut/grep/awk/sed and lots more.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Janis Papanagnou@21:1/5 to bozo user on Thu Jun 15 17:42:41 2023

On 15.06.23 16:35, bozo user wrote:

On 2023-06-08, Axel Reichert <mail@axel-reichert.de> wrote:

Hello,

normally in my shell scripts I use the classics, such as grep, sed, awk,
cut, sort, uniq, a bit of Perl etc.

[...]

Perl does sort/cut/grep/awk/sed and lots more.

Note that the OP already mentioned perl in his tool-chest.

For the sake of your (assumed) goal, pointing out that perl
already comprises a lot of basic tools, it makes sense to
differentiate the picture a bit further, though.

With awk you have already grep, sed, cut, sort [in gawk],
uniq, "and lots more".[*]
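
As a rough illustration of that overlap, here is a minimal sketch using
only POSIX awk features. The function names (awk_grep, awk_subst,
awk_cut, awk_uniq) are made up for this example, and the equivalences
are approximate: awk splits fields on runs of whitespace, unlike
cut -d' ', and the dedup variant also drops non-adjacent duplicates,
unlike plain uniq. Sorting is left out because it needs gawk extensions.

```shell
# grep-like: print lines matching the regex given as $1.
awk_grep() { awk -v re="$1" '$0 ~ re'; }

# sed 's/RE/TEXT/'-like: replace the first match on each line, then print.
awk_subst() { awk -v re="$1" -v rep="$2" '{ sub(re, rep); print }'; }

# cut-like: print field number $1 (fields split on whitespace).
awk_cut() { awk -v n="$1" '{ print $n }'; }

# uniq-like: keep only the first occurrence of each line, in input order.
awk_uniq() { awk '!seen[$0]++'; }
```

For example, "printf 'x\ny\nx\n' | awk_uniq" keeps the first x and the y.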

So if that's all one needs (or if you are working in a POSIX
environment) one may (or has to) prefer the small and simple
awk, and a (POSIX) shell.

Janis

[*] With that lemma your statement may be reformulated as
"Perl does awk and lots more." - So the OP would need to
determine in his "shell scripting/text mangling" context
what's still necessary beyond what awk (and shell) provide,
and what of that will be provided by perl.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Axel Reichert@21:1/5 to Janis Papanagnou on Fri Jun 16 07:38:30 2023

Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

With awk you have already grep, sed, cut, sort [in gawk],
uniq, "and lots more".[*]

Yes, and with the help of this group and others next door, my awk
knowledge has grown to a level that allows me to shrink the usage of
other tools. I like to have as few tools as possible and believe that it
is more efficient to master them rather than to clumsily use a large bag
full of special purpose tools. For this idea to work, the tools must be
powerful and general purpose.

THAT was the reason for my question, not arbitrarily adding whatever is
available.

So if that's all one needs (or if you are working in a POSIX
environment) one may (or has to) prefer the small and simple awk, and
a (POSIX) shell.

Add Emacs, and you have essentially my go-to tools. A proficient user of
all three (not that I count myself into that group) can do things that
amaze text mangling laymen. "Every sufficiently advanced technology is indistinguishable from magic." (Arthur C. Clarke)

Best regards

Axel

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Rene Kita@21:1/5 to Janis Papanagnou on Wed Jul 12 16:00:03 2023

Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

Feeling myself tortured with 'info'-pages and with hints in some 'man'-page(-stub) that relevant information is only available in
'info'-pages I wonder whether there's some tool that extracts or
dumps (structured or not) the 'info' page information to create
some "all in one" page (in *roff, HTML, or plain text format)?
(With all the hypertext links it's probably a desperate wish but
maybe someone had a good idea implemented.)

Janis

I have a shell function that dumps entire info pages to less:

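info () {
    # "command" bypasses this function and runs the real info binary;
    # --subnodes dumps all nodes recursively, -o - writes to stdout.
    command info --subnodes -o - "$1" | less
}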

It's readable but misses all the navigation features.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
