• Bash expression to detect dying RAID devices

    From Roger Price@21:1/5 to All on Tue Dec 24 15:50:01 2024
    File /proc/mdstat indicates a dying RAID device with an output section
    such as

    md3 : active raid1 sdg6[0]
    871885632 blocks super 1.0 [2/1] [U_]
    bitmap: 4/7 pages [16KB], 65536KB chunk

    Note the [U-]. The "-" says /dev/sdh is dead. I would like to scan /proc/mdstat and set a flag if [U-], [-U] or [--] occur. My current attempt is

    #! /bin/bash -u
    set -x
    BAD=0;
    while read L;
    do if [[ $L == *"[U-]"* ]]; then B=1; fi;
    if [[ $L == *"[-U]"* ]]; then B=1; fi;
    if [[ $L == *"[--]"* ]]; then B=1; fi;
    done < /proc/mdstat;
    echo $BAD

    Far from elegant, but I still can't get it to work.
    The trace contains lines such as

    + 1164021 1164021 [4]read L
    + 1164021 1164021 [5][[ 20970368 blocks super 1.0 [2/1] [U_] == *\[\U\-\]* ]]

    The test always fails, but I can't see why. Any hint would be very welcome.

    Roger

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Nicolas George@21:1/5 to All on Tue Dec 24 16:30:01 2024
    Roger Price (12024-12-24):
    File /proc/mdstat indicates a dying RAID device with an output section such as

    Maybe try to find a more script-friendly source for that information in /sys/class/block/md127/md/?

    Regards,

    --
    Nicolas George

  • From Stefan Monnier@21:1/5 to All on Tue Dec 24 16:40:01 2024
    File /proc/mdstat indicates a dying RAID device with an output section such as

    md3 : active raid1 sdg6[0]
    871885632 blocks super 1.0 [2/1] [U_]
    bitmap: 4/7 pages [16KB], 65536KB chunk

    Note the [U-].

    I can't see a "[U-]", only a "[U_]"


    Stefan

  • From Greg Wooledge@21:1/5 to All on Tue Dec 24 16:50:02 2024
    On Tue, Dec 24, 2024 at 10:37:29 -0500, Roberto C. Sánchez wrote:
    I think that '==' is the wrong tool. That is testing for string
    equality, whilst you are looking for a partial match. This is what I was
    able to get working after hacking on it for a minute or two:

    #! /bin/bash -u
    set -x
    BAD=0;
    while read L;
    do if [[ $L =~ \[(U_|_U|__)\] ]]; then BAD=1; break; fi;
    done < /proc/mdstat;
    echo $BAD

    Yeah, that works too. But == or = also works here, if you use an
    extended glob. (And extended glob matching inside the [[ command is
    on by default in all recent bash versions.)

    = or == is a glob (or extglob) match first and foremost. It
    degenerates to a plain string match if all the characters on the
    right hand side are quoted.

    hobbit:~$ stat=$'blah blah\ncheese [2/1] [U_]\nblah blah'
    hobbit:~$ if [[ $stat = *\[@(U_|_U|__)\]* ]]; then echo "bad"; fi
    bad
    hobbit:~$ stat=$'blah blah\ncheese [2/1] [UU]\nblah blah'
    hobbit:~$ if [[ $stat = *\[@(U_|_U|__)\]* ]]; then echo "bad"; fi
    hobbit:~$
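
    The quoting rule described above can be seen side by side in a small sketch. The sample line is copied from the thread; the variable names are only illustrative.

```shell
#!/bin/bash
# Sample text standing in for one line of a degraded /proc/mdstat.
line='871885632 blocks super 1.0 [2/1] [U_]'

# Unquoted right-hand side: * is a wildcard, \[ and \] are literal brackets.
if [[ $line == *\[U_\]* ]]; then glob_match=yes; else glob_match=no; fi

# Fully quoted right-hand side: a plain string comparison, so the
# asterisks are literal characters and the test fails.
if [[ $line == "*[U_]*" ]]; then quoted_match=yes; else quoted_match=no; fi

# Mixed quoting, as in the original script: the quoted part is literal,
# the unquoted asterisks remain wildcards.
if [[ $line == *"[U_]"* ]]; then mixed_match=yes; else mixed_match=no; fi

echo "glob=$glob_match quoted=$quoted_match mixed=$mixed_match"
# prints: glob=yes quoted=no mixed=yes
```

    So the original *"[U-]"* form was a valid substring test; it failed only because the text contains "_" rather than "-".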

  • From Nicolas George@21:1/5 to All on Tue Dec 24 16:50:02 2024
    Roberto C. Sánchez (12024-12-24):
    I think that '==' is the wrong tool.

    string1 == string2
    string1 = string2
    True if the strings are equal. = should be used with the test
    command for POSIX conformance. When used with the [[ command,
    this performs pattern matching as described above (Compound
    Commands).

    But it is a bashism. Better use a more lightweight and standard shell,
    /bin/sh, or switch directly to a more powerful one, like zsh.
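
    For anyone following the /bin/sh suggestion, a minimal sketch of the same check with a POSIX case statement might look like this. The function name mdstat_degraded is hypothetical, and the sample text is copied from the thread.

```shell
#!/bin/sh
# mdstat_degraded TEXT: succeed if TEXT contains [U_], [_U] or [__].
# The quoted parts are literal; the unquoted asterisks are wildcards.
mdstat_degraded() {
    case $1 in
        *'[U_]'* | *'[_U]'* | *'[__]'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample input; on a real system one would pass "$(cat /proc/mdstat)".
sample='md3 : active raid1 sdg6[0]
      871885632 blocks super 1.0 [2/1] [U_]'

if mdstat_degraded "$sample"; then
    echo "degraded"
fi
```

    This uses no bash-specific syntax, so it should behave the same under dash or bash.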

    Regards,

    --
    Nicolas George

  • From Andy Smith@21:1/5 to Roger Price on Tue Dec 24 17:10:01 2024
    Hi,

    On Tue, Dec 24, 2024 at 03:45:31PM +0100, Roger Price wrote:
    I would like to scan /proc/mdstat and set a flag if [U-], [-U] or [--]
    occur.

    Others have pointed out your '-' vs '_' confusion. But are you sure you
    wouldn't rather just rely on the "mdadm --monitor" command that emails
    you when devices are dropped? On Debian it's run by the "mdmonitor"
    systemd service.

    Failing that, you can regularly check the
    /sys/class/block/mdX/md/degraded file and, only if it contains anything
    other than 0, look at the other files under there or at /proc/mdstat.
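
    A rough sketch of that sysfs check, assuming every array of interest appears as md* under /sys/class/block. The function name count_degraded and its optional base-directory argument are hypothetical; the argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/bash
# count_degraded [BASE]: print how many md arrays under BASE have a
# md/degraded file holding something other than 0.
# BASE defaults to /sys/class/block.
count_degraded() {
    local base=${1:-/sys/class/block} f n count=0
    for f in "$base"/md*/md/degraded; do
        [[ -r $f ]] || continue      # glob matched nothing, or file unreadable
        n=$(< "$f")
        [[ $n == 0 ]] || count=$((count + 1))
    done
    echo "$count"
}

echo "degraded arrays: $(count_degraded)"
```

    A cron job or timer could run this and look at /proc/mdstat only when the count is non-zero.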

    Thanks,
    Andy

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

  • From Greg Wooledge@21:1/5 to Roger Price on Tue Dec 24 16:40:01 2024
    On Tue, Dec 24, 2024 at 15:45:31 +0100, Roger Price wrote:
    File /proc/mdstat indicates a dying RAID device with an output section such as

    md3 : active raid1 sdg6[0]
    871885632 blocks super 1.0 [2/1] [U_]
    bitmap: 4/7 pages [16KB], 65536KB chunk

    Note the [U-].

    There isn't any [U-] in that output. There is [U_].

    The "-" says /dev/sdh is dead. I would like to scan
    /proc/mdstat and set a flag if [U-], [-U] or [--] occur.

    Start by figuring out whether you're looking for _ or - in the output.

    while read L;
    do if [[ $L == *"[U-]"* ]]; then B=1; fi;
    if [[ $L == *"[-U]"* ]]; then B=1; fi;
    if [[ $L == *"[--]"* ]]; then B=1; fi;
    done < /proc/mdstat;

    You don't really need to read a line at a time. You can slurp the
    entire file all at once and just do your matching on the whole thing.

    stat=$(< /proc/mdstat)
    if [[ $stat = *\[@(U_|_U|__)\]* ]]; then
        echo "bad thing found"
    fi

    But, again, if you're looking for the wrong character, it won't work
    very well. I showed code using _ to match your actual input.
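
    The pattern above covers two-device arrays only. Here is a sketch of a variant for arrays of any size, assuming larger arrays report longer status fields such as [UU_]. The helper name has_failed_member is hypothetical, and the regex is kept in a variable so that the shell does not reinterpret its brackets.

```shell
#!/bin/bash
# has_failed_member TEXT: succeed if TEXT contains a bracketed field made
# only of U and _ characters, with at least one _.
has_failed_member() {
    local re='\[[U_]*_[U_]*\]'
    [[ $1 =~ $re ]]
}

# On a real system: has_failed_member "$(< /proc/mdstat)"
sample='871885632 blocks super 1.0 [3/2] [UU_]'
if has_failed_member "$sample"; then
    echo "bad thing found"
fi
```

    Fields such as [2/1] or [16KB] do not match, because they contain characters other than U and _.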

  • From Roberto C. Sánchez@21:1/5 to Roger Price on Tue Dec 24 16:40:01 2024
    Hi Roger,

    On Tue, Dec 24, 2024 at 03:45:31PM +0100, Roger Price wrote:
    File /proc/mdstat indicates a dying RAID device with an output section such as

    md3 : active raid1 sdg6[0]
    871885632 blocks super 1.0 [2/1] [U_]
    bitmap: 4/7 pages [16KB], 65536KB chunk

    Note the [U-]. The "-" says /dev/sdh is dead. I would like to scan /proc/mdstat and set a flag if [U-], [-U] or [--] occur.

    I am confused. The sample output you provide includes '_' rather than
    '-', but your script and all of the tests which it performs are trying
    to match '-' (which isn't part of the output).

    My current attempt is

    #! /bin/bash -u
    set -x
    BAD=0;
    while read L;
    do if [[ $L == *"[U-]"* ]]; then B=1; fi;
    if [[ $L == *"[-U]"* ]]; then B=1; fi;
    if [[ $L == *"[--]"* ]]; then B=1; fi;
    done < /proc/mdstat;
    echo $BAD

    Also, you assign BAD=0, then echo $BAD, but your loop assigns B=1
    (rather than BAD=1). Even if you managed to find a test that matched,
    your script would still always echo 0.

    Far from elegant, but I still can't get it to work.
    The trace contains lines such as

    + 1164021 1164021 [4]read L
    + 1164021 1164021 [5][[ 20970368 blocks super 1.0 [2/1] [U_] == *\[\U\-\]* ]]

    The test always fails, but I can't see why. Any hint would be very welcome.


    I think that '==' is the wrong tool. That is testing for string
    equality, whilst you are looking for a partial match. This is what I was
    able to get working after hacking on it for a minute or two:

    #! /bin/bash -u
    set -x
    BAD=0;
    while read L;
    do if [[ $L =~ \[(U_|_U|__)\] ]]; then BAD=1; break; fi;
    done < /proc/mdstat;
    echo $BAD

    Note that I changed to a regex match, and also added a 'break;' after
    assigning BAD=1, because there is no need to continue processing the
    input at that point.
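
    If the loop is kept, reading line by line also makes it possible to say which array is affected. The following is only a sketch, assuming each array section starts with a line like "md3 : ..." as in the sample output. The function name list_degraded is hypothetical.

```shell
#!/bin/bash
# list_degraded: read mdstat-style text on stdin and print the name of
# each array whose status field is [U_], [_U] or [__].
list_degraded() {
    local line name=''
    local head_re='^(md[^[:space:]]*)[[:space:]]*:'
    local bad_re='\[(U_|_U|__)\]'
    while IFS= read -r line; do
        if [[ $line =~ $head_re ]]; then
            name=${BASH_REMATCH[1]}      # remember the current array
        elif [[ -n $name && $line =~ $bad_re ]]; then
            echo "$name"
            name=''                      # report each array only once
        fi
    done
}

# Sample lines copied from the thread; on a real system one would run
#   list_degraded < /proc/mdstat
printf '%s\n' 'md3 : active raid1 sdg6[0]' \
    '      871885632 blocks super 1.0 [2/1] [U_]' | list_degraded
# prints: md3
```

    Unlike the version with 'break', this reads to the end of the input so that more than one degraded array can be reported.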

    Regards and Merry Christmas,

    -Roberto

    --
    Roberto C. Sánchez

  • From Roger Price@21:1/5 to Greg Wooledge on Tue Dec 24 17:30:01 2024
    On Tue, 24 Dec 2024, Greg Wooledge wrote:

    On Tue, Dec 24, 2024 at 15:45:31 +0100, Roger Price wrote:

    md3 : active raid1 sdg6[0]
    871885632 blocks super 1.0 [2/1] [U_]
    bitmap: 4/7 pages [16KB], 65536KB chunk

    Note the [U-].

    There isn't any [U-] in that output. There is [U_].

    My typo "-" instead of "_" is the culprit. Thanks to everyone who replied. I will look at mdmonitor.

    Roger

  • From Charles Curley@21:1/5 to Roger Price on Tue Dec 24 17:40:02 2024
    On Tue, 24 Dec 2024 15:45:31 +0100 (CET)
    Roger Price <debian@rogerprice.org> wrote:

    File /proc/mdstat indicates a dying RAID device with an output
    section such as

    md3 : active raid1 sdg6[0]
    871885632 blocks super 1.0 [2/1] [U_]
    bitmap: 4/7 pages [16KB], 65536KB chunk

    Note the [U-]. The "-" says /dev/sdh is dead. I would like to scan /proc/mdstat and set a flag if [U-], [-U] or [--] occur.

    You might look at systray-mdstat.

    --
    Does anybody read signatures any more?

    https://charlescurley.com
    https://charlescurley.com/blog/
