• Question about regex with negated character class

    From Roger L Costello@21:1/5 to All on Mon Apr 25 12:48:43 2022
    Hi Folks,

    On page 12 of the Flex specification it says this:

    "A negated character class such as [^A-Z] will match a newline
    unless \n (or an equivalent escape sequence) is one of the characters explicitly present
    in the negated character class (e.g., [^A-Z\n]). This is unlike how many other regular
    expression tools treat negated character classes ..."

    Is that last sentence true? Does Flex behave differently from other regex engines with regard to negated character classes?

    I just tested the [^A-Z] regex at (https://regex101.com/), and every regex engine offered on that page matches the newline in a test string. In other words, Flex behaves just like those engines. I conclude that the last sentence in the Flex manual is not correct. Do you agree?

    /Roger
    [It may have been true 30 years ago but they all match \n in a pattern
    now. On the other hand, grep won't match a newline because it does the
    matching one line at a time. -John]
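
    A quick way to check both points in the note above, sketched with Python's re module. This is an assumption for illustration: the thread itself only mentions regex101 and grep, and the per-line loop merely imitates how a line-oriented tool feeds input to the matcher.

```python
import re

# Negated class without \n: a newline is "not A-Z", so it matches.
negated = re.compile(r"[^A-Z]")
# Negated class that explicitly lists \n: the newline is excluded.
negated_no_newline = re.compile(r"[^A-Z\n]")

newline_matches = negated.fullmatch("\n") is not None
newline_excluded = negated_no_newline.fullmatch("\n") is None

# grep-style matching: the input is split into lines first, so the
# pattern never sees a newline character at all.
text = "ABC\nDEF"
whole_text_hit = negated.search(text) is not None
per_line_hits = [negated.search(line) is not None for line in text.splitlines()]

print(newline_matches, newline_excluded, whole_text_hit, per_line_hits)
```

    Searching the whole text finds the newline between ABC and DEF, while the per-line search finds nothing, because each line reaches the matcher with its newline already removed.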

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to Roger L Costello on Mon Apr 25 23:46:44 2022
    On 2022-04-25, Roger L Costello <costello@mitre.org> wrote:
    Hi Folks,

    On page 12 of the Flex specification it says this:

    "A negated character class such as [^A-Z] will match a newline
    unless \n (or an equivalent escape sequence) is one of the characters explicitly present
    in the negated character class (e.g., [^A-Z\n]). This is unlike how many other
    regular expression tools treat negated character classes ..."

    I suspect this is a documentation mistake (in terms of the remark it
    makes about other regex implementations).

    There is something special in Flex with regard to newlines: the any-character regular expression . (dot) does not actually match every character;
    it excludes the newline. The documenter might have momentarily gotten
    their wires crossed, misremembering which behavior is the special one.

    Alternatively, I agree with John that it may in fact be a remark about
    regex implementations in line-oriented text processing utilities, which
    (in their standard forms, e.g. POSIX) don't have multi-line matching
    features in which \n appears as a character.
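
    The contrast between the dot and a negated class can be illustrated with a short sketch. It uses Python's re module rather than Flex itself (an assumption for illustration). Python's default dot happens to exclude the newline too, and re.DOTALL is Python's own opt-in flag, not a Flex feature.

```python
import re

# By default "." excludes the newline, much like the Flex dot described above.
dot_matches_newline = re.fullmatch(r".", "\n") is not None

# A negated class has no such exception: "\n" is simply "not A-Z".
negated_matches_newline = re.fullmatch(r"[^A-Z]", "\n") is not None

# re.DOTALL makes the dot match every character, newline included.
dotall_matches_newline = re.fullmatch(r".", "\n", re.DOTALL) is not None

print(dot_matches_newline, negated_matches_newline, dotall_matches_newline)
```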

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
