• yEnc-encoded articles in newsgroups

    From Julien ÉLIE@21:1/5 to All on Thu Mar 28 09:05:39 2024
    Hi all,

    I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
    (have a look at soc.culture.french for instance). Examples:
    <17ba4ef578674e9c$60891$141478$64d91c8e@news.vipernews.com>
    <1O6IN.329500$7uxe.279980@fx09.ams1>

    Wouldn't it be worthwhile having NoCeM notices of type "binary" or like
    to help cleaning non-binary newsgroups from these unwanted articles?
    Naturally, other kinds of "binary" stuff could also be in these notices,
    and not only yEnc.

    Just asking, in case a current NoCeM issuer would be interested in
    adding such filters. (I'm not going to send NoCeM notices.)

    --
    Julien ÉLIE

    « Les amis de la vérité sont ceux qui la cherchent, et non ceux qui se
    vantent de l'avoir trouvée. » (Condorcet)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco Moock@21:1/5 to All on Thu Mar 28 12:05:16 2024
    On 28.03.2024, Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

    Wouldn't it be worthwhile having NoCeM notices of type "binary" or
    like to help cleaning non-binary newsgroups from these unwanted
    articles? Naturally, other kinds of "binary" stuff could also be in
    these notices, and not only yEnc.

    Sounds good.

    Just asking, in case a current NoCeM issuer would be interested in
    adding such filters. (I'm not going to send NoCeM notices.)

    Why don't you send NoCeM messages?
    Do you filter yEnc out?
    If so, implementing NoCeM shouldn't be that much work.

  • From Retro Guy@21:1/5 to All on Thu Mar 28 13:03:38 2024
    Julien ÉLIE wrote:

    Hi all,

    I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
    (have a look at soc.culture.french for instance). Examples:
    <17ba4ef578674e9c$60891$141478$64d91c8e@news.vipernews.com>
    <1O6IN.329500$7uxe.279980@fx09.ams1>

    Wouldn't it be worthwhile having NoCeM notices of type "binary" or like
    to help cleaning non-binary newsgroups from these unwanted articles?
    Naturally, other kinds of "binary" stuff could also be in these notices,
    and not only yEnc.

    Just asking, in case a current NoCeM issuer would be interested in
    adding such filters. (I'm not going to send NoCeM notices.)

    That looks pretty easy to filter out, but I'm not seeing these on my servers due to another "feature" of the articles. I'm happy to add filtering for yEnc as I don't serve binary groups on my servers, so this would only check text newsgroups.

    I'll get on that in a few days, but I'll check here first in case someone has reasons that I should not do so.

    --
    Retro Guy

  • From Julien ÉLIE@21:1/5 to All on Thu Mar 28 14:00:57 2024
    Hi Marco
    Just asking, in case a current NoCeM issuer would be interested in
    adding such filters. (I'm not going to send NoCeM notices.)

    Why don't you send NoCeM messages?

    I'm just not keen on doing that; I already have enough other tasks to
    do, and don't want to add yet another one, especially when there already
    are lots of experts here in this newsgroup :)


    Do you filter yEnc out?

    Yes, I don't want yEnc articles (neither in nor out) but unfortunately I
    see some that pass local filters. That's why I thought that dedicated
    NoCeM notices for binaries would be interesting: it is easier for news
    admins to just rely on NoCeM notices than to keep their filters
    up-to-date (for filters still maintained upstream) or locally adjust
    rules and keep an eye on how well they perform.
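
As an illustration of the kind of local check discussed here, the following is a minimal sketch in Python (not taken from Cleanfeed, pyClean or any poster's setup). It assumes that a yEnc payload is recognisable by a line starting with "=ybegin" that carries line=, size= and name= parameters, later followed by a line starting with "=yend". The function name looks_like_yenc is hypothetical.

```python
import re

# Assumption: a yEnc block opens with "=ybegin" carrying line=, size= and name=,
# and closes with "=yend" carrying size=.
YBEGIN = re.compile(r"^=ybegin (?=.*\bline=\d+)(?=.*\bsize=\d+)(?=.*\bname=)")
YEND = re.compile(r"^=yend (?=.*\bsize=\d+)")


def looks_like_yenc(body: str) -> bool:
    """Return True if the body has a =ybegin line followed later by a =yend line."""
    seen_begin = False
    for line in body.splitlines():
        if not seen_begin:
            # Still looking for the opening marker.
            seen_begin = bool(YBEGIN.match(line))
        elif YEND.match(line):
            # Opening marker already seen, and now the closing one.
            return True
    return False
```

Requiring both markers keeps a post that merely mentions "=ybegin" in running text from being flagged. A truncated multi-part article without its "=yend" line would be missed by this sketch.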

    --
    Julien ÉLIE

    « Hâte-toi de bien vivre et songe que chaque jour est à lui seul une
    vie. » (Sénèque)

  • From Adam H. Kerman@21:1/5 to Julien on Thu Mar 28 13:37:29 2024
    Julien <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi all,

    I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
    (have a look at soc.culture.french for instance). Examples:
    <17ba4ef578674e9c$60891$141478$64d91c8e@news.vipernews.com>
    <1O6IN.329500$7uxe.279980@fx09.ams1>

    Wouldn't it be worthwhile having NoCeM notices of type "binary" or like
    to help cleaning non-binary newsgroups from these unwanted articles?
    Naturally, other kinds of "binary" stuff could also be in these notices,
    and not only yEnc.

    Just asking, in case a current NoCeM issuer would be interested in
    adding such filters. (I'm not going to send NoCeM notices.)

    I don't understand. Isn't misplaced binary content addressed at the
    Cleanfeed filter?

  • From Ray Banana@21:1/5 to All on Thu Mar 28 14:24:06 2024
    Thus spake Retro Guy <retroguy@novabbs.com>

    Just asking, in case a current NoCeM issuer would be interested in
    adding such filters. (I'm not going to send NoCeM notices.)
    That looks pretty easy to filter out, but I'm not seeing these on my
    servers due to another "feature" of the articles. I'm happy to add
    filtering for yenc as I don't serve binary groups on my servers, so
    this would only check text newsgroups.

    Same here. These articles never make it to the binary filter and if they
    do, they get rejected by cleanfeed.local (with a somewhat more
    sophisticated yEnc filter). Should be doable over the holidays, will
    probably use a separate type like "binary" rather than "spam" or "bot".

    I'll get on that in a few days, but I'll check here first in case
    someone has reasons that I should not do so.

    Let's go belt and suspenders. Better safe than sorry ;-)

    --
    Пу́тін — хуйло́
    https://www.eternal-september.org

  • From Jesse Rehmer@21:1/5 to Adam H. Kerman on Thu Mar 28 18:48:27 2024
    On Mar 28, 2024 at 8:37:29 AM CDT, "Adam H. Kerman" <ahk@chinet.com> wrote:

    Julien <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi all,

    I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
    (have a look at soc.culture.french for instance). Examples:
    <17ba4ef578674e9c$60891$141478$64d91c8e@news.vipernews.com>
    <1O6IN.329500$7uxe.279980@fx09.ams1>

    Wouldn't it be worthwhile having NoCeM notices of type "binary" or like
    to help cleaning non-binary newsgroups from these unwanted articles?
    Naturally, other kinds of "binary" stuff could also be in these notices,
    and not only yEnc.

    Just asking, in case a current NoCeM issuer would be interested in
    adding such filters. (I'm not going to send NoCeM notices.)

    I don't understand. Isn't misplaced binary content addressed at the
    Cleanfeed filter?

    Cleanfeed and pyClean's binary filters are far from perfect. I've used both at the same time and some still get through where they should not.

  • From llp@21:1/5 to All on Fri Mar 29 22:41:32 2024
    Julien ÉLIE wrote the following:
    Hi all,

    I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight (have a look at soc.culture.french for instance). Examples:
    <17ba4ef578674e9c$60891$141478$64d91c8e@news.vipernews.com>
    <1O6IN.329500$7uxe.279980@fx09.ams1>

    Wouldn't it be worthwhile having NoCeM notices of type "binary" or like to help cleaning non-binary newsgroups from these unwanted articles?
    Naturally, other kinds of "binary" stuff could also be in these notices, and not only yEnc.

    Just asking, in case a current NoCeM issuer would be interested in adding such filters. (I'm not going to send NoCeM notices.)

    I don't have these articles on my server.

    --
    Admin of news.usenet.ovh

  • From Retro Guy@21:1/5 to llp on Sat Mar 30 08:27:54 2024
    On Fri, 29 Mar 2024 22:41:32 +0100, llp wrote:

    Julien ÉLIE wrote the following:
    Hi all,

    I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight (have a look at soc.culture.french for instance). Examples:
    <17ba4ef578674e9c$60891$141478$64d91c8e@news.vipernews.com>
    <1O6IN.329500$7uxe.279980@fx09.ams1>

    Wouldn't it be worthwhile having NoCeM notices of type "binary" or like to help cleaning non-binary newsgroups from these unwanted articles?
    Naturally, other kinds of "binary" stuff could also be in these notices, and not only yEnc.

    Just asking, in case a current NoCeM issuer would be interested in adding
    such filters. (I'm not going to send NoCeM notices.)

    I don't have these articles on my server.

    Same here. After looking deeper, these seem mostly in groups that my
    servers do not carry, and if they are carried, the articles are filtered by cleanfeed (before spamassassin in my setup).

    Seeing that Ray seems to carry these groups, and looks like he's doing a
    great job identifying the articles, I'm going to delay diving into this
    issue. Maybe take some time to work with Perl without tearing my hair out
    first :)

  • From Ray Banana@21:1/5 to All on Sat Mar 30 17:58:41 2024
    Thus spake Retro Guy <retroguy@novabbs.org>

    Same here. After looking deeper, these seem mostly in groups that my
    servers do not carry, and if they are carried, the articles are filtered by cleanfeed (before spamassassin in my setup).

    That is also the case here. I just added a check for binary articles to filter_first (before all tests) to add the articles to the NoCeM queue
    and then continue with the normal cleanfeed processing. I have, however,
    added a filter to eliminate the most obvious bogus group names like "a.b.something".

    Seeing that Ray seems to carry these groups, and looks like he's doing a great job identifying the articles, I'm going to delay diving into this issue. Maybe take some time to work with Perl without tearing my hair out first :)

    ;-)

    PS: You seem to have an apprentice spam boy on i2pn2: <uu8uit$3h91i$1@i2pn2.org>

    --
    Пу́тін — хуйло́
    https://www.eternal-september.org

  • From Retro Guy@21:1/5 to Ray Banana on Sat Mar 30 19:49:27 2024
    Ray Banana wrote:

    Thus spake Retro Guy <retroguy@novabbs.org>

    Same here. After looking deeper, these seem mostly in groups that my
    servers do not carry, and if they are carried, the articles are filtered by cleanfeed (before spamassassin in my setup).

    That is also the case here. I just added a check for binary articles to filter_first (before all tests) to add the articles to the NoCeM queue
    and then continue with the normal cleanfeed processing. I have, however, added a filter to eliminate the most obvious bogus group names like "a.b.something".

    That's a good idea, I may do so. I'm doing some testing on a test inn(stall) so I can feel free to break it if necessary :)

    Seeing that Ray seems to carry these groups, and looks like he's doing a
    great job identifying the articles, I'm going to delay diving into this
    issue. Maybe take some time to work with Perl without tearing my hair out
    first :)

    ;-)

    I'm not a fan of DO, IF. I much prefer IF, THEN. Maybe it's just the author of cleanfeed that prefers that order of doing things. I'm the one not very well schooled in Perl so I don't really have any room to talk.

    PS: You seem to have an apprentice spam boy on i2pn2: <uu8uit$3h91i$1@i2pn2.org>

    He needs to try harder, lol, I'll keep an eye on it. I have found a few since 22 Feb and have sent some NoCeM notices for them, but not too bad so far.

    I have things set up so it's easy for me to review the first few posts of any new user without having to wade through all the regular users' posts. I do this at least once per day.

    --
    Retro Guy

  • From Julien ÉLIE@21:1/5 to All on Tue Apr 2 09:25:07 2024
    Hi Wolfgang,

    I just added a check for binary articles to
    filter_first (before all tests) to add the articles to the NoCeM queue
    and then continue with the normal cleanfeed processing. I have, however, added a filter to eliminate the most obvious bogus group names like "a.b.something".

    Thanks a lot!
    I see that the binary spam coming from vipernews.com is caught, that's
    great!
    Incidentally, in your NoCeM notices, wouldn't it be useful to list all
    the newsgroups they are sent to? Only the first one is currently
    written whereas they could for instance be written on subsequent lines
    starting with whitespace, or on the same line. (I agree it would lead
    to more lengthy messages or lines.)


    I think some newsgroups should be marked as allowing binaries or HTML. <CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-P1H36hnhJukEqGVA@mail.gmail.com> in linux.kernel.git was caught in the Bot-misplaced_binary filter but looks
    like a valid article.

    As for <XMJON.158253$t8cc.153345@fx06.iad> in alt.binaries.clip-art,
    which was only posted to that newsgroup, maybe it should be considered
    valid as posted in a newsgroup with a "binaries" component.

    Thanks again for your work and involvement in fighting spam.

    --
    Julien ÉLIE

    « Hâte-toi de bien vivre et songe que chaque jour est à lui seul une
    vie. » (Sénèque)

  • From Ray Banana@21:1/5 to All on Tue Apr 2 11:52:18 2024
    Thus spake Julien ÉLIE <iulius@nom-de-mon-site.com.invalid>

    Incidentally, in your NoCeM notices, wouldn't it be useful to list all
    the newsgroups they are sent to? Only the first one is currently
    written whereas they could for instance be written on subsequent lines starting with whitespace, or on the same line. (I agree it would lead
    to more lengthy messages or lines.)

    I'm using News::Article::NoCeM from CPAN to generate NoCeM messages and
    it puts each additional newsgroup on a separate line starting with a TAB
    and ending with CRLF, which led to people (wrongly) complaining about the structure of my messages. Currently, I'm testing a patch for News::Article::NoCeM that will put all newsgroups on the same line as the
    M-ID with a TAB between the M-ID and the first newsgroup and a blank
    between the individual group names.
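
To make the two layouts concrete, here is a small Python sketch (an illustration only, not the News::Article::NoCeM code and not the patch mentioned above). It assumes that in the multi-line layout the first newsgroup shares the line with the Message-ID, which the post does not state explicitly. The function name format_entry is hypothetical.

```python
from typing import Sequence


def format_entry(message_id: str, newsgroups: Sequence[str], one_line: bool = True) -> str:
    """Format one notice entry for a Message-ID and the groups it was posted to."""
    if not newsgroups:
        raise ValueError("at least one newsgroup is required")
    if one_line:
        # Single-line layout: M-ID, TAB, then group names separated by a blank.
        return f"{message_id}\t{' '.join(newsgroups)}\r\n"
    # Multi-line layout: each additional group on its own line,
    # starting with a TAB and ending with CRLF.
    lines = [f"{message_id}\t{newsgroups[0]}\r\n"]
    lines += [f"\t{group}\r\n" for group in newsgroups[1:]]
    return "".join(lines)
```

For example, format_entry("<id@example.invalid>", ["soc.culture.french", "fr.test"]) yields one CRLF-terminated line, while passing one_line=False yields two lines, the second starting with a TAB.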

    I think some newsgroups should be marked as allowing binaries or HTML. <CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-P1H36hnhJukEqGVA@mail.gmail.com>
    in linux.kernel.git was caught in the Bot-misplaced_binary filter but
    looks like a valid article.

    My filter makes use of the is_binary () function in Cleanfeed, which in
    turn relies on some configuration variables. The problem in the case of
    the linux.kernel.git messages is that some of them have a Content-Type
    of multipart/mixed with the PGP signature included as a Base64 encoded attachment.

    As for <XMJON.158253$t8cc.153345@fx06.iad> in alt.binaries.clip-art,
    which was only posted to that newsgroup, maybe it should be considered
    valid as posted in a newsgroup with a "binaries" component.

    Groups with "binaries" in the group name should already be excluded from
    the binary filter, will double-check this.
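
A rough Python sketch of such an exclusion could look as follows. It is an assumption about the rule, not Cleanfeed's logic (which relies on its own configuration variables): a group is treated as accepting binaries when one dot-separated component of its name is exactly "binaries", and an article is exempt only if every group in its Newsgroups header qualifies. Both function names are hypothetical.

```python
def allows_binaries(newsgroup: str) -> bool:
    """True if one dot-separated component of the group name is 'binaries'."""
    return "binaries" in newsgroup.lower().split(".")


def only_binary_groups(newsgroups_header: str) -> bool:
    """True if every group in a comma-separated Newsgroups header accepts binaries."""
    groups = [g.strip() for g in newsgroups_header.split(",") if g.strip()]
    # An empty header exempts nothing.
    return bool(groups) and all(allows_binaries(g) for g in groups)
```

Matching on whole components rather than substrings avoids exempting a group whose name merely contains the word inside a longer component. An article crossposted to both a binaries group and a text group is not exempt under this rule.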

    Thanks for your feedback. It is much appreciated.

    --
    Пу́тін — хуйло́
    https://www.eternal-september.org

  • From Julien ÉLIE@21:1/5 to All on Tue Apr 2 13:31:46 2024
    Hi Wolfgang,

    I'm using News::Article::NoCeM from CPAN to generate NoCeM messages and
    it puts each additional newsgroup on a separate line starting with a TAB
    and ending with CRLF, which led to people (wrongly) complaining about the structure of my messages. Currently, I'm testing a patch for News::Article::NoCeM that will put all newsgroups on the same line as the M-ID with a TAB between the M-ID and the first newsgroup and a blank
    between the individual group names.

    Sounds great with a one-line list of newsgroups, separated with a space, thanks.

    FYI, it will be useful with the perl-nocem program shipped with the next release of INN (2.7.2) as I have added the possibility to only process a
    subset of Message-IDs within a notice, according to specific rules by
    the news admin (sort of a local function called like in
    cleanfeed.local). Having the whole list of newsgroups will permit for
    instance to process Message-IDs of articles posted to a newsgroup
    actually carried by the server. Or more complex cases like processing
    NoCeM notices for only a subset of newsgroups (if someone does not want
    to cancel anything in some newsgroups) or not taking into account
    notices from "john" or of a given type, except for a subset of newsgroups.
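
The kind of local rule described here can be sketched in Python as a simple predicate. This is only an illustration of the decision logic and not the perl-nocem interface, whose actual calling convention is documented in its manual page. All names and patterns below (CARRIED, PROTECTED, IGNORED_ISSUERS, ISSUER_EXCEPTIONS, should_process) are hypothetical, and shell-style patterns from fnmatch stand in for wildmat patterns.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Hypothetical local policy, for illustration only.
CARRIED = ["soc.culture.*", "news.admin.*"]   # groups this server carries
PROTECTED = ["news.admin.*"]                  # never cancel anything here
IGNORED_ISSUERS = {"john"}                    # issuers to skip...
ISSUER_EXCEPTIONS = ["soc.culture.french"]    # ...except in these groups


def matches_any(group: str, patterns: Sequence[str]) -> bool:
    return any(fnmatchcase(group, pattern) for pattern in patterns)


def should_process(issuer: str, notice_type: str, newsgroups: Sequence[str]) -> bool:
    """Decide whether one Message-ID of a notice should be acted upon.

    notice_type is accepted so that type-based rules could be added,
    but this sketch does not use it.
    """
    carried = [g for g in newsgroups if matches_any(g, CARRIED)]
    if not carried:
        return False  # article was not posted to any group carried locally
    if any(matches_any(g, PROTECTED) for g in carried):
        return False  # posted to a group where nothing must be cancelled
    if issuer in IGNORED_ISSUERS and not any(
        matches_any(g, ISSUER_EXCEPTIONS) for g in carried
    ):
        return False  # ignored issuer, outside the excepted groups
    return True
```

Such a predicate only works when the notice lists every newsgroup of each article, which is why the full list matters.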



    I think some newsgroups should be marked as allowing binaries or HTML.
    <CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-P1H36hnhJukEqGVA@mail.gmail.com>
    in linux.kernel.git was caught in the Bot-misplaced_binary filter but
    looks like a valid article.

    My filter makes use of the is_binary () function in Cleanfeed, which in
    turn relies on some configuration variables. The problem in the case of
    the linux.kernel.git messages is that some of them have a Content-Type
    of multipart/mixed with the PGP signature included as a Base64 encoded attachment.

    Is it an issue to open upstream to Cleanfeed, to fix the is_binary()
    function?
    Or do you have a lower max_base64_lines default value, which makes it
    match PGP signatures?
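
To illustrate why a low threshold could catch a signature, here is a hedged Python sketch. It assumes that max_base64_lines acts as a limit on consecutive Base64-looking lines, which is an assumption about its meaning and not taken from Cleanfeed's source. The regular expression and both function names are hypothetical.

```python
import re

# Assumption: a "Base64-looking" line is 60 to 76 Base64 characters,
# optionally followed by up to two "=" padding characters.
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/]{60,76}={0,2}$")


def longest_base64_run(body: str) -> int:
    """Length of the longest run of consecutive Base64-looking lines."""
    longest = run = 0
    for line in body.splitlines():
        if BASE64_LINE.match(line.strip()):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def exceeds_base64_limit(body: str, max_base64_lines: int) -> bool:
    """True if the body has more consecutive Base64 lines than the limit allows."""
    return longest_base64_run(body) > max_base64_lines
```

Under this assumption, a Base64-encoded signature spanning a dozen lines would trip a limit of 10 but not a limit of 50.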

    --
    Julien ÉLIE

    « Tous les champignons sont comestibles. Certains, une fois seulement. »

  • From Ray Banana@21:1/5 to All on Wed Apr 3 14:02:35 2024
    Thus spake Julien ÉLIE <iulius@nom-de-mon-site.com.invalid>
    [...]
    Sounds great with a one-line list of newsgroups, separated with a
    space, thanks.

    Done now.

    FYI, it will be useful with the perl-nocem program shipped with the
    next release of INN (2.7.2) as I have added the possibility to only
    process a subset of Message-IDs within a notice, according to specific
    rules by the news admin (sort of a local function called like in cleanfeed.local). Having the whole list of newsgroups will permit for instance to process Message-IDs of articles posted to a newsgroup
    actually carried by the server. Or more complex cases like processing
    NoCeM notices for only a subset of newsgroups (if someone does not
    want to cancel anything in some newsgroups) or not taking into account notices from "john" or of a given type, except for a subset of
    newsgroups.

    Is that the -i option in perl-nocem (I'm using INN 2.8 snapshots)?

    I think some newsgroups should be marked as allowing binaries or HTML.
    <CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-P1H36hnhJukEqGVA@mail.gmail.com>
    in linux.kernel.git was caught in the Bot-misplaced_binary filter but
    looks like a valid article.
    My filter makes use of the is_binary () function in Cleanfeed, which in
    turn relies on some configuration variables. The problem in the case of
    the linux.kernel.git messages is that some of them have a Content-Type
    of multipart/mixed with the PGP signature included as a Base64 encoded
    attachment.
    Is it an issue to open upstream to Cleanfeed, to fix the is_binary() function?

    Cleanfeed from GitHub does not handle Content-Type: multipart/mixed
    except for HTML, so it was my own fault, obviously. A quick fix is applied
    now, but is_binary() still misses lots of binary attachments encapsulated
    in separate entities.
    I think I will make Cleanfeed more MIME-aware (MIME::Parser) and add
    local config variables for allowed/disallowed MIME types when I find the
    time.
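
The MIME-aware idea can be sketched in Python with the standard email package (the plan above refers to Perl's MIME::Parser; this is only an analogous illustration). The allow-list and the function name disallowed_parts are hypothetical, and it is an assumption that the PGP signature part is labelled application/pgp-signature.

```python
from email import message_from_string
from typing import List

# Hypothetical local allow-list of MIME types for text newsgroups.
ALLOWED_TYPES = {"text/plain", "application/pgp-signature"}


def disallowed_parts(raw_article: str) -> List[str]:
    """Return the content types of all leaf MIME parts not on the allow-list."""
    message = message_from_string(raw_article)
    found = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers such as multipart/mixed carry no payload themselves
        content_type = part.get_content_type()
        if content_type not in ALLOWED_TYPES:
            found.append(content_type)
    return found
```

Because every entity is inspected on its own, a Base64-encoded signature inside multipart/mixed is accepted while an image attachment in the same article is still reported.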

    --
    Пу́тін — хуйло́
    https://www.eternal-september.org

  • From Julien ÉLIE@21:1/5 to All on Thu Apr 4 13:55:22 2024
    Hi Wolfgang,

    Sounds great with a one-line list of newsgroups, separated with a
    space, thanks.

    Done now.

    Thanks. I'll also add support for that in perl-nocem as its legacy
    behaviour is to only take into account the first newsgroup in such a
    list. (It already coped with the syntax with several continuation lines.)
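
A reader for both syntaxes could look like the following Python sketch. It is not the perl-nocem code, it handles only the entry lines and not the signed wrapper around them, and the function name parse_entries is hypothetical. Lines starting with whitespace are treated as continuations of the previous Message-ID.

```python
from typing import Dict, List


def parse_entries(body: str) -> Dict[str, List[str]]:
    """Map each Message-ID to all newsgroups listed for it.

    Accepts groups on the Message-ID line (space-separated) as well as
    on continuation lines that start with a TAB or a space.
    """
    entries: Dict[str, List[str]] = {}
    current = None
    for line in body.splitlines():
        if not line.strip():
            continue  # skip empty lines
        if line[0] in " \t":
            # Continuation line: more groups for the current Message-ID.
            if current is not None:
                entries[current].extend(line.split())
            continue
        message_id, _, groups = line.partition("\t")
        current = message_id.strip()
        entries.setdefault(current, []).extend(groups.split())
    return entries
```

Collecting every group, rather than only the first one on the line, is what allows later per-newsgroup decisions.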


    FYI, it will be useful with the perl-nocem program shipped with the
    next release of INN (2.7.2) as I have added the possibility to only
    process a subset of Message-IDs within a notice, according to specific
    rules by the news admin (sort of a local function called like in
    cleanfeed.local). Having the whole list of newsgroups will permit for
    instance to process Message-IDs of articles posted to a newsgroup
    actually carried by the server. Or more complex cases like processing
    NoCeM notices for only a subset of newsgroups (if someone does not
    want to cancel anything in some newsgroups) or not taking into account
    notices from "john" or of a given type, except for a subset of
    newsgroups.

    Is that the -i option in perl-nocem (I'm using INN 2.8 snapshots)?

    Exactly. There's an example of how to use it in the manual page.

    Before the final release, I plan on adding two other features: a flag to
    save nocemized articles (like what saveart() does in Cleanfeed), and a
    flag to activate in daily Usenet reports the mention of notices which
    were unprocessed. This way, a news admin will have a way to find out
    possible new issuers or types.
    Do you see other things which would be worthwhile having in perl-nocem
    while I'm working on it?


    Is it an issue to open upstream to Cleanfeed, to fix the is_binary()
    function?

    Cleanfeed from GitHub does not handle Content-Type: multipart/mixed
    except for HTML, so it was my own fault, obviously. A quick fix is applied
    now, but is_binary() still misses lots of binary attachments encapsulated
    in separate entities.
    I think I will make Cleanfeed more MIME-aware (MIME::Parser) and add
    local config variables for allowed/disallowed MIME types when I find the
    time.

    Thanks for your work, that sounds like a great move!

    --
    Julien ÉLIE

    « – Poussez pas derrière !
    – Pas si vite devant ! » (Astérix)
