(3)
Some program (xgettext for program translations, po4a for manual pages
and some podebconf tool for debconf templates) is used to pull the translatable strings from the source code and to create a POT file.
xgettext doesn't even try to create a meaningful header and overwrites whatever one has written into the previous version of the POT file, so a wrapper is already needed to have a header that translators can fill
(which they usually don't do).
For Adduser's program translations, my call to xgettext is:
xgettext --keyword=mtx --keyword=gtx --omit-header -o "$TEMP_FILE" --from-code=UTF-8 -L perl adduser deluser $(find . -name "*.pm")².
TEMP_FILE then gets the generated header prepended to result in
adduser.pot.
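The wrapper described above could be sketched roughly as follows. This is only an illustration: the function names and the paths `po/header.pot.in` and `po/adduser.pot` are assumptions, not what adduser actually ships; only the xgettext call is taken from the message.

```shell
# Concatenate a hand-maintained header and the header-less xgettext output.
# A blank line separates the header entry from the first message block.
assemble_pot() {
    header_file=$1
    body_file=$2
    pot_file=$3
    { cat "$header_file"; printf '\n'; cat "$body_file"; } > "$pot_file"
}

# Hypothetical driver: extract strings into a temp file, then prepend the
# maintained header. Paths are placeholders.
regenerate_pot() {
    tmp=$(mktemp)
    xgettext --keyword=mtx --keyword=gtx --omit-header -o "$tmp" \
        --from-code=UTF-8 -L perl adduser deluser $(find . -name "*.pm")
    assemble_pot po/header.pot.in "$tmp" po/adduser.pot
    rm -f "$tmp"
}
```

Calling `regenerate_pot` from the top of the source tree would then rewrite the POT file in one step; where in the packaging workflow to call it is exactly the open question.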
I have seen this being done in debian/rules' clean target which, in
in-tree builds, causes the POT file to be changed as well and I don't understand at which step of the packaging process it would be a good
idea to commit that POT file. If I build my package out of tree (like I
do out of tradition of svn-buildpackage, I have gbp configured to use ../build-area), the POT file ends up newly generated in the source
package but never gets updated in git. Adduser had POT files from 2022
in git until just recently because I just never noticed. There is no
lintian check and no check inside tracker.d.o for this.
In other packages, there is a dedicated m4 macro to call xgettext which doesn't make things easier to understand.
(4)
Then,
msgmerge --update --backup=none --no-fuzzy-matching "${PO_FILE}" "${POT_FILE}"
is called for every existing PO file. This doesn't move the header from
the POT file to the existing PO file so stupidities like "# COPYRIGHT
THE PACKAGE CREATOR" just never get fixed because the translators don't
seem to care.
If a po file for a language already exists during this step, the already existing translation gets merged into the new po file. In some
circumstances that I have not understood yet, the translation gets "fuzzied",
which I have been told causes a lot of unnecessary and
repeated work for the translators. I am supposed to avoid that by doing manual work myself, work that I don't understand. Not doing this work is condemned as "not being nice to translators".
Basically the same applies for this step as for the POT generation
step, with the additional hardship that the PO files are generated,
being written to by a program AND STILL contain a significant part of
human work. I never know how much work of other people I am destroying
by calling msgmerge out of line.
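One way to get at least a rough idea of what a msgmerge run did to a PO file is to count flagged entries before and after. The sketch below is an assumption-laden illustration: the helper names are made up, and the grep patterns only approximate what `msgfmt --statistics` would report properly.

```shell
# Count entries carrying the fuzzy flag (e.g. "#, fuzzy" or "#, fuzzy, perl-format").
count_fuzzy() {
    # grep -c prints 0 but exits 1 when nothing matches; ignore that status
    grep -c '^#,.*fuzzy' "$1" || true
}

# Count obsolete entries, i.e. translations whose msgid vanished from the POT.
count_obsolete() {
    grep -c '^#~ msgid ' "$1" || true
}

# Hypothetical wrapper around the msgmerge call quoted in the message.
merge_and_report() {
    po=$1
    pot=$2
    fuzzy_before=$(count_fuzzy "$po")
    obsolete_before=$(count_obsolete "$po")
    msgmerge --update --backup=none --no-fuzzy-matching "$po" "$pot"
    printf '%s: fuzzy %s -> %s, obsolete %s -> %s\n' "$po" \
        "$fuzzy_before" "$(count_fuzzy "$po")" \
        "$obsolete_before" "$(count_obsolete "$po")"
}
```

A jump in the obsolete count after `merge_and_report po/de.po po/adduser.pot` would hint that existing translations were set aside by the merge.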
In which stage of package build do I do
msgmerge? Do I commit the merged po files, when do I commit them, what
do I do with them during git merge when a feature branch is merged?
(5)
podebconf-report-po is used to generate the calls for translation. One message is sent to this mailing list with the pot file attached, and for each existing po file, the translators listed in that file get an
individual mail with just the respective po file attached.
If the msgmerge step is forgotten, they get an already translated po
file that doesn't match the pot and therefore is useless.
In theory, for an already existing package, the POT file is not needed, right?
(6)
Depending on the age of the existing translations, about half of the messages I send to individual translators are going to bounce. Am I
supposed to report that to debian-i18n@l.d.o as a followup to the
general translation request so that new translators can take up the
outdated translatorless translations? Or am I supposed to send the
general translation request to debian-i18n last so that I can explicitly mention the translatorless languages there?
(8)
When a translator does a translation, they send me a new po file
containing the actual translation. If it's a new language, they start
with the pot file that hopefully has the correct header, and if it's an existing language, they start with the old po file, which almost always
has a historically grown header that is in more or less dire need of streamlining and cleaning. They either take the PO/POT file from the
e-mail attachment, use a package they pulled from the archive³ or they
pull the PO/POT file from git.
They usually don't bother about the header or copyright, so things like package name, licenseª, Project-ID-Version and PO-Revision-Date are
often questionable, unclear, just plain wrong or cause extra work for the
package maintainer because, for example, a different license was chosen
than the actual package is licensed under either out of incompetence or
not caring.
Am I supposed to fix those headers in the po file myself? Am I supposed
to ask the translator to fix the headers? Or am I supposed to just
ignore all of that and just accept whatever I get sent? I often feel
like a smart-mouthed know-it-all when I ask a translator to improve the headers of their PO file.
(9)
I then commit the po file the translator sent me to version control
(10)
And then I eventually release the package.
In theory, it would probably be good to do all that regeneration when preparing a package for release. Why don't we have a debian/rules target like debian/rules prepare-release that might be useful for that? How
could we protect ourselves against uploading a package with outdated POT/PO
files? People make mistakes.
How am I supposed to handle the unavoidable differences between git branches, that are probably easier to solve when I am just merging a
feature branch but can be a major pain when merging suite branches like experimental, stable, unstable where translation work has already been
done?
There must be some smarter method when merging to mas^wdebian/unstable[...]
than (1) move away all po files, (2) merge, (3) ignore all merge
conflicts in po files, (4) regenerate POT, (5) restore po files moved
away in step 1, (6) msgmerge, (7) do a dedicated translation commit
(one? or one per file?).
Thanks for reading up to this point. Writing this message alone has cost
me three hours of my time that I'd rather have put in productive
packaging work, and a sleepless night. You know, when I blow a fuse, I
rant, and then I start writing docs. I guess when I put the result of
this discussion in a wiki page, it should be under i18n, right? I am inclined to put on https://wiki.debian.org/I18n a dedicated chapter
titled "for package maintainers", probably between "Keyboard input infrastructure support" and "Meetings" as this is a matter beyond
interna of the translation teams and the i18n effort. Am I on the
correct track with that?
If I understand your question about a general workflow for translations
across all the cases in your email correctly, I think the short answer may be to use `quilt` to patch the .PO files instead of committing them directly via
Git. Also consider `gbp-pq`; it could be useful, but I'm not sure.
Thank you for the time you took to write this reply. Marc did not
mention it explicitly, but he was talking about the `adduser`
package, which is a (Debian) native package. That means there is no
further "upstream" and thus no reason to use quilt and patch queues
in this case (that would only further complicate things).
I'm on mobile, so only a quick reply: merging should be done with
msgmerge as well — you need to call it twice, once with the po files
from both branches, and once again with the pot file to fix all the
location comments and fuzzy flags.
Alternatively, you can define a filter in git to remove(!) locations on checkout and restore them from the pot file on commit; that solves the majority of conflicts.
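The "remove locations" half of such a filter can be sketched with plain sed. This is a minimal illustration, not a tested setup: the filter name `po-noloc` is made up, and the restore step (re-adding locations from the POT file with msgmerge) is deliberately not shown.

```shell
# Drop all "#: file:line" location comments from a PO/POT stream on stdin.
strip_locations() {
    sed '/^#:/d'
}

# Possible wiring (assumption, untested; "po-noloc" is a hypothetical name):
#   git config filter.po-noloc.smudge "sed '/^#:/d'"
#   echo 'po/*.po filter=po-noloc' >> .gitattributes
```

For a quick look at the effect, `strip_locations < po/de.po | head` shows a PO file without its location lines.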
(2)
There is some point in the development process when it is time to ask
for translations. Translators need a POT file which contains all the translatable strings, and they make a PO file from that which contains
the actual translation.
That's for the initial translation. Once there is a .po they will
update it directly and don't need or use the .pot anymore.
(4)
Then,
msgmerge --update --backup=none --no-fuzzy-matching "${PO_FILE}" "${POT_FILE}"
is called for every existing PO file. This doesn't move the header from
the POT file to the existing PO file so stupidities like "# COPYRIGHT
THE PACKAGE CREATOR" just never get fixed because the translators don't seem to care.
I'm not sure the --no-fuzzy-matching helps here (see below).
Also I
note that msgmerge is not used at all in devscripts packaging, which
only uses po4a instead. Maybe you could look there if it could help
your case.
In a few cases there is spurious "fuzzing", e.g. when the source
message uses `...' for quoting where it should have been using \"...\".
In some of these cases it may be possible to fix the issue in the
source message, but I believe other cases are actually bugs in
msgmerge. Are these what your translators are asking you to deal with?
On a merge request I would just post comments to ask the translator to
fix the headers metadata and licensing, but may still deal with normalizing/rewrapping myself (e.g. by adding a commit to the MR where possible). If this were exchanged over e-mail I would probably fix
most trivial things myself.
² adduser has strings that get used in both translated and untranslated form, making sure that messages written to the console are translated
and messages written to syslog are written in English to make handling
bug reports easier
The "industrial" way to deal with this in other (usually large) projects is
to pre/postfix all source log messages with a unique identifier e.g. "(ADDUSER-1234)". This makes it possible to have tech-support-friendly
and end-user-friendly translated log messages, and as an added benefit
it has excellent SEO characteristics when it comes to searching for workarounds on the web or in a knowledge base.
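As a tiny illustration of that idea (in shell rather than adduser's Perl, and with a made-up helper name), the identifier is attached independently of the message text, so the text itself can be translated while the identifier stays constant:

```shell
# Prefix a message with a stable identifier; the message text itself may be
# translated, the identifier never is. "log_with_id" is a hypothetical helper.
log_with_id() {
    id=$1
    shift
    printf '(%s) %s\n' "$id" "$*"
}
```

For example, `log_with_id ADDUSER-1234 "Adding user foo"` prints `(ADDUSER-1234) Adding user foo`.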
³ I have received translations that were obviously done against the POT file from stable.
That's a start, I guess. Maybe in these cases you can keep the .po
file as submitted for proposed updates, merge it in unstable and
nicely ask the translator to also please work on the upcoming version.
It might also be worthwhile to forward your message to
debian-i18n@l.d.o, since translators and i18n people are more likely to
be subscribed there and less likely to be subscribed here.
xgettext comes with a ton of options to help you. Have a look at the
diff I've attached for what I've been able to do.
Note that you shouldn't define the plural stuff in the POT file, that's something that's set on a per-language basis. There should also be a
newline between the header and first message block.
I have seen this being done in debian/rules' clean target which, in
in-tree builds, causes the POT file to be changed as well and I don't
understand at which step of the packaging process it would be a good
idea to commit that POT file. If I build my package out of tree (like I
do out of tradition of svn-buildpackage, I have gbp configured to use
../build-area), the POT file ends up newly generated in the source
package but never gets updated in git. Adduser had POT files from 2022
in git until just recently because I just never noticed. There is no
lintian check and no check inside tracker.d.o for this.
In other packages, there is a dedicated m4 macro to call xgettext which
doesn't make things easier to understand.
Usually, all this stuff with generating and updating POT & PO files is upstream's responsibility to deal with, hence why you'll find little documentation for translating anything other than debconf templates.
Since this is a native package, it's up to you to do what you want. My suggestion is to run this script before release; the most important thing is that it is run after the program's messages are updated and _finalised_, and before sending it
to translators.
(4)
Then,
msgmerge --update --backup=none --no-fuzzy-matching "${PO_FILE}" "${POT_FILE}"
is called for every existing PO file. This doesn't move the header from
the POT file to the existing PO file so stupidities like "# COPYRIGHT
THE PACKAGE CREATOR" just never get fixed because the translators don't
seem to care.
The header is only touched when the translation is initially created.
I've rarely seen anyone being condemned for fuzzy translations (but then again, I work in a language team that has virtually no members).
In which stage of package build do I do
msgmerge? Do I commit the merged po files, when do I commit them, what
do I do with them during git merge when a feature branch is merged?
In your position, I would leave the translations and the POT file
untouched on the feature branch, and only ever update them on the main
branch after merging.
Yes, in theory, but it's still helpful to attach it for a variety of
reasons e.g. the existing translation is an old garbled mess and
starting new is the best option.
(6)
Depending on the age of the existing translations, about half of the
messages I send to individual translators are going to bounce. Am I
supposed to report that to debian-i18n@l.d.o as a followup to the
general translation request so that new translators can take up the
outdated translatorless translations? Or am I supposed to send the
general translation request to debian-i18n last so that I can explicitly
mention the translatorless languages there?
According to the manual page (and from what I've seen on debian-l10n-ar@l.d.o), the language team listed in the PO file (which
should _always_ be debian-l10n-LANG@l.d.o) is Cc'ed by default, so they
will deal with inactive translators. Anyone working on translations
should be subscribed to the list for the relevant language, so it'll be picked up. You don't need to do anything extra.
Am I supposed to fix those headers in the po file myself? Am I supposed
to ask the translator to fix the headers? Or am I supposed to just
ignore all of that and just accept whatever I get sent? I often feel
like a smart-mouthed know-it-all when I ask a translator to improve the
headers of their PO file.
What you do is up to you. Translation headers are annoyingly
inconsistent from a QA perspective, so don't feel bad for asking
translators to fix headers (or fix them yourself if it's easy enough).
I've had to do this when I received translations for miniflux's
debconf templates.
(9)
I then commit the po file the translator sent me to version control
(10)
And then I eventually release the package.
In theory, it would probably be good to do all that regeneration when
preparing a package for release. Why don't we have a debian/rules target
like debian/rules prepare-release that might be useful for that? How
could we protect ourselves against uploading a package with outdated POT/PO
files? People make mistakes.
I've attached a rough check script in the same diff that tells you if
your POT is outdated, based on exit code. You could use this in a pre-commit hook or somewhere in d/rules to fail the build (like execute_before_dh_auto_configure).
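The attached diff is not reproduced in this thread, so the following is not that script, just a sketch of how such a check might look. It assumes a freshly generated POT is available in a second file and treats the `POT-Creation-Date` header line as the only difference that may be ignored; the function names are made up.

```shell
# Print a POT file without the one header line that changes on every run.
pot_body() {
    grep -v '^"POT-Creation-Date:' "$1"
}

# Exit status 0 if the committed POT matches the fresh one apart from the
# creation date, non-zero otherwise.
pot_is_current() {
    committed=$1
    fresh=$2
    a=$(mktemp)
    b=$(mktemp)
    pot_body "$committed" > "$a"
    pot_body "$fresh" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

In a hook or a d/rules target this could be used as `pot_is_current po/adduser.pot "$fresh_pot" || exit 1`, with both paths being placeholders.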
How am I supposed to handle the unavoidable differences between git
branches, that are probably easier to solve when I am just merging a
feature branch but can be a major pain when merging suite branches like
experimental, stable, unstable where translation work has already been
done?
Wouldn't a "theirs" merge strategy for only the translations work?
During a merge conflict, you can use `git checkout --theirs -- po/`, or
use .gitattributes with `po/* merge=theirs` (I haven't tried this).
This way new translations from, for instance, debian/experimental, will replace the old ones in debian/unstable.
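The `git checkout --theirs` variant can be wrapped in a small helper, sketched below with a made-up function name. As far as I know Git does not ship a built-in merge driver called `theirs`, so the `.gitattributes` variant would additionally need a driver defined in the git configuration; that part is an assumption and is not shown.

```shell
# During a conflicted merge, take the incoming ("theirs") version of every
# conflicted file under po/ and mark those paths as resolved.
resolve_po_with_theirs() {
    git checkout --theirs -- po/ && git add po/
}
```

After `git merge debian/experimental` stops with conflicts, running `resolve_po_with_theirs` would leave only conflicts outside po/ to be resolved by hand.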
But honestly, if some changes are in debian/experimental or in a feature branch, the translations should really be left alone, since no one will
use it anyway and the package is prone to further changes.
Thanks for reading up to this point. Writing this message alone has cost
me three hours of my time that I'd rather have put in productive
packaging work, and a sleepless night. You know, when I blow a fuse, I
rant, and then I start writing docs. I guess when I put the result of
this discussion in a wiki page, it should be under i18n, right? I am
inclined to put on https://wiki.debian.org/I18n a dedicated chapter
titled "for package maintainers", probably between "Keyboard input
infrastructure support" and "Meetings" as this is a matter beyond
interna of the translation teams and the i18n effort. Am I on the
correct track with that?
IMO this information should probably go on i18n.d.o.
If you don't want to deal with translation stuff, I'm happy to help with
that aspect, and if you'd like you can offload that on me.
Whenever I am angry about something in Debian, I start writing docs. So
I try this here, but here I don't know enough to be really helpful. I
hope that this rant will start a positive discussion with actual results
that I could pour into a Wiki page that might actually help with the
pain I am feeling, assuming that many other maintainers feel as well.
Don't take it personally: as an established maintainer
of i18nized software did you know about this process?
In a few cases there is spurious "fuzzing", e.g. when the source
message uses `...' for quoting where it should have been using \"...\".
In some of these cases it may be possible to fix the issue in the
source message, but I believe other cases are actually bugs in
msgmerge. Are these what your translators are asking you to deal with?
Not directly the translators, but many years ago somebody who helped me
with making a package fit for translation mentioned that such changes
are unfriendly to translators and that one should handle those manually
and unfuzz things (in a manual process that seemed clumsy and
error-prone to me even back then).
On a merge request I would just post comments to ask the translator to
fix the headers metadata and licensing, but may still deal with
normalizing/rewrapping myself (e.g. by adding a commit to the MR where
possible). If this were exchanged over e-mail I would probably fix
most trivial things myself.
So you're saying that as package maintainer I can freely edit headers
and metadata for translations, putting the PO file into a mixed domain between package maintainer and the respective translator?
³ I have received translations that were obviously done against the
POT
file from stable.
That's a start, I guess. Maybe in these cases you can keep the .po
file as submitted for proposed updates, merge it in unstable and
nicely ask the translator to also please work on the upcoming version.
But translating a stable package does show a blatant non-understanding
of Debian's development mechanisms! How am I supposed to trust people
who care THIS little about the project they're contributing to?
xgettext comes with a ton of options to help you. Have a look at the
diff I've attached for what I've been able to do.
Will do, and I'll take your suggestions. But a wrapper is still needed.
tl;dr: New Wiki Page https://wiki.debian.org/I18n/ForPackageMaintainers, please review
On Wed, Mar 19, 2025 at 09:44:47AM +0100, Marc Haber wrote:
tl;dr: New Wiki Page https://wiki.debian.org/I18n/ForPackageMaintainers,
please review
I just did it. I usually updated it right away, but at one or two
points it is more like a discussion. And due to my changes some
redundancy is included, but I think this is more helpful during
development, you can streamline it of course in the next step(s).
If you have any questions on my changes it is probably best to discuss
this on list. Please keep me (or -i18n) in CC, as I'm not subscribed
to -devel.
On Sat, Mar 22, 2025 at 06:32:54PM +0000, Helge Kreutzmann wrote:
On Wed, Mar 19, 2025 at 09:44:47AM +0100, Marc Haber wrote:
If you have any questions on my changes it is probably best to discuss
this on list. Please keep me (or -i18n) in CC, as I'm not subscribed
to -devel.
I think the only matter that needs discussion is whether and when to commit updated PO files:
Me:
It is currently not clear how to make sure that a package maintainer does
not forget these updates without creating lots of useless commits with new
PO files that only differ in line number and date stamp comments.
You:
Using no line numbers reduced this problem. Updated time stamps are no worry for translators (they seldom look at them). And they do not care for commits - they look at master or similar or on the web pages mentioned above and take what they find. So the most important part is to keep this current. If you need this for your VCS, you can add code in your build system to discard po(t) file updates which only change in the date stamp.
My questions:
"Using no line numbers" => invoke msgmerge with --no-location?
"Web pages mentioned above" => I don't see web pages being mentioned. That needs a name or a link
"Add code in your build system to discard po(t) that only change in the date stamp" => that would make the source package and the tag in the VCS diverge. I don't like that at all.
I have played around a bit with `msgmerge` and found a way to display
a `diff` which contains the translations which will be destroyed if
you merge a .po file from a translator
* test-de_DE-transl.po
into your maintained one
* test-de_DE.po
in conjunction with the current generated .pot file
* test.pot
The essential code snippet looks like this
**********************************************************************
*
* msgmerge -q -C test-de_DE-transl.po test-de_DE.po test.pot \
* > test-de_DE-maint+transl.po
* msgmerge -vv -q -C test-de_DE.po test-de_DE-transl.po test.pot \
* > test-de_DE-transl+maint.po
*
* # Print out the new merged .po file
* cat test-de_DE-transl+maint.po
*
* # Show the diff of destroyed translations (including comments)
* diff -u test-de_DE-maint+transl.po test-de_DE-transl+maint.po
*
**********************************************************************
On Mon, Mar 24, 2025 at 05:49:40PM +0100, Marc Haber wrote:
"Using no line numbers" => invoke msgmerge with --no-location?
Yes.
"Web pages mentioned above" => I don't see web pages being mentioned. That needs a name or a link
I meant the references to the Debian status pages, the link I inserted further above: https://www.debian.org/international/l10n/
Please reword this to make it more clear.
"Add code in your build system to discard po(t) that only change in the date stamp" => that would make the source package and the tag in the VCS diverge. I don't like that at all.
This I don't understand.
At some stage you update the POT files. Then you run a (git) commit,
to place the updated files in your repository.
In manpages-l10n Tobias added code to detect, if only the time stamp
changed. If so, the time stamp is reverted to the previous value, and
a "git commit" is a noop. Then also the po files are left alone.
This "only" saves you a commit in this corner case.
It is not meant to diverge version, because in the end the po(t) files
in your package should match the po(t) files in the repository.
tl;dr: We need more docs about best practices to handle translations as
a package maintainer
On Tue, Mar 11, 2025 at 12:03:51PM +0100, Marc Haber wrote:
tl;dr: We need more docs about best practices to handle translations as
a package maintainer
I would suggest you find someone on debian-i18n willing to handle translations for
adduser for you and give them commit access.
That is how popularity-translation translations have been handled for a long time
(thanks, Christian Perrier, I learned everything about translations
handling this way!)