Date: March 2008
Drivers: Neil Williams <codehelp@debian.org>,
 Joerg Jaspert <joerg@debian.org>,
 Thomas Viehmann <tv@beamnet.de>,
 Mark Hymers <mhy@debian.org>,
 Frank Lichtenheld <djpig@debian.org>
URL: http://dep.debian.net/deps/dep4/
Source: http://anonscm.debian.org/viewvc/dep/web/deps/dep4.mdwn
Abstract: This document provides an overview of the TDeb format, TDeb
 design and usage. This specification should be considered as a work in
 progress.
Source: http://svn.debian.org/viewsvn/dep/web/deps/dep4.mdwn?view=markup
Version 0.0.3
  1. TDeb Specification
    1. Motivation
    2. Copyright © 2008
  2. Format of binary translation packages (tdeb)
    1. Summary
    2. Locale-root members
    3. Use of the .tdeb suffix
    4. Format specification
  3. Source format
    1. +t1.diff.gz
  4. TDeb contents
    1. What goes into a TDeb?
  5. TDeb uploads
    1. Initial uploads - +t0
    2. Translator updates
    3. dpkg source formats
  6. TDeb resources.
    1. Packages and patches
  7. TDeb Architectures
    1. TDebs are architecture-independent
  8. TDebs and LINGUAS
    1. Avoiding changes to the source package
    2. TDebs and binary packages
    3. Migrating packages to TDeb support
  9. Resolution of corner cases
    1. TDeb documentation duplication
  10. TDebs and package managers
  11. TDebs and debconf
  12. TDebs and multiple templates files
  13. Tdebs and usr/share/doc
  14. Lintian support
    1. PO translations
  15. TDeb maintainers
  16. TDeb implementation
    1. Incorporation of the tdiff in the next source package
    2. L10N Infrastructure
    3. Timeline
  17. Changes

TDeb Specification


This is where the Draft TDeb Specification, created at the ftp-master/i18n meeting in Extremadura, will be developed and improved.


Motivation


  1. Updates to translations should not require source NMUs.
  2. Translation data should not be distributed in architecture-dependent packages.
  3. Translators should have a common interface for getting updates into Debian (possibly with automated TDeb generation after i18n team review).

Copyright © 2008


Format of binary translation packages (tdeb)

Summary

The tdeb binary package format is a variation of the deb binary package format. It has the same structure as deb, but the (single) data member is replaced by bzip2-compressed members for each LOCALE_ROOT supported.

Locale-root members

The new locale root data members are designed to support easier management of the translations, including allowing users to only install the translations that are needed for one particular installation.

e.g. a standard .deb contains debian-binary, control.tar.gz and data.tar.gz whereas a typical TDeb could contain:

$ ar -t ../pilot-qof-tdeb_0.1.7-1_all.tdeb
debian-binary
control.tar.gz
t.de.tar.bz2
t.en.tar.bz2
t.fr.tar.bz2
t.pt.tar.bz2
t.ru.tar.bz2
t.sv.tar.bz2
t.vi.tar.bz2

t.pt.tar.bz2 would contain translations for pt and pt_BR:

./usr/share/locale/pt/LC_MESSAGES/pilot-qof.mo
./usr/share/locale/pt_BR/LC_MESSAGES/pilot-qof.mo

This allows later tools to extract only the requested translations from the TDeb upon installation.
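As an illustration of that selection step, the following is a minimal sketch and not part of any existing tool. The function names are hypothetical; the member naming (t.<root>.tar.bz2) and the locale-root pattern are taken from this document.

```python
import re

# Member naming as shown in the listing above: t.<locale root>.tar.bz2
MEMBER_RE = re.compile(r"^t\.([a-z]{2,3})\.tar\.bz2$")


def locale_root(locale):
    """Return the locale root of a locale name such as 'pt_BR' or 'sr@latin'."""
    # Drop the territory (_BR), codeset (.UTF-8) and modifier (@latin) parts.
    return re.split(r"[_.@]", locale, maxsplit=1)[0].lower()


def members_for(locales, members):
    """Pick the locale-root members needed for the requested locales."""
    wanted = {locale_root(loc) for loc in locales}
    selected = []
    for name in members:
        match = MEMBER_RE.match(name)
        if match and match.group(1) in wanted:
            selected.append(name)
    return selected


members = [
    "debian-binary", "control.tar.gz",
    "t.de.tar.bz2", "t.en.tar.bz2", "t.fr.tar.bz2", "t.pt.tar.bz2",
]
print(members_for(["pt_BR", "fr_FR.UTF-8"], members))
# prints ['t.fr.tar.bz2', 't.pt.tar.bz2']
```

A user who requests pt_BR therefore receives the whole t.pt.tar.bz2 member, which also carries the pt translations.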

TDebs are based on the .deb format. The only difference is a small change in the organisation of the data.tar.gz, but it simplifies various stages of handling the resulting packages in the repository, in upload rules and in other support tools.

Use of the .tdeb suffix

Various file-based tools exist to handle .deb files. It will be easier for such tools to reliably tell a .deb from a .tdeb by the filename than to add new support in the codebase to detect the absence of data.tar.gz and work out how to handle the t.${LOCALE_ROOT}.tar.bz2 members. The suffix also makes it easier to manage TDebs in various repository situations. Although closely related to the .deb format, the .tdeb format is sufficiently different to merit a subtle change to the suffix, in a similar manner to .udeb.

Format specification

The file is an ar archive with a magic number of !<arch>.

The first member is named debian-binary and contains a series of lines, separated by newlines. Currently only one line is present, the format version number, 2.0 at the time the original dpkg manual page was written. Programs which read new-format archives should be prepared for the minor number to be increased and new lines to be present, and should ignore these if this is the case.

If the major number has changed, an incompatible change has been made and the program should stop. If it has not, then the program should be able to safely continue, unless it encounters an unexpected member in the archive (except at the end), as described below.
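The version rule can be sketched as follows. The function name and the supported_major parameter are assumptions made for the example; only the first line of debian-binary is interpreted, and any further lines are ignored as the text requires.

```python
def check_format_version(debian_binary, supported_major=2):
    """Parse the debian-binary contents and return (major, minor).

    Raises ValueError when the major number differs from the supported one,
    which the specification treats as an incompatible change.
    """
    # Only the first line carries the version; later lines are ignored.
    first_line = debian_binary.split("\n")[0].strip()
    major_text, _, minor_text = first_line.partition(".")
    major = int(major_text)
    minor = int(minor_text or 0)
    if major != supported_major:
        raise ValueError(f"unsupported format version {first_line!r}")
    return major, minor


print(check_format_version("2.0\n"))            # prints (2, 0)
print(check_format_version("2.1\nnew-line\n"))  # prints (2, 1)
```

A reader built on this sketch would continue for 2.1 and stop with an error for 3.0.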

The second required member is named control.tar.bz2. It is a tar archive compressed with bzip2 which contains the package control information, as a series of plain files, of which the file control is mandatory and contains the core control information. The control tarball may optionally contain an entry for '.', the current directory.

The members following the control.tar.bz2 are named t.${LOCALE_ROOT}.tar.bz2. Each contains the filesystem archive for the locale root, as a tar archive compressed with bzip2.

LOCALE_ROOT must match the regular expression [a-z]{2,3}

These members must occur in this exact order. Current implementations should ignore any additional members after the t.${LOCALE_ROOT}.tar.bz2 members. Further members may be defined in the future, and (if possible) will be placed after these. Any additional members that may need to be inserted before t.${LOCALE_ROOT}.tar.bz2 and which should be safely ignored by older programs, will have names starting with an underscore, '_'.

New members that cannot be safely ignored will be inserted before the t.${LOCALE_ROOT}.tar.bz2 members with names starting with something other than an underscore, or will (more likely) cause the major version number to be increased.
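The ordering rules above could be checked along these lines. This is a sketch under one reading of the text, not a reference implementation: the function name is hypothetical, the control member is expected to be named control.tar.bz2 as in this section (the earlier listing shows control.tar.gz), and a TDeb with no locale-root members is not treated as an error.

```python
import re

LOCALE_MEMBER = re.compile(r"^t\.[a-z]{2,3}\.tar\.bz2$")


def locale_members(names):
    """Return the locale-root members of a TDeb member list.

    Raises ValueError when the layout cannot be read safely.
    """
    if not names or names[0] != "debian-binary":
        raise ValueError("first member must be debian-binary")
    seen_control = False
    found = []
    for name in names[1:]:
        if LOCALE_MEMBER.match(name):
            if not seen_control:
                raise ValueError("locale-root member before control.tar.bz2")
            found.append(name)
        elif found:
            break  # members after the locale roots are ignored
        elif name.startswith("_"):
            continue  # ignorable member added by a later format revision
        elif name == "control.tar.bz2" and not seen_control:
            seen_control = True
        else:
            raise ValueError(f"unexpected member {name!r}")
    if not seen_control:
        raise ValueError("control.tar.bz2 is missing")
    return found


print(locale_members([
    "debian-binary", "control.tar.bz2", "_future",
    "t.de.tar.bz2", "t.pt.tar.bz2", "extra",
]))
# prints ['t.de.tar.bz2', 't.pt.tar.bz2']
```

An ordinary .deb layout (debian-binary, control.tar.bz2, data.tar.gz) is rejected by this check because data.tar.gz is neither a locale-root member nor prefixed with an underscore.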


Source format

+t1.diff.gz

TDebs will use a source format for translation updates that will not cause any changes in the package binaries. The foo_1.2.3-4+t1.diff.gz will be created for changes made by translators. Tools will need to apply the translation diff after applying the .diff.gz prepared (and signed) by the Debian maintainer.

The +t[0-9] update will need to be built from the source package, but it contains only changes to the translated content. No changes will be allowed in the package binaries or untranslated content.

Translation updates are source-package based and are denoted by the +t[0-9] suffix, where 0 is assumed to be the original upload by the Debian maintainer.

e.g. for a non-native package foo:

source version 1.2.3-4,
the first TDeb update would be foo_1.2.3-4+t1
the changes from -4 to -4+t1 will be in foo_1.2.3-4+t1.diff.gz

BinNMU versions are not affected because the +t suffix is based on the source version.
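The version arithmetic in this example can be sketched as below. The helper names are hypothetical, epochs are not handled, and the revision is read as one or more digits (the +t[0-9]+ form used later in this document).

```python
import re

T_SUFFIX = re.compile(r"^(?P<base>.+?)\+t(?P<rev>[0-9]+)$")


def split_translation_revision(version):
    """Split '1.2.3-4+t1' into ('1.2.3-4', 1); no suffix means revision 0."""
    match = T_SUFFIX.match(version)
    if match:
        return match.group("base"), int(match.group("rev"))
    return version, 0


def next_translation_version(version):
    """Return the version string of the next translation update."""
    base, rev = split_translation_revision(version)
    return f"{base}+t{rev + 1}"


def tdiff_name(source, version):
    """Return the name of the translation diff for a +tN source version."""
    return f"{source}_{version}.diff.gz"


first_update = next_translation_version("1.2.3-4")
print(first_update)                     # prints 1.2.3-4+t1
print(tdiff_name("foo", first_update))  # prints foo_1.2.3-4+t1.diff.gz
```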

The +t1.diff.gz needs dpkg support which is being implemented:

New translations and translation fixes are currently tracked in the BTS. Tdeb uploads shall be able to close those bugs. Using a changelog might be the easiest way.

During the transition, those bugs will remain. After the transition, those bugs will go away so there should be no need for a closure method. We'll need to rely on i18n.debian.org for translation tracking after Squeeze.


TDeb contents

What goes into a TDeb?

(With the exception of debconf templates, untranslated content remains in the original package).

Translated content, including:

TDeb uploads

Initial uploads - +t0

The initial TDeb will be generated by the maintainer, effectively +t0, containing whatever translations are currently supported. The TDeb is uploaded alongside the binary and .dsc. It is up to the maintainer to incorporate any +t1.diff.gz containing updated or new translations that may exist already into each new Debian version.

If the new version has changed translated strings then those will only be available in English until the +t1 TDeb can be prepared.

Maintainers are advised to always seek translation updates prior to the upload of the initial TDeb. If maintainers implement a string freeze and wait for translation updates before uploading, the chances of a +t1.diff.gz being required by the time of the next release by the maintainer are lower.

See also Timeline.

Maintainers will be creating TDebs in Squeeze+1, using debian/rules, using debhelper calls and uploading TDebs each time they would currently upload any package that contains /usr/share/locale/LC_*/ etc. Those TDebs are, effectively, +t0 - only updates by translators start the +t1 sequence.

Maintainer uploads (non-native package example):

foo_1.2.3-4_amd64.deb
foo-tdeb_1.2.3-4_all.tdeb
foo-bar_1.2.3-4_amd64.deb
foo_1.2.3-4.diff.gz
foo_1.2.3.orig.tar.gz
foo_1.2.3-4.dsc
foo_1.2.3-4_amd64.changes

Maintainer uploads (native package example):

foo_1.2.3_amd64.deb
foo-tdeb_1.2.3_all.tdeb
foo-bar_1.2.3_amd64.deb
foo_1.2.3.tar.gz
foo_1.2.3.dsc
foo_1.2.3_amd64.changes

The foo-tdeb package will be listed in the .changes anyway so existing tools will simply add it to the list of files to be uploaded to ftp-master or wherever. foo-tdeb_1.2.3-4_all.tdeb is, effectively, foo-tdeb_1.2.3-4+t0_all.tdeb

When the maintainer makes a new release, foo_1.2.3-5, which incorporates the TDeb changes, it is done in a similar manner to how an NMU is included. All files matching foo*1.2.3-4* are removed by dak when the new version is uploaded. The updated translations now exist in foo-tdeb_1.2.3-5_all.tdeb - uploaded by the maintainer and there is no +t1.diff.gz or +t1_all.tdeb until the package translations need to be touched again.

Translator updates

Updates to translations will update the existing TDeb, creating +t2.diff.gz and +t3.diff.gz etc. All supported languages go into the existing TDeb, organised by locale root.

Unless a package falls into the corner case of shipping debconf templates together with large amounts of translated documentation, and therefore needs more than one TDeb, each source package should only expect to have one TDeb for all binary packages and all locales.

Translation teams can work together to make uploads in a coordinated manner - similar to the current method of requesting deadlines for i18n bugs, a nominated person can collate the various translations prior to a deadline chosen by the teams themselves, according to the needs of that particular package.

Translator updates of TDebs do not necessarily need to use typical package building tools like 'dpkg-buildpackage'. All that is needed is to put the .mo files into the relevant directory hierarchy (or use dh_gentdeb) and then call dpkg-deb --tdeb -b:

dpkg-deb --tdeb -b debian/pilot-qof-tdeb  ../pilot-qof-tdeb_0.1.7-1_all.tdeb

This means that translators can build updated TDebs without needing the full dependency chain needed for a source rebuild - only dpkg (at a version that includes the TDeb support) is strictly necessary.

Translator update uploads would contain:

foo-tdeb_1.2.3-4+t1_all.tdeb
foo_1.2.3-4+t1.diff.gz
foo_1.2.3-4+t1.dsc
foo_1.2.3-4+t1_all.changes

The key point is that a +t1 revision can happen during a release freeze without touching the source, without changing any of the binaries. Once the release is out and unstable is accessible again, the maintainer adds +t1.diff.gz to their next upload.

dpkg source formats

Format 3.0 should not be any more difficult than 1.0 or anything that follows. 3.0 has to deal with incorporating patches and changes from the Debian Bug Tracking System, so +t1.diff.gz is no different.

What matters is that the maintainer gets the +t1.diff.gz and applies it onto the source package prior to the next upload. It's no different to how the same maintainer would handle a patch or new translations file sent to the BTS.


TDeb resources.

Packages and patches

The main changes to support TDebs will be concentrated in the archive tools and central packaging tools (dpkg, apt, debhelper).

Test packages are available via Emdebian:

Patches for current tools are handled in repositories for the relevant tools:


TDeb Architectures

TDebs are architecture-independent

TDebs must only be used for Architecture-independent data. There will be NO support for Architecture-dependent TDebs outside Emdebian.

Any translation system that does not use gettext can choose to use TDebs as long as the translation files are architecture-independent.


TDebs and LINGUAS

Avoiding changes to the source package

Many packages using autotools use the LINGUAS support of gettext but this requires changes within the source of the package - sometimes po/LINGUAS but more commonly configure.ac|in. Changing configure.ac and regenerating the autotools build system completely undermines the objective of TDebs being able to be used independently of maintainer uploads and NMUs. Existing TDeb support ignores the LINGUAS method, therefore:

If a $lang.po file exists in a recognisable po directory (${top_srcdir}/po/ or ${top_srcdir}/po-*/), TDeb handlers will process that .po file even if it is not listed in LINGUAS. If the PO file is valid, the generated .mo file will be included into the TDeb.
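The discovery rule can be sketched as follows. This is not how dh_gentdeb is implemented; find_po_files is a hypothetical name, the throwaway tree exists only for the demonstration, and the validity check on each PO file is left out.

```python
import tempfile
from pathlib import Path


def find_po_files(top_srcdir):
    """Map each po directory name to the languages that have a .po file in it.

    LINGUAS is deliberately not consulted.
    """
    top = Path(top_srcdir)
    # Recognised locations from the text: po/ and po-*/ under the top source dir.
    candidates = [top / "po"] + sorted(top.glob("po-*"))
    found = {}
    for po_dir in candidates:
        if not po_dir.is_dir():
            continue
        languages = sorted(path.stem for path in po_dir.glob("*.po"))
        if languages:
            found[po_dir.name] = languages
    return found


# Demonstration tree: pt_BR.po exists but is not listed in LINGUAS.
with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    (top / "po").mkdir()
    (top / "po-doc").mkdir()
    (top / "po" / "LINGUAS").write_text("de\n")
    for relative in ("po/de.po", "po/pt_BR.po", "po-doc/fr.po"):
        (top / relative).write_text("")
    result = find_po_files(top)

print(result)
# prints {'po': ['de', 'pt_BR'], 'po-doc': ['fr']}
```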

Packages will no longer be able to have unactivated or unused translations. (This is a debhelper / other packaging tool implementation problem, not a dpkg one)

As a result of this requirement, the debhelper tdeb tool (dh_gentdeb) handles finding the translations, preparing the binary translation files and moving the translations to suitable directories within the package build.

TDebs and binary packages

The filesystem contents of TDebs and their associated binary packages must be mutually exclusive, so that dpkg doesn't need any special replace handling. We will still need some Replaces for the transition, but that can be handled like any other Replaces.
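A packaging check for this rule might look like the sketch below. The function name is hypothetical and the inputs are lists of regular files only; directories such as ./usr/share/locale/ are shared between packages and are not compared. The ./usr/bin/pilot-qof path is an invented example.

```python
def conflicting_files(tdeb_files, deb_files):
    """Return the file paths shipped by both the TDeb and the binary package."""
    return sorted(set(tdeb_files) & set(deb_files))


tdeb_files = [
    "./usr/share/locale/pt/LC_MESSAGES/pilot-qof.mo",
    "./usr/share/locale/pt_BR/LC_MESSAGES/pilot-qof.mo",
]
deb_files = [
    "./usr/bin/pilot-qof",  # hypothetical path, for illustration
    "./usr/share/locale/pt/LC_MESSAGES/pilot-qof.mo",  # not yet migrated
]
print(conflicting_files(tdeb_files, deb_files))
# prints ['./usr/share/locale/pt/LC_MESSAGES/pilot-qof.mo']
```

An empty result means the two packages can be unpacked together without any replace handling.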

Migrating packages to TDeb support

Maintainers will need to make a variety of changes to support TDebs:


Resolution of corner cases

TDeb documentation duplication

Basing the TDeb on the source package means that the TDeb could include large amounts of translated documentation. This results in a corner case where a package with debconf templates and a large amount of translated documentation would result in the docs being installed merely to obtain the translated templates. In order to resolve this, each source package may have one or more tdebs. If a source package has translations, it must have a tdeb named after the source package (suffixed with -tdeb) and all debconf templates must be placed in it. Such a package should place all architecture independent documentation (even in the native language) into a tdeb. If a package contains documentation which is not always required (for example API documentation or user documentation), the source package may provide additional ${source}-${foo}-tdeb_$version_all.tdeb files.

If tdebs are revised by the translation teams, the suffix +t[0-9]+ must be used and all tdebs for the source package must be revised at the same time.


TDebs and package managers

Package managers can find out whether a package has a base tdeb by examining the Packages file for Translation-Version: [0-9]+. In the case of Translation-Version: 0, the tdeb name and version is the same as the source file with -tdeb appended.

In the case of Translation-Version: 1 or higher, the tdeb name is ${source}-tdeb_$version+t[0-9]+_all.tdeb. Additional tdebs are referenced in the Packages file in the following way: Additional-Translations: ${source}-api-tdeb, ${source}-user-tdeb
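A package manager could derive the file name of the base tdeb along these lines. This is a sketch with hypothetical function names; the file name pattern is taken to match the upload examples earlier in this document (foo-tdeb_1.2.3-4+t1_all.tdeb), and the field value is passed in already parsed from the Packages file.

```python
def base_tdeb_filename(source, version, translation_version):
    """Return the file name of the base tdeb for a source package."""
    if translation_version < 0:
        raise ValueError("Translation-Version must not be negative")
    # Translation-Version: 0 means the maintainer's own upload, with no +t suffix.
    suffix = f"+t{translation_version}" if translation_version > 0 else ""
    return f"{source}-tdeb_{version}{suffix}_all.tdeb"


def additional_tdebs(field_value):
    """Split an Additional-Translations value into package names."""
    return [name.strip() for name in field_value.split(",") if name.strip()]


print(base_tdeb_filename("foo", "1.2.3-4", 0))  # prints foo-tdeb_1.2.3-4_all.tdeb
print(base_tdeb_filename("foo", "1.2.3-4", 1))  # prints foo-tdeb_1.2.3-4+t1_all.tdeb
print(additional_tdebs("foo-api-tdeb, foo-user-tdeb"))
# prints ['foo-api-tdeb', 'foo-user-tdeb']
```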

In cases where a base tdeb is present, package managers must call dpkg with the tdeb and the deb in the same invocation in order to ensure that all debconf templates can be extracted before the config script is run.

There is no need to unpack in order to obtain the debconf templates. The tdeb merely has to be locatable by debconf, which will call apt-extracttemplates and load the translated debconf strings into memory. See TDebs and debconf.


TDebs and debconf

apt-extracttemplates is used by debconf's dpkg-preconfigure to extract templates from the not-yet-extracted .debs right after download. This needs to take tdebs into account. Note that the templates are per-binary while tdebs are per-source. Also, the .deb should have non-translated templates.


TDebs and multiple templates files

If a source package builds multiple binaries that use debconf, the debian/ directory will contain foo.templates and bar.templates. The TDeb will retain all templates files under the original names. apt-extracttemplates and po-debconf will need to work together to ensure that all templates files are available to debconf so that debconf can selectively load only the templates files required.


Tdebs and usr/share/doc

A tdeb needs usr/share/doc/copyright and changelog.Debian, and dpkg will create the necessary files, just as with a normal .deb.


Lintian support

PO translations


TDeb maintainers

Rather than allow repeat uploads of the same change in multiple languages, coordinate builds of tdebs to make a single upload with as many changes as possible at one time. Translation-Maintainers: in debian/control and Localisation Assistants.


TDeb implementation

Incorporation of the tdiff in the next source package

A process will be needed to help maintainers include the tdiff when they prepare a new source package (a kind of NMU acknowledgement?). It should be automated so that the +t1.diff.gz is automatically applied if it exists. A problem still exists with maintainers who don't check apt-get source first. A possible method is to modify uscan and uupdate.

When the maintainer prepares a new package, they apply the tdiff and "acknowledge the new translations". (This tdiff is quite likely not to apply if the upstream source has changed.)

The i18n infrastructure can check that this acknowledgement is really performed (e.g. merge the old translations in the new one and check if the translation statistics changed)

Automation in uscan should be possible

This issue can be postponed until tdebs appear for non-native packages (squeeze+1)


L10N Infrastructure

i18n.debian.net gathers the translation material from the packages. It needs to support tdebs too (tdiff).

i18n.debian.net can check that translation material from the tdiff was merged in new versions of the source package.

i18n.debian.net needs to help "Localisation Assistants" in gathering the new translations before the preparation of a new tdeb


Timeline

Sequence

What needs to be done for Squeeze?

There will be no TDebs in Squeeze.

What needs to be done for Squeeze + 1?

First generation of TDebs:

What will be done for Squeeze + 2 or later?


Changes


2009-03-08 - [Neil Williams] * Convert to DEP.

2009-03-19 - [Neil Williams] * Add a table of contents via ikiwiki

2009-04-14 - [Neil Williams] * Tweak some of the links to become active. * Update the URL * Fold in the results of the discussions so far on -devel.