= POFile statistics =

We frequently display statistics about a POFile: how many of its messages
remain untranslated, how many have been changed in Ubuntu, and how many have
had suggested translations submitted that still need to be reviewed.

It takes too long to gather these statistics on demand, so we cache them in
POFile fields.  When translations are changed, these statistics can be either
updated incrementally or recomputed altogether.

Incremental updates carry a risk: once an error creeps in, later incremental
updates preserve it.  That's why a cron job trawls the POFile table,
recomputing translation statistics and noting any divergence between the
cached values and the ones it recomputes from scratch.
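
The compare-and-correct idea can be sketched in plain Python.  This is only
an illustration under stated assumptions, not the Launchpad implementation:
`verify_cached_stats`, the dictionary layout, and the `recompute` callback
are hypothetical names introduced for this sketch.

```python
def verify_cached_stats(cached, recompute):
    """Compare cached statistics against freshly recomputed ones.

    `cached` maps a record id to its cached statistics tuple (a
    hypothetical layout); `recompute` is a callable that returns the
    from-scratch value for an id.  Divergent entries are corrected in
    place and reported as (id, old_value, new_value) tuples.
    """
    reports = []
    for record_id in sorted(cached):
        fresh = recompute(record_id)
        if cached[record_id] != fresh:
            reports.append((record_id, cached[record_id], fresh))
            cached[record_id] = fresh
    return reports


# Toy data: record 2 has drifted away from its true value.
true_values = {1: (1, 0, 0, 0), 2: (0, 0, 5, 0)}
cached_values = {1: (1, 0, 0, 0), 2: (0, 0, 7, 0)}
reports = verify_cached_stats(cached_values, true_values.get)
```

After the call, `reports` holds one entry for the drifted record and
`cached_values` matches `true_values` again.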


== Direct invocation ==

The statistics verifier is invoked by the cron job, but we can also call
it directly.

    >>> import transaction
    >>> from lp.services.mail.stub import test_emails
    >>> from lp.translations.scripts.verify_pofile_stats import (
    ...     VerifyPOFileStatsProcess)
    >>> from lp.services.log.logger import FakeLogger
    >>> logger = FakeLogger()
    >>> VerifyPOFileStatsProcess(transaction, logger).run()
    INFO Starting verification of POFile stats at id 0
    ...
    INFO Done.
    >>> old_email = test_emails.pop()

All POFile statistics in our database are now correct, and we can test in
more detail.


== Limited runs ==

Old data will probably change less, either because it's superseded by later
distro/product series or because it has reached maturity and stabilized.  To
take advantage of this, the verifier supports partial runs, skipping
POFiles whose id is lower than some given value.  This gives us room to
schedule more frequent runs on newer data, or we can choose to do a quick
manual run on part of the data if we believe some recent POFile(s) to have
incorrect statistics data.
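
As a rough sketch of what a partial run covers, the selection amounts to a
threshold on ids.  The helper name `ids_to_verify` is hypothetical, and
processing in ascending id order is an assumption made for this example.

```python
def ids_to_verify(all_ids, start_id=0):
    """Return the ids a partial run would cover, in ascending order.

    Ids lower than `start_id` are skipped; the default of 0 covers
    everything.
    """
    return sorted(i for i in all_ids if i >= start_id)


all_ids = [5, 12, 30, 31, 47]
full_run = ids_to_verify(all_ids)
partial_run = ids_to_verify(all_ids, start_id=30)
```

Here `partial_run` contains only 30, 31 and 47, while `full_run` contains
every id.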

As an example we verify just the POFiles with id 30 and up (something the
cron script does not allow us to do, but the underlying machinery supports).

    >>> verifier = VerifyPOFileStatsProcess(transaction, logger, 30)
    >>> verifier.run()
    INFO Starting verification of POFile stats at id 30
    INFO Done.

Again we find no errors.  The next section shows what happens when we do.


== Reports and correction ==

If any POFile's statistics are found to be wrong, the script reports this,
giving both the wrong and the corrected statistics.

    >>> from lp.translations.model.pofile import POFile
    >>> pofile = POFile.get(34)
    >>> pofile.getStatistics()
    (0, 0, 3, 0)

We have a POFile with zero current, updated, and unreviewed translations, and
3 translations changed in Ubuntu (compared to upstream).

A software bug incorrectly sets the number of changed translations to 999.

    >>> pofile.rosettacount = 999

We run the verifier on the incorrect POFile (and all POFiles with
higher ids).  It detects and reports the problem, finding a count of 999
changed translations where it expected to find 3.

Incorrect statistics are reported but do not affect the successful
completion of the verifier.

    >>> verifier = VerifyPOFileStatsProcess(transaction, logger, 34)
    >>> verifier.run()
    INFO Starting verification of POFile stats at id 34
    INFO POFile 34:
    cached stats were (0, 0, 999, 0), recomputed as (0, 0, 3, 0)
    INFO Done.

The verifier also corrects the corrupted statistics it finds, so the numbers
are once again what they were.

    >>> pofile.getStatistics()
    (0, 0, 3, 0)

The Translations administrators also receive an email about the error.

    >>> from_addr, to_addrs, body = test_emails.pop()
    >>> len(test_emails)
    0
    >>> to_addrs
    ['rosetta@launchpad.net']
    >>> in_header = True
    >>> for line in body.splitlines():
    ...     if in_header:
    ...         in_header = (line != '')
    ...     else:
    ...         print line
    The POFile statistics verifier encountered errors while checking cached
    statistics in the database:
    <BLANKLINE>
    Exceptions: 0
    POFiles with incorrect statistics: 1
    Total POFiles checked: ...
    <BLANKLINE>
    See the log file for detailed information.
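
The three counters in that message could be assembled along the following
lines.  This is a sketch only: `summarize` is a hypothetical helper, and
returning `None` for a clean run (so that no email is sent) is an assumption
based on the behaviour observed in this test, not on the script's code.

```python
def summarize(total_checked, incorrect, exceptions):
    """Build the counter lines of an error summary.

    Returns None when the run found neither exceptions nor incorrect
    statistics (assumed here to mean that no notification is needed).
    """
    if incorrect == 0 and exceptions == 0:
        return None
    lines = [
        "Exceptions: %d" % exceptions,
        "POFiles with incorrect statistics: %d" % incorrect,
        "Total POFiles checked: %d" % total_checked,
    ]
    return "\n".join(lines)


clean = summarize(total_checked=10, incorrect=0, exceptions=0)
faulty = summarize(total_checked=10, incorrect=1, exceptions=0)
```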

== Verifying recently touched POFiles ==

A separate script is used to verify statistics on POFiles that have
been modified in the last week (or whatever number of days is configured
in the rosetta_pofile_stats.days_considered_recent parameter).

    >>> from lp.translations.scripts.verify_pofile_stats import (
    ...     VerifyRecentPOFileStatsProcess)
    >>> from datetime import datetime, timedelta
    >>> import pytz
    >>> from zope.security.proxy import removeSecurityProxy

In the default configuration, we are looking for files modified in the last
7 days.

    >>> from canonical.config import config
    >>> pofile_age = int(
    ...     config.rosetta_pofile_stats.days_considered_recent)
    >>> pofile_age
    7

We add two POFiles with incorrect statistics: one last modified 8 days
ago, the other modified recently.

    >>> now = datetime.now(pytz.UTC)
    >>> more_than_a_week_ago = now - timedelta(pofile_age + 1)
    >>> pofile_recent = removeSecurityProxy(factory.makePOFile('sr'))
    >>> pofile_recent.rosettacount = 9
    >>> pofile_old = removeSecurityProxy(factory.makePOFile('sr'))
    >>> pofile_old.date_changed = more_than_a_week_ago
    >>> pofile_old.rosettacount = 9

A run of the `VerifyRecentPOFileStatsProcess` script fixes only the POFile
which was modified in the last week.

    >>> verifier = VerifyRecentPOFileStatsProcess(transaction, logger)
    >>> verifier.run()
    INFO Verifying stats of POFiles updated in the last 7 days.
    INFO Verifying a total of 1 POFiles.
    INFO POFile ...:
    cached stats were (0, 0, 9, 0), recomputed as (0, 0, 0, 0)
    INFO Done.

We can see that stats have been updated in the recently touched POFile,
but not in the older one.

    >>> pofile_recent.rosettacount
    0
    >>> pofile_old.rosettacount
    9
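
The cutoff behind this result can be sketched with plain `datetime`
arithmetic.  The helper `recently_changed` and its dictionary argument are
hypothetical, timestamps are naive for brevity, and treating the boundary as
inclusive is an assumption.

```python
from datetime import datetime, timedelta


def recently_changed(date_changed_by_id, now, days_considered_recent=7):
    """Return the ids whose last change falls within the recent window."""
    cutoff = now - timedelta(days=days_considered_recent)
    return sorted(
        record_id
        for record_id, changed in date_changed_by_id.items()
        if changed >= cutoff)


# A fixed "now" keeps the example deterministic.
now = datetime(2010, 1, 15, 12, 0)
changes = {
    1: now - timedelta(days=8),   # outside the 7-day window
    2: now - timedelta(hours=1),  # inside the window
}
recent = recently_changed(changes, now)
```

Only id 2 is selected with the default window; widening the window to 9 days
would select both.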


An actual script run also works, though it finds no errors since they were
all fixed already.

    >>> transaction.commit() # Ensure other process can see latest changes

    >>> from canonical.launchpad.ftests.script import run_script
    >>> (returncode, out, err) = run_script(
    ...     'cronscripts/rosetta-pofile-stats-daily.py')
    >>> print returncode
    0
    >>> print err
    INFO    Creating lockfile: /var/lock/launchpad-pofile-stats-daily.lock
    INFO    Verifying stats of POFiles updated in the last ... days.
    INFO    Verifying a total of 1 POFiles.
    INFO    Done.

== Cron job ==

The rosetta-pofile-stats cron script invokes the verifier code.  It
completes without finding any errors: the one we introduced earlier was
fixed by running the verifier directly.

    >>> (returncode, out, err) = run_script(
    ...     'cronscripts/rosetta-pofile-stats.py', ['--start-id=99'])
    >>> print returncode
    0
    >>> print err
    INFO    Creating lockfile: /var/lock/launchpad-pofile-stats.lock
    INFO    Starting verification of POFile stats at id 99
    INFO    Done.