~launchpad-pqm/launchpad/devel

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
= Filebug Data Parser =

An application like Apport can upload data to Launchpad, and have the
information added to the bug report that the user will file. The
information is uploaded as a MIME multipart message, where the different
headers tells Launchpad what kind of information it is.


== FileBugDataParser Internals ==

FileBugDataParser is used to parse the MIME message with the information
to be added to the bug report. The information is passed as a file
object to the constructor.

    >>> from StringIO import StringIO
    >>> from lp.bugs.utilities.filebugdataparser import FileBugDataParser
    >>> parser = FileBugDataParser(StringIO('123456789'))

To make parsing easier and more efficient, it has a buffer where it
stores the next few bytes of the file. To begin with, it's empty.

    >>> parser._buffer
    ''

Whenever it needs to read some bytes of the file, it will read a fixed
number of bytes into the buffer. The number of bytes is specified by the
BUFFER_SIZE variable.

    >>> parser.BUFFER_SIZE = 3

There is helper method, _consumeBytes(), which will read from the file
until a certain delimiter string is encountered.

    >>> parser._consumeBytes('4')
    '1234'

In order to find the delimiter string, it had to read '123456' into
the buffer. Up to the delimiter string is read, but the rest of the
string is kept in the buffer.

    >>> parser._buffer
    '56'

The delimiter string isn't limited to one character.

    >>> parser._consumeBytes('67')
    '567'

    >>> parser._buffer
    '89'

If the delimiter isn't found in the file, the rest of the file is
returned.

    >>> parser._consumeBytes('0')
    '89'
    >>> parser._buffer
    ''

Subsequent reads will result in the empty string.

    >>> parser._consumeBytes('0')
    ''
    >>> parser._consumeBytes('0')
    ''


=== readLine() ===

readLine() is a helper method to read a single line of the file.

    >>> parser = FileBugDataParser(StringIO('123\n456\n789'))
    >>> parser.readLine()
    '123\n'
    >>> parser.readLine()
    '456\n'
    >>> parser.readLine()
    '789'

If we try to read past the end of the file an AssertionError will be
raised. This is to ensure that invalid messages won't cause an infinite
loop, or something like that.

    >>> parser.readLine()
    Traceback (most recent call last):
    ...
    AssertionError: End of file reached.


=== readHeaders() ====

readHeaders() reads the headers of a MIME message. It reads all the
headers, untils it sees a blank line.

    >>> msg = """Header: value
    ... Space-Folded-Header: this header
    ...  is folded with a space.
    ... Tab-Folded-Header: this header
    ... \tis folded with a tab.
    ... Another-header: another-value
    ...
    ... Not-A-Header: not-a-value
    ... """
    >>> parser = FileBugDataParser(StringIO(msg))
    >>> headers = parser.readHeaders()
    >>> headers['Header']
    'value'
    >>> headers['Space-Folded-Header']
    'this header\n is folded with a space.'
    >>> headers['Tab-Folded-Header']
    'this header\n\tis folded with a tab.'
    >>> headers['Another-Header']
    'another-value'
    >>> 'Not-A-Header' in headers
    False


== Parsing the data ==

The parse() method returns a FileBugData object, with the information
from the message as attributes.


=== Headers ===

The headers are processed by the _setDataFromHeaders() method. It
accepts a FileBugData object and a dictionary of the headers.


==== Subject ====

The Subject header is available in the initial_summary attribute.

    >>> from lp.bugs.browser.bugtarget import FileBugData
    >>> data = FileBugData()
    >>> parser = FileBugDataParser(None)
    >>> parser._setDataFromHeaders(data, {'Subject': 'Bug Subject'})
    >>> data.initial_summary
    u'Bug Subject'


==== Tags ====

The Tags headers is translated into a list of strings as the
initial_tags attributes. The tags are translated to lower case
automatically.

    >>> data = FileBugData()
    >>> parser._setDataFromHeaders(data, {'Tags': 'Tag-One Tag-Two'})
    >>> sorted(data.initial_tags)
    [u'tag-one', u'tag-two']


==== Private ====

The Private header gets translated into a boolean, as the private
attribute. The values "yes" and "no" are accepted, which get translated
into True and False.

    >>> data = FileBugData()
    >>> parser._setDataFromHeaders(data, {'Private': 'yes'})
    >>> data.private
    True
    >>> data = FileBugData()
    >>> parser._setDataFromHeaders(data, {'Private': 'no'})
    >>> data.private
    False

We're in no position of presenting a good error message to the user at
this point, so invalid values get ignored.

    >>> data = FileBugData()
    >>> parser._setDataFromHeaders(data, {'Private': 'not-valid'})
    >>> print data.private
    None


==== Subscribers ====

The Subscribers header is turned into a list of strings, available
through the subscribers attribute. The strings get lowercased
automatically.

    >>> data = FileBugData()
    >>> parser._setDataFromHeaders(data, {'Subscribers': 'sub-one SUB-TWO'})
    >>> sorted(data.subscribers)
    [u'sub-one', u'sub-two']


==== HWDB submission keys ====

The HWDB-Submission key is turned into a list of of strings, available
through the hwdb_submission_keys attribute.

    >>> data = FileBugData()
    >>> parser._setDataFromHeaders(
    ...     data, {'HWDB-Submission': 'submission-one'})
    >>> list(data.hwdb_submission_keys)
    [u'submission-one']

Two or more submission keys may be specified, separated by a comma,
optionally followed by space characters.

    >>> data = FileBugData()
    >>> parser._setDataFromHeaders(
    ...     data,
    ...     {'HWDB-Submission': ' submission-one, two\t,three ,  \nfour  '})
    >>> data.hwdb_submission_keys
    [u'four', u'submission-one', u'three', u'two']


=== Message Parts ===

Different parts of the message gets treated differently. In short, we
look at the Content-Disposition header. If it's inline, it's a comment,
if it's attachment, it's an attachment.


==== Inline parts ====

The first inline part is special. Instead of being treated as a comment,
it gets appended to the bug description. It's available through the
extra_description attribute.

    >>> used_parsers = []
    >>> def parse_message(message):
    ...     parser = FileBugDataParser(StringIO(message))
    ...     used_parsers.append(parser)
    ...     return parser.parse()

    >>> debug_data = """MIME-Version: 1.0
    ... Content-type: multipart/mixed; boundary=boundary
    ...
    ... --boundary
    ... Content-disposition: inline
    ... Content-type: text/plain; charset=utf-8
    ...
    ... This should be added to the description.
    ...
    ... Another line.
    ...
    ... --boundary--
    ... """
    >>> data = parse_message(debug_data)
    >>> data.extra_description
    u'This should be added to the description.\n\nAnother line.'


The text can also be base64 decoded.

    >>> encoded_text = """VGhpcyBzaG91bGQgYmUgYWRkZWQgdG8g
    ... dGhlIGRlc2NyaXB0aW9uLgoKQW5vdGhl
    ... ciBsaW5lLg=="""
    >>> encoded_text.decode('base64')
    'This should be added to the description.\n\nAnother line.'

    >>> debug_data = """MIME-Version: 1.0
    ... Content-type: multipart/mixed; boundary=boundary
    ...
    ... --boundary
    ... Content-disposition: inline
    ... Content-type: text/plain; charset=utf-8
    ... Content-transfer-encoding: base64
    ...
    ... %s
    ...
    ... --boundary--
    ... """ % encoded_text
    >>> data = parse_message(debug_data)
    >>> data.extra_description
    u'This should be added to the description.\n\nAnother line.'


==== Other inline parts ====

If there are more than one inline part, those will be added as comments
to the bug. The comments are simple ext strings, accessible through the
comments attribute.

    >>> debug_data = """MIME-Version: 1.0
    ... Content-type: multipart/mixed; boundary=boundary
    ...
    ... --boundary
    ... Content-disposition: inline
    ... Content-type: text/plain; charset=utf-8
    ...
    ... This should be added to the description.
    ...
    ... --boundary
    ... Content-disposition: inline
    ... Content-type: text/plain; charset=utf-8
    ...
    ... This should be added as a comment.
    ...
    ... --boundary
    ... Content-disposition: inline
    ... Content-type: text/plain; charset=utf-8
    ...
    ... This should be added as another comment.
    ...
    ... Line 2.
    ...
    ... --boundary--
    ... """
    >>> data = parse_message(debug_data)
    >>> len(data.comments)
    2
    >>> data.comments[0]
    u'This should be added as a comment.'
    >>> data.comments[1]
    u'This should be added as another comment.\n\nLine 2.'


=== Attachment parts ===

All the parts that have a 'Content-disposition: attachment' header
will get added as attachments to the bug. The attachment description can
be specified using a Content-description header, but it's not required.

    >>> debug_data = """MIME-Version: 1.0
    ... Content-type: multipart/mixed; boundary=boundary
    ...
    ... --boundary
    ... Content-disposition: attachment; filename='attachment1'
    ... Content-type: text/plain; charset=utf-8
    ...
    ... This is an attachment.
    ...
    ... Another line.
    ...
    ... --boundary
    ... Content-disposition: attachment; filename='attachment2'
    ... Content-description: Attachment description.
    ... Content-type: text/plain; charset=ISO-8859-1
    ...
    ... This is another attachment, with a description.
    ... --boundary--
    ... """
    >>> data = parse_message(debug_data)
    >>> len(data.attachments)
    2
    >>> first_attachment, second_attachment = data.attachments

The filename is copied into the 'filename' item.

    >>> first_attachment['filename']
    u'attachment1'
    >>> second_attachment['filename']
    u'attachment2'

The Content-Type header is copied as is.

    >>> first_attachment['content_type']
    u'text/plain; charset=utf-8'
    >>> second_attachment['content_type']
    u'text/plain; charset=ISO-8859-1'

If there is a Content-Description header, it's accessible as
'description'.

    >>> second_attachment['description']
    u'Attachment description.'

If there isn't any Content-Description header, the file name is used
instead.

    >>> first_attachment['description']
    u'attachment1'

The contents of the attachments are stored in a file.

    >>> first_file = first_attachment['content']
    >>> first_file.read()
    'This is an attachment.\n\nAnother line.\n\n'
    >>> first_file.close()

    >>> second_file = second_attachment['content']
    >>> second_file.read()
    'This is another attachment, with a description.\n'
    >>> second_file.close()

Binary files are base64 encoded. They are decoded automatically.

    >>> binary_data = '\n'.join(['\x00'*5, '\x01'*5])
    >>> debug_data = """MIME-Version: 1.0
    ... Content-type: multipart/mixed; boundary=boundary
    ...
    ... --boundary
    ... Content-disposition: attachment; filename='attachment1'
    ... Content-type: application/octet-stream
    ... Content-transfer-encoding: base64
    ...
    ... %s
    ... --boundary--
    ... """ % binary_data.encode('base64')
    >>> data = parse_message(debug_data)
    >>> len(data.attachments)
    1

    >>> contents = data.attachments[0]['content']
    >>> contents.read()
    '\x00\x00\x00\x00\x00\n\x01\x01\x01\x01\x01'
    >>> contents.close()


== Invalid messages ==

If someone gives an invalid message, for example one that doesn't have
an end boundary, and AssertionError will be raised. This is mainly to
assert that nothing bad will happen if we can't parse the message. We
don't care about giving the user a good error message, since the format
is well-known.

    >>> debug_data = """MIME-Version: 1.0
    ... Content-type: multipart/mixed; boundary=boundary
    ...
    ... --boundary
    ... Content-disposition: attachment; filename='attachment1'
    ... Content-type: text/plain; charset=utf-8
    ...
    ... This is an attachment.
    ...
    ... Another line."""
    >>> data = parse_message(debug_data)
    Traceback (most recent call last):
    ...
    AssertionError: End of file reached.