~launchpad-pqm/launchpad/devel

11461.2.1 by Henning Eggers
Added format-imports script and documented it.
#!/usr/bin/python
#
# Copyright 2010 Canonical Ltd.  This software is licensed under the
# GNU Affero General Public License version 3 (see the file LICENSE).

"""Format import sections in Python files.

= Usage =

format-imports <file or directory> ...

= Operation =

The script processes each path given on the command line. If the path is a
directory, it recurses into it and processes all *.py files found in the tree.
It outputs the paths of all the files that have been changed.

For Launchpad it was applied to the "lib/canonical/launchpad" and the "lib/lp"
subtrees. Running it with those parameters on a freshly branched LP tree
should not produce any output, meaning that all the files in the tree should
be formatted correctly.

The script identifies the import section of each file as a block of lines
that start with "import" or "from", are indented with at least one space, or
are blank. Comment lines are also included if they are followed by an
import statement. An initial __future__ import and a module docstring are
explicitly skipped.
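
A simplified, line-by-line sketch of this rule follows. The helper name
is_import_section_line is hypothetical: the script itself matches the whole
block with one regular expression (imports_section_regex), accepts runs of
several comment lines, and excludes a leading __future__ import, none of
which this sketch attempts.

```python
def is_import_section_line(line, next_line=''):
    # Hypothetical helper: decide whether `line` belongs to the import
    # section. `next_line` is only consulted for comment lines.
    if not line.strip():
        return True  # blank lines
    if line.startswith('import ') or line.startswith('from '):
        return True  # import statements
    if line.startswith(' '):
        return True  # indented lines, e.g. inside a multi-line import
    if line.startswith('#'):
        # A comment counts only if an import statement follows it.
        return next_line.startswith(('import ', 'from '))
    return False


lines = ['import os', '', '# FIRST', 'from zope import foo', 'x = 1']
flags = [is_import_section_line(line, following)
         for line, following in zip(lines, lines[1:] + [''])]
print(flags)  # [True, True, True, True, False]
```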

The import section is rewritten as four subsections, each separated by a
blank line. Any of the subsections may be empty.
 1. Standard Python library modules
 2. Import statements explicitly ordered to the top (see below)
 3. Third-party modules, meaning anything not fitting one of the other
    subsection criteria
 4. Local modules, i.e. those whose top-level package is listed in
    LOCAL_PACKAGES, such as "canonical" or "lp"

Imports of the "_pythonpath" module are placed before all four subsections.
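
The grouping can be sketched as a small classifier that returns the
subsection number from the list above. Everything here is illustrative:
STANDARD_LIBS stands in for the much longer list the script imports from
python_standard_libs, LOCAL is an abbreviated LOCAL_PACKAGES, and classify is
a hypothetical name. The checks run in the same order as in format_imports(),
which additionally special-cases _pythonpath (not shown).

```python
# Stand-ins for the script's data; both are deliberately abbreviated.
STANDARD_LIBS = frozenset(['os', 're', 'sys', 'textwrap'])
LOCAL = ('canonical', 'lp')


def classify(module, comment=None):
    # Hypothetical helper returning the subsection number (1-4).
    # Like the script, use the top-level package name, stopping at the
    # first dot or space (the space handles 'import x as y').
    base = module.replace(' ', '.').split('.')[0]
    if comment is not None and comment.startswith('# FIRST'):
        return 2
    if base in LOCAL:
        return 4
    if base in STANDARD_LIBS:
        return 1
    return 3


print([classify(name) for name in ('sys', 'zope.interface', 'lp.app')])
# [1, 3, 4]
print(classify('transaction', '# FIRST'))  # 2
```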

Each section is sorted alphabetically and case-insensitively by module name.
Each module is put on its own line, e.g.
{{{
  import os, sys
}}}
becomes
{{{
  import os
  import sys
}}}
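
A sketch of this splitting step for a single plain import line. It uses
str.split and strip where the script uses its split_regex pattern, and the
helper name split_plain_import is hypothetical.

```python
def split_plain_import(line):
    # Hypothetical helper: turn 'import a, b' into one statement per
    # module, sorted case-insensitively like the sections are.
    names = [name.strip() for name in line[len('import '):].split(',')]
    return ['import %s' % name for name in sorted(names, key=str.lower)]


print(split_plain_import('import sys, os'))  # ['import os', 'import sys']
```
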
Multiple import statements for the same module are merged into one
statement, or into two if the module itself was imported as well as an
object inside it, e.g.
{{{
  import sys
  from sys import stdin
}}}
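
The merging can be modelled with a dictionary keyed by module name, much like
the ImportStatement objects the script builds. This is a simplified sketch;
merge_imports and its (module, objects) input format are hypothetical, with
objects set to None for a plain import of the module.

```python
def merge_imports(statements):
    # `statements` is a list of (module, objects) pairs; objects is None
    # for a plain "import module" line.
    merged = {}
    for module, objects in statements:
        entry = merged.setdefault(
            module, {'import_module': False, 'objects': None})
        if objects is None:
            entry['import_module'] = True
        else:
            # A set drops duplicates; the result is sorted case-insensitively.
            combined = set(entry['objects'] or []) | set(objects)
            entry['objects'] = sorted(combined, key=str.lower)
    return merged


merged = merge_imports([
    ('sys', None),
    ('sys', ['stdout']),
    ('sys', ['stdin', 'stdout']),
])
print(merged['sys'])
```

The single entry for sys has import_module set and the objects stdin and
stdout, which corresponds to the two statements shown above.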

Statements that import more than one object are put on multiple lines in
list style, e.g.
{{{
  from sys import (
      stdin,
      stdout,
      )
}}}
Objects are sorted alphabetically and case-insensitively. One-object imports
are only formatted in this manner if the statement exceeds 78 characters in
length.
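
A sketch of this layout rule, modelled on the script's format_import_lines().
To stay short it returns a list of lines instead of one string; the name
format_from_import and the limit parameter are hypothetical. The default of
79 mirrors the script's len(statement) < 79 check, so single-object
statements of 79 characters or more are wrapped.

```python
def format_from_import(module, objects, limit=79):
    # Sort objects alphabetically and case-insensitively.
    objects = sorted(objects, key=str.lower)
    if len(objects) == 1:
        statement = 'from %s import %s' % (module, objects[0])
        if len(statement) < limit:
            return [statement]  # short single-object import: one line
    # Otherwise use list style: one object per line plus a closing paren.
    lines = ['from %s import (' % module]
    lines.extend('    %s,' % name for name in objects)
    lines.append('    )')
    return lines


print(format_from_import('sys', ['stdout', 'stdin']))
# ['from sys import (', '    stdin,', '    stdout,', '    )']
print(format_from_import('sys', ['exit']))  # ['from sys import exit']
```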

Comments stick with the import statement that follows them. Comments at the
end of one-line statements are moved in front of the statement, e.g.
{{{
  from sys import exit # Have a way out
}}}
becomes
{{{
  # Have a way out
  from sys import exit
}}}
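
A minimal sketch of moving an end-of-line comment in front of its statement.
The helper move_trailing_comment is hypothetical; the script instead
captures the comment with the comment group of from_import_single_regex and
emits it on its own line before the statement. Splitting at the first # is
only safe because import statements contain no string literals.

```python
def move_trailing_comment(line):
    # Split 'statement  # comment' at the first '#'.
    statement, hash_sign, comment = line.partition('#')
    if not hash_sign:
        return [line]  # nothing to move
    return ['#' + comment, statement.rstrip()]


print(move_trailing_comment('from sys import exit # Have a way out'))
# ['# Have a way out', 'from sys import exit']
```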

= Format control =

Two special comments control the operation of the formatter.

When an import statement is immediately preceded by a comment that starts
with the word "FIRST", it is placed into the second subsection (see above).

When the first import statement is directly preceded by a comment that starts
with the word "SKIP", the entire file is exempt from formatting.
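
Both checks are plain prefix tests in the script: format_imports() looks for
a comment starting with "# FIRST", and find_imports_section() skips the file
when the import section starts with "# SKIP". The sketch below, using the
hypothetical helper control_keyword, shows that the exact spelling matters,
so a comment such as #FIRST without the space is not recognized.

```python
def control_keyword(comment):
    # Hypothetical helper: report which control comment, if any, `comment`
    # is, using the same literal prefixes as the script.
    for keyword in ('FIRST', 'SKIP'):
        if comment.startswith('# ' + keyword):
            return keyword
    return None


print(control_keyword('# FIRST: must be imported before zope'))  # FIRST
print(control_keyword('#FIRST'))  # None
```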

= Known bugs =

Make sure to always check the result of the re-formatting to see if you have
been bitten by one of these.

Comments inside multi-line import statements break the formatter. A statement
like this will be ignored:
{{{
  from lp.app.interfaces import (
      # Don't do this.
      IMyInterface,
      IMyOtherInterface, # Don't do this either
      )
}}}
A comment that is not indented is worse: it ends the import section, so this
statement and all statements following it are ignored:
{{{
  from lp.app.interfaces import (
  # Breaks indentation rules anyway.
      IMyInterface,
      IMyOtherInterface,
      )
}}}

If a single-line statement has both a comment in front of it and at the end
of the line, only the end-line comment will survive. This could probably
easily be fixed by concatenating the two.
{{{
  # I am a goner.
  from lp.app.interfaces import IMyInterface # I will survive!
}}}

Line continuation characters are recognized and resolved but
not re-introduced. This may leave the re-formatted text with a line that
is over the length limit.
{{{
    from lp.app.verylongnames.orverlydeep.modulestructure.leavenoroom \\
        import object
}}}
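
Because of these bugs, a quick check of the reformatted output is
worthwhile. A minimal sketch that lists lines longer than 78 characters, such
as a statement whose line continuation was resolved but not re-introduced;
find_long_lines is a hypothetical helper, not part of this script.

```python
def find_long_lines(text, limit=78):
    # Return (line number, length) for each line longer than `limit`.
    return [(number, len(line))
            for number, line in enumerate(text.splitlines(), 1)
            if len(line) > limit]


sample = '''import os
from lp.app.verylongnames.orverlydeep.modulestructure.leavenoroom import object
'''
print(find_long_lines(sample))  # [(2, 79)]
```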
"""

__metaclass__ = type

# SKIP this file when reformatting.
import os
import re
import sys
from textwrap import dedent

sys.path[0:0] = [os.path.dirname(__file__)]
from python_standard_libs import python_standard_libs

# python_standard_libs is only used for membership tests.
python_standard_libs = frozenset(python_standard_libs)

# To search for escaped newline chars.
escaped_nl_regex = re.compile("\\\\\n", re.M)
import_regex = re.compile("^import +(?P<module>.+)$", re.M)
from_import_single_regex = re.compile(
    "^from (?P<module>.+) +import +"
    "(?P<objects>[*]|[a-zA-Z0-9_, ]+)"
    "(?P<comment>#.*)?$", re.M)
from_import_multi_regex = re.compile(
    "^from +(?P<module>.+) +import *[(](?P<objects>[a-zA-Z0-9_, \n]+)[)]$",
    re.M)
comment_regex = re.compile(
    "(?P<comment>(^#.+\n)+)(^import|^from) +(?P<module>[a-zA-Z0-9_.]+)", re.M)
split_regex = re.compile(",\s*")
module_base_regex = re.compile("([^. ]+)")

# Module docstrings are multiline (""") strings that are not indented, contain
# no double quotes, and are followed at some point by an import statement.
module_docstring_regex = re.compile(
    '(?P<docstring>^["]{3}[^"]+["]{3}\n).*^(import |from .+ import)',
    re.M | re.S)
# The imports section starts with an import statement that is not a
# __future__ import and consists of import lines, indented lines, empty lines
# and comments which are followed by an import line. Sometimes we even find
# lines that contain a single ")"... :-(
imports_section_regex = re.compile(
    "(^#.+\n)*^(import|(from ((?!__future__)\S+) import)).*\n"
    "(^import .+\n|^from .+\n|^[\t ]+.+\n|(^#.+\n)+((^import|^from) "
    ".+\n)|^\n|^[)]\n)*",
    re.M)


def format_import_lines(module, objects):
    """Generate correct from...import strings."""
    if len(objects) == 1:
        statement = "from %s import %s" % (module, objects[0])
        if len(statement) < 79:
            return statement
    return "from %s import (\n    %s,\n    )" % (
        module, ",\n    ".join(objects))


def find_imports_section(content):
    """Return the (start, end) offsets of the import section in content.

    Return (None, None) if there is none or the file is marked "# SKIP".
    """
    # Skip module docstring.
    match = module_docstring_regex.search(content)
    if match is None:
        startpos = 0
    else:
        startpos = match.end('docstring')

    match = imports_section_regex.search(content, startpos)
    if match is None:
        return (None, None)
    startpos = match.start()
    endpos = match.end()
    if content[startpos:endpos].startswith('# SKIP'):
        # Skip files explicitly.
        return (None, None)
    return (startpos, endpos)


class ImportStatement:
    """Holds information about an import statement."""

    def __init__(self, objects=None, comment=None):
        self.import_module = objects is None
        if objects is None:
            self.objects = None
        else:
            self.objects = sorted(objects, key=str.lower)
        self.comment = comment

    def addObjects(self, new_objects):
        """Add more objects to this statement, eliminating duplicates."""
        if self.objects is None:
            # No objects so far; sort them as __init__ does.
            self.objects = sorted(set(new_objects), key=str.lower)
        else:
            # Use a set to eliminate duplicate objects.
            more_objects = set(self.objects + new_objects)
            self.objects = sorted(more_objects, key=str.lower)

    def setComment(self, comment):
        """Set the comment attached to the statement."""
        self.comment = comment


def parse_import_statements(import_section):
    """Split the import section into statements.

    Returns a dictionary that maps each module name to an ImportStatement
    holding the objects imported from it (as a sorted list of strings) and
    any comment attached to it."""
    imports = {}
    # Search for escaped newlines and remove them.
    searchpos = 0
    while True:
        match = escaped_nl_regex.search(import_section, searchpos)
        if match is None:
            break
        start = match.start()
        end = match.end()
        import_section = import_section[:start] + import_section[end:]
        searchpos = start
    # Search for simple one-line import statements.
    searchpos = 0
    while True:
        match = import_regex.search(import_section, searchpos)
        if match is None:
            break
        # Plain imports get an ImportStatement without objects.
        # Multiple modules in one statement are split up.
        for module in split_regex.split(match.group('module').strip()):
            imports[module] = ImportStatement()
        searchpos = match.end()
    # Search for "from ... import" statements.
    for pattern in (from_import_single_regex, from_import_multi_regex):
        searchpos = 0
        while True:
            match = pattern.search(import_section, searchpos)
            if match is None:
                break
            import_objects = split_regex.split(
                match.group('objects').strip(" \n,"))
            module = match.group('module').strip()
            # Only one pattern has a 'comment' group.
            comment = match.groupdict().get('comment', None)
            if module in imports:
                # Catch double import lines.
                imports[module].addObjects(import_objects)
            else:
                imports[module] = ImportStatement(import_objects)
            if comment is not None:
                imports[module].setComment(comment)
            searchpos = match.end()
    # Search for comments in import section.
    searchpos = 0
    while True:
        match = comment_regex.search(import_section, searchpos)
        if match is None:
            break
        module = match.group('module').strip()
        comment = match.group('comment').strip()
        imports[module].setComment(comment)
        searchpos = match.end()

    return imports


LOCAL_PACKAGES = (
    'canonical', 'lp', 'launchpad_loggerhead', 'devscripts',
    # database/* have some implicit relative imports.
    'fti', 'replication', 'preflight', 'security', 'upgrade',
    )


def format_imports(imports):
    """Group and order imports, return the new import statements."""
    early_section = {}
    standard_section = {}
    first_section = {}
    thirdparty_section = {}
    local_section = {}
    # Group modules into sections.
    for module, statement in imports.iteritems():
        module_base = module_base_regex.findall(module)[0]
        comment = statement.comment
        if module_base == '_pythonpath':
            early_section[module] = statement
        elif comment is not None and comment.startswith("# FIRST"):
            first_section[module] = statement
        elif module_base in LOCAL_PACKAGES:
            local_section[module] = statement
        elif module_base in python_standard_libs:
            standard_section[module] = statement
        else:
            thirdparty_section[module] = statement

    all_import_lines = []
    # Sort within each section and generate statement strings.
    sections = (
        early_section,
        standard_section,
        first_section,
        thirdparty_section,
        local_section,
        )
    for section in sections:
        import_lines = []
        for module in sorted(section.keys(), key=str.lower):
            if section[module].comment is not None:
                import_lines.append(section[module].comment)
            if section[module].import_module:
                import_lines.append("import %s" % module)
            if section[module].objects is not None:
                import_lines.append(
                    format_import_lines(module, section[module].objects))
        if len(import_lines) > 0:
            all_import_lines.append('\n'.join(import_lines))
    # Sections are separated by a single blank line.
    return '\n\n'.join(all_import_lines)


def reformat_importsection(filename):
    """Replace the given file with a reformatted version of it.

    Return True if the file was changed, False otherwise."""
    with open(filename) as pyfile_handle:
        pyfile = pyfile_handle.read()
    import_start, import_end = find_imports_section(pyfile)
    if import_start is None:
        # Skip files with no import section.
        return False
    imports_section = pyfile[import_start:import_end]
    imports = parse_import_statements(imports_section)

    next_char = pyfile[import_end:import_end + 1]

    if next_char == '':
        # The file ends with the import section; just end the last line.
        number_of_newlines = 1
    elif next_char != '#':
        # Two blank lines before anything but comments.
        number_of_newlines = 3
    else:
        # One blank line before a comment.
        number_of_newlines = 2

    new_imports = format_imports(imports) + ("\n" * number_of_newlines)
    if new_imports == imports_section:
        # No change, no need to write a new file.
        return False

    with open(filename, "w") as new_file:
        new_file.write(pyfile[:import_start])
        new_file.write(new_imports)
        new_file.write(pyfile[import_end:])

    return True


def process_file(fpath):
    """Process the file with the given path."""
    changed = reformat_importsection(fpath)
    if changed:
        print fpath


def process_tree(dpath):
    """Walk a directory tree and process all *.py files."""
    for dirpath, dirnames, filenames in os.walk(dpath):
        for filename in filenames:
            if filename.endswith('.py'):
                process_file(os.path.join(dirpath, filename))


if __name__ == "__main__":
    if len(sys.argv) == 1 or sys.argv[1] in ("-h", "-?", "--help"):
        sys.stderr.write(dedent("""\
        usage: format-imports <file or directory> ...

        Type "format-imports --docstring | less" to see the documentation.
        """))
        sys.exit(1)
    if sys.argv[1] == "--docstring":
        sys.stdout.write(__doc__)
        sys.exit(2)
    for filename in sys.argv[1:]:
        if os.path.isdir(filename):
            process_tree(filename)
        else:
            process_file(filename)
    sys.exit(0)