11461.2.1 by Henning Eggers
Added format-imports script and documented it.
#!/usr/bin/python
#
# Copyright 2010 Canonical Ltd.  This software is licensed under the
# GNU Affero General Public License version 3 (see the file LICENSE).

""" Format import sections in python files

= Usage =

format-imports <file or directory> ...

= Operation =

The script will process each filename on the command line. If the file is a
directory it recurses into it and processes all *.py files found in the tree.
It will output the paths of all the files that have been changed.

For Launchpad it was applied to the "lib/canonical/launchpad" and the "lib/lp"
subtrees. Running it with those parameters on a freshly branched LP tree
should not produce any output, meaning that all the files in the tree should
be formatted correctly.

The script identifies the import section of each file as a block of lines
that start with "import" or "from" or are indented with at least one space or
are blank lines. Comment lines are also included if they are followed by an
import statement. An initial __future__ import and a module docstring are
explicitly skipped.

The import section is rewritten as four subsections, each separated by a
blank line. Any of the subsections may be empty.
 1. Standard python library modules
 2. Import statements explicitly ordered to the top (see below)
 3. Third-party modules, meaning anything not fitting one of the other
    subsection criteria
 4. Local modules that begin with "canonical" or "lp".
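
The grouping rules listed above can be sketched roughly as follows. This is
an illustration only, not the script's own code: classify_module and the tiny
STANDARD_LIBS set are hypothetical stand-ins (the script itself consults the
full list in python_standard_libs).

```python
# Illustrative sketch; classify_module and STANDARD_LIBS are hypothetical.
STANDARD_LIBS = frozenset(['os', 're', 'sys', 'textwrap'])


def classify_module(module, comment=None):
    # A preceding FIRST comment wins over every other criterion.
    if comment is not None and comment.startswith('# FIRST'):
        return 'first'
    base = module.split('.')[0]
    if base in ('canonical', 'lp'):
        return 'local'
    if base in STANDARD_LIBS:
        return 'standard'
    # Anything that fits no other criterion counts as third-party.
    return 'thirdparty'
```

Only the first dotted component of the module name decides the subsection, so
"os.path" is grouped with the standard library modules.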

Each section is sorted alphabetically by module name. Each module is put
on its own line, i.e.
{{{
  import os, sys
}}}
becomes
{{{
  import os
  import sys
}}}
Multiple import statements for the same module are conflated into one
statement, or two if the module was imported alongside an object inside it,
i.e.
{{{
  import sys
  from sys import stdin
}}}
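
How the objects of several "from" statements for one module might be merged
can be sketched as below. This is a minimal sketch under assumptions:
merge_objects is a hypothetical helper name and is not part of this script.

```python
def merge_objects(existing, new_objects):
    # Hypothetical helper: drop duplicate names, then sort ignoring case.
    return sorted(set(existing) | set(new_objects), key=str.lower)
```

With this sketch, merging ['stdout', 'stdin'] with ['stdin', 'argv'] yields
one list in which every name appears exactly once.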

Statements that import more than one object are put on multiple lines in
list style, i.e.
{{{
  from sys import (
      stdin,
      stdout,
      )
}}}
Objects are sorted alphabetically and case-insensitively. One-object imports
are only formatted in this manner if the statement exceeds 78 characters in
length.

Comments stick with the import statement that follows them. Comments at the
end of one-line statements are moved in front of the statement, i.e.
{{{
  from sys import exit # Have a way out
}}}
becomes
{{{
  # Have a way out
  from sys import exit
}}}
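
Moving an end-of-line comment in front of its statement could look roughly
like this. It is a sketch only; move_trailing_comment is a hypothetical name,
and the script itself handles comments through regular expressions instead.

```python
def move_trailing_comment(line):
    # Hypothetical helper: split a one-line import from its trailing comment.
    statement, hash_mark, comment = line.partition('#')
    if not hash_mark:
        # No comment on this line, nothing to move.
        return [line]
    # The comment goes first, the bare statement follows it.
    return [hash_mark + comment.rstrip(), statement.rstrip()]
```

The helper returns a list of lines, so a statement without a comment comes
back unchanged as a one-element list.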

= Format control =

Two special comments allow you to control the operation of the formatter.

When an import statement is immediately preceded by a comment that starts
with the word "FIRST", it is placed into the second subsection (see above).

When the first import statement is directly preceded by a comment that starts
with the word "SKIP", the entire file is exempt from formatting.

= Known bugs =

Make sure to always check the result of the re-formatting to see if you have
been bitten by one of these.

Comments inside multi-line import statements break the formatter. A statement
like this will be ignored:
{{{
  from lp.app.interfaces import (
      # Don't do this.
      IMyInterface,
      IMyOtherInterface, # Don't do this either
      )
}}}
Actually, this will cause the statement and all following ones to be ignored:
{{{
  from lp.app.interfaces import (
  # Breaks indentation rules anyway.
      IMyInterface,
      IMyOtherInterface,
      )
}}}

If a single-line statement has both a comment in front of it and at the end
of the line, only the end-line comment will survive. This could probably
easily be fixed to concatenate the two.
{{{
  # I am a goner.
  from lp.app.interfaces import IMyInterface # I will survive!
}}}

Line continuation characters are recognized and resolved but
not re-introduced. This may leave the re-formatted text with a line that
is over the length limit.
{{{
    from lp.app.verylongnames.orverlydeep.modulestructure.leavenoroom \
        import object
}}}
"""

__metaclass__ = type

# SKIP this file when reformatting.
import os
import re
import sys
from textwrap import dedent

sys.path[0:0] = [os.path.dirname(__file__)]
from python_standard_libs import python_standard_libs


# To search for escaped newline chars.
escaped_nl_regex = re.compile("\\\\\n", re.M)
import_regex = re.compile("^import +(?P<module>.+)$", re.M)
from_import_single_regex = re.compile(
    "^from (?P<module>.+) +import +"
    "(?P<objects>[*]|[a-zA-Z0-9_, ]+)"
    "(?P<comment>#.*)?$", re.M)
from_import_multi_regex = re.compile(
    "^from +(?P<module>.+) +import *[(](?P<objects>[a-zA-Z0-9_, \n]+)[)]$",
    re.M)
comment_regex = re.compile(
    "(?P<comment>(^#.+\n)+)(^import|^from) +(?P<module>[a-zA-Z0-9_.]+)", re.M)
split_regex = re.compile(",\s*")

# Module docstrings are multiline (""") strings that are not indented and are
# followed at some point by an import.
module_docstring_regex = re.compile(
    '(?P<docstring>^["]{3}[^"]+["]{3}\n).*^(import |from .+ import)',
    re.M | re.S)
# The imports section starts with an import statement that is not a __future__
# import and consists of import lines, indented lines, empty lines and
# comments which are followed by an import line. Sometimes we even find
# lines that contain a single ")"... :-(
imports_section_regex = re.compile(
    "(^#.+\n)*^(import|(from ((?!__future__)\S+) import)).*\n"
    "(^import .+\n|^from .+\n|^[\t ]+.+\n|(^#.+\n)+((^import|^from) "
    ".+\n)|^\n|^[)]\n)*",
    re.M)


def format_import_lines(module, objects):
    """Generate correct from...import strings."""
    if len(objects) == 1:
        statement = "from %s import %s" % (module, objects[0])
        if len(statement) < 79:
            return statement
    return "from %s import (\n    %s,\n    )" % (
        module, ",\n    ".join(objects))


def find_imports_section(content):
    """Return the (start, end) positions of the file's import section."""
    # Skip module docstring.
    match = module_docstring_regex.search(content)
    if match is None:
        startpos = 0
    else:
        startpos = match.end('docstring')

    match = imports_section_regex.search(content, startpos)
    if match is None:
        return (None, None)
    startpos = match.start()
    endpos = match.end()
    if content[startpos:endpos].startswith('# SKIP'):
        # Skip files explicitly.
        return (None, None)
    return (startpos, endpos)


class ImportStatement:
    """Holds information about an import statement."""

    def __init__(self, objects=None, comment=None):
        self.import_module = objects is None
        if objects is None:
            self.objects = None
        else:
            self.objects = sorted(objects, key=str.lower)
        self.comment = comment

    def addObjects(self, new_objects):
        """More objects in this statement; eliminate duplicates."""
        if self.objects is None:
            # No objects so far; sort them like the constructor does.
            self.objects = sorted(new_objects, key=str.lower)
        else:
            # Use set to eliminate double objects.
            more_objects = set(self.objects + new_objects)
            self.objects = sorted(list(more_objects), key=str.lower)

    def setComment(self, comment):
        """Add a comment to the statement."""
        self.comment = comment


def parse_import_statements(import_section):
    """Split the import section into statements.

    Returns a dictionary with the module as the key and an ImportStatement
    holding the imported objects as the value."""
    imports = {}
    # Search for escaped newlines and remove them.
    searchpos = 0
    while True:
        match = escaped_nl_regex.search(import_section, searchpos)
        if match is None:
            break
        start = match.start()
        end = match.end()
        import_section = import_section[:start] + import_section[end:]
        searchpos = start
    # Search for simple one-line import statements.
    searchpos = 0
    while True:
        match = import_regex.search(import_section, searchpos)
        if match is None:
            break
        # These imports are marked by an ImportStatement without objects.
        # Multiple modules in one statement are split up.
        for module in split_regex.split(match.group('module').strip()):
            imports[module] = ImportStatement()
        searchpos = match.end()
    # Search for "from ... import" statements.
    for pattern in (from_import_single_regex, from_import_multi_regex):
        searchpos = 0
        while True:
            match = pattern.search(import_section, searchpos)
            if match is None:
                break
            import_objects = split_regex.split(
                match.group('objects').strip(" \n,"))
            module = match.group('module').strip()
            # Only one pattern has a 'comment' group.
            comment = match.groupdict().get('comment', None)
            if module in imports:
                # Catch double import lines.
                imports[module].addObjects(import_objects)
            else:
                imports[module] = ImportStatement(import_objects)
            if comment is not None:
                imports[module].setComment(comment)
            searchpos = match.end()
    # Search for comments in import section.
    searchpos = 0
    while True:
        match = comment_regex.search(import_section, searchpos)
        if match is None:
            break
        module = match.group('module').strip()
        comment = match.group('comment').strip()
        imports[module].setComment(comment)
        searchpos = match.end()

    return imports


def format_imports(imports):
    """Group and order imports, return the new import statements."""
    standard_section = {}
    first_section = {}
    thirdparty_section = {}
    local_section = {}
    # Group modules into sections.
    for module, statement in imports.iteritems():
        module_base = module.split('.')[0]
        comment = statement.comment
        if comment is not None and comment.startswith("# FIRST"):
            first_section[module] = statement
        elif module_base in ('canonical', 'lp'):
            local_section[module] = statement
        elif module_base in python_standard_libs:
            standard_section[module] = statement
        else:
            thirdparty_section[module] = statement

    all_import_lines = []
    # Sort within each section and generate statement strings.
    sections = (
        standard_section,
        first_section,
        thirdparty_section,
        local_section,
        )
    for section in sections:
        import_lines = []
        for module in sorted(section.keys(), key=str.lower):
            if section[module].comment is not None:
                import_lines.append(section[module].comment)
            if section[module].import_module:
                import_lines.append("import %s" % module)
            if section[module].objects is not None:
                import_lines.append(
                    format_import_lines(module, section[module].objects))
        if len(import_lines) > 0:
            all_import_lines.append('\n'.join(import_lines))
    # Sections are separated by a single blank line.
    return '\n\n'.join(all_import_lines)


def reformat_importsection(filename):
    """Replace the given file with a reformatted version of it."""
    # Read the whole file and close the handle right away.
    with open(filename) as source_file:
        pyfile = source_file.read()
    import_start, import_end = find_imports_section(pyfile)
    if import_start is None:
        # Skip files with no import section.
        return False
    imports_section = pyfile[import_start:import_end]
    imports = parse_import_statements(imports_section)

    if pyfile[import_end:import_end + 1] != '#':
        # Two blank lines before anything but comments.
        number_of_newlines = 3
    else:
        number_of_newlines = 2

    new_imports = format_imports(imports) + ("\n" * number_of_newlines)
    if new_imports == imports_section:
        # No change, no need to write a new file.
        return False

    # Write the file back and make sure it is flushed and closed.
    with open(filename, "w") as new_file:
        new_file.write(pyfile[:import_start])
        new_file.write(new_imports)
        new_file.write(pyfile[import_end:])

    return True


def process_file(fpath):
    """Process the file with the given path."""
    changed = reformat_importsection(fpath)
    if changed:
        print fpath


def process_tree(dpath):
    """Walk a directory tree and process all *.py files."""
    for dirpath, dirnames, filenames in os.walk(dpath):
        for filename in filenames:
            if filename.endswith('.py'):
                process_file(os.path.join(dirpath, filename))


if __name__ == "__main__":
    if len(sys.argv) == 1 or sys.argv[1] in ("-h", "-?", "--help"):
        sys.stderr.write(dedent("""\
        usage: format-imports <file or directory> ...

        Type "format-imports --docstring | less" to see the documentation.
        """))
        sys.exit(1)
    if sys.argv[1] == "--docstring":
        sys.stdout.write(__doc__)
        sys.exit(2)
    for filename in sys.argv[1:]:
        if os.path.isdir(filename):
            process_tree(filename)
        else:
            process_file(filename)
    sys.exit(0)