
Ensure that loc = sloc + multi + single-line comments + blank. #113

Merged: 1 commit merged into rubik:master on May 10, 2017

Conversation

@ashanbrown (Contributor) commented Apr 6, 2017

This PR tries to address #109. It fixes loc so that it is genuinely the sum of sloc, multi-line comments (including blank lines inside them), single-line comments, and blank lines (outside of comments). This requires a lot of test changes.
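
A minimal sketch of the identity this change aims for, assuming radon's raw.analyze interface and its single_comments field (illustration only, not part of the PR):

# Illustration: the decomposition this PR is meant to enforce.
from radon.raw import analyze

source = '''def func():
    """ docstring counted as a single-line comment """

    pass
'''

m = analyze(source)
print(m)
# Goal: total lines decompose exactly into the four buckets.
print(m.loc, '==', m.sloc + m.multi + m.single_comments + m.blank)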

There is a TODO on one case where the tokenizer approach seemed to be getting tripped up:

def func():
    """ my comment """ \
   
    pass

In this case, the tokenizer reports an error until the line containing pass is joined in. I'm not sure whether this is a bug in the tokenizer or not.

I'm also not sure what the intermediate values in the comments should be for the second_mi example.

@coveralls commented Apr 6, 2017

Coverage Status

Coverage increased (+0.009%) to 98.433% when pulling 904ca9f on ashanbrown:fix-loc into 74c01b7 on rubik:master.

1 similar comment from @coveralls.

@ashanbrown commented

@zukoj @rubik Any thoughts on the approach above?

@rubik commented Apr 10, 2017

I like this definition of LOC; it's very practical. I'll investigate the tokenize issue a bit. I don't even understand why that would be valid Python syntax: a line continuation to nothing? In any case, if Python can parse it, we have to be able to parse it too.
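
One quick sanity check suggests CPython does accept the construct: the backslash splices the whitespace-only line onto the docstring line, so the logical line is just the string literal plus trailing whitespace. For example:

# Illustrative check only: compile() accepts the spliced source.
src = 'def func():\n    """ my comment """ \\\n   \n    pass\n'
compile(src, '<example>', 'exec')  # no SyntaxError is raised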

@rubik mentioned this pull request on Apr 10, 2017
@rubik commented Apr 10, 2017

The tokenizer seems to be working correctly. This is the result for the code above:

[TokenInfo(type=59 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def func():\n'),
 TokenInfo(type=1 (NAME), string='func', start=(1, 4), end=(1, 8), line='def func():\n'),
 TokenInfo(type=53 (OP), string='(', start=(1, 8), end=(1, 9), line='def func():\n'),
 TokenInfo(type=53 (OP), string=')', start=(1, 9), end=(1, 10), line='def func():\n'),
 TokenInfo(type=53 (OP), string=':', start=(1, 10), end=(1, 11), line='def func():\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 11), end=(1, 12), line='def func():\n'),
 TokenInfo(type=5 (INDENT), string='    ', start=(2, 0), end=(2, 4), line='    """ my comment """ \\\n'),
 TokenInfo(type=3 (STRING), string='""" my comment """', start=(2, 4), end=(2, 22), line='    """ my comment """ \\\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 0), end=(3, 1), line='\n'),
 TokenInfo(type=1 (NAME), string='pass', start=(4, 4), end=(4, 8), line='    pass\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 8), end=(4, 9), line='    pass\n'),
 TokenInfo(type=6 (DEDENT), string='', start=(5, 0), end=(5, 0), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(5, 0), end=(5, 0), line='')]

Whereas, without the backslash:

[TokenInfo(type=59 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def func():\n'),
 TokenInfo(type=1 (NAME), string='func', start=(1, 4), end=(1, 8), line='def func():\n'),
 TokenInfo(type=53 (OP), string='(', start=(1, 8), end=(1, 9), line='def func():\n'),
 TokenInfo(type=53 (OP), string=')', start=(1, 9), end=(1, 10), line='def func():\n'),
 TokenInfo(type=53 (OP), string=':', start=(1, 10), end=(1, 11), line='def func():\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 11), end=(1, 12), line='def func():\n'),
 TokenInfo(type=5 (INDENT), string='    ', start=(2, 0), end=(2, 4), line='    """ my comment """\n'),
 TokenInfo(type=3 (STRING), string='""" my comment """', start=(2, 4), end=(2, 22), line='    """ my comment """\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 22), end=(2, 23), line='    """ my comment """\n'),
 TokenInfo(type=58 (NL), string='\n', start=(3, 0), end=(3, 1), line='\n'),
 TokenInfo(type=1 (NAME), string='pass', start=(4, 4), end=(4, 8), line='    pass\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 8), end=(4, 9), line='    pass\n'),
 TokenInfo(type=6 (DEDENT), string='', start=(5, 0), end=(5, 0), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(5, 0), end=(5, 0), line='')]

The only difference is the presence of the NL token (58) in the second case.
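
For reference, a dump in that format can be produced by tokenizing the snippet from an in-memory bytes buffer (Python 3); the source string here is only an illustration of the example above:

import io
import tokenize
from pprint import pprint

# The example function with the trailing backslash, tokenized from memory.
source = b'def func():\n    """ my comment """ \\\n\n    pass\n'
pprint(list(tokenize.tokenize(io.BytesIO(source).readline)))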

@ashanbrown commented Apr 10, 2017

The parsing issue is just with how the line-counting algorithm works; it doesn't cause the tool to fail.

def func():
    """ my comment """ \
    """ continued """

If I run the tokenizer on the file above I get:

1,0-1,3:	NAME	'def'
1,4-1,8:	NAME	'func'
1,8-1,9:	OP	'('
1,9-1,10:	OP	')'
1,10-1,11:	OP	':'
1,11-1,12:	NEWLINE	'\n'
1,16-1,34:	STRING	'""" my comment """'
1,41-1,58:	STRING	'""" continued """'
1,58-1,59:	NEWLINE	'\n'
1,59-1,60:	NEWLINE	'\n'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 170, in tokenize
    tokenize_loop(readline, tokeneater)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 176, in tokenize_loop
    for token_info in generate_tokens(readline):
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tokenize.py", line 380, in generate_tokens
    raise TokenError, ("EOF in multi-line statement", (lnum, 0))
tokenize.TokenError: ('EOF in multi-line statement', (2, 0))

It needs something else (like the pass statement) to complete successfully. The funny thing is that the code above runs fine in Python. The algorithm for finding multi-line strings is to keep adding lines to a buffer until the tokenizer is satisfied that it has a completely "tokenizable" buffer (see the sketch below). The issue above, however, means that it may need to consume extra lines into the multi-line string. The multi-line string token still has its correct start and end positions, but the next row has now been consumed and is no longer considered source. The result is that this test case:

      def function():
          """ docstring continued by blank line is still a single-line comment """ \
  
          pass

reports only 1 sloc and 1 lloc when there should be 2 of each. It treats the whole docstring as a "single-line comment" that swallows the pass statement.
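
The buffering idea described above amounts to roughly the following (a paraphrased sketch, not radon's actual implementation):

import tokenize
from io import StringIO

def logical_chunks(lines):
    """Yield groups of physical lines that tokenize cleanly on their own."""
    buffer = []
    for line in lines:
        buffer.append(line)
        try:
            # Does the buffer tokenize as a complete chunk yet?
            list(tokenize.generate_tokens(StringIO(''.join(buffer)).readline))
        except tokenize.TokenError:
            # Not yet: an open multi-line string, open bracket, or a trailing
            # backslash continuation. Keep consuming lines.
            continue
        yield ''.join(buffer)
        buffer = []
    if buffer:
        yield ''.join(buffer)

With the dangling backslash, the docstring's chunk cannot close on its own line; at least one following line has to be pulled into the buffer before it tokenizes, which is how extra lines end up counted as part of the "single-line comment".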

@rubik commented Apr 13, 2017

Weird, I cannot reproduce the issue. Here's what I get:

$ cat a.py
def func():
    """ my comment """ \
    """ continued """
$ python2.7 -m tokenize a.py
1,0-1,3:	NAME	'def'
1,4-1,8:	NAME	'func'
1,8-1,9:	OP	'('
1,9-1,10:	OP	')'
1,10-1,11:	OP	':'
1,11-1,12:	NEWLINE	'\n'
2,0-2,4:	INDENT	'    '
2,4-2,22:	STRING	'""" my comment """'
3,4-3,21:	STRING	'""" continued """'
3,21-3,22:	NEWLINE	'\n'
4,0-4,0:	DEDENT	''
4,0-4,0:	ENDMARKER	''
$ python3.6 -m tokenize a.py
0,0-0,0:            ENCODING       'utf-8'        
1,0-1,3:            NAME           'def'          
1,4-1,8:            NAME           'func'         
1,8-1,9:            OP             '('            
1,9-1,10:           OP             ')'            
1,10-1,11:          OP             ':'            
1,11-1,12:          NEWLINE        '\n'           
2,0-2,4:            INDENT         '    '         
2,4-2,22:           STRING         '""" my comment """'
3,4-3,21:           STRING         '""" continued """'
3,21-3,22:          NEWLINE        '\n'           
4,0-4,0:            DEDENT         ''             
4,0-4,0:            ENDMARKER      ''

I find it extremely odd that your output is different from mine!?

@rubik commented May 10, 2017

Since this is at worst a bug in the tokenize module, I'm merging this one. This way we can have sane definitions of LOC, SLOC, etc. again.

@rubik merged commit 302ecc6 into rubik:master on May 10, 2017
rubik added a commit that referenced this pull request May 10, 2017
rubik added a commit that referenced this pull request May 10, 2017
@rubik commented May 10, 2017

I was forced to revert this because it didn't pass the tests. I should have checked before merging here. It gives weird results with oddly high numbers. For example, for

def function():
    """ docstring continued by blank line is still a single-line comment """

    pass

it gives

ex.py
    LOC: 8
    LLOC: 2
    SLOC: 5
    Comments: 0
    Single comments: 1
    Multi: 0
    Blank: 2

It seems like some lines are counted twice or more.
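
For comparison, that snippet has only 4 physical lines, so under this PR's definition one would presumably expect LOC 4 = SLOC 2 (def and pass) + Multi 0 + Single comments 1 (the docstring) + Blank 1. The reported totals still satisfy the identity (8 = 5 + 0 + 1 + 2) but count several lines more than once.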
