codeformat.py bug
Description:
When codeformat.py formats *.h or *.c files (#10595), it shrinks the file by deleting not only formatting symbols and whitespace but also the code being formatted!
before_codeformat.txt
after_codeformat.txt
While working on before_codeformat.h, codeformat.py also gives the following error:
Parsing: ..\ports\renesas-ra\boards\VK_RA4W1\ra_cfg\fsp_cfg\bsp\bsp_cfg.h as language C
Traceback (most recent call last):
  File "D:\micropython\tools\codeformat.py", line 218, in <module>
    main()
  File "D:\micropython\tools\codeformat.py", line 205, in main
    fixup_c(file)
  File "D:\micropython\tools\codeformat.py", line 136, in fixup_c
    if dedent_stack[-1] >= 0:
       ~~~~~~~~~~~~^^^^
IndexError: list index out of range
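The failing line reads `dedent_stack[-1]`, so the stack must have been empty when an `#elif`, `#else` or `#endif` was processed. Below is a minimal sketch of one way this could happen, assuming the directive regex quoted later in this thread is the one in use. That regex only matches directives preceded by at least one space, so an `#if` at column 0 is never pushed, while its indented `#endif` still reads the stack. The helper `first_underflow`, the macro `EXAMPLE_MACRO` and the sample input are hypothetical, and the attached header has not been checked against this theory.

```python
import re

# Regex as quoted later in this thread; it requires at least one leading space.
DIRECTIVE = re.compile(r"( +)#(if |ifdef |ifndef |elif |else|endif)")


def first_underflow(lines):
    """Return the 1-based line number where the stack would be read while empty, or None."""
    depth = 0  # number of entries that would be on dedent_stack
    for line_number, line in enumerate(lines, 1):
        m = DIRECTIVE.match(line)
        if not m:
            continue  # unindented directives never reach the stack logic
        directive = m.group(2)
        if directive in ("if ", "ifdef ", "ifndef "):
            depth += 1
        else:
            if depth == 0:
                return line_number  # dedent_stack[-1] would raise IndexError here
            if directive == "endif":
                depth -= 1
    return None


# Hypothetical input: the opening #if is at column 0, the closing #endif is indented.
sample = ["#if EXAMPLE_MACRO\n", "int x;\n", "    #endif\n"]
print(first_underflow(sample))  # prints 3
```

If this is what happens, uncrustify output in which only some directives of a conditional block are indented would be enough to trigger the crash.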
Environment:
| Component | Version |
|---|---|
| OS | Win 10 (22H2) |
| Uncrustify | 0.72.0_f |
| Python | 3.11.1 |

| Package | Version |
|---|---|
| black | 22.12.0 |
| click | 8.1.3 |
| colorama | 0.4.6 |
| mypy-extensions | 0.4.3 |
| pathspec | 0.11.0 |
| pip | 23.0 |
| platformdirs | 2.6.2 |
| pyserial | 3.5 |
| setuptools | 65.5.0 |
tools/codeformat.py: Fix IndexError and improve robustness in fixup_c
Description
The fixup_c function in tools/codeformat.py has a logic flaw when processing C files. When an indented preprocessor directive that opens a conditional (#if, #ifdef or #ifndef) is the last line of a file, the script tries to "peek" at the following line using lines[line_number]. Because enumerate(lines, 1) makes line_number a 1-based index, lines[line_number] already refers to the next line, so on the final line it points past the end of the list and raises IndexError: list index out of range.
In addition, the script crashes with the same IndexError when an #elif, #else or #endif is reached while dedent_stack is empty, which can happen with malformed or unbalanced files. It also has no protection against argument injection when a filename starts with a hyphen.
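To make the off-by-one concrete, here is a small sketch, assuming the loop numbers lines with `enumerate(lines, 1)` as described above. The helper `peek_next` and the two-line input are hypothetical and only illustrate the bounds check.

```python
def peek_next(lines, line_number):
    """Return the line after 1-based line_number, or None at end of file."""
    # With 1-based numbering, lines[line_number] is the *following* line,
    # so it only exists while line_number < len(lines).
    if line_number < len(lines):
        return lines[line_number]
    return None


lines = ["void f(void) {\n", "    #if FOO\n"]
for line_number, line in enumerate(lines, 1):
    # An unguarded lines[line_number] would raise IndexError on the last iteration.
    print(line_number, repr(peek_next(lines, line_number)))
```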
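For the hyphen problem, one option that does not depend on how the invoked tool parses its options is to rewrite such names so they no longer start with `-`. This is a minimal sketch; `protect_leading_hyphen` is a hypothetical helper, not something codeformat.py provides.

```python
import os


def protect_leading_hyphen(path):
    """Prefix paths that start with '-' so they cannot be parsed as options."""
    if path.startswith("-"):
        # "./-rf.c" (or ".\-rf.c" on Windows) names the same file as "-rf.c".
        return os.path.join(os.curdir, path)
    return path


for name in ["main.c", "-rf.c", "ports/unix/main.c"]:
    print(protect_leading_hyphen(name))
```

Only names that begin with a hyphen are changed, so the paths that appear in tool output stay the same for every ordinary file.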
Code Size
import re


def fixup_c(filename):
    # Read file.
    with open(filename) as f:
        lines = f.readlines()
    num_lines = len(lines)

    # Write out file with fixups.
    with open(filename, "w", newline="") as f:
        dedent_stack = []
        for line_idx, line in enumerate(lines):
            line_number = line_idx + 1  # 1-based, used for error reporting

            # Dedent #'s to match indent of following line.
            m = re.match(r"( +)#(if |ifdef |ifndef |elif |else|endif)", line)
            if m:
                indent = len(m.group(1))
                directive = m.group(2)
                if directive in ("if ", "ifdef ", "ifndef "):
                    # Check that a next line exists before peeking at it.
                    if line_number < num_lines:
                        # line_number == line_idx + 1, i.e. the index of the next line.
                        line_next = lines[line_number]
                        m_next = re.match(r"( *)", line_next)
                        indent_next = len(m_next.group(1)) if m_next else 0
                        if indent - 4 == indent_next and re.match(r" +(} else |case )", line_next):
                            # This #-line (and its associated ones) is dedented by 4 spaces.
                            line = line[4:]
                            dedent_stack.append(indent - 4)
                        else:
                            # This #-line does not need dedenting.
                            dedent_stack.append(-1)
                    else:
                        # End of file reached, no next line to match.
                        dedent_stack.append(-1)
                else:
                    if not dedent_stack:
                        raise IndentationError(
                            f'dedent stack is empty for "{directive}" at {filename}:{line_number}'
                        )
                    if dedent_stack[-1] >= 0:
                        indent_diff = indent - dedent_stack[-1]
                        # Never slice off more than the available indentation.
                        line = line[max(0, indent_diff):]
                    if directive == "endif":
                        dedent_stack.pop()

            f.write(line)

        if dedent_stack:
            print(f"Warning: Unbalanced directives in {filename}")
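One caveat with the function above: the file is opened with mode "w" before the loop runs, so if the IndentationError is raised partway through, the file on disk is left truncated at that point. This may be related to the content loss described in the first report, though that has not been confirmed. A minimal sketch of a safer pattern follows: do all processing in memory and replace the file only on success. `rewrite_file` and its `transform` callback are hypothetical names, and unlike fixup_c this sketch reads with `newline=""` so line endings pass through unchanged.

```python
import os
import tempfile


def rewrite_file(filename, transform):
    """Apply transform(lines) -> lines and replace the file only on success."""
    with open(filename, newline="") as f:
        lines = f.readlines()

    # May raise; at this point the original file has not been touched.
    new_lines = transform(lines)

    # Write to a temporary file in the same directory, then swap it in.
    # Note: the replacement gets the temp file's permissions, not the original's.
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_name = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as f:
            f.writelines(new_lines)
        os.replace(tmp_name, filename)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

With this shape, the per-line fixup logic would become a pure function from a list of lines to a list of lines, which is also easier to unit test.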
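Improved Batch Logic
import itertools
import subprocess


def batch(cmd, files, N=200):
    # Run cmd on the files in groups of at most N to keep command lines short.
    files_iter = iter(files)
    while True:
        file_args = list(itertools.islice(files_iter, N))
        if not file_args:
            break
        # SECURITY FIX: Use "--" to prevent argument injection from filenames.
        # This assumes the invoked tool treats "--" as an end-of-options marker.
        subprocess.check_call(cmd + ["--"] + file_args)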
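The chunking step in batch can be tried out on its own without running any subprocess. The sketch below isolates it as a generator; `chunks` is a hypothetical helper name and the file names are made up.

```python
import itertools


def chunks(iterable, n):
    """Yield successive lists of at most n items from iterable."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, n))
        if not chunk:
            return  # iterator exhausted
        yield chunk


print(list(chunks(["a.c", "b.c", "c.h", "d.h", "e.c"], 2)))
# prints [['a.c', 'b.c'], ['c.h', 'd.h'], ['e.c']]
```

Because islice consumes from a shared iterator, the input list is not mutated and no file is passed twice.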
Implementation
I hope the MicroPython maintainers or community will implement this feature.
Code of Conduct
Yes, I agree