Clang undefined behavior sanitizer diagnostics (mostly uninteresting??)
Port, board and/or hardware
unix port, coverage build, x86_64 linux, clang-19
MicroPython version
v1.27.0-preview-15-g744270ac1b
Reproduction
Perform the undefined behavior sanitizer build, but with CC=clang, then do pretty much anything (such as starting micropython to the REPL).
Expected behaviour
It works and is essentially free of undefined behavior diagnostics.
Observed behaviour
Several classes of diagnostic appear almost immediately.
I investigated two main classes of diagnostic:
- Applying zero offsets to NULL pointers
- Calling functions without exactly matching prototypes
Here's an example of each kind:
../../py/map.c:193:37: runtime error: applying zero offset to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../py/map.c:193:37
../../py/stream.c:60:28: runtime error: call to function vfs_posix_file_write through pointer to incorrect function type 'unsigned long (*)(void *, void *, unsigned long, int *)'
/home/jepler/src/micropython/ports/unix/../../extmod/vfs_posix_file.c:129: note: vfs_posix_file_write defined here
These are both classes of "technically forbidden per the C specification but work fine almost always in practice".
The first can be fixed with an extra guard check, at the possible cost of code size. For example:
- const mp_obj_t *kwargs = args + n_args;
+ const mp_obj_t *kwargs = args ? args + n_args : NULL;
As discussed in the old sanitizer threads, I think this specific behavior is set to become defined (NULL+0 is NULL) in a future C standard.
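A minimal sketch of the guard in isolation, assuming a stand-in `obj_t` type in place of `mp_obj_t` and a hypothetical helper name `kwargs_start` (neither is from the MicroPython tree). It only shows that the conditional never evaluates `NULL + 0`:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for mp_obj_t; an assumption made for this sketch only. */
typedef void *obj_t;

/* Hypothetical helper: return a pointer just past the positional args.
 * When args is NULL (only meaningful with n_args == 0), return NULL
 * directly instead of computing NULL + 0, which clang's UBSan reports. */
static const obj_t *kwargs_start(const obj_t *args, size_t n_args) {
    /* A NULL base with a non-zero count would be a caller bug. */
    assert(args != NULL || n_args == 0);
    return args ? args + n_args : NULL;
}
```

Calling `kwargs_start(NULL, 0)` yields NULL with no pointer arithmetic performed, while a non-NULL base behaves exactly like `args + n_args`. The extra branch is the code-size cost mentioned above.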
The second is harder to resolve. For instance, it technically means that the trick of calling either a read or a write function through a function pointer with the read type is incorrect (the prototypes differ only in whether the data argument is const):
if (flags & MP_STREAM_RW_WRITE) {
io_func = (io_func_t)stream_p->write;
} else {
io_func = stream_p->read;
}
... mp_uint_t out_sz = io_func(stream, buf, size, errcode); ...
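One possible way to avoid the mismatched call, sketched here with hypothetical types (`read_fn_t`, `write_fn_t`, `stream_ops_t`, `mem_stream_t`) and a hypothetical flag `RW_WRITE` rather than the real MicroPython definitions, is to call each function through a pointer of its own exact type. This likely costs some code size because there are two call sites instead of one:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prototypes: they differ only in the constness of buf. */
typedef size_t (*read_fn_t)(void *self, void *buf, size_t size, int *errcode);
typedef size_t (*write_fn_t)(void *self, const void *buf, size_t size, int *errcode);

typedef struct {
    read_fn_t read;
    write_fn_t write;
} stream_ops_t;

#define RW_WRITE 1 /* hypothetical flag, standing in for MP_STREAM_RW_WRITE */

/* Dispatch without casting: each call goes through a pointer whose type
 * exactly matches the callee, so the function-type check has nothing to flag. */
static size_t stream_rw(const stream_ops_t *ops, void *self, void *buf,
                        size_t size, int flags, int *errcode) {
    assert(errcode != NULL);
    if (flags & RW_WRITE) {
        return ops->write(self, buf, size, errcode);
    }
    return ops->read(self, buf, size, errcode);
}

/* Toy in-memory stream used only to exercise the dispatcher. */
typedef struct {
    char data[16];
    size_t len;
} mem_stream_t;

static size_t mem_write(void *self, const void *buf, size_t size, int *errcode) {
    mem_stream_t *s = self;
    if (size > sizeof(s->data)) {
        size = sizeof(s->data); /* truncate to capacity */
    }
    memcpy(s->data, buf, size);
    s->len = size;
    *errcode = 0;
    return size;
}

static size_t mem_read(void *self, void *buf, size_t size, int *errcode) {
    mem_stream_t *s = self;
    if (size > s->len) {
        size = s->len; /* never read past what was written */
    }
    memcpy(buf, s->data, size);
    *errcode = 0;
    return size;
}
```

Whether the extra call site is acceptable for MicroPython's size-constrained ports is a separate question; the sketch only shows that the diagnostic can be avoided without changing either prototype.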
I didn't find a fine-grained method to turn off these diagnostics. For instance, the first one is under the general umbrella of "pointer overflow" checks, which also covers actual overflow in pointer arithmetic, like uint32_t *ptr; ptr[large] when large * sizeof(uint32_t) makes the address wrap around.
Additional Information
I was interested in clang ubsan because the AFLplusplus fuzzer can be run in a mode where it treats sanitizer diagnostics as crashes. However, it defaults to clang rather than gcc, and I discovered that it really doesn't like the current state of micropython, so it can't make any interesting findings.
Oh, here's a bonus that I found when preparing this issue. It occurs when building an empty list (and, probably, tuple). It happens because an unsigned subtraction is used where the intent is to grow the stack by one element. Technically it is an overflowed subtraction, so it is undefined behavior, but it is not interesting. More uninteresting signed overflows appear in vm.c, and touching any of them is likely to cause code growth without benefit.
Starting program: /home/jepler/src/micropython/ports/unix/build-coverage/micropython -c '[]'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
../../py/vm.c:832:24: runtime error: subtraction of unsigned offset from 0x7fffffffd920 overflowed to 0x7fffffffd928
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../py/vm.c:832:24
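To illustrate the kind of change that would silence this report, here is a sketch that assumes the pattern is a stack pointer adjusted by "item count minus one" with an unsigned count. The names `obj_t` and `adjust_sp` are hypothetical and this is not the actual vm.c code. Computing the offset in a signed type makes the zero-item case a plain "move up by one" instead of a wrapped unsigned offset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for mp_obj_t; an assumption made for this sketch only. */
typedef void *obj_t;

/* Hypothetical helper: pop n_items entries and leave room for one result.
 * With n_items == 0, (n_items - 1) evaluated as size_t would be SIZE_MAX,
 * which is what the sanitizer flags. Converting to ptrdiff_t first gives
 * -1, so the pointer simply moves up by one slot. */
static obj_t *adjust_sp(obj_t *sp, size_t n_items) {
    assert(n_items <= (size_t)PTRDIFF_MAX);
    ptrdiff_t delta = (ptrdiff_t)n_items - 1;
    return sp - delta;
}
```

On common targets both forms are expected to compile to the same address arithmetic, but that is an assumption worth checking against the generated code before making such a change in a hot loop.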
emitbc: Avoid undefined behavior calling memset()
When micropython is built with 'clang -fsanitize=undefined', a diagnostic like the following will occur:
$ UBSAN_OPTIONS=abort_on_error=1 ./micropython_fuzzing -c 'print(1)'
../../py/emitbc.c:319:16: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:62:62: note: nonnull attribute specified here
Aborted
Traditionally, memset(NULL, value, 0) has been accepted without causing problems. However, it is not standards-compliant behavior. For instance, Ted Unangst of the OpenBSD project notes that "A smart C compiler may observe a call to memcpy, flag both pointers as valid, and then delete any null checks. Forwards and backwards."
https://www.tedunangst.com/flak/post/zero-size-objects
Since micropython is using -fdelete-null-pointer-checks ("enabled by default on most targets") and it is probably giving good code size improvements, we have to pay a modest price and add a few checks.
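The kind of check meant here can be sketched as follows. The helper name `clear_bytes` is hypothetical and this is not the actual emitbc.c change, only an illustration of skipping the call when there is nothing to clear:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero n bytes at dst, but never hand memset() a
 * pointer when the length is zero, since dst may legitimately be NULL
 * in that case and memset's pointer argument is declared nonnull. */
static void clear_bytes(void *dst, size_t n) {
    /* A NULL destination with a non-zero length would be a caller bug. */
    assert(dst != NULL || n == 0);
    if (n != 0) {
        memset(dst, 0, n);
    }
}
```

With this guard, `clear_bytes(NULL, 0)` is a no-op, so the compiler never gets to infer from a memset() call that the pointer is non-NULL.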