Open
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)
type-crash (A hard crash of the interpreter, possibly with a core dump)
Description
Crash report
What happened?
The problem appears starting with commit f8290df (Python 3.13+).
Minimal reproducer:
- Check out the cpython repository
- Run:
    ./configure
    make
    cd Programs
    echo "# -*- coding: UTF -*-" > crash.py
    ./_freeze_module crash crash.py crash.h
- Segmentation fault
Program received signal SIGSEGV, Segmentation fault.
PyDict_GetItemRef (op=0x0, key='utf', result=result@entry=0x7fffffffdb70) at ./Include/object.h:795
795 return ((flags & feature) != 0);
(gdb) bt
#0 PyDict_GetItemRef (op=0x0, key='utf', result=result@entry=0x7fffffffdb70) at ./Include/object.h:795
#1 0x00005555557ab5a7 in _PyCodec_Lookup (encoding=encoding@entry=0x7ffff7bc4050 "UTF") at Python/codecs.c:164
#2 0x00005555557ac2e2 in _PyCodec_LookupTextEncoding (alternate_command=0x55555593d93d "codecs.decode()", encoding=0x7ffff7bc4050 "UTF") at Python/codecs.c:525
#3 codec_getitem_checked (index=1, alternate_command=0x55555593d93d "codecs.decode()", encoding=0x7ffff7bc4050 "UTF") at Python/codecs.c:574
#4 _PyCodec_TextDecoder (encoding=0x7ffff7bc4050 "UTF") at Python/codecs.c:590
#5 _PyCodec_DecodeText (object=object@entry=<memoryview at remote 0x7ffff7b88280>, encoding=encoding@entry=0x7ffff7bc4050 "UTF", errors=errors@entry=0x0) at Python/codecs.c:612
#6 0x000055555573fb39 in PyUnicode_Decode (s=s@entry=0x7ffff7b5dfb0 "# -*- coding: UTF -*-\n", size=<optimized out>, encoding=encoding@entry=0x7ffff7bc4050 "UTF", errors=errors@entry=0x0)
at Objects/unicodeobject.c:3712
#7 0x000055555574007f in PyUnicode_Decode (s=s@entry=0x7ffff7b5dfb0 "# -*- coding: UTF -*-\n", size=<optimized out>, encoding=<optimized out>, encoding@entry=0x7ffff7bc4050 "UTF", errors=<optimized out>,
errors@entry=0x0) at Objects/unicodeobject.c:3730
#8 0x000055555560f706 in _PyTokenizer_translate_into_utf8 (str=str@entry=0x7ffff7b5dfb0 "# -*- coding: UTF -*-\n", enc=0x7ffff7bc4050 "UTF") at Parser/tokenizer/helpers.c:206
#9 0x000055555560ecfc in decode_str (preserve_crlf=<optimized out>, tok=0x555555b7d510, single=<optimized out>, input=<optimized out>) at Parser/tokenizer/string_tokenizer.c:103
#10 _PyTokenizer_FromString (str=<optimized out>, exec_input=<optimized out>, preserve_crlf=<optimized out>) at Parser/tokenizer/string_tokenizer.c:125
#11 0x00005555555da1e7 in _PyPegen_run_parser_from_string (str=str@entry=0x555555b4c4a0 "# -*- coding: UTF -*-\n", start_rule=start_rule@entry=257, filename_ob=filename_ob@entry='<frozen crash>',
flags=flags@entry=0x0, arena=arena@entry=0x7ffff7b5df70) at Parser/pegen.c:1054
#12 0x000055555560a0e6 in _PyParser_ASTFromString (str=str@entry=0x555555b4c4a0 "# -*- coding: UTF -*-\n", filename=filename@entry='<frozen crash>', mode=mode@entry=257, flags=flags@entry=0x0,
arena=arena@entry=0x7ffff7b5df70) at Parser/peg_api.c:13
#13 0x0000555555826df5 in Py_CompileStringObject (optimize=0, flags=0x0, start=257, filename='<frozen crash>', str=0x555555b4c4a0 "# -*- coding: UTF -*-\n") at Python/pythonrun.c:1517
#14 Py_CompileStringExFlags (str=str@entry=0x555555b4c4a0 "# -*- coding: UTF -*-\n", filename_str=filename_str@entry=0x555555b4c2a0 "<frozen crash>", start=start@entry=257, flags=flags@entry=0x0,
optimize=optimize@entry=0) at Python/pythonrun.c:1545
#15 0x00005555555c5398 in compile_and_marshal (text=0x555555b4c4a0 "# -*- coding: UTF -*-\n", name=0x7fffffffe2ed "crash") at Programs/_freeze_module.c:117
#16 main (argc=<optimized out>, argv=<optimized out>) at Programs/_freeze_module.c:231
If built with --with-pydebug:
_freeze_module: Python/codecs.c:149: _PyCodec_Lookup: Assertion `interp->codecs.initialized' failed.
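The assertion message suggests that the codec registry had not been initialized when the tokenizer asked for the codec. For comparison, here is a minimal sketch that compiles the same source in a normally started interpreter, where the registry is available. The file name "<crash>" and the extra `value = 1` statement are illustrative additions, not part of the original reproducer.

```python
# Same coding cookie as crash.py, plus one statement so the result is observable.
source = b"# -*- coding: UTF -*-\nvalue = 1\n"

# compile() accepts bytes and honors the coding cookie, so the tokenizer has to
# decode the source with the "UTF" codec, as in the backtrace above.
code = compile(source, "<crash>", "exec")

namespace = {}
exec(code, namespace)
print(namespace["value"])
```

If this runs cleanly, the crash is specific to the environment _freeze_module runs in rather than to the source text itself.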
This is not a common scenario, but if you build your own analog of _freeze_module, it will segfault on source files that declare a problematic encoding. We found the following cases in our repository: UTF, U8 :)
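Both spellings are aliases of UTF-8 that are resolved through the codec registry, which matches the backtrace: PyUnicode_Decode falls through to _PyCodec_Lookup instead of decoding directly. A small sketch, assuming a normally initialized interpreter, shows how they resolve (the sample bytes are illustrative):

```python
import codecs

# Both names are resolved by the codec registry to the UTF-8 codec.
for spelling in ("UTF", "U8"):
    info = codecs.lookup(spelling)
    print(spelling, "->", info.name)

# Decoding with the alias gives the same result as decoding with "utf-8".
sample = b"caf\xc3\xa9"
decoded = codecs.decode(sample, "UTF")
print(decoded == sample.decode("utf-8"))
```

A check like this can help find other cookie spellings in a repository that would take the same registry path.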
Before commit f8290df, _PyCodec_Lookup tried to initialize the codec registry if it was not yet initialized, and returned NULL if that initialization failed:
    if (interp->codec_search_path == NULL && _PyCodecRegistry_Init()) {
        return NULL;
    }
My naive solution is to replace the assert with a check that returns NULL, as the old code did on failure:
@@ -138,6 +138,9 @@ PyObject *_PyCodec_Lookup(const char *encoding)
     }
 
     PyInterpreterState *interp = _PyInterpreterState_GET();
-    assert(interp->codecs.initialized);
+    if (!interp->codecs.initialized) {
+        return NULL;
+    }
 
     /* Convert the encoding to a normalized Python string: all

CPython versions tested on:
3.13, 3.14, 3.15, CPython main branch
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.15.0a2+ (heads/main:c98182be8d4, Dec 13 2025, 16:49:21) [GCC 9.4.0]