Just some random ideas for the ggml-python high-level API.

- [x] Automatic memory / context management
- [x] Simplify backend offloading
- [ ] Immediate mode(?)
- [ ] File loaders
- [ ] On-the-fly quantization
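
To make the first item a bit more concrete, here is a rough sketch of what automatic context management could look like using ordinary Python resource patterns. This is not ggml-python's actual API: `FakeBackend`, `ManagedContext`, `init`, `free`, and `mem_size` are all hypothetical names, and the backend is a pure-Python stand-in for a native library that needs manual init/free calls.

```python
import weakref


class FakeBackend:
    """Hypothetical stand-in for a native library with manual init/free calls."""

    def __init__(self):
        self.live = set()        # handles that have been allocated and not freed
        self._next_handle = 1

    def init(self, mem_size):
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.discard(handle)


class ManagedContext:
    """Hypothetical wrapper that releases its native handle automatically."""

    def __init__(self, backend, mem_size):
        self.handle = backend.init(mem_size)
        # weakref.finalize calls the callback at most once: on close(),
        # when the wrapper is garbage collected, or at interpreter exit.
        # The callback must not hold a reference to `self`.
        self._finalizer = weakref.finalize(self, backend.free, self.handle)

    def close(self):
        self._finalizer()

    @property
    def closed(self):
        return not self._finalizer.alive

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


backend = FakeBackend()
with ManagedContext(backend, mem_size=16 * 1024 * 1024) as ctx:
    inside = ctx.handle in backend.live  # handle is live inside the block
after = ctx.handle in backend.live       # handle is released after the block
```

The idea is that users get deterministic cleanup via `with`, while the finalizer acts as a safety net for contexts that are simply dropped.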