Hello! I noticed that each gemm call allocates memory on the heap for "packing buffers".
Would it be possible to let the user pre-allocate all the necessary memory and simply pass it in as an argument, instead of going through the global allocator?
For example, the public API could gain helper function(s) that calculate the required memory for a particular type/shape/kernel, and the `aligned_alloc::Alloc` struct could then be initialized from a user-provided pointer or something like that.
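To make the idea more concrete, here is a rough sketch of what such helpers might look like. Everything in it is hypothetical: `Blocking`, `packing_len`, and `aligned_f32` are made-up names, not existing items in the crate, and the size formula simply assumes that one block of A (at most `mc` x `kc`) and one block of B (at most `kc` x `nc`) are packed, each rounded up to the micro-kernel tile size. The real kernels may need something different.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical blocking parameters of a kernel (illustrative names only).
#[derive(Clone, Copy)]
pub struct Blocking {
    pub mr: usize, // micro-kernel tile rows
    pub nr: usize, // micro-kernel tile columns
    pub mc: usize, // max rows of A packed at once
    pub kc: usize, // max depth packed at once
    pub nc: usize, // max columns of B packed at once
}

/// Round `x` up to the next multiple of `multiple`.
fn round_up_to(x: usize, multiple: usize) -> usize {
    (x + multiple - 1) / multiple * multiple
}

/// Number of elements needed to pack one block of A and one block of B
/// for an (m x k) * (k x n) product, under the assumption stated above.
pub fn packing_len(m: usize, k: usize, n: usize, b: Blocking) -> usize {
    let kc = k.min(b.kc);
    let a_len = kc * round_up_to(m.min(b.mc), b.mr);
    let b_len = kc * round_up_to(n.min(b.nc), b.nr);
    a_len + b_len
}

/// Carve a sub-slice of `len` elements, aligned to `align` bytes, out of
/// caller-provided storage. Returns `None` if the storage is too small.
/// No allocation happens here.
pub fn aligned_f32(storage: &mut [f32], len: usize, align: usize) -> Option<&mut [f32]> {
    assert!(align.is_power_of_two() && align >= align_of::<f32>());
    let addr = storage.as_ptr() as usize;
    // Bytes to skip so that the returned slice starts on an `align` boundary.
    // This is a multiple of size_of::<f32>() because `addr` is f32-aligned.
    let pad_bytes = (align - addr % align) % align;
    let skip = pad_bytes / size_of::<f32>();
    storage.get_mut(skip..skip.checked_add(len)?)
}
```

The caller would compute `packing_len(m, k, n, blocking)` once at start-up, reserve that many elements plus `align / size_of::<f32>()` of slack, and hand the aligned slice to a gemm variant that accepts a buffer. That gemm variant is the part that does not exist today.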
I think this feature would help when using the crate in real-time programs, where hidden allocations are not welcome.