This repository provides a native Ruby extension for fast matrix/tensor operations, packaged as the tensor gem. It uses OpenMP for parallelism and can be linked against OpenBLAS for better numeric performance.
- Build the extension and gem:
ruby extconf.rb
make
gem build tensor.gemspec
- Install locally:
gem install tensor-0.1.0.gem
In Ruby:
require "tensor"
This defines a top-level Tensor class backed by the C extension (an order-2 tensor). Matrix is provided as a backwards-compatible alias.
m1 = Tensor.new(2, 2, dtype: "float32")
m2 = Tensor.new(2, 2, dtype: "float32")
product = m1.multiply(m2) # or m1.matmul(m2)
relu = product.relu
softmax = product.softmax
loss, grad = softmax.cross_entropy_loss([0, 1])
- Supported dtypes for most math ops: "float32", "float64".
- Integer dtypes ("int16", "int8") are supported for storage, indexing, and conversion, but many numeric operations will raise ArgumentError if used with integer matrices.
- Create tensors by shape: Tensor.zeros(shape: [batch, channels, height, width], dtype: "float32"). Inspect them with t.shape #=> [b, c, h, w], t.rank, and t.size.
- Build from nested Ruby arrays: Tensor.from_array(nested, "float32"), where nested is a rectangular N-D array.
- N-D indexing: t[batch, channel, i, j] to get/set a scalar element (negative indices supported). t.to_a returns nested arrays matching t.shape.
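To illustrate how an N-D index with negative components can map to a single storage position, here is a minimal pure-Ruby sketch. It does not use the gem: `flat_offset` is a hypothetical helper, and row-major (C-order) storage is an assumption, since the extension's internal layout is not documented here.

```ruby
# Hypothetical helper, not part of the gem's API.
# Assumption: elements are stored in row-major (C-order) layout.
def flat_offset(shape, indices)
  unless indices.length == shape.length
    raise ArgumentError, "expected #{shape.length} indices, got #{indices.length}"
  end

  offset = 0
  shape.zip(indices) do |dim, idx|
    idx += dim if idx < 0               # negative indices count from the end
    raise IndexError, "index out of range" unless (0...dim).cover?(idx)
    offset = offset * dim + idx         # row-major accumulation, one axis at a time
  end
  offset
end

shape = [2, 3, 4, 5]                    # [batch, channels, height, width]
first = flat_offset(shape, [0, 0, 0, 0])      # offset of the first element
last  = flat_offset(shape, [-1, -1, -1, -1])  # offset of the last element (size - 1)
```

Under this assumption, the last axis varies fastest, so neighbouring values of `j` sit next to each other in memory.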
For now, higher-order tensors primarily support storage, indexing, and conversion; linear algebra and neural-network operations (e.g., matmul, softmax, cross_entropy_loss) are defined for 2D tensors, and batched N-D versions can be added next.
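To make the 2D operations named above more concrete, the following pure-Ruby sketch shows one common definition of softmax and cross-entropy on nested arrays. It does not call the gem. The helper names (`reference_softmax`, `reference_cross_entropy`), the row-wise softmax, the averaging of the loss over rows, and the gradient being taken with respect to the pre-softmax values are all assumptions for illustration; the extension's exact conventions may differ.

```ruby
# Pure-Ruby reference sketch; does not call the tensor gem.
# Assumptions: softmax is applied per row, labels are class indices,
# and the loss is averaged over rows.

def reference_softmax(rows)
  rows.map do |row|
    max = row.max                       # subtract the max for numerical stability
    exps = row.map { |x| Math.exp(x - max) }
    total = exps.sum
    exps.map { |e| e / total }
  end
end

def reference_cross_entropy(probs, labels)
  n = probs.length
  loss = probs.each_with_index.sum { |row, i| -Math.log(row[labels[i]]) } / n
  # Gradient with respect to the pre-softmax values: (prob - one_hot) / n
  grad = probs.each_with_index.map do |row, i|
    row.each_with_index.map do |prob, j|
      (prob - (j == labels[i] ? 1.0 : 0.0)) / n
    end
  end
  [loss, grad]
end

logits = [[1.0, 2.0], [3.0, 1.0]]
probs = reference_softmax(logits)
loss, grad = reference_cross_entropy(probs, [0, 1])
```

A reference like this can be handy for checking the native results on small inputs, within floating-point tolerance.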
- The C code uses OpenMP (#pragma omp parallel for) for:
  - Matrix multiplication (multiply/matmul)
  - Element-wise ops (subtract, hadamard, scale, ReLU, ReLU grad)
  - Cross-entropy loss and gradient
- The number of threads is chosen as: OMP_NUM_THREADS (if set by your environment/OpenMP runtime), otherwise omp_get_max_threads() at runtime, or a fallback of 16 when OpenMP is not available.
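The selection order above can be approximated from the Ruby side, for example to log the thread count you expect before running a benchmark. This is only a sketch: `expected_thread_count` is a hypothetical helper that is not part of the gem, and `Etc.nprocessors` is used as a stand-in for `omp_get_max_threads()`, which is an assumption (OpenMP runtimes commonly default to the number of logical processors, but are not required to).

```ruby
require "etc"

# Hypothetical helper, not part of the gem: estimates the thread count
# by following the selection order described in this README.
# Assumption: Etc.nprocessors approximates omp_get_max_threads().
def expected_thread_count(env = ENV, openmp_available: true)
  # OMP_NUM_THREADS may be a comma-separated list; String#to_i reads
  # the leading integer and returns 0 for empty or non-numeric input.
  requested = env.fetch("OMP_NUM_THREADS", "").to_i
  return requested if requested > 0

  openmp_available ? Etc.nprocessors : 16   # 16 is the fallback stated above
end

expected_thread_count({ "OMP_NUM_THREADS" => "4" })   # uses the requested value
expected_thread_count({}, openmp_available: false)    # uses the stated fallback
```

The value actually used is decided inside the C extension, so treat this as an estimate only.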
To force single-threaded execution (for debugging or deterministic benchmarking), set:
export OMP_NUM_THREADS=1
To allow more parallelism (on a machine with many cores), set for example:
export OMP_NUM_THREADS=8
Note: the extension currently runs heavy OpenMP regions under Ruby’s Global VM Lock (GVL), so long-running operations can block other Ruby threads even though native work is parallelised internally. Use separate Ruby processes (e.g., via Process.fork or job systems) for true multi-process parallelism.
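As a rough illustration of the multi-process approach, the sketch below splits work across child processes with Process.fork and collects results through pipes. `heavy_work` and `parallel_map_chunks` are hypothetical names, and `heavy_work` is a pure-Ruby stand-in for tensor operations so that the example runs without the gem. Process.fork is not available on every platform (for example, Windows).

```ruby
# Sketch: each child process has its own GVL, so blocking work in one
# child does not stall the others. Results return through pipes.

# Hypothetical stand-in for a long-running tensor computation.
def heavy_work(chunk)
  chunk.sum { |x| x * x }
end

def parallel_map_chunks(chunks)
  children = chunks.map do |chunk|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = Process.fork do
      reader.close
      writer.write(Marshal.dump(heavy_work(chunk)))
      writer.close
      exit!(0)                          # skip at_exit handlers in the child
    end
    writer.close                        # parent keeps only the read end
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)  # reads until the child closes its end
    reader.close
    Process.wait(pid)
    result
  end
end

results = parallel_map_chunks([[1, 2, 3], [4, 5, 6]])
```

When each child also runs OpenMP kernels, it may be worth lowering OMP_NUM_THREADS per process so that the total thread count does not exceed the available cores.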
- Ruby with development headers (e.g., via rbenv/rvm and ruby-build).
- A C toolchain:
  - macOS: Xcode Command Line Tools (xcode-select --install).
  - Linux: gcc, make, and standard build tools.
- Optional but recommended:
  - OpenMP runtime (libomp on macOS, libgomp/libomp on Linux) for multi-threaded kernels.
  - OpenBLAS for potential BLAS-level optimizations.
On macOS with Homebrew:
brew install libomp openblas
The extconf.rb script will auto-detect these libraries via pkg-config and common Homebrew paths. If they are missing, the gem will still attempt to build, but some optimizations (OpenMP/BLAS) may be disabled.