This repository is a Python reimplementation of the paper "Dual-sPLS: a family of Dual Sparse Partial Least Squares regressions for feature selection and prediction with tunable sparsity...", whose original code is written in R. It can be used both as a learning tool to understand the theory behind the algorithm and as a standalone installable library.
To start using the repository with the ability to modify it locally, you can clone it:
git clone https://github.com/malerbe/Dual-sPLS.git
and then install it using pip:
cd ./Dual-sPLS
pip install -e .
To install and use the library without cloning it locally, install it from PyPI:
pip install dual-spls
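After either installation route, one way to confirm that pip registered the package is to query its installed version. The sketch below uses only the Python standard library. The distribution name dual-spls is taken from the pip command above and is assumed to be the same for the editable install; the helper name installed_version is illustrative and not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "dual-spls" is the name used in the pip command; adjust it if your
# environment registers the package under a different name.
found = installed_version("dual-spls")
if found is None:
    print("dual-spls is not installed in this environment")
else:
    print(f"dual-spls {found} is installed")
```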
The notebook notebooks/predict_simulated.ipynb serves as working documentation for the features implemented in the library. The docstrings and comments in the code explain what each argument corresponds to.
The library also allows the user to generate synthetic data as presented in the paper. To see how to use the generation function, see: notebooks/simulate.ipynb
If your goal is to fully grasp the mechanics behind the algorithms, it is recommended to follow the explanation notebooks in this specific order:
- Fundamentals: docs/PLS.ipynb
- Introducing Sparsity: docs/sPLS.ipynb
- The Dual Approach: docs/Dual_sPLS.ipynb
After that, the first production implementation, src/dual_spls/lasso.py, should be easy to follow, since it only uses code already explained and implemented in the last notebook, docs/Dual_sPLS.ipynb.
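For readers who want a feel for what a lasso-type penalty does before opening the notebooks, here is a minimal sketch of soft-thresholding, the standard operator by which an L1 penalty sets small coefficients exactly to zero. This is general background and not code taken from this repository: the function name soft_threshold, the example weights, and the threshold value are all illustrative assumptions.

```python
def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`.

    Values whose magnitude does not exceed `threshold` become exactly 0,
    which is how an L1 (lasso) penalty produces sparse coefficients.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    shrunk = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)  # too small: removed from the model
        else:
            shrunk.append(magnitude if v > 0 else -magnitude)  # keep the sign
    return shrunk


# Illustrative weights only; a larger threshold zeroes out more of them.
weights = [0.9, -0.05, 0.3, -0.6, 0.02]
sparse_weights = soft_threshold(weights, 0.1)
n_zeros = sum(1 for w in sparse_weights if w == 0.0)
print(sparse_weights, n_zeros)
```

Raising the threshold increases the number of zeroed entries, which is the sense in which the sparsity is tunable in this toy setting.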