From 4843d2f918941e50a2892ed3de6ab5ece877c63b Mon Sep 17 00:00:00 2001 From: yiseungmi87 Date: Wed, 26 Nov 2025 23:54:24 +0100 Subject: [PATCH 1/2] Docs: extended quickstart and install/run improvements --- docs/site/quickstart_extended.md | 49 ++++++ docs/site/release_install.md | 122 +++++++++++++++ docs/site/run_extended.md | 253 +++++++++++++++++++++++++++++++ docs/site/source_install.md | 159 +++++++++++++++++++ 4 files changed, 583 insertions(+) create mode 100644 docs/site/quickstart_extended.md create mode 100644 docs/site/release_install.md create mode 100644 docs/site/run_extended.md create mode 100644 docs/site/source_install.md diff --git a/docs/site/quickstart_extended.md b/docs/site/quickstart_extended.md new file mode 100644 index 00000000000..1869c41d69b --- /dev/null +++ b/docs/site/quickstart_extended.md @@ -0,0 +1,49 @@ +# Extended Quickstart Guide + +Welcome to the extended quickstart guide for Apache SystemDS. This quickstart page provides a high-level overview of both installation and points you to the detailed documentation for each path. + +SystemDS can be installed and used in two different ways: + +1. Using a **downloaded release** +2. Using a **source build** + +Each method is demonstrated in: +- Local mode +- Spark mode +- Federated mode (simple example) + +For detailed configuration topics (BLAS, GPU, federated setup, contributing), see the links at the end. + +--- + +# 1. Install from a Release + +If you simply want to *use* SystemDS without modifying the source code, the recommended approach is to install SystemDS from an official Apache release. + +**Full Release Installation Guide:** [SystemDS Install from release](https://apache.github.io/systemds/site/release_install.html) + +# 2. Install from Source + +If you plan to contribute to SystemDS or need to modify its internals, you can build SystemDS from source. + +**Full Source Build Guide:** [SystemDS Install from source](https://apache.github.io/systemds/site/source_install.html) + +# 3. After Installation + +Once either installation path is completed, you can start running scripts: + +- Local Mode - Run SystemDS locally +- Spark Mode - Execute scripts on Spark through `spark-submit` +- Federated Mode - Run operations on remote data using federated workers + +For detailed commands and examples: [Execute SystemDS](https://apache.github.io/systemds/site/run_extended.html) + +# 4. More Configuration + +SystemDS provides advanced configuration options for performance tuning and specialized execution environments. + +- GPU Support — [GPU Guide](https://apache.github.io/systemds/site/gpu) +- BLAS / Native Acceleration — [Native Backend (BLAS) Guide](https://apache.github.io/systemds/site/native-backend) +- Federated Backend Deployment — [Federated Guide](https://apache.github.io/systemds/site/federated-monitoring.html) +- Contributing to SystemDS — [Contributing Guide](https://github.com/apache/systemds/blob/main/CONTRIBUTING.md) + diff --git a/docs/site/release_install.md b/docs/site/release_install.md new file mode 100644 index 00000000000..384cf48d9a1 --- /dev/null +++ b/docs/site/release_install.md @@ -0,0 +1,122 @@ + +# Install SystemDS from a Release + +This guide explains how to install and set up SystemDS using the pre-built release archives. + +--- + +- [1. Download a Release](#1-download-a-release) +- [2. Install on Windows](#2-install-on-windows) +- [3. Install on Ubuntu 22.04](#3-install-on-ubuntu-2204) +- [4. Install on macOS](#4-install-pon-macos) +- [5. 
Verify the Installation](#5-verify-the-installation) + +--- + +# 1. Download a Release + +Download the official release archive from the Apache SystemDS website: + +https://apache.org/dyn/closer.lua/systemds/ + +After downloading the file `systemds-.tar.gz`, place the file in any directory you choose for installation. + +### Java Requirement ### +For compatability with Spark execution and parser components, **Java 17** is strongly recommended for SystemDS. + +Verify Java 17: + +```bash +java -version +``` + +If missing, install a JDK 17 distribution. + +--- + +# 2. Install on Windows + +### 2.1 Extract the Release Archive + +Use Windows built-in extractor. + +### 2.2 Set Evironment Variables + +To run SystemDS from the command line, configure: +- `SYSTEMDS_ROOT`-> the extracted folder +- Add `%SYSTEMDS_ROOT%\bin` to your `PATH` + +Example (PowerShell): + +```bash +setx SYSTEMDS_ROOT "C:\path\to\systemds-" +setx PATH "$env:SYSTEMDS_ROOT\bin;$env:PATH" +``` + +Restart the terminal afterward. + +# 3. Install on Ubuntu 22.04 + +### 3.1 Extract the Release + +```bash +cd /path/to/install +tar -xvf systemds-.tar.gz +cd systemds- +``` + +### 3.2 Add SystemDS to PATH + +```bash +export SYSTEMDS_ROOT=$(pwd) +export PATH="$SYSTEMDS_ROOT/bin:$PATH" +``` + +# 4. Install on macOS + +### 4.1 Extract the Release + +```bash +cd /path/to/install +tar -xvf systemds-.tar.gz +cd systemds- +``` +### 4.2 Add SystemDS to PATH + +```bash +export SYSTEMDS_ROOT=$(pwd) +export PATH="$SYSTEMDS_ROOT/bin:$PATH" +``` + +# Verify the Installation + +### 5.1 Check the CLI + +```bash +systemds -help +``` + +You should see usage information printed to the console. + +### 5.2 Create a Simple Script + +```bash +echo 'print("Hello World!")' > hello.dml +``` + +### 5.3 Run the Script + +```bash +systemds -f hello.dml +``` + +Expected output: + +```bash +Hello World! +``` + +# Next Steps + +For running scripts in Spark mode or experimenting with federated workers, see the Execution Guide: [Execute SystemDS](run_extended.md) + diff --git a/docs/site/run_extended.md b/docs/site/run_extended.md new file mode 100644 index 00000000000..9aa07d3c6ed --- /dev/null +++ b/docs/site/run_extended.md @@ -0,0 +1,253 @@ +# Running SystemDS + +This guide explains how to run SystemDS regardless of whether you installed it from a Release or built it from Source. All execution modes -local, Spark, and federated- are covered in this document. + +--- + +- [1. Prerequisites](#1-prerequisites) +- [2. Set SYSTEMDS_ROOT and PATH](#2-set-systemds_root-and-path) +- [3. Run a Simple Script Locally](#3-run-a-simple-script-locally) +- [4. Run a Script on Spark](#4-run-a-script-on-spark) +- [5. Run a Script in Federated Mode](#5-run-a-script-in-federated-mode) + +--- + +# 1. Prerequisites + +### Java Requirement ### +For compatability with Spark execution and parser components, **Java 17** is strongly recommended for SystemDS. + +Verify Java version: + +```bash +java -version +``` + +### Spark (required only for Spark execution) ### + +- Use Spark 3.x. +- Spark 4.x is not supported due to ANTLR runtime incompatibilities. + +Verify Spark version: + +```bash +spark-submit --version +``` + +--- + +# 2. Set SYSTEMDS_ROOT and PATH + +This step is required for both Release and Source-build installations. 
Run the following in the root directory of your SystemDS installation: + +```bash +export SYSTEMDS_ROOT=$(pwd) +export PATH="$SYSTEMDS_ROOT/bin:$PATH" +``` + +It can be beneficial to enter these into your `~/.profile` or `~/.bashrc` for linux, +(but remember to change `$(pwd` to the full folder path) +or your environment variables in windows to enable reuse between terminals and restarts. + +```bash +echo 'export SYSTEMDS_ROOT='$(pwd) >> ~/.bashrc +echo 'export PATH=$SYSTEMDS_ROOT/bin:$PATH' >> ~/.bashrc +``` +--- +# 3. Run a Simple Script Locally + +This mode does not require Spark. It only needs Java 17. + +### 3.1 Create and Run a Hello World + +```bash +echo 'print("Hello, World!")' > hello.dml +``` + +Run: + +```bash +systemds -f hello.dml +``` + +Expected output: + +```bash +Hello, World! +``` + +### (Optional) MacOS Note: `realpath: illegal option -- -` Error +If you are running MacOS and encounter an error message similar to `realpath: illegal option -- -` when executing `systemds hello.dml`. You may try to replace the system-wide command `realpath` with the homebrew version `grealpath` that comes with the `coreutils`. Alternatively, you may change all occurrences within the script accordingly, i.e., by prepending a `g` to avoid any side effects. + +### 3.2 Run a Real Example + +This example demonstrates local execution of a real script `Univar-stats.dml`. The relevant commands to run this example with SystemDS is described in the DML Language reference guide at [DML Language Reference](dml-language-reference.html). + +Prepare the data (macOS: use `curl`instead of `wget`): +```bash +# download test data +wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data + +# generate a metadata file for the dataset +echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd + +# generate type description for the data +echo '1,1,1,2' > data/types.csv +echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd +``` + +Execute the DML Script: +```bash +systemds -f scripts/algorithms/Univar-Stats.dml -nvargs \ + X=data/haberman.data \ + TYPES=data/types.csv \ + STATS=data/univarOut.mtx \ + CONSOLE_OUTPUT=TRUE +``` + +### (Optional) MacOS Note: `SparkException` Error +If SystemDS tries to initialize Spark and you see `SparkException: A master URL must be set in your configuration`, you can force single-node execution without Spark/Hadoop initialization via: + +```bash +systemds -exec singlenode -f scripts/algorithms/Univar-Stats.dml -nvargs \ + X=data/haberman.data \ + TYPES=data/types.csv \ + STATS=data/univarOut.mtx \ + CONSOLE_OUTPUT=TRUE +``` + +The script computes basic statistics (min, max, variance, skewness, etc) for each column of a dataset. 
Expected output (example): +```bash +------------------------------------------------- +Feature [1]: Scale + (01) Minimum | 30.0 + (02) Maximum | 83.0 + (03) Range | 53.0 + (04) Mean | 52.45751633986928 + (05) Variance | 116.71458266366658 + (06) Std deviation | 10.803452349303281 + (07) Std err of mean | 0.6175922641866753 + (08) Coeff of variation | 0.20594669940735139 + (09) Skewness | 0.1450718616532357 + (10) Kurtosis | -0.6150152487211726 + (11) Std err of skewness | 0.13934809593495995 + (12) Std err of kurtosis | 0.277810485320835 + (13) Median | 52.0 + (14) Interquartile mean | 52.16013071895425 +------------------------------------------------- +Feature [2]: Scale + (01) Minimum | 58.0 + (02) Maximum | 69.0 + (03) Range | 11.0 + (04) Mean | 62.85294117647059 + (05) Variance | 10.558630665380907 + (06) Std deviation | 3.2494046632238507 + (07) Std err of mean | 0.18575610076612029 + (08) Coeff of variation | 0.051698529971741194 + (09) Skewness | 0.07798443581479181 + (10) Kurtosis | -1.1324380182967442 + (11) Std err of skewness | 0.13934809593495995 + (12) Std err of kurtosis | 0.277810485320835 + (13) Median | 63.0 + (14) Interquartile mean | 62.80392156862745 +------------------------------------------------- +Feature [3]: Scale + (01) Minimum | 0.0 + (02) Maximum | 52.0 + (03) Range | 52.0 + (04) Mean | 4.026143790849673 + (05) Variance | 51.691117539912135 + (06) Std deviation | 7.189653506248555 + (07) Std err of mean | 0.41100513466216837 + (08) Coeff of variation | 1.7857418611299172 + (09) Skewness | 2.954633471088322 + (10) Kurtosis | 11.425776549251449 + (11) Std err of skewness | 0.13934809593495995 + (12) Std err of kurtosis | 0.277810485320835 + (13) Median | 1.0 + (14) Interquartile mean | 1.2483660130718954 +------------------------------------------------- +Feature [4]: Categorical (Nominal) + (15) Num of categories | 2 + (16) Mode | 1 + (17) Num of modes | 1 +SystemDS Statistics: +Total execution time: 0,470 sec. +``` + +To check the location of output file created: +```bash +ls -l data/univarOut.mtx +``` +--- +# 4. Run a Script on Spark + +SystemDS can be executed on Spark using the main executable JAR. The location of this JAR differs depending on whether you installed SystemDS from: + +- a **Release archive**, or +- a **Source-build installation** (built with Maven) + +### 4.1 Running with a Release installation + +If you installed SystemDS from a release archive, the main JAR is located at: + +```bash +SystemDS.jar +``` + +Run: + +```bash +spark-submit SystemDS.jar -f hello.dml +``` + +### 4.2 Running with a Source-build installation + +If you cloned the SystemDS repository and built it yourself, you must first run Maven to generate the executable JAR. + +```bash +mvn -P distribution package +``` +This creates several JAR files in `target/`: + +Example output: + +```bash +target/systemds-3.3.0-shaded.jar +target/systemds-3.3.0.jar +target/systemds-3.3.0-unshaded.jar +target/systemds-3.3.0-extra.jar +target/SystemDS.jar <-- main runnable JAR +target/systemds-3.3.0-ropt.jar +target/systemds-3.3.0-javadoc.jar +``` + +Run: + +```bash +spark-submit target/SystemDS.jar -f hello.dml +``` +--- +# 5. Run a Script in Federated Mode + +Federated mode allows SystemDS to execute operations on data located on remote or distributed workers. Federated execution requires: + +1. One or more **federated workers** +2. A **driver program** (DML or Python) that sends operations to those workers. 
+ +Note: The SystemDS documentation provides federated execution examples primarily via the Python API. This Quickstart demonstrates only how to start a federated worker, and refers users to the official Federated Environment guide for complete end-to-end examples. + +### 5.1 Start a federated worker + +Run in a separate terminal: + +```bash +systemds WORKER 8001 +``` + +This starts a worker on port `8001`. + +### 5.2 Next steps and full examples + +For complete, runnable examples of federated execution (including data files, metadata, and Python code), see the official [Federated Environment guide](https://systemds.apache.org/docs/2.1.0/api/python/guide/federated.html) + diff --git a/docs/site/source_install.md b/docs/site/source_install.md new file mode 100644 index 00000000000..437bebb7991 --- /dev/null +++ b/docs/site/source_install.md @@ -0,0 +1,159 @@ +# Install SystemDS from Source + +This guide helps in the install and setup of SystemDS from source code. + +--- + +- [1. Install on Windows](#1-install-on-windows) +- [2. Install on Ubuntu 22.04](#2-install-on-ubuntu-2204) +- [3. Install on macOS](#3-install-on-macos) +- [4. Build the Project](#4-build-the-project) +- [5. Run a Component Test](#5-run-a-component-test) +- [6. Next Steps](#6-next-steps) + +Once the individual versions is set up skip to the common part of building the system. + +--- + +# 1. Install on Windows + +First setup java and maven to compile the system note the java version is 17, we suggest using Java OpenJDK 17. + +- +- + +Setup your environment variables with JAVA_HOME and MAVEN_HOME. Using these variables add the JAVA_HOME/bin and MAVEN_HOME/bin to the path environment variable. An example of setting it for java can be found here: + +To run the system we also have to setup some Hadoop and spark specific libraries. These can be found in the SystemDS repository. To add this, simply take out the files, or add 'src/test/config/hadoop_bin_windows/bin' to PATH. Just like for JAVA_HOME set a HADOOP_HOME to the environment variable without the bin part, and add the `%HADOOP_HOME%\bin` to path. + +Finally if you want to run systemds from command line, add a SYSTEMDS_ROOT that points to the repository root, and add the bin folder to the path. + +To make the build go faster set the IDE or environment variables for java: '-Xmx16g -Xms16g -Xmn1600m'. Here set the memory to something close to max memory of the device you are using. + +To start editing the files remember to import the code style formatting into the IDE, to keep the changes of the files consistent. + +A suggested starting point would be to run some of the component tests from your IDE. + +# 2. Install on Ubuntu 22.04 + +First setup java and maven to compile the system note that the java version is 17. 
+ +```bash +sudo apt install openjdk-17-jdk +sudo apt install maven +``` + +Verify the install with: + +```bash +java -version +mvn -version +``` + +This should return something like: + +```bash +openjdk 17.0.11 2024-04-16 +OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9) +OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing) + +Apache Maven 3.9.9 (8e8579a9e76f7d015ee5ec7bfcdc97d260186937) +Maven home: /home/usr/Programs/maven +Java version: 17.0.11, vendor: Eclipse Adoptium, runtime: /home/usr/Programs/jdk-17.0.11+9 +Default locale: en_US, platform encoding: UTF-8 +OS name: "linux", version: "6.8.0-59-generic", arch: "amd64", family: "unix" +``` + +#### Testing + +R should be installed to run the test suite, since many tests are constructed to compare output with common R packages. +One option to install this is to follow the guide on the following link: + +At the time of writing the commands to install R 4.0.2 are: + +```bash +sudo apt install dirmngr gnupg apt-transport-https ca-certificates software-properties-common +sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 +sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' +sudo apt install r-base +``` + +Optionally, you need to install the R dependencies for integration tests, like this: +(use `sudo` mode if the script couldn't write to local R library) + +```bash +Rscript ./src/test/scripts/installDependencies.R +``` + +# 3. Install on MacOS + +Prerequisite install homebrew on the device. + +```bash +# To allow relative paths: +brew install coreutils +# To install open jdk 17. +brew install openjdk@17 +# Install maven to enable compilation of SystemDS. +brew install maven +``` + +Then afterwards verify the install: + +```bash +java --version +mvn --version +``` + +This should print java version. + +Note that if you have multiple __java__ versions installed then you have to change the used version to 17, on __both java and javadoc__. This is done by setting the environment variable JAVA_HOME to the install path of open JDK 17 : + +```bash +export JAVA_HOME=`/usr/libexec/java_home -v 17` +``` + +For running all tests [r-base](https://cran.r-project.org/bin/macosx/) has to be installed as well since this is used as a secondary system to verify the correctness of our code, but it is not a requirement to enable building the project. + +Optionally, you need to install the R dependencies for integration tests, like this: +(use `sudo` mode if the script couldn't write to local R library) + +```bash +Rscript ./src/test/scripts/installDependencies.R +``` + +# 4. Build the project + +To compile the project use: + +```bash +mvn package -P distribution +``` + +Example output: +```bash +[INFO] ------------------------------------------------------------------------ +[INFO] BUILD SUCCESS +[INFO] ------------------------------------------------------------------------ +[INFO] Total time: 31.730 s +[INFO] Finished at: 2020-06-18T11:00:29+02:00 +[INFO] ------------------------------------------------------------------------ +``` + +The first time you package the system it will take longer since maven will download the dependencies. +But successive compiles should become faster. The runnable JAR files will appear in `target/` + +# 5. Run A Component Test + +As an example here is how to run the component matrix tests from command line via maven. 
+ +```bash +mvn test -Dtest="**.component.matrix.**" +``` + +To run other tests simply specify other packages by modifying the +test argument part of the command. + +# 6. Next Steps + +Now everything is setup and ready to go! For running scripts in Spark mode or experimenting with federated workers, see the Execution Guide: [Execute SystemDS](run_extended.md) From c72c6bd07de8d7e3146e4a7e3131102fde47b819 Mon Sep 17 00:00:00 2001 From: yiseungmi87 Date: Mon, 29 Dec 2025 11:42:07 +0900 Subject: [PATCH 2/2] docs: adress os-specific install issues and further changes --- docs/site/quickstart_extended.md | 2 + docs/site/release_install.md | 112 +++++++++++++++++++++++++++---- docs/site/run_extended.md | 76 ++++++++++++--------- docs/site/source_install.md | 29 +++++--- 4 files changed, 169 insertions(+), 50 deletions(-) diff --git a/docs/site/quickstart_extended.md b/docs/site/quickstart_extended.md index 1869c41d69b..3c7791d06ef 100644 --- a/docs/site/quickstart_extended.md +++ b/docs/site/quickstart_extended.md @@ -7,6 +7,8 @@ SystemDS can be installed and used in two different ways: 1. Using a **downloaded release** 2. Using a **source build** +If you are primarily a user of SystemDS, start with the Release installation. If you plan to contribute or modify internals, follow the Source installation. + Each method is demonstrated in: - Local mode - Spark mode diff --git a/docs/site/release_install.md b/docs/site/release_install.md index 384cf48d9a1..3695531549a 100644 --- a/docs/site/release_install.md +++ b/docs/site/release_install.md @@ -8,8 +8,7 @@ This guide explains how to install and set up SystemDS using the pre-built relea - [1. Download a Release](#1-download-a-release) - [2. Install on Windows](#2-install-on-windows) - [3. Install on Ubuntu 22.04](#3-install-on-ubuntu-2204) -- [4. Install on macOS](#4-install-pon-macos) -- [5. Verify the Installation](#5-verify-the-installation) +- [4. Install on macOS](#4-install-on-macos) --- @@ -17,7 +16,7 @@ This guide explains how to install and set up SystemDS using the pre-built relea Download the official release archive from the Apache SystemDS website: -https://apache.org/dyn/closer.lua/systemds/ +https://systemds.apache.org/download After downloading the file `systemds-.tar.gz`, place the file in any directory you choose for installation. @@ -47,7 +46,6 @@ To run SystemDS from the command line, configure: - Add `%SYSTEMDS_ROOT%\bin` to your `PATH` Example (PowerShell): - ```bash setx SYSTEMDS_ROOT "C:\path\to\systemds-" setx PATH "$env:SYSTEMDS_ROOT\bin;$env:PATH" @@ -55,14 +53,58 @@ setx PATH "$env:SYSTEMDS_ROOT\bin;$env:PATH" Restart the terminal afterward. +### 2.3 Verify the Installation by Checking the CLI + +On Windows, the `systemds`CLI wrapper may not be executable. This is expected because the `bin/systemds`launcher is implemented as a shell script, which Windows cannot execute natively. To verify the installation on Windows, navigate to the bin directory and run the JAR directly. Note that running `systemds -help` without JAR may result in a CommandNotFoundExeption: + +```bash +java -jar systemds-3.3.0.jar -help +``` + +You should see usage information as an output printed to the console. + +### 2.4 Create a Simple Script + +On Windows, especially when using PowerShell, creating text files via shell redirection (e.g., echo...) may result in unexpected encoding or invisible characters. This can lead to parsing errors when executing the script, even though the file appears correct in an editor. 
Therefore, you may try creating the file explicitly using PowerShell: +```bash +Set-Content -Path .\hello.dml -Value 'print("Hello World!")' -Encoding Ascii +``` + +This ensures the script is stored as plain text without additional encoding metadata. +Note: This behavior depends on the shell and environment configuration and may not affect all Windows setups. + +Verify the file contents: +```bash +Get-Content .\hello.dml +``` + +Expected output: +```bash +print("Hello World!") +``` + +### 2.5 Run the Script + +Now run the script: +```bash +java -jar systemds-3.3.0.jar -f .\hello.dml +``` + +Expected output: +```bash +Hello World! +SystemDS Statistics: +Total execution time: 0.012 sec. +``` + # 3. Install on Ubuntu 22.04 ### 3.1 Extract the Release ```bash cd /path/to/install -tar -xvf systemds-.tar.gz -cd systemds- +tar -xvf systemds--bin.tgz +cd systemds--bin ``` ### 3.2 Add SystemDS to PATH @@ -72,14 +114,54 @@ export SYSTEMDS_ROOT=$(pwd) export PATH="$SYSTEMDS_ROOT/bin:$PATH" ``` +(Optional but recommended) To make SystemDS available in new terminals, add the following lines to your shell configuration (e.g., ~/.bashrc or ~/.profile): +```bash +export SYSTEMDS_ROOT=/absolute/path/to/systemds- +export PATH=$SYSTEMDS_ROOT/bin:$PATH +``` + +### 3.3 Verify the Installation by Checking the CLI + +```bash +systemds -help +``` + +You should see usage information printed to the console. + +### 3.4 Create a Simple Script + +```bash +echo 'print("Hello World!")' > hello.dml +``` + +### 3.5 Run the Script + +On some Ubuntu setups (including clean Docker images), running SystemDS directly may fail with `Invalid or corrupt jarfile hello.dml` Error. In this case, explicitly pass the SystemDS JAR shipped with the release. + +Locate the JAR in the release root: +```bash +SYSTEMDS_JAR=$(find "$SYSTEMDS_ROOT" -maxdepth 1 -type f -name "systemds-*.jar" | head -n 1) +echo "Using SystemDS JAR: $SYSTEMDS_JAR" +``` + +Then run: +```bash +systemds "$SYSTEMDS_JAR" -f hello.dml +``` + +Expected output: +```bash +Hello World! +``` + # 4. Install on macOS ### 4.1 Extract the Release ```bash cd /path/to/install -tar -xvf systemds-.tar.gz -cd systemds- +tar -xvf systemds--bin.tgz +cd systemds--bin ``` ### 4.2 Add SystemDS to PATH @@ -88,9 +170,15 @@ export SYSTEMDS_ROOT=$(pwd) export PATH="$SYSTEMDS_ROOT/bin:$PATH" ``` -# Verify the Installation +(Optional but recommended) +To make SystemDS available in new terminals, add the following lines +to your shell configuration (e.g., ~/.bashrc or ~/.profile): +```bash +export SYSTEMDS_ROOT=/absolute/path/to/systemds- +export PATH=$SYSTEMDS_ROOT/bin:$PATH +``` -### 5.1 Check the CLI +### 4.3 Verify the Installation by Checking the CLI ```bash systemds -help @@ -98,13 +186,13 @@ systemds -help You should see usage information printed to the console. -### 5.2 Create a Simple Script +### 4.4 Create a Simple Script ```bash echo 'print("Hello World!")' > hello.dml ``` -### 5.3 Run the Script +### 4.5 Run the Script ```bash systemds -f hello.dml diff --git a/docs/site/run_extended.md b/docs/site/run_extended.md index 9aa07d3c6ed..9d73f700de4 100644 --- a/docs/site/run_extended.md +++ b/docs/site/run_extended.md @@ -14,25 +14,15 @@ This guide explains how to run SystemDS regardless of whether you installed it f # 1. Prerequisites -### Java Requirement ### -For compatability with Spark execution and parser components, **Java 17** is strongly recommended for SystemDS. +This guide assumes that SystemDS has already been installed successfully. 
-Verify Java version: +Please make sure you have followed one of the installation guides: +- [Install SystemDS from a Release](release_install.html) +- [Install SystemDS from Source](source_install.html) -```bash -java -version -``` - -### Spark (required only for Spark execution) ### - -- Use Spark 3.x. -- Spark 4.x is not supported due to ANTLR runtime incompatibilities. - -Verify Spark version: - -```bash -spark-submit --version -``` +In particular, ensure that: +- Java 17 is installed +- Spark 3.x is available if you plan to run SystemDS on Spark --- @@ -45,15 +35,15 @@ export SYSTEMDS_ROOT=$(pwd) export PATH="$SYSTEMDS_ROOT/bin:$PATH" ``` -It can be beneficial to enter these into your `~/.profile` or `~/.bashrc` for linux, -(but remember to change `$(pwd` to the full folder path) -or your environment variables in windows to enable reuse between terminals and restarts. +It can be beneficial to persist these variables in your `~/.profile` or `~/.bashrc`(Linux/macOS) or as environment variables on Windows, so that SystemDS is available across terminal sessions. Make sure to replace the path below with the absolute path to your SystemDS installation. ```bash -echo 'export SYSTEMDS_ROOT='$(pwd) >> ~/.bashrc -echo 'export PATH=$SYSTEMDS_ROOT/bin:$PATH' >> ~/.bashrc +echo 'export SYSTEMDS_ROOT=/absolute/path/to/systemds-' >> ~/.bashrc +echo 'export PATH="$SYSTEMDS_ROOT/bin:$PATH"' >> ~/.bashrc +source ~/.bashrc ``` --- + # 3. Run a Simple Script Locally This mode does not require Spark. It only needs Java 17. @@ -77,9 +67,17 @@ Hello, World! ``` ### (Optional) MacOS Note: `realpath: illegal option -- -` Error -If you are running MacOS and encounter an error message similar to `realpath: illegal option -- -` when executing `systemds hello.dml`. You may try to replace the system-wide command `realpath` with the homebrew version `grealpath` that comes with the `coreutils`. Alternatively, you may change all occurrences within the script accordingly, i.e., by prepending a `g` to avoid any side effects. +If you are running MacOS and encounter an error message similar to `realpath: illegal option -- -` when executing `systemds -f hello.dml`. You may try to replace the system-wide command `realpath` with the homebrew version `grealpath` that comes with the `coreutils`. Alternatively, you may change all occurrences within the script accordingly, i.e., by prepending a `g` to avoid any side effects. + +### (Optional) Ubuntu Note: `Invalid or corrupt jarfile hello.dml` +On some Ubuntu setups (especially clean environments such as Docker images), running `systemds -f hello.dml` may result in an error like `Invalid or corrupt jarfile hello.dml`. If this happens, the SystemDS launcher may not automatically locate the correct JAR. Please refer to the Ubuntu troubleshooting section in the installation guide for a detailed workaround: [Release Installation – Ubuntu Note](release_install.md#optional-ubuntu-note-invalid-or-corrupt-jarfile-hellodml-error) -### 3.2 Run a Real Example +### (Optional) Windows Note: `systemds` Command Not Found +On Windows (e.g., PowerShell), running `systemds -f hello.dml` may fail with an error indicating that `systemds` is not recognized as a command. This is expected, since the `systemds` launcher in `bin/` is implemented as a shell script, +which cannot be executed natively on Windows. In this case, SystemDS should be invoked directly via the runnable JAR using `java -jar`. 
For a detailed Windows-specific walkthrough, please refer to the installation guide: [Release Installation – Windows Notes](release_install.md#2-install-on-windows) + + +### 3.2 Create a Real Example This example demonstrates local execution of a real script `Univar-stats.dml`. The relevant commands to run this example with SystemDS is described in the DML Language reference guide at [DML Language Reference](dml-language-reference.html). @@ -107,7 +105,6 @@ systemds -f scripts/algorithms/Univar-Stats.dml -nvargs \ ### (Optional) MacOS Note: `SparkException` Error If SystemDS tries to initialize Spark and you see `SparkException: A master URL must be set in your configuration`, you can force single-node execution without Spark/Hadoop initialization via: - ```bash systemds -exec singlenode -f scripts/algorithms/Univar-Stats.dml -nvargs \ X=data/haberman.data \ @@ -116,6 +113,23 @@ systemds -exec singlenode -f scripts/algorithms/Univar-Stats.dml -nvargs \ CONSOLE_OUTPUT=TRUE ``` +### (Optional) Ubuntu Note: `NoClassDefFoundError` Error / JAR Resolution Issues +On some Ubuntu setups, executing the example may fail with a class loading error such as `NoClassDefFoundError: org/apache/commons/cli/AlreadySelectedException`. This happens when the SystemDS launcher script does not automatically resolve the correct executable JAR. In this case, explicitly pass the SystemDS JAR located in the release root directory: +```bash +SYSTEMDS_JAR=$(find "$SYSTEMDS_ROOT" -maxdepth 1 -type f -name "systemds-*.jar" | head -n 1) +echo "Using SystemDS JAR: $SYSTEMDS_JAR" +``` +Then run the example again: +```bash +systemds "$SYSTEMDS_JAR" -f scripts/algorithms/Univar-Stats.dml -nvargs \ + X=data/haberman.data \ + TYPES=data/types.csv \ + STATS=data/univarOut.mtx \ + CONSOLE_OUTPUT=TRUE +``` + +### 3.3 Run the Real Example + The script computes basic statistics (min, max, variance, skewness, etc) for each column of a dataset. Expected output (example): ```bash ------------------------------------------------- @@ -179,6 +193,7 @@ To check the location of output file created: ```bash ls -l data/univarOut.mtx ``` + --- # 4. Run a Script on Spark @@ -189,16 +204,16 @@ SystemDS can be executed on Spark using the main executable JAR. The location of ### 4.1 Running with a Release installation -If you installed SystemDS from a release archive, the main JAR is located at: +If you installed SystemDS from a release archive, locate the runnable JAR in the release root directory. It is typically named like `systemds-.jar`. +Example: ```bash -SystemDS.jar +ls -1 "$SYSTEMDS_ROOT"/*.jar ``` Run: - ```bash -spark-submit SystemDS.jar -f hello.dml +spark-submit "$SYSTEMDS_ROOT/systemds-.jar" -f hello.dml ``` ### 4.2 Running with a Source-build installation @@ -227,6 +242,7 @@ Run: ```bash spark-submit target/SystemDS.jar -f hello.dml ``` + --- # 5. Run a Script in Federated Mode @@ -249,5 +265,5 @@ This starts a worker on port `8001`. 
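With the worker running, a driver script can reference it by address. The sketch below is only illustrative (the worker-side file path and the matrix dimensions are placeholders, and the data plus its `.mtd` metadata file must already exist on the worker); complete, runnable examples are given in the guide linked in the next subsection:

```bash
# write a small DML driver that reads a matrix from the worker on port 8001
# (the path after the port and the dimensions are placeholders for illustration)
cat > fed_read.dml << 'EOF'
X = federated(addresses=list("localhost:8001/path/on/worker/data.csv"), ranges=list(list(0, 0), list(306, 4)))
print(sum(X))
EOF

systemds -f fed_read.dml
```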
### 5.2 Next steps and full examples -For complete, runnable examples of federated execution (including data files, metadata, and Python code), see the official [Federated Environment guide](https://systemds.apache.org/docs/2.1.0/api/python/guide/federated.html) +For complete, runnable examples of federated execution (including data files, metadata, and Python code), see the official [Federated Environment guide](https://systemds.apache.org/docs/3.3.0/api/python/guide/federated.html) diff --git a/docs/site/source_install.md b/docs/site/source_install.md index 437bebb7991..68d3149ab6c 100644 --- a/docs/site/source_install.md +++ b/docs/site/source_install.md @@ -26,6 +26,13 @@ Setup your environment variables with JAVA_HOME and MAVEN_HOME. Using these vari To run the system we also have to setup some Hadoop and spark specific libraries. These can be found in the SystemDS repository. To add this, simply take out the files, or add 'src/test/config/hadoop_bin_windows/bin' to PATH. Just like for JAVA_HOME set a HADOOP_HOME to the environment variable without the bin part, and add the `%HADOOP_HOME%\bin` to path. +On windows, cloning large repositories via GitHub Desktop may stall in some environments. If this happens, cloning via the Git command line is a reliable alternative. +Example: +```bash +git clone https://github.com/apache/systemds.git +cd systemds +``` + Finally if you want to run systemds from command line, add a SYSTEMDS_ROOT that points to the repository root, and add the bin folder to the path. To make the build go faster set the IDE or environment variables for java: '-Xmx16g -Xms16g -Xmn1600m'. Here set the memory to something close to max memory of the device you are using. @@ -44,14 +51,12 @@ sudo apt install maven ``` Verify the install with: - ```bash java -version mvn -version ``` This should return something like: - ```bash openjdk 17.0.11 2024-04-16 OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9) @@ -124,8 +129,7 @@ Rscript ./src/test/scripts/installDependencies.R # 4. Build the project -To compile the project use: - +To compile the project use in the directory of the source code: ```bash mvn package -P distribution ``` @@ -140,8 +144,18 @@ Example output: [INFO] ------------------------------------------------------------------------ ``` -The first time you package the system it will take longer since maven will download the dependencies. -But successive compiles should become faster. The runnable JAR files will appear in `target/` +The first time you package the system it will take longer since maven will download the dependencies. But successive compiles should become faster. The runnable JAR files will appear in `target/`. + +### (Optional) Add SystemDS CLI to PATH + +After building SystemDS from source, you can add the `bin` directory to your +`PATH` in order to run `systemds` directly from the command line: + +```bash +export SYSTEMDS_ROOT=$(pwd) +export PATH="$SYSTEMDS_ROOT/bin:$PATH" +``` +This allows you to run `systemds` from the repository root. For running the freshly built executable JAR (e.g., `target/SystemDS.jar`) on Spark, see the Spark section in [Execute SystemDS](run_extended.md). # 5. Run A Component Test @@ -151,8 +165,7 @@ As an example here is how to run the component matrix tests from command line vi mvn test -Dtest="**.component.matrix.**" ``` -To run other tests simply specify other packages by modifying the -test argument part of the command. 
+To run other tests simply specify other packages by modifying the test argument part of the command. # 6. Next Steps
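
As a quick, optional sanity check of the build (assuming `SYSTEMDS_ROOT` and `PATH` were set as described in the build section above), a minimal local run from the repository root could look like this:

```bash
# minimal smoke test of a source build, run from the repository root
echo 'print("Hello from a source build!")' > hello.dml
systemds -f hello.dml
```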