diff --git a/ci/vale/dictionary.txt b/ci/vale/dictionary.txt
index 10ee9a02a4b..fe990f0160f 100644
--- a/ci/vale/dictionary.txt
+++ b/ci/vale/dictionary.txt
@@ -1909,6 +1909,7 @@ pg_dump
pg_dumpall
pg_restore
pgAdmin
+PGvector
pgpass
Phalcon
pharmer
diff --git a/docs/contributors/sander-rodenhuis/_index.md b/docs/contributors/sander-rodenhuis/_index.md
new file mode 100644
index 00000000000..07760344d73
--- /dev/null
+++ b/docs/contributors/sander-rodenhuis/_index.md
@@ -0,0 +1,8 @@
+---
+title: "Sander Rodenhuis"
+link: "https://www.linkedin.com/in/srodenhuis/"
+email: "srodenhu@akamai.com"
+description: "The Linode documentation library's profile page and submission listing for Sander Rodenhuis"
+---
+
+Sander Rodenhuis is a Principal Architect at Akamai and one of the creators of App Platform for LKE (Linode Kubernetes Engine).
\ No newline at end of file
diff --git a/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Enable-Knative.jpg b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Enable-Knative.jpg
index f3d14999647..086a6861cdd 100644
Binary files a/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Enable-Knative.jpg and b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Enable-Knative.jpg differ
diff --git a/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Llama3.jpg b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Llama3.jpg
index 519b16e087c..8ae01f01273 100644
Binary files a/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Llama3.jpg and b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/APL-LLM-Llama3.jpg differ
diff --git a/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/Diagram_APL_LLM_guide.jpg b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/Diagram_APL_LLM_guide.jpg
new file mode 100644
index 00000000000..fe26b0b65d4
Binary files /dev/null and b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/Diagram_APL_LLM_guide.jpg differ
diff --git a/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/index.md b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/index.md
index c6aa25b5ee8..2ccb4e9caed 100644
--- a/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/index.md
+++ b/docs/guides/kubernetes/deploy-llm-for-ai-inferencing-on-apl/index.md
@@ -1,41 +1,41 @@
---
slug: deploy-llm-for-ai-inferencing-on-apl
-title: "Deploy an LLM for AI Inferencing with App Platform for LKE"
-description: "This guide includes steps and guidance for deploying a large language model for AI inferencing using App Platform for Linode Kubernetes Engine."
-authors: ["Akamai"]
-contributors: ["Akamai"]
+title: "AI Inference with App Platform for LKE"
+description: "This guide provides steps and guidance for deploying LLMs (large language models) for AI inference using App Platform for LKE."
+authors: ["Sander Rodenhuis"] +contributors: ["Sander Rodenhuis"] published: 2025-03-25 -modified: 2025-06-26 -keywords: ['ai','ai inference','ai inferencing','llm','large language model','app platform','lke','linode kubernetes engine','llama 3','kserve','istio','knative'] +modified: 2025-12-09 +keywords: ['ai','ai inference','llm','large language model','app platform','lke','linode kubernetes engine','llama 3','kserve','istio','knative'] license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)' external_resources: - '[Akamai App Platform for LKE](https://techdocs.akamai.com/cloud-computing/docs/application-platform)' - '[Akamai App Platform Documentation](https://techdocs.akamai.com/app-platform/docs/welcome)' --- -LLMs (large language models) are deep-learning models that are pre-trained on vast amounts of information. AI inferencing is the method by which an AI model (such as an LLM) is trained to "infer", and subsequently deliver accurate information. The LLM used in this deployment, Meta AI's [Llama 3](https://www.llama.com/docs/overview/), is an open-source, pre-trained LLM often used for tasks like responding to questions in multiple languages, coding, and advanced reasoning. +LLMs (large language models) generate human-like text and perform language-based tasks by being trained on massive datasets. AI inference is the process to make predictions or decisions on new data outside of the training datasets. The LLM used in this deployment, Meta AI's [Llama 3](https://www.llama.com/docs/overview/), is an open-source, pre-trained LLM often used for tasks like responding to questions in multiple languages, coding, and advanced reasoning. -[KServe](https://kserve.github.io/website/latest/) is a standard Model Inference Platform for Kubernetes, built for highly-scalable use cases. KServe comes with multiple Model Serving Runtimes, including the [Hugging Face](https://huggingface.co/welcome) serving runtime. The Hugging Face runtime supports the following machine learning (ML) tasks: text generation, Text2Text generation, token classification, sequence and text classification, and fill mask. +[KServe](https://kserve.github.io/website/latest/) is a Model Inference Framework for Kubernetes, built for highly-scalable use cases. KServe comes with multiple Model Serving Runtimes, including the [Hugging Face](https://huggingface.co/welcome) serving runtime. The Hugging Face runtime supports the following machine learning (ML) tasks: text generation, Text2Text generation, token classification, sequence and text classification, and fill mask. KServe is integrated in App Platform for LKE, Akamai's pre-built Kubernetes developer platform. -Akamai App Platform for LKE comes with a set of preconfigured and integrated open source Kubernetes applications like [Istio](https://istio.io/latest/docs/overview/what-is-istio/) and [Knative](https://knative.dev/docs/concepts/), both of which are prerequisites for using KServe. App Platform automates the provisioning process of these applications. +App Platform also integrates [Istio](https://istio.io/latest/docs/overview/what-is-istio/) and [Knative](https://knative.dev/docs/concepts/), both of which are prerequisites for using KServe. App Platform automates the provisioning process of these applications. -This guide describes the steps required to: install KServe with Akamai App Platform for LKE, deploy Meta AI's Llama 3 model using the Hugging Face service runtime, and deploy a chatbot using Open WebUI. 
Once functional, use our [Deploy a RAG Pipeline and Chatbot with App Platform for LKE](/docs/guides/deploy-rag-pipeline-and-chatbot-on-apl/) guide to configure an additional LLM trained on a custom data set. +This guide describes the steps required to: install KServe with App Platform, deploy the Meta Llama 3.1 8B model using the Hugging Face runtime server, and deploy a chatbot interface using Open WebUI. Once functional, use our [Deploy a RAG Pipeline and Chatbot with App Platform for LKE](/docs/guides/deploy-rag-pipeline-and-chatbot-on-apl/) guide to add and run a RAG pipeline and deploy an AI Agent that exposes an OpenAI compatible API. -If you prefer to manually install an LLM and RAG Pipeline on LKE rather than using Akamai App Platform, see our [Deploy a Chatbot and RAG Pipeline for AI Inferencing on LKE](/docs/guides/ai-chatbot-and-rag-pipeline-for-inference-on-lke/) guide. +If you prefer to manually install an LLM and RAG Pipeline on LKE rather than using Akamai App Platform, see our [Deploy a Chatbot and RAG Pipeline for AI Inference on LKE](/docs/guides/ai-chatbot-and-rag-pipeline-for-inference-on-lke/) guide. ## Diagram -![Diagram Test](Diagram_APL_LLM_guide.svg) +![Diagram Test](Diagram_APL_LLM_guide.jpg) ## Components ### Infrastructure -- **Linode GPUs (NVIDIA RTX 4000)**: Akamai has several GPU virtual machines available, including NVIDIA RTX 4000 (used in this tutorial) and Quadro RTX 6000. NVIDIA’s Ada Lovelace architecture in the RTX 4000 VMs are adept at many AI tasks, including [inferencing](https://www.nvidia.com/en-us/solutions/ai/inference/) and [image generation](https://blogs.nvidia.com/blog/ai-decoded-flux-one/). +- **Linode GPUs (NVIDIA RTX 4000)**: Akamai has several GPU virtual machines available, including NVIDIA RTX 4000 (used in this tutorial) and Quadro RTX 6000. NVIDIA’s Ada Lovelace architecture in the RTX 4000 VMs are adept at many AI tasks, including [inference](https://www.nvidia.com/en-us/solutions/ai/inference/) and [image generation](https://blogs.nvidia.com/blog/ai-decoded-flux-one/). - **Linode Kubernetes Engine (LKE)**: LKE is Akamai’s managed Kubernetes service, enabling you to deploy containerized applications without needing to build out and maintain your own Kubernetes cluster. -- **App Platform for LKE**: A Kubernetes-based platform that combines developer and operations-centric tools, automation, self-service, and management of containerized application workloads. App Platform for LKE streamlines the application lifecycle from development to delivery and connects numerous CNCF (Cloud Native Computing Foundation) technologies in a single environment, allowing you to construct a bespoke Kubernetes architecture. +- **App Platform for LKE**: A Kubernetes-based platform that combines developer and operations-centric tools, automation, self-service, and management of containerized application workloads. App Platform streamlines the application lifecycle from development to delivery and connects numerous CNCF (Cloud Native Computing Foundation) technologies in a single environment, allowing you to construct a bespoke Kubernetes architecture. ### Software @@ -43,7 +43,7 @@ If you prefer to manually install an LLM and RAG Pipeline on LKE rather than usi - **Hugging Face**: A data science platform and open-source library of data sets and pre-trained AI models. A Hugging Face account and access key is required to access the Llama 3 large language model (LLM) used in this deployment. 
-- **Meta AI's Llama 3**: The [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model is used as the LLM in this guide. You must review and agree to the licensing agreement before deploying.
+- **meta-llama/Llama-3.1-8B-Instruct LLM**: The [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model is used as the foundational LLM in this guide. You must review and agree to the licensing agreement before deploying.

- **KServe**: Serves machine learning models. This tutorial installs the Llama 3 LLM to KServe, which then serves it to other applications, such as the chatbot UI.

@@ -51,8 +51,6 @@ If you prefer to manually install an LLM and RAG Pipeline on LKE rather than usi

- **Knative**: Used for deploying and managing serverless workloads on the Kubernetes platform.

-- **Kyverno**: A comprehensive toolset used for managing the Policy-as-Code (PaC) lifecycle for Kubernetes.
-
## Prerequisites

- A [Cloud Manager](https://cloud.linode.com/) account is required to use Akamai's cloud computing services, including LKE.

@@ -67,28 +65,32 @@ If you prefer to manually install an LLM and RAG Pipeline on LKE rather than usi

We recommend provisioning an LKE cluster with [App Platform](https://techdocs.akamai.com/cloud-computing/docs/application-platform) enabled and the following minimum requirements:

-- 3 **8GB Dedicated CPUs** with [autoscaling](https://techdocs.akamai.com/cloud-computing/docs/manage-nodes-and-node-pools#autoscale-automatically-resize-node-pools) turned on
-- A second node pool consisting of at least 2 **RTX4000 Ada x1 Medium [GPU](https://techdocs.akamai.com/cloud-computing/docs/gpu-compute-instances)** plans
+- 3 **8GB Dedicated CPUs** with [autoscaling](https://techdocs.akamai.com/cloud-computing/docs/manage-nodes-and-node-pools#autoscale-automatically-resize-node-pools) turned on.
+- A second node pool consisting of at least 2 **RTX4000 Ada x1 Medium [GPU](https://techdocs.akamai.com/cloud-computing/docs/gpu-compute-instances)** plans.

-Once your LKE cluster is provisioned and the App Platform web UI is available, complete the following steps to continue setting up your infrastructure.
+Once your LKE cluster is provisioned and the App Platform portal is available, complete the following steps to continue setting up your infrastructure.

Sign into the App Platform web UI using the `platform-admin` account, or another account that uses the `platform-admin` role. Instructions for signing into App Platform for the first time can be found in our [Getting Started with Akamai App Platform](https://techdocs.akamai.com/cloud-computing/docs/getting-started-with-akamai-application-platform) guide.

-### Enable Knative
+### Enable Knative and KServe

1. Select **view** > **platform** in the top bar.

1. Select **Apps** in the left menu.

-1. Enable the **Knative** and **Kyverno** apps by hovering over each app icon and clicking the **power on** button. It may take a few minutes for the apps to enable.
+1. Enable the **Knative** and **KServe** apps by hovering over each app icon and clicking the **power on** button. It may take a few minutes for the apps to enable. Enabled apps move up and appear in color towards the top of the available app list.
-    ![Enable Knative and Kyverno](APL-LLM-Enable-Knative.jpg)
+    ![Enable Knative and KServe](APL-LLM-Enable-Knative.jpg)
+
+### Create Teams
+
+[Teams](https://techdocs.akamai.com/app-platform/docs/platform-teams) are isolated tenants on the platform to support development and DevOps teams, projects, or even DTAP. A team gets access to the App Platform portal, including access to self-service features and all shared apps available on the platform.

-### Create a New Team
+For this guide, you need to create two teams: one team that offers access to LLMs as a shared service, and one team that consumes the LLMs.

-[Teams](https://techdocs.akamai.com/app-platform/docs/platform-teams) are isolated tenants on the platform to support Development/DevOps teams, projects or even DTAP. A Team gets access to the Console, including access to self-service features and all shared apps available on the platform.
+First, create a team to run the LLMs:

1. Select **view** > **platform**.

@@ -96,7 +98,25 @@ Sign into the App Platform web UI using the `platform-admin` account, or another

1. Click **Create Team**.

-1. Provide a **Name** for the Team. Keep all other default values, and click **Create Team**. This guide uses the Team name `demo`.
+1. Provide a **Name** for the team. This guide uses the team name `models`.
+
+1. Under **Resource Quota**, change the **Compute Resource Quota** to 50 cores and 64 Gi of memory.
+
+1. Under **Network Policies**, disable **Egress Control** and **Ingress Control**.
+
+    See Appendix 1 and Appendix 2 to learn how to proceed when compliance requirements mean **Ingress Control** and **Egress Control** must remain enabled.
+
+1. Click **Create Team**.
+
+Now, create a team to run the apps that consume the LLMs:
+
+1. Click **Create Team**.
+
+1. Provide a **Name** for the team. This guide uses the team name `demo`.
+
+1. Under **Network Policies**, disable **Egress Control** and **Ingress Control**.
+
+1. Click **Create Team**.

### Install the NVIDIA GPU Operator

@@ -116,136 +136,61 @@ The [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-op
    helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.9.1
    ```

-### Add the kserve-crd Helm Chart to the Catalog
-
-[Helm charts](https://helm.sh/) provide information for defining, installing, and managing resources on a Kubernetes cluster. Custom Helm charts can be added to App Platform Catalog using the **Add Helm Chart** feature.
+### Add the open-webui Helm Chart to the catalog

1. Click on **Catalog** in the left menu.

1. Select **Add Helm Chart**.

-    ![Add Helm Chart](APL-LLM-Add-Helm-Chart.jpg)
-
-1. Under **Git Repository URL**, add the URL to the `kserve-crd` Helm chart:
+1. Under **Git Repository URL**, add the URL to the `open-webui` Helm chart:

    ```command
-    https://github.com/kserve/kserve/blob/v0.14.1/charts/kserve-crd/Chart.yaml
+    https://github.com/open-webui/helm-charts/blob/open-webui-5.20.0/charts/open-webui/Chart.yaml
    ```

-1. Click **Get Details** to populate the `kserve-crd` Helm chart details.
-
-    {{< note title="Optional: Add a Catalogue Icon" >}}
-    Use an image URL in the **Icon URL** field to optionally add an icon to your custom Helm chart in the Catalog.
-    {{< /note >}}
+1. Click **Get Details** to populate the `open-webui` Helm chart details.

-1. Deselect **Allow teams to use this chart**.
+1. Leave the **Allow teams to use this chart** option selected.

1. Click **Add Chart**.
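Before continuing, you can optionally confirm that the NVIDIA GPU Operator installed in the earlier step is running and that the GPU node pool exposes GPUs to Kubernetes. The following commands are a minimal verification sketch; they assume you have `kubectl` (and `grep`) access to the cluster, for example from a local kubeconfig or the App Platform shell, and that the operator was installed into the `gpu-operator` namespace as shown above:

```command
kubectl get pods -n gpu-operator
kubectl describe nodes | grep -i "nvidia.com/gpu"
```

The operator pods should reach a `Running` or `Completed` state, and each GPU node should report a non-zero `nvidia.com/gpu` value under its capacity and allocatable resources.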
-### Create a Workload for the kserve-crd Helm Chart - -A [Workload](https://techdocs.akamai.com/app-platform/docs/team-workloads) is a self-service feature for creating Kubernetes resources using Helm charts from the Catalog. - -1. Select **view** > **team** and **team** > **admin** in the top bar. - -1. Select **Workloads**. - -1. Click on **Create Workload**. - -1. Select the _Kserve-Crd_ Helm chart from the Catalog. - -1. Click on **Values**. - -1. Provide a name for the Workload. This guide uses the Workload name `kserve-crd`. - -1. Add `kserve` as the namespace. - -1. Select **Create a new namespace**. - -1. Continue with the rest of the default values, and click **Submit**. - -After the Workload is submitted, App Platform creates an Argo CD application to install the `kserve-crd` Helm chart. Wait for the **Status** of the Workload to become ready, and click on the ArgoCD **Application** link. You should be brought to the Argo CD screen in a separate window: - -![Argo CD](APL-LLM-ArgoCDScreen.jpg) - -Confirm the **App Health** is marked "Healthy", and return to the App Platform UI. - -### Add the kserve-resources Helm Chart to the Catalog +### Add the hf-meta-llama-3-1-8b-instruct Helm Chart to the catalog 1. Click on **Catalog** in the left menu. 1. Select **Add Helm Chart**. -1. Under **Git Repository URL**, add the URL to the `kserve-resources` Helm chart: +1. Under **Git Repository URL**, add the URL to the `hf-meta-llama-3-1-8b-instruct` Helm chart: ```command - https://github.com/kserve/kserve/blob/v0.14.1/charts/kserve-resources/Chart.yaml + https://github.com/linode/apl-examples/blob/main/inference/kserve/hf-meta-llama-3-1-8b-instruct/Chart.yaml ``` -1. Click **Get Details** to populate the `kserve-resources` Helm chart details. - -1. Note the name of the Helm chart populates as `Kserve` rather than `Kserve-Resources`. Edit **Target Directory Name** to read `Kserve-Resources` so that it can be identified later. +1. Click **Get Details** to populate the Helm chart details. -1. Deselect **Allow teams to use this chart**. +1. Uncheck the **Allow teams to use this chart** option. In the next step, configure the RBAC of the catalog to make this Helm chart available to the `models` team. 1. Click **Add Chart**. -### Create a Workload for the kserve-resources Helm Chart - -1. Select **view** > **team** and **team** > **admin** in the top bar. - -1. Select **Workloads**. - -1. Click on **Create Workload**. - -1. Select the _Kserve-Resources_ Helm chart from the Catalog. +Now, configure the RBAC of the catalog: -1. Click on **Values**. - -1. Provide a name for the Workload. This guide uses the Workload name `kserve-resources`. - -1. Add `kserve` as the namespace. +1. Select **view** > **platform**. -1. Select **Create a new namespace**. +1. Select **App** in the left menu. -1. Continue with the default values, and click **Submit**. The Workload may take a few minutes to become ready. +1. Click on the `Gitea` app. -### Add the open-webui Helm Chart to the Catalog +1. In the list of repositories, click on `otomi/charts`. -1. Click on **Catalog** in the left menu. +1. At the bottom, click on the file `rbac.yaml`. -1. Select **Add Helm Chart**. +1. Change the RBAC for the `hf-meta-llama-3.1-8b-instruct` Helm chart as shown below: -1. Under **Git Repository URL**, add the URL to the `open-webui` Helm chart: - - ```command - https://github.com/open-webui/helm-charts/blob/open-webui-5.20.0/charts/open-webui/Chart.yaml ``` - -1. 
Click **Get Details** to populate the `open-webui` Helm chart details. - -1. Leave the **Allow teams to use this chart** option selected. - -1. Click **Add Chart**. - -### Add the inferencing-service Helm Chart to the Catalog - -1. Click on **Catalog** in the left menu. - -1. Select **Add Helm Chart**. - -1. Under **Git Repository URL**, add the URL to the `inferencing-service` Helm chart: - - ```command - https://github.com/linode/apl-examples/blob/main/inferencing-service/Chart.yaml + hf-meta-llama-3.1-8b-instruct: + - team-models ``` -1. Click **Get Details** to populate the `inferencing-service` Helm chart details. - -1. Leave the **Allow teams to use this chart** option selected. - -1. Click **Add Chart**. - ### Create a Hugging Face Access Token 1. Navigate to the Hugging Face [Access Tokens page](https://huggingface.co/settings/tokens). @@ -256,7 +201,7 @@ Confirm the **App Health** is marked "Healthy", and return to the App Platform U 1. Enter a name for your token, and click **Create token**. -1. Save your Access Token information. +1. Save your access token information. See the Hugging Face user documentation on [User access tokens](https://huggingface.co/docs/hub/en/security-tokens) for additional information. @@ -264,13 +209,13 @@ See the Hugging Face user documentation on [User access tokens](https://huggingf If you haven't done it already, request access to the Llama 3 LLM model. To do this, go to Hugging Face's [Llama 3-8B Instruct LLM link](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), read and agree the license agreement, and submit your information. You must wait for access to be granted in order to proceed. -## Deploy and Expose the Model +## Deploy the Llama Model ### Create a Sealed Secret -[Sealed Secrets](https://techdocs.akamai.com/app-platform/docs/team-secrets) are encrypted Kubernetes Secrets stored in the Values Git repository. When a Sealed Secret is created in the Console, the Kubernetes Secret will appear in the Team's namespace. +[Sealed Secrets](https://techdocs.akamai.com/app-platform/docs/team-secrets) are encrypted Kubernetes secrets stored in the Values repository. When a sealed secret is created in the console, the Kubernetes secret appears in the team's namespace. -1. Select **view** > **team** and **team** > **demo** in the top bar. +1. Select **view** > **team** and **team** > **models** in the top bar. 1. Select **Sealed Secrets** from the menu. @@ -282,123 +227,137 @@ If you haven't done it already, request access to the Llama 3 LLM model. To do t 1. Add **Key**: `HF_TOKEN`. -1. Add your Hugging Face Access Token in the **Value** field: {{< placeholder "HUGGING_FACE_TOKEN" >}} +1. Add your Hugging Face access token in the **Value** field: {{< placeholder "HUGGING_FACE_TOKEN" >}} -1. Click **Submit**. The Sealed Secret may take a few minutes to become ready. +1. Click **Submit**. The sealed secret may take a few minutes to become ready. ### Create a Workload to Deploy the Model -1. Select **view** > **team** and **team** > **demo** in the top bar. +1. Select **view** > **team** and **team** > **models** in the top bar. 1. Select **Catalog** from the menu. -1. Select the _Kserve-Ai-Inferencing-Service_ chart. +1. Select the _hf-meta-llama-3-1-8b-instruct_ chart. 1. Click on **Values**. -1. Provide a name for the Workload. This guide uses the Workload name `llama3-model`. +1. Provide a name for the workload. This guide uses the workload name `llama-3-1-8b`. -1. 
Set the following values to disable sidecar injection, define your Hugging Face token, and specify resource limits: - - ``` - labels: - sidecar.istio.io/inject: "{{< placeholder "false" >}}" - env: - - name: {{< placeholder "HF_TOKEN" >}} - valueFrom: - secretKeyRef: - name: {{< placeholder "hf-secret" >}} - key: {{< placeholder "HF_TOKEN" >}} - optional: "{{< placeholder "false" >}}" - args: - - --model_name=llama3 - - --model_id=meta-llama/meta-llama-3-8b-instruct - resources: - limits: - cpu: "{{< placeholder "12" >}}" - memory: {{< placeholder "24Gi" >}} - nvidia.com/gpu: "{{< placeholder "1" >}}" - requests: - cpu: "{{< placeholder "6" >}}" - memory: {{< placeholder "12Gi" >}} - nvidia.com/gpu: "{{< placeholder "1" >}}" - ``` - -1. Click **Submit**. +1. Use the default values and click **Submit**. #### Check the Status of Your Workload -1. It may take a few minutes for the _Kserve-Ai-Inferencing-Service_ Workload to become ready. To check the status of the Workload build, open a shell session by selecting **Shell** in the left menu, and use the following command to check the status of the pods with `kubectl`: +1. It may take a few minutes for the _llama-3-1-8b_ workload to become ready. To check the status of the workload build, open a shell session by selecting **Shell** in the left menu, and use the following command to check the status of the pods with `kubectl`: ```command - kubectl get pods + kubectl get pods -n team-models ``` ```output NAME READY STATUS RESTARTS AGE - llama3-model-predictor-00001-deployment-86f5fc5d5d-7299c 0/2 Pending 0 4m22s - tekton-dashboard-5f57787b8c-gswc2 2/2 Running 0 19h + llama-3-1-8b-predictor-00001-deployment-68d58ccfb4-jg6rw 0/3 Pending 0 22s + tekton-dashboard-5f57787b8c-gswc2 2/2 Running 0 1h ``` -1. To gather more information about a pod in a `Pending` state, run the `kubectl describe pod` command below, replacing {{< placeholder "POD_NAME" >}} with the name of your pod. In the output above, `llama3-model-predictor-00001-deployment-86f5fc5d5d-7299c` is the name of the pending pod: - - ```command - kubectl describe pod {{< placeholder "POD_NAME" >}} - ``` +Wait for the workload to be ready before proceeding. - Scroll to the bottom of the output and look for `Events`. If there is an event with Reason `FailedScheduling`, the `resources.request` values in your _Kserve-Ai-Inferencing-Service_ Workload may need to be adjusted. +## Deploy and Expose the AI Interface - ```output - Events: - Type Reason Age From Message - ---- ------ ---- ---- ------- - Warning FailedScheduling 12s default-scheduler 0/3 nodes are available: 3 Insufficient cpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod. - ``` +### Create a Workload to Deploy the AI Interface - Based on the output above, the `Insufficient cpu` warning denotes the CPU `resources.request` is set too high. +1. Select **view** > **team** and **team** > **demo** in the top bar. -1. If this is the case, edit the `resources.request` values for your _Kserve-Ai-Inferencing-Service_ Workload: +1. Select **Catalog** from the menu. - 1. Navigate to **Workloads**. +1. Select the _open-webui_ chart. - 1. Select your `llama3-model` Workload. +1. Click on **Values**. - 1. Click the **Values** tab. +1. Provide a name for the workload. This guide uses the workload name `llama3-ui`. - 1. Adjust the necessary `resources.request` value. In the example above, the number of CPUs should be lowered. +1. 
Add the following values and change the `nameOverride` value to the name of your workload, `llama3-ui`: - 1. Click **Submit** when you have finished adjusting your resources values. + ```file + # Change the nameOverride to match the name of the Workload + nameOverride: {{< placeholder "llama3-ui" >}} + ollama: + enabled: "false" + pipelines: + enabled: "false" + replicaCount: "1" + persistence: + enabled: "false" + openaiBaseApiUrl: http://llama-3-1-8b.team-models.svc.cluster.local/openai/v1 + extraEnvVars: + - name: "WEBUI_AUTH" + value: "false" + ``` -Wait for the Workload to be ready again, and proceed to the following steps for [exposing the model](#expose-the-model). +1. Click **Submit**. -### Expose the Model +### Expose the AI Interface 1. Select **Services** from the menu. 1. Click **Create Service**. -1. In the **Service Name** dropdown list, select the `llama3-model-predictor` service. +1. In the **Service Name** dropdown menu, select the `llama3-ui` service. 1. Click **Create Service**. -Once the Service is ready, copy the URL for the `llama3-model-predictor` service, and add it to your clipboard. +## Access the Open Web User Interface -## Deploy and Expose the AI Interface +Once the AI user interface is ready, you should be able to access the web UI for the Open WebUI chatbot. + +1. Click on **Services** in the menu. + +1. In the list of available services, click on the URL for the `llama3-ui` service. This should bring you to the chatbot user interface. + + ![Llama 3 LLM](APL-LLM-Llama3.jpg) + +## Next Steps + +See our [Deploy a RAG Pipeline and Chatbot with App Platform for LKE](/docs/guides/deploy-rag-pipeline-and-chatbot-on-apl) guide to expand on the architecture built in this guide. This tutorial deploys a RAG (Retrieval-Augmented Generation) pipeline that indexes a custom data set and attaches relevant data as context when users send the LLM queries. + +## appendix 1: Ingress control + +When we created the teams **demo** and **models**, we turned off the **Ingress Control**. Ingress Control controls internal access to pods. When Ingress Control is enabled, pods in the team namespace are not accessible to other pods (in the same team namespace or in other team namespaces). For the simplicity of this guide, Ingress Control was turned off. If you don't want to disable Ingress Control for all the workloads in a team, then you can turn Ingress Control on and create **Inbound Rules** in the team's network policies. Follow these steps to create inbound policies to control access to the models hosted in the team `models`: + +1. Select **view** > **team** and **team** > **models** in the top bar. + +1. Select **Network Policies** in the left menu. + +1. Click **Create Inbound Rule*** + +1. Add a name for the rule (like model-access) + +1. Under **Sources**, select the workload (in this case the `llama3-ui` workload) and select a pod label. -The publicly-exposed LLM in this guide uses a wide range of ports, and as a result, all pods in a Team are automatically injected with an Istio sidecar. Sidecar injection is a means of adding additional containers and their configurations to a pod template. +1. Under **Target**, select the workload (in this case the `llama-3-1-8b` workload) and select a pod label. -The Istio sidecar in this case prevents the `open-webui` pod from connecting to the `llama3-model` service, because all egress traffic for pods in the Team namespace are blocked by an Istio ServiceEntry by default. 
This means that prior to deploying the AI interface using the `open-webui` Helm chart, the `open-webui` pod must be prevented from getting the Istio sidecar. +1. Click **Create Inbound Rule** -Since the `open-webui` Helm chart does not allow for the addition of extra labels, there are two workarounds: +Note that in some cases, the **Target** pod needs to be restarted if it already had accepted connections before the inbound rule was created. -1. Adjust the `open-webui` Helm chart in the chart's Git repository. This is the Git repository where the `open-webui` Helm chart was been stored when it was added to the Catalog. -2. Add a Kyverno **Policy** that mutates the `open-webui` pod so that it will have the `sidecar.istio.io/inject: "false"` label. +## appendix 2: Egress control -Follow the steps below to follow the second option and add the Kyverno security policy. +When we created the teams **demo** and **models**, we turned off the **Egress Control**. Egress Control is implemented using Istio Service Entries and Istio sidecar injection is enabled by default. Egress Control controls pod access to public URLs. Because the Hugging Face models need to be downloaded from an external repository and open-webui installs multiple binaries from external sources, both the LLM pod and open-webui need to have access to multiple public URLs. For the simplicity of this guide we turned the Egress Control off. If you don't want to disable Egress Control for all the workloads in a team, then you can turn Egress Control on and create **Outbound Rules** in the team's network policies or turn of the sidecar injection for a specific workloads (pods). There are several ways to do this: + +- Add the label `sidecar.istio.io/inject: "false"` to the workload using the Chart Values + +- Enable Kyverno and create a Kyverno **Policy** that mutates the a pod so that it has the `sidecar.istio.io/inject: "false"` label. + +The `open-webui` Helm chart used in this guide does not support adding additional labels to pods. The following instructions and example show how to use Kyverno to mutate the open-webui pods and add the `sidecar.istio.io/inject: "false"` label. + +1. Select **view** > **platform** in the top bar. + +1. Select **Apps** in the left menu. + +1. In the **Apps** section, enable the **Kyverno** app. 1. In the **Apps** section, select the **Gitea** app. -1. Navigate to the `team-demo-argocd` repository. +1. In Gitea, navigate to the `team-demo-argocd` repository. 1. Click the **Add File** dropdown, and select **New File**. Create a file named `open-webui-policy.yaml` with the following contents: @@ -431,70 +390,12 @@ Follow the steps below to follow the second option and add the Kyverno security sidecar.istio.io/inject: "false" ``` -1. Optionally add a title and any notes to the change history, and click **Commit Changes**. +1. Optionally add a title and any notes to the change history. Then, click **Commit Changes**. ![Add Open WebUI Policy](APL-LLM-Add-OpenWebUIPolicy.jpg) 1. Check to see if the policy has been created in Argo CD: - 1. Go to **Apps**, and open the _Argocd_ application. - - 1. Using the search feature, go to the `team-demo` application to see if the policy has been created. If it isn't there yet, view the `team-demo` application in the list of **Applications**, and click **Refresh** as needed. - -### Create a Workload to Deploy the AI Interface - -1. Select **view** > **team** and **team** > **demo** in the top bar. - -1. Select **Catalog** from the menu. - -1. Select the _Open-Webui_ chart. 
- -1. Click on **Values**. - -1. Provide a name for the Workload. This guide uses the Workload name `llama3-ui`. - -1. Add the following values, and change the `openaiBaseApiUrl` to the host and domain name you added to your clipboard when [exposing the model](#expose-the-model) (the URL for the `llama3-model-predictor` service). Make sure to append `/openai/v1` to your URL as shown below. - - Remember to change the `nameOverride` value to the name of your Workload, `llama3-ui`: - - ``` - # Change the nameOverride to match the name of the Workload - nameOverride: {{< placeholder "llama3-ui" >}} - ollama: - enabled: {{< placeholder "false" >}} - pipelines: - enabled: {{< placeholder "false" >}} - replicaCount: {{< placeholder "1" >}} - persistence: - enabled: {{< placeholder "false" >}} - openaiBaseApiUrl: {{< placeholder "https://llama3-model--predictor-team-demo./openai/v1" >}} - extraEnvVars: - - name: {{< placeholder "WEBUI_AUTH" >}} - value: "{{< placeholder "false" >}}" - ``` - -1. Click **Submit**. - -### Expose the AI Interface - -1. Select **Services** from the menu. - -1. Click **Create Service**. - -1. In the **Service Name** dropdown menu, select the `llama3-ui` service. - -1. Click **Create Service**. - -## Access the Open Web User Interface - -Once the AI user interface is ready, you should be able to access the web UI for the Open WebUI chatbot. - -1. Click on **Services** in the menu. - -1. In the list of available services, click on the URL for the `llama3-ui` service. This should bring you to the chatbot user interface. - - ![Llama 3 LLM](APL-LLM-Llama3.jpg) - -## Next Steps + 1. Go to **Apps** and open the _Argocd_ application. -See our [Deploy a RAG Pipeline and Chatbot with App Platform for LKE](/docs/guides/deploy-rag-pipeline-and-chatbot-on-apl) guide to expand on the architecture built in this guide. This tutorial deploys a RAG (Retrieval-Augmented Generation) pipeline that indexes a custom data set and attaches relevant data as context when users send the LLM queries. \ No newline at end of file + 1. Using the search feature, go to the `team-demo` application to see if the policy has been created. If it isn't there yet, view the `team-demo` application in the list of **Applications**, and click **Refresh** if needed. 
\ No newline at end of file
diff --git a/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/AI_RAG_Diagram.jpg b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/AI_RAG_Diagram.jpg
new file mode 100644
index 00000000000..489992e8a27
Binary files /dev/null and b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/AI_RAG_Diagram.jpg differ
diff --git a/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-LLMs.jpg b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-LLMs.jpg
index db7286eecae..634ea516150 100644
Binary files a/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-LLMs.jpg and b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-LLMs.jpg differ
diff --git a/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-docs-run-complete.jpg b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-docs-run-complete.jpg
index 9bb5261fd65..8354ffc0013 100644
Binary files a/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-docs-run-complete.jpg and b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/APL-RAG-docs-run-complete.jpg differ
diff --git a/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/index.md b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/index.md
index 55d4eb2b411..9ccc7425452 100644
--- a/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/index.md
+++ b/docs/guides/kubernetes/deploy-rag-pipeline-and-chatbot-on-apl/index.md
@@ -1,37 +1,33 @@
---
slug: deploy-rag-pipeline-and-chatbot-on-apl
-title: "Deploy a RAG Pipeline and Chatbot with App Platform for LKE"
-description: "This guide expands on a previously built LLM and AI inferencing architecture to include a RAG pipeline that indexes a custom data set. The steps provided utilize Akamai App Platform for Linode Kubernetes Engine to deploy the RAG pipeline."
-authors: ["Akamai"]
-contributors: ["Akamai"]
+title: "Implement RAG (Retrieval-Augmented Generation) with App Platform for LKE"
+description: "This guide expands on our previous AI Inference with App Platform for LKE guide. The steps provided use Akamai App Platform for LKE to implement Retrieval-Augmented Generation (RAG)."
+authors: ["Sander Rodenhuis"]
+contributors: ["Sander Rodenhuis"]
published: 2025-03-25
-modified: 2025-07-23
-keywords: ['ai','ai inference','ai inferencing','llm','large language model','app platform','lke','linode kubernetes engine','rag pipeline','retrieval augmented generation','open webui','kubeflow']
+modified: 2025-12-09
+keywords: ['ai','ai inference','llm','large language model','app platform','lke','linode kubernetes engine','rag pipeline','retrieval augmented generation','open webui','kubeflow pipelines']
license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)'
external_resources:
- '[Akamai App Platform for LKE](https://techdocs.akamai.com/cloud-computing/docs/application-platform)'
- '[Akamai App Platform Documentation](https://techdocs.akamai.com/app-platform/docs/welcome)'
---

-{{< note title="This guide is being updated" type="warning" >}}
-This guide is currently undergoing updates due to ongoing development of App Platform and may not function as expected. An updated version will be available soon that incorporates these changes. Check back for the revised guide before proceeding with this deployment.
-{{< /note >}} +This guide extends the LLM (Large Language Model) inference architecture built in our [Deploy an LLM for AI Inference with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl) guide by deploying a RAG (Retrieval-Augmented Generation) pipeline that indexes a custom data set. RAG is a particular method of context augmentation that attaches relevant data as context when users send queries to an LLM. -This guide builds on the LLM (Large Language Model) architecture built in our [Deploy an LLM for AI Inferencing with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl) guide by deploying a RAG (Retrieval-Augmented Generation) pipeline that indexes a custom data set. RAG is a particular method of context augmentation that attaches relevant data as context when users send queries to an LLM. +Follow the steps in this tutorial to enable Kubeflow Pipelines and deploy a RAG pipeline using App Platform for LKE. The data set you use may vary depending on your use case. For example purposes, this guide uses a sample data set from Akamai Techdocs that includes documentation about all Akamai Cloud services. -Follow the steps in this tutorial to install Kubeflow Pipelines and deploy a RAG pipeline using Akamai App Platform for LKE. The deployment in this guide uses the previously deployed Open WebUI chatbot to respond to queries using a custom data set. The data set you use may vary depending on your use case. For example purposes, this guide uses a sample data set from Linode Docs in Markdown format. - -If you prefer a manual installation rather than one using App Platform for LKE, see our [Deploy a Chatbot and RAG Pipeline for AI Inferencing on LKE](/docs/guides/ai-chatbot-and-rag-pipeline-for-inference-on-lke/) guide. +If you prefer a manual installation rather than one using App Platform for LKE, see our [Deploy a Chatbot and RAG Pipeline for AI Inference on LKE](/docs/guides/ai-chatbot-and-rag-pipeline-for-inference-on-lke/) guide. ## Diagram -![RAG Diagram Test](AI_RAG_Diagram.svg) +![RAG Diagram Test](AI_RAG_Diagram.jpg) ## Components ### Infrastructure -- **Linode GPUs (NVIDIA RTX 4000)**: Akamai has several high-performance GPU virtual machines available, including NVIDIA RTX 4000 (used in this tutorial) and Quadro RTX 6000. NVIDIA’s Ada Lovelace architecture in the RTX 4000 VMs are adept at many AI tasks, including [inferencing](https://www.nvidia.com/en-us/solutions/ai/inference/) and [image generation](https://blogs.nvidia.com/blog/ai-decoded-flux-one/). +- **Linode GPUs (NVIDIA RTX 4000)**: Akamai has several high-performance GPU virtual machines available, including NVIDIA RTX 4000 (used in this tutorial) and Quadro RTX 6000. NVIDIA’s Ada Lovelace architecture in the RTX 4000 VMs are adept at many AI tasks, including [inference](https://www.nvidia.com/en-us/solutions/ai/inference/) and [image generation](https://blogs.nvidia.com/blog/ai-decoded-flux-one/). - **Linode Kubernetes Engine (LKE)**: LKE is Akamai’s managed Kubernetes service, enabling you to deploy containerized applications without needing to build out and maintain your own Kubernetes cluster. @@ -39,19 +35,19 @@ If you prefer a manual installation rather than one using App Platform for LKE, ### Additional Software -- **Open WebUI**: A self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG (Retrieval-Augmented Generation) solutions. Users interact with this interface to query the LLM. 
+- **Open WebUI Pipelines**: A self-hosted, UI-agnostic OpenAI API plugin framework that brings modular, customizable workflows to any UI client supporting OpenAI API specs.

-- **Milvus**: Milvus is an open-source vector database and is used for generative AI workloads. This tutorial uses Milvus to store embeddings generated by LlamaIndex and make them available to queries sent to the Llama 3 LLM.
+- **PGvector**: A vector similarity search extension for Postgres. This tutorial uses a Postgres database with the `vector` extension to store embeddings generated by LlamaIndex and make them available to queries sent to the Llama 3.1 8B LLM.

-- **Kubeflow**: An open-source software platform designed for Kubernetes that includes a suite of applications used for machine learning tasks. This tutorial installs all default applications and makes specific use of the following:
+- **KServe**: Serves machine learning models. The architecture in this guide uses the [Llama 3 LLM](https://huggingface.co/meta-llama/Meta-Llama-3-1-8B) deployed using the Hugging Face runtime server with KServe, which then serves it to other applications, including the chatbot UI.

-  - **KServe**: Serves machine learning models. The architecture in this guide uses the [Llama 3 LLM](https://huggingface.co/meta-llama/Meta-Llama-3-8B) installed on KServe, which then serves it to other applications, including the chatbot UI.
+- **intfloat/e5-mistral-7b-instruct LLM**: The [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) model is used as the embedding LLM in this guide.

-  - **Kubeflow Pipeline**: Used to deploy pipelines, reusable machine learning workflows built using the Kubeflow Pipelines SDK. In this tutorial, a pipeline is used to run LlamaIndex to process the dataset and store embeddings.
+- **Kubeflow Pipelines**: Used to deploy pipelines, reusable machine learning workflows built using the Kubeflow Pipelines SDK. In this tutorial, a pipeline is used to process the dataset and store embeddings in the PGvector database.

## Prerequisites

-- Complete the deployment in the [Deploy an LLM for AI Inferencing with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl) guide. Your LKE cluster should include the following minimum hardware requirements:
+- Complete the deployment in the [Deploy an LLM for AI Inference with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl) guide. Your LKE cluster should include the following minimum hardware requirements:

    - 3 **8GB Dedicated CPUs** with [autoscaling](https://techdocs.akamai.com/cloud-computing/docs/manage-nodes-and-node-pools#autoscale-automatically-resize-node-pools) turned on

@@ -59,248 +55,92 @@ If you prefer a manual installation rather than one using App Platform for LKE,

- [Python3](https://www.python.org/downloads/) and the [venv](https://docs.python.org/3/library/venv.html) Python module installed on your local machine

-## Set Up Infrastructure
-
-Once your LLM has been deployed and is accessible, complete the following steps to continue setting up your infrastructure.

+- Object Storage configured. Make sure to configure Object Storage as described [here](https://techdocs.akamai.com/app-platform/docs/lke-automatic-install#provision-object-storage-for-the-app-platform) before Kubeflow Pipelines is enabled.

-Sign into the App Platform web UI using the `platform-admin` account, or another account that uses the `platform-admin` role.
+## Set Up Infrastructure -### Add the milvus Helm Chart to the Catalog +Before continuing, sign into the App Platform web console as `platform-admin` or any other account that uses the `platform-admin` role. -1. Select **view** > **team** and **team** > **admin** in the top bar. +### Add the hf-e5-mistral-7b-instruct Helm Chart to the Catalog 1. Click on **Catalog** in the left menu. 1. Select **Add Helm Chart**. -1. Under **Git Repository URL**, add the URL to the `milvus` Helm chart: +1. Under **Git Repository URL**, add the URL to the `hf-e5-mistral-7b-instruct` Helm chart: ```command - https://github.com/zilliztech/milvus-helm/blob/milvus-4.2.40/charts/milvus/Chart.yaml + https://github.com/linode/apl-examples/blob/main/inference/kserve/hf-e5-mistral-7b-instruct/Chart.yaml ``` 1. Click **Get Details** to populate the Helm chart details. -1. Deselect **Allow teams to use this chart**. +1. Uncheck the **Allow teams to use this chart** option. In the next step, you'll configure the RBAC of the catalog to make this Helm chart available for the team `models` to use. 1. Click **Add Chart**. -### Create an Object Storage Bucket and Access Key for Milvus - -1. In Cloud Manager, navigate to **Object Storage**. - -1. Click **Create Bucket**. - -1. Enter a name for your bucket, and select a **Region** close to, or the same as, your App Platform LKE cluster. - -1. While on the **Object Storage** page, select the **Access Keys** tab, and then click **Create Access Key**. +Now configure the RBAC of the catalog: -1. Enter a name for your access key, select the same **Region** as your Milvus bucket, and make sure your access key has "Read/Write" access enabled for your bucket. +1. Select **view** > **platform**. -1. Save your access key information. +1. Select **App** in the left menu. -### Create a Workload for the Milvus Helm Chart - -1. Select **view** > **team** and **team** > **admin** in the top bar. - -1. Select **Workloads**. - -1. Click on **Create Workload**. - -1. Select the _Milvus_ Helm chart from the Catalog. - -1. Click on **Values**. +1. Click on the **Gitea** app. -1. Provide a name for the Workload. This guide uses the Workload name `milvus`. +1. In the list of repositories, click on `otomi/charts`. -1. Add `milvus` as the namespace. +1. At the bottom, click on the file `rbac.yaml`. -1. Select **Create a new namespace**. - -1. Set the following values. Make sure to replace `externalS3` values with those of your Milvus bucket and access key. You may also need to add lines for the resources requests and limits under `standalone`: - - {{< note title="Tip: Use Command + F" >}} - While navigating the **Values** configuration window, use the cmd + F keyboard search feature to locate each value. - {{< /note >}} +1. 
Change the RBAC for the `hf-e5-mistral-7b-instruct` Helm chart as shown below: ``` - cluster: - enabled: {{< placeholder "false" >}} - pulsarv3: - enabled: {{< placeholder "false" >}} - minio: - enabled: {{< placeholder "false" >}} - externalS3: - enabled: {{< placeholder "true" >}} - host: {{< placeholder ".linodeobjects.com" >}} - port: "{{< placeholder "443" >}}" - accessKey: {{< placeholder "" >}} - secretKey: {{< placeholder "" >}} - useSSL: {{< placeholder "true" >}} - bucketName: {{< placeholder "" >}} - cloudProvider: aws - region: {{< placeholder "" >}} - standalone: - resources: - requests: - nvidia.com/gpu: "{{< placeholder "1" >}}" - limits: - nvidia.com/gpu: "{{< placeholder "1" >}}" + hf-e5-mistral-7b-instruct: + - team-models ``` - {{< note type="warning" title="Unencrypted Secret Keys" >}} - The Milvus Helm chart does not support the use of a secretKeyRef. Using unencrypted Secret Keys in chart values is not considered a Kubernetes security best-practice. - {{< /note >}} - -1. Click **Submit**. - -### Create an Object Storage Bucket and Access Key for kubeflow-pipelines - -1. In Cloud Manager, navigate to **Object Storage**. - -1. Click **Create Bucket**. - -1. Enter a name for your bucket, and select a **Region** close to, or the same as, your App Platform LKE cluster. - -1. While on the **Object Storage** page, select the **Access Keys** tab, and then click **Create Access Key**. - -1. Enter a name for your access key, select a **Region** as your Kubeflow-Pipelines bucket, and make sure your access key has "Read/Write" access enabled for your bucket. - -1. Save your access key information. - -### Make Sealed Secrets - -#### Create a Sealed Secret for mlpipeline-minio-artifact - -Make a Sealed Secret named `mlpipeline-minio-artifact` granting access to your `kubeflow-pipelines` bucket. - -1. Select **view** > **team** and **team** > **demo** in the top bar. - -1. Select **Sealed Secrets** from the menu, and click **Create SealedSecret**. +### Create a Workload to Deploy the Model -1. Add a name for your SealedSecret, `mlpipeline-minio-artifact`. +1. Select **view** > **team** and **team** > **models** in the top bar. -1. Select type _[kubernetes.io/opaque](kubernetes.io/opaque)_ from the **type** dropdown menu. - -1. Add the **Key** and **Value** details below. Replace {{< placeholder "YOUR_ACCESS_KEY" >}} and {{< placeholder "YOUR_SECRET_KEY" >}} with your `kubeflow-pipelines` access key information. - - To add a second key for your secret key, click the **Add Item** button after entering your access key information: - - - Type: `kubernetes.io/opaque` - - Key=`accesskey`, Value={{< placeholder "YOUR_ACCESS_KEY" >}} - - Key=`secretkey`, Value={{< placeholder "YOUR_SECRET_KEY" >}} - -1. Click **Submit**. - -#### Create a Sealed Secret for mysql-credentials - -Make another Sealed Secret named `mysql-credentials` to establish root user credentials. Make a strong root password, and save it somewhere secure. - -1. Select **view** > **team** and **team** > **demo** in the top bar. - -1. Select **Sealed Secrets** from the menu, and click **Create SealedSecret**. - -1. Add a name for your SealedSecret, `mysql-credentials`. - -1. Select type _[kubernetes.io/opaque](kubernetes.io/opaque)_ from the **type** dropdown menu. - -1. 
Add the **Key** and **Value** details, replacing {{< placeholder "YOUR_ROOT_PASSWORD" >}} with a strong root password you've created and saved: - - - Type: `kubernetes.io/opaque` - - Key=`username`, Value=`root` - - Key=`password`, Value={{< placeholder "YOUR_ROOT_PASSWORD" >}} - -1. Click **Submit**. - -### Create a Network Policy - -Create a [**Network Policy**](https://techdocs.akamai.com/app-platform/docs/team-network-policies) in the Team where the `kubeflow-pipelines` Helm chart will be installed (Team name **demo** in this guide). This allows communication between all Kubeflow Pipelines Pods. - -1. Select **view** > **team** and **team** > **demo** in the top bar. - -1. Select **Network Policies** from the menu. - -1. Click **Create Netpol**. - -1. Add a name for the Network Policy. - -1. Select **Rule type** `ingress` using the following values, where `kfp` is the name of the Workload created in the next step: - - - **Selector label name**: [`app.kubernetes.io/instance`](http://app.kubernetes.io/instance) - - - **Selector label value**: `kfp` - -1. Click **Submit**. - -### Create a Workload and Install the kfp-cluster-resources Helm Chart - -1. Select **view** > **team** and **team** > **admin** in the top bar. - -1. Select **Workloads**. - -1. Click on **Create Workload**. +1. Select **Catalog** from the menu. -1. Select the _Kfp-Cluster-Resources_ Helm chart from the Catalog. +1. Select the _hf-e5-mistral-7b-instruct_ chart. 1. Click on **Values**. -1. Provide a name for the Workload. This guide uses the Workload name `kfp-cluster-resources`. +1. Provide a name for the workload. This guide uses the workload name `mistral-7b`. -1. Add `kubeflow` as the namespace. +1. Use the default values and click **Submit**. -1. Select **Create a new namespace**. +### Create a Workload to deploy a PGvector cluster -1. Continue with the default values, and click **Submit**. The Workload may take a few minutes to become ready. - -### Create a Workload for the kubeflow-pipelines Helm Chart - -1. Select **view** > **team** and **team** > **admin** in the top bar. - -1. Select **Workloads**. +1. Select **view** > **team** and **team** > **demo** in the top bar. -1. Click on **Create Workload**. +1. Select **Catalog** from the menu. -1. Select the _Kubeflow-Pipelines_ Helm chart from the Catalog. +1. Select the _pgvector-cluster_ chart. 1. Click on **Values**. -1. Provide a name for the Workload. This guide uses the Workload name `kfp`. - -1. Add `team-demo` as the namespace. - -1. Select **Create a new namespace**. +1. Provide a name for the workload. This guide uses the workload name `pgvector`. -1. Set the following values. Replace {{< placeholder "" >}} and {{< placeholder "" >}} with those of your `kubeflow-pipelines` bucket: +1. Use the default values and click **Submit**. - ``` - objectStorage: - region: {{< placeholder "" >}} - bucket: {{< placeholder "" >}} - mysql: - secret: mysql-credentials - ``` +Note that the `pgvector-cluster` chart also creates a database in the PGvector cluster with the name `app`. -1. Click **Submit**. It may take a few minutes for the Workload to be ready. +## Set Up Kubeflow Pipelines -### Expose the Kubeflow Pipelines UI +### Enable Kubeflow Pipelines -1. Select **view** > **team** and **team** > **demo** in the top bar. - -1. Select **Services**. +1. Select **view** > **platform** in the top bar. -1. Click **Create Service**. - -1. In the **Service Name** dropdown menu, select the `ml-pipeline-ui` service. - -1. Click **Create Service**. +1. 
Select **Apps** in the left menu. -Kubeflow Pipelines is now ready to be used by members of the Team **demo**. - -## Set Up Kubeflow Pipeline to Ingest Data +1. Enable the **Kubeflow Pipelines** app by hovering over the app icon and clicking the **power on** button. It may take a few minutes for the apps to enable. ### Generate the Pipeline YAML File -The steps below create and use a Python script to create a Kubeflow pipeline file. This YAML file describes each step of the pipeline workflow. +Follow the steps below to create a Kubeflow pipeline file. This YAML file describes each step of the pipeline workflow. 1. On your local machine, create a virtual environment for Python: @@ -317,18 +157,17 @@ The steps below create and use a Python script to create a Kubeflow pipeline fil 1. Create a file named `doc-ingest-pipeline.py` with the following contents. - Replace {{< placeholder "" >}} with the domain of your App Platform instance. The {{< placeholder "" >}} is contained in the console URL in your browser, where `console.lke123456.akamai-apl.net` is the URL and `lke123456.akamai-apl.net` is the {{< placeholder "" >}}. - - This script configures the pipeline that downloads the Markdown data set to be ingested, reads the content using LlamaIndex, generates embeddings of the content, and stores the embeddings in the milvus database: + This script configures the pipeline that downloads the Markdown data set to be ingested, reads the content using LlamaIndex, generates embeddings of the content, and stores the embeddings in the PGvector database. ```file from kfp import dsl @dsl.component( - base_image='nvcr.io/nvidia/ai-workbench/python-cuda117:1.0.3', - packages_to_install=['pymilvus>=2.4.2', 'llama-index', 'llama-index-vector-stores-milvus', 'llama-index-embeddings-huggingface', 'llama-index-llms-openai-like'] - ) - def doc_ingest_component(url: str, collection: str) -> None: + base_image='nvcr.io/nvidia/ai-workbench/python-cuda117:1.0.3', + packages_to_install=['psycopg2-binary', 'llama-index', 'llama-index-vector-stores-postgres', + 'llama-index-embeddings-openai-like', 'llama-index-llms-openai-like', 'kubernetes'] + ) + def doc_ingest_component(url: str, table_name: str) -> None: print(">>> doc_ingest_component") from urllib.request import urlopen @@ -344,35 +183,54 @@ The steps below create and use a Python script to create a Kubeflow pipeline fil # load documents documents = SimpleDirectoryReader("./md_docs/", recursive=True, required_exts=[".md"]).load_data() - from llama_index.embeddings.huggingface import HuggingFaceEmbedding + from llama_index.embeddings.openai_like import OpenAILikeEmbedding from llama_index.core import Settings - Settings.embed_model = HuggingFaceEmbedding( - model_name="sentence-transformers/all-MiniLM-L6-v2" + Settings.embed_model = OpenAILikeEmbedding( + model_name="mistral-7b-instruct", + api_base="http://mistral-7b.team-models.svc.cluster.local/openai/v1", + api_key="EMPTY", + embed_batch_size=50, + max_retries=3, + timeout=180.0 ) - from llama_index.llms.openai_like import OpenAILike - - llm = OpenAILike( - model="llama3", - api_base="https://llama3-model-predictor-team-demo.{{< placeholder "" >}}/openai/v1", - api_key = "EMPTY", - max_tokens = 512) - - Settings.llm = llm - from llama_index.core import VectorStoreIndex, StorageContext - from llama_index.vector_stores.milvus import MilvusVectorStore - - vector_store = MilvusVectorStore(uri="http://milvus.milvus.svc.cluster.local:19530", collection=collection, dim=384, overwrite=True) + from 
llama_index.vector_stores.postgres import PGVectorStore + import base64 + from kubernetes import client, config + + def get_secret_credentials(): + try: + config.load_incluster_config() # For in-cluster access + v1 = client.CoreV1Api() + secret = v1.read_namespaced_secret(name="pgvector-app", namespace="team-demo") + password = base64.b64decode(secret.data['password']).decode('utf-8') + username = base64.b64decode(secret.data['username']).decode('utf-8') + return username, password + except Exception as e: + print(f"Error getting secret: {e}") + return 'app', 'changeme' + + pg_user, pg_password = get_secret_credentials() + + vector_store = PGVectorStore.from_params( + database="app", + host="pgvector-rw.team-demo.svc.cluster.local", + port=5432, + user=pg_user, + password=pg_password, + table_name=table_name, + embed_dim=4096 + ) storage_context = StorageContext.from_defaults(vector_store=vector_store) index = VectorStoreIndex.from_documents( documents, storage_context=storage_context ) @dsl.pipeline - def doc_ingest_pipeline(url: str, collection: str) -> None: - comp = doc_ingest_component(url=url, collection=collection) + def doc_ingest_pipeline(url: str, table_name: str) -> None: + comp = doc_ingest_component(url=url, table_name=table_name) from kfp import compiler @@ -397,49 +255,43 @@ The steps below create and use a Python script to create a Kubeflow pipeline fil 1. Select **view** > **team** and **team** > **demo** in the top bar. -1. Select **Services**. +1. Select **Apps**. -1. Click on the URL of the service `ml-pipeline-ui`. +1. Click on the `kubeflow-pipelines` app. -1. Navigate to the **Pipelines** section, click **Upload pipeline**. +1. The UI opens the **Pipelines** section. Click **Upload pipeline**. 1. Under **Upload a file**, select the `pipeline.yaml` file created in the previous section, and click **Create**. ![Upload Pipeline YAML](APL-RAG-upload-pipeline-yaml.jpg) -1. Select **Experiments** from the left menu, and click **Create experiment**. Enter a name and description for the experiment, and click **Next**. - - ![Create Experiment](APL-RAG-create-experiment.jpg) - - When complete, you should be brought to the **Runs** > **Start a new run** page. - -1. Complete the following steps to start a new run: +1. Select **Runs** from the left menu, and click **Create run**. - - Under **Pipeline**, choose the pipeline `pipeline.yaml` you just created. +1. Under **Pipeline**, choose the pipeline `pipeline.yaml` you just created. - - For **Run Type** choose **One-off**. +1. For **Run Type** choose **One-off**. - - Provide the collection name and URL of the data set to be processed. This is the zip file with the documents you wish to process. +1. Use `linode_docs` for the **table_name** - To use the sample Linode Docs data set in this guide, use the name `linode_docs` for **collection-string** and the following GitHub URL for **url-string**: +1. To use the sample Linode Docs data set in this guide, use the following GitHub URL for **url-string**: ```command - https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip + https://github.com/linode/rag-datasets/raw/refs/heads/main/cloud-computing.zip ``` 1. Click **Start** to run the pipeline. When completed, the run is shown with a green checkmark to the left of the run title. 
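Optionally, before wiring up the agent, you can confirm that the run actually wrote embeddings into the PGvector database. The short Python sketch below is one way to check, assuming you port-forward the `pgvector-rw` service in the `team-demo` namespace to your local machine (for example, `kubectl port-forward svc/pgvector-rw 5432:5432 -n team-demo`) and export the credentials from the `pgvector-app` secret as environment variables. The `data_linode_docs` table name assumes LlamaIndex's default `data_` prefix for the `linode_docs` table created by the pipeline:

    ```
    import os

    import psycopg2

    # Connect to the local end of the port-forward to the pgvector-rw service.
    # Credentials come from the pgvector-app secret in the team-demo namespace.
    conn = psycopg2.connect(
        host="127.0.0.1",
        port=5432,
        dbname="app",
        user=os.environ["PGVECTOR_USER"],
        password=os.environ["PGVECTOR_PASSWORD"],
    )

    with conn, conn.cursor() as cur:
        # LlamaIndex's PGVectorStore typically stores rows in data_<table_name>.
        cur.execute("SELECT count(*) FROM data_linode_docs;")
        print("Embedded chunks:", cur.fetchone()[0])

    conn.close()
    ```

    If the count is greater than zero, the ingestion run populated the vector store and the agent pipeline in the next section has data to retrieve.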
![Docs Run Complete](APL-RAG-docs-run-complete.jpg)

-## Deploy the Chatbot
+## Deploy the AI Agent

-The next step is to install the Open WebUI pipeline and web interface and configure it to connect the data generated in the Kubernetes Pipeline with the LLM deployed in KServe.
+The next step is to set up Open WebUI Pipelines with an agent pipeline. This connects the data generated by the Kubeflow pipeline with the LLM deployed in KServe. It also exposes an OpenAI-compatible API endpoint for the chatbot to connect to.

-The Open WebUI Pipeline uses the Milvus database to load context related to the search. The pipeline sends it, and the query, to the Llama 3 LLM instance within KServe. The LLM then sends back a response to the chatbot, and your browser displays an answer informed by the custom data set.
+The Open WebUI pipeline uses the PGvector database to load context related to the search. The pipeline sends this context, along with the query, to the Llama LLM instance within KServe. The LLM then sends back a response to the chatbot, and your browser displays an answer informed by the custom data set.

-### Create a configmap with the RAG Pipeline Files
+### Create a configmap with the Agent Pipeline Files

-The RAG pipeline files in this section are not related to the Kubeflow pipeline create in the previous section. Rather, the RAG pipeline instructs the chatbot how to interact with each component created thus far, including the Milvus data store and the Llama 3 LLM.
+The Agent pipeline files in this section are not related to the Kubeflow pipeline created in the previous section. Instead, the Agent pipeline instructs the agent on how to interact with each component created thus far, including the PGvector data store, the embedding model, and the Llama (foundation) model.

1. Select **view** > **team** and **team** > **demo** in the top bar.

@@ -447,118 +299,156 @@ The RAG pipeline files in this section are not related to the Kubeflow pipeline

1. In Gitea, navigate to the `team-demo-argocd` repository on the right.

-1. Click the **Add File** dropdown, and select **New File**. Create a file with the name `pipeline-files.yaml` with the following contents. Replace {{< placeholder "" >}} with the domain of your App Platform instance:
+1. Click the **Add File** dropdown, and select **New File**.
Create a file with the name `my-agent-pipeline-files.yaml` with the following contents: ```file apiVersion: v1 - data: - pipeline-requirements.txt: | - requests - pymilvus - llama-index - llama-index-vector-stores-milvus - llama-index-embeddings-huggingface - llama-index-llms-openai-like - opencv-python-headless - rag-pipeline.py: | - """ - title: RAG Pipeline - version: 1.0 - description: RAG Pipeline - """ - from typing import List, Optional, Union, Generator, Iterator - - class Pipeline: - - def __init__(self): - self.name = "RAG Pipeline" - self.index = None - pass - - - async def on_startup(self): - from llama_index.embeddings.huggingface import HuggingFaceEmbedding - from llama_index.core import Settings, VectorStoreIndex - from llama_index.llms.openai_like import OpenAILike - from llama_index.vector_stores.milvus import MilvusVectorStore - - print(f"on_startup:{__name__}") - - Settings.embed_model = HuggingFaceEmbedding( - model_name="sentence-transformers/all-MiniLM-L6-v2" - ) - - llm = OpenAILike( - model="llama3", - api_base="https://llama3-model-predictor-team-demo.{{< placeholder "" >}}/openai/v1", - api_key = "EMPTY", - max_tokens = 512) - - Settings.llm = llm - - vector_store = MilvusVectorStore(uri="http://milvus.milvus.svc.cluster.local:19530", collection="linode_docs", dim=384, overwrite=False) - self.index = VectorStoreIndex.from_vector_store(vector_store=vector_store) - - async def on_shutdown(self): - print(f"on_shutdown:{__name__}") - pass - - - def pipe( - self, user_message: str, model_id: str, messages: List[dict], body: dict - ) -> Union[str, Generator, Iterator]: - print(f"pipe:{__name__}") - - query_engine = self.index.as_query_engine(streaming=True, similarity_top_k=5) - response = query_engine.query(user_message) - print(f"rag_response:{response}") - return f"{response}" kind: ConfigMap metadata: - name: pipelines-files + name: my-agent-pipeline + data: + agent-pipeline-requirements.txt: | + psycopg2-binary + llama-index + llama-index-vector-stores-postgres + llama-index-embeddings-openai-like + llama-index-llms-openai-like + opencv-python-headless + kubernetes + agent-pipeline.py: | + import base64 + from llama_index.core import Settings, VectorStoreIndex + from llama_index.core.llms import ChatMessage + from llama_index.llms.openai_like import OpenAILike + from llama_index.embeddings.openai_like import OpenAILikeEmbedding + from llama_index.vector_stores.postgres import PGVectorStore + from kubernetes import client, config as k8s_config + + # LLM configuration + LLM_MODEL = "meta-llama-3-1-8b" + LLM_API_BASE = "http://llama-3-1-8b.team-models.svc.cluster.local/openai/v1" + LLM_API_KEY = "EMPTY" + LLM_MAX_TOKENS = 512 + + # Embedding configuration + EMBEDDING_MODEL = "mistral-7b-instruct" + EMBEDDING_API_BASE = "http://mistral-7b.team-models.svc.cluster.local/openai/v1" + EMBED_BATCH_SIZE = 10 + EMBED_DIM = 4096 + + # Database configuration + DB_NAME = "app" + DB_TABLE_NAME = "linode_docs" + DB_SECRET_NAME = "pgvector-app" + DB_SECRET_NAMESPACE = "team-demo" + + # RAG configuration + SIMILARITY_TOP_K = 3 + SYSTEM_PROMPT = """You are a helpful AI assistant for Linode.""" + + class Pipeline: + def __init__(self): + self.name = "my-agent" + self.kb_index = None # Store the KB index for creating chat engines + self.system_prompt = SYSTEM_PROMPT # Store system prompt for LLM-only mode + + async def on_startup(self): + Settings.llm = OpenAILike( + model=LLM_MODEL, + api_base=LLM_API_BASE, + api_key=LLM_API_KEY, + max_tokens=LLM_MAX_TOKENS, + is_chat_model=True, + 
is_function_calling_model=True + ) + Settings.embed_model = OpenAILikeEmbedding( + model_name=EMBEDDING_MODEL, + api_base=EMBEDDING_API_BASE, + embed_batch_size=EMBED_BATCH_SIZE, + max_retries=3, + timeout=180.0 + ) + self.kb_index = self._build_vector_index() + + def _build_vector_index(self): + """Builds a vector index from database.""" + db_credentials = self._get_db_credentials() + + vector_store = PGVectorStore.from_params( + database=DB_NAME, + host=db_credentials["host"], + port=db_credentials["port"], + user=db_credentials["username"], + password=db_credentials["password"], + table_name=DB_TABLE_NAME, + embed_dim=EMBED_DIM, + ) + return VectorStoreIndex.from_vector_store(vector_store) + + def _get_db_credentials(self): + """Get database credentials from Kubernetes secret.""" + k8s_config.load_incluster_config() + v1 = client.CoreV1Api() + secret = v1.read_namespaced_secret( + name=DB_SECRET_NAME, + namespace=DB_SECRET_NAMESPACE, + ) + return { + "username": base64.b64decode(secret.data["username"]).decode("utf-8"), + "password": base64.b64decode(secret.data["password"]).decode("utf-8"), + "host": base64.b64decode(secret.data["host"]).decode("utf-8"), + "port": int(base64.b64decode(secret.data["port"]).decode("utf-8")), + } + + def _convert_to_chat_history(self, messages): + """Convert request messages to ChatMessage objects for chat history. + + Args: + messages: List of message dicts with 'role' and 'content' + + Returns: + List of ChatMessage objects excluding the last message (current message) + """ + chat_history = [] + if messages and len(messages) > 1: + for msg in messages[:-1]: # Exclude current message + chat_history.append(ChatMessage(role=msg['role'], content=msg['content'])) + return chat_history + + def pipe(self, user_message, model_id, messages, body): + try: + if self.kb_index is None: + yield "Error: Knowledge base not initialized. Please check system configuration." + return + + chat_history = self._convert_to_chat_history(messages) + + # Create chat engine on-demand (stateless) + chat_engine = self.kb_index.as_chat_engine( + chat_mode="condense_plus_context", + streaming=True, + similarity_top_k=SIMILARITY_TOP_K, + system_prompt=self.system_prompt + ) + # Get streaming response + streaming_response = chat_engine.stream_chat(user_message, chat_history=chat_history) + for token in streaming_response.response_gen: + yield token + except Exception as e: + print(f"\nDEBUG: Unexpected error: {type(e).__name__}: {str(e)}") + yield "I apologize, but I encountered an unexpected error while processing your request. Please try again." + return ``` - Optionally add a title and any notes to the change history, and click **Commit Changes**. - -1. Go to **Apps**, and open the _Argocd_ application. Navigate to the `team-demo` application to see if the configmap has been created. If it is not ready yet, click **Refresh** as needed. +1. Optionally add a title and any notes to the change history, and click **Commit Changes**. - ![Pipelines-files CM](APL-RAG-pipelines-files-CM.jpg) +1. Go to **Apps**, and open the _Argocd_ application. Navigate to the `team-demo` application to see if the configmap has been created. If it is not ready yet, click **Refresh** if needed. ### Deploy the open-webui Pipeline and Web Interface -Update the Kyverno **Policy** `open-webui-policy.yaml` created in the previous tutorial ([Deploy an LLM for AI Inferencing with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl)) to mutate the `open-webui` pods that will be deployed. - -1. 
Open the **Gitea** app, navigate to the `team-demo-argocd` repository, and open the `open-webui-policy.yaml` file. - -1. Add the following resources so that the `open-webui` pods are deployed with the `sidecar.istio.io/inject: "false"` label that prevents Istio sidecar injection: - - ```file - - resources: - kinds: - - StatefulSet - - Deployment - selector: - matchLabels: - ## change the value to match the name of the Workload - app.kubernetes.io/instance: "linode-docs-chatbot" - - resources: - kinds: - - StatefulSet - - Deployment - selector: - matchLabels: - ## change the value to match the name of the Workload - app.kubernetes.io/instance: "open-webui-pipelines" - ``` - - {{< note title="YAML Spacing" isCollapsible=true >}} - Be mindful of indentations when editing the YAML file. Both `-resources` sections should live under the `-name` > `match` > `any` block in `rules`. - - ![Open WebUI Policy YAML Edit](APL-RAG-openwebui-policy-edit.jpg) +Update the Kyverno **Policy** `open-webui-policy.yaml` created in the previous tutorial ([Deploy an LLM for AI Inference with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl)) to mutate the `open-webui` pods that will be deployed. - {{< /note >}} - -#### Add the open-webui-pipelines Helm Chart to the Catalog +#### Add the pipelines Helm Chart to the Catalog 1. Select **view** > **team** and **team** > **admin** in the top bar. @@ -566,19 +456,19 @@ Update the Kyverno **Policy** `open-webui-policy.yaml` created in the previous t 1. Select **Add Helm Chart**. -1. Under **Github URL**, add the URL to the `open-webui-pipelines` Helm chart: +1. Under **Github URL**, add the URL to the open-webui `pipelines` Helm chart: ```command https://github.com/open-webui/helm-charts/blob/pipelines-0.4.0/charts/pipelines/Chart.yaml ``` -1. Click **Get Details** to populate the `open-webui-pipelines` Helm chart details. If preferred, rename the **Target Directory Name** from `pipelines` to `open-webui-pipelines` for reference later on. +1. Click **Get Details** to populate the `pipelines` Helm chart details. 1. Leave **Allow teams to use this chart** selected. 1. Click **Add Chart**. -#### Create a Workload for the open-webui-pipelines Helm Chart +#### Create a Workload for the pipelines Helm Chart 1. Select **view** > **team** and **team** > **demo** in the top bar. @@ -586,13 +476,13 @@ Update the Kyverno **Policy** `open-webui-policy.yaml` created in the previous t 1. Click on **Create Workload**. -1. Select the _Open-Webui-Pipelines_ Helm chart from the Catalog. +1. Select the _pipelines_ Helm chart from the catalog. 1. Click on **Values**. -1. Provide a name for the Workload. This guide uses the Workload name `open-webui-pipelines`. +1. Provide a name for the workload. This guide uses the workload name `my-agent`. -1. Add in or change the following chart values. Make sure to set the name of the Workload in the `nameOverride` field. +1. Add in or change the following chart values. Make sure to set the name of the workload in the `nameOverride` field. You may need to uncomment some fields by removing the `#` sign in order to make them active. 
Remember to be mindful of indentations: @@ -600,42 +490,67 @@ Update the Kyverno **Policy** `open-webui-policy.yaml` created in the previous t nameOverride: {{< placeholder "linode-docs-pipeline" >}} resources: requests: - cpu: "{{< placeholder "1" >}}" - memory: {{< placeholder "512Mi" >}} + cpu: "1" + memory: "512Mi" limits: - cpu: "{{< placeholder "3" >}}" - memory: {{< placeholder "2Gi" >}} + cpu: "3" + memory: "2Gi" ingress: - enabled: {{< placeholder "false" >}} + enabled: false extraEnvVars: - - name: {{< placeholder "PIPELINES_REQUIREMENTS_PATH" >}} - value: {{< placeholder "/opt/pipeline-requirements.txt" >}} - - name: {{< placeholder "PIPELINES_URLS" >}} - value: {{< placeholder "file:///opt/rag-pipeline.py" >}} + - name: PIPELINES_REQUIREMENTS_PATH + value: "/opt/agent-pipeline-requirements.txt" + - name: PIPELINES_URLS + value: "file:///opt/agent-pipeline.py" volumeMounts: - - name: {{< placeholder "config-volume" >}} - mountPath: {{< placeholder "/opt" >}} + - name: config-volume + mountPath: "/opt" volumes: - - name: {{< placeholder "config-volume" >}} + - name: config-volume configMap: - name: {{< placeholder "pipelines-files" >}} + name: my-agent-pipeline ``` 1. Click **Submit**. -#### Expose the linode-docs-pipeline Service +#### Add a new Role and a RoleBinding for the Agent -1. Select **view** > **team** and **team** > **demo** in the top bar. +The agent pipeline requires access to the PGvector database. For configure this, the ServiceAccount of the Agent needs access to the `pgvector-app` secret that includes the database credentials. Create the Role and RoleBinding by following the steps below. -1. Select **Services**. +1. Select **view** > **platform** in the top bar. -1. Click **Create Service**. +1. Select **Apps** in the left menu. -1. In the **Service Name** dropdown menu, select the `linode-docs-pipeline` service. +1. In the **Apps** section, select the **Gitea** app. -1. Click **Create Service**. +1. In Gitea, navigate to the `team-demo-argocd` repository. -1. Once submitted, copy the URL of the `linode-docs-pipeline` service to your clipboard. +1. Click the **Add File** dropdown, and select **New File**. Create a file named `my-agent-rbac.yaml` with the following contents: + + ```file + apiVersion: rbac.authorization.k8s.io/v1 + kind: Role + metadata: + name: pgvector-app-secret-reader + rules: + - apiGroups: [""] + resources: ["secrets"] + resourceNames: ["pgvector-app"] + verbs: ["get", "list"] + --- + apiVersion: rbac.authorization.k8s.io/v1 + kind: RoleBinding + metadata: + name: pgvector-app-secret-reader + roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: pgvector-app-secret-reader + subjects: + - kind: ServiceAccount + name: my-agent + namespace: team-demo + ``` #### Create a Workload to Install the open-webui Helm Chart @@ -645,52 +560,43 @@ Update the Kyverno **Policy** `open-webui-policy.yaml` created in the previous t 1. Click on **Create Workload**. -1. Select the _Open-Webui_ Helm chart from the Catalog. This Helm chart should have been added in the previous [Deploy an LLM for AI Inferencing with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl/#add-the-open-webui-helm-chart-to-the-catalog) guide. +1. Select the _open-webui_ Helm chart from the catalog. This Helm chart should have been added in the previous [Deploy an LLM for AI Inference with App Platform for LKE](/docs/guides/deploy-llm-for-ai-inferencing-on-apl/#add-the-open-webui-helm-chart-to-the-catalog) guide. 1. Click on **Values**. -1. 
Provide a name for the Workload. This guide uses the name `linode-docs-chatbot`.
+1. Provide a name for the workload. This guide uses the name `my-agent-ui`.

-1. Edit the chart to include the below values, and set the name of the Workload in the `nameOverride` field. Replace {{< placeholder "" >}} with your App Platform cluster domain.
-
-    You may need to add new lines for the additional names and values under `extraEnvVars` (extra environment variables):
+1. Edit the chart to include the values below, and set the name of the workload in the `nameOverride` field.

    ```
-    nameOverride: {{< placeholder "linode-docs-chatbot" >}}
+    nameOverride: {{< placeholder "my-agent-ui" >}}
    ollama:
-      enabled: {{< placeholder "false" >}}
+      enabled: false
    pipelines:
-      enabled: {{< placeholder "false" >}}
+      enabled: false
    persistence:
-      enabled: {{< placeholder "false" >}}
-    replicaCount: {{< placeholder "1" >}}
+      enabled: false
+    replicaCount: 1
+    openaiBaseApiUrl: "http://my-agent.team-demo.svc.cluster.local:9099"
    extraEnvVars:
-      - name: {{< placeholder "WEBUI_AUTH" >}}
-        value: "{{< placeholder "false" >}}"
-      - name: {{< placeholder "OPENAI_API_BASE_URLS" >}}
-        value: https://llama3-model-predictor-team-demo.{{< placeholder "" >}}/openai/v1;https://linode-docs-pipeline-demo.{{< placeholder "" >}}
-      - name: {{< placeholder "OPENAI_API_KEYS" >}}
-        value: {{< placeholder "EMPTY;0p3n-w3bu!" >}}
+      - name: WEBUI_AUTH
+        value: "false"
+      - name: OPENAI_API_KEY
+        value: "0p3n-w3bu!"
    ```

1. Click **Submit**.

-#### Expose the linode-docs-chatbot Service
+#### Publicly Expose the my-agent-ui Service

1. Select **Services**.

1. Click **Create Service**.

-1. In the **Service Name** dropdown list, select the `linode-docs-chatbot` service.
+1. In the **Service Name** dropdown list, select the `my-agent-ui` service.

1. Click **Create Service**.

-## Access the Open Web User Interface
-
-In your list of available **Services**, click on the URL of the `linode-docs-chatbot` to navigate to the Open WebUI chatbot interface. Select the model you wish to use in the top left dropdown menu (`llama3-model` or `RAG Pipeline`).
-
-The Llama 3 AI model uses information that is pre-trained by other data sources - not your custom data set. If you give this model a query, it will use its pre-trained data set to answer your question in real time.
-
-The RAG Pipeline model defined in this guide uses data from the custom data set with which it was provided. The example data set used in this guide is sourced from Linode Docs. If you give this model a query relevant to your custom data, the chatbot should respond with an answer informed by that data set.
+In the list of available **Services**, click on the URL of the `my-agent-ui` service to navigate to the Open WebUI chat interface.

![Llama and RAG LLMs](APL-RAG-LLMs.jpg)
\ No newline at end of file
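As a final, optional check, you can also query the agent pipeline's OpenAI-compatible endpoint directly instead of going through the Open WebUI interface. The Python sketch below assumes you port-forward the `my-agent` service to `localhost:9099` (for example, `kubectl port-forward svc/my-agent 9099:9099 -n team-demo`) and that the pipelines API key is the `0p3n-w3bu!` value set in the chart. The exact model id depends on how the pipelines server registers the `my-agent` pipeline, so the sketch lists the available models first:

```
import requests

# Local end of the port-forward to the my-agent pipelines service.
BASE_URL = "http://localhost:9099"
HEADERS = {"Authorization": "Bearer 0p3n-w3bu!"}

# List the pipelines that are registered as models and pick the first one.
models = requests.get(f"{BASE_URL}/v1/models", headers=HEADERS, timeout=30).json()
model_id = models["data"][0]["id"]
print("Using model:", model_id)

# Send a non-streaming chat completion request through the agent pipeline.
response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers=HEADERS,
    timeout=120,
    json={
        "model": model_id,
        "messages": [{"role": "user", "content": "How do I resize a Compute Instance?"}],
        "stream": False,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

A reply grounded in the Linode docs data set, rather than a generic answer, indicates that retrieval against the PGvector store is working end to end.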