Deploying an LLM using MLRun#

This notebook illustrates deploying an LLM using MLRun.

Since this tutorial is for illustrative purposes, it uses minimal resources — CPU and not GPU, and a small amount of data.

MLRun installation and configuration#

Before running this notebook, make sure the mlrun package is installed (pip install mlrun) and that you have configured access to the MLRun service.

# Install MLRun if it is not already installed; run this only once, then restart the notebook
%pip install mlrun
import json
import mlrun

Get or create a new project#

First, create, load, or get an MLRun project. The get_or_create_project() method tries to load the project from the MLRun DB; if the project does not exist, it creates a new one.

project = mlrun.get_or_create_project("genai-tutorial", user_project=True)
> 2024-11-13 09:15:51,255 [info] Created and saved project: {"context":"./","from_template":null,"name":"genai-tutorial-new","overwrite":false,"save":true}
> 2024-11-13 09:15:51,257 [info] Project created successfully: {"project_name":"genai-tutorial-new","stored_in_db":true}

Set up the vector database in the cluster#

These two steps import a pre-defined dataset and load it into a vector database. The vector database is then stored in the data layer of the cluster.

If you're not using Iguazio's Jupyter, download fetch-vectordb-data.py.

The image for this step can be created using the following Dockerfile (contains MLRun and Chroma DB):

FROM mlrun/mlrun:1.6.3
RUN pip install chromadb==0.5.0 langchain==0.2.3 langchain-community==0.2.4 langchain-core==0.2.5 langchain-text-splitters==0.2.1 clean-text==0.6.0

# The model used is the free open-source Phi-2
MODEL_ID = "microsoft/phi-2"

# Define the dataset for the VectorDB
DATA_SET = mlrun.get_sample_path("data/genai-tutorial/labelled_newscatcher_dataset.csv")

# The location of the VectorDB files
CACHE_DIR = mlrun.mlconf.artifact_path
CACHE_DIR = (
    CACHE_DIR.replace("v3io://", "/v3io").replace("{{run.project}}", project.name)
    + "/cache"
)
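The two replace() calls above map a v3io:// artifact URL to its /v3io filesystem mount and substitute the project name into the path template. A minimal standalone sketch of that logic (the example artifact path is hypothetical, for illustration only):

```python
def resolve_cache_dir(artifact_path: str, project_name: str) -> str:
    """Map a v3io:// artifact path to its /v3io mount, fill in the project
    name, and append the cache subdirectory."""
    path = artifact_path.replace("v3io://", "/v3io").replace(
        "{{run.project}}", project_name
    )
    return path + "/cache"


# Hypothetical artifact path, for illustration only
print(resolve_cache_dir("v3io:///projects/{{run.project}}/artifacts", "genai-tutorial-new"))
# /v3io/projects/genai-tutorial-new/artifacts/cache
```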

Fetch the dataset for the vector DB and save it in the cluster:

fetch = project.set_function(
    name="fetch-vectordb-data",
    func="src/fetch-vectordb-data.py",
    kind="job",
    image="gcr.io/iguazio/mlrun-genai/mlrun-llm-demo-data:1.7.0",
)
fetch.save()
'db://genai-tutorial-new/fetch-vectordb-data:latest'
ret = project.run_function(
    name="fetch-vectordb-data-run",
    function="fetch-vectordb-data",
    handler="handler",
    params={"data_set": DATA_SET},
)
> 2024-11-13 09:15:59,365 [info] Storing function: {"db":"http://mlrun-api:8080","name":"fetch-vectordb-data-run","uid":"61d75d706cfb4ce0874e6ea2cbe94314"}
> 2024-11-13 09:15:59,640 [info] Job is running in the background, pod: fetch-vectordb-data-run-n96gh
Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
WARNING:root:USER_AGENT environment variable not set, consider setting it to identify your requests.
Fetching pages: 100%|##########| 80/80 [00:21<00:00,  3.71it/s]
> 2024-11-13 09:17:03,760 [info] Dataset dowloaded and logged
> 2024-11-13 09:17:04,026 [info] To track results use the CLI: {"info_cmd":"mlrun get run 61d75d706cfb4ce0874e6ea2cbe94314 -p genai-tutorial-new","logs_cmd":"mlrun logs 61d75d706cfb4ce0874e6ea2cbe94314 -p genai-tutorial-new"}
> 2024-11-13 09:17:04,026 [info] Or click for UI: {"ui_url":"https://dashboard.default-tenant.app.llm5.iguazio-cd1.com/mlprojects/genai-tutorial-new/jobs/monitor/61d75d706cfb4ce0874e6ea2cbe94314/overview"}
> 2024-11-13 09:17:04,026 [info] Run execution finished: {"name":"fetch-vectordb-data-run","status":"completed"}
| project | uid | iter | start | state | kind | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|---|
| genai-tutorial-new | | 0 | Nov 13 09:16:34 | completed | run | fetch-vectordb-data-run | v3io_user=edmond, kind=job, owner=edmond, mlrun/client_version=1.7.0, mlrun/client_python_version=3.9.16, host=fetch-vectordb-data-run-n96gh | | data_set=https://s3.wasabisys.com/iguazio/data/genai-tutorial/labelled_newscatcher_dataset.csv | | vector-db-dataset |

> to track results use the .show() or .logs() methods or click here to open in UI
> 2024-11-13 09:17:11,015 [info] Run execution finished: {"name":"fetch-vectordb-data-run","status":"completed"}
ret.outputs
{'vector-db-dataset': 'store://datasets/genai-tutorial-new/fetch-vectordb-data-run_vector-db-dataset:latest@61d75d706cfb4ce0874e6ea2cbe94314'}
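The output is an MLRun artifact store URI of the form store://&lt;kind&gt;/&lt;project&gt;/&lt;key&gt;:&lt;tag&gt;@&lt;tree&gt;. In practice you pass this URI straight to the next step's inputs=, but a minimal sketch of pulling one apart (illustrative string parsing only, not an MLRun API) makes the structure explicit:

```python
def parse_store_uri(uri: str) -> dict:
    """Split an MLRun store URI into its parts (illustrative only)."""
    body = uri.removeprefix("store://")
    kind, rest = body.split("/", 1)          # artifact kind, e.g. "datasets"
    project, artifact = rest.split("/", 1)   # owning project, then key:tag@tree
    key_tag, _, tree = artifact.partition("@")
    key, _, tag = key_tag.partition(":")
    return {"kind": kind, "project": project, "key": key, "tag": tag, "tree": tree}


uri = (
    "store://datasets/genai-tutorial-new/"
    "fetch-vectordb-data-run_vector-db-dataset:latest@61d75d706cfb4ce0874e6ea2cbe94314"
)
print(parse_store_uri(uri)["key"])  # fetch-vectordb-data-run_vector-db-dataset
```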

Build the vector DB#

Build the vector DB in the data layer and load the data into it.

The image for this step can be created using the following Dockerfile (contains MLRun and Chroma DB):

FROM mlrun/mlrun:1.6.3
RUN pip install chromadb==0.5.0 langchain==0.2.3 langchain-community==0.2.4 langchain-core==0.2.5 langchain-text-splitters==0.2.1 clean-text==0.6.0

If you're not using Iguazio's Jupyter, download build-vector-db.py.

# Build the vector DB using the image
build_vectordb = project.set_function(
    name="build-vectordb",
    func="src/build-vector-db.py",
    kind="job",
    image="gcr.io/iguazio/mlrun-genai/mlrun-llm-demo-data:1.7.0",
).apply(mlrun.auto_mount())
build_vectordb.save()
'db://genai-tutorial-new/build-vectordb:latest'
project.run_function(
    function="build-vectordb",
    inputs={"df": ret.outputs["vector-db-dataset"]},
    params={"cache_dir": CACHE_DIR},
    handler="handler_chroma",
)
> 2024-11-13 09:18:35,155 [info] Storing function: {"db":"http://mlrun-api:8080","name":"build-vectordb-handler-chroma","uid":"9bcc04c5b8994e2086954bda1451af57"}
> 2024-11-13 09:18:35,407 [info] Job is running in the background, pod: build-vectordb-handler-chroma-8sv5v
Creating collection: 'my_news'
/root/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:00<00:00, 91.3MiB/s]
> 2024-11-13 09:21:21,169 [info] Vector DB was created
> 2024-11-13 09:21:21,226 [info] To track results use the CLI: {"info_cmd":"mlrun get run 9bcc04c5b8994e2086954bda1451af57 -p genai-tutorial-new","logs_cmd":"mlrun logs 9bcc04c5b8994e2086954bda1451af57 -p genai-tutorial-new"}
> 2024-11-13 09:21:21,226 [info] Or click for UI: {"ui_url":"https://dashboard.default-tenant.app.llm5.iguazio-cd1.com/mlprojects/genai-tutorial-new/jobs/monitor/9bcc04c5b8994e2086954bda1451af57/overview"}
> 2024-11-13 09:21:21,227 [info] Run execution finished: {"name":"build-vectordb-handler-chroma","status":"completed"}
| project | uid | iter | start | state | kind | name | labels | inputs | parameters | results |
|---|---|---|---|---|---|---|---|---|---|---|
| genai-tutorial-new | | 0 | Nov 13 09:19:11 | completed | run | build-vectordb-handler-chroma | v3io_user=edmond, kind=job, owner=edmond, mlrun/client_version=1.7.0, mlrun/client_python_version=3.9.16, host=build-vectordb-handler-chroma-8sv5v | df | cache_dir=/v3io/projects/genai-tutorial-new/artifacts/cache | |

> to track results use the .show() or .logs() methods or click here to open in UI
> 2024-11-13 09:21:27,133 [info] Run execution finished: {"name":"build-vectordb-handler-chroma","status":"completed"}
<mlrun.model.RunObject at 0x7f031262f910>
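The handler loads the dataset into a Chroma collection; before embedding, long articles are typically split into overlapping chunks (the Dockerfile above installs langchain-text-splitters for this). A minimal pure-Python sketch of fixed-size chunking with overlap (the sizes are illustrative, not the values the handler uses):

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so content spanning a
    chunk boundary still appears whole in at least one chunk."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]


chunks = chunk_text("space " * 200, size=400, overlap=50)
print(len(chunks), len(chunks[0]))  # 4 400
```

Each chunk is then embedded (here by Chroma's default all-MiniLM-L6-v2 model, downloaded in the log above) and stored alongside its source metadata so answers can cite their origin.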

Serving the function#

The image for this step can be created using the following Dockerfile (contains Chroma DB, Transformers, TF and PyTorch):

FROM mlrun/mlrun:1.6.3
RUN pip install chromadb==0.5.0 transformers==4.41.2 tensorflow==2.16.1 torch

If you're not using Iguazio's Jupyter, download serving.py. Now you can deploy the Nuclio function that serves the LLM:

serve_func = project.set_function(
    name="serve-llm",
    func="src/serving.py",
    image="gcr.io/iguazio/mlrun-genai/llmserve:1.7.0",
    kind="nuclio",
).apply(mlrun.auto_mount())

# Pass the model ID and vector DB path to the serving function
serve_func.set_envs(env_vars={"MODEL_ID": MODEL_ID, "CACHE_DIR": CACHE_DIR})

# Since the model is stored in memory, use only one replica and one worker
# Since this is running on CPU only, inference might take ~1 minute (increase the timeouts)
serve_func.spec.min_replicas = 1
serve_func.spec.max_replicas = 1
serve_func.with_http(worker_timeout=120, gateway_timeout=150, workers=1)
serve_func.set_config("spec.readinessTimeoutSeconds", 1200)
> 2024-11-13 09:22:04,626 [warning] Adding HTTP trigger despite the default HTTP trigger creation being disabled
<mlrun.runtimes.nuclio.function.RemoteRuntime at 0x7f038754edf0>
serve_func = project.deploy_function(function="serve-llm")

Test the serving function#

body = {
    "question": "What are some new developments in space travel?",
    "topic": "science",
}
resp = serve_func.function.invoke("/", body=json.dumps(body))
> 2024-11-13 09:47:21,541 [info] Invoking function: {"method":"POST","path":"http://nuclio-genai-tutorial-new-serve-llm.default-tenant.svc.cluster.local:8080/"}
print(resp["response"])
Space travel has seen many new developments in recent years, including the increasing efforts of companies like SpaceX to send humans to Mars. SpaceX recently tested a prototype of the next-generation Starship vehicle, which could pave the way for carrying humans to the Moon and Mars. However, the increasing number of objects being sent into space is creating a dangerous situation, as they leave debris that can cause future collisions. This is a major threat to space travel and the environment, and more needs to be done to address this issue.
print(resp["sources"])
['https://www.express.co.uk/news/science/1324095/space-news-spacex-rocket-launch-stars-elon-musk-Comet-Neowise-latest']
print(resp["prompt"])
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
User question:
What are some new developments in space travel?

Context:
new designs by the star map company, Under Lucky Stars, have exposed the destructive impacts exploration could cause on future space travel and the environment. Trending The data shows the increasing danger created by the number of objects being sent into space as they leave debris after their launches.For about 70 years, humans have been launching vessels into space to explore the world beyond earth.Over the years the number of space missions have increased as technology continues to

that there be as little debris and space junk as possible."This comes as SpaceX have been increasing their efforts to send humans to Mars.The space company recently tested a prototype of the next-generation Starship vehicle which could be paving the way to carrying humans to the Moon and Mars.READ MORE:Asteroid bigger than a bus will shoot by closer than Moon THIS WEEKSpace data shows the increasing danger created by the number of objects being sent into space (Image: getty) SUBSCRIBE Invalid

Space news: Rocket launches from earth pose threat to space travel | Science | News | Express.co.uk

### Response:
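The printed prompt follows the Alpaca-style instruction template: the user question and the retrieved context chunks are interpolated into a fixed scaffold before being sent to the model. A minimal sketch of assembling it (build_prompt is a hypothetical helper, not part of the serving code):

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble an Alpaca-style prompt from a question and retrieved chunks."""
    context_block = "\n\n".join(contexts)
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n"
        f"User question:\n{question}\n\n"
        f"Context:\n{context_block}\n\n"
        "### Response:\n"
    )


prompt = build_prompt(
    "What are some new developments in space travel?",
    ["first retrieved chunk...", "second retrieved chunk..."],
)
print(prompt.startswith("Below is an instruction"))  # True
```

The model's completion is everything it generates after the final "### Response:" line.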

Register the functions in the project and save it:

project.set_function(f"db://{project.name}/fetch-vectordb-data")
project.set_function(f"db://{project.name}/build-vectordb")
project.set_function(f"db://{project.name}/serve-llm")
project.save()
<mlrun.projects.project.MlrunProject at 0x7f373e114ee0>

Run the E2E workflow#

%%writefile workflow.py
import mlrun
from kfp import dsl


@dsl.pipeline(name="GenAI demo")
def kfpipeline(data_set, cache_dir):
    project = mlrun.get_current_project()

    # Fetch the dataset and log it as an artifact
    fetch = project.run_function(
        function="fetch-vectordb-data",
        name="fetch-vectordb-data-run",
        handler="handler",
        params={"data_set": data_set},
        outputs=["vector-db-dataset"],
    )

    # Build the vector DB from the fetched dataset
    vectordb_build = project.run_function(
        function="build-vectordb",
        inputs={"df": fetch.outputs["vector-db-dataset"]},
        params={"cache_dir": cache_dir},
        handler="handler_chroma",
    )

    # Deploy the serving function once the vector DB is ready
    deploy = project.deploy_function("serve-llm").after(vectordb_build)
Writing workflow.py
project.set_workflow("main", "workflow.py", embed=True)
project.save()
<mlrun.projects.project.MlrunProject at 0x7f373e114ee0>
run_id = project.run(
    "main", arguments={"cache_dir": CACHE_DIR, "data_set": DATA_SET}, watch=True
)
Pipeline running (id=702859f3-bb9c-4793-a1b6-8f6d90dcdf31), click here to view the details in MLRun UI

Run Results

[info] Workflow 702859f3-bb9c-4793-a1b6-8f6d90dcdf31 finished, state=Succeeded


click the hyper links below to see detailed results
| uid | start | state | name | parameters | results |
|---|---|---|---|---|---|
| | Jun 17 23:03:44 | completed | build-vectordb-handler-chroma | cache_dir=/v3io/projects/genai-tutorial-new/artifacts/cache | |
| | Jun 17 23:02:17 | completed | fetch-vectordb-data-run | data_set=https://s3.wasabisys.com/iguazio/data/genai-tutorial/labelled_newscatcher_dataset.csv | |