Scheduled jobs and workflows
Oftentimes you may want to run a job on a regular schedule: for example, fetching data from a data source every morning, compiling an analytics report every month, or detecting model drift every hour.
Schedules enforce a minimum interval between two scheduled runs. By default, a job cannot be scheduled to run twice within a 10-minute period. For example, a schedule such as */13 * * * * (every 13th minute) is currently not allowed: the job would trigger at minute 52 and then again at the start of the next hour (minute 0), only 8 minutes later. See mlrun.mlconf.httpdb.scheduling for the service's scheduling configuration.
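The arithmetic behind the */13 example can be checked with a short Python sketch. This is not MLRun's validation code: `expand_minutes`, `smallest_gap_minutes`, and `is_allowed` are hypothetical helpers, only the minute field is inspected (supporting `*`, `*/N`, plain numbers, and comma lists), and the job is assumed to fire every hour. The 10-minute default mirrors the text above.

```python
def expand_minutes(field: str) -> list[int]:
    """Expand a cron minute field; supports '*', '*/N', 'N', and comma lists."""
    minutes = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif part.startswith("*/"):
            minutes.update(range(0, 60, int(part[2:])))
        else:
            minutes.add(int(part))
    return sorted(minutes)


def smallest_gap_minutes(expression: str) -> int:
    """Smallest gap between consecutive runs, assuming the job fires every hour."""
    minutes = expand_minutes(expression.split()[0])
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    # Wrap-around gap: from the last run in one hour to the first run in the next hour.
    gaps.append(60 - minutes[-1] + minutes[0])
    return min(gaps)


def is_allowed(expression: str, min_interval_minutes: int = 10) -> bool:
    """Hypothetical check: reject schedules whose runs can be closer than the minimum."""
    return smallest_gap_minutes(expression) >= min_interval_minutes
```

Under these assumptions, `smallest_gap_minutes("*/13 * * * *")` is 8 (minute 52 to minute 0 of the next hour), so `is_allowed` rejects it, while `*/10 * * * *` passes with a gap of exactly 10 minutes.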
Creating a job and scheduling it
MLRun makes it simple to add a schedule to a given job. To showcase this, the following job runs the code below, which resides in a file named schedule.py:
def hello(context):
    print("You just ran a scheduled job!")
To create the job, use the set_function method and specify the kind, as shown below:
import mlrun

project = mlrun.get_or_create_project("schedule")
job = project.set_function(
    name="my-scheduled-job",  # Name of the job (displayed in console and UI)
    filename="schedule.py",  # Python file or Jupyter notebook to run
    kind="job",  # Run as a job
    image="mlrun/mlrun",  # Use this Docker image
    handler="hello",  # Execute the function hello() within schedule.py
)
Running the job using a schedule
To add a schedule, run the job and specify the schedule parameter using Cron syntax:
job.run(schedule="0 * * * *")
This runs the job every hour, at minute 0. Crontab.guru is an excellent resource for generating Cron schedules.
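To see what a Cron expression such as 0 * * * * means in practice, the sketch below lists the next trigger times for a simple expression. It is a simplified, hypothetical helper, not the scheduler MLRun uses: each field supports only `*`, `*/N`, `a-b`, `a-b/N`, plain numbers, and comma lists; times are checked minute by minute; and day-of-month and day-of-week must both match (standard Cron treats them as alternatives when both are restricted).

```python
from datetime import datetime, timedelta

# (name, lowest value, highest value) for the five standard Cron fields, in order.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),  # 0 = Sunday, as in Cron
]

# Safety limit so an expression that never matches cannot loop forever (about 5 years).
MAX_MINUTES_TO_SCAN = 5 * 366 * 24 * 60


def expand_field(expr: str, low: int, high: int) -> set[int]:
    """Expand one Cron field into the set of values it matches."""
    values = set()
    for part in expr.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expression: str) -> dict[str, set[int]]:
    """Map each field name to the set of values it allows."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    return {name: expand_field(part, low, high) for part, (name, low, high) in zip(parts, FIELDS)}


def next_runs(expression: str, start: datetime, count: int = 3) -> list[datetime]:
    """Return up to `count` trigger times strictly after `start`, at minute resolution."""
    spec = parse_cron(expression)
    current = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    runs = []
    for _ in range(MAX_MINUTES_TO_SCAN):
        cron_weekday = (current.weekday() + 1) % 7  # Python: Monday=0; Cron: Sunday=0
        if (
            current.minute in spec["minute"]
            and current.hour in spec["hour"]
            and current.day in spec["day_of_month"]
            and current.month in spec["month"]
            and cron_weekday in spec["day_of_week"]
        ):
            runs.append(current)
            if len(runs) == count:
                break
        current += timedelta(minutes=1)
    return runs
```

For example, `next_runs("0 * * * *", datetime(2024, 1, 1, 9, 30))` returns 10:00, 11:00, and 12:00 on 1 January 2024, confirming the hourly cadence described above.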
Scheduling a workflow
Note: Tech Preview
After loading the project (load_project), run the project with the scheduled workflow:
project.run("main", schedule='0 * * * *')
Remote and scheduled workflows can be run by a project that has either a remote source or a source contained in the image. A remote source is pulled each time the workflow runs, while a local source is loaded from the image.
To use a remote source, either put your code in Git or archive it, and then set the project source to it (for example, git://github.com/mlrun/something.git, http://some/url/file.zip, or s3://some/url/file.tar.gz). By default, the project's defined source is used.
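The source strings listed above fall into a few categories. The sketch below classifies a source string as a Git repository, a remote archive, or a local path, and reports whether, per the text above, it would be pulled on each run. It is an illustration only: `classify_source` and `is_pulled_each_run` are hypothetical helpers, the recognized schemes (git, http, https, s3) and archive suffixes are assumptions drawn from the examples, and MLRun's own source handling may accept other forms.

```python
from urllib.parse import urlparse

# Archive suffixes assumed for illustration, based on the examples in the text.
ARCHIVE_SUFFIXES = (".zip", ".tar.gz", ".tgz", ".tar")
# Remote schemes that may hold an archive (an assumption, not an exhaustive MLRun list).
ARCHIVE_SCHEMES = ("http", "https", "s3")


def classify_source(source: str) -> str:
    """Return 'git', 'archive', or 'local' for a project source string."""
    parsed = urlparse(source)
    if parsed.scheme == "git":
        return "git"
    if parsed.scheme in ARCHIVE_SCHEMES and parsed.path.endswith(ARCHIVE_SUFFIXES):
        return "archive"
    if parsed.scheme == "":
        # No scheme: treat it as a path inside the image, e.g. "./" or "/home/project".
        return "local"
    raise ValueError(f"unsupported project source: {source!r}")


def is_pulled_each_run(source: str) -> bool:
    """Remote sources are pulled on every run; local ones are loaded from the image."""
    return classify_source(source) != "local"
```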
To set the project source, use the project.set_source method.
To set the workflow, use the project.set_workflow method.
To use a different remote source, specify its URL when running the workflow: project.run(source=<source-URL>).
You can also use a context path to load the project from a local directory contained in the image used for execution:
1. Set the project source with the project.set_source method (make sure pull_at_runtime is set to False).
2. Build the image with the project YAML and code using the project.build_image method. Optionally, specify a target_dir for the project content.
3. Create the workflow, for example project.set_workflow(name="my-workflow", workflow_path="./src/workflow.py"). Unless specified otherwise, the workflow uses the default image, project.spec.default_image, which is set to the image built by project.build_image.
4. Run the workflow with the context path, for example project.run("my-workflow", source="./", engine="remote"). The source can be an absolute path or a relative path using "." or "./".
Example for a remote GitHub project (mlrun/project-demo):
import mlrun
project_name = "remote-workflow-example"
source_url = "git://github.com/mlrun/project-demo.git"
source_code_target_dir = "./project" # Optional, relative to "/home/mlrun_code". A different absolute path can be specified.
# Load the project from the remote source into a local context directory
project = mlrun.load_project(context=f"./{project_name}", url=source_url, name=project_name)
# Set the project source and workflow
project.set_source(source_url)
project.set_workflow(name="main", workflow_path="kflow.py")
# Build the image, load the source to the target dir and save the project
project.build_image(target_dir=source_code_target_dir)
project.save()
# Run the workflow, load the project from the target dir on the image
project.run("main", source="./", engine="remote", dirty=True)
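The comment on source_code_target_dir describes a path rule: a relative target directory is placed under /home/mlrun_code, while an absolute path is used as given. A minimal sketch of that rule follows; `resolve_target_dir` is a hypothetical helper for illustration, not part of MLRun, and the base directory comes from the example's comment rather than from MLRun's configuration.

```python
import posixpath

# Base directory named in the example's comment; treated here as an assumption.
CODE_ROOT = "/home/mlrun_code"


def resolve_target_dir(target_dir: str, code_root: str = CODE_ROOT) -> str:
    """Return where the project content is expected to land inside the image."""
    if posixpath.isabs(target_dir):
        # An absolute path is used as-is (only normalized).
        return posixpath.normpath(target_dir)
    # A relative path such as "./project" is resolved against the code root.
    return posixpath.normpath(posixpath.join(code_root, target_dir))
```

With this rule, the example's "./project" resolves to /home/mlrun_code/project.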
You can delete a scheduled workflow in the MLRun UI. To update a scheduled workflow, run the workflow again with the new schedule, for example:
project.run("main", schedule='0 * * * *')