Leap
Leap is the static entry point for loading on-device models.
`Leap.load(model:quantization:options:downloadProgressHandler:)`
Download a model from the LEAP Model Library and load it into memory. If the model has already been downloaded, it will be loaded from the local cache without a remote request.
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| model | String | Yes | - | The name of the model to load. See the LEAP Model Library for all available models. |
| quantization | String | Yes | - | The quantization level to download for the given model. See the LEAP Model Library for all available quantization levels. |
| options | LiquidInferenceEngineManifestOptions | No | nil | Override options for loading the model (recommended for advanced use cases only). |
| downloadProgressHandler | (_ progress: Double, _ speed: Int64) -> Void | No | nil | A callback that receives the download progress (a fraction between 0 and 1) and the download speed (in bytes per second). |
Returns: A `ModelRunner` instance that can be used to interact with the loaded model.
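As an illustration, a minimal sketch of a first-load flow. The model and quantization names are placeholders; pick real values from the LEAP Model Library, and adjust the module name if your import differs:

```swift
import LeapSDK

// Sketch: downloads the model on first run, then loads it from the local
// cache on subsequent runs. "LFM2-1.2B" / "Q4_K_M" are placeholder names.
func loadModel() async throws -> ModelRunner {
    let runner = try await Leap.load(
        model: "LFM2-1.2B",
        quantization: "Q4_K_M"
    ) { progress, bytesPerSecond in
        // progress is a fraction in [0, 1]; speed is in bytes per second.
        print("Downloading: \(Int(progress * 100))% at \(bytesPerSecond) B/s")
    }
    return runner
}
```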
`ModelDownloader.downloadModel(model:quantization:downloadProgress:)`
Download a model from the LEAP Model Library and save it to the local cache, without loading it into memory.
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| model | String | Yes | - | The name of the model to download. See the LEAP Model Library for all available models. |
| quantization | String | Yes | - | The quantization level to download for the given model. See the LEAP Model Library for all available quantization levels. |
| downloadProgress | (_ progress: Double, _ speed: Int64) -> Void | No | nil | A callback that receives the download progress (a fraction between 0 and 1) and the download speed (in bytes per second). |
Returns: A `DownloadedModelManifest` instance that contains the metadata of the downloaded model.
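A sketch of a prefetch flow, using the same placeholder names as above: fetch the model files ahead of time (for example, while on Wi-Fi), and let a later `Leap.load(model:quantization:)` call resolve from the cache:

```swift
import LeapSDK

// Sketch: download the model files without paying the memory cost
// of loading them. Model and quantization names are placeholders.
func prefetchModel() async throws {
    let manifest = try await ModelDownloader.downloadModel(
        model: "LFM2-1.2B",
        quantization: "Q4_K_M"
    ) { progress, bytesPerSecond in
        print("Prefetch: \(Int(progress * 100))%")
    }
    // A later Leap.load(model:quantization:) call hits the local cache.
    print("Cached manifest: \(manifest)")
}
```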
Legacy: `Leap.load(url:options:)`
Loads a local model file (either a `.bundle` package or a `.gguf` checkpoint) and returns a `ModelRunner` instance.

- Throws `LeapError.modelLoadingFailure` if the file cannot be loaded.
- Automatically detects companion files placed alongside your model:
  - `mmproj-*.gguf` enables multimodal vision tokens for both bundle and GGUF flows.
  - Audio decoder artifacts whose filename contains "audio" and "decoder" with a `.gguf` or `.bin` extension unlock audio input/output for compatible checkpoints.
- Must be called from an async context (for example inside an `async` function or a `Task`). Keep the returned `ModelRunner` alive while you interact with the model.
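A minimal sketch of this legacy flow, assuming the checkpoint ships as an app resource and that `options` may be omitted; "model.gguf" is a placeholder file name:

```swift
import LeapSDK
import Foundation

// Sketch: load a .gguf checkpoint bundled with the app. Any sibling
// mmproj-*.gguf or audio decoder artifact is auto-detected.
func loadLocalModel() async throws -> ModelRunner {
    guard let url = Bundle.main.url(forResource: "model", withExtension: "gguf") else {
        preconditionFailure("model.gguf missing from the app bundle")
    }
    // Throws LeapError.modelLoadingFailure if the file cannot be loaded.
    return try await Leap.load(url: url)
}
```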
LiquidInferenceEngineOptions
Pass a `LiquidInferenceEngineOptions` value when you need to override the default runtime configuration.
- `bundlePath`: Path to the model file on disk. When you call `Leap.load(url:)`, this is filled automatically.
- `cacheOptions`: Configure persistence of KV-cache data between generations.
- `cpuThreads`: Number of CPU threads for token generation.
- `contextSize`: Override the default maximum context length for the model.
- `nGpuLayers`: Number of layers to offload to the GPU (for macOS/macCatalyst targets with Metal support).
- `mmProjPath`: Optional path to an auxiliary multimodal projection model. Leave `nil` to auto-detect a sibling `mmproj-*.gguf`.
- `audioDecoderPath`: Optional audio decoder model. Leave `nil` to auto-detect nearby decoder artifacts.
- `chatTemplate`: Advanced override for backend chat templating.
- `audioTokenizerPath`: Optional tokenizer for audio-capable checkpoints.
- `extras`: Backend-specific configuration payload (advanced use only).
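As an illustration, a hypothetical sketch of overriding a few runtime knobs. It assumes `LiquidInferenceEngineOptions` offers a default initializer and mutable properties named as in the list above, so verify against the actual API before copying:

```swift
import LeapSDK
import Foundation

// Hypothetical sketch: tune context size, CPU threads, and GPU offload.
func loadTuned(from modelURL: URL) async throws -> ModelRunner {
    var options = LiquidInferenceEngineOptions() // assumed default initializer
    options.contextSize = 8192   // widen the context window
    options.cpuThreads = 4       // cap CPU threads used for generation
    options.nGpuLayers = 32      // Metal offload (macOS/macCatalyst targets)
    return try await Leap.load(url: modelURL, options: options)
}
```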
Backend selection is automatic: `.bundle` files run on the ExecuTorch backend, while `.gguf` checkpoints use the embedded llama.cpp backend. Bundled models reference their projection data in metadata; GGUF checkpoints look for sibling companion files (multimodal projection, audio decoder, audio tokenizer) unless you override the paths through `LiquidInferenceEngineOptions`. Ensure these artifacts are co-located when you want vision or audio features.
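When the companion files are not siblings of the checkpoint, the paths can be supplied explicitly instead of relying on auto-detection. A hypothetical sketch, assuming the path properties accept file-system path strings:

```swift
import LeapSDK
import Foundation

// Hypothetical sketch: wire up vision and audio companions by hand
// instead of relying on sibling-file auto-detection.
func loadMultimodal(model: URL, projector: URL, audioDecoder: URL) async throws -> ModelRunner {
    var options = LiquidInferenceEngineOptions() // assumed default initializer
    options.mmProjPath = projector.path          // multimodal projection model
    options.audioDecoderPath = audioDecoder.path // audio decoder artifact
    return try await Leap.load(url: model, options: options)
}
```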