Loading the model
While models can be used directly through a cloud-based API as soon as the API client is created, LeapSDK requires developers to explicitly load the model before requesting generation, since the model runs locally. This step generally takes a few seconds, depending on the model size and the device performance. On the cloud API, you only need to create an API client.
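A minimal sketch of the cloud-side setup, assuming an OpenAI-compatible Python client; the library, key handling, and placeholder values are illustrative, not taken from the original:

```python
from openai import OpenAI

# Creating the client is the only setup needed on the cloud side;
# the model itself stays on the provider's servers.
client = OpenAI(api_key="YOUR_API_KEY")
```

For LeapSDK, a sketch of the explicit loading step. The LeapClient.loadModel entry point, the package name, and the bundle path are assumptions for illustration; only the fact that a model runner object is obtained before any generation comes from the description above:

```kotlin
import ai.liquid.leap.LeapClient // package and entry point assumed for illustration

// Loading happens on-device, so it can take a few seconds depending on
// the model size and device performance. It is assumed here to be a
// suspend function, so call it once, up front, from a coroutine.
suspend fun loadLocalModel() =
    LeapClient.loadModel("/path/to/model.bundle") // returns the model runner
```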
Request for generation
In the cloud API calls, client.chat.completions.create returns a stream object from which the caller fetches the generated contents.
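A sketch of that call, again assuming the OpenAI-style Python SDK; the model name and prompt are placeholders:

```python
# stream=True makes the call return a stream object instead of waiting
# for the full completion; note that the model name must be passed here.
stream = client.chat.completions.create(
    model="your-model-name",  # placeholder
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
```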
With LeapSDK, we call generateResponse on the conversation object to obtain a Kotlin flow (the equivalent of a Python stream) for generation. Since the model runner object already contains all the information about the model, we do not need to indicate the model name in the call again.
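A sketch of the corresponding LeapSDK call. generateResponse, the conversation object, and the model runner come from the description above; the createConversation() name and the prompt are assumptions used only for illustration:

```kotlin
// `modelRunner` is the object returned by the loading step above.
val conversation = modelRunner.createConversation() // method name assumed
// generateResponse returns a Kotlin flow; no model name is passed, since the
// model runner already carries that information.
val responseFlow = conversation.generateResponse("Why is the sky blue?")
```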
Process generated contents
In the cloud API Python code, a for-loop on the stream object retrieves the contents.
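Continuing the earlier sketch, and still assuming the OpenAI-style streaming response shape:

```python
# Each chunk carries an incremental piece of the generated text.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```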
With LeapSDK, we apply the onEach function to the Kotlin flow to process each piece of content. When the completion is done, the callback passed to onCompletion is invoked. In the end, a call to collect() is necessary to actually start the generation.
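A minimal Kotlin sketch of that pipeline. onEach, onCompletion, and collect() are the standard kotlinx.coroutines flow operators named above; the element type of the flow depends on the SDK, so it is kept generic here, and responseFlow is the flow returned by generateResponse in the previous sketch:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.flow.onCompletion
import kotlinx.coroutines.flow.onEach

suspend fun <T> printStreamedResponse(responseFlow: Flow<T>) {
    responseFlow
        .onEach { chunk ->
            // Called for each streamed piece of generated content.
            print(chunk)
        }
        .onCompletion {
            // Invoked once the completion is done.
            println()
        }
        .collect() // the flow is cold: collect() is what actually starts generation
}
```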