Loading the model
While models can be used directly through a cloud-based API as soon as the API client is created, LeapSDK requires developers to explicitly load the model before requesting generation, since the model runs locally. This step generally takes a few seconds, depending on the model size and the device performance. On the cloud API, you only need to create an API client.
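A minimal sketch of the cloud-side setup, assuming an OpenAI-compatible Python client; the library, key handling, and placeholder values are illustrative, not taken from the original:

```python
from openai import OpenAI

# Creating the client is the only setup needed on the cloud side;
# the model itself stays on the provider's servers.
client = OpenAI(api_key="YOUR_API_KEY")
```

For LeapSDK, a sketch of the explicit loading step. The LeapClient.loadModel entry point, the package name, and the bundle path are assumptions for illustration; only the fact that a model runner object is obtained before any generation comes from the description above:

```kotlin
import ai.liquid.leap.LeapClient // package and entry point assumed for illustration

// Loading happens on-device, so it can take a few seconds depending on
// the model size and device performance. It is assumed here to be a
// suspend function, so call it once, up front, from a coroutine.
suspend fun loadLocalModel() =
    LeapClient.loadModel("/path/to/model.bundle") // returns the model runner
```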
Request for generation
In the cloud API calls, client.chat.completions.create returns a stream object from which the caller fetches the generated contents.
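A sketch of that call, again assuming the OpenAI-style Python SDK; the model name and prompt are placeholders:

```python
# stream=True makes the call return a stream object instead of waiting
# for the full completion; note that the model name must be passed here.
stream = client.chat.completions.create(
    model="your-model-name",  # placeholder
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
```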
With LeapSDK, we call generateResponse on the conversation object to obtain a Kotlin flow (the equivalent of a Python stream) for generation. Since the model runner object already contains all the information about the model, we do not need to indicate the model name in the call again.
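A sketch of the corresponding LeapSDK call. generateResponse, the conversation object, and the model runner come from the description above; the createConversation() name and the prompt are assumptions used only for illustration:

```kotlin
// `modelRunner` is the object returned by the loading step above.
val conversation = modelRunner.createConversation() // method name assumed
// generateResponse returns a Kotlin flow; no model name is passed, since the
// model runner already carries that information.
val responseFlow = conversation.generateResponse("Why is the sky blue?")
```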
Process generated contents
In the cloud API Python code, a for-loop on the stream object retrieves the contents.
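Continuing the earlier sketch, and still assuming the OpenAI-style streaming response shape:

```python
# Each chunk carries an incremental piece of the generated text.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```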
With LeapSDK, we apply the onEach function to the Kotlin flow to process each piece of content. When the completion is done, the callback passed to onCompletion is invoked. In the end, a call to collect() is necessary to actually start the generation.
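A minimal Kotlin sketch of that pipeline. onEach, onCompletion, and collect() are the standard kotlinx.coroutines flow operators named above; the element type of the flow depends on the SDK, so it is kept generic here, and responseFlow is the flow returned by generateResponse in the previous sketch:

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.flow.onCompletion
import kotlinx.coroutines.flow.onEach

suspend fun <T> printStreamedResponse(responseFlow: Flow<T>) {
    responseFlow
        .onEach { chunk ->
            // Called for each streamed piece of generated content.
            print(chunk)
        }
        .onCompletion {
            // Invoked once the completion is done.
            println()
        }
        .collect() // the flow is cold: collect() is what actually starts generation
}
```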