Prompt Roles

LFM2 models use a structured conversation format with three prompt roles:
  • system (optional) - Sets assistant behavior, context, and instructions. Use for personality, task context, output format, or constraints.
  • user - Contains the question, instruction, or request from the user.
  • assistant - Provides a partial response for the model to continue from. Useful for multi-turn conversations, few-shot prompting, or prefilling structured outputs (e.g., JSON opening brace).
Example:
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "How do I sort a list in Python?"}
]
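The same message list is turned into model input via the tokenizer's chat template. A minimal sketch, assuming the Hugging Face Transformers integration (the model ID LiquidAI/LFM2-1.2B is illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "How do I sort a list in Python?"}
]

# The chat template converts the role-tagged messages into the model's prompt format.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts a reply
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))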
Multi-turn conversations / Few-shot prompting: Continue a previous conversation or provide example interactions to guide the model’s behavior. The model learns from the conversation history and applies the pattern to new inputs.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the benefits of exercise?"},
    {"role": "assistant", "content": "Exercise has many benefits including:\n1. Improved cardiovascular health\n2. "},  # Partial response to continue
    {"role": "user", "content": "Tell me more about cardiovascular health."}
]
Or provide few-shot examples:
messages = [
    {"role": "system", "content": "You are a helpful assistant that formats dates."},
    {"role": "user", "content": "2024-01-15"},
    {"role": "assistant", "content": "January 15, 2024"},
    {"role": "user", "content": "2024-12-25"},
    {"role": "assistant", "content": "December 25, 2024"},
    {"role": "user", "content": "2024-03-08"}  # Model follows the pattern
]
Prefill for structured output: Start the model with a specific format or structure (e.g., a JSON opening brace) to guide it toward structured outputs.
messages = [
    {"role": "system", "content": "Extract information and return as JSON."},
    {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
    {"role": "assistant", "content": "{\n  \"name\": "}  # Prefill with JSON structure
]
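To continue the prefilled assistant turn rather than start a new one, the chat template must leave the final message open. A minimal sketch, assuming Hugging Face Transformers' continue_final_message option (available in recent releases; the model ID is illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "Extract information and return as JSON."},
    {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
    {"role": "assistant", "content": "{\n  \"name\": "}  # Prefill with JSON structure
]

# continue_final_message=True keeps the partial assistant message open,
# so generation picks up right after the prefilled text.
inputs = tokenizer.apply_chat_template(
    messages,
    continue_final_message=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))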
For structured generation with schema validation, see Outlines.
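Where the output must validate against a schema, a constrained-decoding library can enforce it at generation time. A minimal sketch, assuming the Outlines v0 Transformers integration (the API differs in newer Outlines releases; the Pydantic schema and model ID are illustrative):
import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Constrained decoding restricts sampling so the output always parses as a Person.
model = outlines.models.transformers("LiquidAI/LFM2-1.2B")  # illustrative model ID
generator = outlines.generate.json(model, Person)
result = generator("Extract the name and age from: John is 30 years old.")
print(result)  # e.g. Person(name='John', age=30)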

Text Models

Control text generation behavior, balancing creativity, determinism, and quality:
  • temperature (0.0-2.0) - Randomness control. Lower (0.1-0.7) = deterministic; higher (0.8-1.5) = creative.
  • top_p (0.0-1.0) - Nucleus sampling. Lower (0.1-0.5) = focused; higher (0.7-0.95) = diverse.
  • top_k - Limits to top-k tokens. Lower (10-50) = high-probability; higher (50-100) = diverse.
  • min_p (0.0-1.0) - Filters tokens below min_p * max_probability. Maintains quality with diversity.
  • repetition_penalty (1.0+) - Reduces repetition. 1.0 = no penalty; >1.0 = prevents repetition.
  • max_tokens / max_new_tokens - Maximum tokens to generate.
Parameter names and syntax vary by platform; see Transformers, vLLM, or llama.cpp for details. Recommended starting values for LFM2.5 text models:
  • temperature=0.1
  • top_k=50
  • top_p=0.1
  • repetition_penalty=1.05
For LFM2 text models:
  • temperature=0.3
  • min_p=0.15
  • repetition_penalty=1.05
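Putting these together, a minimal sketch using the LFM2 text-model defaults above (assuming Hugging Face Transformers generate() keyword arguments; min_p requires a recent release, and the model ID is illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the benefits of exercise."}],
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(
    inputs,
    do_sample=True,          # sampling must be enabled for temperature/min_p to apply
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))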

Vision Models

LFM2-VL models use a variable resolution encoder to control the quality/speed tradeoff by adjusting how images are tokenized.

Image Token Management

Control image tokenization with:
  • min_image_tokens - Minimum tokens for image encoding
  • max_image_tokens - Maximum tokens for image encoding
  • do_image_splitting - Split large images into 512×512 patches
How it works: Large images are split into non-overlapping patches, then a 2-layer MLP connector with pixel unshuffle reduces the number of tokens per image (e.g., 256×384 → 96 tokens, 1000×3000 → 1,020 tokens). Adjust min_image_tokens and max_image_tokens to balance quality against speed. Example configurations:
# High quality (slower)
max_image_tokens = 256
min_image_tokens = 128

# Balanced
max_image_tokens = 128
min_image_tokens = 64

# Fast (lower quality)
max_image_tokens = 64
min_image_tokens = 32
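As a rough sanity check on the 256×384 example above, the numbers work out if we assume 16-pixel patches and a 2×2 pixel unshuffle (both assumptions, not stated here):
# Rough token-count arithmetic (assumes 16 px patches and a 2x2 pixel unshuffle).
width, height = 384, 256
patches = (width // 16) * (height // 16)  # 24 * 16 = 384 patches
tokens = patches // 4                     # 2x2 unshuffle merges 4 patches -> 96 tokens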
For vision models:
  • temperature=0.1
  • min_p=0.15
  • repetition_penalty=1.05
  • min_image_tokens=64
  • max_image_tokens=256
  • do_image_splitting=True
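A minimal end-to-end sketch with these defaults, assuming the Hugging Face Transformers integration for LFM2-VL (the model ID is illustrative, and whether the image-token settings are accepted as processor load-time keyword arguments may vary by release):
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model_id = "LiquidAI/LFM2-VL-1.6B"  # illustrative model ID
# Assumption: the processor accepts the image-token settings as keyword arguments.
processor = AutoProcessor.from_pretrained(
    model_id,
    min_image_tokens=64,
    max_image_tokens=256,
    do_image_splitting=True,
)
model = AutoModelForImageTextToText.from_pretrained(model_id)

image = Image.open("photo.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

# The processor builds the prompt and image tokens from the conversation.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=256,
)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])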
Liquid Nanos (task-specific models such as LFM2-Extract, LFM2-RAG, and LFM2-Tool) may have special prompting requirements and different generation parameters. For usage guidelines, refer to the individual model cards on the Liquid Nanos page.