Prompt Roles

LFM2 models use a structured conversation format with three prompt roles:
  • system (optional) - Sets assistant behavior, context, and instructions. Use for personality, task context, output format, or constraints.
  • user - Contains the question, instruction, or request from the user.
  • assistant - Provides a partial response for the model to continue from. Useful for multi-turn conversations, few-shot prompting, or prefilling structured outputs (e.g., JSON opening brace).
Example:
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "How do I sort a list in Python?"}
]
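The same message list is turned into model input via the tokenizer's chat template. A minimal sketch, assuming the Hugging Face Transformers integration (the model ID LiquidAI/LFM2-1.2B is illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "How do I sort a list in Python?"}
]

# The chat template converts the role-tagged messages into the model's prompt format.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts a reply
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))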
Multi-turn conversations / Few-shot prompting: Continue a previous conversation or provide example interactions to guide the model’s behavior. The model learns from the conversation history and applies the pattern to new inputs.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the benefits of exercise?"},
    {"role": "assistant", "content": "Exercise has many benefits including:\n1. Improved cardiovascular health\n2. "},  # Partial response to continue
    {"role": "user", "content": "Tell me more about cardiovascular health."}
]
Or provide few-shot examples:
messages = [
    {"role": "system", "content": "You are a helpful assistant that formats dates."},
    {"role": "user", "content": "2024-01-15"},
    {"role": "assistant", "content": "January 15, 2024"},
    {"role": "user", "content": "2024-12-25"},
    {"role": "assistant", "content": "December 25, 2024"},
    {"role": "user", "content": "2024-03-08"}  # Model follows the pattern
]
Prefill for structured output: Start the model with a specific format or structure (e.g., a JSON opening brace) to guide it toward structured outputs.
messages = [
    {"role": "system", "content": "Extract information and return as JSON."},
    {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
    {"role": "assistant", "content": "{\n  \"name\": "}  # Prefill with JSON structure
]
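To continue the prefilled assistant turn rather than start a new one, the chat template must leave the final message open. A minimal sketch, assuming Hugging Face Transformers' continue_final_message option (available in recent releases; the model ID is illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "Extract information and return as JSON."},
    {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
    {"role": "assistant", "content": "{\n  \"name\": "}  # Prefill with JSON structure
]

# continue_final_message=True keeps the partial assistant message open,
# so generation picks up right after the prefilled text.
inputs = tokenizer.apply_chat_template(
    messages,
    continue_final_message=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))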
For structured generation with schema validation, see Outlines.
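Where the output must validate against a schema, a constrained-decoding library can enforce it at generation time. A minimal sketch, assuming the Outlines v0 Transformers integration (the API differs in newer Outlines releases; the Pydantic schema and model ID are illustrative):
import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Constrained decoding restricts sampling so the output always parses as a Person.
model = outlines.models.transformers("LiquidAI/LFM2-1.2B")  # illustrative model ID
generator = outlines.generate.json(model, Person)
result = generator("Extract the name and age from: John is 30 years old.")
print(result)  # e.g. Person(name='John', age=30)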

Text Models

Control text generation behavior, balancing creativity, determinism, and quality:
  • temperature (0.0-2.0) - Randomness control. Lower (0.1-0.7) = deterministic; higher (0.8-1.5) = creative.
  • top_p (0.0-1.0) - Nucleus sampling. Lower (0.1-0.5) = focused; higher (0.7-0.95) = diverse.
  • top_k - Limits to top-k tokens. Lower (10-50) = high-probability; higher (50-100) = diverse.
  • min_p (0.0-1.0) - Filters tokens below min_p * max_probability. Maintains quality with diversity.
  • repetition_penalty (1.0+) - Reduces repetition. 1.0 = no penalty; >1.0 = prevents repetition.
  • max_tokens / max_new_tokens - Maximum tokens to generate.
Parameter names and syntax vary by platform; see Transformers, vLLM, or llama.cpp for details. Recommended starting values for LFM2.5 text models:
  • temperature=0.1
  • top_k=50
  • top_p=0.1
  • repetition_penalty=1.05
For LFM2 text models:
  • temperature=0.3
  • min_p=0.15
  • repetition_penalty=1.05
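Putting these together, a minimal sketch using the LFM2 text-model defaults above (assuming Hugging Face Transformers generate() keyword arguments; min_p requires a recent release, and the model ID is illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the benefits of exercise."}],
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(
    inputs,
    do_sample=True,          # sampling must be enabled for temperature/min_p to apply
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))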

Vision Models

LFM2-VL models use a variable resolution encoder to control the quality/speed tradeoff by adjusting how images are tokenized.

Image Token Management

Control image tokenization with:
  • min_image_tokens - Minimum tokens for image encoding
  • max_image_tokens - Maximum tokens for image encoding
  • do_image_splitting - Split large images into 512×512 patches
How it works: Large images are split into non-overlapping patches, then a 2-layer MLP connector with pixel unshuffle reduces the number of tokens per image (e.g., 256×384 → 96 tokens, 1000×3000 → 1,020 tokens). Adjust min_image_tokens and max_image_tokens to balance quality against speed. Example configurations:
# High quality (slower)
max_image_tokens = 256
min_image_tokens = 128

# Balanced
max_image_tokens = 128
min_image_tokens = 64

# Fast (lower quality)
max_image_tokens = 64
min_image_tokens = 32
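As a rough sanity check on the 256×384 example above, the numbers work out if we assume 16-pixel patches and a 2×2 pixel unshuffle (both assumptions, not stated here):
# Rough token-count arithmetic (assumes 16 px patches and a 2x2 pixel unshuffle).
width, height = 384, 256
patches = (width // 16) * (height // 16)  # 24 * 16 = 384 patches
tokens = patches // 4                     # 2x2 unshuffle merges 4 patches -> 96 tokens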
For vision models:
  • temperature=0.1
  • min_p=0.15
  • repetition_penalty=1.05
  • min_image_tokens=64
  • max_image_tokens=256
  • do_image_splitting=True
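A minimal end-to-end sketch with these defaults, assuming the Hugging Face Transformers integration for LFM2-VL (the model ID is illustrative, and whether the image-token settings are accepted as processor load-time keyword arguments may vary by release):
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model_id = "LiquidAI/LFM2-VL-1.6B"  # illustrative model ID
# Assumption: the processor accepts the image-token settings as keyword arguments.
processor = AutoProcessor.from_pretrained(
    model_id,
    min_image_tokens=64,
    max_image_tokens=256,
    do_image_splitting=True,
)
model = AutoModelForImageTextToText.from_pretrained(model_id)

image = Image.open("photo.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

# The processor builds the prompt and image tokens from the conversation.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=256,
)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])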
Liquid Nanos (task-specific models such as LFM2-Extract, LFM2-RAG, and LFM2-Tool) may have special prompting requirements and different generation parameters. For usage guidelines, refer to the individual model cards on the Liquid Nanos page.