General Questions

LFMs (Liquid Foundation Models) are a family of efficient language models built on a new hybrid architecture designed for fast training and inference. They range from 350M to 8B parameters and support text, vision, and audio modalities.
All LFM models support a 32k-token text context length for extended conversations and document processing.
LFM models are compatible with:
  • Transformers - For research and development (see the loading sketch after this list)
  • llama.cpp - For efficient CPU inference
  • vLLM - For high-throughput production serving
  • MLX - For Apple Silicon optimization
  • Ollama - For easy local deployment
  • LEAP - For edge and mobile deployment
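As a quick illustration, loading an LFM with Transformers might look like the sketch below. The repository ID is an assumption; check the Model Library for actual model names, and note that a recent transformers version may be required.

  # Hedged sketch: load an LFM checkpoint with Hugging Face Transformers.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "LiquidAI/LFM2-1.2B"  # assumed Hugging Face repo ID
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

  # Build a chat prompt and generate a short reply.
  messages = [{"role": "user", "content": "What are Liquid Foundation Models?"}]
  input_ids = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)
  output = model.generate(input_ids, max_new_tokens=128)
  print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))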

Model Selection

  • General chat/instruction following: LFM2.5-1.2B-Instruct (recommended; see the serving sketch below)
  • Vision tasks: LFM2.5-VL-1.6B
  • Audio/speech: LFM2.5-Audio-1.5B
  • Extraction tasks: LFM2-1.2B-Extract or LFM2-350M-Extract
  • Edge deployment: LFM2-350M or LFM2-700M for smallest footprint
  • Highest performance: LFM2-8B-A1B (MoE architecture)
LFM2.5 models are updated versions with improved training that deliver higher performance while maintaining the same architecture. We recommend using LFM2.5 variants when available.
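For high-throughput serving of the recommended chat model, a minimal vLLM sketch could look like the following; the repository ID, and vLLM support for that specific checkpoint in your installed version, are assumptions:

  # Hedged sketch: offline batch inference with vLLM.
  from vllm import LLM, SamplingParams

  llm = LLM(model="LiquidAI/LFM2.5-1.2B-Instruct")  # assumed repo ID
  params = SamplingParams(temperature=0.7, max_tokens=128)
  outputs = llm.generate(["Explain hybrid model architectures briefly."], params)
  print(outputs[0].outputs[0].text)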
Liquid Nanos are task-specific models fine-tuned for specialized use cases such as:
  • Information extraction (LFM2-Extract)
  • Translation (LFM2-350M-ENJP-MT)
  • RAG question answering (LFM2-1.2B-RAG)
  • Meeting summarization (LFM2-2.6B-Transcript)

Deployment

LFM models can be deployed on iOS and Android devices using the LEAP SDK, which provides optimized inference for edge deployment with support for quantized models.
Quantized checkpoints are distributed in three formats:
  • GGUF: For llama.cpp, LM Studio, and Ollama (Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16)
  • MLX: For Apple Silicon (4-bit, 5-bit, 6-bit, 8-bit, bf16)
  • ONNX: For cross-platform deployment with ONNX Runtime
The quantization level trades size and speed against quality:
  • Q4_0 / 4-bit: Smallest size, fastest inference, some quality loss
  • Q8_0 / 8-bit: Good balance of size and quality
  • F16 / bf16: Full precision, best quality, largest size
For most use cases, Q4_K_M or Q5_K_M provides good quality with a significant size reduction (see the GGUF sketch below).
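As a rough sketch, running a Q4_K_M GGUF file locally with llama-cpp-python (a Python binding for llama.cpp, not named in this FAQ) could look like this; the file name is an assumption:

  # Hedged sketch: local CPU inference on a GGUF quantization via llama-cpp-python.
  from llama_cpp import Llama

  llm = Llama(
      model_path="./LFM2-1.2B-Q4_K_M.gguf",  # assumed local file name
      n_ctx=32768,  # matches the 32k context length noted above
  )
  result = llm.create_chat_completion(
      messages=[{"role": "user", "content": "Summarize GGUF in one sentence."}],
      max_tokens=64,
  )
  print(result["choices"][0]["message"]["content"])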

Fine-tuning

Most LFM models can be fine-tuned with TRL and Unsloth; check the Model Library for per-model trainability information. Supported approaches include (see the sketch after this list):
  • LoRA/QLoRA: Memory-efficient fine-tuning
  • Full fine-tuning: For maximum customization
  • SFT (Supervised Fine-Tuning): For instruction tuning
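As a hedged sketch, a LoRA-based SFT run with TRL and PEFT might look like the following; the model ID and dataset are illustrative assumptions, not official recommendations:

  # Hedged sketch: supervised fine-tuning with LoRA using TRL + PEFT.
  from datasets import load_dataset
  from peft import LoraConfig
  from trl import SFTConfig, SFTTrainer

  dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

  trainer = SFTTrainer(
      model="LiquidAI/LFM2-1.2B",  # assumed repo ID; check the Model Library
      train_dataset=dataset,
      args=SFTConfig(output_dir="lfm2-lora-sft"),
      peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
  )
  trainer.train()

Omitting peft_config switches this to full fine-tuning, at the cost of much higher memory use.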

Still Have Questions?