LFM2-VL-3B

LFM2-VL-3B is Liquid AI’s highest-capacity multimodal model, delivering enhanced visual reasoning and detailed image understanding. It is suited to complex vision tasks that require deeper comprehension.

Available formats: HF, GGUF, MLX

Specifications

Property	Value
Parameters	3B
Context Length	32K tokens
Architecture	LFM2-VL (Dense)

Advanced Reasoning

Complex visual logic and analysis

Document Understanding

Detailed document and chart parsing

Multi-Image

Compare and reason across images
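
As a rough illustration of multi-image prompting, the sketch below builds one user turn that holds several images followed by a single question, using the same message layout as the Transformers example in the Quick Start. The helper name `build_multi_image_conversation` and the placeholder image values are assumptions made for illustration; in practice each entry would be an image object such as the one returned by `load_image`.

```python
def build_multi_image_conversation(images, question):
    """Build a single user turn: every image first, then one text prompt."""
    content = [{"type": "image", "image": image} for image in images]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


# Placeholder strings stand in for real image objects (hypothetical values).
conversation = build_multi_image_conversation(
    ["first_image", "second_image"],
    "What differs between these two images?",
)
```

The resulting list can be passed to `processor.apply_chat_template` in place of a single-image conversation, assuming the model's chat template accepts several image entries in one turn.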

Quick Start

Transformers

Install:

pip install git+https://github.com/huggingface/transformers.git@3c2517727ce28a30f5044e01663ee204deb1cdbe pillow torch

Download & Run:

from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2-VL-3B"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16"
)
processor = AutoProcessor.from_pretrained(model_id)

url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
image = load_image(url)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is in this image?"},
        ],
    },
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
response = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
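
For decoder-only models, `generate` typically returns the prompt tokens followed by the newly generated ones, so the decoded response above may repeat the prompt text. A minimal sketch of trimming the prompt before decoding follows; the helper name `strip_prompt_tokens` is hypothetical, and the toy token ids are made up for the example.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Drop the first prompt_length token ids from every generated sequence."""
    return [sequence[prompt_length:] for sequence in output_ids]


# Toy example with plain lists: a 3-token prompt followed by 2 generated tokens.
generated = strip_prompt_tokens([[101, 7, 8, 42, 43]], prompt_length=3)
print(generated)  # [[42, 43]]
```

With the objects from the example above, `prompt_length` would be `inputs["input_ids"].shape[1]`, and the trimmed sequences would be passed to `processor.batch_decode` instead of `outputs`.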

vLLM

vLLM support for LFM vision models requires a specific version of vLLM. Install it from source with the pinned precompiled wheel below, together with the pinned Transformers commit.

Install:

VLLM_PRECOMPILED_WHEEL_COMMIT=72506c98349d6bcd32b4e33eec7b5513453c1502 \
  VLLM_USE_PRECOMPILED=1 \
  pip install git+https://github.com/vllm-project/vllm.git

pip install git+https://github.com/huggingface/transformers.git@3c2517727ce28a30f5044e01663ee204deb1cdbe pillow

Run:

from vllm import LLM, SamplingParams

IMAGE_URL = "http://images.cocodataset.org/val2017/000000039769.jpg"

llm = LLM(
    model="LiquidAI/LFM2-VL-3B",
    max_model_len=1024,
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=256,
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": IMAGE_URL}},
        {"type": "text", "text": "Describe what you see in this image."},
    ],
}]

outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)
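
To send a local image rather than a remote URL, one option is to embed the image bytes in a base64 data URL and place it in the same `image_url` field. The sketch below assumes the installed vLLM build accepts `data:` URLs in chat messages; the helper names `image_bytes_to_data_url` and `build_image_message` are hypothetical, and the three stand-in bytes are only the JPEG magic number, not a valid image.

```python
import base64


def image_bytes_to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(image_url, prompt):
    """Build a single user turn in the image_url message layout used above."""
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }]


# Stand-in bytes; in practice read them with open(path, "rb").read().
data_url = image_bytes_to_data_url(b"\xff\xd8\xff")
messages = build_image_message(data_url, "Describe what you see in this image.")
```

The resulting `messages` list has the same shape as the one passed to `llm.chat` in the example above.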
