1.2B parameter model for structured information extraction from documents
LFM2-1.2B-Extract is optimized for extracting structured data (JSON, XML, YAML) from unstructured documents. It handles complex nested schemas and multi-field extraction with high accuracy.
Use temperature=0 (greedy decoding) for best results. This model is intended for single-turn conversations only.
System Prompt Format:
Identify and extract information matching the following schema.
Return data as a JSON object. Missing data should be omitted.

Schema:
- field_name: "Description of what to extract"
- nested_object:
  - nested_field: "Description"
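Given a schema like this, the model returns an object containing only the fields it can find in the document; nested objects mirror the schema's nesting, and missing fields are omitted. An illustrative (not model-generated) result:

{
  "field_name": "extracted value",
  "nested_object": {
    "nested_field": "extracted value"
  }
}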
If no system prompt is provided, the model defaults to JSON output. Specifying the target format (JSON, XML, or YAML) and a schema improves accuracy.
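To request a different format, name it in the instruction line and keep the schema structure the same. For example, a YAML variant of the prompt template (the exact non-JSON wording and the field names here are illustrative, adapted from the JSON template above):

Identify and extract information matching the following schema.
Return data as a YAML document. Missing data should be omitted.

Schema:
- product: "Product name"
- price: "Listed price, including currency"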
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "LiquidAI/LFM2-1.2B-Extract"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

system_prompt = """Identify and extract information matching the following schema.
Return data as a JSON object. Missing data should be omitted.

Schema:
- name: "Person's full name"
- email: "Email address"
- company: "Company name"
"""

user_input = "Contact John Smith at john.smith@acme.com. He works at Acme Corp."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_input},
]

# add_generation_prompt=True appends the assistant turn header so the model
# generates the extraction instead of continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# do_sample=False is greedy decoding, i.e. the recommended temperature=0 behavior.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
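If you need the parsed object rather than raw text, a minimal post-processing sketch, continuing from the variables above (slicing off the prompt tokens and the JSONDecodeError fallback are our additions, not part of the model card):

import json

# Decode only the newly generated tokens, skipping the echoed prompt.
generated = outputs[0][inputs.shape[-1]:]
raw = tokenizer.decode(generated, skip_special_tokens=True)

# Parse the extraction result; fall back to None if the model
# emitted something that is not valid JSON.
try:
    record = json.loads(raw)
except json.JSONDecodeError:
    record = None

print(record)  # expected shape: {"name": "John Smith", "email": "...", "company": "Acme Corp"}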