MLX is Apple’s machine learning framework optimized for Apple Silicon. It provides efficient inference on Macs with M-series chips (M1, M2, M3, M4), using Metal for GPU acceleration.
Use MLX for running models on Apple Silicon Macs with Metal GPU acceleration.
MLX leverages the unified memory architecture of Apple Silicon, allowing seamless data sharing between the CPU and GPU.
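As a quick illustration of the unified memory model, an array created with MLX can be consumed by operations on either device without an explicit copy. A minimal sketch, assuming only that the base mlx package is installed (this uses core MLX APIs, not mlx-lm):

import mlx.core as mx

# On Apple Silicon the default device is the Metal GPU
print(mx.default_device())

# Arrays live in unified memory, so the same array can be used
# by operations running on either the CPU or the GPU
a = mx.array([1.0, 2.0, 3.0])
print(mx.sum(a, stream=mx.cpu))  # evaluate on the CPU
print(mx.sum(a, stream=mx.gpu))  # evaluate on the GPU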
The mlx-lm package provides a simple interface for loading LLMs, generating text, and serving them. See the Models page for all available MLX models, or browse the mlx-community LFM2 models.
from mlx_lm import load, generate

# Load model and tokenizer
model, tokenizer = load("mlx-community/LFM2-1.2B-8bit")

# Generate text
prompt = "What is machine learning?"

# Apply chat template
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
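Sampling behavior can also be tuned. A hedged sketch, assuming a recent mlx-lm release where generate accepts a sampler built by mlx_lm.sample_utils.make_sampler; temperature and top-p settings might then be applied like this:

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler  # assumed available in recent mlx-lm releases

model, tokenizer = load("mlx-community/LFM2-1.2B-8bit")

messages = [{"role": "user", "content": "What is machine learning?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Build a sampling function with temperature and nucleus (top-p) sampling
sampler = make_sampler(temp=0.7, top_p=0.9)
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, sampler=sampler)
print(response)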
For streaming output, use stream_generate to print text as it is produced:

from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/LFM2-1.2B-8bit")

messages = [{"role": "user", "content": "Tell me a story about space exploration."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# stream_generate yields response chunks; print the new text from each as it arrives
for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=512):
    print(response.text, end="", flush=True)
print()
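Because the chat template is just a transformation of the message list, a multi-turn conversation can reuse the same loaded model by appending each exchange to messages. A minimal sketch using only the calls shown above:

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/LFM2-1.2B-8bit")

messages = []
for user_turn in ["What is machine learning?", "Give me a one-sentence example."]:
    # Add the user turn, re-apply the chat template, and generate a reply
    messages.append({"role": "user", "content": user_turn})
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    reply = generate(model, tokenizer, prompt=prompt, max_tokens=256)
    # Keep the assistant reply in the history so later turns have context
    messages.append({"role": "assistant", "content": reply})
    print(reply)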