OLMoE.swift: Private, Offline AI Experience on iOS and macOS

Summary

OLMoE.swift provides a privacy-focused AI experience by running large language models directly on your device. This Swift-based application keeps your queries and data private, operating entirely offline with no internet connection required. It offers a robust solution for local AI inference on iOS and macOS, with integration options for Hugging Face.

Repository Info

Updated on October 20, 2025

Introduction

OLMoE.swift is an open-source project from AllenAI that brings a fully private, offline, and on-device AI experience to your Apple devices. Built with Swift, this repository provides the necessary tools and code to run large language models (LLMs) locally, ensuring your data never leaves your device. It's designed for users who prioritize privacy and wish to interact with AI models without an internet connection.

Installation

To get started with OLMoE.swift, you first need to clone the repository:

git clone https://github.com/allenai/OLMoE.swift.git

To build the iOS app for a simulator:

  1. Open the project in Xcode.
  2. Ensure the run destination is set to a supported simulator (e.g., iPhone 15 Pro or newer).
  3. Run the project.

For detailed instructions on running on macOS or a physical device, please refer to the official OLMoE.swift README.

Examples

OLMoE.swift also supports integration with Hugging Face for running models. After installing the transformers and torch packages (e.g., pip install transformers torch), you can use the following Python code:

from transformers import OlmoeForCausalLM, AutoTokenizer
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load intermediate checkpoints by passing e.g. revision="step10000-tokens41B"
model = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0125").to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0125")
inputs = tokenizer("Bitcoin is", return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
out = model.generate(**inputs, max_length=64)
print(tokenizer.decode(out[0]))

This example demonstrates how to load an OLMoE model and tokenizer, then generate text using a simple prompt.
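The device-selection and tensor-transfer idiom in the example above is independent of the model itself. A minimal sketch using plain tensors (no model download required) illustrates the same pattern; the tensor values here are placeholders standing in for real tokenizer output:

```python
import torch

# Pick a GPU when one is available, otherwise fall back to the CPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# A dict of tensors stands in for the tokenizer's return_tensors="pt" output.
inputs = {
    "input_ids": torch.tensor([[1, 2, 3]]),
    "attention_mask": torch.ones(1, 3, dtype=torch.long),
}

# Move every tensor to the chosen device, as in the full example above.
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
print(inputs["input_ids"].device.type)  # "cuda" on a GPU machine, otherwise "cpu"
```

Because model.generate expects all input tensors on the same device as the model's weights, this dict-comprehension step is what makes the example work on both CPU-only and CUDA machines.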

Why Use OLMoE.swift?

OLMoE.swift stands out for several compelling reasons:

  • Uncompromised Privacy: All queries and data are processed locally on your device, ensuring complete privacy. Nothing is stored or sent to external servers.
  • Offline Functionality: The application works fully without an internet connection, even in Airplane Mode, making it ideal for use anywhere, anytime.
  • Open Source: Being open source, the project fosters transparency, allowing developers to inspect, modify, and contribute to its development.
  • On-Device AI: Experience the power of large language models directly on your iOS or macOS device, leveraging local hardware for efficient inference.

Links

Explore OLMoE.swift further using these official links: