Documentation
Trellis grids
A Trellis grid is a group of laptops and workstations that collectively implement a distributed LLM architecture.
When you deploy a model to a grid, Trellis distributes the model's layers across the available nodes, according to their advertised capacity (e.g. disk and memory).
Nodes of a Trellis grid may be geographically remote, connected over different networks (behind NATs or firewalls), and run on distinct hardware.
Sign up
Create an account to get access to the management dashboard. From there you can generate grid join tokens and (with a paid plan) create new grids and onboard your organization.
The command line interface
The trellis-cli program performs all numerical computation and exchanges activations with the Trellis grid it's joined to.
To connect a node to the grid, run the CLI in either serve or run mode.
run nodes load transformer layers and process LLM activations. These are best for machines with spare capacity and predictable uptime that can be dedicated to the LLM grid.
serve nodes load the embedding and sampling layers, as well as expose a local API endpoint (currently in Anthropic format). If the grid lacks capacity, serve nodes can also load LLM layers and process activations.
Please see available commands and options with trellis-cli --help.
You can download and install the trellis-cli binary with a single command:
curl -sSL https://trellis.unfoldml.com/cli | shJoin nodes to a grid
In order to provision a Trellis grid, you must connect one or more machines running the CLI to a grid you own or have been invited to. Each machine can run one or more serve or run nodes, depending on its hardware capacity and your needs.
When starting a node, the CLI automatically detects the machine's hardware resources (CPU, GPU and memory) and advertises them to the grid orchestrator so it can allocate workloads.
Users can also declare resource preferences for their nodes with CLI flags like --disk-capacity or --memory-capacity.
Example: To connect a node such that it will only download and cache 1 GB of model weights, you can use the following command:
trellis-cli run --token <joinToken> --bootstrap-url <bootstrapURL> --disk-capacity 1000000000The join token and bootstrap URL are available in your dashboard; each user can generate as many join tokens as they need to connect multiple machines to the grid.
Inference API
Launch a serve node to expose a local Anthropic-compatible API.
trellis-cli serve --token <joinToken> --bootstrap-url <bootstrapURL>Agent harnesses like Claude Code can connect directly to your new local endpoint:
ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=local claudeLevel up to Managed or Self-Hosted
Check out our plans to create private grids with support for unlimited nodes.
With the Managed plan, the grid orchestrator is hosted on our own infrastructure. This is the best option to get started, as it requires minimal setup.
The Self-Hosted plan lets you run grids on your own infrastructure, i.e. the data plane is completely air-gapped from our infrastructure. This plan is popular with users that have strict data sovereignty requirements or want to run on a private network with no internet access.
Load a LLM on a grid
From the dashboard, you can deploy any compatible HuggingFace model to your grid.
Two model weight formats are supported: GGUF and SafeTensors. GGUF is ideal for CPU inference, as it minimizes disk usage and memory traffic thanks to floating point quantization and memory mapping. SafeTensors (the native HuggingFace export, typically BF16) runs directly without a conversion step — on the production path BF16 matmuls execute on the native ggml kernels.
Trellis brings a custom LLM runtime to your machines; we have growing support for model architectures and hardware configurations. The table below lists the architectures this version of Trellis can load, identified by the value of general.architecture in the model's GGUF file.
| Architecture |
|---|
| llama |
| mistral |
| qwen2 |
| qwen3 |
If you need a specific model architecture or hardware setup that we don't yet support, please reach out and we'll be happy to help.
Invite your team
With a private grid, you can invite your teammates to join. Each grid user can issue as many join tokens as they need to connect their physical machines to the grid.
Each user can launch nodes with their own join tokens, and a grid admin can track node activity in their dashboard.
Security and data privacy
At inference time, Trellis embeds the input on the requesting node, routes activations between grid nodes as needed, and computes the model's output on the initial node. The inputs and outputs are only ever present as cleartext on the requesting serve node.
Model activations never leave the grid, and are not shared with our infrastructure or any third party at any time.
All communication between the nodes of a grid is end-to-end encrypted thanks to SPIFFE/SPIRE, ensuring the privacy and integrity of your data with mTLS and automatically rotated certificates.