This repository contains configuration and scripts for running an Ollama LLM server on Apple Silicon Macs in headless mode (tested on a Mac Studio with M1 Ultra).
This configuration is optimized for running a Mac Studio as a dedicated Ollama server, with:
- Headless operation (SSH access recommended)
- Minimal resource usage (GUI and unnecessary services disabled)
- Automatic startup and recovery
- Performance optimizations for Apple Silicon
- [v1.2.0] Added Docker autostart support for container applications (with Colima)
- [v1.1.0] Added GPU Memory Optimization - configure Metal to use more RAM for models
- [v1.0.0] Initial release with system optimizations and Ollama configuration
See the CHANGELOG for detailed version history.
- Automatic startup on boot
- Optimized for Apple Silicon
- System resource optimization through service disabling
- External network access
- Proper logging setup
- SSH-based remote management
- Docker autostart for container applications
- Mac with Apple Silicon
- macOS Sonoma or later
- Ollama installed
- Administrative privileges
- SSH enabled (System Settings → Sharing → Remote Login)
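A quick way to check these prerequisites from a terminal (output will vary by machine):

```
sw_vers -productVersion   # should report 14.x (Sonoma) or later
uname -m                  # should report arm64 on Apple Silicon
ollama --version          # confirms Ollama is installed and on your PATH
```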
For optimal performance, we recommend:
- Primary access method: SSH
```
ssh username@your-mac-studio-ip
```
- (Optional) Screen Sharing is kept available for emergency/maintenance access but is not recommended for regular use, to save resources.
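Once the service is installed, routine management can be done with one-off SSH commands rather than an interactive session; `username` and `your-mac-studio-ip` are placeholders for your own values:

```
# List the models installed on the server
ssh username@your-mac-studio-ip 'ollama list'

# Confirm the Ollama API is reachable from another machine on the network
curl http://your-mac-studio-ip:11434/api/version
```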
- Clone this repository:
```
git clone https://github.com/anurmatov/mac-studio-server.git
cd mac-studio-server
```
- (Optional) Configure the installation:
```
# Default values shown
export OLLAMA_USER=$(whoami) # User to run Ollama as
export OLLAMA_BASE_DIR="/Users/$OLLAMA_USER/mac-studio-server"
# Optional features - only set these if you need them
export OLLAMA_GPU_PERCENT="80" # Optional: Enable GPU memory optimization (percentage of RAM to allocate)
export DOCKER_AUTOSTART="true" # Optional: Enable automatic Docker startup
```
- Run the installation script:
```
chmod +x scripts/install.sh
./scripts/install.sh
```
The Ollama service is configured with the following optimizations (illustrative environment-variable equivalents are sketched after this list):
- External access enabled (0.0.0.0:11434)
- 8 parallel requests (adjustable)
- 30-minute model keep-alive
- Flash attention enabled
- Support for 4 simultaneously loaded models
- Model pruning disabled
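These settings map to Ollama's standard environment variables, which the LaunchDaemon plist passes to the server process. The sketch below is illustrative only; the exact values live in `config/com.ollama.service.plist` and may differ:

```
# Illustrative equivalents of the plist settings (example values, not the shipped defaults)
export OLLAMA_HOST="0.0.0.0:11434"     # listen on all interfaces for external access
export OLLAMA_NUM_PARALLEL="8"         # number of parallel requests
export OLLAMA_KEEP_ALIVE="30m"         # keep models loaded for 30 minutes after last use
export OLLAMA_FLASH_ATTENTION="1"      # enable flash attention
export OLLAMA_MAX_LOADED_MODELS="4"    # allow up to 4 models in memory at once
export OLLAMA_NOPRUNE="1"              # don't prune model blobs on startup
```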
To modify the Ollama service configuration:
- Edit the configuration file:
```
vim config/com.ollama.service.plist
```
- Apply the changes:
```
# Stop the current service
sudo launchctl unload /Library/LaunchDaemons/com.ollama.service.plist
# Copy the updated configuration
sudo cp config/com.ollama.service.plist /Library/LaunchDaemons/
# Set proper permissions
sudo chown root:wheel /Library/LaunchDaemons/com.ollama.service.plist
sudo chmod 644 /Library/LaunchDaemons/com.ollama.service.plist
# Load the updated service
sudo launchctl load -w /Library/LaunchDaemons/com.ollama.service.plist
```
- Check the logs for any issues:
```
tail -f logs/ollama.err logs/ollama.log
```
The installation process:
- Disables unnecessary system services
- Configures power management for server use
- Optimizes for background operation
- Maintains Screen Sharing capability for remote management
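After installation, or after reloading the service, you can confirm that Ollama is loaded and answering requests (assuming the default port 11434):

```
# Confirm the LaunchDaemon is loaded
sudo launchctl list | grep com.ollama

# Confirm the API responds locally
curl http://localhost:11434/api/version
```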
Log files are stored in the logs directory:
- `ollama.log` - Ollama service logs
- `ollama.err` - Ollama error logs
- `install.log` - Installation logs
- `optimization.log` - System optimization logs
This configuration significantly reduces system resource usage:
- Reduces memory usage from about 11GB to 3GB (tested on a Mac Studio M1 Ultra)
- Disables GUI-related services
- Minimizes background processes
- Prevents sleep/hibernation
- Optimizes for headless operation
The dramatic reduction in memory usage (around 8GB) is achieved by:
- Disabling Spotlight indexing
- Turning off unnecessary system services
- Minimizing GUI-related processes
- Optimizing for headless operation
By default, the Metal runtime allows GPU operations to use only about 75% of system RAM. This configuration includes an optional GPU memory optimization that:
- Runs at system startup (when enabled)
- Allocates a configurable percentage of your total RAM to GPU operations
- Logs the changes for monitoring
The GPU memory setting is critical for LLM performance on Apple Silicon, as it determines how much of your unified memory can be used for model operations.
This allows:
- More efficient model loading
- Better performance for large models
- Increased number of concurrent model instances
- Fuller utilization of Apple Silicon's unified memory architecture
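On macOS Sonoma, the GPU wired-memory cap is exposed through the `iogpu.wired_limit_mb` sysctl, where a value of 0 means the platform default of roughly 75% of RAM. You can inspect the current values like this:

```
sysctl iogpu.wired_limit_mb   # 0 = default cap (about 75% of RAM)
sysctl -n hw.memsize          # total physical memory, in bytes
```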
To enable and configure GPU memory optimization, set the environment variable before installation:
```
export OLLAMA_GPU_PERCENT="80" # Allocate 80% of RAM to GPU
./scripts/install.sh
```
Or, to adjust it after installation:
```
# Run with a custom percentage
OLLAMA_GPU_PERCENT=85 sudo ./scripts/set-gpu-memory.sh
```
If you don't set `OLLAMA_GPU_PERCENT`, GPU memory optimization will be skipped.
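As a rough illustration of what such an adjustment involves, here is a simplified sketch (not the repository's `scripts/set-gpu-memory.sh`); it assumes the `iogpu.wired_limit_mb` sysctl described above and must run as root:

```
#!/bin/bash
# Sketch: allocate OLLAMA_GPU_PERCENT of total RAM to the GPU wired-memory limit.
PERCENT="${OLLAMA_GPU_PERCENT:-80}"

# Total physical memory in bytes -> megabytes
TOTAL_MB=$(( $(sysctl -n hw.memsize) / 1024 / 1024 ))

# Desired GPU limit in megabytes
LIMIT_MB=$(( TOTAL_MB * PERCENT / 100 ))

echo "Setting GPU wired limit to ${LIMIT_MB} MB (${PERCENT}% of ${TOTAL_MB} MB)"
sysctl -w iogpu.wired_limit_mb="${LIMIT_MB}"
```

Note that a value set this way does not persist across reboots on its own, which is presumably why the optimization is wired to run at system startup.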
For best performance:
- Use SSH for remote management
- Keep display disconnected when possible
- Avoid running GUI applications
- Consider disabling Screen Sharing if not needed for emergency access
- Adjust GPU memory percentage based on your available memory and workload
These optimizations leave more resources available for Ollama model operations, allowing for better performance when running large language models.
If you need to run Docker containers (e.g., for Open WebUI), you can configure Docker to start automatically using Colima. This feature is completely optional.
Colima is a container runtime for macOS that's designed to work well in headless environments. It provides Docker API compatibility without requiring Docker Desktop, making it ideal for server use.
- Homebrew must be installed (the script will use it to install Colima and Docker CLI)
- No special GUI requirements (works perfectly in headless environments)
To enable Docker autostart, run:
```
export DOCKER_AUTOSTART="true"
./scripts/install.sh
```
This will:
- Install Colima and Docker CLI via Homebrew (if not already installed)
- Create a LaunchDaemon that starts Colima automatically at boot time
- Configure Colima with default settings
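Once Colima is running, the Docker CLI talks to it like any other Docker engine. As an illustration (not part of this repository's scripts), here is one way to run Open WebUI against the local Ollama API; the image name and `OLLAMA_BASE_URL` variable follow the upstream Open WebUI documentation, and you may need to substitute your Mac's LAN IP if `host.docker.internal` does not resolve inside your Colima VM:

```
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```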
If Docker doesn't start automatically:
- Check the logs:
```
cat ~/mac-studio-server/logs/docker.log
```
- Try starting Colima manually:
```
colima start
```
- Check Colima status:
```
colima status
```
If you don't need Docker containers, you can skip this feature entirely.
This project follows Semantic Versioning:
- MAJOR version for incompatible changes
- MINOR version for new features
- PATCH version for bug fixes
The current version is 1.2.0.
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License