AI Model Providers - Autohand Docs

Overview

Autohand separates the CLI from the AI model provider, so you can switch between models and providers without changing how you work. Whether you prefer cloud-hosted models or local inference, the commands and workflow stay the same.

You have two main options:

Cloud providers - Access powerful models via API with no hardware requirements
Local providers - Run models on your own machine for privacy and offline use

Cloud providers

Cloud providers offer access to the most capable models without requiring powerful hardware. Your prompts are sent to remote servers for processing.

OpenRouter

Recommended

Access 200+ models from Anthropic, OpenAI, Google, and more through a single API. Automatic fallbacks and cost optimization included.

Claude, GPT-4, Gemini, Llama, and more
Single API key for all providers
Automatic model fallback
Pay-per-use pricing

OpenAI

Direct access to GPT-4o, GPT-4 Turbo, and o1 reasoning models. The industry standard for AI development with excellent documentation.

GPT-4o, GPT-4 Turbo, o1 reasoning
128K context window
Function calling support
Vision capabilities

AWS Bedrock

Enterprise

Run Autohand through Bedrock Converse or Bedrock's OpenAI-compatible endpoints. Designed for AWS credential chains, profiles, private endpoints, and model access controls.

Converse, Chat Completions, and Responses modes
AWS credentials, SSO profiles, IAM roles, or Bedrock API keys
Inference profiles and ARN model IDs
Regional and private VPC endpoints

DeepSeek

Direct access to DeepSeek V4 models through an OpenAI-compatible API. A strong choice for coding, reasoning, and cost-sensitive engineering workflows.

DeepSeek V4 Flash and V4 Pro
OpenAI-compatible chat completions
Fast coding and reasoning workflows
Simple API-key setup

xAI Grok

Access xAI's Grok models with real-time knowledge and advanced reasoning. Built for developers who want cutting-edge AI assistance.

Grok-2 and Grok-2 mini
Real-time knowledge from X data
128K context window
Vision capabilities

Z.ai

Use GLM-family models through Z.ai's OpenAI-compatible API for high-throughput coding and agentic workflows.

GLM coding and reasoning models
OpenAI-compatible API surface
High-performance inference
API-key authentication

Cerebras

Fastest

Lightning-fast inference at over 2000 tokens per second. Powered by the world's largest AI chip for near-instant responses.

Llama 3.1 8B and 70B models
2000+ tokens per second
128K context window
Competitive pricing

NVIDIA

Connect to NVIDIA-hosted models through NVIDIA's OpenAI-compatible inference endpoint, including DeepSeek, Mistral, and NVIDIA model families.

NVIDIA-hosted model catalog
OpenAI-compatible chat completions
GPU-backed inference infrastructure
Good fit for enterprise GPU stacks

LLMGateway

Enterprise

Enterprise AI gateway with unified API access to multiple providers. Smart routing, caching, and observability for production applications.

Multi-provider access
Automatic failover
Response caching
Cost optimization

Azure AI Foundry

Enterprise

Enterprise-grade AI models with Azure security and compliance. Access OpenAI, Llama, and Mistral through Azure infrastructure.

GPT-4o, Llama, Mistral models
API Key, Entra ID, and Managed Identity auth
Regional data residency
SOC 2, HIPAA, GDPR compliance

GCP Vertex AI

Enterprise

Google's Gemini models and open-source alternatives through Vertex AI. Enterprise infrastructure with Google Cloud security.

Gemini 1.5 Pro, 2.0 Flash
2 million token context
Model Garden open-source models
Google Cloud IAM

Default choice: Autohand CLI uses OpenRouter unless you configure another provider. A single API key gives you immediate access to models from many vendors.

Local providers

Local providers run AI models directly on your machine. Your code never leaves your computer, and you can work without an internet connection.

Ollama

Easy setup

The simplest way to run local models. One-command installation with automatic GPU detection on macOS, Linux, and Windows.

Simple installation and model management
Cross-platform support
Automatic GPU acceleration
Large model library

MLX

Apple Silicon

Apple's machine learning framework optimized for M1, M2, M3, and M4 chips. Native Metal acceleration for exceptional performance.

Native Apple Silicon optimization
Unified memory efficiency
Metal GPU acceleration
macOS only

llama.cpp

Advanced

High-performance C++ inference engine. Maximum flexibility and control for power users who want to tune every inference parameter.

Maximum performance potential
GGUF model format
Extensive configuration options
Cross-platform support

Choosing a provider

Select a provider based on your priorities:

Priority	Recommended Provider
Best model quality	OpenRouter or OpenAI
Fastest inference	Cerebras (2000+ tokens/sec)
AWS enterprise deployment	AWS Bedrock
Enterprise compliance	AWS Bedrock, Azure AI Foundry, or GCP Vertex AI
Long context (2M tokens)	GCP Vertex AI (Gemini)
Multi-provider management	LLMGateway
Cost-sensitive coding models	DeepSeek or Z.ai
NVIDIA-hosted inference	NVIDIA
Real-time knowledge	xAI Grok
Complete privacy	Ollama, MLX, or llama.cpp
Offline work	Ollama, MLX, or llama.cpp
No API costs	Any local provider
Apple Silicon performance	MLX

Switching providers

Switch between providers at any time using CLI flags, the configuration file, or a command inside a running session.

Using CLI flags

# Use OpenRouter (default)
autohand --provider openrouter --model nvidia/nemotron-3-super-120b-a12b:free

# Use AWS Bedrock
AWS_PROFILE=enterprise-prod autohand --provider bedrock --model us.anthropic.claude-3-5-sonnet-20241022-v2:0

# Use DeepSeek
autohand --provider deepseek --model deepseek-v4-flash

# Use Ollama
autohand --provider ollama --model codellama:13b

# Use MLX
autohand --provider mlx --model codellama-7b-instruct

# Use llama.cpp
autohand --provider llamacpp --model codellama-7b

Using configuration

Set your default provider and model in ~/.autohand/config.json:

{
  "provider": "openrouter",
  "model": "nvidia/nemotron-3-super-120b-a12b:free"
}
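
If you script your machine setup, the same file can be written from the shell. The sketch below is only an illustration: `write_autohand_config` is a hypothetical helper, it assumes the config holds just the `provider` and `model` keys shown above, and it replaces the whole file after copying any existing one to `config.json.bak`.

```shell
# Hypothetical helper, not part of Autohand: writes a minimal config file.
# Assumes only the "provider" and "model" keys shown above; other keys in an
# existing file are NOT preserved (the old file is kept as config.json.bak).
write_autohand_config() {
  dir="$1"       # config directory, e.g. "$HOME/.autohand"
  provider="$2"
  model="$3"

  mkdir -p "$dir" || return 1
  if [ -f "$dir/config.json" ]; then
    cp "$dir/config.json" "$dir/config.json.bak" || return 1
  fi

  # Values are inserted verbatim, so they must not contain quotes or backslashes.
  cat > "$dir/config.json" <<EOF
{
  "provider": "$provider",
  "model": "$model"
}
EOF
}

# Example (commented out so that sourcing this snippet changes nothing):
# write_autohand_config "$HOME/.autohand" ollama codellama:13b
```

Because the helper overwrites the file, edit config.json by hand instead if it already contains settings you want to keep.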

During a session

Switch models or providers without restarting Autohand:

# Switch to a local model served by Ollama
/model ollama/codellama:13b

# Switch to a model on OpenRouter
/model openrouter/nvidia/nemotron-3-super-120b-a12b:free

# Configure enterprise provider details interactively
/model bedrock

Provider comparison

Cloud providers

Provider	Best for	Authentication	API style
OpenRouter	Broad model access	API key	OpenAI-compatible
OpenAI	Direct OpenAI models	API key or ChatGPT auth	OpenAI native
AWS Bedrock	AWS enterprise workloads	AWS credentials/profile/IAM or Bedrock API key	Converse and OpenAI-compatible
DeepSeek	Coding and reasoning value	API key	OpenAI-compatible
xAI Grok	Grok models	API key	OpenAI-compatible
Z.ai	GLM-family inference	API key	OpenAI-compatible
Cerebras	Low-latency inference	API key	OpenAI-compatible
NVIDIA	GPU-hosted model catalog	API key	OpenAI-compatible
Azure AI Foundry	Azure enterprise deployments	API key, Entra ID, or Managed Identity	Azure OpenAI
GCP Vertex AI	Google Cloud and Gemini	Google Cloud access token	Vertex AI OpenAPI endpoint

Local providers

Feature	Ollama	MLX	llama.cpp
Setup difficulty	Easy	Medium	Advanced
Model quality	Good	Good	Good
Offline capable	Yes	Yes	Yes
Cost	Free	Free	Free
macOS	Yes	Apple Silicon only	Yes
Linux	Yes	No	Yes
Windows	Yes	No	Yes
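
The platform rows above can be turned into a small selection rule. The sketch below assumes you prefer MLX on Apple Silicon Macs and Ollama everywhere else. That preference and the `pick_local_provider` name are illustrative, not Autohand behavior.

```shell
# Hypothetical helper: suggest a local provider from OS and CPU architecture.
# Arguments default to the current machine as reported by uname.
pick_local_provider() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    echo "mlx"       # MLX: Apple Silicon Macs only
  else
    echo "ollama"    # Ollama: macOS, Linux, and Windows
  fi
}

pick_local_provider Darwin arm64   # prints: mlx
pick_local_provider Linux x86_64   # prints: ollama
```

The result could then be passed to the `--provider` flag. Note that a shell running under Rosetta on an Apple Silicon Mac reports `x86_64` from `uname -m`, so this check would fall back to Ollama there.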