Overview

Autohand separates the command-line interface from the AI model provider, so you can switch between models and providers without changing how you work. It supports both cloud-hosted models and local inference.

You have two main options:

  • Cloud providers - Access powerful models via API with no hardware requirements
  • Local providers - Run models on your own machine for privacy and offline use

Cloud providers

Cloud providers offer access to the most capable models without requiring powerful hardware. Your prompts are sent to remote servers for processing.

OpenRouter

Recommended

Access 200+ models from Anthropic, OpenAI, Google, and more through a single API. Automatic fallbacks and cost optimization included.

  • Claude, GPT-4, Gemini, Llama, and more
  • Single API key for all providers
  • Automatic model fallback
  • Pay-per-use pricing

OpenAI

Direct access to GPT-4o, GPT-4 Turbo, and o1 reasoning models. The industry standard for AI development with excellent documentation.

  • GPT-4o, GPT-4 Turbo, o1 reasoning
  • 128K context window
  • Function calling support
  • Vision capabilities

AWS Bedrock

Enterprise

Run Autohand through Bedrock Converse or Bedrock's OpenAI-compatible endpoints. Designed for AWS credential chains, profiles, private endpoints, and model access controls.

  • Converse, Chat Completions, and Responses modes
  • AWS credentials, SSO profiles, IAM roles, or Bedrock API keys
  • Inference profiles and ARN model IDs
  • Regional and private VPC endpoints

DeepSeek

Direct access to DeepSeek V4 models through an OpenAI-compatible API. A strong choice for coding, reasoning, and cost-sensitive engineering workflows.

  • DeepSeek V4 Flash and V4 Pro
  • OpenAI-compatible chat completions
  • Fast coding and reasoning workflows
  • Simple API-key setup

xAI Grok

Access xAI's Grok models with real-time knowledge and advanced reasoning. Built for developers who want cutting-edge AI assistance.

  • Grok-2 and Grok-2 mini
  • Real-time knowledge from X data
  • 128K context window
  • Vision capabilities

Z.ai

Use GLM-family models through Z.ai's OpenAI-compatible API for high-throughput coding and agentic workflows.

  • GLM coding and reasoning models
  • OpenAI-compatible API surface
  • High-performance inference
  • API-key authentication

Cerebras

Fastest

Lightning-fast inference at over 2000 tokens per second. Powered by the world's largest AI chip for near-instant responses.

  • Llama 3.1 8B and 70B models
  • 2000+ tokens per second
  • 128K context window
  • Competitive pricing

NVIDIA

Connect to NVIDIA-hosted models through NVIDIA's OpenAI-compatible inference endpoint, including DeepSeek, Mistral, and NVIDIA model families.

  • NVIDIA-hosted model catalog
  • OpenAI-compatible chat completions
  • GPU-backed inference infrastructure
  • Good fit for enterprise GPU stacks

LLMGateway

Enterprise

Enterprise AI gateway with unified API access to multiple providers. Smart routing, caching, and observability for production applications.

  • Multi-provider access
  • Automatic failover
  • Response caching
  • Cost optimization

Azure AI Foundry

Enterprise

Enterprise-grade AI models with Azure security and compliance. Access OpenAI, Llama, and Mistral through Azure infrastructure.

  • GPT-4o, Llama, Mistral models
  • API Key, Entra ID, and Managed Identity auth
  • Regional data residency
  • SOC 2, HIPAA, GDPR compliance

GCP Vertex AI

Enterprise

Google's Gemini models and open-source alternatives through Vertex AI. Enterprise infrastructure with Google Cloud security.

  • Gemini 1.5 Pro, 2.0 Flash
  • 2 million token context
  • Model Garden open-source models
  • Google Cloud IAM

Default choice: Autohand CLI uses OpenRouter unless you select another provider. A single OpenRouter API key gives you immediate access to models from multiple vendors.

Local providers

Local providers run AI models directly on your machine. Your code never leaves your computer, and you can work without an internet connection. The local providers covered on this page are Ollama, MLX, and llama.cpp.

Choosing a provider

Select a provider based on your priorities:

Priority | Recommended provider
Best model quality | OpenRouter or OpenAI
Fastest inference | Cerebras (2000+ tokens/sec)
AWS enterprise deployment | AWS Bedrock
Enterprise compliance | AWS Bedrock, Azure AI Foundry, or GCP Vertex AI
Long context (2M tokens) | GCP Vertex AI (Gemini)
Multi-provider management | LLMGateway
Cost-sensitive coding models | DeepSeek or Z.ai
NVIDIA-hosted inference | NVIDIA
Real-time knowledge | xAI Grok
Complete privacy | Ollama or MLX
Offline work | Ollama, MLX, or llama.cpp
No API costs | Any local provider
Apple Silicon performance | MLX

Switching providers

Switch between providers at any time using CLI flags, the configuration file, or the /model command during a session.

Using CLI flags

# Use OpenRouter (default)
autohand --provider openrouter --model nvidia/nemotron-3-super-120b-a12b:free

# Use AWS Bedrock
AWS_PROFILE=enterprise-prod autohand --provider bedrock --model us.anthropic.claude-3-5-sonnet-20241022-v2:0

# Use DeepSeek
autohand --provider deepseek --model deepseek-v4-flash

# Use Ollama
autohand --provider ollama --model codellama:13b

# Use MLX
autohand --provider mlx --model codellama-7b-instruct

# Use llama.cpp
autohand --provider llamacpp --model codellama-7b
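
If you switch between the same few setups often, the flag combinations above could be wrapped in a small shell function. This is a minimal sketch and not part of Autohand: the function name autohand_cmd and the profile names (cloud, deepseek, local) are hypothetical, and only provider and model values shown above are used. It prints the command instead of running it, so you can inspect it first.

```shell
# Hypothetical helper (not an Autohand feature): print the autohand command
# for a short profile name. The profile names are made up for this sketch.
autohand_cmd() {
  case "$1" in
    cloud)    echo "autohand --provider openrouter --model nvidia/nemotron-3-super-120b-a12b:free" ;;
    deepseek) echo "autohand --provider deepseek --model deepseek-v4-flash" ;;
    local)    echo "autohand --provider ollama --model codellama:13b" ;;
    *)        echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Show the command for the local profile.
autohand_cmd local
```

An unknown profile name prints an error to stderr and returns a non-zero status, so a typo does not silently fall back to another provider.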

Using configuration

Set your default provider in ~/.autohand/config.json:

{
  "provider": "openrouter",
  "model": "nvidia/nemotron-3-super-120b-a12b:free"
}
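
The same file could also be written from a script, for example when provisioning a new machine. The sketch below is an illustration under stated assumptions: the function name write_autohand_config is hypothetical, it assumes the file holds only the two keys shown above (any other keys would be overwritten), and it does not escape quotes or backslashes in the values. An existing file is copied to config.json.bak first.

```shell
# Hypothetical helper (not an Autohand feature): write a two-key config file.
# Arguments: target directory, provider name, model ID.
# Assumption: config.json contains only "provider" and "model".
write_autohand_config() {
  cfg_dir="$1"
  cfg_provider="$2"
  cfg_model="$3"
  mkdir -p "$cfg_dir" || return 1
  # Keep a backup of any existing config before overwriting it.
  if [ -f "$cfg_dir/config.json" ]; then
    cp "$cfg_dir/config.json" "$cfg_dir/config.json.bak" || return 1
  fi
  printf '{\n  "provider": "%s",\n  "model": "%s"\n}\n' \
    "$cfg_provider" "$cfg_model" > "$cfg_dir/config.json"
}

# Example call (commented out so the sketch does not touch your real config):
# write_autohand_config "$HOME/.autohand" openrouter "nvidia/nemotron-3-super-120b-a12b:free"
```

Passing the directory as an argument keeps the function easy to try against a scratch directory before pointing it at ~/.autohand.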

During a session

Switch models without restarting:

# Switch to a different model
/model ollama/codellama:13b

# Switch to a different provider
/model openrouter/nvidia/nemotron-3-super-120b-a12b:free

# Configure enterprise provider details interactively
/model bedrock
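
The examples above suggest that the first path segment names the provider and everything after the first slash is the model ID, which may itself contain slashes (as in the OpenRouter example). The sketch below illustrates that reading only; it is an assumption about the format, not Autohand's actual parser, and split_model_spec is a hypothetical name.

```shell
# Illustration only: split a "/model" argument at the FIRST slash.
# Assumption: "<provider>/<model-id>", where the model ID may contain slashes.
split_model_spec() {
  spec="$1"
  case "$spec" in
    */*) printf 'provider=%s model=%s\n' "${spec%%/*}" "${spec#*/}" ;;
    *)   printf 'provider=%s\n' "$spec" ;;   # provider only, e.g. "bedrock"
  esac
}

split_model_spec "openrouter/nvidia/nemotron-3-super-120b-a12b:free"
split_model_spec "bedrock"
```

Splitting at the first slash rather than the last is what keeps a model ID such as nvidia/nemotron-3-super-120b-a12b:free intact.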

Provider comparison

Cloud providers

Provider | Best for | Authentication | API style
OpenRouter | Broad model access | API key | OpenAI-compatible
OpenAI | Direct OpenAI models | API key or ChatGPT auth | OpenAI native
AWS Bedrock | AWS enterprise workloads | AWS credentials/profile/IAM or Bedrock API key | Converse and OpenAI-compatible
DeepSeek | Coding and reasoning value | API key | OpenAI-compatible
xAI Grok | Grok models | API key | OpenAI-compatible
Z.ai | GLM-family inference | API key | OpenAI-compatible
Cerebras | Low-latency inference | API key | OpenAI-compatible
NVIDIA | GPU-hosted model catalog | API key | OpenAI-compatible
Azure AI Foundry | Azure enterprise deployments | API key, Entra ID, or Managed Identity | Azure OpenAI
GCP Vertex AI | Google Cloud and Gemini | Google Cloud access token | Vertex AI OpenAPI endpoint

Local providers

Feature | Ollama | MLX | llama.cpp
Setup difficulty | Easy | Medium | Advanced
Model quality | Good | Good | Good
Offline capable | Yes | Yes | Yes
Cost | Free | Free | Free
macOS | Yes | Apple Silicon | Yes
Linux | Yes | No | Yes
Windows | Yes | No | Yes
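
The platform rows above can be turned into a simple starting rule. The sketch below is one possible reading, not an official recommendation: it suggests MLX on Apple Silicon Macs (where the table lists MLX as available) and Ollama everywhere else (the option the table rates easiest to set up). The function name pick_local_provider is hypothetical; the returned names match the --provider values used earlier on this page.

```shell
# Hypothetical helper: suggest a local provider from OS name and CPU architecture.
# Arguments are the outputs of `uname -s` and `uname -m`.
pick_local_provider() {
  os_name="$1"
  cpu_arch="$2"
  if [ "$os_name" = "Darwin" ] && [ "$cpu_arch" = "arm64" ]; then
    echo "mlx"      # Apple Silicon Mac
  else
    echo "ollama"   # Intel Mac, Linux, Windows, and anything else
  fi
}

# Suggest a provider for the current machine.
pick_local_provider "$(uname -s)" "$(uname -m)"
```

llama.cpp is left out of the rule because the table marks its setup as advanced; it remains an option on every platform listed.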