AI Model Providers
Connect Autohand to your preferred AI models. Use cloud APIs for maximum capability or run models locally for privacy and offline access.
Overview
Autohand separates the CLI from the AI model provider, so you can switch between models and providers without changing how you work. Whether you prefer cloud-hosted models or local inference, the commands and workflow stay the same.
You have two main options:
- Cloud providers: Access powerful models via API with no hardware requirements
- Local providers: Run models on your own machine for privacy and offline use
Cloud providers
Cloud providers offer access to the most capable models without requiring powerful hardware. Your prompts are sent to remote servers for processing.
OpenRouter
Recommended: Access 200+ models from Anthropic, OpenAI, Google, and more through a single API. Automatic fallbacks and cost optimization included.
- Claude, GPT-4, Gemini, Llama, and more
- Single API key for all providers
- Automatic model fallback
- Pay-per-use pricing
OpenAI
Direct access to GPT-4o, GPT-4 Turbo, and o1 reasoning models. The industry standard for AI development with excellent documentation.
- GPT-4o, GPT-4 Turbo, o1 reasoning
- 128K context window
- Function calling support
- Vision capabilities
AWS Bedrock
Enterprise: Run Autohand through Bedrock Converse or Bedrock's OpenAI-compatible endpoints. Designed for AWS credential chains, profiles, private endpoints, and model access controls.
- Converse, Chat Completions, and Responses modes
- AWS credentials, SSO profiles, IAM roles, or Bedrock API keys
- Inference profiles and ARN model IDs
- Regional and private VPC endpoints
DeepSeek
Direct access to DeepSeek V4 models through an OpenAI-compatible API. A strong choice for coding, reasoning, and cost-sensitive engineering workflows.
- DeepSeek V4 Flash and V4 Pro
- OpenAI-compatible chat completions
- Fast coding and reasoning workflows
- Simple API-key setup
xAI Grok
Access xAI's Grok models with real-time knowledge and advanced reasoning. Built for developers who want cutting-edge AI assistance.
- Grok-2 and Grok-2 mini
- Real-time knowledge from X data
- 128K context window
- Vision capabilities
Z.ai
Use GLM-family models through Z.ai's OpenAI-compatible API for high-throughput coding and agentic workflows.
- GLM coding and reasoning models
- OpenAI-compatible API surface
- High-performance inference
- API-key authentication
Cerebras
Fastest: Lightning-fast inference at over 2000 tokens per second. Powered by the world's largest AI chip for near-instant responses.
- Llama 3.1 8B and 70B models
- 2000+ tokens per second
- 128K context window
- Competitive pricing
NVIDIA
Connect to NVIDIA-hosted models through NVIDIA's OpenAI-compatible inference endpoint, including DeepSeek, Mistral, and NVIDIA model families.
- NVIDIA-hosted model catalog
- OpenAI-compatible chat completions
- GPU-backed inference infrastructure
- Good fit for enterprise GPU stacks
LLMGateway
Enterprise: Enterprise AI gateway with unified API access to multiple providers. Smart routing, caching, and observability for production applications.
- Multi-provider access
- Automatic failover
- Response caching
- Cost optimization
Azure AI Foundry
Enterprise: Enterprise-grade AI models with Azure security and compliance. Access OpenAI, Llama, and Mistral through Azure infrastructure.
- GPT-4o, Llama, Mistral models
- API Key, Entra ID, and Managed Identity auth
- Regional data residency
- SOC 2, HIPAA, GDPR compliance
GCP Vertex AI
Enterprise: Google's Gemini models and open-source alternatives through Vertex AI. Enterprise infrastructure with Google Cloud security.
- Gemini 1.5 Pro, 2.0 Flash
- 2 million token context
- Model Garden open-source models
- Google Cloud IAM
Default choice: OpenRouter is the default provider for Autohand CLI. It gives you immediate access to the best models with a single API key.
Local providers
Local providers run AI models directly on your machine. Your code never leaves your computer, and you can work without an internet connection.
Ollama
Easy setup: The simplest way to run local models. One-command installation with automatic GPU detection on macOS, Linux, and Windows.
- Simple installation and model management
- Cross-platform support
- Automatic GPU acceleration
- Large model library
MLX
Apple Silicon: Apple's machine learning framework optimized for M1, M2, M3, and M4 chips. Native Metal acceleration for exceptional performance.
- Native Apple Silicon optimization
- Unified memory efficiency
- Metal GPU acceleration
- macOS only
llama.cpp
Advanced: High-performance C++ inference engine. Maximum flexibility and control for power users who want to tune every inference parameter.
- Maximum performance potential
- GGUF model format
- Extensive configuration options
- Cross-platform support
Choosing a provider
Select a provider based on your priorities:
| Priority | Recommended Provider |
|---|---|
| Best model quality | OpenRouter or OpenAI |
| Fastest inference | Cerebras (2000+ tokens/sec) |
| AWS enterprise deployment | AWS Bedrock |
| Enterprise compliance | AWS Bedrock, Azure AI Foundry, or GCP Vertex AI |
| Long context (2M tokens) | GCP Vertex AI (Gemini) |
| Multi-provider management | LLMGateway |
| Cost-sensitive coding models | DeepSeek or Z.ai |
| NVIDIA-hosted inference | NVIDIA |
| Real-time knowledge | xAI Grok |
| Complete privacy | Ollama, MLX, or llama.cpp |
| Offline work | Ollama, MLX, or llama.cpp |
| No API costs | Any local provider |
| Apple Silicon performance | MLX |
Switching providers
Switch between providers at any time using CLI flags, the configuration file, or the /model command during a session.
Using CLI flags
# Use OpenRouter (default)
autohand --provider openrouter --model nvidia/nemotron-3-super-120b-a12b:free
# Use AWS Bedrock
AWS_PROFILE=enterprise-prod autohand --provider bedrock --model us.anthropic.claude-3-5-sonnet-20241022-v2:0
# Use DeepSeek
autohand --provider deepseek --model deepseek-v4-flash
# Use Ollama
autohand --provider ollama --model codellama:13b
# Use MLX
autohand --provider mlx --model codellama-7b-instruct
# Use llama.cpp
autohand --provider llamacpp --model codellama-7b
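If you switch between the same few provider and model pairs often, a small shell wrapper can save typing. The following is a minimal sketch, not part of Autohand: the function name `ah_with` and the profile names `cloud`, `local`, and `mlx` are hypothetical, and only the `--provider` and `--model` flags with the values shown above are used.

```shell
# Hypothetical wrapper: maps a short profile name to one of the
# --provider/--model pairs shown above. The function and profile
# names are illustrative and are not part of the Autohand CLI.
ah_with() {
  profile="$1"
  case "$profile" in
    cloud) set -- --provider openrouter --model nvidia/nemotron-3-super-120b-a12b:free ;;
    local) set -- --provider ollama --model codellama:13b ;;
    mlx)   set -- --provider mlx --model codellama-7b-instruct ;;
    *)     echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  # Replace the function's arguments with the chosen flags and run the CLI.
  autohand "$@"
}
```

After sourcing this in your shell profile, `ah_with local` would run the Ollama command from the list above, and an unrecognized profile name returns a non-zero status without starting Autohand.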
Using configuration
Set your default provider in ~/.autohand/config.json:
{
  "provider": "openrouter",
  "model": "nvidia/nemotron-3-super-120b-a12b:free"
}
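To create this file from a script, something like the sketch below may help. It assumes the config file accepts the same provider and model values as the CLI flags, which this page does not state explicitly. The helper name `write_autohand_config` is hypothetical, and the function overwrites any existing config.json in the target directory.

```shell
# Hypothetical helper, not part of the Autohand CLI.
# Writes a config.json containing only the two keys shown above.
# Arguments: provider, model, optional config directory
# (defaults to ~/.autohand). Values are written as-is, so they
# must not contain double quotes or backslashes.
write_autohand_config() {
  provider="$1"
  model="$2"
  config_dir="${3:-$HOME/.autohand}"
  mkdir -p "$config_dir"
  cat > "$config_dir/config.json" <<EOF
{
  "provider": "$provider",
  "model": "$model"
}
EOF
}

# Example (commented out so sourcing this file changes nothing):
# write_autohand_config ollama codellama:13b
```

Back up an existing config before running the helper if you have added settings beyond `provider` and `model`.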
During a session
Switch models or providers without restarting by passing a provider/model identifier to the /model command:
# Switch to a different model
/model ollama/codellama:13b
# Switch to a different provider
/model openrouter/nvidia/nemotron-3-super-120b-a12b:free
# Configure enterprise provider details interactively
/model bedrock
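The examples above suggest that the text before the first slash names the provider and everything after it is the model ID, which may itself contain slashes (as OpenRouter IDs do). The sketch below illustrates that reading with plain shell parameter expansion. It is an assumption drawn from the examples, not a description of how Autohand parses the argument, and `split_model_spec` is a hypothetical name.

```shell
# Hypothetical illustration: split a "provider/model" identifier at
# the FIRST slash. Prints the provider on line 1 and the model ID on
# line 2 (empty when only a provider is given, as in "/model bedrock").
split_model_spec() {
  spec="$1"
  case "$spec" in
    */*)
      provider="${spec%%/*}"   # text before the first slash
      model="${spec#*/}"       # everything after the first slash
      ;;
    *)
      provider="$spec"
      model=""
      ;;
  esac
  printf '%s\n%s\n' "$provider" "$model"
}
```

Under this reading, `openrouter/nvidia/nemotron-3-super-120b-a12b:free` splits into the provider `openrouter` and the model `nvidia/nemotron-3-super-120b-a12b:free`, while `ollama/codellama:13b` splits into `ollama` and `codellama:13b`.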
Provider comparison
Cloud providers
| Provider | Best for | Authentication | API style |
|---|---|---|---|
| OpenRouter | Broad model access | API key | OpenAI-compatible |
| OpenAI | Direct OpenAI models | API key or ChatGPT auth | OpenAI native |
| AWS Bedrock | AWS enterprise workloads | AWS credentials/profile/IAM or Bedrock API key | Converse and OpenAI-compatible |
| DeepSeek | Coding and reasoning value | API key | OpenAI-compatible |
| xAI Grok | Grok models | API key | OpenAI-compatible |
| Z.ai | GLM-family inference | API key | OpenAI-compatible |
| Cerebras | Low-latency inference | API key | OpenAI-compatible |
| NVIDIA | GPU-hosted model catalog | API key | OpenAI-compatible |
| Azure AI Foundry | Azure enterprise deployments | API key, Entra ID, or Managed Identity | Azure OpenAI |
| GCP Vertex AI | Google Cloud and Gemini | Google Cloud access token | Vertex AI OpenAPI endpoint |
Local providers
| Feature | Ollama | MLX | llama.cpp |
|---|---|---|---|
| Setup difficulty | Easy | Medium | Advanced |
| Model quality | Good | Good | Good |
| Offline capable | Yes | Yes | Yes |
| Cost | Free | Free | Free |
| macOS | Yes | Apple Silicon only | Yes |
| Linux | Yes | No | Yes |
| Windows | Yes | No | Yes |