
Configuration

After installation, configure the plugin via Window > Preferences > Peon AI.

Peon AI Config

Provider Settings

Ollama

Run models locally, e.g. on Windows.

| Setting | Value |
|---|---|
| Provider | OLLAMA |
| Model | llama3.2, codellama, qwen2.5-coder, mistral |
| Base URL | http://localhost:11434 |

LM Studio

Run models locally, e.g. on macOS.

| Setting | Value |
|---|---|
| Provider | LM Studio / OpenAI HTTP 1.1 |
| Model | qwen/qwen3.5-9b |
| Base URL | http://localhost:1234/v1 |


OpenAI

| Setting | Value |
|---|---|
| Provider | OPEN_AI |
| Model | gpt-4o, gpt-4o-mini, o3-mini |
| Base URL | https://api.openai.com/v1 |
| API Key | Your OpenAI API key |

OpenAI-compatible APIs

Any OpenAI-compatible server (LM Studio, OpenRouter, LocalAI, vLLM, …) works by changing the Base URL. Set the API Key to a dummy value like none if the server does not require one.
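Why swapping the Base URL is enough can be sketched as follows. This is an illustration of the OpenAI chat-completions wire contract, not code from the plugin; the class and method names here are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OpenAiCompatibleRequest {

    // Builds a minimal /chat/completions request against an arbitrary base URL.
    // The endpoint path, headers, and JSON shape are the same for every
    // OpenAI-compatible server, so only the base URL (and key) differ.
    public static HttpRequest build(String baseUrl, String apiKey, String model, String prompt) {
        String body = """
                {"model": "%s", "messages": [{"role": "user", "content": "%s"}]}
                """.formatted(model, prompt).strip();
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/chat/completions"))
                .header("Authorization", "Bearer " + apiKey) // a dummy value like "none" works locally
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Same request shape, local server instead of api.openai.com:
        HttpRequest r = build("http://localhost:1234/v1", "none", "qwen/qwen3.5-9b", "Hello");
        System.out.println(r.uri());
    }
}
```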

Google Gemini

| Setting | Value |
|---|---|
| Provider | GOOGLE_GEMINI |
| Model | gemini-2.0-flash, gemini-2.5-pro-preview-03-25 |
| Base URL | (leave empty) |
| API Key | Your Google AI Studio API key |

Google Gemini

Mistral AI

| Setting | Value |
|---|---|
| Provider | MISTRAL |
| Model | mistral-large-latest, mistral-small-latest, codestral-latest |
| Base URL | (leave empty); set https://api.mistral.ai in Voice preferences if using voice input |
| API Key | Your Mistral API key |

Mistral AI

GitHub (Marketplace)

Use the GitHub Models marketplace with pay-per-use billing.

| Setting | Value |
|---|---|
| Provider | GitHub (PAT) |
| Model | Varies; use the model picker to list available models |
| Base URL | https://models.inference.ai.azure.com (default) or custom endpoint |
| API Key | Your GitHub Personal Access Token (PAT) with models:read scope |

Authentication:

  1. Generate a GitHub PAT with models:read scope
  2. Paste the token in the API Key field
  3. Click "Check Host and Port..." to verify connectivity

Available models: Use the Model picker to list all marketplace models you have access to. Only models that support tool calling are listed.


GitHub Copilot (Subscription)

Access Claude Sonnet, Claude Opus, Claude Haiku, GPT-5, and more as a GitHub Copilot subscriber.

| Setting | Value |
|---|---|
| Provider | GitHub Copilot (subscription) |
| Model | Claude Sonnet, Claude Opus, Claude Haiku, GPT-4o, GPT-5-mini, etc. |
| Base URL | (leave empty for github.com; enter custom domain for GitHub Enterprise) |
| API Key | (leave empty; OAuth token obtained via login button) |

Authentication:

  1. Click Login with GitHub Copilot... button in the preferences
  2. Select GitHub deployment type (github.com or GitHub Enterprise)
  3. Complete the device flow authorization in your browser
  4. The plugin stores your OAuth token and auto-selects the GITHUB_COPILOT provider
  5. Use the Model picker to list available Copilot models (Sonnet, Opus, etc.)

Requirements:

  • Active GitHub Copilot subscription (Individual $10/month, Business, or Enterprise)
  • Copilot access enabled on your GitHub account

Model availability: The models listed depend on:

  • Your Copilot subscription tier
  • Regional availability
  • GitHub's current model catalog

Difference: Marketplace vs. Subscription

| | Marketplace (PAT) | Copilot (OAuth) |
|---|---|---|
| Product | GitHub Models marketplace | GitHub Copilot subscription |
| Auth method | Personal Access Token | OAuth Device Flow |
| Billing | Pay-per-use | Monthly subscription |
| Models | Public marketplace catalog | Copilot subscriber models (Claude, GPT-5) |
| Provider name | GITHUB_MODELS | GITHUB_COPILOT |
| Use case | One-off testing, marketplace exploration | Primary AI assistant with Copilot benefits |

Advanced Settings

Token Window

The Token Window setting controls how many tokens of conversation history are sent to the AI model with each request. This is configured in the preference page as an integer field and stored in LlmConfig.tokenWindow.

| Setting | Value |
|---|---|
| Preference Key | llm.tokenWindow (PREF_TOKEN_WINDOW) |
| Default Value | 16000 tokens |
| Type | Integer |
| Editor Component | IntegerFieldEditor in AiConfigPreferenceView |

Configuration Flow

```mermaid
graph LR
    A[User sets value in UI] --> B[Eclipse Preference Store<br/>ScopedPreferenceStore]
    B --> C[LlmConfig.tokenWindow field]
    C --> D1["Provider buildChatModel()<br/>limits context sent to LLM"]
    C --> D2["TemplateContext.setTokenWindow()<br/>available as ${tokenWindow}"]
```

Key Points:

  1. Value is stored in Eclipse ScopedPreferenceStore with instance scope
  2. Read from the LlmConfig.tokenWindow field (default: 16000)
  3. Used by provider's buildChatModel() for context window limits
  4. Also set in TemplateContext as template variable ${tokenWindow}
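The store-then-fallback behavior in points 1 and 2 can be sketched in isolation. This is a simplified stand-in, not the plugin's code: the real classes are LlmConfig and Eclipse's ScopedPreferenceStore, while the map-backed store here is hypothetical.

```java
import java.util.Map;

public class TokenWindowLookup {

    static final String PREF_TOKEN_WINDOW = "llm.tokenWindow";
    static final int DEFAULT_TOKEN_WINDOW = 16000;

    // Returns the user-configured value if one was stored, otherwise the default.
    public static int tokenWindow(Map<String, String> preferenceStore) {
        String stored = preferenceStore.get(PREF_TOKEN_WINDOW);
        return stored == null ? DEFAULT_TOKEN_WINDOW : Integer.parseInt(stored);
    }

    public static void main(String[] args) {
        System.out.println(tokenWindow(Map.of()));                          // no stored value: default
        System.out.println(tokenWindow(Map.of(PREF_TOKEN_WINDOW, "8000"))); // user-set value wins
    }
}
```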

Template Variable Integration

The token window is available as a template variable ${tokenWindow} in prompt templates, allowing dynamic reference to the configured limit:

```java
// Example usage in system prompts or instruction templates
String systemPrompt = "Keep responses within ${tokenWindow} tokens.";
```

This enables agent personas and instruction templates to respect context boundaries defined by the user.
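The substitution itself is a plain string replacement. The variable name ${tokenWindow} matches the docs above; the helper class here is hypothetical, not the plugin's template engine.

```java
public class TemplateExpansion {

    // Replaces every ${tokenWindow} occurrence with the configured value.
    public static String expand(String template, int tokenWindow) {
        return template.replace("${tokenWindow}", Integer.toString(tokenWindow));
    }

    public static void main(String[] args) {
        String systemPrompt = "Keep responses within ${tokenWindow} tokens.";
        System.out.println(expand(systemPrompt, 16000));
    }
}
```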


Important Distinction: Token Window vs Message Memory Buffer

| Component | Purpose | Limit | Configurable |
|---|---|---|---|
| Token Window | Context sent to the AI provider | ~16000 (configurable) | Yes, via this setting |
| Message Memory Buffer | Internal conversation history storage | 500,000 messages | Fixed in MessageWindowChatMemory |

Explanation: The token window limits what context is sent to the LLM for each request, while the message memory buffer stores conversation history internally. A user can have a large internal history but only send the most recent tokens within their configured window to the AI model.
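The distinction can be sketched as follows: the full history stays stored, and only the newest messages that fit the token budget are selected per request. This is an illustrative sketch with a crude token heuristic, not the plugin's actual trimming logic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class ContextTrimming {

    // Walk history from newest to oldest, keeping messages until the
    // estimated token budget is exhausted; older messages are dropped
    // from the request but remain in the stored history.
    public static List<String> contextFor(List<String> history, int tokenWindow) {
        Deque<String> context = new ArrayDeque<>();
        int used = 0;
        for (int i = history.size() - 1; i >= 0; i--) {
            int cost = estimateTokens(history.get(i));
            if (used + cost > tokenWindow) break;
            used += cost;
            context.addFirst(history.get(i)); // preserve chronological order
        }
        return List.copyOf(context);
    }

    // Crude heuristic: roughly 4 characters per token for English text.
    static int estimateTokens(String message) {
        return Math.max(1, message.length() / 4);
    }

    public static void main(String[] args) {
        List<String> history = List.of("old question", "old answer", "new question");
        System.out.println(contextFor(history, 3)); // tiny budget: only the newest message fits
    }
}
```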


How It Works

  1. User sets token window value in preferences
  2. Value stored in ScopedPreferenceStore under key llm.tokenWindow
  3. When the chat service starts, LlmConfig.newConfig() loads the value from preferences, falling back to the 16000 default
  4. Each request includes up to tokenWindow tokens of conversation history
  5. AI provider may adjust based on their own limits

Practical Guidance

Recommended Values by Use Case:

| Value Range | Use Case | Trade-offs |
|---|---|---|
| 1000-2000 | Simple queries, coding tasks, one-off requests | Fast responses, lower costs, limited context awareness |
| 2000-16000 | General conversation, most everyday use cases | Balanced approach, good context retention without excessive latency |
| 16000+ | Complex multi-turn conversations, long discussions | Best for referencing earlier messages; may increase latency and costs |

Provider-Specific Context Limits

Different AI providers have different maximum context window capabilities:

| Provider | Typical Maximum | Notes |
|---|---|---|
| Ollama (local) | 2048-4096 (model-dependent) | Check the specific model documentation; varies by quantization |
| OpenAI gpt-4o | Up to 128,000 tokens | Generous limits; the token window setting is often irrelevant for these models |
| LM Studio | Model-dependent | Depends on the underlying model configuration in the LM Studio UI |
| Google Gemini | Up to 1M+ tokens (some models) | Check the specific model documentation |
| Mistral AI | Varies by model | Typically 32K-128K context windows |

Recommendation: For local providers (Ollama, LM Studio), set token window close to their default. For cloud providers with large context windows (OpenAI, Gemini), you can set higher values if needed for long conversations.
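The effect of this recommendation amounts to taking the smaller of the configured window and the provider's maximum. A minimal sketch, assuming the provider maximum is known up front; the helper name and the limits in the example are illustrative, not from the plugin.

```java
public class WindowClamp {

    // The effective window can never exceed what the provider's model supports.
    public static int effectiveWindow(int configured, int providerMax) {
        return Math.min(configured, providerMax);
    }

    public static void main(String[] args) {
        System.out.println(effectiveWindow(16000, 4096));   // small local Ollama model: clamped down
        System.out.println(effectiveWindow(16000, 128000)); // gpt-4o: the configured value passes through
    }
}
```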


Important Considerations

  1. Token counting includes both directions: The token window counts both user messages AND AI model responses in the conversation history.

  2. Provider auto-truncation: If you set a value larger than your provider supports, the provider may:

    • Automatically truncate older messages (silently)
    • Reject the request with an error
    • Return degraded responses
  3. Check provider documentation: Before setting very high values, verify your specific model's maximum context window in the provider's documentation.

  4. Testing recommended: After changing token window, test with a longer conversation to ensure the AI can reference earlier messages correctly.

  5. Token estimation vs actual tokens: The value you set is an estimate; actual token counts vary by language and model encoding.
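Point 5 is worth seeing concretely: two common rule-of-thumb estimators already disagree with each other, and real tokenizers disagree with both. Both heuristics below are generic approximations, not the plugin's counting method.

```java
public class TokenEstimate {

    // Heuristic 1: roughly 4 characters per token.
    public static int byChars(String text) {
        return Math.max(1, text.length() / 4);
    }

    // Heuristic 2: roughly 0.75 words per token.
    public static int byWords(String text) {
        int words = text.isBlank() ? 0 : text.trim().split("\\s+").length;
        return Math.max(1, (int) Math.ceil(words / 0.75));
    }

    public static void main(String[] args) {
        String text = "Configure the token window in the Peon AI preferences.";
        // The two estimates differ, and the model's tokenizer would differ again.
        System.out.println(byChars(text));
        System.out.println(byWords(text));
    }
}
```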

Thinking Support

The Supports Thinking option enables models that support "thinking" or chain-of-thought reasoning, allowing them to show intermediate reasoning steps before generating final responses. This is useful for complex problem-solving tasks where you want to see the model's thought process.

Testing the Connection

  1. Open the Peon AI chat view
  2. Type a test message like "Hello"
  3. If configured correctly, you should receive a response

Troubleshooting

If connection tests fail, verify:

  • The server is running and accessible at the specified URL
  • Firewall settings allow connections to the port
  • For local servers (Ollama, LM Studio), ensure they're started before testing

Released under the MIT License.