Cost Optimization | Vigthoria Guides

Understand Your Usage

Monitor your usage patterns to identify optimization opportunities:

1. Go to Dashboard → Usage Analytics
2. Review your token consumption by:
   - Model (which models use the most tokens)
   - Time (peak usage hours/days)
   - Application (which apps/integrations)
3. Set usage alerts at 50%, 80%, and 100% thresholds

Quick Win

Export your usage data monthly to identify trends. Many users find that 20-30% of their usage is redundant or could be handled by lighter models.
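
As a rough illustration of what to do with an export, the sketch below totals tokens by model and by hour. The CSV layout and column names (`model`, `application`, `hour`, `tokens`) are assumptions made for this example; the real export format may differ, so adjust the field names to match your file.

```python
import csv
import io
from collections import defaultdict

def tokens_by(rows, field):
    """Sum the 'tokens' column grouped by `field`, largest total first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[field]] += int(row["tokens"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical export contents; for a real file use open("usage.csv", newline="")
SAMPLE_EXPORT = """model,application,hour,tokens
vigthoria-reasoning-v2,support-bot,09,1200
vigthoria-vision-v2,catalog,14,800
vigthoria-reasoning-v2,support-bot,14,3000
vigthoria-code-v2,ide-plugin,09,500
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
by_model = tokens_by(rows, "model")  # which models use the most tokens
by_hour = tokens_by(rows, "hour")    # when usage peaks
```

The same helper works for the `application` column, which covers the three breakdowns listed above.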

Smart Model Selection

Not every task needs the most powerful model:

Task Type	Recommended Model	Cost Level
Simple Q&A, classification	vigthoria-reasoning-v2	Standard
Code generation	vigthoria-code-v2	Standard
Creative content	vigthoria-creative-v2	Standard
Image analysis	vigthoria-vision-v2	Premium

Potential Savings: 15-40%

By matching models to tasks instead of using one model for everything.
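
One simple way to apply the table is a lookup from task type to model, so that callers never hard-code a model name. The task-type keys and the fallback choice below are illustrative assumptions; only the model names come from the table.

```python
# Task-type labels are hypothetical; model names are taken from the table above
MODEL_FOR_TASK = {
    "qa": "vigthoria-reasoning-v2",
    "classification": "vigthoria-reasoning-v2",
    "code": "vigthoria-code-v2",
    "creative": "vigthoria-creative-v2",
    "image": "vigthoria-vision-v2",
}

# Assumed fallback for unrecognized task types
DEFAULT_MODEL = "vigthoria-reasoning-v2"

def pick_model(task_type):
    """Return the recommended model for a task type, or the default."""
    return MODEL_FOR_TASK.get(task_type.lower(), DEFAULT_MODEL)
```

Keeping the mapping in one place also makes it easy to audit which tasks are routed to the Premium model.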

Reduce Token Usage

1. Optimize Prompts

Shorter, clearer prompts use fewer input tokens:

// Before: 45 tokens
"I would really appreciate it if you could please help me by writing 
a function that takes a number as input and returns whether that 
number is a prime number or not."

// After: 18 tokens
"Write a function isPrime(n) that returns true if n is prime."

2. Limit Response Length

Set appropriate max_tokens for each use case:

// Short answers
{ "max_tokens": 200 }

// Explanations
{ "max_tokens": 500 }

// Articles
{ "max_tokens": 1500 }

3. Use Stop Sequences

End generation early when you have what you need:

{
  "stop": ["---", "END", "\n\n\n"]
}
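
The two settings above can be combined into per-use-case request parameters. This is a minimal sketch: the use-case names are made up for illustration, and the limits are the example values from this guide rather than recommended defaults.

```python
import json

# Example limits from this guide; the use-case names are hypothetical
MAX_TOKENS = {"short_answer": 200, "explanation": 500, "article": 1500}

def request_params(use_case, stop=None):
    """Build the max_tokens/stop portion of a request for one use case."""
    params = {"max_tokens": MAX_TOKENS[use_case]}
    if stop:
        params["stop"] = list(stop)
    return params

print(json.dumps(request_params("short_answer", stop=["---", "END"])))
# prints {"max_tokens": 200, "stop": ["---", "END"]}
```

The resulting dictionary can be merged into the rest of the request body or passed as keyword arguments, depending on how your client is called.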

Potential Savings: 20-50%

By reducing average tokens per request from 2000 to 800.
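
To sanity-check an estimate like this for your own traffic, a back-of-the-envelope calculation is enough. The sketch assumes cost is proportional to token count and that only some share of requests can be trimmed; both are simplifying assumptions, not statements about Vigthoria pricing.

```python
def token_savings(before, after, share_optimized=1.0):
    """Fractional cost reduction, assuming cost is proportional to tokens.

    share_optimized is the fraction of requests the reduction applies to.
    """
    per_request = (before - after) / before
    return per_request * share_optimized

full = token_savings(2000, 800)                       # every request trimmed
half = token_savings(2000, 800, share_optimized=0.5)  # half of requests trimmed
```

Under these assumptions, trimming every request from 2000 to 800 tokens cuts token volume by 60%, while trimming only half of them yields about 30%.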

Implement Caching

Don't pay for the same generation twice:

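import hashlib
import json

import redis

cache = redis.Redis()

def cached_generation(prompt, model, **kwargs):
    # Build a deterministic cache key; sort_keys keeps it stable
    # regardless of keyword-argument order
    key_material = json.dumps(
        {"model": model, "prompt": prompt, "params": kwargs},
        sort_keys=True,
    )
    cache_key = hashlib.sha256(key_material.encode()).hexdigest()

    # Return the cached result if present
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: generate (vigthoria is your already-configured API client)
    response = vigthoria.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        **kwargs
    )

    # Cache for 24 hours. json.dumps needs a JSON-serializable value, so
    # convert the response to a plain dict first if your SDK returns an object.
    cache.setex(cache_key, 86400, json.dumps(response))
    return response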

Good candidates for caching:

- FAQ responses
- Template-based generations
- Code snippets for common patterns
- Static content generation

Potential Savings: 30-60%

Depending on how many repeated requests you have.
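
To find out how many repeated requests you actually have, it can help to measure the cache hit rate before committing to infrastructure. The following is an in-memory sketch, not a production cache: the `GenerationCache` class and the `generate` callable are hypothetical names, and `generate` stands in for whatever function calls the API in your code.

```python
import hashlib
import json
import time

class GenerationCache:
    """In-memory TTL cache that also counts hits and misses."""

    def __init__(self, generate, ttl_seconds=86400, clock=time.monotonic):
        self._generate = generate  # callable(model, prompt, **kwargs)
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}           # key -> (expires_at, result)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model, prompt, **kwargs):
        # sort_keys makes the key independent of keyword-argument order
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": kwargs},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def get(self, model, prompt, **kwargs):
        k = self.key(model, prompt, **kwargs)
        now = self._clock()
        entry = self._store.get(k)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = self._generate(model, prompt, **kwargs)
        self._store[k] = (now + self._ttl, result)
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# Usage with a stand-in generator instead of a real API call
calls = []

def fake_generate(model, prompt, **kwargs):
    calls.append(prompt)
    return {"text": f"answer to {prompt}"}

cache = GenerationCache(fake_generate)
cache.get("vigthoria-reasoning-v2", "What are your hours?", max_tokens=200)
cache.get("vigthoria-reasoning-v2", "What are your hours?", max_tokens=200)
```

Here the second call is served from the cache, so `fake_generate` runs once and `cache.hit_rate()` is 0.5. If requests are roughly equal in cost, the hit rate is a reasonable first approximation of the share of spend caching could avoid.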

Batch Processing

For non-real-time tasks, batch requests during off-peak hours:

- Queue non-urgent generations
- Process during low-traffic periods
- Combine related requests when possible

// Instead of 10 separate requests:
const items = ['item1', 'item2', 'item3' /* , ... */];

// Combine into one:
const response = await vigthoria.chat.completions.create({
  model: 'vigthoria-reasoning-v2',
  messages: [{
    role: 'user',
    content: `Analyze these ${items.length} items and provide a summary for each:\n${items.join('\n')}`
  }]
});
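
When the queue holds more items than fit comfortably in one prompt, the same idea can be applied in fixed-size batches. The sketch below only builds the prompts; the batch size of 10 and the helper names are illustrative assumptions, and sending each prompt is left to your existing client code.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_prompt(batch):
    """Build one combined prompt for a batch of items."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(batch, start=1))
    return (
        f"Analyze these {len(batch)} items and provide a summary for each:\n"
        f"{numbered}"
    )

# 25 queued items become 3 requests instead of 25
items = [f"item{i}" for i in range(1, 26)]
prompts = [batch_prompt(batch) for batch in chunked(items, 10)]
```

Numbering the items inside the prompt makes it easier to match each summary in the response back to its input.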

Set Up Alerts

Prevent surprise overages with proactive monitoring:

1. Go to Dashboard → Settings → Alerts
2. Configure alerts:
   - 50%: Review and optimize if needed
   - 80%: Implement stricter controls
   - 90%: Pause non-essential usage
3. Set up webhook notifications for real-time alerts
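
If you handle alerts in your own code (for example, in a webhook receiver), the thresholds above map naturally onto a small lookup. This is a sketch under assumptions: the function name and the idea of passing used and quota token counts are illustrative, and the webhook payload format is not specified here.

```python
# Thresholds and actions from the list above, highest first
ALERT_ACTIONS = [
    (90, "Pause non-essential usage"),
    (80, "Implement stricter controls"),
    (50, "Review and optimize if needed"),
]

def alert_action(used, quota):
    """Return the action for the highest threshold reached, or None."""
    percent = 100 * used / quota
    for threshold, action in ALERT_ACTIONS:
        if percent >= threshold:
            return action
    return None
```

For example, `alert_action(850_000, 1_000_000)` corresponds to 85% usage and returns "Implement stricter controls".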

Right-Size Your Plan

Review your plan quarterly:

- Underusing? Downgrade to save money
- Hitting limits? Upgrade for better rates
- Heavy user? Contact sales for Enterprise pricing

Annual Savings

Annual plans typically offer 15-20% savings over monthly billing. If you expect to keep using Vigthoria for the full year, consider switching to annual billing.
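
To put that percentage in concrete terms, multiply a year of monthly payments by the discount. The monthly price used below is a made-up figure for illustration, not an actual plan price.

```python
def annual_savings(monthly_price, discount):
    """Yearly savings from an annual plan at a fractional discount (e.g. 0.15)."""
    return round(12 * monthly_price * discount, 2)

# Hypothetical plan at 100 per month, at both ends of the typical range
low = annual_savings(100, 0.15)
high = annual_savings(100, 0.20)
```

For this hypothetical plan the savings come to between 180 and 240 per year.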

Cost Optimization Checklist

✅ Monitor usage analytics weekly
✅ Use appropriate models for each task
✅ Optimize prompts for clarity and brevity
✅ Set max_tokens appropriately
✅ Implement caching for repeated requests
✅ Batch non-urgent requests
✅ Set up usage alerts
✅ Review plan fit quarterly
