Optimizing Generations | Vigthoria Guides

Model Selection

Choose the right model for your task to get optimal results:

Task	Best Model	Why
Complex analysis	vigthoria-reasoning-v2	Deep logical thinking, step-by-step reasoning
Code generation	vigthoria-code-v2	Optimized for syntax, patterns, best practices
Creative writing	vigthoria-creative-v2	Imaginative, varied, engaging output
Image understanding	vigthoria-vision-v2	Multimodal image + text processing
Quick responses	vigthoria-reasoning-v2	Fast, balanced for general tasks

Pro Tip

For mixed tasks (e.g., creative code), pick the model that fits the primary need. Writing creative stories that include code snippets? Use vigthoria-creative-v2. Building an app that needs some creative copy? Use vigthoria-code-v2.

Parameter Tuning

Temperature

Controls randomness and creativity:

0.0-0.3: Focused, consistent, close to deterministic (code, data extraction)
0.4-0.7: Balanced creativity and focus (general tasks)
0.8-1.2: More varied, creative outputs (stories, brainstorming)
1.3+: Highly unpredictable (experimental only)
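The hypothetical `optionsForTask` helper below sketches one way to combine the model table with these temperature ranges: it returns a model and a temperature for a few task types. The preset values are picked from inside the ranges above. The helper assumes the request parameter is named `temperature`, as in OpenAI-style chat completion APIs, so check the Vigthoria API reference before relying on it.

```javascript
// Hypothetical helper (not part of the Vigthoria SDK): map a task type to a
// model from the Model Selection table and a temperature inside the
// recommended range. Assumes the API parameter is called `temperature`.
const TASK_PRESETS = {
  code:     { model: 'vigthoria-code-v2',      temperature: 0.2 }, // range 0.0-0.3
  general:  { model: 'vigthoria-reasoning-v2', temperature: 0.5 }, // range 0.4-0.7
  creative: { model: 'vigthoria-creative-v2',  temperature: 1.0 }, // range 0.8-1.2
};

function optionsForTask(taskType, overrides = {}) {
  const preset = TASK_PRESETS[taskType];
  if (!preset) {
    throw new Error(`Unknown task type: ${taskType}`);
  }
  // Caller-supplied values win, e.g. { temperature: 0.9 } for brainstorming.
  return { ...preset, ...overrides };
}
```

For example, `optionsForTask('code')` returns `{ model: 'vigthoria-code-v2', temperature: 0.2 }`, which can be spread into a `vigthoria.chat.completions.create` call next to `messages`.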

Max Tokens

Set max_tokens high enough to avoid truncated output, but not so high that you over-allocate:

Short answers: 100-300 tokens
Paragraphs: 300-600 tokens
Articles: 1000-2000 tokens
Long-form: 3000-4000 tokens
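When the budget is too small, the reply stops mid-thought. The sketch below retries with a larger budget. It assumes an OpenAI-style response where `choices[0].finish_reason` is `'length'` when generation hit `max_tokens`; confirm the field name in the Vigthoria API reference. `generateWithBudget` is a hypothetical helper, and `generate` is any function that sends a request and resolves to the raw response, such as a thin wrapper around `vigthoria.chat.completions.create`.

```javascript
// Hypothetical helper: retry with a larger max_tokens when the reply was cut off.
// Assumes an OpenAI-style response where finish_reason === 'length'
// means generation stopped at the token limit.
async function generateWithBudget(generate, request, { maxTokens = 300, ceiling = 4000 } = {}) {
  let budget = maxTokens;
  for (;;) {
    const response = await generate({ ...request, max_tokens: budget });
    const truncated = response.choices[0].finish_reason === 'length';
    if (!truncated || budget >= ceiling) {
      return response;
    }
    // Double the budget, but never exceed the ceiling.
    budget = Math.min(budget * 2, ceiling);
  }
}
```

Each retry regenerates the whole reply, so it costs extra tokens and time. Starting from a sensible budget in the table above avoids most retries.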

Stop Sequences

Use stop sequences to end generation at the right point:

{
  "stop": ["```", "\n\n---", "END"]
}
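To preview what a stop list does to sample output, the client-side sketch below cuts text at the earliest stop sequence. It assumes, as in most chat completion APIs, that generation ends before the matched sequence and that the sequence itself is not returned; check the Vigthoria documentation for the exact behavior. `truncateAtStop` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: return text up to (not including) the earliest
// occurrence of any stop sequence, mimicking server-side stop behavior.
function truncateAtStop(text, stops) {
  let cut = text.length;
  for (const stop of stops) {
    if (!stop) continue; // an empty stop string would match at index 0
    const index = text.indexOf(stop);
    if (index !== -1 && index < cut) {
      cut = index;
    }
  }
  return text.slice(0, cut);
}

const preview = truncateAtStop('Summary: done.\nEND\nextra', ['\n\n---', 'END']);
```

Here `preview` is `'Summary: done.\n'`: everything from `END` onward is dropped.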

Prompt Optimization

Be Specific

❌ Bad	"Write about technology"
✓ Good	"Write a 300-word article about AI's impact on healthcare, focusing on diagnostic tools, for a non-technical audience"

Provide Context

❌ Bad	"Fix this bug"
✓ Good	"Fix this React useEffect bug. Expected: fetch data once on mount. Actual: infinite loop. Using React 18."

Use Examples (Few-Shot)

Convert these sentences to formal English:

"gonna grab lunch" → "I will be taking lunch now."
"u free tmrw?" → "Are you available tomorrow?"
"thx for the help" → ?

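Few-shot examples can also be sent as alternating user and assistant messages, so the model sees each example as a completed exchange. `buildFewShotMessages` is a hypothetical sketch that uses the `messages` format shown elsewhere in this guide. Putting the instruction in a system message is a design choice, not a requirement.

```javascript
// Hypothetical helper: turn an instruction plus input/output example pairs
// into a chat message list, ending with the real query.
function buildFewShotMessages(instruction, examples, query) {
  const messages = [{ role: 'system', content: instruction }];
  for (const { input, output } of examples) {
    messages.push({ role: 'user', content: input });
    messages.push({ role: 'assistant', content: output });
  }
  // The real request goes last; the model answers in the style of the examples.
  messages.push({ role: 'user', content: query });
  return messages;
}

const messages = buildFewShotMessages(
  'Convert these sentences to formal English.',
  [
    { input: 'gonna grab lunch', output: 'I will be taking lunch now.' },
    { input: 'u free tmrw?', output: 'Are you available tomorrow?' },
  ],
  'thx for the help'
);
```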
Structure Your Requests

Task: Summarize this article
Format: 3 bullet points, max 20 words each
Tone: Professional
Audience: Executives

Article:
[content here]
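When many requests share the same shape, a small template keeps the fields consistent. `structuredPrompt` below is a hypothetical helper that produces the Task / Format / Tone / Audience layout shown above.

```javascript
// Hypothetical helper: build a prompt in the Task / Format / Tone / Audience
// layout. Fields that are missing or empty are left out.
function structuredPrompt({ task, format, tone, audience, label = 'Input', content }) {
  const lines = [
    ['Task', task],
    ['Format', format],
    ['Tone', tone],
    ['Audience', audience],
  ]
    .filter(([, value]) => value)
    .map(([name, value]) => `${name}: ${value}`);
  return `${lines.join('\n')}\n\n${label}:\n${content}`;
}

const prompt = structuredPrompt({
  task: 'Summarize this article',
  format: '3 bullet points, max 20 words each',
  tone: 'Professional',
  audience: 'Executives',
  label: 'Article',
  content: '[content here]',
});
```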

Speed Optimization

Reduce Token Count

Trim unnecessary context from prompts
Use concise instructions
Set appropriate max_tokens (don't over-allocate)
Use streaming so output starts appearing sooner (this improves perceived speed; it does not shorten total generation time)
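In multi-turn conversations the prompt grows with every exchange. One way to trim it is to keep the system message and only the most recent turns. `trimHistory` is a hypothetical helper; it counts messages rather than tokens, so treat `maxMessages` as a rough proxy for prompt size.

```javascript
// Hypothetical helper: keep all system messages plus the last `maxMessages`
// non-system messages, preserving their original order.
function trimHistory(messages, maxMessages = 6) {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  // slice(-0) would return the whole array, so handle 0 explicitly.
  const recent = maxMessages > 0 ? rest.slice(-maxMessages) : [];
  return [...system, ...recent];
}
```

Dropping older turns means the model no longer sees them, so keep any facts it still needs in the system message.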

Use Caching

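Cache identical requests to avoid redundant API calls. This suits low-temperature requests best, because a cached response is replayed verbatim instead of being regenerated: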

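const crypto = require('crypto');

// In-memory cache; entries live until the process exits.
// `vigthoria` is assumed to be an already-initialized Vigthoria client.
const cache = new Map();

async function cachedGeneration(prompt, options = {}) {
  // Hash prompt and options together so any change yields a new key.
  // MD5 is acceptable here: the hash is a lookup key, not a security measure.
  const cacheKey = crypto
    .createHash('md5')
    .update(JSON.stringify({ prompt, options }))
    .digest('hex');

  if (cache.has(cacheKey)) {
    return cache.get(cacheKey);
  }

  // Store the pending promise so concurrent identical calls share one request.
  const request = vigthoria.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
    ...options
  });
  cache.set(cacheKey, request);

  try {
    return await request;
  } catch (err) {
    cache.delete(cacheKey); // don't keep failed requests in the cache
    throw err;
  }
}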
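A plain `Map` grows without limit in a long-running process. A minimal fix is to cap its size and evict the least recently used entry, relying on the fact that a JavaScript `Map` iterates keys in insertion order. `LruCache` is a hypothetical class that exposes `has`, `get`, `set`, and `delete` like `Map`, so it can stand in for `new Map()` in the caching example.

```javascript
// Hypothetical size-capped cache with least-recently-used eviction.
// Map preserves insertion order, so the first key is the least recently used.
class LruCache {
  constructor(maxEntries = 500) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  has(key) {
    return this.map.has(key);
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    // Re-insert to mark this key as most recently used.
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the oldest entry (first in iteration order).
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
    return this;
  }

  delete(key) {
    return this.map.delete(key);
  }
}
```

To use it, replace `new Map()` with `new LruCache(500)`; the cap of 500 is an arbitrary starting point.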

Parallel Requests

Process multiple independent requests simultaneously:

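// Needs an async context: top-level await only works in ES modules.
async function runAll() {
  const prompts = ['Task 1', 'Task 2', 'Task 3'];

  // All requests start at once. Promise.all rejects as soon as any one fails;
  // use Promise.allSettled instead to keep partial results.
  const results = await Promise.all(
    prompts.map(prompt =>
      vigthoria.chat.completions.create({
        model: 'vigthoria-reasoning-v2',
        messages: [{ role: 'user', content: prompt }]
      })
    )
  );
  return results;
}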
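`Promise.all` starts every request at once. For larger batches you may want to cap how many run at the same time, for example to stay within rate limits if the API enforces them (check the Vigthoria documentation). `mapWithConcurrency` is a hypothetical helper that runs at most `limit` calls concurrently and returns results in input order.

```javascript
// Hypothetical helper: run fn over items with at most `limit` calls in flight.
// Results keep the same order as items.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // JavaScript is single-threaded, so `next++` cannot race between workers.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, pass `prompts`, a `limit` of 2, and `prompt => vigthoria.chat.completions.create({ model: 'vigthoria-reasoning-v2', messages: [{ role: 'user', content: prompt }] })` as `fn`.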

Quality Improvement

System Messages

Set clear expectations in the system message:

{
  "messages": [
    {
      "role": "system",
      "content": "You are an expert technical writer. Write clear, concise documentation. Use bullet points for lists. Include code examples where relevant. Avoid jargon."
    },
    {
      "role": "user",
      "content": "Document the authentication flow"
    }
  ]
}

Iterative Refinement

Get initial output
Ask for specific improvements
Request alternative approaches
Combine best elements
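The steps above amount to growing one conversation: each model reply is appended as an assistant message, followed by your next request. `refine` is a hypothetical sketch. `generate` is assumed to take a `messages` array and resolve to the reply text, for example a thin wrapper that returns `choices[0].message.content` from `vigthoria.chat.completions.create` (an OpenAI-style response shape, assumed here).

```javascript
// Hypothetical helper: send the initial prompt, then each follow-up, keeping
// the full conversation so later requests can refer to earlier drafts.
async function refine(generate, prompt, followUps = []) {
  const messages = [];
  const drafts = [];
  for (const request of [prompt, ...followUps]) {
    messages.push({ role: 'user', content: request });
    // Pass a copy so later pushes don't alter what generate received.
    const reply = await generate([...messages]);
    messages.push({ role: 'assistant', content: reply });
    drafts.push(reply);
  }
  return { drafts, messages };
}
```

Follow-ups such as "Make it more concrete", "Suggest an alternative approach", and "Combine the best parts of your drafts" map directly onto the four steps.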

Self-Critique Pattern

const messages = [
  { role: "user", content: "Write a product description for X" },
  // Replace with the model's actual first reply.
  { role: "assistant", content: "[initial output]" },
  { role: "user", content: "Now critique this description and provide an improved version that addresses the weaknesses" }
];

Quick Reference

✅ Choose the right model for the task
✅ Set temperature based on needed creativity
✅ Be specific and provide context
✅ Use examples for complex formats
✅ Set appropriate max_tokens
✅ Use streaming for UX
✅ Cache repeated requests
✅ Use parallel processing where possible
✅ Iterate for best results
