Tuesday, April 28, 2026

Temperature, Thinking Budget and Thinking Level

Temperature, thinking budget, and thinking level are key parameters for configuring LLM behavior in the Gemini API. Temperature controls output randomness (range 0.0 to 2.0, default 1.0). Thinking budget sets how many tokens the model may spend on internal reasoning, while thinking level (e.g., MINIMAL, MEDIUM, HIGH) adjusts the depth of reasoning for complex tasks.

Thinking budget and thinking level are mutually exclusive: setting both in the same request returns an HTTP 400 error.
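One way to avoid the 400 is to reject the combination on the client before any request goes out. The sketch below is an illustration, not an official helper: the function name `build_generation_config` is hypothetical, and it assumes the field names of the native Gemini REST API (`temperature`, `thinkingConfig.thinkingBudget`, `thinkingConfig.thinkingLevel` inside `generationConfig`). OpenAI-compatible proxies may expect different keys.

```python
from typing import Optional


def build_generation_config(
    temperature: float = 1.0,
    thinking_budget: Optional[int] = None,
    thinking_level: Optional[str] = None,
) -> dict:
    """Return the object to place under generationConfig (hypothetical helper)."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {temperature}")
    if thinking_budget is not None and thinking_level is not None:
        # The API answers this combination with HTTP 400, so fail before sending.
        raise ValueError("set thinking_budget or thinking_level, not both")

    config: dict = {"temperature": temperature}
    if thinking_budget is not None:
        config["thinkingConfig"] = {"thinkingBudget": thinking_budget}
    elif thinking_level is not None:
        config["thinkingConfig"] = {"thinkingLevel": thinking_level}
    return config


print(build_generation_config(thinking_level="low"))
# {'temperature': 1.0, 'thinkingConfig': {'thinkingLevel': 'low'}}
```

Failing locally gives a clear Python error at the call site instead of a 400 from the server after a network round trip.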

1. Temperature (Randomness Control)
  • Definition: A sampling parameter that controls the randomness of the model's output. It reshapes the probability distribution over possible next tokens: lower values concentrate probability on the most likely tokens, higher values spread it out.
  • Usage Examples:
    • Low Temperature (< 0.5): Ideal for coding, data extraction, or factual tasks where accuracy is key (e.g., set to 0 for near-deterministic results).
    • High Temperature (> 0.7): Suitable for creative writing, brainstorming, or marketing copy to generate varied responses.
  • Synonyms/Related Terms: Randomness, Sampling Temperature, Creativity Setting.
  • Important Note: For Gemini 3 models, Google recommends keeping temperature at the default of 1.0; lower settings may cause unexpected behavior such as looping or degraded performance, especially on complex reasoning tasks.
2. Thinking Budget (Reasoning Amount)
  • Definition: Used by the earlier Gemini 2.5 series of thinking models, this directly sets a numerical budget of tokens the model can use for intermediate reasoning before generating the final answer.
  • Usage Examples:
    • High Budget: Used for complex logic, multi-step math problems, or deep research.
    • Zero/Low Budget: Used to cut cost and latency; a budget of 0 disables thinking on models that allow it (Gemini 2.5 Pro does not). A budget of -1 instead enables dynamic thinking, where the model decides how much to reason.
  • Synonyms/Related Terms: Reasoning Tokens, Thought Limit. 
3. Thinking Level (Reasoning Depth) 
  • Definition: A higher-level setting introduced with Gemini 3 that controls reasoning depth through a named level instead of a token count.
  • Types/Usage Examples:
    • MINIMAL (e.g., Flash-Lite): Best for simple tasks requiring low latency.
    • LOW: Keeps latency and cost down for straightforward instruction following or chat.
    • MEDIUM (e.g., Flash/Pro): Balanced approach for moderate complexity.
    • HIGH (e.g., Pro): Used for complex, multi-step reasoning.
  • Synonyms/Related Terms: Reasoning Depth, Logic Level. 
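To see how temperature reshapes the next-token distribution described in the temperature section: the usual formulation divides the model's raw scores (logits) by the temperature before applying softmax. The sketch below uses made-up logits for three candidate tokens and treats temperature 0 as greedy decoding by convention. Real providers may layer further sampling steps (top-p, top-k) on top, so take this as an illustration of the idea rather than Gemini's exact sampler.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores into next-token probabilities at a given temperature."""
    if temperature <= 0:
        # Convention for this sketch: temperature 0 means greedy decoding,
        # so all probability goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the top token's share shrinks and the other tokens become more likely to be sampled, which is where the extra variety in creative outputs comes from.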
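For the thinking budget section: the budget is a plain integer, but each Gemini 2.5 model accepts its own range. The sketch below clamps a requested budget into that range. The helper `clamp_thinking_budget` is hypothetical, and the limits in `BUDGET_LIMITS` are assumptions based on Google's Gemini 2.5 documentation at the time of writing, so check the current model pages before relying on them. It treats 0 as "thinking off" where the model allows it and passes -1 (dynamic thinking, where the model picks its own budget) through unchanged.

```python
# Assumed (min, max, can_disable) per model prefix; verify against current docs.
# More specific prefixes come first because matching stops at the first hit.
BUDGET_LIMITS = {
    "gemini-2.5-pro": (128, 32768, False),
    "gemini-2.5-flash-lite": (512, 24576, True),
    "gemini-2.5-flash": (1, 24576, True),
}


def clamp_thinking_budget(model_name: str, budget: int) -> int:
    """Return a thinking_budget the given Gemini 2.5 model should accept."""
    if budget == -1:
        return -1  # dynamic thinking: leave the choice to the model
    for prefix, (low, high, can_disable) in BUDGET_LIMITS.items():
        if model_name.startswith(prefix):
            if budget <= 0:
                # Thinking off where allowed, otherwise the smallest legal budget.
                return 0 if can_disable else low
            return max(low, min(budget, high))
    raise ValueError(f"no budget limits known for {model_name!r}")


print(clamp_thinking_budget("gemini-2.5-pro", 0))         # 128
print(clamp_thinking_budget("gemini-2.5-flash", 100000))  # 24576
```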
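Thinking levels, by contrast, form an ordered scale, and the examples in the thinking level section tie them to different model tiers. Code that switches models may therefore need to fall back to the nearest level a given model accepts. The sketch below does that. `nearest_supported_level`, and the idea of passing each model's supported set in by hand, are assumptions for illustration rather than part of the Gemini API; the `supported` sets in the demo are made up.

```python
# Levels ordered from least to most reasoning (lowercase, as sent in requests).
LEVELS = ["minimal", "low", "medium", "high"]


def nearest_supported_level(requested: str, supported: set[str]) -> str:
    """Return the supported level closest to `requested`; ties go to the deeper level."""
    if requested not in LEVELS:
        raise ValueError(f"unknown thinking level: {requested!r}")
    candidates = [lvl for lvl in LEVELS if lvl in supported]
    if not candidates:
        raise ValueError("model supports none of the known thinking levels")
    target = LEVELS.index(requested)
    # Rank by distance from the request, then prefer the higher level on a tie.
    return min(
        candidates,
        key=lambda lvl: (abs(LEVELS.index(lvl) - target), -LEVELS.index(lvl)),
    )


# Hypothetical model that only accepts two levels.
print(nearest_supported_level("medium", {"low", "high"}))   # high
print(nearest_supported_level("minimal", {"low", "high"}))  # low
```

Breaking ties toward the deeper level trades a little latency for not silently degrading answer quality; flip the sign in the sort key if cost matters more.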
Summary of Differences
Feature | Focus | Use Case
Temperature | Creativity & Randomness | Creative writing vs. factual answers
Thinking Budget | Resource Control (Tokens) | Managing cost and speed in reasoning
Thinking Level | Reasoning Depth (Logic) | Simple vs. complex problem-solving

Credit for the adaptation function below:
https://help.apiyi.com/en/gemini-api-thinking-budget-level-error-fix-en.html

Smart Parameter Adaptation Function

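def get_thinking_config(model_name: str, complexity: str = "medium") -> dict:
    """
    Automatically selects the correct thinking mode parameter based on model version.
    Gemini 3 models take thinking_level and Gemini 2.5 models take thinking_budget;
    the function never returns both, since the API rejects that combination with HTTP 400.

    Args:
        model_name: Gemini model name
        complexity: Thinking complexity ("minimal", "low", "medium", "high", "dynamic")

    Returns:
        Parameter dictionary suitable for extra_body
    """
    # Gemini 3.0 model list (substring match, so "gemini-3-pro" also covers "gemini-3-pro-preview")
    gemini_3_models = [
        "gemini-3.0-flash-preview",
        "gemini-3.0-pro-preview",
        "gemini-3-flash",
        "gemini-3-pro"
    ]

    # Gemini 2.5 model list. "gemini-2.5-pro" and "gemini-2.5-flash" are needed so the
    # stable 2.5 models are not misrouted to the Gemini 3 branch (which would send
    # thinking_level to a model that expects thinking_budget).
    gemini_2_5_models = [
        "gemini-2.5-pro",
        "gemini-2.5-flash",
        "gemini-2.5-flash-preview-04-17",
        "gemini-2.5-flash-lite",
        "gemini-2-flash",
        "gemini-2-flash-lite"
    ]

    # Determine model version
    if any(m in model_name for m in gemini_3_models):
        # Gemini 3.0 uses thinking_level
        level_map = {
            "minimal": "minimal",
            "low": "low",
            "medium": "medium",
            "high": "high",
            "dynamic": "high"  # high is the Gemini 3 default level
        }
        return {"thinking_level": level_map.get(complexity, "medium")}

    elif any(m in model_name for m in gemini_2_5_models):
        # Gemini 2.5 uses thinking_budget (a token count)
        budget_map = {
            "minimal": 0,      # 0 turns thinking off where the model allows it
            "low": 512,
            "medium": 2048,
            "high": 8192,
            "dynamic": -1      # -1 lets the model decide (dynamic thinking)
        }
        budget = budget_map.get(complexity, -1)
        # Gemini 2.5 Pro cannot disable thinking; fall back to its minimum budget (128)
        if budget == 0 and "gemini-2.5-pro" in model_name:
            budget = 128
        return {"thinking_budget": budget}

    else:
        # Unknown model, default to Gemini 3.0 parameters
        return {"thinking_level": "medium"}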

# Usage example
import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apiyi.com/v1"
)

model = "gemini-3.0-flash-preview"  # Can be switched dynamically
thinking_config = get_thinking_config(model, complexity="high")

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Your question here"}],
    extra_body=thinking_config
)

print(response.choices[0].message.content)


