

Klarity
Generative AI Toolkit: Automated Explainability, Error Mitigation & Multi-Modal Support
🖼️ Update 12/02: added VLM integration and visual attention monitoring
🐳 Update 08/02: added support for reasoning models to analyze CoT entropy and improve RL datasets
🎯 Overview
Klarity is a toolkit for inspecting and debugging AI decision-making processes. By combining uncertainty analysis with reasoning insights and visual attention patterns, it helps you understand how models think and fix issues before they reach production.
- Dual Entropy Analysis: Measure model confidence through raw entropy and semantic similarity metrics (see the sketch after this list)
- Reasoning Analysis: Extract and evaluate step-by-step thinking patterns in model outputs
- Visual Attention Analysis: Visualize and analyze how vision-language models attend to images
- Semantic Clustering: Group similar predictions to reveal decision-making pathways
- Structured Insights: Get detailed JSON analysis of both uncertainty patterns and reasoning steps
- AI-Powered Reports: Leverage capable models to interpret generation patterns and provide human-readable insights
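
To make the dual entropy idea concrete, here is a minimal, illustrative sketch (not Klarity's internal implementation) of how a raw entropy and a semantic entropy over grouped predictions can differ for the same top-k distribution:

```python
# Illustrative sketch only - the grouping logic and names here are assumptions,
# not Klarity internals. Raw entropy treats every token as distinct; semantic
# entropy first merges the probability mass of semantically similar tokens.
import math

def raw_entropy(probs):
    """Shannon entropy over a renormalized top-k probability list."""
    total = sum(probs)
    return -sum(p / total * math.log(p / total) for p in probs if p > 0)

def semantic_entropy(probs, group_ids):
    """Entropy after merging probabilities of semantically similar tokens."""
    mass = {}
    for p, g in zip(probs, group_ids):
        mass[g] = mass.get(g, 0.0) + p
    return raw_entropy(list(mass.values()))

# Toy example: "car" and "automobile" form one semantic cluster, "boat" another.
probs = [0.5, 0.3, 0.2]
print(f"raw: {raw_entropy(probs):.3f}")                       # ~1.030: three distinct tokens
print(f"semantic: {semantic_entropy(probs, [0, 0, 1]):.3f}")  # ~0.500: two meaning groups
```

A large gap between the two scores suggests the model is unsure about wording but not about meaning.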
VLM Analysis Example - Gaining insights into where your model focuses and examining related token uncertainty

Reasoning Analysis Example - Understanding model's step-by-step thinking process

Entropy Analysis Example - Analyzing token-level uncertainty patterns

🚀 Quick Start
Install directly from GitHub:
```bash
pip install git+https://github.com/klara-research/klarity.git
```
🖼️ VLM Analysis Usage Example
For insights into where your model is focusing and to analyze related token uncertainty, you can use the EnhancedVLMAnalyzer:
```python
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration, LogitsProcessorList
from PIL import Image
import torch
from klarity import UncertaintyEstimator
from klarity.core.analyzer import EnhancedVLMAnalyzer
import os
import json

os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Initialize VLM model
model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id,
    output_attentions=True,
    low_cpu_mem_usage=True
)
processor = AutoProcessor.from_pretrained(model_id)

# Create estimator with EnhancedVLMAnalyzer
estimator = UncertaintyEstimator(
    top_k=100,
    analyzer=EnhancedVLMAnalyzer(
        min_token_prob=0.01,
        insight_model="together:meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
        insight_api_key="your_api_key",
        vision_config=model.config.vision_config,
        use_cls_token=True
    ),
)
uncertainty_processor = estimator.get_logits_processor()

# Set up generation for the example
image_path = "examples/images/plane.jpg"
question = "How many engines does the plane have?"
image = Image.open(image_path)

# Prepare input with image and text
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image"}
        ]
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Process inputs
inputs = processor(
    images=image,
    text=prompt,
    return_tensors='pt'
)

try:
    # Generate with uncertainty and attention analysis
    generation_output = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        logits_processor=LogitsProcessorList([uncertainty_processor]),
        return_dict_in_generate=True,
        output_scores=True,
        output_attentions=True,
        use_cache=True
    )

    # Analyze the generation - now includes both images and enhanced analysis
    result = estimator.analyze_generation(
        generation_output=generation_output,
        model=model,
        tokenizer=processor,
        processor=uncertainty_processor,
        prompt=question,
        image=image  # Image is required for enhanced analysis
    )

    # Get generated text
    input_length = inputs.input_ids.shape[1]
    generated_sequence = generation_output.sequences[0][input_length:]
    generated_text = processor.decode(generated_sequence, skip_special_tokens=True)

    print(f"\nQuestion: {question}")
    print(f"Generated answer: {generated_text}")

    # Token analysis
    print("\nDetailed Token Analysis:")
    for idx, metrics in enumerate(result.token_metrics):
        print(f"\nStep {idx}:")
        print(f"Raw entropy: {metrics.raw_entropy:.4f}")
        print(f"Semantic entropy: {metrics.semantic_entropy:.4f}")
        print("Top 3 predictions:")
        for i, pred in enumerate(metrics.token_predictions[:3], 1):
            print(f"  {i}. {pred.token} (prob: {pred.probability:.4f})")

    # Show comprehensive insight
    print("\nComprehensive Analysis:")
    print(json.dumps(result.overall_insight, indent=2))

except Exception as e:
    print(f"Error during generation: {str(e)}")
    import traceback
    traceback.print_exc()
```
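
The example above runs on CPU by default. If a GPU is available you can optionally move the model and the processed inputs over before generating; a small sketch (insert after the `inputs = processor(...)` call):

```python
# Optional tweak: run the VLM example on a CUDA GPU if one is available.
if torch.cuda.is_available():
    model = model.to("cuda")
    inputs = inputs.to("cuda")  # BatchFeature supports .to() like a tensor dict
```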
🔍 Reasoning LLM Usage Example
For insights and uncertainty analytics on model reasoning patterns, you can use the ReasoningAnalyzer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList
from klarity import UncertaintyEstimator
from klarity.core.analyzer import ReasoningAnalyzer
import torch

# Initialize model with GPU support
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
model = AutoModelForCausalLM.from_pretrained(model_name)
model = model.to(device)  # Move model to GPU if available
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create estimator with reasoning analyzer and a Together AI hosted insight model
estimator = UncertaintyEstimator(
    top_k=100,
    analyzer=ReasoningAnalyzer(
        min_token_prob=0.01,
        insight_model="together:meta-llama/Llama-3.3-70B-Instruct-Turbo",
        insight_api_key="your_api_key",
        reasoning_start_token="<think>",
        reasoning_end_token="</think>"
    )
)
uncertainty_processor = estimator.get_logits_processor()

# Set up generation
prompt = "Your prompt <think>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate with uncertainty analysis
generation_output = model.generate(
    **inputs,
    max_new_tokens=200,  # Increased to leave room for reasoning steps
    temperature=0.6,
    logits_processor=LogitsProcessorList([uncertainty_processor]),
    return_dict_in_generate=True,
    output_scores=True,
)

# Analyze the generation
result = estimator.analyze_generation(
    generation_output,
    tokenizer,
    uncertainty_processor,
    prompt  # Include the prompt for better reasoning analysis
)

# Get generated text
generated_text = tokenizer.decode(generation_output.sequences[0], skip_special_tokens=True)
print(f"\nPrompt: {prompt}")
print(f"Generated text: {generated_text}")

# Show reasoning analysis
print("\nReasoning Analysis:")
if result.overall_insight and "reasoning_analysis" in result.overall_insight:
    analysis = result.overall_insight["reasoning_analysis"]

    # Print each reasoning step
    for idx, step in enumerate(analysis["steps"], 1):
        print(f"\nStep {idx}:")  # Use simple counter instead of accessing step_number
        print(f"Content: {step['step_info']['content']}")

        # Print step analysis
        if 'analysis' in step and 'training_insights' in step['analysis']:
            step_analysis = step['analysis']['training_insights']

            print("\nQuality Metrics:")
            for metric, score in step_analysis['step_quality'].items():
                print(f"  {metric}: {score}")

            print("\nImprovement Targets:")
            for target in step_analysis['improvement_targets']:
                print(f"  Aspect: {target['aspect']}")
                print(f"  Importance: {target['importance']}")
                print(f"  Issue: {target['current_issue']}")
                print(f"  Suggestion: {target['training_suggestion']}")
```
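
One way to act on these step-quality scores, in the spirit of the RL-dataset use case mentioned in the updates above, is to keep only highly rated reasoning steps. A hedged sketch that continues the example (the 0.6 cutoff is an assumption, not a Klarity default):

```python
# Continues the example above: curate reasoning steps by their quality scores.
# The 0.6 cutoff is an illustrative assumption - tune it for your dataset.
curated = []
for step in analysis["steps"]:
    quality = step.get("analysis", {}).get("training_insights", {}).get("step_quality", {})
    # Keep steps the insight model rates as both coherent and relevant
    if float(quality.get("coherence", 0)) >= 0.6 and float(quality.get("relevance", 0)) >= 0.6:
        curated.append(step["step_info"]["content"])
print(f"Kept {len(curated)} of {len(analysis['steps'])} reasoning steps")
```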
📝 Standard LLM Usage Example
To detect the most common uncertainty scenarios and route queries to better models, you can use our EntropyAnalyzer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList
from klarity import UncertaintyEstimator
from klarity.core.analyzer import EntropyAnalyzer

# Initialize your model
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create estimator
estimator = UncertaintyEstimator(
    top_k=100,
    analyzer=EntropyAnalyzer(
        min_token_prob=0.01,
        insight_model=model,
        insight_tokenizer=tokenizer
    )
)
uncertainty_processor = estimator.get_logits_processor()

# Set up generation
prompt = "Your prompt"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate with uncertainty analysis
generation_output = model.generate(
    **inputs,
    max_new_tokens=20,
    temperature=0.7,
    top_p=0.9,
    logits_processor=LogitsProcessorList([uncertainty_processor]),
    return_dict_in_generate=True,
    output_scores=True,
)

# Analyze the generation
result = estimator.analyze_generation(
    generation_output,
    tokenizer,
    uncertainty_processor
)
generated_text = tokenizer.decode(generation_output.sequences[0], skip_special_tokens=True)

# Inspect results
print(f"\nPrompt: {prompt}")
print(f"Generated text: {generated_text}")

print("\nDetailed Token Analysis:")
for idx, metrics in enumerate(result.token_metrics):
    print(f"\nStep {idx}:")
    print(f"Raw entropy: {metrics.raw_entropy:.4f}")
    print(f"Semantic entropy: {metrics.semantic_entropy:.4f}")
    print("Top 3 predictions:")
    for i, pred in enumerate(metrics.token_predictions[:3], 1):
        print(f"  {i}. {pred.token} (prob: {pred.probability:.4f})")

# Show comprehensive insight
print("\nComprehensive Analysis:")
print(result.overall_insight)
```
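
Continuing the example, one simple routing policy is to gate on the average token entropy and escalate uncertain generations to a stronger model. The threshold below is an illustrative assumption, not a Klarity default:

```python
# Continues the example above: route on mean raw entropy.
# The 0.7 threshold is an assumption - calibrate it on your own traffic.
token_metrics = result.token_metrics
mean_entropy = sum(m.raw_entropy for m in token_metrics) / max(len(token_metrics), 1)
if mean_entropy > 0.7:
    print(f"Escalating to a stronger model (mean raw entropy {mean_entropy:.2f})")
else:
    print(f"Answer accepted (mean raw entropy {mean_entropy:.2f})")
```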
📊 Analysis Output
Klarity provides three types of analysis output:
VLM Analysis
Attention insights into where your model is focusing and related token uncertainty:
```json
{
  "scores": {
    "overall_uncertainty": "<0-1>",
    "visual_grounding": "<0-1>",
    "confidence": "<0-1>"
  },
  "visual_analysis": {
    "attention_quality": {
      "score": "<0-1>",
      "key_regions": ["<main area 1>", "<main area 2>"],
      "missed_regions": ["<ignored area 1>", "<ignored area 2>"]
    },
    "token_attention_alignment": [
      {
        "word": "<token>",
        "focused_spot": "<region>",
        "relevance": "<0-1>",
        "uncertainty": "<0-1>"
      }
    ]
  },
  "uncertainty_analysis": {
    "problem_spots": [
      {
        "text": "<text part>",
        "reason": "<why uncertain>",
        "looked_at": "<image area>",
        "connection": "<focus vs doubt link>"
      }
    ],
    "improvement_tips": [
      {
        "area": "<what to fix>",
        "tip": "<how to fix>"
      }
    ]
  }
}
```
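
Since the insight is returned as a plain dict, the fields above can be read directly. A minimal hedged sketch (reuses `result` from the VLM example; `.get()` guards against sections a lower-reliability insight model may omit):

```python
# Hedged sketch: read the problem spots out of the VLM insight above.
insight = result.overall_insight
for spot in insight.get("uncertainty_analysis", {}).get("problem_spots", []):
    print(f"{spot['text']}: {spot['reason']} (looked at: {spot['looked_at']})")
```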
Reasoning Analysis
You'll get detailed insights into the model's reasoning process:
```json
{
  "reasoning_analysis": {
    "steps": [
      {
        "step_number": 1,
        "step_info": {
          "content": "",
          "type": ""
        },
        "analysis": {
          "training_insights": {
            "step_quality": {
              "coherence": "<0-1>",
              "relevance": "<0-1>",
              "confidence": "<0-1>"
            },
            "improvement_targets": [
              {
                "aspect": "",
                "importance": "<0-1>",
                "current_issue": "",
                "training_suggestion": ""
              }
            ]
          }
        }
      }
    ]
  }
}
```
Entropy Analysis
For standard language models you will get a general uncertainty report:
```json
{
  "scores": {
    "overall_uncertainty": "<0-1>",
    "confidence_score": "<0-1>",
    "hallucination_risk": "<0-1>"
  },
  "uncertainty_analysis": {
    "high_uncertainty_parts": [
      {
        "text": "",
        "why": ""
      }
    ],
    "main_issues": [
      {
        "issue": "",
        "evidence": ""
      }
    ],
    "key_suggestions": [
      {
        "what": "",
        "how": ""
      }
    ]
  }
}
```
🤖 Supported Frameworks & Models
Model Frameworks
Currently supported:
- ✅ Hugging Face Transformers -> Full uncertainty analysis with raw and semantic entropy metrics & vision attention monitoring
- ✅ vLLM -> Full uncertainty analysis with raw and semantic entropy metrics (max 20 logprobs per token)
- ✅ Together AI -> Uncertainty analysis with raw logprob metrics
Planned support:
- ⏳ PyTorch
Analysis Model Frameworks (for the insights)
Currently supported:
- ✅ Hugging Face Transformers
- ✅ Together AI API
Planned support:
- ⏳ PyTorch
Tested Target Models
| Model | Type | Status | Notes |
|---|---|---|---|
| Qwen2.5-0.5B | Base | ✅ Tested | Full Support |
| Qwen2.5-0.5B-Instruct | Instruct | ✅ Tested | Full Support |
| Qwen2.5-1.5B-Instruct | Instruct | ✅ Tested | Full Support |
| Qwen2.5-7B | Base | ✅ Tested | Full Support |
| Qwen2.5-7B-Instruct | Instruct | ✅ Tested | Full Support |
| Llama-3.2-3B-Instruct | Instruct | ✅ Tested | Full Support |
| Meta-Llama-3-8B | Base | ✅ Tested | Together API Insights |
| gemma-2-2b-it | Instruct | ✅ Tested | Full Support |
| mistralai/Mistral-7B-Instruct-v0.3 | Instruct | ✅ Tested | Together API Insights |
| Qwen/Qwen2.5-72B-Instruct-Turbo | Instruct | ✅ Tested | Together API Insights |
| DeepSeek-R1-Distill-Qwen-1.5B | Reasoning | ✅ Tested | Together API Insights |
| DeepSeek-R1-Distill-Qwen-7B | Reasoning | ✅ Tested | Together API Insights |
| Llava-onevision-qwen2-0.5b-ov-hf | Vision | ✅ Tested | Together API Insights |
Analysis Models
| Model | Type | Status | JSON Reliability | Notes |
|---|---|---|---|---|
| Qwen2.5-0.5B-Instruct | Instruct | ✅ Tested | ⚡ Low | Consistently outputs unstructured analysis instead of JSON. Best used with structured prompting and validation. |
| Qwen2.5-7B-Instruct | Instruct | ✅ Tested | ⚠️ Moderate | Sometimes outputs well-formed JSON analysis. |
| Llama-3.3-70B-Instruct-Turbo | Instruct | ✅ Tested | ✅ High | Reliably outputs well-formed JSON analysis. Recommended for production use. |
| Llama-3.2-90B-Vision-Instruct-Turbo | Vision | ✅ Tested | ✅ High | Reliably outputs well-formed JSON analysis. Recommended for production use. |
JSON Output Reliability Guide:
- ✅ High: Consistently outputs valid JSON (>80% of responses)
- ⚠️ Moderate: Usually outputs valid JSON (50-80% of responses)
- ⚡ Low: Inconsistent JSON output (<50% of responses)
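
Given that reliability spread, it is worth parsing insight-model output defensively. A hedged sketch (the brace-extraction fallback is an assumption, not part of Klarity's API):

```python
import json

def parse_insight(raw: str):
    """Parse insight-model output, tolerating JSON wrapped in prose."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, in case the model added prose.
        start, end = raw.find("{"), raw.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(raw[start:end + 1])
            except json.JSONDecodeError:
                pass
    return None  # caller should retry with a stricter prompt or discard

print(parse_insight('Here is my analysis: {"confidence": 0.8}'))
```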
📈 Advanced Features
Custom Analysis Configuration
You can customize the analysis parameters:
```python
analyzer = EntropyAnalyzer(
    min_token_prob=0.01,  # Minimum probability threshold
    semantic_similarity_threshold=0.8  # Threshold for semantic grouping
)
```
🤝 Contributing
Contributions are welcome! Areas we're looking to improve:
- Additional framework support
- More tested models
- Enhanced semantic analysis
- Additional analysis metrics
- Documentation and examples
Please see our Contributing Guide for details.
📄 License
Apache 2.0 License. See LICENSE for more information.
💫 Community & Support
- Website
- Discord Community for discussions & chatting
- GitHub Issues for bugs and features
