Privacy-first AI. No data stored. No subscriptions.

Rigorous Testing Results

Hundreds of Hours. 500+ Prompts. Zero Compromises.

We spent hundreds of hours testing more than 500 prompts to make sure OffGrid AI delivers when it matters most.

Strict Evaluator Prompt
You are a strict evaluator of AI responses. Grade the following response based on:

ACCURACY (1-10):
- Technical correctness
- Safety considerations
- Practical applicability
- Factual accuracy

CLARITY (1-10):
- Understandable instructions
- Logical flow
- Appropriate detail level
- Actionability

Only responses scoring 9.0+ in BOTH categories receive an A grade. Anything less is unacceptable for field use. Be harsh. Lives may depend on this information.

See original Google Doc prompt here: View Strict Evaluator Prompt →

Testing Methodology

Accuracy Criteria

  • Technical correctness verified
  • Safety warnings included
  • Practical in field conditions
  • No dangerous misinformation

Clarity Criteria

  • Instructions are actionable
  • Logical step-by-step flow
  • Appropriate detail level
  • No ambiguous guidance

A-Grade Standards

  • Accuracy score ≥ 9.0/10
  • Clarity score ≥ 9.0/10
  • Each prompt tested on 3 models
  • Must pass on at least one model
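
The A-grade rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the evaluation code used for the tests; the names `Score`, `is_a_grade`, and `prompt_passes` are hypothetical, and the scores in the usage note are made-up values chosen to show the rule.

```python
from dataclasses import dataclass

# Threshold taken from the A-Grade Standards: both scores must be >= 9.0.
A_THRESHOLD = 9.0


@dataclass
class Score:
    """One evaluator result for one model (hypothetical structure)."""
    model: str
    accuracy: float
    clarity: float


def is_a_grade(score: Score) -> bool:
    """Return True when accuracy AND clarity both reach the threshold."""
    return score.accuracy >= A_THRESHOLD and score.clarity >= A_THRESHOLD


def prompt_passes(scores: list[Score]) -> bool:
    """A prompt passes if at least one tested model earns an A grade."""
    return any(is_a_grade(s) for s in scores)
```

For example, `prompt_passes([Score("model-a", 9.5, 8.9), Score("model-b", 9.1, 9.3)])` returns `True`: the first model misses the clarity threshold, but the second clears both.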

Example Test Results

Meat Preservation in Hot Climate
Model: Gemma3-4b
Prompt: How to preserve meat without refrigeration in hot climate
✓ PASS: Comprehensive methods with safety warnings and time limits
Accuracy: 9.4
Clarity: 9.2
Grade: A

Meat Preservation in Hot Climate
Model: Gemma3-12b
Prompt: How to preserve meat without refrigeration in hot climate
✓ PASS: Multiple preservation methods with proper safety guidelines
Accuracy: 9.6
Clarity: 9.5
Grade: A

Meat Preservation in Hot Climate
Model: Gemma3-27b
Prompt: How to preserve meat without refrigeration in hot climate
✓ PASS: Exceptionally detailed with multiple traditional and modern methods
Accuracy: 10
Clarity: 9.7
Grade: A+

Performance Analytics

Pass Rate by Model

80% Gemma3-4b
92% Gemma3-12b
98% Gemma3-27b
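
A per-model pass rate like the ones above could be computed along these lines. This is a rough sketch that assumes "pass" means scoring at least 9.0 in both accuracy and clarity; the page does not spell out the exact calculation, and the function name `pass_rate_by_model` and the tuple layout are hypothetical.

```python
from collections import defaultdict

# Assumed pass criterion: both scores at or above 9.0.
A_THRESHOLD = 9.0


def pass_rate_by_model(results):
    """Map each model name to its pass rate in percent.

    `results` is an iterable of (model, accuracy, clarity) tuples,
    one tuple per graded response.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for model, accuracy, clarity in results:
        total[model] += 1
        if accuracy >= A_THRESHOLD and clarity >= A_THRESHOLD:
            passed[model] += 1
    # Every model in `total` has at least one result, so no division by zero.
    return {model: 100.0 * passed[model] / total[model] for model in total}
```

Calling it with two graded responses for one model, one of which falls below the threshold, yields a 50% pass rate for that model.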

Average Scores by Category

9.2 Survival
8.8 Navigation
9.4 Medical
9.0 Technical

Complete Transparency

We believe in radical transparency. Every test, every score, every failure - it's all here.

Hundreds of pages, with 100 tabs per document: the complete testing archive.

View Full Testing Logs →

After hundreds of hours of testing, we can confidently say Gemma3 models are the best offline AI available. But they're tools - use them wisely, verify critical information when possible, and apply common sense.