Rigorous Testing Results

Hundreds of Hours. 500+ Prompts. Zero Compromises.

We spent hundreds of hours testing 500+ prompts to ensure OffGrid AI delivers when it matters most.

Strict Evaluator Prompt

You are a strict evaluator of AI responses. Grade the following response based on:

ACCURACY (1-10):
- Technical correctness
- Safety considerations
- Practical applicability
- Factual accuracy

CLARITY (1-10):
- Understandable instructions
- Logical flow
- Appropriate detail level
- Actionability

Only responses scoring 9.0+ in BOTH categories receive an A grade. Anything less is unacceptable for field use. Be harsh. Lives may depend on this information.

See original Google Doc prompt here: View Strict Evaluator Prompt →

Testing Methodology

Accuracy Criteria

Technical correctness verified
Safety warnings included
Practical in field conditions
No dangerous misinformation

Clarity Criteria

Instructions are actionable
Logical step-by-step flow
Appropriate detail level
No ambiguous guidance

A-Grade Standards

Accuracy score ≥ 9.0/10
Clarity score ≥ 9.0/10
Tested on 3 models
Must pass on at least one
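
The A-grade rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual OffGrid AI test harness: the names `ModelScore`, `is_a_grade`, and `prompt_passes` are hypothetical, and the reading of "must pass on at least one" as "at least one tested model earns an A on the prompt" is our interpretation. The two sample scores are taken from the example results on this page.

```python
from dataclasses import dataclass

# Threshold taken from the A-Grade Standards: both scores must be >= 9.0.
A_GRADE_THRESHOLD = 9.0


@dataclass(frozen=True)
class ModelScore:
    """Evaluator scores for one model's response (hypothetical structure)."""
    model: str
    accuracy: float
    clarity: float


def is_a_grade(score: ModelScore, threshold: float = A_GRADE_THRESHOLD) -> bool:
    """A response earns an A only if BOTH accuracy and clarity meet the threshold."""
    return score.accuracy >= threshold and score.clarity >= threshold


def prompt_passes(scores: list[ModelScore]) -> bool:
    """Assumed rule: a prompt passes if at least one tested model earns an A."""
    return any(is_a_grade(s) for s in scores)


if __name__ == "__main__":
    results = [
        ModelScore("Gemma3-4b", accuracy=9.4, clarity=9.2),
        ModelScore("Gemma3-12b", accuracy=9.6, clarity=9.5),
    ]
    for r in results:
        print(r.model, "A" if is_a_grade(r) else "below A")
    print("Prompt passes:", prompt_passes(results))
```

Note that a high score in one category does not compensate for a low score in the other: under this sketch, a response with accuracy 9.5 and clarity 8.9 would not receive an A.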

Example Test Results

Meat Preservation in Hot Climate

Gemma3-4b

Prompt: How to preserve meat without refrigeration in hot climate

✓ PASS: Comprehensive methods with safety warnings and time limits

Accuracy

9.4

Clarity

9.2

Grade: A

Meat Preservation in Hot Climate

Gemma3-12b

Prompt: How to preserve meat without refrigeration in hot climate

✓ PASS: Multiple preservation methods with proper safety guidelines

Accuracy

9.6

Clarity

9.5

Grade: A

Meat Preservation in Hot Climate

Gemma3-27b

Prompt: How to preserve meat without refrigeration in hot climate

✓ PASS: Exceptionally detailed with multiple traditional and modern methods

Accuracy

Clarity

9.7

Grade: A+

Performance Analytics

Pass Rate by Model

80% Gemma3-4b

92% Gemma3-12b

98% Gemma3-27b
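
For readers who want to reproduce a per-model pass rate from the testing logs, here is a minimal sketch of the arithmetic. The function name `pass_rate` is hypothetical, and the counts in the usage example are illustrative only; the page does not state how many prompts each model was run on.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the percentage of prompts that received a passing grade."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return round(100.0 * passed / total, 1)


if __name__ == "__main__":
    # Illustrative counts only, not taken from the testing logs.
    print(pass_rate(400, 500))  # 80.0
```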

Average Scores by Category

9.2 Survival

8.8 Navigation

9.4 Medical

9.0 Technical

Complete Transparency

We believe in radical transparency. Every test, every score, every failure: it's all here.

Hundreds of pages, 100 tabs per document: the complete testing archive.

View Full Testing Logs →

After hundreds of hours of testing, we can confidently say Gemma3 models are the best offline AI available. But they are tools: use them wisely, verify critical information when possible, and apply common sense.