Common Sense Reasoning
Common sense reasoning tasks are designed to require more than surface pattern recognition: the model must apply "common sense", i.e. everyday world knowledge, to make correct inferences.
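Most of the benchmarks below are multiple-choice, and the Accuracy figures are simply the fraction of items where the model's chosen answer matches the gold label. A minimal sketch of that scoring (the example items are illustrative Winograd-style questions, not drawn from any specific dataset):

```python
# Minimal sketch of multiple-choice accuracy scoring, as used by
# commonsense benchmarks such as WinoGrande or CommonsenseQA.
# The items below are illustrative, not taken from a real dataset.

def accuracy(predictions, gold):
    """Fraction of items where the predicted choice index matches the gold label."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

items = [
    {"question": "The trophy didn't fit in the suitcase because it was too big. "
                 "What was too big?",
     "choices": ["the trophy", "the suitcase"], "answer": 0},
    {"question": "I poured water from the bottle into the cup until it was full. "
                 "What was full?",
     "choices": ["the bottle", "the cup"], "answer": 1},
]

preds = [0, 1]  # hypothetical model predictions (indices into `choices`)
gold = [item["answer"] for item in items]
print(accuracy(preds, gold))  # 1.0
```

The "few-shot, k=N" annotations in the tables refer only to how many solved examples are placed in the prompt; the scoring itself is the same argmax-against-gold comparison.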
Papers
Showing 1–10 of 939 papers
Datasets: WinoGrande, arc_challenge, arc_easy, ReCoRD, CommonsenseQA, PARus, RuCoS, RWSD, BIG-bench (Causal Judgment), BIG-bench (Date Understanding), BIG-bench (Disambiguation QA), BIG-bench (Sports Understanding)
Benchmark Results
WinoGrande

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 96.1 | — | Unverified |
| 2 | Unicorn 11B (fine-tuned) | Accuracy | 91.3 | — | Unverified |
| 3 | CompassMTL 567M with Tailor | Accuracy | 90.5 | — | Unverified |
| 4 | CompassMTL 567M | Accuracy | 89.6 | — | Unverified |
| 5 | UnifiedQA 11B (fine-tuned) | Accuracy | 89.4 | — | Unverified |
| 6 | Claude 3 Opus (5-shot) | Accuracy | 88.5 | — | Unverified |
| 7 | GPT-4 (5-shot) | Accuracy | 87.5 | — | Unverified |
| 8 | ExDeBERTa 567M | Accuracy | 87 | — | Unverified |
| 9 | LLaMA-2 13B + MixLoRA | Accuracy | 86.3 | — | Unverified |
| 10 | LLaMA3 8B+MoSLoRA | Accuracy | 85.8 | — | Unverified |
ARC-Challenge (arc_challenge)

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (few-shot, k=25) | Accuracy | 96.4 | — | Unverified |
| 2 | PaLM 2 (few-shot, CoT, SC) | Accuracy | 95.1 | — | Unverified |
| 3 | Shivaay (4B, few-shot, k=8) | Accuracy | 91.04 | — | Unverified |
| 4 | StupidLLM | Accuracy | 91.03 | — | Unverified |
| 5 | Claude 2 (few-shot, k=5) | Accuracy | 91 | — | Unverified |
| 6 | Claude 1.3 (few-shot, k=5) | Accuracy | 90 | — | Unverified |
| 7 | PaLM 540B (Self Improvement, Self Consistency) | Accuracy | 89.8 | — | Unverified |
| 8 | PaLM 540B (Self Consistency) | Accuracy | 88.7 | — | Unverified |
| 9 | PaLM 540B (Self Improvement, CoT Prompting) | Accuracy | 88.3 | — | Unverified |
| 10 | PaLM 540B (Self Improvement, Standard-Prompting) | Accuracy | 87.2 | — | Unverified |
ARC-Easy (arc_easy)

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 95.2 | — | Unverified |
| 2 | LLaMA 3 8B+MoSLoRA (fine-tuned) | Accuracy | 90.5 | — | Unverified |
| 3 | PaLM 2-L (1-shot) | Accuracy | 89.7 | — | Unverified |
| 4 | PaLM 2-M (1-shot) | Accuracy | 88 | — | Unverified |
| 5 | LLaMA-3 8B + MixLoRA | Accuracy | 86.5 | — | Unverified |
| 6 | Camelidae-8×34B | Accuracy | 86.2 | — | Unverified |
| 7 | PaLM 2-S (1-shot) | Accuracy | 85.6 | — | Unverified |
| 8 | LLaMA 65B + CFG (0-shot) | Accuracy | 84.2 | — | Unverified |
| 9 | GAL 120B (0-shot) | Accuracy | 83.8 | — | Unverified |
| 10 | LLaMA-2 13B + MixLoRA | Accuracy | 83.5 | — | Unverified |
ReCoRD

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Turing NLR v5 XXL 5.4B (fine-tuned) | EM | 95.9 | — | Unverified |
| 2 | ST-MoE-32B 269B (fine-tuned) | EM | 95.1 | — | Unverified |
| 3 | T5-11B | F1 | 94.1 | — | Unverified |
| 4 | DeBERTa-1.5B | EM | 94.1 | — | Unverified |
| 5 | PaLM 540B (finetuned) | EM | 94 | — | Unverified |
| 6 | Vega v2 6B (fine-tuned) | EM | 93.9 | — | Unverified |
| 7 | PaLM 2-L (one-shot) | F1 | 93.8 | — | Unverified |
| 8 | T5-XXL 11B (fine-tuned) | EM | 93.4 | — | Unverified |
| 9 | PaLM 2-M (one-shot) | F1 | 92.4 | — | Unverified |
| 10 | PaLM 2-S (one-shot) | F1 | 92.1 | — | Unverified |
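The last table mixes two metrics: EM (exact match) and token-level F1, the standard pair for span-answer reading-comprehension datasets. A hedged sketch of how they are commonly computed; exact normalization rules vary by benchmark, and this follows the widely used SQuAD-style recipe (lowercase, drop punctuation and articles, collapse whitespace):

```python
# SQuAD-style EM and token-level F1, commonly reused for span-answer
# benchmarks. Normalization details differ across evaluation scripts;
# this is one common recipe, not the official scorer of any dataset.
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """Harmonic mean of token precision and recall after normalization."""
    p_tokens = normalize(pred).split()
    g_tokens = normalize(gold).split()
    common = Counter(p_tokens) & Counter(g_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p_tokens)
    recall = overlap / len(g_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))          # 1.0
print(round(f1("the tall Eiffel Tower", "eiffel tower"), 2))    # 0.8
```

EM is strict (any extra token in the prediction scores 0), which is why fine-tuned extractive models dominate the EM rows while few-shot generative models are often reported with the more forgiving F1.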