Logical Reasoning
Datasets: LingOly, BIG-bench (Formal Fallacies Syllogisms Negation), BIG-bench (Penguins In A Table), BIG-bench (Reasoning About Colored Objects), BIG-bench (Temporal Sequences), BIG-bench (Logic Grid Puzzle), BIG-bench (StrategyQA), RuWorldTree, Winograd Automatic, BIG-bench (Logical Fallacy Detection)
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 2 (few-shot, k=3, Direct) | Accuracy | 64.8 | — | Unverified |
| 2 | PaLM 2 (few-shot, k=3, CoT) | Accuracy | 57.2 | — | Unverified |
| 3 | OPT 66B (few-shot, k=3) | Accuracy | 54.0 | — | Unverified |
| 4 | PaLM 540B (few-shot, k=3) | Accuracy | 53.6 | — | Unverified |
| 5 | BLOOM 176B (few-shot, k=3) | Accuracy | 52.8 | — | Unverified |
| 6 | GPT-NeoX 20B (few-shot, k=3) | Accuracy | 52.8 | — | Unverified |
| 7 | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | — | Unverified |
| 8 | BloombergGPT 50B (few-shot, k=3) | Accuracy | 50.8 | — | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 50.7 | — | Unverified |