Logical Reasoning
Papers
Showing 1–10 of 747 papers
Datasets: LingOly, BIG-bench (Formal Fallacies Syllogisms Negation), BIG-bench (Penguins In A Table), BIG-bench (Reasoning About Colored Objects), BIG-bench (Temporal Sequences), BIG-bench (Logic Grid Puzzle), BIG-bench (StrategyQA), RuWorldTree, Winograd Automatic, BIG-bench (Logical Fallacy Detection)
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude Opus | Delta_NoContext | 28.8 | — | Unverified |
| 2 | GPT-4o | Delta_NoContext | 25.1 | — | Unverified |
| 3 | Gemini 1.5 Pro | Delta_NoContext | 23.4 | — | Unverified |
| 4 | GPT-4 | Delta_NoContext | 21.5 | — | Unverified |
| 5 | Command R+ | Delta_NoContext | 11.6 | — | Unverified |
| 6 | GPT-3.5 | Delta_NoContext | 11.2 | — | Unverified |
| 7 | Mixtral 8x7B | Delta_NoContext | 6.4 | — | Unverified |
| 8 | Llama 3 8B | Delta_NoContext | 4.9 | — | Unverified |
| 9 | Llama 3 70B | Delta_NoContext | 2.9 | — | Unverified |
| 10 | Gemma 7B | Delta_NoContext | 2.2 | — | Unverified |