| RaTEScore: A Metric for Radiology Report Generation | Jun 24, 2024 | Diagnostic, Entity Embeddings | Code Available | 4 |
| FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | May 5, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| Is CLIP ideal? No. Can we fix it? Yes! | Mar 10, 2025 | Attribute, Negation | Code Available | 2 |
| Controlling Language and Diffusion Models by Transporting Activations | Oct 30, 2024 | Negation | Code Available | 2 |
| Editing Models with Task Arithmetic | Dec 8, 2022 | Negation, Task Arithmetic | Code Available | 2 |
| Discovering Latent Knowledge in Language Models Without Supervision | Dec 7, 2022 | Imitation Learning, Language Modelling | Code Available | 2 |
| GreaseLM: Graph REASoning Enhanced Language Models for Question Answering | Jan 21, 2022 | Knowledge Graphs, Medical Question Answering | Code Available | 2 |
| Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems | Nov 21, 2024 | 3D visual grounding, Negation | Code Available | 1 |
| NegMerge: Consensual Weight Negation for Strong Machine Unlearning | Oct 8, 2024 | image-classification, Image Classification | Code Available | 1 |
| Through the Looking Glass, and what Horn Clause Programs Found There | Jul 29, 2024 | counterfactual, Negation | Code Available | 1 |