| RaTEScore: A Metric for Radiology Report Generation | Jun 24, 2024 | Diagnostic, Entity Embeddings | Code Available | 4 |
| FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | May 5, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| Is CLIP ideal? No. Can we fix it? Yes! | Mar 10, 2025 | Attribute, Negation | Code Available | 2 |
| Controlling Language and Diffusion Models by Transporting Activations | Oct 30, 2024 | Negation | Code Available | 2 |
| Editing Models with Task Arithmetic | Dec 8, 2022 | Negation, Task Arithmetic | Code Available | 2 |
| Discovering Latent Knowledge in Language Models Without Supervision | Dec 7, 2022 | Imitation Learning, Language Modelling | Code Available | 2 |
| GreaseLM: Graph REASoning Enhanced Language Models for Question Answering | Jan 21, 2022 | Knowledge Graphs, Medical Question Answering | Code Available | 2 |
| Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems | Nov 21, 2024 | 3D visual grounding, Negation | Code Available | 1 |
| NegMerge: Consensual Weight Negation for Strong Machine Unlearning | Oct 8, 2024 | image-classification, Image Classification | Code Available | 1 |
| Through the Looking Glass, and what Horn Clause Programs Found There | Jul 29, 2024 | counterfactual, Negation | Code Available | 1 |
| RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs | Jun 27, 2024 | Diversity, Negation | Code Available | 1 |
| Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | May 30, 2024 | Diagnostic, Medical Diagnosis | Code Available | 1 |
| Towards Safer Large Language Models through Machine Unlearning | Feb 15, 2024 | Machine Unlearning, Negation | Code Available | 1 |
| Approximate Attributions for Off-the-Shelf Siamese Transformers | Feb 5, 2024 | Negation, Sentence | Code Available | 1 |
| LongHealth: A Question Answering Benchmark with Long Clinical Documents | Jan 25, 2024 | Information Retrieval, Multiple-choice | Code Available | 1 |
| Expressive Sign Equivariant Networks for Spectral Geometric Learning | Dec 4, 2023 | Link Prediction, Negation | Code Available | 1 |
| Regularization by Texts for Latent Diffusion Inverse Solvers | Nov 27, 2023 | Negation | Code Available | 1 |
| Instant3D: Instant Text-to-3D Generation | Nov 14, 2023 | 3D Generation, Negation | Code Available | 1 |
| This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models | Oct 24, 2023 | Descriptive, Negation | Code Available | 1 |
| Ask Again, Then Fail: Large Language Models' Vacillations in Judgment | Oct 3, 2023 | Negation | Code Available | 1 |
| Resolving Legalese: A Multilingual Exploration of Negation Scope Resolution in Legal Documents | Sep 15, 2023 | Negation, Negation Scope Resolution | Code Available | 1 |
| Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? | Aug 24, 2023 | Attribute, Negation | Code Available | 1 |
| CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No | Aug 23, 2023 | Negation, Out-of-Distribution Detection | Code Available | 1 |
| Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT | Aug 11, 2023 | Negation | Code Available | 1 |
| This is not correct! Negation-aware Evaluation of Language Generation Systems | Jul 26, 2023 | Embeddings Evaluation, Negation | Code Available | 1 |