| Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs | Jun 18, 2024 | Causal Inference, counterfactual | Unverified | 0 |
| JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models | Jun 17, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs | Jun 17, 2024 | counterfactual, Hallucination | Code Available | 0 |
| They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias | Jun 17, 2024 | All, counterfactual | Unverified | 0 |
| Teleporter Theory: A General and Simple Approach for Modeling Cross-World Counterfactual Causality | Jun 17, 2024 | counterfactual | Unverified | 0 |
| The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance | Jun 17, 2024 | counterfactual, MMLU | Unverified | 0 |
| Towards Lifelong Dialogue Agents via Timeline-based Memory Management | Jun 16, 2024 | counterfactual, Management | Unverified | 0 |
| IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution | Jun 16, 2024 | counterfactual | Code Available | 1 |
| Validation of human benchmark models for Automated Driving System approval: How competent and careful are they really? | Jun 13, 2024 | counterfactual | Unverified | 0 |
| VLind-Bench: Measuring Language Priors in Large Vision-Language Models | Jun 13, 2024 | counterfactual | Code Available | 1 |