| Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation | Jul 11, 2024 | Benchmarking | Code Available | 1 |
| Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | Jul 10, 2024 | Benchmarking | Unverified | 0 |
| Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Jul 10, 2024 | Benchmarking, Diagnostic | Code Available | 1 |
| InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Jul 10, 2024 | Benchmarking, Decoder | Code Available | 2 |
| How Aligned are Different Alignment Metrics? | Jul 10, 2024 | Benchmarking | Unverified | 0 |
| Training on the Test Task Confounds Evaluation and Emergence | Jul 10, 2024 | Benchmarking, Language Modelling | Code Available | 1 |
| Revisiting, Benchmarking and Understanding Unsupervised Graph Domain Adaptation | Jul 9, 2024 | Benchmarking, Domain Adaptation | Code Available | 3 |
| SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems | Jul 9, 2024 | Benchmarking, Clustering | Unverified | 0 |
| Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability | Jul 9, 2024 | Benchmarking, Decoder | Unverified | 0 |
| HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Jul 9, 2024 | Benchmarking, Conditional Image Generation | Code Available | 2 |