1st NeurIPS Datasets and Benchmarks 2021
- Joaquin Vanschoren, Sai-Kit Yeung: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual. 2021
- Maria Korosteleva, Sung-Hee Lee: Generating Datasets of 3D Garments with Sewing Patterns.
- Zhiyuan Tang, Dong Wang, Yanguang Xu, Jianwei Sun, Xiaoning Lei, Shuaijiang Zhao, Cheng Wen, Xingjun Tan, Chuandong Xie, Shuran Zhou, Rui Yan, Chenjia Lv, Yang Han, Wei Zou, Xiangang Li: KeSpeech: An Open Source Speech Dataset of Mandarin and Its Eight Subdialects.
- Arjun D. Desai, Andrew M. Schmidt, Elka B. Rubin, Christopher M. Sandino, Marianne S. Black, Valentina Mazzoli, Kathryn J. Stevens, Robert Boutin, Christopher Ré, Garry Gold, Brian A. Hargreaves, Akshay Chaudhari: SKM-TEA: A Dataset for Accelerated MRI Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation.
- Cédric Renggli, Luka Rimanic, Nora Hollenstein, Ce Zhang: Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee.
- Lukasz Borchmann, Michal Pietruszka, Tomasz Stanislawek, Dawid Jurkiewicz, Michal Turski, Karolina Szyndler, Filip Gralinski: DUE: End-to-End Document Understanding Benchmark.
- Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi: PASS: An ImageNet replacement for self-supervised pretraining without humans.
- Kenneth Peng, Arunesh Mathur, Arvind Narayanan: Mitigating dataset harms requires stewardship: Lessons from 1000 papers.
- Raesetje Sefala, Timnit Gebru, Nyalleng Moorosi, Luzango Mfupe, Richard Klein: Constructing a Visual Dataset to Study the Effects of Spatial Apartheid in South Africa.
- Inioluwa Deborah Raji, Emily Denton, Emily M. Bender, Alex Hanna, Amandalynne Paullada: AI and the Everything in the Whole Wide World Benchmark.
- Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel: URLB: Unsupervised Reinforcement Learning Benchmark.
- Dong Huk Park, Samaneh Azadi, Xihui Liu, Trevor Darrell, Anna Rohrbach: Benchmark for Compositional Text-to-Image Synthesis.
- Yanqiao Zhu, Yichen Xu, Qiang Liu, Shu Wu: An Empirical Study of Graph Contrastive Learning.
- Shusheng Xu, Yichen Liu, Xiaoyu Yi, Siyuan Zhou, Huizi Li, Yi Wu: Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension.
- Nafise Sadat Moosavi, Andreas Rücklé, Dan Roth, Iryna Gurevych: SciGen: a Dataset for Reasoning-Aware Text Generation from Scientific Tables.
- Malte Lücken, Daniel Burkhardt, Robrecht Cannoodt, Christopher Lance, Aditi Agrawal, Hananeh Aliee, Ann Chen, Louise Deconinck, Angela Detweiler, Alejandro Granados, Shelly Huynh, Laura Isacco, Yang Kim, Dominik Klein, Bony de Kumar, Sunil Kuppasani, Heiko Lickert, Aaron McGeever, Joaquin Melgarejo, Honey Mekonen, Maurizio Morri, Michaela Müller, Norma Neff, Sheryl Paul, Bastian Rieck, Kaylie Schneider, Scott Steelman, Michael Sterr, Daniel Treacy, Alexander Tong, Alexandra-Chloé Villani, Guilin Wang, Jia Yan, Ce Zhang, Angela Pisco, Smita Krishnaswamy, Fabian J. Theis, Jonathan M. Bloom: A sandbox for prediction and integration of DNA, RNA, and proteins in single cells.
- Joy T. Wu, Nkechinyere Agu, Ismini Lourentzou, Arjun Sharma, Joseph Alexander Paguio, Jasper Seth Yao, Edward C. Dee, William Mitchell, Satyananda Kashyap, Andrea Giovannini, Leo Anthony Celi, Mehdi Moradi: Chest ImaGenome Dataset for Clinical Reasoning.
- Jieyu Zhang, Yue Yu, Yinghao Li, Yujing Wang, Yaming Yang, Mao Yang, Alexander Ratner: WRENCH: A Comprehensive Benchmark for Weak Supervision.
- Prithviraj Ammanabrolu, Mark O. Riedl: Modeling Worlds in Text.
- Wenhu Chen, Xinyi Wang, William Yang Wang: A Dataset for Answering Time-Sensitive Questions.
- Jesse Marshall, Ugne Klibaite, Amanda Gellis, Diego Aldarondo, Bence Olveczky, Timothy W. Dunn: The PAIR-R24M Dataset for Multi-animal 3D Pose Estimation.
- Daniel Galvez, Greg Diamos, Juan Torres, Keith Achorn, Juan Felipe Cerón, Anjali Gopi, David Kanter, Max Lam, Mark Mazumder, Vijay Janapa Reddi: The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage.
- Sheshera Mysore, Tim O'Gorman, Andrew McCallum, Hamed Zamani: CSFCube - A Test Collection of Computer Science Research Articles for Faceted Query by Example.
- Alex J. Chan, Ioana Bica, Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar: The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation.
- Charan Reddy, Deepak Sharma, Soroush Mehri, Adriana Romero-Soriano, Samira Shabanian, Sina Honari: Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics.
- Chak Hin Bryan Liu, Ângelo Cardoso, Paul Couturier, Emma J. McCoy: Datasets for Online Controlled Experiments.
- Zhiqiu Lin, Jia Shi, Deepak Pathak, Deva Ramanan: The CLEAR Benchmark: Continual LEArning on Real-World Imagery.
- Zhengxuan Wu, Elisa Kreiss, Desmond C. Ong, Christopher Potts: ReaSCAN: Compositional Reasoning in Language Grounding.
- Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle: A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches.
- Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, Gavriel State: Isaac Gym: High Performance GPU Based Physics Simulation For Robot Learning.
- Chenyu Yi, Siyuan Yang, Haoliang Li, Yap-Peng Tan, Alex C. Kot: Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions.
- Alicia Curth, David Svensson, James Weatherall, Mihaela van der Schaar: Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation.
- Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, Kevin K. Yang: FLIP: Benchmark tasks in fitness landscape inference for proteins.
- Yuanqi Du, Shiyu Wang, Xiaojie Guo, Hengning Cao, Shujie Hu, Junji Jiang, Aishwarya Varala, Abhinav Angirekula, Liang Zhao: GraphGT: Machine Learning Datasets for Graph Generation and Transformation.
- Mateusz Jurewicz, Leon Derczynski: PROCAT: Product Catalogue Dataset for Implicit Clustering, Permutation Learning and Structure Prediction.
- Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li: Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models.
- Nikhil X. Bhattasali, Momchil S. Tomov, Samuel J. Gershman: CCNLab: A Benchmarking Framework for Computational Cognitive Neuroscience.
- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita: Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation.
- Gaoussou Youssouf Kebe, Padraig Higgins, Patrick Jenkins, Kasra Darvish, Rishabh Sachdeva, Ryan Barron, John Winder, Don Engel, Edward Raff, Francis Ferraro, Cynthia Matuszek: A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning.
- Loren Lugosch, Piyush Papreja, Mirco Ravanelli, Abdelwahab Heba, Titouan Parcollet: Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers.
- Santhosh Kumar Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alexander Clegg, John M. Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra: Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI.
- Tudor Mare, Georgian-Emilian Duta, Mariana-Iuliana Georgescu, Adrian Sandru, Bogdan Alexe, Marius Popescu, Radu Tudor Ionescu: A realistic approach to generate masked faces applied on two novel masked face recognition data sets.
- Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Xiaoyun Zhao, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yingfeng Zheng, Yizhi Liu, Flora D. Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang: FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark.
- Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Ge Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao: Few-Shot Learning Evaluation in Natural Language Understanding.
- Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency: MultiBench: Multiscale Benchmarks for Multimodal Representation Learning.
- Haozhe Sun, Wei-Wei Tu, Isabelle Guyon: OmniPrint: A Configurable Printed Character Synthesizer.
- Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg: A Toolbox for Construction and Analysis of Speech Datasets.
- Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt: What Would Jiminy Cricket Do? Towards Agents That Behave Morally.
- Sudeep Dasari, Jianren Wang, Joyce Hong, Shikhar Bahl, Yixin Lin, Austin S. Wang, Abitha Thankaraj, Karanbir Chahal, Berk Çalli, Saurabh Gupta, David Held, Lerrel Pinto, Deepak Pathak, Vikash Kumar, Abhinav Gupta: RB2: Robotic Manipulation Benchmarking with a Twist.
- Tal Schuster, Ashwin Kalyan, Alex Polozov, Adam Kalai: Programming Puzzles.
- Bernard Koch, Emily Denton, Alex Hanna, Jacob G. Foster: Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research.
- Karl Otness, Arvi Gjoka, Joan Bruna, Daniele Panozzo, Benjamin Peherstorfer, Teseo Schneider, Denis Zorin: An Extensible Benchmark Suite for Learning to Simulate Physical Systems.
- Alon Talmor, Ori Yoran, Ronan Le Bras, Chandra Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant: CommonsenseQA 2.0: Exposing the Limits of AI through Gamification.
- Samriddhi Singla, Ayan Mukhopadhyay, Michael Wilbur, Tina Diao, Vinayak Gajjewar, Ahmed Eldawy, Mykel J. Kochenderfer, Ross D. Shachter, Abhishek Dubey: WildfireDB: An Open-Source Dataset Connecting Wildfire Occurrence with Relevant Determinants.
- Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola: The Neural MMO Platform for Massively Multiagent Research.
- Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, James Hays: Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting.
- Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik: Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development.
- David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh: NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation.
- Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, Yanfei Zhong: LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation.
- Gabriel Tseng, Ivan Zvonkov, Catherine Nakalembe, Hannah Kerner: CropHarvest: A global dataset for crop-type classification.
- Jack Bandy, Nicholas Vincent: Addressing "Documentation Debt" in Machine Learning: A Retrospective Datasheet for BookCorpus.
- Yasumasa Onoe, Michael J. Q. Zhang, Eunsol Choi, Greg Durrett: CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge.
- Mayur Hemani, Abhinav Patel, Tejas Shimpi, Anirudha Ramesh, Balaji Krishnamurthy: What Ails One-Shot Image Segmentation: A Data Perspective.
- Hugo Yèche, Rita Kuznetsova, Marc Zimmermann, Matthias Hüser, Xinrui Lyu, Martin Faltys, Gunnar Rätsch: HiRID-ICU-Benchmark - A Comprehensive Machine Learning Benchmark on High-resolution ICU Data.
- Lukas Kondmann, Aysim Toker, Marc Rußwurm, Andrés Camero, Devis Peressutti, Grega Milcinski, Pierre-Philippe Mathieu, Nicolas Longépé, Timothy Davis, Giovanni Marchisio, Laura Leal-Taixé, Xiaoxiang Zhu: DENETHOR: The DynamicEarthNET dataset for Harmonized, inter-Operable, analysis-Ready, daily crop monitoring from space.
- Bo Wu, Shoubin Yu, Zhenfang Chen, Josh Tenenbaum, Chuang Gan: STAR: A Benchmark for Situated Reasoning in Real-World Videos.
- Nikolaos-Antonios Ypsilantis, Noa Garcia, Guangxing Han, Sarah Ibrahimi, Nanne van Noord, Giorgos Tolias: The Met Dataset: Instance-level Recognition for Artworks.
- Stefan Daniel Dumitrescu, Petru Rebeja, Beáta Lorincz, Mihaela Gaman, Andrei-Marius Avram, Mihai Ilie, Andrei Pruteanu, Adriana Stan, Lorena Rosia, Cristina Iacobescu, Luciana Morogan, George Dima, Gabriel Marchidan, Traian Rebedea, Madalina Chitez, Dani Yogatama, Sebastian Ruder, Radu Tudor Ionescu, Razvan Pascanu, Viorica Patraucean: LiRo: Benchmark and leaderboard for Romanian language tasks.
- Scott Freitas, Yuxiao Dong, Joshua Neil, Duen Horng Chau: A Large-Scale Database for Graph Representation Learning.
- Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung: BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling.
- Jianhua Han, Xiwen Liang, Hang Xu, Kai Chen, Lanqing Hong, Jiageng Mao, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Xiaodan Liang, Chunjing Xu: SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving.
- Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych: BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models.
- Cécile Logé, Emily Ross, David Yaw Amoah Dadey, Saahil Jain, Adriel Saporta, Andrew Y. Ng, Pranav Rajpurkar: Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management.
- Ivan Kiskin, Marianne Sinka, Adam D. Cobb, Waqas Rafique, Lawrence Wang, Davide Zilli, Benjamin Gutteridge, Rinita Dam, Theodoros Marinos, Yunpeng Li, Dickson Msaky, Emmanuel Kaindoa, Gerard Killeen, Eva Herreros-Moya, Kathy Willis, Stephen J. Roberts: HumBugDB: A Large-scale Acoustic Mosquito Dataset.
- Nikita Pavlichenko, Ivan Stelmakh, Dmitry Ustalov: CrowdSpeech and Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription.
- Afshin Dehghan, Gilad Baruch, Zhuoyuan Chen, Yuri Feigin, Peter Fu, Thomas Gebauer, Daniel Kurz, Tal Dimry, Brandon Joffe, Arik Schwartz, Elad Shulman: ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data.
- Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Chunjing Xu, Hang Xu: One Million Scenes for Autonomous Driving: ONCE Dataset.
- Rami Aly, Zhijiang Guo, Michael Sejr Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, Arpit Mittal: FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information.
- Sarah Wiegreffe, Ana Marasovic: Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing.
- Qinkai Zheng, Xu Zou, Yuxiao Dong, Yukuo Cen, Da Yin, Jiarong Xu, Yang Yang, Jie Tang: Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning.
- Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball: CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review.
- John Pougué-Biyong, Valentina Semenova, Alexandre Matton, Rachel Han, Aerin Kim, Renaud Lambiotte, Doyne Farmer: DEBAGREEMENT: A comment-reply dataset for (dis)agreement detection in online debates.
- John Lambert, James Hays: Trust, but Verify: Cross-Modality Fusion for HD Map Change Detection.
- Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin T. Feigelis, Daniel Bear, Dan Gutfreund, David D. Cox, Antonio Torralba, James J. DiCarlo, Josh Tenenbaum, Josh H. McDermott, Dan Yamins: ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation.
- Thomas Liao, Rohan Taori, Deborah Raji, Ludwig Schmidt: Are We Learning Yet? A Meta Review of Evaluation Failures Across Machine Learning.
- Avanika Narayan, Piero Molino, Karan Goel, Willie Neiswanger, Christopher Ré: Personalized Benchmarking with the Ludwig Benchmarking Toolkit.
- Camille Garcin, Alexis Joly, Pierre Bonnet, Antoine Affouard, Jean-Christophe Lombardo, Mathias Chouet, Maximilien Servajean, Titouan Lorieul, Joseph Salmon: Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution.
- Zihao Wang, Hang Yin, Yangqiu Song: Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs.
- Andreas Aakerberg, Kamal Nasrollahi, Thomas B. Moeslund: RELLISUR: A Real Low-Light Image Super-Resolution Dataset.
- Jennifer J. Sun, Tomomi Karigo, Dipam Chakraborty, Sharada P. Mohanty, Benjamin Wild, Quan Sun, Chen Chen, David J. Anderson, Pietro Perona, Yisong Yue, Ann Kennedy: The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions.
- Catherine Ordun, Alexandra N. Cha, Edward Raff, Byron Gaskin, Alex Hanson, Mason Rule, Sanjay Purushotham, James L. Gulley: Intelligent Sight and Sound: A Chronic Cancer Facial Pain Dataset.
- Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge J. Belongie, Alan L. Yuille, Philip H. S. Torr, Song Bai: Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge.
- Thibaut Horel, Lorenzo Masoero, Raj Agrawal, Daria Roithmayr, Trevor Campbell: The CPD Data Set: Personnel, Use of Force, and Complaints in the Chicago Police Department.
- Jaeju An, Jeongho Kim, Hanbeen Lee, Jinbeom Kim, Junhyung Kang, Minha Kim, Saebyeol Shin, Donghee Hong, Simon S. Woo: VFP290K: A Large-Scale Benchmark Dataset for Vision-based Fallen Person Detection.
- Yang Deng, Juncheng Dong, Simiao Ren, Omar Khatib, Mohammadreza Soltani, Vahid Tarokh, Willie Padilla, Jordan M. Malof: Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials.
- Megan Stanley, John Bronskill, Krzysztof Maziarz, Hubert Misztela, Jessica Lanini, Marwin H. S. Segler, Nadine Schneider, Marc Brockschmidt: FS-Mol: A Few-Shot Learning Dataset of Molecules.
- Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah D. Goodman: DABS: a Domain-Agnostic Benchmark for Self-Supervised Learning.
- Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Jimenez Rezende, Michael Mozer, Yoshua Bengio, Chris Pal: Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning.
- Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, Dacheng Tao: AP-10K: A Benchmark for Animal Pose Estimation in the Wild.
- Michelle Bao, Angela Zhou, Samantha Zottola, Brian Brubach, Sarah Desmarais, Aaron Horowitz, Kristian Lum, Suresh Venkatasubramanian: It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks.
- Katharina Eggensperger, Philipp Müller, Neeratyoy Mallik, Matthias Feurer, René Sass, Aaron Klein, Noor H. Awad, Marius Lindauer, Frank Hutter: HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO.
- Christopher Yeh, Chenlin Meng, Sherrie Wang, Anne Driscoll, Erik Rozi, Patrick Liu, Jihyeon Janel Lee, Marshall Burke, David B. Lobell, Stefano Ermon: SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning.
- Mark Weber, Jun Xie, Maxwell D. Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers, Aljosa Osep, Laura Leal-Taixé, Liang-Chieh Chen: STEP: Segmenting and Tracking Every Pixel.
- Felix Pei, Joel Ye, David M. Zoltowski, Anqi Wu, Raeed H. Chowdhury, Hansem Sohn, Joseph E. O'Doherty, Krishna V. Shenoy, Matthew T. Kaufman, Mark M. Churchland, Mehrdad Jazayeri, Lee E. Miller, Jonathan W. Pillow, Il Memming Park, Eva L. Dyer, Chethan Pandarinath: Neural Latents Benchmark '21: Evaluating latent variable models of neural population activity.
- Sungjoon Park, Jihyung Moon, Sungdong Kim, Won-Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, Junseong Kim, Youngsook Song, Tae Hwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Eunjeong Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho: KLUE: Korean Language Understanding Evaluation.
- Tal Ridnik, Emanuel Ben Baruch, Asaf Noy, Lihi Zelnik: ImageNet-21K Pretraining for the Masses.
- Afshin Sadeghi, Hirra Malik, Diego Collarana, Jens Lehmann: Relational Pattern Benchmarking on the Knowledge Graph Link Prediction Task.
- Ramya Srinivasan, Emily Denton, Jordan Famularo, Negar Rostamzadeh, Fernando Diaz, Beth Coleman: Artsheets for Art Datasets.
- Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola: Benchmarking Multimodal AutoML for Tabular Data with Text Fields.
- Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann: SynthBio: A Case Study in Faster Curation of Text Datasets.
- Ard Kastrati, Martyna Plomecka, Damian Pascual, Lukas Wolf, Victor Gillioz, Roger Wattenhofer, Nicolas Langer: EEGEyeNet: a Simultaneous Electroencephalography and Eye-tracking Dataset and Benchmark for Eye Movement Prediction.
- Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein: RobustBench: a standardized adversarial robustness benchmark.
- Anthony M. Colas, Ali Sadeghian, Yue Wang, Daisy Zhe Wang: EventNarrative: A Large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation.
- Jane Wang, Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina Zhu, Charles Deck, Peter Choy, Mary Cassin, Malcolm Reynolds, H. Francis Song, Gavin Buttimore, David P. Reichert, Neil C. Rabinowitz, Loic Matthey, Demis Hassabis, Alexander Lerchner, Matt M. Botvinick: Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents.
- Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir R. Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, Shyam Ramji, Ulrich Finkler, Susan Malaika, Frederick Reiss: CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks.
- Cameron Voloshin, Hoang Minh Le, Nan Jiang, Yisong Yue: Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning.
- Lianmin Zheng, Ruochen Liu, Junru Shao, Tianqi Chen, Joseph Gonzalez, Ion Stoica, Ameer Haj-Ali: TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers.
- Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht: Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks.
- Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Wang, William Yang Wang, Tamara L. Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu: VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation.
- Neil Band, Tim G. J. Rudner, Qixuan Feng, Angelos Filos, Zachary Nado, Mike Dusenberry, Ghassen Jerfel, Dustin Tran, Yarin Gal: Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks.
- Andrey Malinin, Neil Band, Yarin Gal, Mark J. F. Gales, Alexander Ganshin, German Chesnokov, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Denis Roginskiy, Mariya Shmatova, Panagiotis Tigas, Boris Yangel: Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks.
- Aman Hussain, Nithin Holla, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova: Towards a robust experimental framework and benchmark for lifelong language learning.
- Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia A. Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier: Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark.
- Rika Antonova, Peiyang Shi, Hang Yin, Zehang Weng, Danica Kragic: Dynamic Environments with Deformable Objects.
- Martin Pawelczyk, Sascha Bielawski, Johannes van den Heuvel, Tobias Richter, Gjergji Kasneci: CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms.
- Debing Zhang, Yuanqiang Cai, Sibo Wang, Jiahong Li, Zhuang Li, Yejun Tang, Hong Zhou: A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer.
- Zhe Huang, Liang Wang, Giles Blaney, Christopher Slaughter, Devon McKeon, Ziyu Zhou, Robert J. K. Jacob, Michael C. Hughes: The Tufts fNIRS Mental Workload Dataset & Benchmark for Brain-Computer Interfaces that Generalize.
- Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt: Measuring Mathematical Problem Solving With the MATH Dataset.
- William G. La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, Marco Virgolin, Ying Jin, Michael Kommenda, Jason H. Moore: Contemporary Symbolic Regression Methods and their Relative Performance.
- Yang Liu, Sujay Khandagale, Colin White, Willie Neiswanger: Synthetic Benchmarks for Scientific Research in Explainable Machine Learning.
- Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation.
- Yuhang Li, Mingzhu Shen, Jian Ma, Yan Ren, Mingxin Zhao, Qi Zhang, Ruihao Gong, Fengwei Yu, Junjie Yan: MQBench: Towards Reproducible and Deployable Model Quantization Benchmark.
- Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt: Measuring Coding Challenge Competence With APPS.
- Ivan A. Nikolov, Mark Philip Philipsen, Jinsong Liu, Jacob V. Dueholm, Anders Skaarup Johansen, Kamal Nasrollahi, Thomas B. Moeslund: Seasons in Drift: A Long Term Thermal Imaging Dataset for Studying Concept Drift.
- Raphael J. L. Townshend, Martin Vögele, Patricia Suriana, Alexander Derry, Alexander S. Powers, Yianni Laloudakis, Sidhika Balachandar, Bowen Jing, Brandon M. Anderson, Stephan Eismann, Risi Kondor, Russ B. Altman, Ron O. Dror: ATOM3D: Tasks on Molecules in Three Dimensions.
- Joel Frank, Lea Schönherr: WaveFake: A Data Set to Facilitate Audio Deepfake Detection.
- Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Pieter Gijsbers, Frank Hutter, Michel Lang, Rafael Gomes Mantovani, Jan N. van Rijn, Joaquin Vanschoren: OpenML Benchmarking Suites.
- Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven Q. H. Truong, Du Nguyen Duong, Tan Bui, Pierre J. Chambon, Yuhao Zhang, Matthew P. Lungren, Andrew Y. Ng, Curtis P. Langlotz, Pranav Rajpurkar: RadGraph: Extracting Clinical Entities and Relations from Radiology Reports.
- Dennis Assenmacher, Marco Niemann, Kilian Müller, Moritz Seiler, Dennis M. Riehle, Heike Trautmann: RP-Mod&RP-Crowd: Moderator- and Crowd-Annotated German News Comment Datasets.
- Johannes C. Paetzold, Julian McGinnis, Suprosanna Shit, Ivan Ezhov, Paul Büschl, Chinmay Prabhakar, Anjany Sekuboyina, Mihail I. Todorov, Georgios Kaissis, Ali Ertürk, Stephan Günnemann, Bjoern H. Menze: Whole Brain Vessel Graphs: A Dataset and Benchmark for Graph Learning and Neuroscience.
- Neel Alex, Eli Lifland, Lewis Tunstall, Abhishek Thakur, Pegah Maham, C. Jess Riedel, Emmie Hine, Carolyn Ashurst, Paul Sedille, Alexis Carlier, Michael Noetel, Andreas Stuhlmüller: RAFT: A Real-World Few-Shot Text Classification Benchmark.
- Daniel Bear, Elias Wang, Damian Mrowca, Felix J. Binder, Hsiao-Yu Tung, R. T. Pramod, Cameron Holdaway, Sirui Tao, Kevin A. Smith, Fan-Yun Sun, Fei-Fei Li, Nancy Kanwisher, Josh Tenenbaum, Dan Yamins, Judith E. Fan: Physion: Evaluating Physical Prediction from Vision in Humans and Machines.
- Sebastian Koch, Yurii Piadyk, Markus Worchel, Marc Alexa, Cláudio T. Silva, Denis Zorin, Daniele Panozzo: Hardware Design and Accurate Simulation of Structured-Light Scanning for Benchmarking of 3D Reconstruction Algorithms.
- C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem: Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation.
- Pan Lu, Liang Qiu, Jiaqi Chen, Tanglin Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu: IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning.
- Timm Hess, Martin Mundt, Iuliia Pliushch, Visvanathan Ramesh: A Procedural World Generation Framework for Systematic Evaluation of Continual Learning.
- Robin Chan, Krzysztof Lis, Svenja Uhlemeyer, Hermann Blum, Sina Honari, Roland Siegwart, Pascal Fua, Mathieu Salzmann, Matthias Rottmann: SegmentMeIfYouCan: A Benchmark for Anomaly Segmentation.
- Kimin Lee, Laura M. Smith, Anca D. Dragan, Pieter Abbeel: B-Pref: Benchmarking Preference-Based Reinforcement Learning.
- Hasam Khalid, Shahroz Tariq, Minha Kim, Simon S. Woo: FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset.
- Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hanna Hajishirzi, Yejin Choi, Kyunghyun Cho: NaturalProofs: Mathematical Theorem Proving in Natural Language.
- Colby R. Banbury, Vijay Janapa Reddi, Peter Torelli, Nat Jeffries, Csaba Király, Jeremy Holleman, Pietro Montino, David Kanter, Pete Warden, Danilo Pau, Urmish Thakker, Antonio Torrini, Jay Cordaro, Giuseppe Di Guglielmo, Javier M. Duarte, Honson Tran, Nhan Tran, Wenxu Niu, Xuesong Xu: MLPerf Tiny Benchmark.
- Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, Jure Leskovec: OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs.
- Md. Mustafizur Rahman, Dinesh Balakrishnan, Dhiraj Murthy, Mücahid Kutlu, Matt Lease: An Information Retrieval Approach to Building Datasets for Hate Speech Detection.
- Karan Desai, Gaurav Kaul, Zubin Aysola, Justin Johnson: RedCaps: Web-curated image-text data created by the people, for the people.
- Tong Xia, Dimitris Spathis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Jing Han, Apinan Hasthanasombat, Erika Bondareva, Ting Dang, Andres Floto, Pietro Cicuta, Cecilia Mascolo: COVID-19 Sounds: A Large-Scale Audio Dataset for Digital Respiratory Screening.
- Laurynas Karazija, Iro Laina, Christian Rupprecht: ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation.
- Changhun Lee, Soohyeok Kim, Sehwa Jeong, Chiehyeon Lim, Jayun Kim, Yeji Kim, Minyoung Jung: MIND dataset for diet planning and dietary healthcare with machine learning: Dataset creation using combinatorial optimization and controllable generation with domain experts.
- Rui Li, Ondrej Bohdal, Rajesh K. Mishra, Hyeji Kim, Da Li, Nicholas D. Lane, Timothy M. Hospedales: A Channel Coding Benchmark for Meta-Learning.
- William Gilpin: Chaos as an interpretable benchmark for forecasting and data-driven modelling.
- Kwei-Herng Lai, Daochen Zha, Junjie Xu, Yue Zhao, Guanchu Wang, Xia Ben Hu: Revisiting Time Series Outlier Detection: Definitions and Benchmarks.
- Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Sanjay Kale, Emiel van Miltenburg, Sebastian Gehrmann: Automatic Construction of Evaluation Suites for Natural Language Generation Datasets.
- Sebastian Pineda-Arango, Hadi S. Jomaa, Martin Wistuba, Josif Grabocka: HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML.
- Björn Barz, Joachim Denzler: WikiChurches: A Fine-Grained Dataset of Architectural Styles with Real-World Challenges.
- Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, Hao Su: ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations.
- Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, Pablo Montero-Manso: Monash Time Series Forecasting Archive.
- Aleksandar Botev, Andrew Jaegle, Peter Wirnsberger, Daniel Hennes, Irina Higgins: Which priors matter? Benchmarking models for learning latent dynamics.
- James Ault, Guni Sharon: Reinforcement Learning Benchmarks for Traffic Signal Control.
- Curtis G. Northcutt, Anish Athalye, Jonas Mueller: Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks.
- Moein Sorkhei, Yue Liu, Hossein Azizpour, Edward Azavedo, Karin Dembrower, Dimitra Ntoula, Athanasios Zouzos, Fredrik Strand, Kevin Smith: CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer.
- Minghui Chen, Zhiqiang Wang, Feng Zheng: Benchmarks for Corruption Invariant Person Re-identification.
- Salva Rühling Cachay, Venkatesh Ramesh, Jason N. S. Cole, Howard Barker, David Rolnick: ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models.
- Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao: Variance-Aware Machine Translation Test Sets.
- Cynthia Chen, Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H. Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah: An Empirical Investigation of Representation Learning for Imitation.
- Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel: MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research.
- Mark Mazumder, Sharad Chitlangia, Colby R. Banbury, Yiping Kang, Juan Ciro, Keith Achorn, Daniel Galvez, Mark Sabini, Peter Mattson, David Kanter, Greg Diamos, Pete Warden, Josh Meyer, Vijay Janapa Reddi: Multilingual Spoken Words Corpus.