The Coleridge Initiative has launched a major Kaggle competition – Show US the Data – that challenges data scientists to help show how publicly funded data and evidence are used to serve science and society.
CHORUS is serving as a partner to the initiative. Data scientists will use machine learning and natural-language processing to discover references to datasets in peer-reviewed publications related to federally funded research. The algorithms developed in this challenge will then be used to assess dataset use and to produce data impact scorecards for federal agencies. Numerous CHORUS publisher members, along with other sources, have made available a corpus of full-text research articles reporting on research funded by NSF, NIH, USDA, NIST, and NOAA as primary source material for the competition.
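To make the task concrete, the following is a minimal sketch of the simplest possible baseline: checking an article's text for a list of already known dataset names. It is not the method used by the Coleridge Initiative or by any competitor. The function names, the sample sentence, and the candidate list are invented for illustration, and a competitive entry would presumably rely on trained language models that can also find datasets not on any list.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse every run of non-alphanumeric
    characters into a single space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_dataset_mentions(article_text: str, known_datasets: list[str]) -> list[str]:
    """Return the known dataset names that appear in the article text.

    Matching is done on normalized text and padded with spaces so that
    only whole-word matches count.
    """
    padded_article = f" {normalize(article_text)} "
    found = []
    for name in known_datasets:
        key = normalize(name)
        if key and f" {key} " in padded_article:
            found.append(name)
    return found


# Invented sample sentence and candidate list (assumptions for illustration).
article = (
    "We link county-level outcomes to the National Education "
    "Longitudinal Study and to NOAA tide gauge records."
)
candidates = [
    "National Education Longitudinal Study",
    "Census of Agriculture",
    "NOAA Tide Gauge",
]

print(find_dataset_mentions(article, candidates))
```

A lookup like this can only find names it has been given, which is one reason the challenge calls for machine learning approaches that generalize to unseen dataset references.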
Data and evidence are critical if the government is to effectively address global challenges such as pandemics, climate change, and food production. But much of that evidence is contained inside publications. The Coleridge Initiative aims to help government agencies leverage their data as a strategic asset, and to show the value of data produced with federal funding.
The new Show US the Data Kaggle competition builds on the work of the Coleridge Initiative's Rich Context Project, which aims to make data and evidence central to building scientific policy and government legislation. Data scientists will compete for $90,000 in prizes, awarded to the teams that identify the most precise methods of finding datasets in the publications. The competition is now open, and as of today more than 120 teams have signed up to compete! The final submission deadline is June 22, 2021.
About Kaggle
Kaggle is the world’s largest online data science community. With more than 6 million members across 194 countries, the Kaggle community uses its diverse set of professional and academic backgrounds to solve complex data science problems. Competitors work as individuals or in teams, and the winners are awarded prizes and industry recognition for their accomplishments. Data scientists from all over the world come to Kaggle to participate in machine learning competitions, practice data science, build portfolios, and share datasets and code directly on the platform via Kaggle Notebooks.
About Coleridge Initiative
The Coleridge Initiative aims to change the empirical foundation of social science, statistical and public agencies in the United States and transform understanding of how our society works. The Coleridge Initiative has created dozens of pilot projects, worked with more than 100 agencies, and is building new technologies housed in a secure computational research platform, the Administrative Data Research Facility, to promote access to and discovery of sensitive and confidential microdata.
About CHORUS
CHORUS is creating a future where the output flowing from funded research is easily and permanently discoverable, accessible and verifiable by anyone in the world. By providing the necessary metadata infrastructure and governance to enable a smooth, low-friction interface between funders, authors, institutions and publishers in a distributed network environment, CHORUS can minimize open and public access compliance burdens while increasing access to literature and data in support of funder mandates worldwide.