On Thursday, Google launched a new platform for users who “live and breathe data.” Dubbed Dataset Search, the service is a search engine for scientists, data journalists, data geeks, or anyone else seeking data for their work or to satisfy their “intellectual curiosity.”
Thousands of data repositories on the web currently provide access to millions of datasets, and local and national governments around the world regularly publish their data as well. Google believes that Dataset Search will help users find and make sense of the datasets available online.
“Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page,” Natasha Noy, Research Scientist, Google AI, said in a blog post.
Google Scholar is the company’s popular search engine for academic studies and reports. It’s a free service that indexes scholarly literature across various publishing formats and disciplines.
To create Dataset Search, Google developed guidelines for dataset providers. The company asked them to describe their data to help it (and other search engines) better understand the content of their pages.
According to Noy, the guidelines ask providers to supply salient information about each dataset: who created it, when it was published, how the data was collected, and what the terms are for using it.
“We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset,” Noy added.
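As a rough illustration of the kind of description the guidelines ask for, the sketch below builds a machine-readable record covering the four details mentioned above: creator, publication date, collection method, and terms of use. The article does not name a markup format, so the use of the schema.org Dataset vocabulary serialized as JSON-LD is an assumption here, and the helper names and every value in the example (the institute, the dates, the dataset itself) are hypothetical placeholders.

```python
import json


def describe_dataset(name, description, creator, date_published,
                     measurement_technique, license_url):
    """Build a schema.org-style Dataset description as a plain dict.

    Assumption: providers describe datasets with the schema.org
    "Dataset" type; the article itself does not specify the format.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published
        "datePublished": date_published,
        # How the data was collected
        "measurementTechnique": measurement_technique,
        # The terms for using it
        "license": license_url,
    }


def to_json_ld_script(metadata):
    """Wrap the metadata in a script tag that could sit in a dataset's web page."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + '\n</script>'


# Hypothetical example values, for illustration only.
example = describe_dataset(
    name="Daily weather observations (sample)",
    description="Daily temperature and precipitation readings.",
    creator="Example Weather Institute",
    date_published="2018-01-15",
    measurement_technique="Ground-based weather stations",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

print(to_json_ld_script(example))
```

A provider following this approach would embed the printed snippet in the page that hosts the dataset, so that a crawler reading the page can pick up the description along with the content.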
How to use it?
To use Dataset Search, you can simply enter what you are looking for and Google will guide you to the published dataset on the repository provider’s website.
If you are looking for daily weather data, for example, type that query into Dataset Search.
You’ll get data not only from NASA and NOAA, but also from academic repositories like Harvard’s Dataverse and the Inter-university Consortium for Political and Social Research (ICPSR).
Dataset Search is improving
Initially, users will find references to most datasets in the environmental and social sciences, as well as government data and data provided by news organizations such as ProPublica.
Google said that the variety and coverage of datasets will continue to grow. Dataset Search currently works in multiple languages, and support for additional languages is coming soon, according to the company.