Scientific Data
Publicly available, free, online scientific data, largely from university, industry, and government research programs.
353 listings
Submitted Dec 05, 2016 to Scientific Data We built this web site as a repository for your machine learning data. Upload your data, find interesting data sets, exchange solutions, compare yourself against other methods.
|
Submitted Dec 05, 2016 to Scientific Data With massive volumes of written text being produced every second, how do we make sure that we have the most recent and relevant information available to us? Maluuba is tackling this problem by building AI systems that can read and comprehend large volumes of complex text in real-time.
The purpose of Maluuba's NewsQA dataset is to help the research community build algorithms that are capable of answering questions requiring human-level comprehension and reasoning skills. Leveraging CNN articles from the DeepMind Q&A Dataset, we prepared a crowd-sourced machine reading comprehension dataset of over 100K Q&A pairs. |
Submitted Dec 05, 2016 to Scientific Data ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.
|
Submitted Dec 05, 2016 to Scientific Data This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.
This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). |
Submitted Dec 05, 2016 to Scientific Data This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 degrees every 20 degrees).
The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5). |
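The imaging grid described above can be enumerated to check how many views each split contains. This is a minimal sketch: the constant and function names are illustrative, and it assumes every object was imaged under every combination of lighting, elevation, and azimuth, which the description implies but does not state outright.

```python
from itertools import product

# Imaging conditions as listed in the dataset description.
LIGHTINGS = range(6)            # 6 lighting conditions
ELEVATIONS = range(30, 71, 5)   # 30 to 70 degrees, every 5 degrees
AZIMUTHS = range(0, 341, 20)    # 0 to 340 degrees, every 20 degrees

CATEGORIES = ("four-legged animals", "human figures", "airplanes", "trucks", "cars")
TRAIN_INSTANCES = (4, 6, 7, 8, 9)
TEST_INSTANCES = (0, 1, 2, 3, 5)


def views_per_object():
    """Number of (lighting, elevation, azimuth) combinations for one toy."""
    return len(list(product(LIGHTINGS, ELEVATIONS, AZIMUTHS)))


def views_in_split(instances):
    """Views in a split, assuming the full grid was captured for every toy."""
    return len(CATEGORIES) * len(instances) * views_per_object()


if __name__ == "__main__":
    print("views per object:", views_per_object())
    print("training views:", views_in_split(TRAIN_INSTANCES))
    print("test views:", views_in_split(TEST_INSTANCES))
```

Because two cameras were used, each view corresponds to two images, so the image count is twice the view count under the same assumption.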
Submitted Dec 05, 2016 to Scientific Data The National Data Buoy Center (NDBC) is an agency within the National Weather Service (NWS) of the National Oceanic and Atmospheric Administration (NOAA). NDBC manages the development, operations, and maintenance of the national data buoy network. It serves as the NOAA focal point for data buoy and associated meteorological and environmental monitoring technology. It provides high-quality meteorological and environmental data in real time from automated observing systems that include buoys and a Coastal-Marine Automated Network (C-MAN) in the open ocean and coastal zone surrounding the United States. It provides engineering support, including applications development, and manages data buoy deployment and operations as well as the installation and operation of automated observing systems on fixed platforms. It manages the Volunteer Observing Ship (VOS) program to acquire additional meteorological and oceanographic observations supporting NWS mission requirements. It operates the NWS test center for all surface sensor systems. It maintains the capability to support operational and research programs of NOAA and other national and international organizations. The US Coast Guard is the primary source of transportation for buoy deployments, retrievals, and maintenance.
|
Submitted Dec 05, 2016 to Scientific Data Browse and download over 1,400 New York State data resources on topics ranging from farmers’ markets to solar photovoltaic projects to MTA turnstile usage. Also check out the Open NY Dataset Submission Guide!
|
Submitted Dec 05, 2016 to Scientific Data NYC Open Data makes the wealth of public data generated by various New York City agencies and other City organizations available for public use. As part of an initiative to improve the accessibility, transparency, and accountability of City government, this catalog offers access to a repository of government-produced, machine-readable data sets.
|
Submitted Dec 05, 2016 to Scientific Data This page contains many classification, regression, multi-label and string data sets stored in LIBSVM format. Many are from UCI, Statlog, StatLib and other collections. We thank them for their efforts. For most sets, we linearly scale each attribute to [-1,1] or [0,1]. The testing data (if provided) is adjusted accordingly. Some training data are further separated into "training" (tr) and "validation" (val) sets. Details can be found in the description of each data set. To read data via MATLAB, you can use "libsvmread" in the LIBSVM package.
|
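The linear scaling mentioned in the entry above can be sketched in plain Python. The sketch assumes the usual LIBSVM text layout of one example per line, written as a label followed by `index:value` pairs. The function names are illustrative and are not part of the LIBSVM package. For simplicity it computes ranges only from values that appear explicitly, so implicit zeros in sparse rows are ignored.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


def fit_ranges(rows):
    """Record the (min, max) of every attribute seen in the training rows."""
    ranges = {}
    for _, feats in rows:
        for idx, val in feats.items():
            lo, hi = ranges.get(idx, (val, val))
            ranges[idx] = (min(lo, val), max(hi, val))
    return ranges


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Linearly map each attribute from its training range to [lower, upper]."""
    scaled = []
    for label, feats in rows:
        out = {}
        for idx, val in feats.items():
            lo, hi = ranges.get(idx, (val, val))
            if hi == lo:
                out[idx] = val  # constant or unseen attribute: left unchanged
            else:
                out[idx] = lower + (upper - lower) * (val - lo) / (hi - lo)
        scaled.append((label, out))
    return scaled


if __name__ == "__main__":
    train = [parse_libsvm_line(s) for s in ("+1 1:0 2:10", "-1 1:5 2:20", "+1 1:10 2:30")]
    test = [parse_libsvm_line("+1 1:20 2:10")]
    ranges = fit_ranges(train)
    print(scale_rows(train, ranges))
    # Test rows reuse the training ranges, so values may fall outside [-1, 1].
    print(scale_rows(test, ranges))
```

Reusing the training ranges for the test rows is one reading of "adjusted accordingly"; the page itself does not spell out the procedure.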
Submitted Dec 04, 2016 to Scientific Data Macrostrat is a platform for the aggregation and distribution of geological data relevant to the spatial and temporal distribution of sedimentary, igneous, and metamorphic rocks as well as data extracted from them. It is linked to the GeoDeepDive digital library and machine reading system, and it aims to become a community resource for the addition, editing, and distribution of new stratigraphic, lithological, environmental, and economic data. Interactive applications built upon Macrostrat are designed for educational and research purposes.
|
Submitted Dec 04, 2016 to Scientific Data Search by a wide variety of parameters (latitude/longitude, magnitude, time, etc.) and download earthquake data in a variety of file formats including CSV, KML, QuakeML, and GeoJSON.
|
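Once a GeoJSON download is on disk, a magnitude filter like the one offered by the search form can also be applied locally. The sketch below is illustrative only. The sample records are made up rather than real events, and the field names (`mag` and `place` under `properties`, and `[longitude, latitude, depth]` coordinates) are assumptions about the file layout that should be checked against an actual download.

```python
import json

# Hypothetical sample in GeoJSON FeatureCollection form (not real events).
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"mag": 4.6, "place": "hypothetical location A"},
     "geometry": {"type": "Point", "coordinates": [-150.0, 61.0, 35.0]}},
    {"type": "Feature",
     "properties": {"mag": 2.1, "place": "hypothetical location B"},
     "geometry": {"type": "Point", "coordinates": [-118.5, 34.2, 8.0]}},
    {"type": "Feature",
     "properties": {"mag": 5.3, "place": "hypothetical location C"},
     "geometry": {"type": "Point", "coordinates": [142.4, 38.3, 29.0]}}
  ]
}
"""


def filter_events(collection, min_magnitude):
    """Return (mag, lat, lon, place) tuples at or above min_magnitude, largest first."""
    events = []
    for feature in collection["features"]:
        props = feature["properties"]
        mag = props.get("mag")
        if mag is None or mag < min_magnitude:
            continue
        # GeoJSON stores positions as [longitude, latitude, ...].
        lon, lat = feature["geometry"]["coordinates"][:2]
        events.append((mag, lat, lon, props.get("place")))
    return sorted(events, key=lambda event: event[0], reverse=True)


if __name__ == "__main__":
    for event in filter_events(json.loads(SAMPLE_GEOJSON), 4.0):
        print(event)
```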
Submitted Dec 04, 2016 to Scientific Data Know about earthquakes just after they happen.
Real-time notifications are available from the Earthquake Notification Service (ENS) and Tweet Earthquake Dispatch (TED). Each service offers something different, depending on your interests. Real-time feeds are available in ATOM, KML, CSV, QuakeML, and GeoJSON formats. |
Submitted Dec 04, 2016 to Scientific Data A comprehensive on-line resource for quality checked and aligned ribosomal RNA sequence data.
SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya). SILVA are the official databases of the software package ARB. |
Submitted Dec 03, 2016 to Scientific Data Harnessing the power of supercomputing and state of the art electronic structure methods, the Materials Project provides open web-based access to computed information on known and predicted materials as well as powerful analysis tools to inspire and design novel materials.
|
Submitted Nov 30, 2016 to Scientific Data NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) is one of twelve NASA Science Mission Directorate (SMD) Data Centers that provide Earth science data, information, and services to research scientists, applications scientists, applications users, and students. The GES DISC is the home (archive) of NASA Precipitation and Hydrology, as well as Atmospheric Composition and Dynamics remote sensing data and information. The DISC also houses the Modern Era Retrospective-Analysis for Research and Applications (MERRA) data assimilation datasets (generated by GSFC’s Global Modeling and Assimilation Office), and the North American Land Data Assimilation System (NLDAS) and Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). The GES DISC is located at Goddard Space Flight Center, in Greenbelt, Maryland.
|
Submitted Nov 30, 2016 to Scientific Data APFO is home to one of the country's largest aerial film libraries. We currently house more than 70,000 rolls of film (10 million plus images). Our film dates from 1955 to the present. We have coverage of most of the United States and its territories. Historic aerial images play a more vital role today than ever before with environmental assessments, change detection, and property boundary disputes.
|
Submitted Nov 30, 2016 to Scientific Data NASA's MODIS instrument is operating on both the Terra and Aqua spacecraft. It has a viewing swath width of 2,330 km and views the entire surface of the Earth every one to two days. Its detectors measure 36 spectral bands between 0.405 and 14.385 µm, and it acquires data at three spatial resolutions -- 250m, 500m, and 1,000m.
|
Submitted Nov 30, 2016 to Scientific Data High spatial resolution, contemporary data on human population distributions are a prerequisite for the accurate measurement of the impacts of population growth, for monitoring changes and for planning interventions. The WorldPop project aims to meet these needs through the provision of detailed and open access population distribution datasets built using transparent approaches.
|
Submitted Nov 30, 2016 (Edited Nov 30, 2016) to Scientific Data Earth Engine’s public data catalog includes a variety of standard Earth science raster datasets. You can import these datasets into your script environment with a single click. You can also upload your own raster data or vector data for private use or sharing in your scripts.
|
Submitted Aug 23, 2010 to Scientific Data Find global bathymetry (ocean depth) and topography (land elevation) data from NOAA's National Geophysical Data Center. The database contains the global ETOPO1 1-minute relief database, gridded at approximately 100 square meters, US coastal relief and tsunami inundation models, multibeam bathymetry data, Lidar data, and more.
|