Multimedia and Datasets: Providing Access to New Forms of Nuclear Information
Brian A. Hitson
United States Department of Energy
Office of Scientific and Technical Information
-
The "Big Data" Era
A definition: "A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools." (Wikipedia)
How big is "big data"?
22,700,000 hits on Google.
-
Everybody Is On Board
- Policymakers
- U.S. "Big Data" Initiative - $200M (March 2012)
- European Commission: "Big Data – The Digital Agenda for Europe and Challenges for 2012"
- Scientists/Authors
- The Fourth Paradigm – Data-Intensive Scientific Discovery (2009)
- "Sailing on an Ocean of 0s and 1s," Science, Vol. 327 (2010)
- "A Deluge of Data Shapes a New Era in Computing," New York Times (14 December 2009)
- International/National bodies
- International Council for Science (ICSU)
- World Data System
- CODATA
- U.S. Board on Research Data and Information (BRDI)
-
Nuclear Data
- Nuclear Data*
- Types:
- Experimental (e.g., Experimental Nuclear Reaction Data (EXFOR))
- Evaluated (e.g., Evaluated Nuclear Data File (ENDF-6) and Evaluated Nuclear Structure Data File (ENSDF))
- Reaction: incident neutrons, charged particles, and photons
- Structure and decay data: half-lives, decay schemes, etc. (Nuclear Data Sheets)
- Other data-intensive nuclear fields:
- Nuclear medicine
- Radiation safety
- Waste management and environmental research
- Materials analysis
- Safeguards
- Nuclear astrophysics
* Source: Nuclear Data Section, IAEA, 2000
-
The Challenges of Numeric Data:
- Data sets are hard to find.
-
The Challenges of Numeric Data:
- Data sets are hard to navigate.
Screenshot of Experimental Nuclear Reaction Data (EXFOR) Database Version of September 21, 2012
-
The Challenges of Numeric Data:
- Data sets are hard to cite.
-
Why Cite Data?
Data should be cited in just the same way that other sources of information, such as articles and books, are cited.
Data citation can help by:
- enabling easy reuse and verification of data
- allowing the impact of data to be tracked
- creating a scholarly structure that recognizes and rewards data producers
-
One Solution: DataCite
What is DataCite?
» A global consortium composed of local institutions focused on improving the scholarly infrastructure around datasets and other non-textual information.
» A service for assigning Digital Object Identifiers (DOIs) and metadata to data sets.
-
How Data Citation Works
Data Citation metadata submitted to DOE-OSTI (Web Service API or 241.6 AN):
- Dataset Type
- Dataset Title
- Dataset Creator/Author or Principal Investigator
- Dataset Product Number
- DOE Contract/Award Number
- Originating Research Organization
- Publication/Issue Date
- Sponsoring Organization
- URL where the Dataset is posted for access
- Contact information
DOI assigned by DOE-OSTI
DOE-OSTI submits nightly feed of new DOIs to DataCite
DataCite registers DOI
DataCite validates DOI registration with DOE-OSTI
DOE-OSTI updates metadata record with DOI, creating a full Data Citation
Creator/Author, Principal Investigator, or Submitter notified of Data Citation availability
Data Citation submitted to search engines for indexing
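The workflow above can be illustrated with a minimal Python sketch of a metadata record and the citation string that becomes possible once a DOI is attached. Everything here is an assumption made for illustration: the class name, the field names (which simply mirror the metadata list on this slide), and the citation layout (loosely modeled on a creator, year, title, publisher, identifier pattern). It is not the DOE-OSTI Web Service API or an official DataCite schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    # Hypothetical field names mirroring the slide's metadata list.
    dataset_type: str
    title: str
    creator: str
    product_number: str
    contract_number: str
    research_org: str
    publication_date: str  # ISO date string, e.g. "2012-09-21"
    sponsoring_org: str
    url: str
    contact: str
    doi: Optional[str] = None  # stays empty until a DOI is assigned


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of metadata fields that are blank (DOI excluded)."""
    return [
        f.name
        for f in fields(record)
        if f.name != "doi" and not getattr(record, f.name).strip()
    ]


def format_citation(record: DatasetRecord) -> str:
    """Build a citation string; only possible after a DOI is assigned."""
    if record.doi is None:
        raise ValueError("no DOI assigned yet")
    year = record.publication_date[:4]
    return (
        f"{record.creator} ({year}). {record.title}. "
        f"{record.research_org}. https://doi.org/{record.doi}"
    )
```

In this sketch, a submitter would first check `missing_fields(record)`, and the record only yields a full citation after the `doi` attribute has been set, which corresponds to the "DOE-OSTI updates metadata record with DOI" step.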
-
Data Citation Demo
Demonstration of Data Citation
-
Multimedia …
… an increasingly common form of scientific communication
- Videotaped lectures
- Visualizations
- Experiments/Simulations
YouTube search on "nuclear" has 3,090,000 results
-
The Challenges with Multimedia Science Information
- Lack of written transcripts, i.e., no "full text" to search
- Metadata, if available, is often minimal
- Scientific, technical, and medical terminology/vocabulary
- Videos can be long, often up to an hour or more
-
Access to Multimedia-based Science & Technology
A Case Study for Enhanced Multimedia Search & Retrieval
ScienceCinema
http://www.osti.gov/sciencecinema/
- Partnership between OSTI and Microsoft Research.
- Launched in February 2011; searches ~2,600 multimedia files from DOE and CERN.
- Utilizes Microsoft Research Audio Video Indexing System (MAVIS).
- Enables searching of digitized spoken content.
- Users can search for a precise term within a video and be directed to the exact point in the video where the term was spoken.
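The idea of jumping to the exact point where a term was spoken can be sketched as an inverted index over time-stamped words. This is only an illustration under stated assumptions: it presumes a speech recognizer has already produced (start time, word) pairs for each video, and the function names and data layout are hypothetical. It does not describe how MAVIS or ScienceCinema is actually implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed recognizer output: (start time in seconds, recognized word).
Transcript = List[Tuple[float, str]]
Index = Dict[str, List[Tuple[str, float]]]


def build_index(videos: Dict[str, Transcript]) -> Index:
    """Map each lower-cased word to the (video id, start time) pairs where it occurs."""
    index = defaultdict(list)
    for video_id, words in videos.items():
        for start, word in words:
            index[word.lower()].append((video_id, start))
    return dict(index)


def search(index: Index, term: str) -> List[Tuple[str, float]]:
    """Return every (video id, start time) hit for a term, sorted by video and time."""
    return sorted(index.get(term.lower(), []))
```

A player could then seek to each returned start time, so a long lecture does not have to be watched from the beginning to find the relevant passage.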
-
Multimedia Search Demo
Demonstration of Multimedia Search
-
Summary
- Big Data is here.
- Data citation makes data:
- easier to find
- easier to navigate
- Scientific multimedia is here.
- Speech indexing makes multimedia:
- easier to search
- more productive for the scientist and student
-
Thank You!
Brian A. Hitson
hitsonb@osti.gov
www.osti.gov
865-576-1199