HathiTrust Research Center Seeks Proposals for Advanced Collaborative Support Projects

lterrat's bookmarks 2017-05-16

Summary:

"We are pleased to be offering non-consumptive access to the entire HathiTrust collection. HTRC seeks proposals in each of the three categories below:

   I. Extracted features dataset: features such as part-of-speech-tagged token counts have been pre-extracted at the page level. The most recent release has page-level features from 13.6 million volumes.[1]

  • Example ACS help: HTRC staff help identify and package the right subset of volumes for your use. HTRC staff help select tools to analyze the content. 

 II. HTRC parallel analysis tools: tools for large-scale analysis, i.e., parallel analysis and pattern matching using large-scale parallel compute resources.

  • Example ACS help: HTRC staff assist in building a large-scale corpus workset and run the analysis on a large-scale parallel computer on behalf of the scholar.

 III. Data Capsule service: the Data Capsule service[2] gives researchers and educators a Capsule (computer) that runs in the HTRC environment to carry out their research. A researcher is free to configure the environment as needed with their own tools, import a workset, and then, in a secure mode, run their analysis tools on HathiTrust content. Data Capsule supports an increasingly wide range of Capsule types with various built-in, community-accepted analysis tools that make the Data Capsule system easier to use.

  • Example ACS help: HTRC staff assist in building a large-scale workset from a corpus and help the scholar install their tools in a Capsule and link them to the data services.

Possible Topics:

  • Analysis using an external dataset and HathiTrust content within a Capsule
  • Analysis of a workset-defined corpus within a pipeline of tools in a Capsule where data provenance is gathered
  • Capsule results as published worksets
  • Build a novel dataset of derived features for a specific community using the HTRC parallel analysis tools
  • Derive new knowledge from the extracted feature dataset and other sources

These services are developed from the entire text corpus of the HathiTrust Digital Library, a collection of 15 million volumes digitized from 45 research libraries in North America, Asia, Europe, and Australia. Approximately 60% of the collection is in-copyright and is available for data mining only through the HathiTrust Research Center. More descriptive statistics about the collection can be found online: https://www.hathitrust.org/statistics_visualizations.

HTRC anticipates awarding 4-6 ACS projects this round, depending on the demands that the projects' scope places on HTRC staff, with at least three awardee spots reserved for applicants from HathiTrust member institutions.

Key content of the proposal should include the research context and problems, identification of one of the three operational interaction modes (I-III above) in which the scholar proposes to work, a detailed characterization of the data to be engaged, projected outcomes, and, if it can be provided, a description of the type of assistance sought from HTRC."

Link:

http://us14.campaign-archive2.com/?id=07c9c269b1&u=c9252a443583eb73bf936f046

From feeds:

Open Access Tracking Project (OATP) » lterrat's bookmarks

Date tagged:

05/16/2017, 22:44

Date published:

05/16/2017, 18:44