Scithon: An evaluation framework for assessing research productivity tools

peter.suber's bookmarks 2020-11-27

Summary:

Abstract: There is a current scarcity of tested methods to evaluate the performance of artificial intelligence-based science discovery tools. Iris.ai, an international start-up developing text understanding technology and products, has developed a novel framework for performing such evaluation tasks. The framework, organized around live events, involves a systematic and cross-disciplinary comparison that focuses on productivity gains and takes into account user engagement. Under this format, referred to as Scithon™, event participants are asked to address, in a compressed time frame, the early stages of a research challenge put forth by a third party. Submitted results are then evaluated externally by domain experts. The logged data, including user engagement with the system, is compared against the outcome of the Scithon™. In this paper, we present in detail the full mechanics of the Scithon™ and the results obtained from a series of Scithon™ competitions run since 2016, where the presented framework is used to evaluate the productivity gains of Iris.ai’s own intelligent research assistant. Initial findings show that, compared to conventional evaluation frameworks for search engines, Scithon™ is a suitable platform for benchmarking intelligent research assistants and is able to identify advantages and disadvantages of such systems in deeper detail and complexity. Iris.ai provides the usage of the platform under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License, which means we welcome the community to freely adopt its name and format with an appropriate acknowledgement to this paper and its authors.

Link:

http://lrec-conf.org/workshops/lrec2018/W24/pdf/7_W24.pdf

Updated:

11/27/2020, 08:47

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.scithon oa.iris.ai oa.ai oa.discoverability oa.search

Date tagged:

11/27/2020, 13:47

Date published:

05/12/2018, 09:47