Beta

Title : Exploring Academic Patent–Paper Pairs in Japan: Benchmarking Existing Detection Models

Author(s) : Van-Thien Nguyen, Rene Carraz

Abstract : This study expands on the patent-paper pair (PPP) detection model developed by Nguyen and Carraz (2025, Scientometrics) by systematically comparing it with two prominent large-scale approaches: Marx and Scharfmann (2024) and Wang et al. (2025). Although these models all aim to identify instances where the same research result is disclosed through both a patent and a scientific paper, they differ substantially in scope, design, and methodological assumptions. The Nguyen and Carraz model is designed for the Japanese academic context and integrates inventor–author matching, citation overlap, and semantic and lexical similarity within a supervised learning framework. In contrast, Marx and Scharfmann rely on detecting long identical word sequences (“self-plagiarism”) via a random forest classifier, and Wang et al. implement an inventor-centric clustering method with logistic regression applied to title and abstract similarity. We directly compare the Nguyen and Carraz dataset with those of Marx and Scharfmann and Wang et al., focusing on PPPs involving Japanese academic assignees. Despite the shared national context, there is minimal overlap: only 168 PPPs are shared with the Marx and Scharfmann dataset and 425 with the Wang et al. dataset. When evaluated on a shared validation set, the Nguyen and Carraz model outperforms both alternatives in the Japanese academic context, especially with logistic regression features. Feature extensions such as self-plagiarism and geographic distance offer only modest improvements under non-linear models. These findings highlight the importance of designing context-specific models and exercising caution when applying global PPP datasets to localized settings.

Key-words : Patent Paper Pair; Methodology; Matching algorithm; Academic patent; Japan

JEL Classification : O31, O34, O5

Working Paper BETA #2025-27