Kaphta: Text mining web tool to extract information on the anticancer activity of polyphenols

Abstract

In this paper, we describe the application of the Kaphta architecture, a resource for text mining the literature on the anticancer activity of polyphenols. The anticancer activity of these compounds against different types of cancer has been widely reported, and polyphenols are among the most promising molecules for the development of anticancer drugs. The architecture comprises four sequential, well-defined steps and uses a hybrid approach that combines a dictionary, rules and machine learning to identify abstracts containing sentences that associate polyphenol, cancer and gene entities. Applying the architecture to 23 826 PubMed abstracts generated a knowledge base of indexed abstracts with 172 169 sentences containing polyphenol-cancer and polyphenol-gene associations. A web tool was implemented that allows users to search for information on 2 006 polyphenol, 240 cancer and 3 121 gene entities, as well as on the 11 750 polyphenol-cancer and 9 160 polyphenol-gene associations indexed in the knowledge base. A ranking algorithm scores each indexed abstract according to the number and type of sentences with recognized entities and rules. A user test demonstrated that, compared with PubMed, the visualization resources of the web tool help users understand the associations between polyphenols, genes and cancers. The Kaphta architecture and web tool enable the extraction of knowledge on the anticancer activity of polyphenols and can thus contribute to the exploration of these molecules in the development of anticancer therapies.

Key words:

polyphenols, cancer, text-mining
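The abstract describes the pipeline only at a high level. The following is a minimal Python sketch, under stated assumptions, of two of its ideas: dictionary-based recognition of polyphenol, cancer and gene entities with sentence-level co-occurrence plus a simple trigger-word rule, and a per-abstract ranking score that weights sentences by association type. The dictionaries, trigger verbs, weights and function names (`find_entities`, `classify_sentence`, `score_abstract`, `rank`) are hypothetical illustrations, not Kaphta's actual resources or scoring formula, and the machine-learning step of the hybrid approach is not shown.

```python
import re

# Hypothetical toy dictionaries. Kaphta's real dictionaries are much larger
# (the abstract reports 2 006 polyphenol, 240 cancer and 3 121 gene entities).
DICTIONARIES = {
    "polyphenol": {"resveratrol", "quercetin", "curcumin"},
    "cancer": {"breast cancer", "colon cancer", "melanoma"},
    "gene": {"tp53", "bcl2", "nfkb1"},
}

# Hypothetical rule: trigger verbs suggesting that an association is asserted.
TRIGGER = re.compile(
    r"\b(inhibit\w*|suppress\w*|induc\w*|downregulat\w*|upregulat\w*)\b", re.IGNORECASE
)

# Hypothetical weights per sentence label; not Kaphta's actual scoring values.
WEIGHTS = {"polyphenol-cancer": 2.0, "polyphenol-gene": 1.5, "rule": 1.0}


def find_entities(sentence):
    """Dictionary lookup: map each entity type to the terms found in the sentence."""
    text = sentence.lower()
    found = {}
    for etype, terms in DICTIONARIES.items():
        hits = {t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)}
        if hits:
            found[etype] = hits
    return found


def classify_sentence(sentence):
    """Label a sentence by entity co-occurrence, then apply the trigger-word rule."""
    ents = find_entities(sentence)
    labels = []
    if "polyphenol" in ents and "cancer" in ents:
        labels.append("polyphenol-cancer")
    if "polyphenol" in ents and "gene" in ents:
        labels.append("polyphenol-gene")
    # The rule only adds weight to sentences that already hold an association.
    if labels and TRIGGER.search(sentence):
        labels.append("rule")
    return labels


def score_abstract(abstract):
    """Sum the weights of all labels over a naive sentence split."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return sum(WEIGHTS[label] for s in sentences for label in classify_sentence(s))


def rank(abstracts):
    """Return (id, score) pairs with a positive score, highest score first."""
    scored = [(doc_id, score_abstract(text)) for doc_id, text in abstracts.items()]
    return sorted((p for p in scored if p[1] > 0), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    demo = {
        "A": "Resveratrol inhibits breast cancer cell growth. "
             "Resveratrol downregulated BCL2 expression.",
        "B": "Quercetin was detected in onions. Melanoma incidence is rising.",
        "C": "Curcumin and colon cancer were both mentioned.",
    }
    print(rank(demo))  # [('A', 5.5), ('C', 2.0)]
```

Weighting association types differently is one simple way to let the number and type of matched sentences drive the ranking, as the abstract describes; the actual features and weights used by Kaphta would have to be taken from the paper's methods.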

Ramon Gustavo Teodoro Marques da Silva, Samuel Lucas Santos Gomes, Paulo Muniz de Ávila, Gustavo José da Silva, Ana Lucia Fachin, Edilson Carlos Caritá, Mozart Marins. Kaphta: Text mining web tool to extract information on the anticancer activity of polyphenols[J]. Journal of Polyphenols, 2022, 4(2): 87-100.

Journal of Polyphenols