多酚 ›› 2022, Vol. 4 ›› Issue (2): 87-100.


Kaphta: Text mining web tool to extract information on the anticancer activity of polyphenols

  

  • Online: 2022-12-26  Published: 2022-12-26


Abstract:

In this paper, we describe the application of the Kaphta architecture, a resource for text mining the anticancer activity of polyphenols. The anticancer activity of these compounds against different types of cancer has been widely reported in the literature, and they are among the most promising molecules for the development of anticancer drugs. The architecture, which comprises four sequential and well-defined steps, uses a hybrid approach combining a dictionary, rules and machine learning to identify abstracts containing sentences with associations between polyphenol, cancer and gene entities. Applying the architecture to 23 826 PubMed abstracts generated a knowledge base of indexed abstracts with 172 169 sentences containing polyphenol-cancer and polyphenol-gene associations. A web tool was implemented that allows the user to search for information on the 2 006 polyphenol, 240 cancer and 3 121 gene entities, and on the 11 750 polyphenol-cancer and 9 160 polyphenol-gene associations, indexed in the knowledge base. A ranking algorithm calculates a score for each indexed abstract based on the number and type of sentences in which entities and rules were recognized. A test with users demonstrated that the visualization resources of the web tool contribute to the understanding of the associations between polyphenols, genes and cancers, compared with the PubMed tool. The Kaphta architecture and web tool make it possible to extract knowledge on the anticancer activity of polyphenols and can thus contribute to the exploration of these molecules in the development of anticancer therapies.
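
The abstract does not give the ranking formula, so the following is only a minimal sketch of how a dictionary-based, sentence-level scoring of abstracts might look. The dictionaries, the weights in `WEIGHTS`, and all function names are hypothetical and chosen for illustration; the rule-based and machine learning components of Kaphta are not modelled here.

```python
import re

# Hypothetical toy dictionaries; the actual Kaphta dictionaries are not
# listed in the abstract.
POLYPHENOLS = {"resveratrol", "quercetin", "curcumin"}
CANCERS = {"breast cancer", "colon cancer"}
GENES = {"tp53", "bcl2"}

# Hypothetical weights per sentence type (an assumption, not from the paper).
WEIGHTS = {"polyphenol-cancer": 2.0, "polyphenol-gene": 1.0}


def find_terms(sentence, dictionary):
    """Return the dictionary terms that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return {
        term
        for term in dictionary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def classify_sentence(sentence):
    """Label a sentence by the entity co-occurrences it contains."""
    types = set()
    if find_terms(sentence, POLYPHENOLS):
        if find_terms(sentence, CANCERS):
            types.add("polyphenol-cancer")
        if find_terms(sentence, GENES):
            types.add("polyphenol-gene")
    return types


def split_sentences(text):
    """Naive sentence splitter on end-of-sentence punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_abstract(text):
    """Sum the weights of every association type found in each sentence."""
    return sum(
        (WEIGHTS[t] for s in split_sentences(text) for t in classify_sentence(s)),
        0.0,
    )


def rank_abstracts(abstracts):
    """Rank a mapping of abstract id -> text by descending score."""
    scored = [(aid, score_abstract(text)) for aid, text in abstracts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = {
        "a": "Resveratrol inhibits breast cancer cell growth. It also downregulates BCL2.",
        "b": "Quercetin induces TP53 expression in colon cancer cells.",
        "c": "Green tea is a popular beverage.",
    }
    for abstract_id, score in rank_abstracts(sample):
        print(abstract_id, score)
```

In this sketch a sentence counts toward the score only when a polyphenol co-occurs with a cancer or gene term in the same sentence, which mirrors the sentence-level associations described above while leaving the real weighting scheme open.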
