12/29/2023

Fminer button

Frequent itemset mining (FIM) is an essential part of association rule mining. Its application to other data mining tasks has also been recognized. It has been an active research area, and a large number of algorithms have been developed. In this paper, we propose another pattern growth algorithm which uses a more compact data structure named the Compressed FP-Tree (CFP-Tree). A CFP-Tree can contain up to 50% fewer nodes than the corresponding FP-Tree. We also describe the implementation of CT-PRO, which uses the CFP-Tree for FIM.

Frequent itemset mining is a fundamental problem in data mining because frequent itemsets are used extensively in reasoning, classification, clustering, and so on. To mine frequent itemsets, previous algorithms based on a prefix tree structure have to construct many prefix trees, which is very time-consuming. In this paper, we propose a novel frequent itemset mining algorithm called DPT (Dynamic Prefix Tree), which uses only one prefix tree. We first introduce the concept of the post-conditional database of an itemset and analyze how an itemset's post-conditional database is distributed in a prefix tree representing a database. Subsequently, we show how DPT adjusts the prefix tree to mine frequent itemsets and give three optimization techniques. An interesting advantage of DPT is that, after slight modifications, the algorithm can directly output a prefix tree representing all frequent itemsets. By using only one dynamic prefix tree, DPT avoids the high cost of constructing many prefix trees and thus gains a significant performance improvement. Experimental results show that DPT remarkably outperforms previous algorithms in running time and memory usage, and that the prefix tree of all frequent itemsets that DPT outputs can be used more efficiently than the list that previous algorithms output.

Knowledge about similarities and differences in protein structure is vital in much pharmaceutical and disease-related research. A protein's structure can be predicted from its sequences, functions, and organism sources. The huge amount of available protein data has motivated automated knowledge extraction from this data. However, applying existing data mining techniques to protein data faces many problems, such as high memory dependency, time consumption, and low accuracy. In this paper, we propose a new technique based on data classification and association rules to reduce memory dependency while predicting protein structures. The adopted process contains two main phases: preprocessing and data mining. In the preprocessing phase, a series of data preprocessing techniques are applied to “clean” the data in order to overcome the false positives problem. In the data mining phase, the data is mined using our algorithm to generate Class Association Rules (CARs). The generated CARs relate the protein structure class to the other protein features. These rules are then used to build the protein class prediction classifier.
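To make the prefix-tree idea behind the frequent itemset mining abstracts above more concrete, here is a minimal Python sketch. It builds a plain FP-tree-style prefix tree and, separately, mines frequent itemsets by brute force as a reference. It is not the CFP-Tree, CT-PRO, or DPT from the abstracts; the names `Node`, `build_prefix_tree`, `count_nodes`, and `frequent_itemsets`, and the toy transactions, are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


class Node:
    """One prefix-tree node: an item, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_prefix_tree(transactions, min_support):
    """Insert each transaction's frequent items, most frequent item first."""
    freq = Counter(item for t in transactions for item in set(t))
    keep = {i for i, c in freq.items() if c >= min_support}
    root = Node()
    for t in transactions:
        # Sorting by descending frequency (ties broken by name) lets
        # transactions share prefixes, which keeps the tree small.
        items = sorted(set(t) & keep, key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def frequent_itemsets(transactions, min_support):
    """Brute-force reference miner: count every candidate itemset directly."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            support = sum(1 for s in sets if s.issuperset(cand))
            if support >= min_support:
                result[cand] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger either
    return result


# Hypothetical toy database: five transactions over items a-d.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
tree = build_prefix_tree(transactions, min_support=2)
itemsets = frequent_itemsets(transactions, min_support=2)
print(count_nodes(tree), "tree nodes")
print(len(itemsets), "frequent itemsets")
```

On this toy input the twelve frequent-item occurrences collapse into six tree nodes, because transactions that start with the same items share a path. The brute-force miner is exponential in the number of items and only serves as a correctness reference; pattern growth algorithms such as those described above exist precisely to avoid that enumeration.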