
Amino acid property-aware phylogenetic analysis (APPA) refers to phylogenetic analysis based on amino acid property encoding, which is used to understand and infer evolutionary relationships between species from a molecular perspective. Fast Fourier transform (FFT) and Higuchi's fractal dimension (HFD) perform well in describing the structural and complexity information of sequences for APPA. However, with the exponential growth of protein sequence data, it is very important to develop a reliable APPA method for protein sequence analysis.

Results

Consequently, we propose a new method named FFP, which combines FFT and HFD. Firstly, FFP encodes protein sequences on the basis of an important physicochemical property of amino acids, the dissociation constant, which determines the acidity and basicity of protein molecules. Secondly, FFT and HFD are used to generate the feature vectors of the encoded sequences; the distance matrix is then calculated from the cosine function, which describes the degree of similarity between species. The smaller the distance between two species, the more similar they are. Finally, the phylogenetic tree is constructed. When FFP is tested for phylogenetic analysis on four groups of protein sequences, the results are clearly better than those of the compared methods, with the highest accuracy exceeding 97%.

Conclusion

FFP has higher accuracy in APPA and multi-sequence alignment. It can also measure protein sequence similarity effectively. It is hoped that FFP will play a role in APPA-related research.

Proteins perform vital roles in countless biological processes and help to build the structure of living organisms. Generally, a protein's three-dimensional structure depends on its primary amino acid sequence and determines its biological function. Sequence analysis based on biomolecular data can reduce the time and cost of traditional laboratory experiments for protein family identification, function prediction and gene annotation. Due to the explosive growth of genome sequence data, it is necessary to find a reliable algorithm for sequence analysis. Detecting similar fragments between sequences is the core idea of multi-sequence alignment (MSA), whose reliability directly affects protein phylogenetic analysis in revealing the distance relationships among different species. Existing MSA algorithms can be divided into two categories: alignment-based and alignment-free algorithms. Compared with the former, alignment-free algorithms have lower computational complexity and better visualization. Among these alignment-free algorithms, the graphical representation of proteins is one of the most effective and commonly used approaches. Hamori and Ruskin first applied it to biomolecular sequence data.
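The steps named in the Results (encoding by dissociation constant, FFT and HFD feature vectors, cosine distance matrix) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions, because this section does not specify them: the `PKA_COOH` table holds approximate textbook pKa values of the alpha-carboxyl group, the way the spectrum and the HFD value are joined into one vector in `ffp_features` is hypothetical, and the parameters `n_fft` and `kmax` as well as the toy sequences are made up for illustration.

```python
import numpy as np

# Approximate textbook pKa values of the alpha-carboxyl group (assumption:
# this section does not list the table the method actually uses).
PKA_COOH = {
    "A": 2.34, "R": 2.17, "N": 2.02, "D": 1.88, "C": 1.96,
    "Q": 2.17, "E": 2.19, "G": 2.34, "H": 1.82, "I": 2.36,
    "L": 2.36, "K": 2.18, "M": 2.28, "F": 1.83, "P": 1.99,
    "S": 2.21, "T": 2.09, "W": 2.83, "Y": 2.20, "V": 2.32,
}


def encode(seq, table=PKA_COOH):
    """Map a protein sequence to a numeric signal, skipping unknown residues."""
    return np.array([table[a] for a in seq.upper() if a in table], dtype=float)


def higuchi_fd(x, kmax=8):
    """Higuchi's fractal dimension: slope of log L(k) against log(1/k).

    Assumes a non-constant series with at least four samples.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, min(kmax, n // 2) + 1)
    mean_lengths = []
    for k in ks:
        lengths = []
        for m in range(k):
            sub = x[m::k]                    # x[m], x[m+k], x[m+2k], ...
            n_steps = len(sub) - 1
            if n_steps < 1:
                continue
            norm = (n - 1) / (n_steps * k)   # Higuchi's normalisation factor
            lengths.append(np.abs(np.diff(sub)).sum() * norm / k)
        mean_lengths.append(np.mean(lengths))
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(mean_lengths), 1)
    return slope


def ffp_features(seq, n_fft=256, kmax=8):
    """Hypothetical feature vector: unit-norm power spectrum plus the HFD."""
    signal = encode(seq)
    signal = signal - signal.mean()          # remove the DC component
    # rfft zero-pads (or crops) to n_fft so every sequence gives equal length
    power = np.abs(np.fft.rfft(signal, n=n_fft)) ** 2
    power = power / np.linalg.norm(power)
    return np.append(power, higuchi_fd(signal, kmax))


def cosine_distance_matrix(features):
    """Pairwise cosine distance, 1 - cos(u, v); smaller means more similar."""
    f = np.asarray(features, dtype=float)
    unit = f / np.linalg.norm(f, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, None)


# Toy sequences, invented for illustration only.
toy = {
    "seq_a": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "seq_b": "MKTAYIAKQRQISFVKSHFSRQLEDRLGLIEVQ",
    "seq_c": "GSSGSSGLLPPAWWDDCCNNTTHHFFYYMMKKRRQQEEIIVVAA",
}
dist = cosine_distance_matrix([ffp_features(s) for s in toy.values()])
print(np.round(dist, 4))
```

The resulting matrix could then be passed to any distance-based tree builder, for instance average-linkage clustering via `scipy.cluster.hierarchy.linkage`; which tree construction method FFP uses is not stated in this section.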
