MUSCLE (alignment software)

MUltiple Sequence Comparison by Log-Expectation
Original author	Robert C. Edgar
Developer	drive5
Initial release	2004

Stable release	5.3 / 10 November 2024

Operating system	Linux, macOS, Windows
Platform	IA-32, x86-64
Available in	English
Type	Multiple sequence alignment
License	Public domain
Website	drive5.com/muscle/
Repository	github.com/rcedgar/muscle/releases/tag/v5.1 at GitHub

MUltiple Sequence Comparison by Log-Expectation (MUSCLE) is computer software for multiple sequence alignment of protein and nucleotide sequences. It is released into the public domain. The method was published by Robert C. Edgar in two papers in 2004. The first paper, published in Nucleic Acids Research, introduced the sequence alignment algorithm.^[1] The second paper, published in BMC Bioinformatics, presented more technical details. MUSCLE up to version 3 uses a progressive-refinement method.^[2] Since version 5 it uses a hidden Markov model similar to ProbCons.^[3]

History

Robert C. Edgar

Edgar graduated from University College London in 1982, holding a BSc in physics and a PhD in particle physics.^[4] After graduating he pursued software development and founded his own company, Parity Software, in 1988.^[4] In 2001, he began working on coding algorithms after attending a seminar at the University of California, Berkeley.^[5] Since 2001, Edgar has contributed to or been the sole creator of multiple software programs, including MUSCLE and USEARCH.^[4] He has written 96 papers in the field of computational biology since 2002, the most recent being Discovery and Validation of Alternatives to VSV-G for Pseudotyping of Lentiviral Vectors for In Vivo Delivery of Anti-Tumor Transgenes. As of April 2025, his work had been cited more than 143,126 times.^[6] The two original MUSCLE papers have been cited more than 58,979 times combined. The paper “MUSCLE: multiple sequence alignment with high accuracy and high throughput”^[1] has received more than 49,052 citations,^[6] while “MUSCLE: a multiple sequence alignment method with reduced time and space complexity”^[2] has been cited more than 9,936 times.^[6]

MUSCLE version history


MUSCLE version	Date Published	Summary	Reference
MUSCLE v1	March 1, 2004	The method was first published on March 1, 2004, but the version described was already v3.2.^[1] Versions v1 to v3.1 were presumably never released to the public before publication, as no record of earlier versions exists online.	MUSCLE: multiple sequence alignment with high accuracy and high throughput
MUSCLE v3.3 & MUSCLE-fast	August 19, 2004	The paper MUSCLE: a multiple sequence alignment method with reduced time and space complexity, published on August 19, 2004, introduced a newer version of MUSCLE (v3.3) alongside MUSCLE-fast. MUSCLE-fast, as the name suggests, was designed specifically for "high-throughput applications".^[7]	MUSCLE: a multiple sequence alignment method with reduced time and space complexity
MUSCLE v3.6	September 2005	In September 2005, the MUSCLE User Guide was published, describing how to use v3.6, the latest version of the software at the time.^[8]	MUSCLE User Guide
MUSCLE v3.8.31	September 15, 2010	MUSCLE v3.8.31 was released on September 15, 2010 and was the latest version prior to the release of MUSCLE v5.	MUSCLE 3.8.31 download on Drive5
MUSCLE v5.0-5.3	June 21, 2021	MUSCLE v5 was published on June 21, 2021 in the paper MUSCLE v5 enables improved estimates of phylogenetic tree confidence by ensemble bootstrapping. Since then, v5.1, 5.2, and 5.3 have been released, with all versions accessible through Edgar's website https://drive5.com/software.html.	MUSCLE v5 enables improved estimates of phylogenetic tree confidence by ensemble bootstrapping

Muscle5

Overview

In late 2021, Edgar released Muscle5 (also referred to as Muscle v5), an updated version of the MUSCLE software. It introduces several innovations aimed at improving alignment accuracy and reducing bias found in other MSA algorithms. Traditional tools such as Clustal Omega, MAFFT, and earlier versions of MUSCLE rely on progressive alignment strategies that produce a single alignment. Muscle5, in contrast, generates an ensemble of high-accuracy alignments by perturbing a hidden Markov model and permuting its guide tree. At its core, the algorithm is a parallelized reimplementation of ProbCons, and is designed to scale efficiently to large datasets. Muscle5 has demonstrated improved benchmark performance compared to leading MSA methods across several datasets, including BAliBASE, BRAliBASE, and PREFAB.^[3]

Ensembles

A key innovation in Muscle5 is the use of alignment ensembles, which provide unbiased metrics of confidence in alignments. Each individual MSA (replicate) in the ensemble uses fixed but independently chosen parameters for the hidden Markov model and guide tree, allowing results to be averaged over a diverse set of replicates. This enables biologists to assess how sensitive their downstream analyses are to alignment uncertainty by comparing results across the ensemble.^[3]
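The idea of comparing results across an ensemble can be sketched in Python as follows. This is only an illustration, not Muscle5's own confidence calculation or file format: it assumes each replicate is available as a mapping from sequence name to gapped row, and the names `aligned_pairs` and `pair_support` are hypothetical. For every residue pair aligned in a chosen reference replicate, it reports the fraction of replicates that align the same pair.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of residue pairs that share a column.

    msa maps sequence name -> gapped row (all rows the same length).
    A pair is ((name_a, index_a), (name_b, index_b)), where each index
    counts ungapped residues, so pairs are comparable across alignments.
    """
    names = sorted(msa)
    length = len(msa[names[0]])
    counters = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if msa[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def pair_support(reference, ensemble):
    """Fraction of ensemble replicates that reproduce each reference pair."""
    replicate_pairs = [aligned_pairs(m) for m in ensemble]
    return {
        pair: sum(pair in rp for rp in replicate_pairs) / len(ensemble)
        for pair in aligned_pairs(reference)
    }


# Two toy replicates that disagree on where the gap in sequence "a" goes.
rep1 = {"a": "AC-G", "b": "ACTG"}
rep2 = {"a": "A-CG", "b": "ACTG"}
support = pair_support(rep1, [rep1, rep2])
```

Pairs with support close to 1 are aligned the same way in nearly every replicate, while low values flag regions where a downstream analysis may be sensitive to the choice of alignment.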

Old algorithm

The MUSCLE algorithm (before v5) proceeds in three stages: the draft progressive, improved progressive, and refinement stages.

Stage 1: Draft Progressive

In this first stage, the algorithm produces a multiple alignment, emphasizing speed over accuracy. This step begins by computing the k-mer distance for every pair of input sequences to create a distance matrix. UPGMA clusters the distance matrix to produce a binary tree. From this tree a progressive alignment is constructed, beginning with the creation of profiles for each leaf of the tree. For every node in the tree, a pairwise alignment is constructed of the two child profiles, creating a new profile to be assigned to that node. This continues until there is a multiple sequence alignment of all input sequences at the root of the tree.

Given $N$ input sequences and $L$ as the average sequence length, the time complexity of the draft progressive stage is

$O(N^{2}\cdot L+N\cdot L^{2})$ .

Here, the pairwise $k$-mer distance calculation is computed as $O(N^{2}\cdot L)$, and the progressive alignment steps take $O(N\cdot L^{2})$, where $O$ denotes the asymptotic upper bound. The space complexity is $O(N\cdot L)$ as the algorithm maintains profiles and alignments for each sequence across the tree.^[1]
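The first two steps of this stage, k-mer distances followed by UPGMA clustering, can be sketched in Python. This is a simplified illustration rather than MUSCLE's implementation: the distance used here is one minus the fraction of shared k-mers (MUSCLE's own k-mer measure may differ in detail), the default k=3 is arbitrary, sequences are assumed to be at least k long with unique names, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_distance(x, y, k=3):
    """1 - (number of shared k-mers / number of k-mers in the shorter sequence)."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    shared = sum((cx & cy).values())  # multiset intersection
    return 1.0 - shared / (min(len(x), len(y)) - k + 1)


def distance_matrix(seqs, k=3):
    """Pairwise distances keyed by frozenset({name_a, name_b})."""
    return {
        frozenset((a, b)): kmer_distance(seqs[a], seqs[b], k)
        for a, b in combinations(seqs, 2)
    }


def upgma(names, dist):
    """Naive UPGMA: repeatedly merge the two closest clusters.

    Returns the binary tree as nested tuples of sequence names.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Distance to the merged cluster: size-weighted average.
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
tree = upgma(list(seqs), distance_matrix(seqs))
```

In this toy input `s1` and `s2` share most of their 3-mers, so they are joined first and `s3` attaches at the root. The resulting tree is what the progressive alignment step would then walk from the leaves upward.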

Stage 2: Improved Progressive

This stage focuses on obtaining a better tree. It calculates the Kimura distance for each pair of input sequences using the multiple sequence alignment obtained in Stage one, producing a second distance matrix. UPGMA clusters this distance matrix to obtain a second binary tree. A progressive alignment is then performed as in Stage one, but it is optimized by computing alignments only in subtrees whose branching orders have changed from the first binary tree, resulting in a more accurate alignment.

The second stage therefore repeats the pairwise distance calculation and progressive alignment of Stage one, but saves computation by limiting re-alignment to the subtrees with altered branching orders. The time complexity is thus given as

$O(N^{2}\cdot L+m\cdot L^{2})$ ,

where the variable $m$ denotes the number of subtree realignments. Similarly, the space complexity is

$O(N\cdot L)$ ,

as profiles and alignments for the input sequences are stored for the progressive alignment process.^[1]
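The Kimura distance used in this stage can be computed from the fractional identity of two aligned rows. The sketch below uses Kimura's approximation for protein sequences, $d=-\ln(1-D-D^{2}/5)$ with $D$ equal to one minus the fractional identity. The function names are hypothetical, and returning infinity for very divergent pairs (where the logarithm is undefined) is a simplification made here; production implementations handle that range differently.

```python
import math


def fractional_identity(row_a, row_b):
    """Identity over the columns where both rows have a residue."""
    both = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not both:
        raise ValueError("rows share no aligned columns")
    return sum(a == b for a, b in both) / len(both)


def kimura_distance(row_a, row_b):
    """Kimura's protein distance: d = -ln(1 - D - D^2 / 5), D = 1 - identity."""
    big_d = 1.0 - fractional_identity(row_a, row_b)
    arg = 1.0 - big_d - big_d * big_d / 5.0
    if arg <= 0:
        # Simplification: the formula is undefined for very divergent pairs.
        return float("inf")
    return -math.log(arg)


d_close = kimura_distance("ACDE", "ACDF")  # 75% identity
d_far = kimura_distance("AAAA", "CCCC")    # no identity
```

Because the correction grows faster than the raw mismatch fraction, distant pairs are pushed further apart than in a plain identity-based matrix, which is what lets the second UPGMA tree differ from the first.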

Stage 3: Refinement

In this final stage, an edge is chosen from the second tree, with edges being visited in decreasing distance from the root. The chosen edge is deleted, dividing the tree into two subtrees. The profile of the multiple alignment is then computed for each subtree. A new multiple sequence alignment is produced by re-aligning the subtree profiles. If the SP score is improved, the new alignment is kept, otherwise, it is discarded. The process of deleting an edge and aligning is repeated until convergence, or until a user-defined limit is reached.

The time complexity of the refinement stage is given as $O(k\cdot L^{2})$. Here, $k$ denotes the number of edge deletions and $L$ denotes the average sequence length; re-alignment of the subtree profiles is the dominant cost per iteration. The space complexity remains the same as in Stages one and two: $O(N\cdot L)$. Because every refinement step has the same cost, the total running time grows linearly with the number of refinement steps and remains polynomial.
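The accept-or-discard rule of the refinement stage can be sketched as follows. This is a toy illustration under stated assumptions: the match, mismatch and gap values are made up for the example and are not MUSCLE's scoring scheme, and `propose` is a hypothetical caller-supplied function standing in for "delete an edge and re-align the two subtree profiles". The loop runs for a fixed number of iterations and does not model the convergence test.

```python
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment under a toy scoring scheme."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs carry no score
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def refine(alignment, propose, max_iters=100):
    """Keep a proposed re-alignment only if it improves the SP score."""
    best, best_score = alignment, sp_score(alignment)
    for _ in range(max_iters):
        candidate = propose(best)
        candidate_score = sp_score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        # Otherwise the candidate is discarded and `best` is unchanged.
    return best, best_score


start = ["AC-G", "A-CG"]
better = ["ACG", "ACG"]
result, score = refine(start, lambda current: better, max_iters=3)
```

Since a candidate is only ever accepted when the score strictly increases, the SP score of the retained alignment never gets worse from one iteration to the next.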

In comparison, the CLUSTALW algorithm includes an optimized iterative refinement step such that selective re-alignment of the tree occurs in order to maximize alignment accuracy without repeating the entire process. The time and space complexity, however, do not change for this optimized iterative refinement step. The time complexity is $O(k\cdot L^{2})$ , where $k$ is the number of refinement steps and $L$ is the average alignment length. The space complexity is given as $O(N\cdot L)$ , again, for alignment profiles and sequence data for all $N$ input sequences.^[1]^[2]

Comparison of Time Complexity and Alignment Accuracy
Feature	CLUSTALW	MUSCLE v3
Time Complexity	$O(k\cdot L^{2})$	$O(k\cdot L^{2})$
Alignment Accuracy	Moderate	Higher, due to iterative refinement improving SP score

Complexity and Comparison

In the first two stages of the algorithm, the time complexity is $O(N^{2}\cdot L+N\cdot L^{2})$, the space complexity is $O(N^{2}+N\cdot L+L^{2})$. The refinement stage adds to the time complexity another term, $O(N^{3}\cdot L)$.^[1] MUSCLE is often used as a replacement for Clustal, since it usually (but not always) gives better sequence alignments. Depending on the chosen options, MUSCLE is significantly faster than Clustal, more so for larger alignments.^[1]^[2]
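As a rough illustration of how the time terms quoted above compare, the following Python snippet evaluates them for one input size. The values N = 1000 and L = 300 are arbitrary, constant factors are ignored, and the function name is hypothetical, so the numbers indicate relative growth only, not running times.

```python
def muscle_v3_cost_terms(n, length):
    """Leading terms of the stated time bounds, ignoring constant factors."""
    return {
        "distances": n**2 * length,    # N^2 * L
        "progressive": n * length**2,  # N * L^2
        "refinement": n**3 * length,   # N^3 * L
    }


terms = muscle_v3_cost_terms(n=1000, length=300)
dominant = max(terms, key=terms.get)
```

For inputs with many more sequences than residues per sequence, the refinement term is the largest by a wide margin, which is consistent with refinement being the stage that options for faster runs would limit or skip.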

Modern multiple sequence alignment programs are all generally accepted for presenting aligned sequences, but they differ in a few respects. The main difference is the method used to align the sequences. For instance, T-COFFEE and Clustal use the progressive method, while MUSCLE and MAFFT use the iterative method of alignment.^[9] The two methods differ in their ability to handle low-similarity sequences, with the iterative method providing more accurate results. Programs also differ in their computational needs.

Originally, MUSCLE had middling CPU demands compared with other programs, though higher than those of the progressive methods.^[1] Comparisons of modern versions of MSA programs reveal that many are quite similar in capabilities. The alignments were assessed by their sum-of-pairs (SP) score, the proportion of correctly matched pairs of nucleotides or amino acids across two sequences, and their total-column (TC) score, the number of correctly matched columns divided by the total number of columns. On these measures MUSCLE was average in its ability to maximize matching pairs and columns, generally scoring slightly below ProbCons, T-Coffee, Probalign and MAFFT.^[10] Apart from alignment scores, MUSCLE was less computationally demanding in both execution time and memory use.

MSA Computational Properties ^[10]
Program	Average Alignment Time (sec)	Average Memory Usage (MB)	Short Sequence Average SP	Short Sequence Average TC	Long Sequence Average SP	Long Sequence Average TC
MUSCLE	301.4	60.8	0.718	0.341	0.789	0.437
Probalign	1410.2	162.7	0.774	0.219	0.800	0.455
ProbCons	1781.7	192.5	0.763	0.220	0.831	0.524
MAFFT	309.0	231.6	0.767	0.421	0.803	0.470
T-Coffee	1964.9	372.2	0.760	0.407	0.830	0.520
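The SP and TC columns in the table are benchmark scores measured against a reference alignment. A minimal Python sketch of how such scores can be computed is given below. It assumes the test and reference alignments list the same sequences in the same row order, and the function names are hypothetical; it is not the scoring code used in the cited study.

```python
from itertools import combinations


def columns(rows):
    """Describe each column as a tuple of residue indices (None for a gap)."""
    counters = [0] * len(rows)
    cols = []
    for chars in zip(*rows):
        col = []
        for i, ch in enumerate(chars):
            if ch == "-":
                col.append(None)
            else:
                col.append(counters[i])
                counters[i] += 1
        cols.append(tuple(col))
    return cols


def residue_pairs(cols):
    """All pairs of residues that share a column."""
    out = set()
    for col in cols:
        present = [(i, r) for i, r in enumerate(col) if r is not None]
        out.update(combinations(present, 2))
    return out


def sp_and_tc(test_rows, ref_rows):
    """SP: fraction of reference pairs recovered. TC: fraction of reference columns recovered."""
    ref_cols = columns(ref_rows)
    test_cols = set(columns(test_rows))
    ref_pairs = residue_pairs(ref_cols)
    test_pairs = residue_pairs(test_cols)
    sp = len(ref_pairs & test_pairs) / len(ref_pairs)
    tc = sum(col in test_cols for col in ref_cols) / len(ref_cols)
    return sp, tc


reference = ["AC-G", "ACTG"]
test = ["A-CG", "ACTG"]
sp, tc = sp_and_tc(test, reference)
```

TC is the stricter of the two measures, since a single misplaced residue spoils a whole column, which is why the TC values in the table are consistently lower than the SP values.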

Integration

MUSCLE is widely supported across bioinformatics platforms. It is built into software programs such as CodonCode Aligner, DNASTAR's Lasergene, Geneious, and MacVector, and is also available as a plug-in for Sequencher, MEGA, UGENE, and AliView. Users can also access MUSCLE as a web service via the European Molecular Biology Laboratory (EMBL)-European Bioinformatics Institute (EBI)^[11] or T-Coffee, or download it to their own devices from the official website.

Platform	Integration Type	Access Method	Source
CodonCode Aligner	Built-in	Available under Contig → Align with Muscle	^[12]
Geneious	Built-in	Available under Align/Assemble → Multiple Align → MUSCLE	^[13]
DNASTAR Lasergene	Built-in	Available in MegAlign Pro module under Align → Align using Muscle	^[14]
MacVector	Built-in	Found under Align	^[15]
Sequencher (Gene Codes Corp.)	Plug-in	Requires installation of MUSCLE plugin; accessed under Assemble → Align Using → Muscle	^[16]
MEGA	Plug-in	Found under Alignment Explorer → Align → MUSCLE	^[17]
UGENE	Plug-in	Accessible by right-clicking → Align → Align with Muscle	^[18]
AliView	Plug-in	Requires installation of MUSCLE plugin; accessible by Preferences → External Tools → Set path to MUSCLE	^[19]
EMBL-EBI	Web tool	Accessible via the MUSCLE Web Interface	^[11]
T-Coffee	Web tool	Accessible by opening the more options menu and selecting muscle under the methods section	^[20]

References

1. Edgar RC (2004). "MUSCLE: multiple sequence alignment with high accuracy and high throughput". Nucleic Acids Research. 32 (5): 1792–97. doi:10.1093/nar/gkh340. PMC 390337. PMID 15034147.
2. Edgar RC (2004). "MUSCLE: a multiple sequence alignment method with reduced time and space complexity". BMC Bioinformatics. 5 (1): 113. doi:10.1186/1471-2105-5-113. PMC 517706. PMID 15318951.
3. Edgar RC (2022). "Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny". Nature Communications. 13: 6968. doi:10.1038/s41467-022-34630-w. PMC 9664440.
4. "Curriculum Vitae". drive5.com. Retrieved 24 April 2025.
5. Edgar R (3 September 2014). "An Unemployed Gentleman Scholar". Retrieved 24 April 2025.
6. "Robert C. Edgar". scholar.google.com. Retrieved 24 April 2025.
7. Edgar RC (19 August 2004). "MUSCLE: a multiple sequence alignment method with reduced time and space complexity". BMC Bioinformatics. 5 (1): 113. doi:10.1186/1471-2105-5-113. ISSN 1471-2105. PMC 517706. PMID 15318951.
8. "Muscle user guide". scholar.google.com. Retrieved 22 April 2025.
9. Zhang C (29 November 2024). "The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction". Biomolecules. 14 (12): 1531. doi:10.3390/biom14121531. PMC 11673352. PMID 39766238.
10. Pais F (6 March 2014). "Assessing the efficiency of multiple sequence alignment programs". Algorithms for Molecular Biology. 9: 4. doi:10.1186/1748-7188-9-4. PMC 4015676. PMID 24602402.
11. "MUSCLE < Multiple Sequence Alignment < EMBL-EBI". Archived from the original on 18 January 2015. Retrieved 1 September 2014.
12. CodonCode Aligner: Sequence Alignment Software for DNA Data
13. Geneious: Multiple Alignment using MUSCLE
14. DNASTAR: MUSCLE alignment options
15. Alignments in MacVector
16. Sequencher: MUSCLE MSA
17. MEGA Help: MUSCLE Alignment
18. UGENE: MSA with Muscle
19. AliView: About
20. T-Coffee: Tutorial
