Comparative sequence analyses of the neurotoxin complex genes in Clostridium botulinum serotypes

Neurotoxin complex (NTC) genes are arranged in two known clusters, the hemagglutinin (HA) cluster and the open reading frame X (ORFX) cluster. NTC genes were analyzed in the four serotypes of Clostridium botulinum that cause human botulism: A, B, E, and F. Analysis of the amino acid sequences of NT genes demonstrated significant differences among subtypes and among the four serotypes. The phylogram of NT genes reveals that serotype A1 is much closer to serotype B1 than to serotypes E1 and F1. In contrast, the non-toxic non-hemagglutinin (NTNH) gene is highly conserved among the four serotypes. The phylogram of the NTNH gene reveals that serotypes A and F are more closely related than serotypes B and E. Additionally, the sequences of the HA and ORFX genes are very divergent, but these genes are specific to subtypes and serotypes of Clostridium botulinum. Information derived from sequence analyses of the NTC has direct implications for the development of detection tools and therapeutic countermeasures for botulism.


Introduction
Botulism is a neuroparalytic disease caused by seven antigenically distinct serotypes of neurotoxins (A-G) produced primarily by Clostridium botulinum. Human botulism is caused by toxins from serotypes A, B, E, and F, with the vast majority of cases due to serotypes A and B. Botulinum neurotoxins (BoNTs) are highly poisonous and have great potential as biothreat agents.1,2 Thus, BoNTs are listed as category A select agents by the Centers for Disease Control and Prevention (http://emergency.cdc.gov/bioterrorism).3 BoNTs are produced as a progenitor toxin complex consisting of neurotoxin (NT) and neurotoxin-associated proteins (NAPs), including hemagglutinins (HAs), nontoxic non-hemagglutinin (NTNH), and other uncharacterized components such as P21, P47, bot-R, and ORFX1-3. Neurotoxin complexes (NTCs) vary in size [viz. 900 kDa Large Large-Toxin Complex (LL-TC), 500 kDa Large-Toxin Complex (L-TC), and 300 kDa Medium-Toxin Complex (M-TC)], which reflects the number, ratio, and presence or absence of individual components. The production of botulinum NTC is known to vary with serotype, strain, medium composition, and culture conditions. LL-TC has only been produced by serotype A. L-TC and M-TC are produced by serotypes A-D and G strains, while M-TC is produced by serotype E and F strains. The biological and structural roles of the NAPs are not fully characterized, but they are believed to protect BoNT from digestive enzymes and adverse conditions in the gastrointestinal tract.4,7-13
The HA cluster and the open reading frame X (ORFX) cluster are the two known primary gene clusters; they vary among serotypes and subtypes and are located upstream of the toxin gene. The HA cluster encodes HA-17, HA-33, HA-70, bot-R, and an NTNH. The ORFX cluster encodes ORFX-3, ORFX-2, ORFX-1, P-47, P-21, and an NTNH. The function of the ORFX genes and their role in the neurotoxin cluster are still unknown, but they are assumed to perform a role similar to that of their HA counterparts because the location and orientation of the ORFX and HA cluster genes relative to the NT gene in the neurotoxin cluster are analogous.8 Over the past decade, a significant amount of BoNT gene cluster sequence data has been analyzed. Comparative sequence analysis of the NT gene clusters revealed interesting phylogenetic relationships both within and among serotypes.7,8,11 To our knowledge, no report presents a complete overview of sequence comparison of neurotoxin gene clusters among serotypes A, B, E, and F. In this communication, we explore the sequence diversity of neurotoxin gene clusters of C. botulinum by comparing the amino acid sequences of NT and NAP genes in serotypes A, B, E, and F. The results indicate marked sequence diversity in neurotoxin gene clusters within subtypes and serotypes of C. botulinum.

Materials and Methods
GenBank accession numbers from the National Center for Biotechnology Information were used to obtain amino acid sequences from fourteen C. botulinum strains, including subtypes of serotypes A, B, E, and F (Table 1). Amino acid sequences of BoNT and NAP

Results and Discussion
The homology analysis of BoNT, NTNH, and three HA genes (HA-17, HA-34, and HA-70) within and between the five subtypes of serotype A is shown in Table 2. Among the four strains of subtype A1, three show 100% sequence identity for the NT and NTNH genes; the exception is strain 62A, in which a 1% difference was observed in the NT and NTNH genes because of a single mutation in each gene (Table 2). Analysis of the three HA genes revealed 100% sequence identity in all four strains. Next, the NT and NTNH genes were compared across the five subtypes of serotype A (A1-A5). The Kyoto-F (subtype A2) and 657 Ba4 (subtype A4) strains showed 89% and <78% sequence identity in the NT and NTNH genes, respectively, compared to the Hall-H strain of subtype A1. The NT and NTNH genes from the Loch Maree strain (subtype A3) showed 84% and 79% sequence identity, respectively, to the Hall-H strain. The sequence identity of the NT, NTNH, and HA genes of the IBCA94-0216 strain (subtype A5) to the Hall-H strain of subtype A1 was 90%, 97%, and 91-97%, respectively (Table 2).
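The percent identities reported above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned (gaps written as "-") and uses one common convention, identical residues divided by columns in which neither sequence has a gap. The exact scoring used for Table 2 is not stated in the text, so this is an assumption, and the function name and toy sequences below are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumed convention: columns where either sequence has a gap are skipped;
    identity = matches / compared columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped column: not counted under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences (hypothetical, not taken from any C. botulinum strain).
seq_a = "MKTLLV-A"
seq_b = "MKSLLVGA"
print(round(percent_identity(seq_a, seq_b), 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are present; the convention should be reported alongside the numbers.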
The amino acid sequence of the NT gene in serotype A was 37-39% identical to those of serotypes B, E, and F. However, the sequence of the NT gene in serotype E was 64% identical to that of serotype F (Table 3). The phylogram of NT genes revealed that serotype A1 is much closer to serotype B1 than to serotypes E1 and F1 (data not shown). A report described several noteworthy observations between the four subtypes (A1-A4) of type A, including highly conserved catalytic domain residues of the light chain (LC) and divergent regions in the protease nicking site, the amino terminal of the translocation domain (HN), and the amino terminal of the receptor binding domain (HCN).14 Therefore, the amino acid sequences of the toxin gene were analyzed in detail, including the catalytic domain of LC, the protease nicking site, and the HN and HCN domains, for the four serotypes. The catalytic domain of LC (residues His 223, Glu 224, His 227, Glu 261, Glu 262, Glu 351, Arg 362, and Tyr 366) was completely conserved among the four serotypes. The disulfides between LC and HC are also highly conserved for four subtypes. The protease nicking site (residues 436-444) in serotype A1 was compared to serotypes B, E, and F. We noticed that residues 436-439 are divergent. Residues 440-444 are present only in serotype A and absent in serotypes B, E, and F. The variation in the sequence of the protease nicking site may account for the differences in formation of the double chain. We have also [...] -A4).14 Sequences of four variable regions of serotype A showed significant sequence diversity compared to serotypes B, E, and F. Recently, a paper described a comparison of four specific peptides (residues 925-957, 967-1013, 1051-1069, and 1275-1296) of BoNT/A1, /A2, and /A5 that are known to be targets for antibody neutralization.15 These four specific polypeptides were analyzed in serotypes B, E, and F and compared to serotype A.
Results of sequence comparisons of these four polypeptides showed 66-80% sequence identity between serotypes E1 and F1. Sequence identity of 40-65% was observed for serotypes B1, E1, and F1 compared to serotype A1, with the exception of polypeptide 1051-1069, which shows only 22-27% sequence identity for serotype B compared to serotypes E and F (Table 3). Variability in the sequences of the LC, HN, and HCN regions was observed, and this may be responsible for differences in the structural configuration of NT because these residues are located in the belt region, at the HCN/HN interface, and at the N-terminus of the HCN. The diversity in the sequence of the NT gene within serotypes and subtypes can account for significant differences in protection with vaccines and therapeutic agents.12,16 Due to the universal presence of the NTNH gene in C. botulinum, this gene was analyzed in the four serotypes.10 Comparison of the amino acid sequences of the NTNH gene showed that serotype A was 68-83% identical to serotypes B, E, and F. The sequence identity for serotype B was 62-65% compared to serotypes E and F, and it was even higher, up to 75%, between serotypes E and F (Table 3). We also noticed that the ORFs of the NTNH gene for serotypes A and B consist of 1193 and 1197 amino acid residues, respectively. However, the ORFs of the NTNH gene for serotypes E and F consist of 1163 residues and lack 30-34 amino acid residues compared to serotypes A and B. The phylogram of NTNH genes from the four serotypes reveals that serotypes A and F are more closely related than serotypes B and E (data not shown). Based on the sequence homology data, the NTNH gene appears to have higher sequence homology than the NT gene among the four serotypes (Table 3). The NTNH gene has been explored as a useful tool to rule out the presence of C. botulinum because of its universal nature in NT gene clusters.17,18 Furthermore, ORFX genes were analyzed in serotypes A, E, and F.
Comparison of ORFX genes revealed 78-95% sequence identity in subtypes A2-A4. Among the three ORFX genes, the highest sequence identity was observed for ORFX-3 (92-95%) between subtypes A2-A4 (data not shown). We also observed that serotype A (subtypes A2-A4) shows 78-97% sequence identity for ORFX genes to serotype F and 48-77% to serotype E. Analysis of ORFX genes in serotypes E and F showed 50-77% sequence identity. Based on the ORFX sequence data, serotype A resembles serotype F more than serotype E. We were also interested in comparing the sequences of the HA-17, HA-33, and HA-70 genes to the ORFX-1, ORFX-2, and ORFX-3 genes because they are counterparts in NT gene clusters. Sequence comparison showed that HA genes have <15% identity to ORFX genes within serotypes (data not shown). This clearly indicates the diverse nature of the sequences of HA and ORFX genes even though they are counterparts in NT gene clusters.
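Region-wise comparisons such as the residue ranges quoted above are numbered on one reference sequence, so each residue number has to be mapped to an alignment column before identity is computed. The following is a minimal sketch of that mapping, assuming pre-aligned sequences with "-" for gaps and the same gap-skipping identity convention as an assumption. The function names and toy sequences are hypothetical and do not come from the study.

```python
def column_of_residue(aligned_ref: str, residue_number: int) -> int:
    """Return the 0-based alignment column holding the given 1-based
    residue of the (ungapped) reference sequence."""
    seen = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("residue number is beyond the reference length")


def region_identity(aligned_ref: str, aligned_other: str,
                    start: int, end: int) -> float:
    """Percent identity over reference residues start..end (inclusive).

    Assumed convention: columns with a gap in either sequence are skipped.
    """
    lo = column_of_residue(aligned_ref, start)
    hi = column_of_residue(aligned_ref, end)
    matches = 0
    compared = 0
    for r, o in zip(aligned_ref[lo:hi + 1], aligned_other[lo:hi + 1]):
        if r == "-" or o == "-":
            continue
        compared += 1
        if r == o:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns in the requested region")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences): the reference has a gap at
# column 3, so reference residue 4 sits in alignment column 4.
ref = "MKT-LLVA"
other = "MKTALIVA"
print(column_of_residue(ref, 4))
print(round(region_identity(ref, other, 4, 6), 1))
```

The same column lookup can be used to check whether a single reference position, such as a catalytic residue, carries the same amino acid in every aligned sequence.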

Conclusions
In conclusion, we have analyzed the amino acid sequences of NT gene clusters in serotypes A, B, E, and F of C. botulinum. The sequence of the NT gene is highly divergent among subtypes and serotypes. However, the sequence of the NTNH gene is highly conserved compared to NT among serotypes. Analysis of HA and ORFX genes revealed sequence diversity among subtypes and serotypes. Information derived from sequence analyses of neurotoxin gene clusters has direct implications for the development of therapeutic countermeasures and detection tools for botulism.
genes were aligned individually with the Clustal W2.1 program of the European Bioinformatics Institute. Four strains (Hall-H, Hall-A, ATCC-3502, and 62A) were analyzed for NT, NTNH, and HA genes within subtype A1. Comparative sequence analysis of NT and NAP genes was performed within serotypes A, B, E, and F. Phylogram trees of BoNT and NTNH were also generated for the four serotypes with the Clustal W2.1 program.
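To illustrate how a tree topology can follow from pairwise distances (for example, 1 minus fractional identity), here is a minimal UPGMA sketch in plain Python. UPGMA is used only because it is short; it is not claimed to be the clustering procedure run in Clustal W2.1 for this study. The function name, sequence labels, and distance values are hypothetical toy inputs.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster items with UPGMA and return a Newick-like topology string.

    names: list of labels; dist: symmetric matrix (list of lists) of
    pairwise distances. Branch lengths are omitted for brevity.
    """
    # cluster id -> (label, number of leaves in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        i, j = sorted(min(d, key=d.get))
        (label_i, n_i), (label_j, n_j) = clusters[i], clusters[j]
        new = next_id
        next_id += 1
        # Distance to the merged cluster is the size-weighted average.
        for k in clusters:
            if k in (i, j):
                continue
            d_ik = d[frozenset((i, k))]
            d_jk = d[frozenset((j, k))]
            d[frozenset((new, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involves the two merged clusters.
        d = {pair: v for pair, v in d.items()
             if i not in pair and j not in pair}
        del clusters[i], clusters[j]
        clusters[new] = (f"({label_i},{label_j})", n_i + n_j)
    return next(iter(clusters.values()))[0]


# Toy distances (hypothetical): seq1/seq2 and seq3/seq4 form close pairs.
labels = ["seq1", "seq2", "seq3", "seq4"]
toy = [
    [0.0, 0.2, 0.6, 0.6],
    [0.2, 0.0, 0.6, 0.6],
    [0.6, 0.6, 0.0, 0.3],
    [0.6, 0.6, 0.3, 0.0],
]
print(upgma(labels, toy))  # ((seq1,seq2),(seq3,seq4))
```

UPGMA assumes a constant rate of change along all lineages; trees built by other methods, such as neighbour-joining, can differ in topology for the same distances.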