MultiLocus Sequence Typing (MLST)
of Pathogenic Escherichia coli

Revised September 2002

Original version

Multilocus sequence typing (MLST) makes use of rapid sequencing technology to uncover allelic variants in conserved genes for the purpose of characterizing, subtyping, and classifying members of bacterial populations. It has been particularly useful in studying the population genetics of recombining bacterial pathogens, such as Neisseria meningitidis (Maiden et al., 1998) and Streptococcus pneumoniae (Enright & Spratt, 1998). MLST analysis has indicated that, in many species, recombinational replacements contribute more to clonal diversification than do point mutations and, in some species, recombination has been sufficiently frequent to eliminate any phylogenetic signal from gene trees (Feil & Spratt, 2001). One of the advantages of MLST over other molecular typing methods is that sequence data are portable between laboratories and have led to the creation of global databases that allow for exchange of molecular typing data via the Internet (Feil & Spratt, 2001).

We have developed an MLST system for Shiga toxin-producing E. coli (STEC), other diarrheagenic E. coli strains, and Shigella species and serovars using a stepwise approach.

First, we conducted nucleotide sequencing of 7 housekeeping genes using 20 representative serotypes of major ET groups of pathogenic E. coli, including Stx-producing strains of the EHEC groups and STEC groups that do not have the intimin (eae) locus. Phylogenetic analysis showed that the genetic relationships among the epidemic clones of enteropathogenic (EPEC), enterohemorrhagic (EHEC), and STEC strains are tree-like, with more than 70% of the polymorphic sites agreeing with a single phylogeny (Reid et al., 2000). The old protocols and original data are available (here).

Second, we used these sequence data to design new MLST primers. The new primers were located in conserved regions that encompass polymorphic informative sites and were spaced so that at least 500 bp of sequence could be obtained in a single pass on both strands for each locus. With the redesigned primers, the amount of sequence per reaction was doubled (because of the reduced overlap), so that we could examine more genes with the same number of reactions.

Third, we expanded the number of loci to 15 by choosing additional conserved housekeeping genes that are widely spaced around the chromosome. This information was used to assess the degree of variability among loci and to examine patterns of variability with position on the chromosome. The 15 MLST loci are on average 331 kb apart (range 8-692 kb) on the K-12 map. Comparison of the distances between adjacent loci in E. coli K-12 and Salmonella enterica Typhimurium LT-2 shows that the distances are highly correlated, with an average deviation of 43.6 kb (~13%) from identity. This observation indicates that gene order and position have been conserved since the divergence of E. coli and S. enterica from their common ancestor.

MLST based on 7 genes. By examining the full data on 15 loci, we selected 7 informative gene segments on which to base an MLST scheme. We sequenced these ~500 bp segments in 130 pathogenic strains, including major pathovars and Shigella serotypes. In the total 3,573 bp of sequence there are 360 variable sites, of which 263 are phylogenetically informative (Table 1). Most of the variable sites represent silent mutations: the ratio of synonymous to non-synonymous differences is ca. 40:1, indicating that these housekeeping genes are highly conserved in amino acid sequence. The number of alleles resolved per locus ranges from 25 to 32, indicating that there is sufficient variability, at least in principle, to resolve ca. 30^7 (roughly 2 x 10^10) 7-locus combinations of alleles, or sequence types (STs).
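The site counts and the 30^7 estimate above can be illustrated with a short Python sketch. It assumes that "phylogenetically informative" means parsimony-informative (at least two character states, each present in at least two sequences); the source does not state the exact criterion. The function name `count_sites` and the toy alignment are hypothetical and are not the study data.

```python
from collections import Counter

def count_sites(alignment):
    """Return (variable, informative) site counts for equal-length aligned sequences.

    Assumption: "informative" is taken to mean parsimony-informative, i.e. a
    column with at least two states that each occur in two or more sequences.
    """
    variable = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            variable += 1
            states_seen_twice = sum(1 for n in counts.values() if n >= 2)
            if states_seen_twice >= 2:
                informative += 1
    return variable, informative

# Invented toy alignment, for illustration only.
toy_alignment = ["ACGTA", "ACGTT", "ATGCA", "ATGTA"]
variable_sites, informative_sites = count_sites(toy_alignment)

# Upper bound on distinct 7-locus profiles with about 30 alleles per locus.
max_sequence_types = 30 ** 7
```

With the toy alignment, three columns vary but only one of them carries a state pattern shared by two or more sequences on each side, which is why informative sites are always a subset of variable sites.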

Among the 130 isolates examined, 75 STs were resolved. Nearly two-thirds of these belong to one of 15 groups, or clone complexes. These clone complexes were recognized both by BURST analysis of sequence types and by bootstrap analysis of concatenated sequence genotypes. The main clone complexes represent the epidemic strains that exist in the E. coli population and circulate to cause both sporadic cases and outbreaks of disease.
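As a rough illustration of how allele profiles become STs and how related STs can be gathered into groups, here is a minimal Python sketch. It is not the BURST program: it only applies a single-linkage rule in which two STs are linked when they share alleles at a minimum number of loci. The threshold of 5 of 7 loci, the function names `assign_sts` and `group_sts`, and the profiles are all assumptions made for the example.

```python
def assign_sts(profiles):
    """Map each distinct allele profile to an ST number, in order of first appearance."""
    st_of = {}
    for profile in profiles:
        st_of.setdefault(tuple(profile), len(st_of) + 1)
    return st_of

def group_sts(st_of, min_shared=5):
    """Single-linkage grouping: link two STs that share >= min_shared alleles."""
    profiles = list(st_of)
    parent = list(range(len(profiles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            shared = sum(a == b for a, b in zip(profiles[i], profiles[j]))
            if shared >= min_shared:
                parent[find(i)] = find(j)

    groups = {}
    for i, profile in enumerate(profiles):
        groups.setdefault(find(i), []).append(st_of[profile])
    return sorted(sorted(members) for members in groups.values())

# Invented 7-locus allele profiles, one per isolate (illustration only).
isolates = [
    (1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 1, 1, 1, 2),
    (1, 1, 1, 1, 1, 1, 1),  # same profile as the first isolate, so same ST
    (1, 1, 1, 1, 1, 3, 2),
    (5, 5, 5, 5, 5, 5, 5),
]
sequence_types = assign_sts(isolates)
complexes = group_sts(sequence_types)
```

In this toy set the five isolates resolve into four STs, three of which fall into one group while the fourth stands alone. A real analysis would also need the bootstrap support from concatenated sequences that the text describes.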

MLST primer redesign (arcA, aroE, icd, mdh, mtlD, pgi, rpoS)

Using sequence data previously obtained for 20 strains, the primers were redesigned using the computer programs Primer Designer and DNASTAR. Sequences were aligned and primers were designed in the conserved regions of each gene. In all cases, the K-12 sequence was used in Primer Designer and DNASTAR. Forward and reverse primers were designed separately using Primer Designer. All primers for a gene were then loaded into DNASTAR and the best primer pair that gave a 300-700 bp amplicon was found.

MLST primer design (aspC, clpX, cstA, cyaA, dnaG, fadD, grpE, lysP, mutS, uidA)

Using the published genomic sequences for K-12, EDL-933, and Sakai, the primers were designed using the computer programs Primer Designer and DNASTAR. Sequences were aligned and primers were designed in the conserved regions of each gene. In all cases, the K-12 sequence was used in Primer Designer and DNASTAR. Forward and reverse primers were designed separately using Primer Designer. All primers for a gene were then loaded into DNASTAR and the best primer pair that gave a 400-700 bp amplicon was found.
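The amplicon-size filter described above can be sketched in Python as follows. This does not reproduce what Primer Designer or DNASTAR actually do, and the source does not say how the "best" pair was judged; ranking by amplicon length is an assumption made for the example. The function name `pick_primer_pairs`, its parameters, and the coordinates are hypothetical.

```python
from itertools import product

def pick_primer_pairs(forward_starts, reverse_ends, min_len=400, max_len=700):
    """Return (forward_start, reverse_end, amplicon_length) tuples inside the size window.

    forward_starts: 1-based reference coordinates of the 5' end of each forward primer.
    reverse_ends:   1-based reference coordinates (top strand) of the 5' end of
                    each reverse primer, i.e. the last base of the amplicon.
    """
    pairs = []
    for start, end in product(forward_starts, reverse_ends):
        length = end - start + 1
        if min_len <= length <= max_len:
            pairs.append((start, end, length))
    # Assumption: prefer the longest amplicon that still fits the window.
    return sorted(pairs, key=lambda pair: -pair[2])

# Invented candidate primer positions on a reference gene (illustration only).
candidates = pick_primer_pairs([100, 250], [520, 900])
```

Passing `min_len=300` applies the 300-700 bp window used for the redesigned primers of the original seven genes.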

Table of MLST primers for 7 genes

Primers for additional loci for extended MLSA

MLST PCR protocols

MLST CEQ Quick Start protocol

MLST protocol summary

  1. Set up and run PCR according to the provided protocol.
  2. Run PCR products on a 1.5% agarose gel to confirm presence of the desired product.
  3. Purify PCR products using the QIAquick PCR purification kit according to the manufacturer’s instructions (purified product eluted into 30 µl of buffer EB).
  4. Quantify purified PCR products on a 1.5% agarose gel using a low DNA mass ladder (Gibco BRL).
  5. Set up and run sequencing PCR according to the provided protocol.
  6. Purify sequencing PCR products using the Millipore MultiScreen 96-well filtration plates according to the manufacturer’s instructions.
  7. Resuspend dried sequencing product in 40 µl of deionized formamide for use with a Beckman CEQ2000XL automated sequencer.

REFERENCES

Enright, M. C. & Spratt, B. G. (1998). A multilocus sequence typing scheme for Streptococcus pneumoniae: identification of clones associated with serious invasive disease. Microbiology 144: 3049-3060.

Feil, E. J. & Spratt, B. G. (2001). Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55: 561-590.

Maiden, M. C., Bygraves, J. A., Feil, E., Morelli, G., Russell, J. E., Urwin, R., Zhang, Q., Zhou, J., Zurth, K., Caugant, D. A., Feavers, I. M., Achtman, M. & Spratt, B. G. (1998). Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95: 3140-3145.

Reid, S. D., Herbelin, C. J., Bumbaugh, A. C., Selander, R. K. & Whittam, T. S. (2000). Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406: 64-67.