Main

L. pneumophila is the causative agent of legionellosis, an atypical pneumonia that can be fatal if not treated promptly1. This Gram-negative facultative intracellular pathogen can adapt to both the aquatic environment and the intracellular milieu of phagocytic cells of a human host2. When inhaled in contaminated aerosols, L. pneumophila can reach the alveoli of the lung, where it is engulfed by macrophages. In contrast to most bacteria, which are destroyed there, L. pneumophila can multiply inside its phagosome and eventually kill the macrophage, resulting in legionellosis3.

L. pneumophila and other Legionella species are inhabitants of natural aquatic biotopes and of man-made water systems, such as air-conditioning cooling towers3. Legionella has been detected by culture in 40%, and by PCR in up to 80%, of freshwater environments, where it is known to survive and to replicate intracellularly in free-living protozoa, often within aquatic biofilms4. Its ability to exploit the basic cellular mechanisms of numerous protozoal eukaryotic hosts also enables Legionella to infect human cells5. Indeed, the capability of L. pneumophila to multiply intracellularly in amoebae has been shown to contribute to disease, although little is known about the mechanisms that govern host-microbe interactions. By contrast, the biphasic life cycle of L. pneumophila, its change from parasitic replicative cells to extracellular transmissive forms, and the complex regulatory network that governs this change are partly understood6.

The genus Legionella comprises 48 species, but over 90% of clinical cases of legionellosis are caused by L. pneumophila and, even more strikingly, up to 84% are caused by L. pneumophila serogroup 1 (ref. 7). To provide insight into the genetic characteristics of L. pneumophila, and to identify properties that have been selected in niches that are specific for the pathogenicity and the life cycle of L. pneumophila, we have determined the complete genome sequences of two clinical isolates of serogroup 1, strains Paris and Lens. Strain Paris is the only endemic strain known so far, accounting for 12.7% of cases of legionellosis in France and 33% of those occurring in the Paris area8. It is associated with hospital- and community-acquired forms of the disease that occur as outbreaks or sporadic cases. From November 2003 to January 2004, strain Lens caused an outbreak of 86 cases resulting in 17 deaths in northern France, suggesting that it is particularly successful in causing disease in humans. Comparative genomic assessment of an endemic and an epidemic isolate provides the basis for understanding strain specificities and might provide clues to the particular adaptability and stability of L. pneumophila strain Paris.

Results

General features

L. pneumophila strains Paris and Lens each contain one circular chromosome, of 3,503,610 bp and 3,345,687 bp, respectively, with an average G+C content of 38% (Table 1 and Fig. 1). L. pneumophila Paris contains a plasmid of 131,885 bp, and strain Lens contains a plasmid of 59,832 bp.

Table 1 General features of the two Legionella pneumophila genomes
Figure 1: Circular genome map of L. pneumophila strain Paris and specific genes of L. pneumophila strain Lens.

From the outside in, the first circle shows strain Paris genes on the + and − strands. The red bar indicates an inversion in strain Lens. Green indicates strain Paris genes, black indicates rRNA operons, and red indicates known virulence genes as follows: 1, lvh-lvr type IV secretion system (lvrABC, lvhB2B3B4B5, lvrD, lvhB6B8B9B10B11D4, lvrE); 2, dot-icm type IV secretion system (icmTSRQOMLKEGCDJBF); 3, mip; 4, lspA; 5, lspDE; 6, htrA; 7, lspFGHIJK; 8, enhABC; 9, dot-icm type IV secretion system (icmVWX and dotABCD); 10, momp. The second circle shows genes specific for strain Lens with respect to strain Paris. The third circle shows the G/C skew, (G−C)/(G+C), of strain Paris. The fourth circle shows the G+C content of strain Paris: light yellow, <32.5% G+C; yellow, 32.5–44.1% G+C; dark yellow, >44.1% G+C. The scale (in Mb) is indicated on the outside, with the origin of replication at position 0.
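The G+C content and G/C skew tracks in Fig. 1 are statistics computed over windows along the chromosome. A minimal Python sketch of how such tracks can be computed follows. The window size and step, the treatment of the circular chromosome as a linear string, and all function names are assumptions for illustration. They are not the settings used to draw Fig. 1. Only the 32.5% and 44.1% color thresholds come from the legend.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Return G+C as a percentage of the window length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """Return the G/C skew (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def gc_color(percent_gc: float) -> str:
    """Bin a window using the Fig. 1 thresholds (32.5% and 44.1% G+C)."""
    if percent_gc < 32.5:
        return "light yellow"
    if percent_gc <= 44.1:
        return "yellow"
    return "dark yellow"


def windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs; the sequence is treated as linear (assumption)."""
    for start in range(0, max(len(seq) - size, 0) + 1, step):
        yield start, seq[start:start + size]


if __name__ == "__main__":
    # Hypothetical toy sequence; real input would be the chromosome sequence.
    toy = "ATATATATGGGGCCGCATATAT"
    for start, win in windows(toy, size=10, step=6):
        pct = gc_content(win)
        print(start, round(pct, 1), round(gc_skew(win), 2), gc_color(pct))
```

Plotting the skew cumulatively along a real chromosome is a common way to locate the replication origin and terminus, where the skew changes sign.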

We identified 3,077 genes in the chromosome of L. pneumophila Paris and 2,932 in that of L. pneumophila Lens (Supplementary Table 1 online). No function could be predicted for 42.1% (1,354) of the L. pneumophila Paris and 44.1% (1,320) of the L. pneumophila Lens genes, a proportion similar to that found in most other sequenced bacterial genomes. A high proportion of the predicted genes (21% in strain Paris and 20.4% in strain Lens) are unique to the genus Legionella and thus might encode Legionella-specific functions.

Exploitation and modulation of host cell functions

An intriguing issue is how Legionella subverts host functions to enter, survive, replicate and evade amoebae or alveolar macrophages. L. pneumophila encodes an abundance of eukaryotic-like proteins in its genome. Indeed, 30 genes encode proteins with high similarity to eukaryotic proteins (Table 2), and 32 genes encode proteins with eukaryotic domains that are implicated in protein-protein interactions (Table 3). We highlight here proteins that are predicted either to divert eukaryotic regulatory pathways or to be secreted into eukaryotic cells, making them strong candidates for directing Legionella invasion, its trafficking in the host cell, and its modulation or evasion of host cell functions.

Table 2 L. pneumophila proteins with the highest similarity score to eukaryotic proteins
Table 3 L. pneumophila proteins encoding domains preferentially found in eukaryotic proteins

Tetratricopeptide repeats (TPRs) are degenerate 34-amino-acid motifs that occur in tandem arrays of 3–16 motifs and form scaffolds that mediate protein-protein interactions. TPR proteins contribute to cell cycle control, transcription repression, the stress response, protein kinase inhibition, mitochondrial and peroxisomal protein transport, and neurogenesis9. Sel-1 repeats represent a subfamily of TPR sequences. In L. pneumophila, five proteins containing Sel-1 domains were identified. Two of these (EnhC, LidL) have been previously implicated in interactions with host cells or in the early signaling events that regulate L. pneumophila trafficking decisions in macrophages10,11. The three newly identified Sel-1 proteins are therefore also likely to be involved in host-pathogen interactions (Table 3).

After internalization, L. pneumophila manipulates the host endosomal-lysosomal degradation pathway to survive and to replicate in a vacuole derived from the endoplasmic reticulum (ER). One L. pneumophila protein thought to contribute to ER recruitment, RalF, contains a eukaryotic Sec7 domain. RalF, a substrate of the type IV secretion system, is required for association of the regulatory protein ARF with L. pneumophila phagosomes12. Both L. pneumophila strains encode three eukaryotic-like serine/threonine protein kinases (STPKs; Table 3). A phylogenetic comparison of kinase domains from L. pneumophila Paris and other prokaryotic and eukaryotic STPKs showed that Lpp2626 and Lpp1439 cluster with the eukaryotic STPKs, close to STPKs from Entamoeba histolytica (Fig. 2). Mycobacterium tuberculosis, which like L. pneumophila blocks phagosome-lysosome fusion, produces 11 such eukaryotic-like STPKs13. In particular, the M. tuberculosis STPK PknG inhibits phagosome-lysosome fusion and promotes intracellular survival14. The STPK domain of Lpp0267 and Lpl0262 (of L. pneumophila strains Paris and Lens, respectively) is closely related to PknG and to the Yersinia pseudotuberculosis STPK YpkA (Fig. 2). YpkA is translocated into eukaryotic cells, where it counteracts host defense by interfering with eukaryotic signal transduction pathways15. This finding suggests that the L. pneumophila STPKs may also modulate eukaryotic signal transduction and modify host cell trafficking pathways.

Figure 2: Phylogenetic tree of kinase domains from Legionella pneumophila strain Paris and other prokaryotic and eukaryotic kinases constructed by the program MEGA.

The calculation was done by using Poisson correction as the distance method and neighbor joining as the tree construction method. The value of 0.2 on the scale bar indicates two amino acid substitutions per ten sites. The Nrprot accession numbers, gene names and organism names are shown. Numbers indicate bootstrap values.
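The two steps named in the legend are Poisson-corrected distances followed by neighbor joining. They can be sketched as follows. This is an illustrative reimplementation, not MEGA's code. Two details are assumptions made here: sites with a gap in either sequence are skipped, and negative branch lengths are not clamped. The five-taxon matrix is a hypothetical textbook example.

Under the Poisson correction, d = −ln(1 − p), where p is the proportion of differing sites. A branch of 0.2 therefore corresponds to an estimated 0.2 substitutions per site.

```python
import math
from typing import Dict, List, Tuple


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein sequences.

    p is the proportion of differing sites; sites with a gap ('-') in either
    sequence are skipped (pairwise deletion, an assumption made here).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def neighbor_joining(
    names: List[str], d: List[List[float]]
) -> List[Tuple[str, str, float, float]]:
    """Return the joins made by neighbor joining as (x, y, branch_x, branch_y).

    Internal nodes are named node1, node2, ...; the final entry is the edge
    linking the last two nodes, with its whole length given as branch_x.
    """
    dist: Dict[Tuple[str, str], float] = {
        (x, y): d[i][j] for i, x in enumerate(names) for j, y in enumerate(names)
    }
    nodes = list(names)
    joins: List[Tuple[str, str, float, float]] = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(dist[x, y] for y in nodes) for x in nodes}
        pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]]
        # Q(x, y) = (n - 2) d(x, y) - r(x) - r(y); join the pair minimizing Q.
        x, y = min(pairs, key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        bx = 0.5 * dist[x, y] + (r[x] - r[y]) / (2 * (n - 2))
        by = dist[x, y] - bx
        u = f"node{len(joins) + 1}"
        dist[u, u] = 0.0
        for z in nodes:
            if z not in (x, y):
                dist[u, z] = dist[z, u] = 0.5 * (dist[x, z] + dist[y, z] - dist[x, y])
        nodes = [z for z in nodes if z not in (x, y)] + [u]
        joins.append((x, y, bx, by))
    x, y = nodes
    joins.append((x, y, dist[x, y], 0.0))
    return joins


# Hypothetical five-taxon toy matrix (a textbook example, not Fig. 2 data).
TOY_NAMES = ["a", "b", "c", "d", "e"]
TOY_DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    for step in neighbor_joining(TOY_NAMES, TOY_DIST):
        print(step)
```

On the toy matrix, the first join pairs a and b, with branch lengths 2 and 3. Bootstrap values such as those in Fig. 2 would come from repeating the whole procedure on alignments resampled by column, which is not shown here.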

Twenty proteins contain ankyrin domains (Table 3), tandem repeats of around 33 amino acids that represent one of the most common modular protein-protein interaction motifs of eukaryotes. So far, the only prokaryotic genomes known to encode large families of ankyrin domain proteins are those of Coxiella burnetii and Wolbachia pipientis, which encode 13 and 23 members, respectively16,17. Similar to L. pneumophila, C. burnetii is an intracellular pathogen that is highly adapted for life in the eukaryotic phagolysosome, whereas W. pipientis is a parasitic 'endosymbiont' that lives inside the reproductive cells of various arthropods. Thus, ankyrin domains may be involved in a common microbial mechanism to manipulate the host cell physiology.

A possible function of the ankyrin repeat–containing proteins (hereafter called ankyrin proteins) of L. pneumophila is to modify interactions with the host cytoskeleton, because many eukaryotic ankyrin proteins are thought to function as linkers between membrane proteins and the cytoskeleton18 and are important for targeting proteins to the plasma membrane or to the ER. Ankyrin domains are also components of transcriptional regulators, which suggests that they can influence host cell gene expression, as has been proposed for Ehrlichia phagocytophila19. Indeed, one of the L. pneumophila ankyrin proteins (Lpp3991/Lpl0559) also contains a eukaryotic SET domain, which is known to bind host chromatin and to influence host cell gene expression20. In contrast to the ankyrin proteins of W. pipientis, none of the L. pneumophila ankyrin proteins contains a signal peptide17. Instead, some of them might be secreted via the type IV secretion system, a pathway that is independent of typical targeting signals.

The last stages in the intracellular life cycle of L. pneumophila involve the killing of and escape from the host cell, a mechanism that is not understood. A class of proteins that may affect the control of host cell division (Lpp2082/Lpl2072, Lpp2486, Lpp0233/Lpl10234) contains eukaryotic F-box domains, which constitute sites of protein-protein interactions. Eukaryotic F-box domains are typically associated with other interaction domains21. Consistent with this, two of the F-box proteins identified in L. pneumophila are associated with an ankyrin repeat or a coiled-coil motif (Table 3). F-box proteins assembled into SCF ubiquitin-ligase complexes determine which substrates will be targeted for ubiquitination and subsequent proteolysis by the proteasome. Because the targeted substrates include promoters and inhibitors of the cell cycle, as well as signal transduction components22, F-box proteins can regulate cell division and differentiation. To our knowledge, the only other prokaryotic F-box protein that has been described is Agrobacterium tumefaciens VirF, a protein that is thought to interact with host proteins through its F-box domain to target them for proteolysis23. Another motif implicated in eukaryotic ubiquitination is the U-box motif. Protein Lpp2887, present in strain Paris but not in strain Lens, contains such a motif. To our knowledge, it is the first one that has been identified in a prokaryotic organism.

Additional eukaryotic-like proteins identified in the genomes of both L. pneumophila strains are sphingosine 1-phosphate lyase (Lpp2128/Lpl2102) and two secreted apyrases (Lpp1033, Lpp1880/Lpl1000, Lpl1869). Their presence suggests that L. pneumophila may modulate the host cell cycle to its advantage. The widely expressed enzyme sphingosine 1-phosphate lyase catalyzes the essentially irreversible cleavage of the signaling molecule sphingosine 1-phosphate. This molecule is a product of sphingomyelin degradation and regulates cell proliferation and cell death in eukaryotes. Indeed, overexpression of sphingosine 1-phosphate lyase can induce apoptosis in eukaryotic cells. This identifies the enzyme as a dual modulator of sphingosine 1-phosphate and ceramide metabolism, as well as a regulator of cell fate decisions24.

In addition, sphingosine 1-phosphate lyase has a central role in the development of the amoeba Dictyostelium discoideum. Disruption of its gene results in aberrant actin distribution, an abnormal morphogenetic phenotype and increased viability during the stationary phase25. The two L. pneumophila apyrases are the only apyrases identified so far in prokaryotes. The apyrase protein family comprises enzymes that cleave nucleoside triphosphates (NTPs) and diphosphates (NDPs) in a Ca2+- or Mg2+-dependent manner. Apyrase has been isolated from autophagic vacuoles26. This suggests that the two L. pneumophila apyrases might influence the fate of the L. pneumophila phagosome by decreasing the concentration of NTPs and NDPs during intracellular parasitism.

We also identified 246 proteins (7.6%) in strain Paris and 231 proteins (7.7%) in strain Lens with predicted coiled-coil domains (Supplementary Table 2 online), many of which also show weak similarities to eukaryotic proteins. Coiled-coil domains mediate protein-protein interactions either for protein multimerization or for macromolecular recognition. Thus, coiled-coil domains may target proteins to their appropriate localization in the eukaryotic host. Notably, all nine substrates (SidA-SidH, SdeC) of the type IV secretion system27 and LidA, LepA and LepB contain coiled-coil domains.

To affect the eukaryotic cell, L. pneumophila must translocate these eukaryotic-like proteins to the host cytoplasm. These proteins are therefore candidate substrates for the type IV secretion system, as are VirF in A. tumefaciens28 and RalF in L. pneumophila12.

Secretion systems

Central to the pathogenesis of L. pneumophila are the dot and icm loci, which together direct assembly of a type IV secretion apparatus. Although both strains contain the complete dot-icm loci, their sequences show variations. Indeed, a previous comparison of the sequence of the dot-icm genes of 18 different L. pneumophila strains identified considerable sequence variation and placed the strains into seven different phylogenetic groups; however, no correlation with virulence was apparent29.

A previously unknown putative virulence factor of L. pneumophila, restricted to strain Paris, is a predicted autotransporter protein. Lpp0779 contains several hallmarks of type V secretion systems, including an N-terminal leader peptide for secretion across the inner membrane and a dedicated C-terminal domain that forms a pore in the outer membrane through which the passenger domain passes to the cell surface30. The passenger domain is composed of hemagglutinin repeats—which are known to be involved in cell-cell aggregation—that are highly similar to those of the Escherichia coli autotransporters AIDA-I and Ag43, two proteins that are implicated in virulence. The bacterial surface protein AIDA-I mediates adherence to mammalian cells31, whereas Ag43 not only confers a low level of adhesion to certain mammalian cells, but also promotes the autoaggregation that is important for biofilm formation32.

Similarly, the L. pneumophila autotransporter may be involved in adhesion to the host cell and in biofilm formation. In contrast to AIDA-I and Ag43, the L. pneumophila autotransporter does not possess an RGD motif, which is implicated in binding to human integrins, and thus may have another interaction domain. The autotransporter was probably acquired by horizontal gene transfer. Two features support this: the many remnants of insertion sequences upstream of the gene, and its G+C content of 41%, which exceeds the genome average of 38% (Supplementary Fig. 1 online). Studies on the distribution of this gene in clinical and environmental L. pneumophila strains, together with research into its function, should provide insight into its importance.

In addition, both L. pneumophila strains contain a twin-arginine translocation (Tat) secretion pathway (TatAB, TatC) and complete type I and type II secretion systems. The type II system is encoded by the lspA and lspD to lspJ genes. It is required for the secretion of several enzymes, all of which are present in both strains:

- lipases A and B (Lpp0533, Lpp1159/Lpl0509, Lpl1164)
- the acid phosphatases Map and SurE (Lpp1120, Lpp1245/Lpl1124, Lpl1245)
- lysophospholipase A (Lpp2291/Lpl2264)
- the phospholipase PlaB (Lpp1568/Lpl422)

Metabolism

The metabolic pathways used by L. pneumophila to multiply in eukaryotic cells are not known. The bacterium seems to prefer proteinaceous substrates, because its genome encodes a large set of oligopeptide and amino acid uptake and degradation systems. In particular, in addition to the Pseudomonas aeruginosa elastase homolog ProA (Lpp0532/Lpl0508), three secreted paralogous metalloproteases and 46 additional peptidases are present (Supplementary Table 3 online).

By contrast, systems for sugar uptake are rare, although the complete Embden-Meyerhof and Entner-Doudoroff pathways are present. No phosphotransferase-like uptake system was identified in either L. pneumophila strain. Some of the 55 ABC-type transport systems might nevertheless be involved in sugar uptake, because the bacterium does possess a few enzymes for degrading complex sugars. These include trehalase, polysaccharide deacetylase, a eukaryotic-like glucoamylase (Lpp0489/Lpl0465), β-hexosaminidase and chitinases (Lpp1117/Lpl1121). Both strains encode proteins that are highly homologous to glycerol phosphate ABC transporters (Lpp1696, Lpp1695, Lpp1694/Lpl1695, Lpl1694, Lpl1693) and a hexose phosphate transporter (Lpp2623/Lpl2474), which may be important during intracellular growth. We also identified several enzymes that are probably involved in myo-inositol use and that may interfere with host cell signaling mediated by this intracellular messenger.

L. pneumophila is predicted to encode an extensive aerobic respiratory chain. It consists of NADH dehydrogenase, cytochrome-dependent succinate dehydrogenase, ubiquinol-cytochrome c reductase and four terminal oxidases: an aa3-type cytochrome oxidase, two bd-type quinol oxidases and an o-type quinol oxidase. Together these probably allow the bacterium to cope with changing oxygen tension. The o-type quinol oxidase is absent from strain Lens. Systems involved in anaerobic respiration are apparently absent from both strains. Both strains encode a FoF1-type ATP synthase typical of γ-proteobacteria; in addition, strain Paris encodes a second ATP synthase similar to uncharacterized systems of archaea and marine bacteria.

L. pneumophila also encodes at least four Na+/H+ antiporters (Lpp1464, Lpp2448, Lpp0868, Lpp0667/Lpl1519, Lpl2304, Lpl0839, Lpl0651), which probably modulate the H+ and Na+ gradients across the cytoplasmic membrane. Thus, a Na+ motive force may be used for cellular activities. In this respect, the genome encodes MotY, a component of Na+-driven polar flagellar motors, and two markedly different motA-motB gene clusters. This combination suggests that motility might be powered by both Na+ and H+ motive forces. A particular feature of L. pneumophila is its differentiation into a mature intracellular form that accumulates inclusions of polyhydroxybutyrate as a carbon and energy reserve. In agreement with this, strain Paris encodes four and strain Lens encodes three paralogous polyhydroxybutyrate synthases (Lpp2323, Lpp2038, Lpp2214, Lpp0650/Lpl1055a and Lpl1055b, Lpl2186, Lpl0634).

Physiological adaptation and gene regulation

Consistent with its intracellular lifestyle, the regulatory repertoire of L. pneumophila is rather small. In strain Paris, genome analysis identified 92 (79 in strain Lens) transcriptional regulators, representing only 3.0% of the predicted genes (Supplementary Table 4 online). L. pneumophila encodes six putative sigma factors, the homologs of RpoD, RpoH, RpoS, RpoN, FliA and the ECF-type sigma factor RpoE. The number of two-component systems (13 histidine kinases, 14 response regulators) is also low.

The most abundant class of regulators belongs to the GGDEF-EAL family, with 23 members (Supplementary Table 4 online). These regulators are present in many bacteria, including Vibrio cholerae (41 members), P. aeruginosa (33), Wolinella succinogenes (26) and E. coli (19). They are characterized by two domains, GGDEF and EAL, which can occur alone or in combination. The 23 regulators identified in L. pneumophila divide as follows:

- 10 contain only a GGDEF domain.
- 3 in strain Paris and 2 in strain Lens contain only an EAL domain.
- 10 in strain Paris and 11 in strain Lens contain both domains.

The role of these regulators in L. pneumophila is unknown, but in other bacteria they are involved in aggregation, biofilm formation or twitching motility.

In L. pneumophila, cyclic AMP may also transduce cellular signals, because the genome encodes five class III adenylate cyclases (Lpp1446, Lpp1131, Lpp1704, Lpp1277, Lpp0730/Lpl1538, Lpl1135, Lpl1703, Lpl1276, Lpl0710). In P. aeruginosa, CyaB, a class III adenylate cyclase, is involved in regulating virulence genes33. L. pneumophila, however, does not contain an ortholog of Vfr, the cAMP-dependent regulator of P. aeruginosa, but it does encode five proteins with cAMP-binding motifs (Lpp3069, Lpp1482, Lpp2063, Lpp0611, Lpp2777/Lpl2926, Lpl1501, Lpl2053, Lpl0592a, Lpl0592b, Lpl0592c, Lpl2648). As in P. aeruginosa, these class III adenylate cyclases may sense environmental signals ranging from the nutritional content of the surrounding medium to the presence of host cells and control virulence gene expression accordingly.

High plasticity of the L. pneumophila genomes

The two genomes show remarkably high plasticity and diversity. Comparison of the chromosomes identified a conserved backbone of 2,664 genes, but also 280 and 428 (10 and 14%, respectively) strain-specific genes (Fig. 3 and Supplementary Tables 5 and 6 online). Given that the two strains analyzed belong to the same species and the same serogroup, this diversity is unexpected. For example, comparison of two strains of Salmonella enterica serovar Typhi identified only about 2% strain-specific genes in each genome34. The specific gene complement of strain Paris includes many regulators (three CsrA homologs, 13 transcriptional regulators). It also includes additional ankyrin and other eukaryotic-like proteins (Tables 2 and 3) and several restriction-modification genes (DNA modification methylases and endonucleases). The latter may explain the low competence (data not shown) and high genomic stability8 of strain Paris. Strain Lens contains only four specific regulators and four specific proteins with eukaryotic domains (Table 2), two of which are ankyrin proteins, suggesting that strain Paris is particularly well equipped.

Figure 3: Core genomes and unique gene complements of L. pneumophila strains Paris and Lens.

Orthologous genes were defined by reciprocal best-match FASTA comparisons. The threshold was set to a minimum of 80% sequence identity and a length ratio of 0.75–1.33.
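The ortholog definition above can be sketched in code, together with the resulting split into a core genome and strain-specific genes. The criteria are a reciprocal best match, at least 80% identity and a length ratio of 0.75–1.33. The hit records are assumed to come from a prior all-against-all FASTA or BLAST search, parsed into the hypothetical `Hit` tuple. Two details are our reading of the criteria, not a documented implementation:

- the 80% identity threshold is required in both search directions;
- the length ratio is taken between the two full-length proteins.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One parsed similarity-search hit (hypothetical record layout)."""
    query: str
    subject: str
    identity: float  # percent identity of the alignment
    score: float     # score used to rank hits for the same query


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit for each query."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return best


def reciprocal_orthologs(
    a_vs_b: Iterable[Hit],
    b_vs_a: Iterable[Hit],
    len_a: Dict[str, int],
    len_b: Dict[str, int],
    min_identity: float = 80.0,
    ratio: Tuple[float, float] = (0.75, 1.33),
) -> Set[Tuple[str, str]]:
    """Pairs (gene_a, gene_b) that are each other's best hit and pass both filters."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    pairs: Set[Tuple[str, str]] = set()
    for gene_a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is None or back.subject != gene_a:
            continue  # not a reciprocal best match
        if min(hit.identity, back.identity) < min_identity:
            continue
        if ratio[0] <= len_a[gene_a] / len_b[hit.subject] <= ratio[1]:
            pairs.add((gene_a, hit.subject))
    return pairs


def partition(genes_a: Set[str], genes_b: Set[str], pairs: Set[Tuple[str, str]]):
    """Split two gene sets into the shared core and the strain-specific remainders."""
    core_a = {a for a, _ in pairs}
    core_b = {b for _, b in pairs}
    return pairs, genes_a - core_a, genes_b - core_b
```

With real data, the sizes of the three returned sets correspond to the core and strain-specific counts of the kind shown in Fig. 3.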

The L. pneumophila genomes have undergone numerous genome rearrangements. Genome-wide synteny between strain Paris and strain Lens is disrupted by a 260-kb inversion, by a 130-kb insertion in strain Paris (or deletion in strain Lens) and by multiple smaller deletions and insertions (Supplementary Fig. 2 online). The 130-kb insertion is flanked by a tRNA gene and encodes a putative integrase, suggesting a structure similar to the pathogenicity islands of enterobacteria. It contains genes encoding an ATP synthase and chemiosmotic efflux systems (cebABC, cecABC), the genes cadA1, ctpA, copA1 and copA2, which encode ATP-dependent efflux pumps and have been shown to be induced in macrophages35, and the prpA-lvrABC gene cluster, which is present in a 65-kb pathogenicity island in strain Philadelphia36.

Except for the above-mentioned genes, this 65-kb pathogenicity island is absent from strains Paris and Lens; however, the corresponding chromosomal location in strain Paris is the insertion site of an integrative plasmid discussed below. Thus, these two regions might be hot spots for genomic rearrangements. Genomic variation is also evident from the array of mobile elements, which includes 10 integrases, 58 insertion sequences (34 complete and 24 truncated ones; Supplementary Table 7 online) and phage-related proteins. In addition, the genomes contain a large set of repeated sequences organized as inverted repeats, which are reminiscent of enterobacterial repeated intergenic consensus (ERIC) sequences. The Legionella or 'LeRIC' sequences fall into seven classes that are present in many copies (80, 18, 18, 25, 9, 9 and 6 in strain Paris) (Supplementary Fig. 3 online).
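In an inverted repeat such as the LeRIC elements, the reverse complement of a sequence occurs a short distance downstream, so the single strand between them can fold into a stem-loop. The exact-match scan below is a simplified illustration under assumed stem and loop lengths. It has three limitations:

- it does not reproduce the BLASTN-based search described in Methods;
- it tolerates no mismatches;
- it does not cluster hits into the seven LeRIC classes.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(
    seq: str, stem: int = 8, min_loop: int = 3, max_loop: int = 30
) -> List[Tuple[int, int, int]]:
    """Exact inverted repeats as (left_start, right_start, loop_length).

    A hit means seq[right:right + stem] is the reverse complement of
    seq[left:left + stem], separated by min_loop..max_loop bases, so the
    single strand could fold back into a stem-loop.
    """
    seq = seq.upper()
    hits: List[Tuple[int, int, int]] = []
    for left in range(len(seq) - stem + 1):
        arm = reverse_complement(seq[left:left + stem])
        for loop in range(min_loop, max_loop + 1):
            right = left + stem + loop
            if right + stem > len(seq):
                break
            if seq[right:right + stem] == arm:
                hits.append((left, right, loop))
    return hits


if __name__ == "__main__":
    # Hypothetical hairpin: stem ACGGTA, loop TTTT, then the reverse complement.
    print(find_inverted_repeats("ACGGTATTTTTACCGT", stem=6, min_loop=3, max_loop=10))
```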

L. pneumophila strains Paris and Lens contain lvh, a region that encodes a second type IV secretion system previously characterized in strain Philadelphia37. Notably, the lvh region of L. pneumophila Paris is encoded in a 36-kb region that is either integrated in the chromosome or excised as a multicopy plasmid (data not shown). This pattern is similar to that described for the 30-kb unstable element of strain Olda, which is possibly derived from phage and is involved in phase variation38. The lvh region has a G+C content (43%) that differs from that of the rest of the chromosome (38%), and it contains some phage-related genes, suggesting a possible phage origin. The exact mode of excision and integration is not, however, understood. An appealing hypothesis is that the integration and excision of particular regions of the chromosome constitute a specific mechanism of L. pneumophila to increase versatility.

The second plasmid of strain Paris (132 kb) includes known virulence factors, mobile genetic elements and antibiotic resistance genes. The two-component regulator system lrpR-lskS present on this plasmid has been found on a Legionella longbeachae plasmid implicated in virulence of this species39. The high conservation (93–98% protein identity) of the six genes sequenced on the 135-kb L. longbeachae plasmid might indicate recent horizontal transfer between L. pneumophila and L. longbeachae. L. pneumophila Lens contains a 60-kb plasmid that encodes several proteins homologous to the transfer region of the F plasmid of E. coli. All three plasmids of strains Paris and Lens encode a paralog of CsrA—a repressor of transmission traits and activator of replication40.

Although the role of plasmids in L. pneumophila virulence remains to be determined, correlation between strains containing a plasmid and virulence in a mouse model has been described41. In addition, L. pneumophila strains with plasmids seem to persist longer in the environment than do strains lacking plasmids42. The identification in both clinical isolates of plasmids encoding putative virulence factors is another indication of the importance of plasmids for pathogenicity of Legionella.

The L. pneumophila genomes also show plasticity at the gene level. The enh loci, implicated in the entry into host cells43, are present in strains Paris and Lens. One of the proteins encoded by these loci is RtxA, which contributes to entry, adherence, cytotoxicity, pore formation43 and intracellular trafficking in amoebae44. Unlike in strain AA100 (ref. 10), in strain Paris, rtxA is fused to arpB and a second gene with roughly 30 highly conserved tandem repeats of 549 bp. A similar structure is encoded by strain Lens; however, the latter strain contains two motifs in the repeated region, both of which differ from that of strain Paris although the number of repetitions seems to be the same (Fig. 4). It is possible that variations in the number and sequence of the repeats contribute to L. pneumophila versatility and to virulence.
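The roughly 30 copies of the 549-bp unit in the rtxA region illustrate tandem repeat counting. Copies can be counted by walking along the sequence in steps of the unit length. This sketch assumes that the repeat unit and start position are already known. It allows only substitutions (a Hamming-distance tolerance), not insertions or deletions, and the parameter values are illustrative.

```python
def count_tandem_copies(seq: str, unit: str, start: int = 0, max_mismatches: int = 0) -> int:
    """Count consecutive copies of `unit` in `seq` beginning at `start`.

    Each copy may differ from `unit` by at most `max_mismatches` substitutions
    (Hamming distance); insertions and deletions are not handled.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    k = len(unit)
    copies, pos = 0, start
    while pos + k <= len(seq):
        mismatches = sum(a != b for a, b in zip(seq[pos:pos + k], unit))
        if mismatches > max_mismatches:
            break
        copies += 1
        pos += k
    return copies


if __name__ == "__main__":
    # Hypothetical toy repeat; a real unit here would be 549 bp long.
    toy = "ACGT" * 3 + "ACGA" + "TTTT"
    print(count_tandem_copies(toy, "ACGT"), count_tandem_copies(toy, "ACGT", max_mismatches=1))
```

A mismatch tolerance matters for diverged copies such as the strain Lens repeats. With a strict exact match, counting stops at the first variant copy.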

Figure 4: Comparison of the Rtx protein–coding genes of L. pneumophila strains AA100, Paris and Lens.

(a) The sequence of the rtxA locus of strain AA100 was obtained from the National Center for Biotechnology Information database. Broken lines indicate that the correct number of repeats is uncertain. (b) Consensus sequences of the highly conserved repeated motifs of strains Paris and Lens. The 11 amino acids of the conserved N-terminal sequence of strains Paris and Lens are shown in black; amino acids in the repeated motifs of each strain are colored as in a. Underlined amino acids indicate positions that may differ among repeats.
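Fig. 4b shows consensus motifs and the positions that may differ among repeats. A simple majority-rule consensus over aligned repeat copies illustrates how these are obtained. The sketch assumes the copies are already aligned and of equal length. Ties are resolved by the first residue encountered, an arbitrary choice made here, and the toy peptides are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def consensus(copies: List[str]) -> Tuple[str, List[int]]:
    """Majority-rule consensus of aligned, equal-length repeat copies.

    Returns the consensus string and the 0-based positions at which at least
    one copy differs from the consensus (the variable positions).
    """
    if not copies or len({len(c) for c in copies}) != 1:
        raise ValueError("need one or more aligned copies of equal length")
    residues: List[str] = []
    variable: List[int] = []
    for i, column in enumerate(zip(*copies)):
        counts = Counter(column)
        # most_common breaks ties by first occurrence, an arbitrary choice here.
        residues.append(counts.most_common(1)[0][0])
        if len(counts) > 1:
            variable.append(i)
    return "".join(residues), variable


if __name__ == "__main__":
    # Hypothetical toy repeat copies, not the Paris or Lens motifs.
    print(consensus(["MKTAY", "MKSAY", "MKTAF"]))
```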

Consistent with the plasticity apparent in their genomes, the L. pneumophila strains encode type IV pili, organelles that are required for natural competence for DNA transformation45. The organization of the genes encoding type IV pili is similar to that in P. aeruginosa. There, these pili are crucial for bacterial adherence to and colonization of mucosal surfaces and for twitching motility. Another mechanism in L. pneumophila that contributes to genome plasticity is conjugative transfer of plasmids46 and chromosomal DNA47 mediated by type IV secretion.

Discussion

Analysis and comparison of the genome sequences of the clinical L. pneumophila strains Paris and Lens identify this bacterium as a highly versatile organism with extensive genome plasticity and diversity. The excision and integration of plasmids or genes might represent one mechanism that L. pneumophila exploits to adapt to different environments. Its large cohort of eukaryotic-like proteins is predicted to manipulate the host cell to the advantage of the pathogen (Fig. 5). Proteins of putative eukaryotic origin have also been identified in other host-associated bacteria, including Coxiella, Wolbachia, Agrobacterium, Mycobacterium and Ehrlichia. Nevertheless, L. pneumophila currently ranks as the prokaryote with the widest variety of eukaryotic-like proteins. Presumably, pathogenic L. pneumophila acquired the corresponding genes by horizontal transfer from its hosts, or these proteins arose by convergent evolution during its coevolution with free-living amoebae. These proteins may also contribute to the infection of human macrophages.

Figure 5: Steps in the intracellular growth of L. pneumophila in macrophages.

Four different steps are shown: 1, L. pneumophila adhesion to and invasion of the host cell; 2, recruitment of organelles and their conversion into a rough endoplasmic reticulum-like compartment (note that the phagosome does not fuse with lysosomes); 3, intracellular replication of non-flagellated L. pneumophila inside a phagosome; and 4, release of flagellated L. pneumophila. Red indicates steps that are important in the infectious cycle of L. pneumophila. Blue indicates newly identified proteins that could interfere at these steps in the cycle.

On the basis of these genome sequences, future comparative and functional studies should help define the survival tactics of intracellular parasites and identify the special attributes of endemic and epidemic L. pneumophila strains. Some L. pneumophila strains are increasingly resistant to the chemicals used to decontaminate public water systems, including those of hospitals. To combat this threat, the genome sequences could be used to identify targets for new biocides active against L. pneumophila.

Methods

DNA preparation and sequencing.

L. pneumophila strains Paris and Lens were grown on BCYE agar at 37 °C for 3 d, and chromosomal DNA was isolated by standard protocols. Cloning, sequencing and assembly were done as described48. For both genomes, two libraries (inserts of 1–2 and 2–3 kb) were generated by random mechanical shearing of genomic DNA, followed by cloning of the fragments into pcDNA-2.1 (Invitrogen). A scaffold was obtained by end-sequencing clones from a BAC library constructed as described49 using pIndigoBac (Epicentre) as a vector. For L. pneumophila strain Paris, we also constructed a medium-sized insert library (8–10 kb) in the low-copy-number vector pSYX34. Plasmid DNA purification was done with either a Montage Plasmid Miniprep96 kit (Millipore) or a TempliPhi DNA sequencing template amplification kit (Amersham Biosciences). Sequencing reactions were done with an ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction kit and a 3700 or a 3730xl Genetic Analyzer (Applied Biosystems). We obtained, assembled and finished 47,200 sequences for L. pneumophila strain Paris and 47,231 sequences for strain Lens from four libraries each as described48.

Annotation and analysis.

Coding sequences were defined and annotated as described48 by using CAAT-box software50. All predicted coding sequences were examined visually. Function predictions were based on BLASTP similarity searches and on the analysis of motifs using the PFAM, Prosite and SMART databases. We identified orthologous genes by reciprocal best-match BLAST and FASTA comparisons. To identify coiled-coil domains, we used the publicly available software PairCoil and Coilscan. Genes were classified as pseudogenes if they had one or more mutations that would prevent complete translation. Repetitive DNA sequences were identified by BLASTN comparisons of the intergenic regions and the complete genome. To predict the folding of single-stranded DNA molecules, we used MFOLD software.
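The pseudogene criterion above (mutations that would prevent complete translation) can be approximated by two simple checks on a predicted coding sequence:

- an in-frame stop codon before the last codon;
- a length that is not a multiple of three, as a frameshift would produce.

This is a simplified sketch that assumes the standard bacterial stop codons (TAA, TAG, TGA). In practice such calls are also checked against homologs and by the visual inspection described above.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard/bacterial code


def premature_stops(cds: str) -> List[int]:
    """0-based offsets of in-frame stop codons occurring before the final codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [3 * i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_like_pseudogene(cds: str) -> bool:
    """Flag a CDS whose length is not a multiple of 3 (possible frameshift)
    or that contains a premature in-frame stop codon."""
    return len(cds) % 3 != 0 or bool(premature_stops(cds))


if __name__ == "__main__":
    # Hypothetical toy CDSs: intact, premature stop, and frameshifted.
    for cds in ("ATGAAATAA", "ATGTAAAAATAA", "ATGAAAATAA"):
        print(cds, looks_like_pseudogene(cds), premature_stops(cds))
```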

URLs.

The sequence and the annotation of both L. pneumophila genomes are accessible at the LegioList Web Server (http://genolist.pasteur.fr/LegioList). For annotation and analysis we used PairCoil (http://paircoil.lcs.mit.edu/cgi-bin/paircoil) and Coilscan (http://www.biology.wustl.edu/gcg/coilscan.html). MFOLD server, http://www.bioinfo.rpi.edu/applications/mfold/old/dna/.

Accession numbers.

GenBank: L. pneumophila strain Paris, CR628336; L. pneumophila strain Lens, CR628337; L. pneumophila Paris plasmid, CR628338; L. pneumophila Lens plasmid, CR628339. GenBank protein: strain AA100 rtxA locus, AAD41583.

Note: Supplementary information is available on the Nature Genetics website.