Candida glabrata MLST nomenclature changes

It has been necessary to change some of the ST numbers in the C. glabrata MLST scheme. This came about through an unforeseen issue: before the database was hosted on PubMLST, some STs had been assigned but not entered into the database. Some of these were subsequently included in the published literature.

Since this came to light, we have tried to minimise the impact and have retrospectively added the profiles that have been formally published. Where these had been assigned STs that have since been re-issued, we have had to change the ST on the later-assigned profiles.

The following timeline indicates the changes that have been made:

2016
Transfer of alleles, 68 STs, and isolates from mlst.net.
2016
STs 5 and 9 were not part of the data received, although they are mentioned in the literature. ST-5 and ST-9 have been blocked from future use.
2012-2016
ST numbers issued by mlst.net within this timeframe were not documented and were not passed to pubmlst.org during the data migration process.
08/2018
Novel alleles and sequence types from Lott et al. 2010 (PMID 20190071) and Lott et al. 2012 (PMID 21838617) were added to the sequence database.
08/2018
ST-83 in Lott et al. 2010 (PMID 20190071) is identical to ST-75 in Lott et al. 2012 (PMID 21838617). ST-83 was given preference due to the earlier publication date; consequently there is no ST-75 in the database, and the number is blocked from future use.
09/2018
Novel sequence types from Amanloo et al. 2018 (PMID 28482076) were added. Because the ST numbers used in this study partially overlap with those used in Lott et al. 2010 and Lott et al. 2012, ST-71 to ST-79 (designation in Amanloo et al. 2018) are included as ST-101 to ST-109 in the database. This is noted in the respective ST records.
10/2018
A total of 368 non-redundant isolates from Lott et al. 2010 (PMID 20190071, n=229) and Lott et al. 2012 (PMID 21838617, n=265) were added to the isolates database. The isolates included in these two studies partially overlap.
11/2018
A total of 50 isolates from Amanloo et al. 2018 (PMID 28482076) were added to the isolates database, with ST numbering as explained above.
11/2018
Added novel alleles, STs, and isolates from Bordallo-Cardona et al. 2019 (PMID 30397068) upon publication.
11/2018
Added novel alleles, STs, and isolates from Achmad et al. 2019 (PMID 30455247) in parallel to publication process.
12/2018
Added novel alleles, STs, and isolate information from Biswas et al. (PMID 30559734), Mushi et al. (PMID 30597052), and Bordallo-Cardona et al. (PMID 30397068) in parallel to publication process.
01/2019
Retrospectively added information from Sasso et al. (PMID 29580647). Novel FKS allele "X" was added to the database as FKS29; ST "X" is now ST166.
06/2019
Retrospectively added alleles, STs, and isolates from Healey et al. 2016 (PMID 27020939).
07/2019
Added NCBI_BioProject field to isolates table.
07/2019
Added novel alleles, STs, and isolates from five whole-genome-sequencing studies: Xu et al. 2016 (PMID 27713500; BioP PRJNA218162), Håvelsrud and Gaustad 2017 (PMID 28280017, BioP PRJNA297263), Vale-Silva et al. 2017 (PMID 28663342, BioP PRJNA374542), Carrete et al. 2018 (PMID 29249661, BioP PRJNA361477), and Barber et al. 2019 (PMID 30478162, BioP PRJNA483064). Novel STs for isolates “Norway 5 and 6” from Håvelsrud and Gaustad 2017 (PMID 28280017) are now ST137; novel STs for isolates “P35_2” and “P35_3” from Carrete et al. 2018 (PMID 29249661) are now ST136.
07/2019
Finalized isolate data from Biswas et al. 2018 (PMID 30559734) in parallel to publication process. Labels that were misplaced in the original Figure 1 have subsequently been corrected by the authors (PMID 31608038).
07/2019
Retrospectively added novel alleles, STs, and isolates from Biswas et al. 2017 (PMID 28344162, BioP PRJNA310957). Isolates CMRL-06, -07, and -08 were omitted due to ambiguous sites in our mapping obtained from data deposited at SRA.
07/2019
Added isolate data from Rivero-Menendez et al. 2019 (PMID 31285229) upon publication. Consecutive isolates are indicated by patient numbers.
08/2019
Reconstructed ST5 (5-7-8-1-3-6) and ST9 (1-2-2-7-2-1) from original publication (PMID 14662965) and mended records for isolates CE-02 (ST8→ST9) and CE-03 (ST3→ST5).
09/2019
Added novel alleles, STs, and isolates deduced from raw data deposited in SRA by Guo et al. 2019 (PMID 31059831, BioP PRJEB20459). Isolate Y1644 “from ATCC archive, isolated from Iowa, USA” was found to be ST10, and is therefore presumed to be ATCC90030 (= database isolate 1).
09/2019
Added 3 isolates from Carrete et al. 2019 (PMID 30809200; BioP PRJNA506893).
09/2019
Added 3 isolates deposited in SRA from Porto (Portugal) under BioP PRJNA525402 (2019).
09/2019
Added novel alleles, STs, and isolates of the CDC (USA) deduced from raw data deposited in SRA under BioP PRJNA329124 (2016) and PRJNA524686 (2019). The isolates were partially redundant, both with each other and with those already present in the database from Lott et al. (2010, 2012; PMID 20190071; PMID 21838617) and Healey et al. (2016; PMID 28018323). The following modifications were made to join the datasets:
  1. Three records were omitted: SRR8697269 (CAS11-3129), which displayed a frameshift in FKS2 due to an 8 bp insertion in our assembly, SRR8697391 (CAS08-0631), which did not have sufficient sequencing depth to determine the ST, and SRR8697473 (CAS08-0629), which had no matches to Cg MLST loci (isolate might be C. parapsilosis).
  2. One novel ST derived from BioP PRJNA329124 (ST169) and 11 derived from BioP PRJNA524686 (STs 179-189) were added.
  3. In BioP PRJNA329124, isolate names are given with underscores; these were replaced by dashes to allow matching with other datasets.
  4. For three isolates duplicate SRA entries were found: CAS08-0209, CAS08-0439, and CAS11-2978. Since deduced STs were identical, these were merged into single records each.
  5. Thirty-eight isolates were already present in the database by isolate name and could be traced back to the same original. Since the deduced STs were identical to those previously recorded, the SRA information was added to the pre-existing records.
  6. Six isolates (CAS08-0069, CAS08-0094, CAS08-0525, CAS08-0569, CAS08-0725, and CAS09-0869) were already present in the database by isolate name as above, but the genome sequencing-derived STs did not match those previously recorded. These datasets were introduced with the suffix "_GS" appended to the isolate name to flag the versions with genome sequencing-derived STs.
  7. In total, 26 novel isolates from BioP PRJNA329124 and 219 from BioP PRJNA524686 were added.
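The joining rules in steps 3 to 6 can be illustrated with a minimal Python sketch. It is not the procedure actually used by the curators: the function names, the record layout (isolate name mapped to ST), and the ST values in the example are hypothetical and serve only to show the logic of name normalisation, merging of identical records, and flagging of conflicts with "_GS".

```python
def normalise_name(name: str) -> str:
    """Replace underscores with dashes so names match across datasets (step 3)."""
    return name.replace("_", "-")


def join_records(existing: dict[str, int],
                 incoming: list[tuple[str, int]]) -> dict[str, int]:
    """Merge genome-derived (isolate name, ST) pairs into existing name -> ST records."""
    merged = dict(existing)
    for raw_name, st in incoming:
        name = normalise_name(raw_name)
        if name not in merged:
            # Novel isolate, or the first of several duplicate SRA entries (step 4).
            merged[name] = st
        elif merged[name] != st:
            # Conflicting ST: keep the old record and add a flagged one (step 6).
            merged[name + "_GS"] = st
        # Identical ST: no new record; SRA details would be attached
        # to the pre-existing record (steps 4 and 5).
    return merged


# Hypothetical ST values, chosen only for illustration.
existing = {"CAS08-0069": 3}
incoming = [("CAS08_0069", 16), ("CAS08_0209", 10), ("CAS08_0209", 10)]
result = join_records(existing, incoming)
print(result)
```

In this example the conflicting isolate ends up with two records (the original and the "_GS" version), while the duplicated SRA entry collapses into a single record.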
12/2022
Amended records for Biswas et al. 2018 with PMID and NCBI_BioProject numbers, and added missing isolates.
12/2022
Analyzed unknown STs from Arastehfar et al. (PMID: 34909054):
  • ST "X" in isolate DPL209 corresponds to ST215 (PMID: 28018323).
  • ST "Y" corresponds to ST16 and is merely mislabelled in Table 1. The isolates are already contained in the database from older studies.
12/2022
Retrospectively added isolates from published studies, including those where novel alleles and STs had previously been added during the respective publication processes:
12/2022
Added data from Jensen et al. (PMID: 26711776):
  • The sequence for the novel TRP1 allele in isolate RHJ_122 was no longer available from the authors; this isolate is therefore not represented in the database.
  • Added 8 novel STs.
  • Added 49 isolates.
12/2022
Added 4 studies using genome sequencing:
  • Added 46 non-redundant isolates from Helmstetter et al. (PMID: 35199143), using only the first of any sequentially obtained isolates.
    • Isolate names were padded to 3 digits to allow easier alphanumeric sorting.
    • STs for CG86 and CG185 (SRR12825233, SRR12825253) could not be determined, as the raw data did not yield sequences for the FKS or URA markers. These isolates are deposited in SRA but are not presented in the manuscript (i.e. they are missing from Supp. Table 2).
    • The control assembly of CG151 (SRR12825241) showed a novel LEU allele (LEU37).
    • The control assemblies of CG181 (SRR12825234) and CG151 (SRR12825241) yielded novel STs (ST224, ST225), deviating from the published STs (123 and 15).
  • Added 3 isolates from Pais et al. (PMID: 36448018).
    • The control assembly of isolate 73281 (SRR14844978) shows a novel URA3 allele, leading to the novel ST226, not ST6 as given in the publication.
  • Added 30 isolates and two novel STs (227 and 228) from Stefanini et al. (PMID: 36354359). STs were derived from our own control assemblies.
  • Added 8 isolates from Szervas et al. (PMID: 34829249). STs were derived from our own control assemblies. The MLST profiles not disclosed in the manuscript are ST128 (ERR4669795), ST148 (ERR4669757), and ST238, which stems from a novel TRP1 allele (ERR4669779).
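The padding of isolate names mentioned for the Helmstetter et al. dataset can be sketched as follows. This is an illustration only: the function name is hypothetical, and the exact padded form (for example "CG086" for "CG86") is an assumption, since the padded names themselves are not listed here.

```python
import re


def pad_isolate_name(name: str, width: int = 3) -> str:
    """Zero-pad the numeric part of a name such as 'CG86' to a fixed width."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        return name  # leave names with an unexpected layout untouched
    prefix, number = match.groups()
    return prefix + number.zfill(width)


names = ["CG86", "CG185", "CG151"]
print(sorted(names))                         # plain string sort puts CG86 last
print(sorted(map(pad_isolate_name, names)))  # padded names sort numerically
```

Without padding, a plain alphanumeric sort places "CG86" after "CG185" because the strings are compared character by character; with a fixed width, string order and numeric order agree.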

Control genome assembly methodology:

For control purposes, genome sequences are re-assembled from raw data downloaded from SRA. Raw data are checked only superficially using FastQC, and adapters are trimmed where needed using Trimmomatic. Reads are assembled de novo to scaffold level using SPAdes (standard options), and the scaffolds are used to check the ST designation given in the respective publication. Where this yields data deviating from the published STs, reads are extracted by mapping (BWA-MEM) to the reference allele (always allele 1), and the SNPs are curated by manually inspecting the mapped reads. Where the deviation holds, this is mentioned in the comment field of the isolate record, and the new allele is attributed with reference to the SRA number.
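The final comparison step, checking an assembly-derived allelic profile against the published ST, might look like the following Python sketch. It is a simplified illustration and not the curators' tooling: the function name and return strings are hypothetical, the locus order is assumed, and the lookup table contains only the two profiles quoted in the timeline above (ST5 and ST9).

```python
# Assumed locus order for the six-locus scheme; not stated in this document.
LOCI = ("FKS", "LEU2", "NMT1", "TRP1", "UGP1", "URA3")

# Allelic profiles quoted in the timeline (08/2019 entry).
KNOWN_PROFILES = {
    (5, 7, 8, 1, 3, 6): 5,
    (1, 2, 2, 7, 2, 1): 9,
}


def check_published_st(published_st, derived_profile, known=KNOWN_PROFILES):
    """Compare the ST implied by an assembly-derived profile with the published ST."""
    if len(derived_profile) != len(LOCI):
        raise ValueError("profile must contain one allele number per locus")
    derived_st = known.get(tuple(derived_profile))
    if derived_st is None:
        # Unknown combination: candidate novel ST, to be curated by read mapping.
        return "novel profile: curate manually"
    if derived_st == published_st:
        return "confirmed"
    return f"deviates: assembly gives ST{derived_st}"


print(check_published_st(5, [5, 7, 8, 1, 3, 6]))
print(check_published_st(3, [5, 7, 8, 1, 3, 6]))
print(check_published_st(9, [1, 2, 2, 7, 2, 2]))
```

Only the "deviates" and "novel profile" outcomes would trigger the read mapping and manual SNP inspection described above; a confirmed ST needs no further curation.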