Poster Presentation — ASN Events

Lessons learnt from using a machine learning (ML)-based genotype-phenotype association study (GPAS) to predict the metadata of group A Streptococcus (GAS) genomes (#115)

Sean J Buckley¹, Robert J Harvey¹ ²

¹ School of Health and Behavioural Sciences, University of the Sunshine Coast, Maroochydore, Queensland, Australia
² Sunshine Coast Health Institute, University of the Sunshine Coast, Birtinya, Queensland, Australia

Background:

GAS is a globally significant pathogen. In the era of serology, typing GAS by the immuno-protective antigenicity of the surface-exposed Emm protein (that is, ‘M-typing’) was mandatory. In the era of classical nucleotide sequencing, calibration of the nucleotide-based emm-type against the M-type was inevitable. In the whole genome sequencing (WGS) era, however, we contend that it is warranted to assess how essential emm-typing remains for characterising GAS phylogenetic delineation and epidemiology. Such an assessment justifies feasibility testing of alternative WGS-amenable typing schemes, of which our transcription regulator (TR)-based scheme is one. Quantifying the strain (or emm-type) dependency of variation in the DNA that encodes GAS TRs provides an indirect measure of the backwards compatibility of our TR-based scheme with the vast body of emm-based epidemiological research.

Methods:

We catalogued the distribution and diversity of GAS TRs using phylogenetic and concordance metrics, and applied GPAS to predict GAS genome metadata (including emm-type, invasiveness, and clinical outcome). We developed an ML-based workflow incorporating a novel WGS-amenable TR-based phenotype prediction scheme and a protocol for collecting GAS phenotype metadata.
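
The abstract does not state which concordance metrics were used. As an illustration only, one metric commonly used to compare two typing schemes is the Wallace coefficient: the probability that two genomes sharing a type under one scheme also share a type under the other. The sketch below assumes each genome carries one label per scheme; the function name `wallace` and the six example labels are hypothetical and are not study data.

```python
from collections import Counter
from math import comb


def wallace(labels_a, labels_b):
    """Wallace coefficient W(A->B): the probability that two genomes sharing
    a type under scheme A also share a type under scheme B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both schemes must label the same genomes")
    # Pairs of genomes that share a type under A and under B simultaneously.
    both = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    # Pairs of genomes that share a type under A, regardless of B.
    same_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    if same_a == 0:
        raise ValueError("scheme A places no two genomes in the same type")
    return both / same_a


# Hypothetical labels for six genomes (illustrative only, not study data).
tr_types = ["TR1", "TR1", "TR1", "TR2", "TR2", "TR3"]
emm_types = ["emm1", "emm1", "emm12", "emm89", "emm89", "emm3"]

tr_to_emm = wallace(tr_types, emm_types)  # how well a TR type predicts emm-type
emm_to_tr = wallace(emm_types, tr_types)  # how well an emm-type predicts TR type
```

The coefficient is directional, so computing it both ways indicates which scheme is the finer partition. A value near 1 for TR type to emm-type would be consistent with backwards compatibility with emm-based typing.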

Results:

We predicted emm-type (97%), country of origin (88.6%), and invasiveness (84.7%) with high accuracy. Interpreting the inaccurate emm-type predictions led to biological models for characterising emm-switching, mga2-switching, two types of emm-enn chimerisation, and the putative rapid evolution and time-dependent excision of genes in the mga regulon of clinically significant GAS strains.
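
As a minimal sketch of how such accuracy figures and the follow-up set of inaccurate predictions might be derived, the snippet below scores predicted labels against recorded metadata and keeps the mismatches for inspection. The function name `score_predictions`, the record layout, and the four example genomes are assumptions made for illustration; they are not the authors' workflow or data.

```python
def score_predictions(records):
    """Score (genome_id, recorded, predicted) tuples.

    Returns the fraction of correct predictions and the list of mismatched
    records, which are the candidates for biological interpretation.
    """
    if not records:
        raise ValueError("no predictions to score")
    mismatches = [rec for rec in records if rec[1] != rec[2]]
    accuracy = (len(records) - len(mismatches)) / len(records)
    return accuracy, mismatches


# Hypothetical records (illustrative only, not study data).
records = [
    ("genome_A", "emm1", "emm1"),
    ("genome_B", "emm12", "emm12"),
    ("genome_C", "emm89", "emm89"),
    ("genome_D", "emm4", "emm12"),
]

accuracy, mismatches = score_predictions(records)
```

Keeping the mismatched records, not only the summary accuracy, matters here because the abstract reports that the inaccurate emm-type predictions were the ones that prompted the biological models.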

Conclusions:

Our workflows have advanced the understanding of GAS phylogenetic delineation and epidemiology, and stand as templates for the testing of hitherto untested GAS phenotypic traits.

Poster Presentation 21st Lancefield International Symposium for Streptococci and Streptococcal Diseases 2022
