Multilabel 12-Lead Electrocardiogram Classification Using Gradient Boosting Tree Ensemble

Alexander W Wong, Weijie Sun, Sunil V Kalmady, Padma Kaul, Abram Hindle

2020/10/01

Authors

Alexander W Wong, Weijie Sun, Sunil V Kalmady, Padma Kaul, Abram Hindle

Venue

2020 Computing in Cardiology (CinC) PhysioNet Challenge

Abstract

The 12-lead electrocardiogram (ECG) is a commonly used tool for detecting cardiac abnormalities such as atrial fibrillation, blocks, and irregular complexes. For the PhysioNet/CinC 2020 Challenge, we built an algorithm using gradient boosted tree ensembles fitted on morphology and signal processing features to classify ECG diagnoses. For each lead, we derive features from heart rate variability, PQRST template shape, and the full signal waveform. We join the features of all 12 leads to fit an ensemble of gradient boosting decision trees to predict probabilities of ECG instances belonging to each class. We train a phase one set of feature importance determining models to isolate the top 1,000 most important features to use in our phase two diagnosis prediction models. We use repeated random sub-sampling by splitting our dataset of 43,101 records into 100 independent runs of 85:15 training/validation splits for our internal evaluation results. Our methodology achieves an official phase validation set score of 0.476 and a test set score of -0.080 under the team name CVC, placing us 36th out of 41 in the rankings.
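
The evaluation protocol and the feature selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names `subsample_splits` and `top_k_features`, the seed, and the dictionary-of-importances input are all assumptions made for the example, and the gradient boosting model fitting itself is omitted.

```python
import random

# Values stated in the abstract.
N_RECORDS = 43_101    # dataset size
N_RUNS = 100          # independent runs
TRAIN_PERCENT = 85    # 85:15 training/validation split
TOP_K = 1_000         # features kept after phase one


def subsample_splits(n_records, n_runs, train_percent, seed=0):
    """Yield (train_idx, val_idx) for each independent random split.

    Hypothetical helper: every run reshuffles all record indices and
    cuts them at the requested training percentage.
    """
    rng = random.Random(seed)
    n_train = n_records * train_percent // 100
    indices = list(range(n_records))
    for _ in range(n_runs):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def top_k_features(importances, k):
    """Return the k feature names with the highest importance.

    Hypothetical helper: `importances` maps feature name -> score, as a
    phase one importance model might report.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]
```

In use, each pair from `subsample_splits(N_RECORDS, N_RUNS, TRAIN_PERCENT)` would feed one training and validation run, and `top_k_features(importances, TOP_K)` would pick the feature subset passed to the phase two models.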

Bibtex

@inproceedings{wong2020CINC-multilabel-ECG,
 abstract = {The 12-lead electrocardiogram (ECG) is a commonly used tool for detecting cardiac abnormalities such as atrial fibrillation, blocks, and irregular complexes. For the PhysioNet/CinC 2020 Challenge, we built an algorithm using gradient boosted tree ensembles fitted on morphology and signal processing features to classify ECG diagnoses. For each lead, we derive features from heart rate variability, PQRST template shape, and the full signal waveform. We join the features of all 12 leads to fit an ensemble of gradient boosting decision trees to predict probabilities of ECG instances belonging to each class. We train a phase one set of feature importance determining models to isolate the top 1,000 most important features to use in our phase two diagnosis prediction models. We use repeated random sub-sampling by splitting our dataset of 43,101 records into 100 independent runs of 85:15 training/validation splits for our internal evaluation results. Our methodology achieves an official phase validation set score of 0.476 and a test set score of -0.080 under the team name CVC, placing us 36th out of 41 in the rankings.},
 accepted = {2020-10-01},
 author = {Alexander W Wong and Weijie Sun and Sunil V Kalmady and Padma Kaul and Abram Hindle},
 authors = {Alexander W Wong, Weijie Sun, Sunil V Kalmady, Padma Kaul, Abram Hindle},
 booktitle = {2020 Computing in Cardiology (CinC) PhysioNet Challenge},
 code = {wong2020CINC-multilabel-ECG},
 date = {2020-09-21},
 funding = {NSERC Discovery},
 location = {Rimini, Italy},
 pagerange = {1--4},
 pages = {1--4},
 rate = {41/300 or 13\%},
 role = {Co-Author},
 title = {Multilabel 12-Lead Electrocardiogram Classification Using Gradient Boosting Tree Ensemble},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/wong2020CINC-multilabel-ECG.pdf},
 venue = {2020 Computing in Cardiology (CinC) PhysioNet Challenge},
 year = {2020}
}