Predictive analytics in healthcare promises to significantly influence the processes of many stakeholders. Hospitals could benefit from more accurate predictive analysis through, among other things, closer monitoring of quality indicators, more precise planning of accommodation capacity, and better optimization of supplies. Insurance companies could strengthen their drive for sustainable growth and higher performance. The medical community could provide more individualized, patient-centered care guided by clinical decision support, while patients could receive higher-quality care and better price transparency \cite{van2016randomized}. Health authorities should therefore organize health plans so that particular attention is given to patient populations for whom augmented care at home can prevent additional and costly hospital admissions. Because such plans aim to cover care from the cradle to the grave, data gathering and exchange deserve as much attention as the organization of the care itself.

Hospital readmission (admission to a hospital within 30 days of discharge) is disruptive to both patients and healthcare providers. Although it is sometimes inevitable, it is frequent and often associated with higher cost. Modern care standards require effective discharge planning, including the transfer of information at discharge, patient and parent education, and coordination of care after discharge. Hospital readmission is considered a critical metric of the quality and cost of healthcare, yet its analysis remains challenging because of the multitude of influencing factors (e.g., seasonal variations) \cite{stiglic2014readmission}. According to \cite{srivastava2013pediatric}, the readmission rate is 19.6\% within 30 days, 34.0\% within 90 days, and 56.1\% within one year of discharge. According to the Institute for Healthcare Improvement, of the 5 million U.S. hospital readmissions, approximately 76\% are preventable, generating an annual cost of about US\$25 billion \cite{srivastava2013pediatric}.

The potential benefits of accurate models for readmission risk prediction have motivated a large body of research based on patient data stored in electronic health records (EHRs) \cite{saunders2015impact, stiglic2015comprehensible}. However, all these approaches attempt to quantify the risk of readmission at the patient's discharge, but do not try to answer a very important question: which diagnoses are likely to be involved in the readmission? Highly accurate models that could answer this question would provide not only an indicator of readmission risk but also an assessment of the risk of specific complications (diagnoses or symptoms) at the next admission. Such models could provide valuable decision support for doctors at the time of discharge (for example, when deciding whether a specific patient requires additional monitoring or testing) and push analytic models from a predictive towards a prescriptive role in healthcare decision support.

To predict the set of diagnoses or symptoms with which a patient is likely to be readmitted, we use the Predictive Clustering Trees (PCT) framework \cite{blockeel1998top, vens2008decision, kocev2013tree}. PCTs generalize decision tree models: they seek homogeneous clusters of observations with which a predictive model can be associated. The main difference between the algorithm for learning PCTs and a standard decision tree learner is that the former treats the variance function and the prototype function (which computes a label for each leaf) as parameters that can be instantiated for multi-label prediction \cite{kocev2007ensembles, struyf2005constraint} and hierarchical multi-label classification \cite{vens2008decision}. Because PCTs perform a decision-tree-like clustering of the diagnoses with which a patient is likely to be readmitted, the resulting models are easy to interpret. With this in mind, we applied the approach to hospital discharge data from the California State Inpatient Databases (SID) of the Healthcare Cost and Utilization Project \cite{hcupnet2003utilization}, Agency for Healthcare Research and Quality. The obtained models are interpreted, analyzed, and evaluated for agreement with current medical findings.
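To make the role of the two parameters concrete, the following is a minimal sketch, not the implementation used in this study. It assumes that each patient's readmission diagnoses are encoded as a 0/1 label vector, that the variance function is the sum of per-label variances, and that the prototype is the thresholded per-label mean. The function names and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Sequence


def multilabel_variance(Y: Sequence[Sequence[int]]) -> float:
    """Sum of per-label variances over a set of 0/1 label vectors."""
    if not Y:
        return 0.0
    n = len(Y)
    total = 0.0
    for j in range(len(Y[0])):
        p = sum(row[j] for row in Y) / n  # fraction of rows with label j
        total += p * (1.0 - p)            # variance of a 0/1 variable
    return total


def prototype(Y: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Leaf label: predict label j if at least `threshold` of the rows have it."""
    n = len(Y)
    means = [sum(row[j] for row in Y) / n for j in range(len(Y[0]))]
    return [1 if m >= threshold else 0 for m in means]


def variance_reduction(Y: Sequence[Sequence[int]], mask: Sequence[bool]) -> float:
    """Heuristic score of a candidate split; mask[i] is True if row i goes left."""
    left = [y for y, m in zip(Y, mask) if m]
    right = [y for y, m in zip(Y, mask) if not m]
    n = len(Y)
    return (multilabel_variance(Y)
            - len(left) / n * multilabel_variance(left)
            - len(right) / n * multilabel_variance(right))


# Toy data: four discharges, two possible readmission diagnoses.
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
split = [True, True, False, False]  # hypothetical test on one input attribute
score = variance_reduction(Y, split)
left_leaf = prototype([y for y, m in zip(Y, split) if m])
```

In this toy case the split separates the two diagnosis patterns perfectly, so both children have zero variance and each leaf predicts a single diagnosis. A tree learner would choose, at each node, the candidate test with the largest such reduction.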

In addition, we exploit the information provided by a domain hierarchy on the output (label) space, namely the Clinical Classifications Software (CCS) \cite{healthcare2010clinical}. Instead of directly predicting the set of readmission diagnoses, we predict the categories of the CCS hierarchy to which they belong. This classification task is called hierarchical multi-label classification. Finally, besides using the expert knowledge encoded in the CCS hierarchy, we derive a hierarchy from the data in the output space of the classification problem and use it in the learning and prediction phases in order to improve predictive performance. The hierarchies are constructed from the dataset using hierarchical clustering approaches based on balanced k-means and agglomerative clustering. Here, we investigate how data-driven hierarchies of medical concepts, which are not formally written down but occur in practice, influence the predictive performance of the classification models.
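As an illustration of what predicting over a hierarchy implies for the labels, the sketch below expands a flat set of diagnosis labels with all of their ancestors, so that a label can only be present if its parent is present. The hierarchy and the label names (`A`, `A.1`, and so on) are hypothetical placeholders, not actual CCS categories, and the helper name `with_ancestors` is our own.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "A": "root",
    "A.1": "A",
    "A.2": "A",
    "B": "root",
    "B.1": "B",
}


def with_ancestors(labels: Iterable[str],
                   parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the labels together with all their ancestors, excluding the root."""
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk upwards until the root is reached.
        while node is not None and parent[node] is not None:
            expanded.add(node)
            node = parent[node]
    return expanded


# A patient readmitted with two leaf-level diagnoses.
flat_labels = {"A.1", "B.1"}
hierarchical_labels = with_ancestors(flat_labels, PARENT)
```

The same expansion applies whether the parent map comes from an expert taxonomy or from a hierarchy derived by clustering the labels; only the contents of `PARENT` change.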

As a baseline, we learn a PCT without assuming any hierarchy in the output space and measure its performance using standard multi-label evaluation measures, which are grouped into example-based, label-based, and ranking-based measures. This yields performance measures such as the accuracy, precision, and recall of the predictive model, which will be further evaluated by a medical doctor. Models obtained using expert- and data-driven hierarchies in the output space are then evaluated and compared with the baseline method. Additionally, these models are evaluated by medical doctors.
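For readers unfamiliar with example-based measures, the following sketch computes accuracy (as the Jaccard overlap), precision, and recall per patient and then averages them over patients. It is a minimal illustration under our own conventions, in particular for empty label sets, and is not the evaluation code used in the experiments.

```python
from typing import Dict, List, Set


def example_based_scores(true_sets: List[Set[str]],
                         pred_sets: List[Set[str]]) -> Dict[str, float]:
    """Average per-example accuracy (Jaccard), precision, and recall."""
    n = len(true_sets)
    acc = prec = rec = 0.0
    for t, p in zip(true_sets, pred_sets):
        inter = len(t & p)
        union = len(t | p)
        # Convention assumed here: two empty sets count as a perfect match.
        acc += inter / union if union else 1.0
        # Convention assumed here: an empty prediction or truth scores 0.
        prec += inter / len(p) if p else 0.0
        rec += inter / len(t) if t else 0.0
    return {"accuracy": acc / n, "precision": prec / n, "recall": rec / n}


# Two patients with hypothetical diagnosis labels.
truth = [{"a", "b"}, {"c"}]
predicted = [{"a"}, {"c", "d"}]
scores = example_based_scores(truth, predicted)
```

Label-based measures would instead compute the same quantities per diagnosis across patients and then average over diagnoses, while ranking-based measures operate on the predicted scores rather than on the thresholded label sets.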

The remainder of the paper is structured as follows. In the next section, we provide the background needed to understand the paper, namely the problem of hospital readmission, the research objectives, the definitions of multi-label and hierarchical multi-label classification, the domain hierarchy used for solving the problem, and the data-derived hierarchies. Next, we briefly explain Predictive Clustering Trees for the two classification tasks (multi-label and hierarchical multi-label classification) and present the experimental evaluation. We then provide the results and discussion. Finally, we conclude the paper and outline further directions of our study.