Medical text mining has grown increasingly popular because of its wide range of applications in disease prediction and clinical recommendation systems. Radiology reports contain rich information describing radiologists' findings on the health conditions of patients, as observed in the associated radiology images. However, these reports are written as unstructured free text, so the valuable information they hold for disease prediction cannot be easily retrieved and used without suitable text mining techniques. Moreover, the medical datasets available in current practice are small, domain-specific and restricted to individual institutions, whereas data is one of the critical factors driving Machine Learning (ML) and Deep Learning (DL) models. To address disease prediction under these low-data conditions, we present a practical Deep Learning framework that combines a Knowledge Base (KB) with Deep Learning for accurate text mining and prediction of lung diseases from unstructured free-text radiology reports. We adopt GloVe word embeddings trained on a large corpus, together with the KB, for effective text modelling. Further, we incorporate Convolutional Neural Network-based Discriminative Dimensionality Reduction (CNN-DDR) to obtain the most discriminative feature vector. Finally, a fully connected Deep Neural Network (DNN) is used as the prediction model to detect the diseases. We applied the proposed framework to predict lung diseases from radiology reports in both the publicly available Indiana University (IU) dataset \cite{DemnerFushman2016} and data collected from a private hospital. Our benchmarks show that the proposed framework outperforms standard ML techniques.
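The abstract does not give layer sizes, the KB integration, or the internal design of CNN-DDR, so the following is only a minimal pure-Python sketch of the general pipeline shape it describes: a tokenized report is mapped to word embeddings, a 1-D convolution with global max pooling reduces it to a fixed-length feature vector, and a fully connected network with a sigmoid output produces a disease probability. Several assumptions apply: the small embedding table stands in for pretrained GloVe vectors, the KB is omitted entirely, convolution plus max pooling is an assumed stand-in for CNN-DDR, and all weights are untrained random placeholders. Every name (`predict`, `conv_max_pool`, `TABLE` and so on) is hypothetical.

```python
import math
import random
import re
from typing import Dict, List, Tuple

# Hypothetical sketch: toy sizes chosen for illustration, not the paper's settings.
EMBED_DIM = 4      # real GloVe vectors are typically 50 to 300 dimensions
KERNEL_WIDTH = 2   # each convolution filter spans two consecutive tokens
NUM_FILTERS = 3    # one pooled value per filter = length of reduced feature vector
HIDDEN_UNITS = 5   # width of the single hidden layer in the fully connected head

Vector = List[float]


def tokenize(report: str) -> List[str]:
    """Lower-case a free-text report and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", report.lower())


def embed(tokens: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Map tokens to vectors; out-of-vocabulary tokens become zero vectors."""
    zero = [0.0] * EMBED_DIM
    return [table.get(tok, zero) for tok in tokens]


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def conv_max_pool(seq: List[Vector], filters: List[List[Vector]], biases: Vector) -> Vector:
    """1-D convolution + ReLU + global max pooling over the token axis.

    Assumed stand-in for CNN-DDR: the output has one entry per filter, so
    reports of any length are reduced to a fixed-length feature vector.
    """
    features = []
    for kernel, b in zip(filters, biases):
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe initial maximum
        for start in range(len(seq) - KERNEL_WIDTH + 1):
            window = seq[start:start + KERNEL_WIDTH]
            s = b + sum(kernel[i][d] * window[i][d]
                        for i in range(KERNEL_WIDTH) for d in range(EMBED_DIM))
            best = max(best, relu(s))
        features.append(best)
    return features


def dense(x: Vector, weights: List[Vector], biases: Vector, act) -> Vector:
    """Fully connected layer: apply act to (W x + b) element-wise."""
    return [act(b + sum(w * xi for w, xi in zip(row, x)))
            for row, b in zip(weights, biases)]


# Untrained placeholder parameters from a seeded RNG (hypothetical; a real
# system would load GloVe vectors and learn the weights from labelled reports).
rng = random.Random(0)
VOCAB = ["no", "acute", "lungs", "clear", "focal", "opacity", "pleural", "effusion"]
TABLE = {w: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}
FILTERS = [[[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(KERNEL_WIDTH)]
           for _ in range(NUM_FILTERS)]
CONV_B = [0.0] * NUM_FILTERS
HID_W = [[rng.uniform(-1, 1) for _ in range(NUM_FILTERS)] for _ in range(HIDDEN_UNITS)]
HID_B = [0.0] * HIDDEN_UNITS
OUT_W = [[rng.uniform(-1, 1) for _ in range(HIDDEN_UNITS)]]
OUT_B = [0.0]


def predict(report: str) -> Tuple[Vector, float]:
    """Return the reduced feature vector and a disease probability for one report."""
    features = conv_max_pool(embed(tokenize(report), TABLE), FILTERS, CONV_B)
    hidden = dense(features, HID_W, HID_B, relu)
    prob = dense(hidden, OUT_W, OUT_B, sigmoid)[0]
    return features, prob


if __name__ == "__main__":
    feats, p = predict("Focal opacity with small pleural effusion.")
    print(feats, round(p, 3))
```

Because the weights are random, the printed probability carries no clinical meaning; the sketch only illustrates how a variable-length report flows through embedding, convolutional reduction and a fully connected classifier. Global max pooling is used here simply because it is a common way to obtain a fixed-length vector from variable-length text.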
Title, author list and abstract as they appear in the camera-ready version of the paper provided to the Conference Committee. Small changes made during processing by Springer may not be reflected here.