TY - GEN
T1 - Cascaded data mining methods for text understanding, with medical case study
AU - Romano, Roni
AU - Rokach, Lior
AU - Maimon, Oded
PY - 2006/1/1
Y1 - 2006/1/1
N2 - Substantial electronically stored textual data, such as clinical narrative reports, often need to be retrieved to find relevant information for clinical and research purposes. The context of negation, a negative finding, is of special importance, since many of the most frequently described findings are negative. Hence, when searching free-text narratives for patients with a certain medical condition, if negation is not taken into account, many of the documents retrieved will be irrelevant. We present a new cascaded pattern learning method for automatic identification of negative context in clinical narrative reports. Studying the training corpora, the classification errors, and the patterns selected by the classifier, we noticed that it is possible to create a more powerful ensemble structure than the structure obtained from a general-purpose ensemble method (such as AdaBoost). We compare the new algorithm to previous methods proposed for the same task on similar medical narratives, and show its advantages: it improves accuracy compared to other machine learning methods, and it is much faster than manual knowledge engineering techniques while matching their accuracy.
AB - Substantial electronically stored textual data, such as clinical narrative reports, often need to be retrieved to find relevant information for clinical and research purposes. The context of negation, a negative finding, is of special importance, since many of the most frequently described findings are negative. Hence, when searching free-text narratives for patients with a certain medical condition, if negation is not taken into account, many of the documents retrieved will be irrelevant. We present a new cascaded pattern learning method for automatic identification of negative context in clinical narrative reports. Studying the training corpora, the classification errors, and the patterns selected by the classifier, we noticed that it is possible to create a more powerful ensemble structure than the structure obtained from a general-purpose ensemble method (such as AdaBoost). We compare the new algorithm to previous methods proposed for the same task on similar medical narratives, and show its advantages: it improves accuracy compared to other machine learning methods, and it is much faster than manual knowledge engineering techniques while matching their accuracy.
UR - http://www.scopus.com/inward/record.url?scp=78449267641&partnerID=8YFLogxK
U2 - 10.1109/icdmw.2006.38
DO - 10.1109/icdmw.2006.38
M3 - Conference contribution
AN - SCOPUS:78449267641
SN - 0769527027
SN - 9780769527024
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 458
EP - 462
BT - Proceedings - ICDM Workshops 2006 - 6th IEEE International Conference on Data Mining - Workshops
PB - Institute of Electrical and Electronics Engineers
ER -