In the past couple of weeks, I've attempted to combine all of the Python classes I've written in order to extract data from XML files, store them in Patient objects, and assess discrepancies in the clinical narratives.
My part of discrepancy-seeking is attempting to identify places in the records where mentions of diabetes are inconsistent or lacking; in spite of the fact that all of the records in the corpus were selected based on patients' diabetes status, many of the clinical narratives do not mention whether or not the patient has diabetes.
The first step will be to gather statistics regarding whether or not a given patient has any explicit reference to diabetes, including via medication mentions, physical state, etc.