Monthly Archives: December 2015

This week, we put together a presentation of our work for our school's metadata research group. They gave a lot of really interesting feedback, which was much appreciated! Some of their ideas were outside of what we'll be able to do, but they were still interesting to think about. For example, they thought it would be worthwhile to track each individual doctor's writing habits. However, because of the way the records are deidentified, a doctor's pseudonym is only consistent within a single patient's narrative. It was nice to get some new feedback on the project and the directions in which we can take it!

This week I got to use R to analyze the CSV file. I ran some descriptive analyses on it and also wrote some code to produce graphs. I ran into an issue with how the file was configured, however. I am collaborating with a professor from the statistics department to write code that analyzes the data by patient rather than by patient visit. In the meantime, I was able to compare our corpus data to national averages of age and gender.
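To illustrate what analyzing by patient rather than by visit can mean, here is a minimal sketch in Python (not the R code described above). It assumes a visit-level CSV with hypothetical columns `patient_id`, `visit_date`, `age`, and `gender`, and it assumes that keeping each patient's earliest visit is an acceptable way to collapse the rows. The sample data is invented for illustration only.

```python
import csv
import io
from collections import Counter

# Invented visit-level data: one row per visit, so a patient can appear
# several times. Column names are hypothetical, not the real file's.
SAMPLE_CSV = """patient_id,visit_date,age,gender
p1,2010-01-05,61,F
p1,2010-06-12,61,F
p2,2011-03-02,54,M
p3,2009-11-20,70,F
p3,2010-02-14,71,F
"""


def collapse_to_patients(csv_text):
    """Return one row per patient, keeping the row from the earliest visit."""
    patients = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pid = row["patient_id"]
        current = patients.get(pid)
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
        if current is None or row["visit_date"] < current["visit_date"]:
            patients[pid] = row
    return patients


patients = collapse_to_patients(SAMPLE_CSV)

# Descriptive statistics computed per patient, not per visit.
gender_counts = Counter(row["gender"] for row in patients.values())
mean_age = sum(int(row["age"]) for row in patients.values()) / len(patients)
```

Counting the rows directly would give four visits by women and one by a man, while counting per patient gives two women and one man, which is the kind of difference that matters when comparing against national averages.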

The code I was working on is going well but is not complete yet; testing it has revealed a few inconsistencies, which I've addressed as needed.

This past week, on Dec. 8, we presented an overview of our project to a small research group here at Simmons, and a lot of the questions that were asked were helpful, particularly with respect to areas of inquiry we haven't considered explicitly. (Some were out of the scope of the project, but still good to keep in mind.)

I've been working on finding discrepancies in diabetes mentions in the clinical narratives, and Rebecca has been working to address discrepancies in patients' smoking history. Before the end of the semester, our goal is to report some preliminary findings on discrepancies in the clinical narratives.

In the past couple of weeks, I've attempted to combine all of the Python classes I've written in order to extract data from XML files, store them in Patient objects, and assess discrepancies in the clinical narratives.
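As a rough illustration of that pipeline, the sketch below parses XML records and groups them into `Patient` objects. The XML layout (a `record` element with `patient` and `date` attributes and a `text` child) and the fields on `Patient` are assumptions made for the example, not the corpus's actual schema or the project's actual classes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Invented records in a hypothetical layout; the real files may differ.
SAMPLE_RECORDS = [
    '<record patient="p1" date="2010-01-05">'
    "<text>Pt with type 2 diabetes, on metformin.</text></record>",
    '<record patient="p1" date="2010-06-12">'
    "<text>Follow-up visit. No complaints today.</text></record>",
    '<record patient="p2" date="2011-03-02">'
    "<text>Follow-up for hypertension.</text></record>",
]


@dataclass
class Patient:
    """Holds every narrative for one patient, keyed by record date."""
    patient_id: str
    narratives: dict = field(default_factory=dict)


def load_record(xml_text, patients):
    """Parse one XML record and attach its narrative to the right Patient."""
    root = ET.fromstring(xml_text)
    pid = root.get("patient")
    patient = patients.setdefault(pid, Patient(pid))
    patient.narratives[root.get("date")] = (root.findtext("text") or "").strip()
    return patient


patients = {}
for xml_text in SAMPLE_RECORDS:
    load_record(xml_text, patients)
```

Grouping the narratives under one object per patient makes it straightforward to compare what different records say about the same person.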

My part of discrepancy-seeking is attempting to identify places in the records where mentions of diabetes are inconsistent or lacking. Although all of the records in the corpus were selected based on patients' diabetes status, many of the clinical narratives do not mention whether or not the patient has diabetes.

The first step will be to gather statistics regarding whether or not a given patient has any explicit reference to diabetes, including via medication mentions, physical state, etc.
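One simple way to gather such statistics is a keyword search over each patient's narratives, sketched below. The term list (condition names, abbreviations, two medications, and a lab test) is an illustrative assumption rather than a vetted clinical vocabulary, and the sample narratives are invented.

```python
import re

# Illustrative terms only. Short abbreviations such as "DM" can produce
# false positives, so a real term list would need review.
DIABETES_PATTERN = re.compile(
    r"\b(?:diabet\w*|dm2?|t2dm|niddm|iddm|metformin|insulin|hba1c|a1c)\b",
    re.IGNORECASE,
)


def mentions_diabetes(text):
    """True if the narrative contains any of the illustrative terms."""
    return DIABETES_PATTERN.search(text) is not None


def summarize(narratives_by_patient):
    """Split patients by whether any of their narratives mentions diabetes."""
    with_mention = sorted(
        pid
        for pid, notes in narratives_by_patient.items()
        if any(mentions_diabetes(note) for note in notes)
    )
    without_mention = sorted(set(narratives_by_patient) - set(with_mention))
    return {
        "total": len(narratives_by_patient),
        "with_mention": with_mention,
        "without_mention": without_mention,
    }


# Invented narratives for illustration.
sample = {
    "p1": ["Pt with type 2 diabetes, on metformin."],
    "p2": ["Follow-up for hypertension.", "No complaints today."],
    "p3": ["Continue insulin sliding scale."],
}
summary = summarize(sample)
```

A search like this catches only explicit mentions; indirect evidence such as physical findings would need additional rules.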

After Thanksgiving break, we decided to lay out our game plan for the rest of the semester. I will continue looking into discrepancies between medical records. Specifically, I'm looking into patients' smoking statuses. It's common for one record to say a patient has never smoked and for the next to say that they quit only within the last year, for example. I think this will be a very interesting area to analyze, because smoking status is such an important aspect of one's health history.
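A check for this kind of discrepancy could be sketched as follows, assuming each record has already been reduced to a date and one of three hypothetical status labels (`never`, `current`, `former`). The set of transitions treated as suspicious is an assumption for illustration: a patient recorded as a former or current smoker should not later be recorded as never having smoked, and a jump straight from `never` to `former` is flagged for review.

```python
# Transitions between consecutive records that are flagged for review.
# This rule set is an assumption, not an established clinical standard.
SUSPICIOUS_TRANSITIONS = {
    ("never", "former"),
    ("current", "never"),
    ("former", "never"),
}


def find_discrepancies(statuses):
    """Return suspicious consecutive pairs from (date, status) tuples.

    Dates are expected in ISO format (YYYY-MM-DD) so that sorting the
    tuples puts the records in chronological order.
    """
    ordered = sorted(statuses)
    flagged = []
    for (date1, status1), (date2, status2) in zip(ordered, ordered[1:]):
        if (status1, status2) in SUSPICIOUS_TRANSITIONS:
            flagged.append((date1, status1, date2, status2))
    return flagged


# Invented example: "never smoked" followed by "quit within the last year".
example = [("2010-06-12", "former"), ("2010-01-05", "never")]
flagged = find_discrepancies(example)
```

A change from `never` to `current` is left unflagged here because a patient may genuinely have started smoking between visits.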