This week, we put together a presentation of our work for our school's metadata research group. They gave us a lot of really interesting feedback, which was much appreciated! Some of their ideas were outside the scope of what we'll be able to do, but they were still interesting to think about. For example, they thought it would be worthwhile to track each individual doctor's writing habits. However, because of the way the records are deidentified, a doctor's pseudonym is only consistent within a single patient's narrative. It was nice to get fresh feedback on the project and the directions in which we can take it!
This past week our group prepared a presentation and gave an overview of our project and findings to a research group at Simmons. It was definitely a good experience, and I’m glad I got to practice presenting.
This week I got to use R to analyze the CSV file. I ran some descriptive analyses on it and also wrote some code to produce graphs. However, I ran into an issue with how the file was structured, so I am collaborating with a professor from the statistics department to write code that analyzes the data by patient rather than by patient visit. In the meantime, I was able to compare our corpus data to national averages for age and gender.
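The by-patient versus by-visit distinction matters because a patient with many visits would otherwise be counted many times. The analysis here was done in R; the following is only a minimal Python sketch of the idea, assuming a hypothetical visit-level CSV with columns `patient_id`, `visit_date`, `age`, and `gender`, and assuming each patient's age is taken from their first visit.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical visit-level data: one row per visit, so the same patient
# can appear several times. Column names are assumptions for illustration.
SAMPLE_CSV = """patient_id,visit_date,age,gender
p1,2010-01-05,54,F
p1,2011-03-12,55,F
p2,2009-07-20,61,M
p3,2012-02-01,47,F
p3,2012-09-15,47,F
"""


def collapse_to_patients(rows):
    """Keep one record per patient: age at first visit plus gender."""
    patients = {}
    # ISO dates sort correctly as strings, so each patient's earliest visit comes first
    for row in sorted(rows, key=lambda r: (r["patient_id"], r["visit_date"])):
        # setdefault only stores the first (earliest) visit seen for each patient
        patients.setdefault(
            row["patient_id"],
            {"age": int(row["age"]), "gender": row["gender"]},
        )
    return patients


def summarize(patients):
    """Mean age and gender proportions computed per patient, not per visit."""
    ages = [p["age"] for p in patients.values()]
    genders = Counter(p["gender"] for p in patients.values())
    total = len(patients)
    return {
        "n_patients": total,
        "mean_age": mean(ages),
        "gender_share": {g: count / total for g, count in genders.items()},
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
patients = collapse_to_patients(rows)
print(len(rows), "visits ->", len(patients), "patients")
print(summarize(patients))
```

Computing the same statistics directly on `rows` would give patients with more visits more weight, which is one reason the per-patient summary is the one to compare against national averages.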
We now have access to a server, and since I’d never used one before, I spent some time teaching myself how to navigate it using online tutorials. We also ran some initial analysis on the CSV file containing each patient's age and gender, and noticed some discrepancies that we will analyze further.
We’re still having a few issues getting our server up and running, but it should be ready soon. In the meantime, we spent some time reading papers related to our research, and I brushed up on my R skills.
The code I’ve been working on is going well but isn't complete yet; testing it has revealed a few inconsistencies, which I've addressed as necessary.
This past week, on Dec. 8, we presented an overview of our project to a small research group here at Simmons, and many of the questions they asked were helpful, particularly those about areas of inquiry we hadn't explicitly considered. (Some were outside the scope of the project, but still good to keep in mind.)
I've been working on finding discrepancies in diabetes mentions in the clinical narratives, while Rebecca has been working on discrepancies in patients' smoking histories. Before the end of the semester, our goal is to report some preliminary findings on discrepancies in the clinical narratives.
Over the past couple of weeks, I've been working on combining all of the Python classes I've written to extract data from XML files, store it in Patient objects, and assess discrepancies in the clinical narratives.
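The actual classes and XML layout aren't described here, so the following is only a minimal sketch of the extract-and-store step, assuming a hypothetical layout of `<patient id="…">` elements containing dated `<record>` elements. The `Record` and `Patient` dataclasses and the `load_patients` function are illustrative names, not the project's code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Record:
    date: str
    text: str


@dataclass
class Patient:
    patient_id: str
    records: list = field(default_factory=list)

    def add_record(self, date, text):
        self.records.append(Record(date, text))
        # Keep records chronological so later checks can compare neighbours
        # (ISO dates sort correctly as strings)
        self.records.sort(key=lambda r: r.date)


def load_patients(xml_text):
    """Build Patient objects from a hypothetical <patients> XML layout."""
    root = ET.fromstring(xml_text)
    patients = {}
    for p_elem in root.findall("patient"):
        pid = p_elem.get("id")
        patient = patients.setdefault(pid, Patient(pid))
        for r_elem in p_elem.findall("record"):
            patient.add_record(r_elem.get("date"), (r_elem.text or "").strip())
    return patients


SAMPLE_XML = """<patients>
  <patient id="p1">
    <record date="2011-03-12">Follow-up. Continues metformin.</record>
    <record date="2010-01-05">History of type 2 diabetes.</record>
  </patient>
  <patient id="p2">
    <record date="2009-07-20">No significant history.</record>
  </patient>
</patients>"""

patients = load_patients(SAMPLE_XML)
for pid, patient in patients.items():
    print(pid, [r.date for r in patient.records])
```

Keeping each patient's records sorted by date in one object is what makes per-patient discrepancy checks straightforward: any check can walk a single patient's narrative in order.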
My part of the discrepancy search is identifying places in the records where mentions of diabetes are inconsistent or missing: even though all of the records in the corpus were selected based on patients' diabetes status, many of the clinical narratives do not mention whether the patient has diabetes.
The first step will be to gather statistics on whether each patient has any explicit reference to diabetes, including indirect references such as medication mentions, physical state, etc.
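One simple way to gather these statistics is keyword matching. The following is a minimal sketch, assuming each patient's narratives are available as plain strings; the cue categories follow the ones named above, but the specific terms (e.g. metformin, insulin, HbA1c) are illustrative assumptions rather than a vetted list. Plain matching also ignores negation, so "no history of diabetes" still counts as a mention; this measures whether diabetes is referenced at all, not whether the patient has it.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive cue patterns (assumed terms, one regex per category)
DIABETES_CUES = {
    "direct": re.compile(r"\bdiabet\w*", re.IGNORECASE),
    "medication": re.compile(
        r"\b(?:metformin|insulin|glipizide|glyburide)\b", re.IGNORECASE
    ),
    "physical_state": re.compile(
        r"\b(?:(?:hb)?a1c|hyperglycemi\w*)\b", re.IGNORECASE
    ),
}


def diabetes_cues(text):
    """Return the set of cue categories that appear in one narrative."""
    return {name for name, pattern in DIABETES_CUES.items() if pattern.search(text)}


def cue_statistics(patient_texts):
    """patient_texts maps patient_id -> list of narrative strings.

    Returns each patient's combined cue set, plus a Counter of how many
    patients have each category and how many have no reference at all.
    """
    per_patient = {}
    for pid, texts in patient_texts.items():
        found = set()
        for text in texts:
            found |= diabetes_cues(text)
        per_patient[pid] = found
    counts = Counter(cat for cues in per_patient.values() for cat in cues)
    counts["none"] = sum(1 for cues in per_patient.values() if not cues)
    return per_patient, counts


sample = {
    "p1": ["Hx of type 2 diabetes.", "Continues metformin 500 mg."],
    "p2": ["HbA1c 8.1 today."],
    "p3": ["Routine follow-up, no complaints."],
}
per_patient, counts = cue_statistics(sample)
print(per_patient)
print(counts)
```

Patients in the `none` bucket are the ones whose narratives never reference diabetes, despite the corpus being selected on diabetes status.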
After Thanksgiving break, we decided to lay out our game plan for the rest of the semester. I will continue looking into discrepancies between medical records, specifically in patients' smoking statuses. For example, it's common for one record to say a patient has never smoked and for the next to say that they quit only within the last year. I think this will be a very interesting area to analyze, because smoking status is such an important part of one's health history.
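The "never smoked, then recently quit" pattern can be flagged automatically once each record carries a status label. The following is a minimal sketch under two assumptions: records are `(date, text)` pairs in chronological order, and status comes from a crude, hypothetical regex labeler (`STATUS_PATTERNS`) rather than any established tool. Flagged pairs are candidates for manual review, since a "never" followed by "current" could also mean the patient started smoking later.

```python
import re

# Hypothetical cue patterns, checked in order; the first match wins,
# so "never smoked" is labelled before the looser "current" patterns are tried.
STATUS_PATTERNS = [
    ("never", re.compile(r"\b(?:never smoked|non-?smoker|denies (?:tobacco|smoking))\b", re.I)),
    ("former", re.compile(r"\b(?:quit|former smoker|ex-?smoker)\b", re.I)),
    ("current", re.compile(r"\b(?:current smoker|smokes)\b", re.I)),
]


def smoking_status(text):
    """Return 'never', 'former', 'current', or None if no cue is found."""
    for label, pattern in STATUS_PATTERNS:
        if pattern.search(text):
            return label
    return None


def smoking_discrepancies(records):
    """records: list of (date, text) pairs in chronological order.

    Flags consecutive labelled records where 'never' sits next to
    'former' or 'current', in either order.
    """
    labelled = [(date, smoking_status(text)) for date, text in records]
    labelled = [(d, s) for d, s in labelled if s is not None]
    flags = []
    for (d1, s1), (d2, s2) in zip(labelled, labelled[1:]):
        if "never" in (s1, s2) and s1 != s2:
            flags.append((d1, s1, d2, s2))
    return flags


records = [
    ("2010-01-05", "Social history: never smoked."),
    ("2010-09-14", "Former smoker, quit 6 months ago."),
    ("2011-04-02", "Quit smoking last year."),
]
print(smoking_discrepancies(records))
```

Comparing only neighbouring labelled records keeps the check simple; a fuller version might also compare quit dates against the dates of earlier "never" records.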