This week, we put together a presentation of our work for our school's metadata research group. They gave a lot of really interesting feedback, which was much appreciated! Some of their ideas were outside of what we'll be able to do, but they were still interesting to think about. For example, they thought it would be worthwhile to track each individual doctor's writing habits. However, because of the way the records are deidentified, a doctor's pseudonym is only consistent within a single patient's narrative, so we can't follow the same doctor across patients. It was nice to get some new feedback on the project and the directions in which we can take it!
After Thanksgiving break, we decided to lay out our game plan for the rest of the semester. I will continue looking into discrepancies between medical records. Specifically, I'm looking into patients' smoking statuses. For example, it's common for one record to say a patient has never smoked and for the next to say that they quit only within the last year. I think this will be a very interesting area to analyze, because smoking status is such an important aspect of one's health history.
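To make the idea concrete, here is a minimal sketch of how I might flag these contradictions. It assumes each patient's smoking statuses have already been extracted into a chronological list of simple labels; the labels `never`, `former`, and `current` and the function name are my own placeholders, not anything from our actual data.

```python
from typing import List, Tuple

# Hypothetical status labels; the real records may be worded differently.
NEVER = "never"
FORMER = "former"
CURRENT = "current"


def find_smoking_discrepancies(statuses: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, previous_status, status) for each suspicious transition.

    Assumes `statuses` holds one patient's records in chronological order.
    """
    issues = []
    has_smoked = False  # becomes True once any record reports smoking
    for i, status in enumerate(statuses):
        if i > 0:
            prev = statuses[i - 1]
            if status == NEVER and has_smoked:
                # An earlier record said the patient smokes or used to smoke.
                issues.append((i, prev, status))
            elif status == FORMER and prev == NEVER:
                # "Never smoked" followed directly by "quit" is worth a look.
                issues.append((i, prev, status))
        if status in (FORMER, CURRENT):
            has_smoked = True
    return issues


if __name__ == "__main__":
    print(find_smoking_discrepancies(["never", "former", "former", "never"]))
```

A flagged transition is only a candidate for review. A patient could, in principle, start and quit smoking between two visits, so I'd still want to read the records themselves before calling something an error.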
Our server is finally working! I've been learning how to use the command line and how to SSH into the server. Having remote access will be especially useful for me, since I live off campus. Yay!
We also did some preliminary analysis on our data. Mainly, we wanted to check for discrepancies between a patient's actual gender and the gender predicted within each visit. We checked for similar discrepancies between the patient's actual age and predicted age. We did find some discrepancies, and many records never mention age or gender.
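The kind of check described above can be sketched roughly as follows. This is not our actual analysis code; it assumes the per-visit predictions for one patient are available as a list in which `None` means the record never mentioned the attribute, and the function name `tally_predictions` is made up for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def tally_predictions(actual: str, predicted: Iterable[Optional[str]]) -> Counter:
    """Count how each visit's predicted value compares with the actual one.

    `None` in `predicted` stands for a visit whose record never mentions
    the attribute (an assumption about how the data is represented).
    """
    counts = Counter()
    for value in predicted:
        if value is None:
            counts["missing"] += 1
        elif value == actual:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


if __name__ == "__main__":
    # One patient whose actual gender is "F", with four visits.
    print(tally_predictions("F", ["F", None, "M", "F"]))
```

The same tally works for age if ages are compared as exact values, although a real comparison would probably need some tolerance, since a patient's age changes over the course of their visits.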