We have a server! We spent our most recent meeting figuring out who has access to which parts of it and other things like that. The next step is to learn how to navigate the server; I lucked out and worked on a remote server all summer for a research project two summers ago, so I'm already very familiar with navigating a server via terminal/bash commands.
We also ran some analysis comparing the official genders and ages of the patients against the ones predicted by an earlier script that Professor Stubbs ran. The script used pronouns, age markers, and other mentions of gender and age to determine the age and gender of each patient on a record-by-record basis.
Along with Stephanie and Rebecca, I worked on a script that looked for initial discrepancies between the "official" and "predicted" ages and genders in this file. We found that there are some patients whose gender is never specified in their clinical narrative records, and some whose age is ambiguous. We'll refine these analyses over the next few weeks, but for now, it's an interesting start.
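To give a rough idea of what a discrepancy check like this can look like, here is a minimal sketch. It is not the actual script: the column names (`official_gender`, `predicted_gender`, and so on), the `find_discrepancies` helper, and the sample rows are all made up for illustration, and it assumes an empty predicted value means the record never specified that field.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions, not the real file's schema.
SAMPLE = """patient_id,official_gender,predicted_gender,official_age,predicted_age
p1,F,F,54,54
p2,M,,61,60
p3,F,M,47,
"""


def find_discrepancies(rows):
    """Return (mismatches, missing) as lists of (patient_id, field) pairs."""
    mismatches, missing = [], []
    for row in rows:
        for field in ("gender", "age"):
            official = row[f"official_{field}"].strip()
            predicted = row[f"predicted_{field}"].strip()
            if not predicted:
                # Nothing was predicted, e.g. the narrative never mentions it.
                missing.append((row["patient_id"], field))
            elif official != predicted:
                mismatches.append((row["patient_id"], field))
    return mismatches, missing


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
mismatches, missing = find_discrepancies(rows)
print("mismatched:", mismatches)
print("never specified:", missing)
```

Keeping "missing" separate from "mismatched" matters here, since a record that never mentions gender is a different problem from one that predicts the wrong gender.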
Our server is finally working! I've been learning how to use the command line and how to SSH into the server. Having remote access will be especially useful for me, since I live off campus. Yay!
We also did some preliminary analysis on our data. Mainly, we wanted to check for discrepancies between a patient's actual gender and the gender predicted within each visit. We checked for similar discrepancies between the patient's actual age and predicted age. We did find some discrepancies, and many records never mention age or gender.
It's been tricky to get a server instantiated to keep all of our data in one place, but we're working on it! In the meantime, I read some papers related to our project, including one that Professor Stubbs worked on, and touched up a few scripts to convert .xml files to Patient objects in Python.
Our next steps will be to move the scripts to the server (once we have one), so that our data and our analysis tools can all be in the same place at long last.
This week, we've worked on finalizing our person and patient objects. After a few weeks of writing the separate parts of this program, we've finally brought them all together into a cohesive whole. The code will now read in the XML files and apply the information to each individual patient. Now that we have this functionality implemented, we should be able to begin analyzing data soon.
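As a rough illustration of reading an XML record into a patient object, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The record layout (`record`, `TEXT`, `TAGS`, `SMOKER`, `DIABETES`), the `Patient` fields, and the `patient_from_xml` helper are assumptions for the example and may not match the real files or our real classes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical record layout; the real tag and attribute names may differ.
SAMPLE_XML = """<record id="p1">
  <TEXT>Patient is a 54 year old woman.</TEXT>
  <TAGS>
    <SMOKER status="never"/>
    <DIABETES indicator="mention"/>
  </TAGS>
</record>"""


@dataclass
class Patient:
    patient_id: str
    text: str = ""
    tags: dict = field(default_factory=dict)


def patient_from_xml(xml_string):
    """Parse one XML record and copy its text and tags onto a Patient."""
    root = ET.fromstring(xml_string)
    patient = Patient(patient_id=root.get("id", ""))
    patient.text = (root.findtext("TEXT") or "").strip()
    tags_node = root.find("TAGS")
    if tags_node is not None:
        for tag in tags_node:
            # Store each tag's attributes under the tag name.
            patient.tags[tag.tag] = dict(tag.attrib)
    return patient


patient = patient_from_xml(SAMPLE_XML)
print(patient)
```

This version keeps only one entry per tag name, so a record with several tags of the same kind would need a list per name instead.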
This week we split up the work among the three of us. I focused on writing code that reads in a CSV file of patient information. Later I will use this file and file reader to run statistical analysis on the expected versus reported ages of the patients.
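Here is a minimal sketch of what a CSV reader plus a first age comparison could look like. The column names (`reported_age`, `expected_age`), the `read_age_pairs` helper, and the sample rows are hypothetical, and the mean difference is only one example of a statistic, not necessarily the one the project will use.

```python
import csv
import io
import statistics

# Hypothetical sample data; the real file's columns may be named differently.
SAMPLE_CSV = """patient_id,reported_age,expected_age
p1,54,54
p2,61,59
p3,47,
"""


def read_age_pairs(handle):
    """Return (reported, expected) age pairs, skipping rows without both ages."""
    pairs = []
    for row in csv.DictReader(handle):
        try:
            pairs.append((int(row["reported_age"]), int(row["expected_age"])))
        except ValueError:
            continue  # missing or non-numeric age
    return pairs


pairs = read_age_pairs(io.StringIO(SAMPLE_CSV))
diffs = [reported - expected for reported, expected in pairs]
print("rows compared:", len(pairs))
print("mean difference:", statistics.mean(diffs))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`, which is how the `csv` module recommends opening files.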
This week Rebecca and I worked together on creating person and patient objects in Python. It had been a while since I had done this in Python, so I took some time to get familiar with it again. Our code is pretty basic: it takes in user input (name, smoking status, diabetes status, etc.) and assigns it to a person/patient object. Next week we will rework the code so that it takes in this information from the tags in our patient files.
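For readers who have not written classes in Python recently, a minimal sketch of this kind of person/patient setup might look like the following. The class layout, the attribute names, and the `build_patient` helper are assumptions for illustration rather than our actual code. The `ask` parameter defaults to the built-in `input`, and canned answers stand in for keyboard input so the example runs on its own.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Patient(Person):
    """A Person with a few health attributes attached."""

    def __init__(self, name, smoking_status, diabetes_status):
        super().__init__(name)
        self.smoking_status = smoking_status
        self.diabetes_status = diabetes_status

    def __repr__(self):
        return (f"Patient(name={self.name!r}, "
                f"smoking_status={self.smoking_status!r}, "
                f"diabetes_status={self.diabetes_status!r})")


def build_patient(ask=input):
    """Prompt for each field and return a Patient built from the answers."""
    name = ask("Name: ")
    smoking_status = ask("Smoking status: ")
    diabetes_status = ask("Diabetes status: ")
    return Patient(name, smoking_status, diabetes_status)


# Canned placeholder answers replace typing at the keyboard.
answers = iter(["Jane Doe", "never", "no"])
patient = build_patient(ask=lambda prompt: next(answers))
print(patient)
```

Passing the prompt function in as a parameter also makes the later switch easier: the same `Patient` constructor can be fed values read from file tags instead of typed answers.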