This week, we worked on finishing up our person and patient classes. I also created separate objects for each tag category - for example, family history and medication. At first we were reading in these tags as strings. Now we can save them as objects that contain more specific information as attributes, which will be helpful for future analysis.
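As a rough illustration of the move from strings to objects, here is a minimal sketch. The class names, attribute names, and tag names (`MEDICATION`, `FAMILY_HIST`) are assumptions made for the example, not the actual schema or code from the project.

```python
from dataclasses import dataclass


@dataclass
class MedicationTag:
    name: str        # drug name as written in the record (hypothetical attribute)
    category: str    # hypothetical attribute, e.g. a drug class


@dataclass
class FamilyHistoryTag:
    present: bool    # hypothetical attribute: is a family history noted?


def build_tag(kind, attrs):
    """Turn a tag name plus its attribute dict into a typed object."""
    if kind == "MEDICATION":
        return MedicationTag(name=attrs.get("text", ""),
                             category=attrs.get("type", ""))
    if kind == "FAMILY_HIST":
        return FamilyHistoryTag(present=attrs.get("indicator") == "present")
    raise ValueError(f"Unknown tag kind: {kind}")


# Example: a tag that used to be a plain string becomes an object
# whose attributes can be queried directly.
tag = build_tag("MEDICATION", {"text": "aspirin", "type": "aspirin"})
```

With objects like these, later analysis can filter on `tag.category` or `tag.present` instead of re-parsing strings each time.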
I was fortunate enough to attend the Grace Hopper Celebration recently, so I was unable to make our most recent meeting. However, I did program quite a bit on the plane (I had a lot of layovers). Regular expressions are a good time.
Stephanie, Rebecca, and I have started discussing how we're going to integrate all of our code, since we have all begun working on parsers, Patient classes, and so on. I did some more background reading as well.
Lately we've been in a bit of limbo: we have a lot of the tools we'll need to begin the project in earnest, but we don't yet have a place to store the data, so we're at a temporary impasse.
I also spent some time this week ensuring that the patient class is appropriately robust for the data I already have access to, and additionally tweaked my XML parser for improved flexibility.
This week, I read more background, not only on de-identifying medical records but also on developing ways to parse the files, find medical information, and so on. It was very interesting, especially since I hadn't read a lot on the subject since earlier this summer.
I also began to develop a program which will read in a patient's information from an XML file and attribute the details of their medication history, smoking status(es), etc. to a Patient object.
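A minimal sketch of that idea follows. It uses the standard library's `xml.etree.ElementTree` for brevity, and the tag names (`MEDICATION`, `SMOKER`) and their attributes are assumptions made for the example rather than the real file layout.

```python
import xml.etree.ElementTree as ET


class Patient:
    """Holds details pulled from one patient's XML record."""

    def __init__(self):
        self.medications = []
        self.smoking_statuses = []  # a record may list more than one status


def patient_from_xml(xml_string):
    """Walk every element and copy the relevant details onto a Patient."""
    patient = Patient()
    root = ET.fromstring(xml_string)
    for element in root.iter():
        if element.tag == "MEDICATION":      # hypothetical tag name
            patient.medications.append(element.get("text", ""))
        elif element.tag == "SMOKER":        # hypothetical tag name
            patient.smoking_statuses.append(element.get("status", ""))
    return patient


# Made-up record, used only to exercise the function.
SAMPLE = (
    '<record><TAGS>'
    '<MEDICATION text="metformin"/>'
    '<SMOKER status="past"/><SMOKER status="never"/>'
    '</TAGS></record>'
)
patient = patient_from_xml(SAMPLE)
```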
This week I started working on a Patient class, which will encompass information such as medication history, smoking status, family information, co-morbidity of disease, etc.
I have a small subset of data available to use and have been testing my parser on that. It's been fun to get back into regular expressions -- finding patterns in language, and then implementing them as robustly as possible, is something I actually really enjoy.
Since we're still working on finding a way to host our data that gives us all access to it without risking accidental data leaks, I spent my week developing an XML parser. As it currently stands, the program I have written will read in an XML file with a de-identified medical record in it, find the tags within the file, and separate them.
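The "find the tags and separate them" step might look something like the sketch below. It is not the parser described above: it swaps in `xml.etree.ElementTree` from the standard library, and it assumes the annotations sit under a `TAGS` element, which may not match the real files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up record; the element and attribute names are assumptions.
SAMPLE = """<record>
  <TEXT>Patient reports taking aspirin daily. Former smoker.</TEXT>
  <TAGS>
    <MEDICATION text="aspirin" />
    <SMOKER status="past" />
  </TAGS>
</record>"""


def separate_tags(xml_string):
    """Group each annotation's attributes under its tag name."""
    root = ET.fromstring(xml_string)
    grouped = defaultdict(list)
    tags_node = root.find("TAGS")
    if tags_node is not None:
        for element in tags_node:
            grouped[element.tag].append(dict(element.attrib))
    return dict(grouped)


tags = separate_tags(SAMPLE)
```

Grouping by tag name keeps every category (medication, smoking status, and so on) in its own list, which makes it easy to hand each one to a different piece of code later.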
I'm working to automate the process and make it more robust to fit the needs of the whole dataset as well.
I also re-took parts of the CITI certification course to gain more knowledge on research ethics specifically related to our project.
Today, Stephanie and I worked on creating Person and Patient objects in Python (Katie is away at a conference, go Katie!). I've never used Python for object-oriented programming before, so I had to learn the basics of class definitions, how constructors work, etc. The objects we created are still pretty basic and static. The Person class has attributes like name and age, and the Patient class inherits these attributes along with others such as current medication. The next step will be to bring our code together with what Katie has put together, and to pull information from our XML files and link that data with each person/patient object.
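For anyone else new to object-oriented Python, the inheritance described above can be sketched roughly like this. The attribute names are illustrative guesses, not our exact code.

```python
class Person:
    def __init__(self, name, age):
        # The constructor stores the attributes every person has.
        self.name = name
        self.age = age


class Patient(Person):
    def __init__(self, name, age, current_medications=None):
        # Reuse Person's constructor so name and age are set the same way.
        super().__init__(name, age)
        # Copy into a new list so patients never share one default list.
        self.current_medications = list(current_medications or [])


patient = Patient("Jane Doe", 54, ["metformin"])
```

Because `Patient` inherits from `Person`, `patient.name` and `patient.age` work without being redefined in the subclass.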
This week we agreed to all read a paper entitled "Creation of a new longitudinal corpus of clinical narratives". We also began looking at a CSV file of predicted and actual values for patient age and gender, and will soon be doing statistical analysis of this data (how often the predicted value differed from the actual, etc.). Lastly, we are working on creating a "Person" object in Python; this code will help us extract the specific data that we are interested in analyzing.
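As a first pass at the "how often did the predicted differ from the actual" question, something like the following sketch could work. The column names and sample rows are invented for the example; the real CSV may be laid out differently.

```python
import csv
import io

# Invented sample data standing in for the real CSV file.
SAMPLE_CSV = """predicted_age,actual_age,predicted_gender,actual_gender
54,54,F,F
61,63,M,M
47,47,F,M
"""


def mismatch_rate(rows, predicted_col, actual_col):
    """Fraction of rows where the predicted value differs from the actual."""
    rows = list(rows)
    if not rows:
        return 0.0
    mismatches = sum(1 for row in rows if row[predicted_col] != row[actual_col])
    return mismatches / len(rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
age_rate = mismatch_rate(rows, "predicted_age", "actual_age")
gender_rate = mismatch_rate(rows, "predicted_gender", "actual_gender")
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle.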
We met again this week to discuss our next steps. We’ve spent time familiarizing ourselves with the data and discussing our plan of attack for the project from here on out. We’ve all been working on completing our CITI certification, and have begun discussing which programming language would be best to use on this project. We need to write code in order to import the narratives and begin our analysis.
After spending some time writing a program to read in our patient files, we've begun working with regular expressions in Python. First, we worked through many exercises to reacquaint ourselves with regular expressions in general. Then, we worked on implementing these regular expressions in Python. As a simple test, I wrote a program to pull the main text out of an XML file. The next step would be to retrieve more specific data and store it in a useful format.
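A minimal sketch of that kind of test is below. It assumes the main text lives inside a `TEXT` element, optionally wrapped in a CDATA section; that layout is a guess for the sake of the example, not a description of our actual files.

```python
import re

# Made-up record; the TEXT element and CDATA wrapper are assumptions.
SAMPLE = """<record>
<TEXT><![CDATA[
Patient is a 54 year old woman.
Takes aspirin daily.
]]></TEXT>
<TAGS></TAGS>
</record>"""

# re.DOTALL lets "." match newlines, since the narrative spans several lines.
TEXT_PATTERN = re.compile(
    r"<TEXT>(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?</TEXT>", re.DOTALL
)


def extract_text(xml_string):
    """Return the narrative text, or None if no TEXT element is found."""
    match = TEXT_PATTERN.search(xml_string)
    if match is None:
        return None
    return match.group(1).strip()


main_text = extract_text(SAMPLE)
```

The non-greedy `(.*?)` stops at the first closing `</TEXT>`, so anything after the narrative (such as a tags section) is left out.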