This week, I read more background on de-identifying medical records and on developing ways to parse the files, find medical information, and so on. It was very interesting, especially since I hadn't read a lot on the subject since earlier this summer.

I also began to develop a program which will read in a patient's information from an XML file and attribute the details of their medication history, smoking status(es), etc. to a Patient object.
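
As a rough illustration of the idea, here is a minimal sketch of reading details from XML into a Patient object. The tag names (`MEDICATION`, `SMOKER`), their attributes, and the sample record are all assumptions made up for this example; the real files may be laid out differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Container for details pulled from one record (fields are illustrative)."""
    medications: list = field(default_factory=list)
    smoking_statuses: list = field(default_factory=list)


def patient_from_xml(xml_text):
    """Build a Patient from an XML string, using hypothetical tag names."""
    root = ET.fromstring(xml_text)
    patient = Patient()
    # Assumed layout: <MEDICATION text="..."/> and <SMOKER status="..."/>
    for med in root.iter("MEDICATION"):
        patient.medications.append(med.get("text"))
    for smoker in root.iter("SMOKER"):
        patient.smoking_statuses.append(smoker.get("status"))
    return patient


# Invented sample record, not real patient data.
SAMPLE = """<record>
  <TAGS>
    <MEDICATION text="metformin"/>
    <MEDICATION text="aspirin"/>
    <SMOKER status="past"/>
  </TAGS>
</record>"""

patient = patient_from_xml(SAMPLE)
print(patient.medications, patient.smoking_statuses)
```

Keeping the smoking statuses in a list leaves room for a record that mentions more than one status over time.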

This week I started working on a Patient class, which will encompass information such as medication history, smoking status, family information, co-morbidity of disease, etc.

I have a small subset of data available to use and have been testing my parser on that. It's been fun to get back into regular expressions -- finding patterns in language, and then implementing them as robustly as possible, is something I actually really enjoy.

Since we're still working on finding a way to host our data so that we all have access to it without risking accidental data leaks, I spent my week developing an XML parser. As it currently stands, the program I have written will read in an XML file with a de-identified medical record in it, find the tags within the file, and separate them.
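
One way the "find the tags and separate them" step could look is sketched below with regular expressions. This is not the actual parser: it assumes the annotations are self-closing elements with quoted attributes, and the tag names in the sample are invented.

```python
import re
from collections import defaultdict

# Matches self-closing elements such as <SMOKER status="past"/>
TAG_PATTERN = re.compile(r"<(\w+)\s+([^<>]*?)\s*/>")
# Matches name="value" pairs inside one element
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def separate_tags(xml_text):
    """Group self-closing tags by name; each tag becomes a dict of attributes."""
    grouped = defaultdict(list)
    for name, attr_text in TAG_PATTERN.findall(xml_text):
        grouped[name].append(dict(ATTR_PATTERN.findall(attr_text)))
    return dict(grouped)


# Invented sample, not real patient data.
SAMPLE = (
    '<TAGS><SMOKER status="past"/>'
    '<MEDICATION text="aspirin" dose="81mg"/>'
    '<MEDICATION text="metformin"/></TAGS>'
)

tags = separate_tags(SAMPLE)
for name, entries in tags.items():
    print(name, entries)
```

Regular expressions are fragile on arbitrary XML (nested elements, escaped quotes), so a real XML library would be the safer choice once the files get messier.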

I'm working to automate the process and make it more robust to fit the needs of the whole dataset as well.

I also re-took parts of the CITI certification course to gain more knowledge on research ethics specifically related to our project.

Today, Stephanie and I worked on creating Person and Patient objects in Python (Katie is away at a conference, go Katie!). I've never used Python for object-oriented programming before, so I had to learn the basics of class definitions, how constructors work, etc. The objects we created are still pretty basic and static. The Person class has attributes like name and age, and the Patient class inherits these attributes along with others such as current medication. The next step will be to bring our code together with what Katie has put together, and to pull information from our XML files and link that data with each person/patient object.
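
For anyone else new to classes in Python, the inheritance described above can be sketched roughly like this. The attribute names follow the post (name, age, current medication); the `describe` method and the sample values are additions for illustration only.

```python
class Person:
    """Basic person with the attributes mentioned in the post."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def describe(self):
        return f"{self.name}, age {self.age}"


class Patient(Person):
    """A Person plus medical details; inherits name and age."""

    def __init__(self, name, age, current_medication=None):
        super().__init__(name, age)  # let Person set name and age
        # Avoid a shared mutable default by creating a new list per patient.
        self.current_medication = current_medication or []

    def describe(self):
        meds = ", ".join(self.current_medication) or "none"
        return f"{super().describe()}, medications: {meds}"


# Invented example values.
patient = Patient("Jane Doe", 54, ["metformin"])
print(patient.describe())
```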

This week we agreed to all read a paper entitled "Creation of a new longitudinal corpus of clinical narratives". We also began looking at a CSV file of predicted and actual values for patient age and gender, and will soon be doing statistical analysis of this data (how often the predicted values differed from the actual ones, etc.). Lastly, we are working on creating a "Person" object in Python; this code will help us extract the specific data that we are interested in analyzing.
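
For the predicted-versus-actual comparison, a first pass might simply count how often the two columns disagree. The sketch below assumes column names like `predicted_gender` and `actual_gender`, which are guesses, and uses a few invented rows in place of the real file.

```python
import csv
import io

# Invented rows standing in for the real CSV; column names are assumptions.
SAMPLE_CSV = """record_id,predicted_gender,actual_gender,predicted_age,actual_age
1,F,F,54,54
2,M,F,61,63
3,M,M,47,47
4,F,F,70,75
"""


def mismatch_rate(rows, predicted_col, actual_col):
    """Fraction of rows where the predicted value differs from the actual one."""
    rows = list(rows)
    if not rows:
        return 0.0
    mismatches = sum(1 for row in rows if row[predicted_col] != row[actual_col])
    return mismatches / len(rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
gender_rate = mismatch_rate(rows, "predicted_gender", "actual_gender")
age_rate = mismatch_rate(rows, "predicted_age", "actual_age")
print(f"gender mismatch rate: {gender_rate:.2f}")
print(f"age mismatch rate: {age_rate:.2f}")
```

Exact-match counting is only a starting point; for age, the size of the error probably matters more than whether the numbers are identical.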

We met again this week to discuss our next steps with the data. We’ve spent time familiarizing ourselves with the data and discussing our plan of attack for the rest of the project. We’ve all been working on completing our CITI certification, and have begun discussing which programming language would be best to use. We need to write code to import the narratives and begin our analysis.

After spending some time writing a program to read in our patient files, we've begun working with regular expressions in Python. First, we worked through many exercises to reacquaint ourselves with regular expressions in general. Then, we worked on implementing these regular expressions in Python. As a simple test, I wrote a program to pull the main text out of an XML file. The next step would be to retrieve more specific data and store it in a useful format.
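
A test along those lines could look something like the sketch below. It assumes the narrative sits inside a `<TEXT>` element, possibly wrapped in a CDATA section; that layout and the sample record are assumptions, not a description of the actual files.

```python
import re

# Capture everything between <TEXT> and </TEXT>, skipping an optional CDATA wrapper.
# re.DOTALL lets "." match newlines, since narratives span many lines.
TEXT_PATTERN = re.compile(
    r"<TEXT>\s*(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?\s*</TEXT>",
    re.DOTALL,
)


def extract_main_text(xml_text):
    """Return the narrative text, or None if no <TEXT> element is found."""
    match = TEXT_PATTERN.search(xml_text)
    if match is None:
        return None
    return match.group(1).strip()


# Invented sample, not real patient data.
SAMPLE = """<?xml version="1.0"?>
<root>
<TEXT><![CDATA[
Patient is a 54-year-old former smoker.
Continues metformin daily.
]]></TEXT>
<TAGS></TAGS>
</root>"""

main_text = extract_main_text(SAMPLE)
print(main_text)
```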

Greetings!

We've spent some time making sure that our CITI certifications are all in order so that we are able to handle the patient files. We've also spent time brushing up on our Python skills, since we've decided this is the programming language we are going to use for this project. Lastly, we wrote some code to import the patient files into Python.

Stay tuned for more updates!

Now that we've all gotten our CITI certifications sorted, we'll be programming soon! We've decided to use Python for this project. I'm excited for this, because I haven't used Python since my first semester here at Simmons. We've also looked at several records and discussed how we will be working with and analyzing the data.  For now, we are working with a small set of clinical narratives. Our first step will be to write a simple program to read in our files. Hopefully we will have a private server set up soon where we will be able to work with the full range of records.
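
A simple file reader of the kind described might look like the sketch below. It assumes the records are `.xml` files sitting in one folder; the file names are made up, and a temporary directory stands in for the real data location.

```python
import tempfile
from pathlib import Path


def read_records(directory):
    """Return {filename: contents} for every .xml file in a directory."""
    records = {}
    for path in sorted(Path(directory).glob("*.xml")):
        records[path.name] = path.read_text(encoding="utf-8")
    return records


# Demo with a throwaway directory standing in for the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "record_001.xml").write_text("<root>first</root>", encoding="utf-8")
    Path(tmp, "record_002.xml").write_text("<root>second</root>", encoding="utf-8")
    Path(tmp, "notes.txt").write_text("not a record", encoding="utf-8")
    records = read_records(tmp)

print(sorted(records))
```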

The CREU Clinical Narratives project is officially underway! We're going to be spending our year studying patterns in clinical narratives using a natural language processing (NLP) approach.

This week, we met for the first time since last semester and began discussing data dissemination among the members of our student research team. Stephanie, Rebecca, and Katie signed the data use agreement contract, and we began the requisite CITI training for dealing with medical record data. We also discussed what our first steps will be once we've completed the CITI training, so we'll get started with that very soon.

Next week, we'll meet again and determine the next few steps of our project. We're all very excited -- stay tuned for more updates from the CREU crew!