Over the last couple of weeks, I extended what I was working on (ordering temporal annotations with regard to other treatments). Although I initially used (time, description) tuples to track changes in medication/smoking, I recently switched to custom objects (i.e., a medication object, a smoking history object, etc.) and now sort them by time, which can be extracted with a simple function.
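Here's a minimal sketch of the idea. The class names, fields, integer time values, and the event_time helper are all placeholders for illustration, not the actual code from the project:

```python
from dataclasses import dataclass


@dataclass
class MedicationEvent:
    time: int   # hypothetical: position of the mention in the patient's timeline
    name: str


@dataclass
class SmokingEvent:
    time: int   # same hypothetical time scale as MedicationEvent
    status: str


def event_time(event):
    """Extract the sort key; works for any object with a `time` attribute."""
    return event.time


events = [
    SmokingEvent(time=3, status="past smoker"),
    MedicationEvent(time=1, name="metformin"),
    SmokingEvent(time=2, status="current smoker"),
]

# Mixed object types end up on one timeline, ordered by time.
timeline = sorted(events, key=event_time)
for event in timeline:
    print(event)
```

The nice part compared to tuples is that each object can carry its own extra fields, while the sort only ever needs the time.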

This upcoming week is Simmons' spring break, so I'm hoping to spend some extra time working on not only finding temporal relationships between medications/smoking, etc, but also on visualizations of these in a more concrete timeline fashion, so we don't have to make timelines manually.

Although the code I wrote over the last few weeks was primarily in script format, I put some time in over the weekend to create Python classes in order to analyze medication history more effectively. Building off of my previous Medication class, I introduced a time attribute, so that I can keep track of when the medication was being taken (before/during/after DCT).
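As a rough sketch of what a class like this might look like (the TimeRelation enum, the constructor arguments, and the group_by_time helper are assumptions made for this example, not the real class):

```python
from collections import defaultdict
from enum import Enum


class TimeRelation(Enum):
    """When a medication was taken, relative to DCT."""
    BEFORE = "before DCT"
    DURING = "during DCT"
    AFTER = "after DCT"


class Medication:
    def __init__(self, name, time):
        self.name = name
        self.time = time  # a TimeRelation value

    def __repr__(self):
        return f"Medication({self.name!r}, {self.time.value!r})"


def group_by_time(medications):
    """Collect medication names under each time relation."""
    groups = defaultdict(list)
    for med in medications:
        groups[med.time].append(med.name)
    return dict(groups)


history = [
    Medication("metformin", TimeRelation.BEFORE),
    Medication("aspirin", TimeRelation.DURING),
    Medication("metformin", TimeRelation.DURING),
]
by_time = group_by_time(history)
print(by_time[TimeRelation.DURING])
```

Using an enum instead of free-form strings means a typo like "durring" fails loudly rather than silently creating a fourth category.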

Last week, Simmons closed the school due to snow on Monday, so we didn't meet; instead, I worked on extracting medication data from tags. I've been using linked lists to keep track of which medications are mentioned when.
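For anyone curious, a bare-bones version of that kind of linked list could look like the following. The node fields (a medication name plus the index of the record it appeared in) are guesses at a reasonable layout for this example, not the exact structure I used:

```python
class MentionNode:
    def __init__(self, medication, record_index):
        self.medication = medication
        self.record_index = record_index  # hypothetical: which record the mention came from
        self.next = None


class MentionList:
    """Singly linked list that keeps mentions in the order they were added."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, medication, record_index):
        node = MentionNode(medication, record_index)
        if self.head is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.medication, node.record_index
            node = node.next


mentions = MentionList()
mentions.append("metformin", 1)
mentions.append("aspirin", 1)
mentions.append("metformin", 2)

# Walk the list to find every record that mentions a given medication.
records_with_metformin = [idx for med, idx in mentions if med == "metformin"]
print(records_with_metformin)
```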

This week, I ran into a couple of issues with my code, but I've been working to resolve them, and have also been reading more about methods for addressing temporal issues in code. More updates to follow!

This week I will begin to work on the problem of finding discrepancies in smoking statuses. In our weekly meeting we discussed some of the challenges that will come with this, primarily that smoking status can change in a much more complicated way than diabetes status. In the case of diabetes, the patient must have had diabetes, so if a record said they did not, we knew there was a discrepancy. Smoking status, on the other hand, can switch back and forth many times. However, if a patient is listed as a smoker in the first visit and then as never having smoked in a later visit, there is a discrepancy. This is definitely going to be an interesting project.
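A first pass at that rule might look something like the sketch below. The status labels ("current", "past", "ever", "never") and the function name are assumptions for illustration, and the rule is generalized slightly: any "never" that comes after any earlier visit implying a smoking history is flagged, not just one after the first visit.

```python
# Assumed labels: statuses that imply the patient has smoked at some point.
SMOKING_HISTORY = {"current", "past", "ever"}


def find_smoking_discrepancies(statuses):
    """Return the indices of visits marked 'never' after an earlier
    visit already implied a smoking history.

    `statuses` is one patient's smoking status per visit, in visit order.
    """
    discrepancies = []
    has_smoked = False
    for visit_index, status in enumerate(statuses):
        if status in SMOKING_HISTORY:
            has_smoked = True
        elif status == "never" and has_smoked:
            discrepancies.append(visit_index)
    return discrepancies


# Smoker first, then "never": the third visit is flagged.
print(find_smoking_discrepancies(["current", "past", "never"]))  # [2]

# Switching between current and past is fine, so nothing is flagged.
print(find_smoking_discrepancies(["never", "current", "past", "current"]))  # []
```

Statuses outside these labels (an "unknown" status, for example) are simply ignored here, which may or may not be the right call for the real data.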

This week we collaborated to finish a poster presentation for the 2016 Tapia conference. We also read articles from the Journal of Biomedical Informatics. The articles pertained to the 2014 i2b2/UTHealth NLP shared task, which is where we are getting the data for our research.

Over winter break, I spent a few days reading through the articles published last year detailing the outcomes of the various tracks of the 2014 i2b2 challenge, which is where the data we're currently investigating came from in the first place. The articles I read primarily concerned risk factor identification and heart disease prediction systems, since those are closely relevant to our team's work.

In particular, a few of these papers explicitly mention the impact of missing (i.e., unstated, undocumented) risk factors on developed systems which rely on tagged risk factor metadata; developing methods not only to identify missing risk factors but also to create workarounds seems to be an area of clear research opportunity/necessity.

Our first week back, we wrote the rest of our proposal for the 2016 Tapia Celebration student poster session and submitted it Friday (thanks very much to Professor Stubbs for the help!). We will move on to more computational pursuits in the meantime.

This week, we put together a presentation of our work for our school's metadata research group. They gave a lot of really interesting feedback, which was much appreciated! Some of their ideas were outside of what we'll be able to do, but they were still interesting to think about. For example, they thought it would be worthwhile to track each individual doctor's writing habits. However, because of the way the records are deidentified, a doctor's pseudonym is only consistent within a single patient's narrative, so we can't follow one doctor across patients. It was nice to get some new feedback on the project and the directions in which we can take it!