This past Wednesday, Stephanie, Rebecca, and I presented our work during the Simmons Undergraduate Research poster session! It was a lot of fun. Simmons has a strong focus on health and life sciences, so our work was well received, and many professors and students who stopped by seemed to learn a lot about what computer science can entail and how it can be beneficial across disciplines and professions.
I saw two other CS-based research posters, and I also got to see the other Simmons CREU team present their work during one of the panels! It was great to see the culmination of the other research that's been taking place this year, and their project seemed really interesting.
Our poster was accepted for the student poster session at the upcoming Tapia Celebration of Diversity in Computing, so we're thrilled to be able to go (even if it might only be Rebecca!).
Overall, this has been an enjoyable, albeit sometimes challenging, year of research. I personally am grateful to have the opportunity to investigate these clinical narratives more closely, and I am actually looking into being a participant in the i2b2 challenge itself next time it rolls around! I would like to thank Professor Amber Stubbs as well as CREU and SLIS for supporting this project and being generally fantastic.
The Simmons Undergraduate Research Conference is approaching, so we've been finalizing some of the results from the discrepancy searches; I've been working with Stephanie and Rebecca to make a poster so that we can present the work this coming Wednesday. The Undergraduate Research Conference gives students at Simmons the chance to present their work during a poster session, and several dozen students present every year.
However, there are generally very few posters about computer science projects, so we hope to contribute somewhat to the visibility of the CS program here at Simmons. In the past, professors and students alike have been interested in CS research, and presenting in these contexts gives the Simmons community a better understanding of what computer science can contribute, both within the field and across disciplines.
So, for next week, we'll finish up the visual materials (poster, data visualization, etc.), and make sure we're set to explain our work by Wednesday!
This past week, I've been dealing with a strange Python issue that arose during a different project -- I still haven't resolved the problem (which is related to my path; consternation!), but I've been working on some markups for the timeline layout I'm trying to develop. Stay tuned!
Along the way, I'm working on code that will automatically detect discrepancies in the narratives -- that part is a little slower going, but that's why it's a long project. Onward!
Over the last couple of weeks, I extended a little bit of what I was working on (ordering temporal annotations with regard to other treatments). Although I initially used (time, description) tuples to track changes in medication and smoking, I recently switched to a custom object approach (i.e., a medication object, a smoking history object, etc.), and I now sort the objects by time, which each one exposes through a simple function.
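To make the object approach a bit more concrete, here is a minimal sketch of the idea, not my actual project code. The class names, the field names, the time() method, and the sample dates are all made up for illustration; the only point is that different kinds of events can be sorted together as long as each one can report its time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Medication:
    # Hypothetical fields, chosen only for this example
    name: str
    started: date

    def time(self):
        """Return the date used to place this event on the timeline."""
        return self.started


@dataclass
class SmokingHistory:
    # Hypothetical fields, chosen only for this example
    status: str
    recorded: date

    def time(self):
        """Return the date used to place this event on the timeline."""
        return self.recorded


def build_timeline(events):
    """Sort mixed event objects by the date each one reports."""
    return sorted(events, key=lambda event: event.time())


# Made-up sample events, deliberately out of order
events = [
    Medication("metformin", date(2015, 3, 2)),
    SmokingHistory("current smoker", date(2014, 11, 20)),
    Medication("lisinopril", date(2015, 1, 15)),
    SmokingHistory("quit", date(2015, 2, 1)),
]

timeline = build_timeline(events)
for event in timeline:
    print(event.time().isoformat(), event)
```

Compared with plain tuples, each object can carry extra fields (dose, status, and so on) without changing how the sort works.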
This upcoming week is Simmons' spring break, so I'm hoping to spend some extra time working not only on finding temporal relationships between medications, smoking, etc., but also on visualizing these in a more concrete timeline fashion, so we don't have to make timelines manually.
Last week, Simmons closed the school due to snow on Monday, so we didn't meet; instead, I worked on extracting medication data from tags. I've been using linked lists to keep track of which medications are mentioned when.
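For anyone curious what pulling medication data out of tags can look like, here is a small sketch using Python's built-in xml.etree.ElementTree. The record layout, the tag name MEDICATION, and the attributes text and time are assumptions for illustration and may not match the real annotation schema. It also collects results in a plain Python list rather than the linked lists I've been using.

```python
import xml.etree.ElementTree as ET

# Made-up record; the tag and attribute names are hypothetical
SAMPLE_RECORD = """<root>
  <TEXT>Patient continues on metformin. Started lisinopril today.</TEXT>
  <TAGS>
    <MEDICATION id="M0" text="metformin" time="before visit"/>
    <MEDICATION id="M1" text="lisinopril" time="during visit"/>
    <SMOKER id="S0" status="past"/>
  </TAGS>
</root>"""


def extract_medications(xml_text):
    """Return (name, time) pairs for each MEDICATION tag, in document order."""
    root = ET.fromstring(xml_text)
    return [(tag.get("text"), tag.get("time")) for tag in root.iter("MEDICATION")]


medications = extract_medications(SAMPLE_RECORD)
for name, when in medications:
    print(name, "-", when)
```

Because root.iter() walks the tree in document order, the resulting list preserves the order in which the medications were tagged.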
This week, I ran into a couple of issues with my code, but I've been working to resolve them, and have also been reading more about methods for addressing temporal issues in code. More updates to follow!
Over winter break, I spent a few days reading through the articles published last year detailing the outcomes of the various tracks of the 2014 i2b2 challenge, which is where the data we're currently investigating came from in the first place. The articles I read primarily concerned risk factor identification and heart disease prediction systems, since those are closely relevant to our team's work.
In particular, a few of these papers explicitly mention the impact of missing (i.e., unstated, undocumented) risk factors on developed systems which rely on tagged risk factor metadata; developing methods not only to identify missing risk factors but also to create workarounds seems to be an area of clear research opportunity/necessity.
Our first week back, we wrote the rest of our proposal for the 2016 Tapia Celebration student poster session and submitted it Friday (thanks very much to Professor Stubbs' help!); we'll move on to more computational pursuits in the meantime.
The code I was working on is going well, but not complete yet; testing my own code has shown a few inconsistencies, which I've addressed as necessary.
This past week, on Dec. 8, we presented an overview of our project to a small research group here at Simmons, and a lot of the questions that were asked were helpful, particularly with respect to areas of inquiry we haven't considered explicitly. (Some were out of the scope of the project, but still good to keep in mind.)
I've been working on finding discrepancies in diabetes mentions in the clinical narratives, and additionally Rebecca has been working to address discrepancies in patients' smoking history. Before the end of the semester, our goal is to report some preliminary findings with regard to discrepancies in clinical narratives.
In the past couple of weeks, I've attempted to combine all of the Python classes I've written in order to extract data from XML files, store them in Patient objects, and assess discrepancies in the clinical narratives.
My part of discrepancy-seeking is attempting to identify places in the records where mentions of diabetes are inconsistent or lacking; in spite of the fact that all of the records in the corpus were selected based on patients' diabetes status, many of the clinical narratives do not mention whether or not the patient has diabetes.
The first step will be to gather statistics regarding whether or not a given patient has any explicit reference to diabetes, including via medication mentions, physical state, etc.
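As a rough sketch of that first step (again, not the project's real code), the counting could look something like the following. The Record and Patient classes, the DIABETES tag name, and the tiny medication list are all assumptions for illustration, and the sample patients are invented.

```python
from dataclasses import dataclass, field

# Illustrative subset only; a real list of diabetes medications would be longer
DIABETES_MEDICATIONS = {"metformin", "insulin"}


@dataclass
class Record:
    # Hypothetical structure: tag names and medication names found in one narrative
    tags: set = field(default_factory=set)
    medications: set = field(default_factory=set)

    def mentions_diabetes(self):
        """True if the record has a diabetes tag or a known diabetes medication."""
        return "DIABETES" in self.tags or bool(self.medications & DIABETES_MEDICATIONS)


@dataclass
class Patient:
    patient_id: str
    records: list = field(default_factory=list)

    def has_explicit_reference(self):
        """True if any of the patient's records mentions diabetes."""
        return any(record.mentions_diabetes() for record in self.records)


def diabetes_reference_stats(patients):
    """Split patient ids by whether any record explicitly references diabetes."""
    with_ref = [p.patient_id for p in patients if p.has_explicit_reference()]
    without_ref = [p.patient_id for p in patients if not p.has_explicit_reference()]
    return {
        "total": len(patients),
        "with_reference": with_ref,
        "without_reference": without_ref,
    }


# Invented sample patients
patients = [
    Patient("p1", [Record(tags={"DIABETES"})]),
    Patient("p2", [Record(medications={"insulin"}), Record()]),
    Patient("p3", [Record(tags={"SMOKER"}, medications={"lisinopril"})]),
]

stats = diabetes_reference_stats(patients)
print(stats)
```

Keeping the ids of patients without any reference, rather than only a count, makes it easy to go back and read those narratives by hand.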
We have a server! We spent our most recent meeting figuring out who has access to which parts of it and other things like that. The next step is to learn how to navigate the server; I lucked out and worked on a remote server all summer for a research project two summers ago, so I'm already very familiar with navigating a server via terminal/bash commands.
We also ran some analysis on the official-vs-predicted genders and ages of the patients from an earlier script that Professor Stubbs ran. The script used pronouns, age markers, and other mentions of gender/age/etc to determine the age and gender of each patient on a record-by-record basis.
Along with Stephanie and Rebecca, I worked on a script that looked for some initial discrepancies between the "official" and "predicted" ages and genders in this file. We found that there are some patients whose gender is never specified in their clinical narrative records, and some for whom the age is ambiguous. We'll refine these analyses over the course of the next few weeks, but for now, it's an interesting start.
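Here is a small sketch of what a comparison like this might look like. It is not our actual script: the CSV layout, the column names, and every row of sample data are invented for illustration, and an empty predicted value is simply treated as "unspecified."

```python
import csv
import io

# Invented sample data; the real file's layout and columns may differ
SAMPLE = """patient_id,official_gender,predicted_gender,official_age,predicted_age
p1,F,F,62,62
p2,M,,58,58
p3,F,F,71,
p4,M,F,49,50
"""


def find_discrepancies(rows):
    """Return (patient_id, field, kind) for each missing or mismatched prediction."""
    issues = []
    for row in rows:
        for field_name in ("gender", "age"):
            official = row[f"official_{field_name}"].strip()
            predicted = row[f"predicted_{field_name}"].strip()
            if not predicted:
                # Nothing in the narrative let the script make a prediction
                issues.append((row["patient_id"], field_name, "unspecified"))
            elif predicted != official:
                issues.append((row["patient_id"], field_name, "mismatch"))
    return issues


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
issues = find_discrepancies(rows)
for patient_id, field_name, kind in issues:
    print(patient_id, field_name, kind)
```

Separating "unspecified" from "mismatch" matters because the two cases suggest different follow-ups: one points to information missing from the narrative, the other to a possible error in either the record or the prediction.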