Along the way, I'm working on code that will automatically detect discrepancies in the narratives -- that part is a little slower going, but that's why it's a long project. Onward!
This past week, I created more CSVs for mentions of CAD, hypertension, hyperlipidemia, and obesity. These CSVs are based on whether or not each condition is mentioned at all, so there are only two possible options (mentioned or not mentioned). But in many cases, we have more information than that about the conditions or related events. Over this next week, which is spring break, I'll be looking into how to organize and analyze this more complex information.
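A minimal sketch of what a mentioned/not-mentioned CSV like this could look like in Python. The record ids, the column layout, and the helper names (`mention_rows`, `write_mentions`) are assumptions made for illustration, not the actual script or data.

```python
import csv
import io

# Conditions named in the post; the column order is an assumption.
CONDITIONS = ["CAD", "hypertension", "hyperlipidemia", "obesity"]


def mention_rows(records):
    """Turn {record_id: set of mentioned conditions} into 0/1 rows."""
    rows = []
    for record_id, mentioned in sorted(records.items()):
        row = {"record_id": record_id}
        for condition in CONDITIONS:
            # Only two options per condition: mentioned (1) or not (0).
            row[condition] = 1 if condition in mentioned else 0
        rows.append(row)
    return rows


def write_mentions(records, handle):
    """Write one row per record to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["record_id"] + CONDITIONS)
    writer.writeheader()
    writer.writerows(mention_rows(records))


# Made-up example records, not real corpus data.
example = {"rec-001": {"CAD", "obesity"}, "rec-002": set()}
buffer = io.StringIO()
write_mentions(example, buffer)
print(buffer.getvalue())
```

A layout like this keeps one row per record, which is easy to load elsewhere, but it has no room for the extra detail (timing, related events) that the more complex analysis would need.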
Over the last couple of weeks, I extended what I had been working on (ordering temporal annotations with regard to other treatments). Although I initially used (time, description) tuples to track changes in medication and smoking, I recently switched to custom objects (a medication object, a smoking history object, etc.), which I then sort by time; the time can be extracted with a simple function.
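To illustrate the switch from tuples to objects, here is a small sketch of sorting mixed event objects by time. The class names, the `when` field, the use of calendar dates, and the example events are all assumptions for illustration; the real annotations may represent time differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MedicationEvent:
    name: str
    when: date  # hypothetical: time stored as a calendar date


@dataclass
class SmokingEvent:
    status: str
    when: date


def event_time(event):
    """Extract the time from any event object, for use as a sort key."""
    return event.when


# Made-up events, deliberately out of order.
events = [
    SmokingEvent("current smoker", date(2010, 5, 1)),
    MedicationEvent("metformin", date(2009, 3, 15)),
    SmokingEvent("past smoker", date(2012, 1, 20)),
]

# Different object types can share one timeline as long as each exposes a time.
timeline = sorted(events, key=event_time)
for event in timeline:
    print(event)
```

Compared with tuples, the objects can carry extra fields (dose, status, source record) without changing how the sort works.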
This upcoming week is Simmons' spring break, so I'm hoping to spend some extra time not only on finding temporal relationships between medications, smoking, etc., but also on visualizing them as concrete timelines, so we don't have to make timelines manually.
This week I will finish the R functions I have begun coding to check for patients with smoking discrepancies. I will also begin to write a function to count the total number of discrepancies. Lastly, I will write a report on the discrepancies in diabetes status.
Last week I began to brainstorm ways to attack the problem of smoking status. This week I will be meeting with an outside professor to get some advice on how to go about it. I will also be rerunning the analysis of the diabetes discrepancies and writing a report on the findings.
Although the code I wrote over the last few weeks was primarily in script format, I put some time in over the weekend to create Python classes in order to analyze medication history more effectively. Building off of my previous Medication class, I introduced a time attribute, so that I can keep track of when the medication was being taken (before/during/after DCT).
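A rough sketch of what a class with a time attribute might look like. The `Period` enum, the `group_by_period` helper, and the example medications are assumptions for illustration; the actual Medication class may be structured differently.

```python
from collections import defaultdict
from enum import Enum


class Period(Enum):
    """When a medication was taken, relative to DCT."""
    BEFORE_DCT = 0
    DURING_DCT = 1
    AFTER_DCT = 2


class Medication:
    def __init__(self, name, time):
        self.name = name
        self.time = time  # a Period value: before/during/after DCT

    def __repr__(self):
        return f"Medication({self.name!r}, {self.time.name})"


def group_by_period(medications):
    """Collect medication names under the period in which they were taken."""
    grouped = defaultdict(list)
    for med in medications:
        grouped[med.time].append(med.name)
    return dict(grouped)


# Made-up medication mentions for a single record.
meds = [
    Medication("aspirin", Period.BEFORE_DCT),
    Medication("metformin", Period.DURING_DCT),
    Medication("aspirin", Period.DURING_DCT),
]
history = group_by_period(meds)
print(history)
```

Using an enum rather than free-text strings means a misspelled period fails loudly instead of silently creating a fourth category.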
During week 15, I worked on extracting diabetes mentions from each medical record and writing them to a CSV file. I formatted the file based on what will work best for Stephanie when she goes on to analyze the CSV in R. It took me a while to work out all the bugs, but I had an accurate CSV by the end of the week. Just skimming the file I could see that a surprising number of records never mention the fact that the patient has diabetes.
Now that I've developed the script to go through each record, extract diabetes mentions, and write them to a CSV, doing the same with other tags will be much easier. I also created a CSV file detailing each patient's smoking status. Though this file was relatively simple for me to create, it will be harder to analyze, since someone's smoking status can change over time.
Currently, I'm working on going through this same process to create CSV files for family history and other tags.
Last week, Simmons closed the school due to snow on Monday, so we didn't meet; instead, I worked on extracting medication data from tags. I've been using linked lists to keep track of which medications are mentioned when.
This week, I ran into a couple of issues with my code, but I've been working to resolve them, and have also been reading more about methods for addressing temporal issues in code. More updates to follow!
This week I will begin working on the problem of finding discrepancies in smoking status. In our weekly meeting we discussed some of the challenges that will come with this, primarily that smoking status can change in a much more complicated way than diabetes status. In the case of diabetes, the patient must have had diabetes, so if a record indicated they did not, we knew there was a discrepancy. Smoking status, on the other hand, can switch back and forth many times. However, if a patient is listed as a smoker in the first visit and then as never having smoked in any later visit, there is a discrepancy. This is definitely going to be an interesting project.
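The rule described here can be sketched as a short check. This is a Python illustration rather than the R code being written, and it generalizes the rule slightly: any visit that implies the patient has smoked, followed later by "never", counts as a discrepancy. The status labels ("current", "past", "never", "unknown") are assumed for the example.

```python
# Hypothetical labels for statuses that imply the patient has smoked.
SMOKED = {"current", "past"}


def has_smoking_discrepancy(statuses):
    """Return True if a 'never' status appears after a status implying smoking.

    `statuses` is a list of smoking statuses ordered by visit.
    """
    ever_smoked = False
    for status in statuses:
        if status == "never" and ever_smoked:
            # Someone recorded as a smoker cannot later be a never-smoker.
            return True
        if status in SMOKED:
            ever_smoked = True
    return False


# Switching between current and past is allowed; reverting to "never" is not.
print(has_smoking_discrepancy(["current", "past", "current"]))
print(has_smoking_discrepancy(["current", "never"]))
```

Note that "never" followed by "current" is not flagged, since a patient can start smoking between visits.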
These weeks were spent writing code in R to look for discrepancies in the diabetes status of the patients in our corpus. I hit a few bumps in the road, but I was able to collaborate with an R expert and found a solution. I also learned of the perils of using loops in R!