Final weeks! I finished up the results regarding the number of discrepancies in our corpus and created some visualizations of the data using R. These graphs and charts were used on the poster that we created for the Simmons Undergraduate Research Symposium. The poster was well received and I am grateful that I had the opportunity to present our project. It was really exciting to show all that we had learned this year.
This week I met again with BJ, a professor at Simmons, to discuss my code for the remaining tags. I had encountered a problem where the code used to find the total number of discrepancies (as opposed to the total number of patients with discrepancies) was producing a value that did not make sense. With his help I was able to debug the program and it is now working!
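To make the difference between the two counts concrete, here is a minimal sketch in R. It is not the code from the project; the data frame `discrepancies` and its columns `patient` and `tag` are made up for illustration, and it assumes one row per discrepancy found.

```r
# Hypothetical toy data: one row per discrepancy found (not the real corpus).
discrepancies <- data.frame(
  patient = c("A", "A", "B", "D", "D", "D"),
  tag     = c("smoker", "smoker", "diabetes", "smoker", "obesity", "obesity"),
  stringsAsFactors = FALSE
)

# Total number of discrepancies: every row counts.
total_discrepancies <- nrow(discrepancies)

# Number of patients with at least one discrepancy: each patient counts once.
patients_with_discrepancies <- length(unique(discrepancies$patient))

print(total_discrepancies)          # 6
print(patients_with_discrepancies)  # 3
```

The first count can never be smaller than the second, which makes for a quick sanity check when a value looks wrong.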
This week I will continue to work on finding discrepancies in the other tags in our corpus: hyperglycemia, hyperlipidemia, CAD, family history, and obesity. We also began to work on a final paper outlining the research we completed this year.
I've been working primarily on the same project throughout this time. Trying to find the total number of discrepancies in smoking status was pretty tricky, but I was able to find this number after some research and much perseverance (yay!). I also took some time to learn how to use knitr, a package in R that allows me to create HTML documents so I can easily share my code and results and reduce the risk of error. As a group we also submitted a proposal to the undergraduate research conference at our university.
This week I will finish the R functions that I began coding to check for patients with smoking discrepancies. I will also begin to write a function to check for the total number of discrepancies. Lastly, I will write a report on the discrepancies in diabetes status.
Last week I began to brainstorm ways to attack the problem of smoking status. This week I will be meeting with an outside professor to get some advice on how to go about doing this. I will also be rerunning the analysis on the diabetes discrepancies and creating a report on the findings.
This week I will begin to work on the problem of finding discrepancies in smoking status. In our weekly meeting we discussed some of the challenges that will come with this, primarily that smoking status can change in a much more complicated way than diabetes status. In the case of diabetes, the patient must have had diabetes, so any record indicating that they did not was a discrepancy. Smoking status, on the other hand, can switch back and forth many times. However, if a patient is listed as a smoker in the first visit and then as never having smoked in any later visit, there is a discrepancy. This is definitely going to be an interesting project.
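The rule described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the project's actual code: the data frame `visits`, its columns, and the status labels "current", "past", and "never" are made up for the example. It also generalizes the rule slightly, flagging a "never" that follows any earlier visit recording smoking, not just the first visit.

```r
# Hypothetical toy data: one row per patient visit (not the real corpus).
visits <- data.frame(
  patient = c("A", "A", "A", "B", "B", "C", "C", "C"),
  visit   = c(1, 2, 3, 1, 2, 1, 2, 3),
  status  = c("current", "past", "current",
              "current", "never",
              "never", "current", "past"),
  stringsAsFactors = FALSE
)

# Takes one patient's statuses in visit order and returns TRUE when a
# "never" appears after a visit that recorded smoking ("current" or "past").
has_smoking_discrepancy <- function(status) {
  smoked_so_far <- cumsum(status %in% c("current", "past")) > 0
  any(status == "never" & smoked_so_far)
}

# Sort by patient and visit so each patient's statuses are in order,
# then apply the check to each patient's statuses.
visits <- visits[order(visits$patient, visits$visit), ]
flags <- vapply(split(visits$status, visits$patient),
                has_smoking_discrepancy, logical(1))

print(flags)
```

In this toy data only patient B is flagged: going from "current" to "never" is impossible, while A switching between "current" and "past" and C starting to smoke after a "never" are both plausible histories.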
These weeks were spent writing code in R to look for discrepancies pertaining to the diabetes status of the patients in our corpus. I met a few bumps in the road but was able to collaborate with an R expert and found a solution. I also learned of the perils of using loops in R!
This week we collaborated to finish a poster presentation for the 2016 Tapia conference. We also read articles from the Journal of Biomedical Informatics. The articles pertained to the 2014 i2b2/UTHealth NLP shared task, which is where we are getting the information for our research.
This past week our group prepared a presentation and presented our project and an overview of our findings to a research group at Simmons. It was definitely a good experience, and I’m glad I was able to practice presenting.