Week 9

I continued my experimental study by injecting the next issue into the dataset and noting my observations. I noticed some patterns across the results and shared them with Professor Fariha during our meeting this week.

Week 8

I continued my experimental study and wrote scripts to inject each data quality error, which automated the process and made it quicker. I first created visualizations from the clean data to serve as a baseline for comparison against the visualizations produced after each data issue was injected. I met with Professor Fariha to discuss findings and get feedback.
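As an illustration of what such an injection script can look like, here is a minimal sketch that blanks out a fraction of the values in one column to simulate missing data. It is not the script I used in the study; the function name `inject_missing`, the column names, and the sample rows are all made up for this example, and it uses only the Python standard library.

```python
import random


def inject_missing(rows, column, fraction, seed=0):
    """Return a copy of `rows` with `fraction` of the values in `column` blanked.

    Hypothetical helper for illustration; `rows` is a list of dicts,
    such as the output of csv.DictReader.
    """
    rng = random.Random(seed)  # fixed seed so the injection is repeatable
    corrupted = [dict(row) for row in rows]  # copy so the clean data is untouched
    n_to_blank = round(len(corrupted) * fraction)
    for index in rng.sample(range(len(corrupted)), n_to_blank):
        corrupted[index][column] = ""  # an empty string stands in for a missing value
    return corrupted


# Made-up sample data standing in for a clean dataset.
clean = [{"track": f"song_{i}", "streams": str(1000 * i)} for i in range(1, 11)]
dirty = inject_missing(clean, "streams", fraction=0.3)
```

Keeping the clean rows unchanged and fixing the random seed means the same baseline and the same corrupted copy can be regenerated for each visualization.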

Week 7

I created a Google Colab notebook for the first dataset: the Most Streamed Spotify Songs 2024. I examined the dataset to take note of any pre-existing errors before injecting data issues into it. I found some encoding errors where certain characters were not valid UTF-8, which caused the visualizations to display incorrect information. I fixed these errors by specifying an encoding in the script when reading in the file.
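The encoding fix can be sketched roughly as follows. This is only an illustration under assumptions: I am not recording here which encoding the Spotify file actually needed, so the sketch tries UTF-8 first and falls back to Latin-1, and the helper name `read_rows` and the sample file contents are hypothetical.

```python
import csv
import os
import tempfile


def read_rows(path, encodings=("utf-8", "latin-1")):
    """Try each candidate encoding in turn; return the parsed rows and the encoding used.

    Hypothetical helper. Latin-1 can decode any byte sequence, so with the
    default candidates the fallback always succeeds, though the resulting
    text is only correct if the file really is Latin-1.
    """
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as handle:
                return list(csv.DictReader(handle)), encoding
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the file; try the next one
    raise ValueError(f"none of {encodings} could decode {path}")


# Made-up sample file containing a character that is not valid UTF-8 as written.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("artist,track\nBeyoncé,Halo\n".encode("latin-1"))
    sample_path = tmp.name

rows, used_encoding = read_rows(sample_path)
os.remove(sample_path)
```

Returning the encoding that worked makes it easy to note in the observations which files needed a non-default setting.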

Week 6

It was finally time for the experimental study. My next task was to gather 10 different datasets and inject 1 data quality issue into each dataset. Then, I chose 5 visual tasks for each dataset and observed the visualization errors caused by the data quality issues. I gathered 10 clean datasets from Kaggle. The first dataset I worked with was the Most Streamed Spotify Songs 2024. The 5 visualizations I chose were: pie chart, word cloud, histogram, heat map, and scatter plot.

Week 5

I started wrapping up my previous task this week and documenting which common data quality issues I encountered and what types of data visualization issues they produced. I made a connection between each data quality issue and its resulting visualization issues based on the results from the MET dataset. I then created a table to organize the visualizations I had previously created and categorized them by the different mirage issues to help me gain a better understanding of each issue. I discussed these findings with Professor Fariha during our meeting.

Week 4

This week I continued creating visualizations from the MET dataset I started on last week. I also created a slide deck to present my research findings so far to two faculty members from the University of Utah: Dr. Andrew McNutt and Dr. Kate Isaacs. Professor Fariha set up a meeting with them to discuss the research topic and to get their insight and guidance on designing an experimental study. They raised many good points about what to consider, including: who my audience is, what types of issues and tasks to focus on, which tasks matter the most, what the error debugging process would look like, and how people can work through these issues. Overall, it was an insightful meeting that gave me a different point of view on the study.

Week 3

I continued my task from last week and realized that it was taking longer than I anticipated because of the dataset I chose. For this experiment, I used The Metropolitan Museum of Art Open Access dataset, which had a plethora of issues including missing values, inconsistent information, missing documentation, possible duplication, and mixed text and numeric data. Some visualizations took longer than others to fix, and I encountered many visualization errors in the process. I shared my findings with Professor Fariha and we discussed the different types of data issues and how they can affect data visualizations. I also read another research paper about data mirages to help me understand the topic better.
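To give a sense of how issues like these can be spotted in a single column, here is a minimal sketch that counts missing entries, numeric versus text entries, and repeated values. It is an illustration rather than the process I followed; the function `profile_column` and the sample values are made up, and repeated values in one column only hint at possible duplication rather than prove it.

```python
def _is_number(text):
    """Return True if `text` parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_column(values):
    """Count missing, numeric, and text entries plus repeated values in one column.

    Hypothetical helper for illustration only.
    """
    cleaned = ["" if v is None else str(v).strip() for v in values]
    present = [v for v in cleaned if v != ""]
    numeric = sum(1 for v in present if _is_number(v))
    return {
        "missing": len(cleaned) - len(present),
        "numeric": numeric,
        "text": len(present) - numeric,  # non-empty entries that are not numbers
        "duplicates": len(present) - len(set(present)),  # repeats among non-empty entries
    }


# Made-up values imitating a date column that mixes numbers and free text.
years = ["1885", "ca. 1900", "", "1885", None, "19th century"]
report = profile_column(years)
```

A column where both the "numeric" and "text" counts are nonzero is a likely source of trouble for charts that expect a single type, such as histograms.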

Week 10

This was my final week of the DREU program. I started wrapping up my experimental study. Although I was not able to get through all 10 datasets in the given time, I was still able to document the patterns and observations I encountered. I met with Professor Fariha for a final time to discuss my research findings and the conclusions I drew. She also gave me insight into research in academia versus research in industry, which I found helpful.

Week 2

During this week, I presented my findings from week 1 to Professor Fariha, had a discussion about my observations, and received feedback regarding the experiment. Then, I was given my next task, which was to find an unclean dataset, perform the same experiment as last week, and document the steps, errors encountered, time taken to fix each error, the level of complexity, and the conclusion I drew from each visualization. My pre-experiment reading was "DataPrism: Exposing Disconnect between Data and Systems" by Sainyam Galhotra, Anna Fariha, et al.

Week 1

For my first week, I had an initial meeting with Professor Fariha, went over the details of the project, and established a schedule to meet twice a week. I read five research papers relating to my project to gain further insight and background information about the topic. My first task was to fix imperfect data visualizations generated by ChatGPT using a clean dataset. I used the 911 Emergencies dataset on Kaggle to create 10 different visualizations using Python on Google Colab. Then, I documented my steps and any errors experienced with the code along the way and found solutions to the errors in the AI-generated code.