
Novice Software Engineers' Focus Across Stack Overflow

2019
My role: Mixed-Methods Researcher, Study Facilitator
Team: Dr. Preetha Chatterjee (lead researcher; Drexel University) and Dr. Lori Pollock (advisor; University of Delaware)

Challenge

Many software engineers visit Stack Overflow, a well-known Q&A forum, to give and receive help with coding problems. Such forums are valuable when an engineer needs help and perspectives beyond their local team. However, a single post can hold a large amount of information to read and weigh, which is likely even more overwhelming for novice software engineers who may lack expertise and prior knowledge in the field.

Our team was interested in understanding how novice software engineers direct their focus and what kinds of information they concentrate on the most in each Stack Overflow forum post. Our objective was for the results to inform the interface of a Q&A forum site itself, as well as guide future tool designs that may aim to improve the readability of such forums for novice software engineers. 

Figure illustrating an overview of the research process of our exploratory study.

Process

Our mixed-methods empirical study was two-fold: (1) recruited novice software engineers completed a survey on their perceptions of how they use Stack Overflow, and (2) the participants also took on the role of an "information seeker," annotating parts of sample Stack Overflow posts they found informational. I conducted the study and analyzed the gathered data with my team member and mentor, Preetha. The above figure illustrates the major steps that were involved in the study.

Recruitment

We conducted the study during lab sessions of an undergraduate software engineering course, which let us recruit 50 undergraduate students who are novice software engineers with similar prior experience and skill levels. 62% of the students reported that they use Stack Overflow frequently, 32% use it periodically, and 6% use it seldom, with at least 94% using the site once a month or more.

Annotation activity

We asked each novice software engineer to manually annotate printouts of 8 Stack Overflow posts by highlighting the "minimum amount of code and phrases/sentences in the question and answer, that you believe would be most helpful in making you more efficient in understanding your problem and the suggested solution." We pre-selected and printed the 400 total Stack Overflow posts according to a set of criteria; the posts date from August 2008 to March 2019.

Survey

The recruited novice software engineers were asked to complete a survey designed to reveal: (1) what challenges, if any, they face in identifying useful solutions in a Stack Overflow post, and (2) which parts of a Stack Overflow post they would recommend focusing on most to reduce reading time. We intentionally asked the participants to complete the survey after the annotation task so that their feedback would reflect concrete examples of Stack Overflow posts.

Some of the survey questions we asked to better understand novice software engineers' perceptions of Stack Overflow posts.

Analysis

Preetha and I inductively analyzed the novice software engineers' annotations to examine what types of information each annotation highlighted in a Stack Overflow post. We manually labeled each annotation, keeping notes of our individual observations and reflections, and met synchronously to identify common themes in the annotations and eventually develop a code book. We reached strong inter-rater reliability after iteratively discussing and resolving conflicts.
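As an illustration of how agreement between two coders can be quantified, here is a minimal Python sketch of Cohen's kappa. This page does not state which reliability measure we used, so the choice of kappa is an assumption, and the function name and the label values in the usage note are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same annotations.

    Assumption: one label per annotation per coder, listed in the same order.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")

    n = len(labels_a)
    # Share of annotations on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each coder's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa(["code", "error", "fix"], ["code", "error", "fix"])` measures agreement on three annotations labeled with made-up information types. Values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance.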

To explore how much of each type of information we qualitatively identified was available in a Stack Overflow post, we kept count of each information type's occurrence across all 400 posts. For each information type, we also computed the frequency and percentage of posts where participants highlighted relevant code/text, with the aim of investigating how important the novice software engineers viewed the type of information.
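The counting described above can be sketched in Python as follows. This is a simplified illustration, not our actual analysis script: the data layout, the function name, and the information-type labels are hypothetical, and it assumes each percentage is taken relative to the posts that contain that information type.

```python
from collections import defaultdict


def highlight_rates(posts):
    """Per information type: posts highlighted, posts containing it, and percentage.

    Assumption: each post is a dict with two sets of labels,
    'available' (types present in the post) and
    'highlighted' (types a participant highlighted).
    """
    available = defaultdict(int)
    highlighted = defaultdict(int)

    for post in posts:
        for info_type in post["available"]:
            available[info_type] += 1
        # Count a highlight only when the type actually occurs in the post.
        for info_type in post["highlighted"] & post["available"]:
            highlighted[info_type] += 1

    return {
        info_type: (
            highlighted[info_type],
            total,
            100 * highlighted[info_type] / total,
        )
        for info_type, total in available.items()
    }


# Two made-up posts with hypothetical information-type labels.
sample_posts = [
    {"available": {"code", "error"}, "highlighted": {"code"}},
    {"available": {"code"}, "highlighted": set()},
]
rates = highlight_rates(sample_posts)
```

Comparing how often a type is available with how often it is highlighted separates information that is merely common from information that novices actually consider important.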

A shortened example of how a participant annotated a Stack Overflow post, along with how Preetha and I labeled each annotation.

Outcomes

Our findings suggest that although novice software engineers had no problem reading an entire Stack Overflow post and its accompanying answers during the study, they highlighted only 27% of the code and 16-21% of the natural language text as informational. Future designs of tools that enhance novice software engineers' Stack Overflow experiences may explore opportunities such as: 

I was also grateful to hone my skills in mixed-methods research under the mentorship of two experienced researchers through this project. Huge thanks to Preetha and Lori for inviting me to join it! Preetha further investigated the potential of a browser extension design that addresses some of the aforementioned design opportunities, which you can read more about here

Additional details on the project and results are available in our journal publication