Question-to-Learn Science in Higher Education: A Quantitative Study

This work reports the results of a case study in which traditional activities in an engineering/science classroom, such as demonstrations and self-paced activities, were compared with ‘writing across the curriculum’ (WAC) activities. One group did writing-to-learn assignments and one group did our novel construct ‘question-to-learn’, in which students designed exam problems for their peers. A model is presented that describes the parameters influencing the exam score. Some of these variables were carefully controlled during the project (labs, textbook, lecturer) and others were measured (lecture attendance, time-on-task and previous knowledge) in order to minimize data corruption due to confounding variables. The main parameter of interest, the ‘predictor’ of the exam score, was the extra-curricular activity. A pre-test and a post-test were also conducted in order to establish the students’ relative gain. We also tested the hypothesis that the quality of the students’ WAC outputs can be used as a predictor of academic achievement. Data was analyzed with both parametric and non-parametric methods. The results show no significant difference between the groups on exam scores, and the relationship between WAC quality and exam scores is not significant. The main reasons for the non-significant results are concluded to be low participation rates and too low a “dosage”.


Introduction
In ancient Greece, Plato argued that writing was a threat to intellectual training since it would diminish the need for memorization and foster only a pretense of wisdom [1]. Today, writing is generally recognized as an important element of any curriculum in higher education and one of the most efficient vehicles for learning. It benefits content learning and higher order thinking skills at the same time [2][3][4][5], it promotes critical thinking [6][7][8][9][10][11][12][13][14] and metacognition [15][16][17], and it helps students develop their general ability to express themselves [18] and convert their thoughts to words [19]. The importance of a verbal language to support the learning process was already emphasized by Vygotsky [20], since the development of higher cognitive functions (such as analysis and synthesis) benefits immensely from a proficiency in verbal expression, written expression in particular. Exactly why writing has such a beneficial influence on learning is still debated, but it has been suggested that it slows down the brain [21,22] and thereby gives the brain more time to process the information and also helps students realize that they don't understand the material as well as they might think. Writing facilitates meaningful learning because it forces students to analyze (break down) and synthesize (reconnect) fundamental scientific concepts [23,24]. Others emphasize the metacognitive processes that are stimulated by writing; the students are forced to think about a) how they perceive the information, what they understand (and what they don't understand) [25], and b) how they connect new information to what they already know [19,26,27].
There are several different ways to implement writing in the curriculum (and reasons for doing so). The general term is "writing across the curriculum" (WAC) and refers to any writing activity within the curriculum. The most common WAC activity in STEM classes (science, technology, engineering, mathematics) is "writing in the discipline" (WID) (sometimes referred to as 'writing to communicate' or 'transactional writing'). WID includes lab reports and bachelor theses. It is not unusual that the bachelor thesis is the first (and only) serious WAC activity in an undergraduate STEM program and this deficiency has been emphasized in the literature [3,28,29]. WID is characterized by very formal writing complying with strict templates [30] and there is little room for informal spontaneity.
Writing to learn (WTL), on the other hand, is the opposite of formal writing [18]. Students are encouraged to "just write". Never mind the formalities, just try to word your thoughts. The idea is that the simple act of writing, in itself, benefits learning [31]; "writers generate knowledge at the point of utterance" [32, 33, p. 211]. A typical WTL assignment is less than one page of writing and characterized by being "low-stake", i.e. not graded [14,21,33,34]. It has been debated whether these assignments should be carried out using a pen or a keyboard and in 2014, Mueller and Oppenheimer [35] showed that the use of pen or keyboard triggers different cognitive processes and when it comes to learning, the use of pen is preferable.
It is also generally recognized that WID and WTL activities promote different cognitive levels. WID is considered more beneficial for the higher order thinking levels (analysis and synthesis) [22], while WTL is mainly a tool for the lower order thinking levels (remember and understand). In order to close the gap between WID and WTL, "writing to engage" (WTE), an amalgamation of WID and WTL, has been suggested [36], but no explicit research on WTE in higher education has been found in the literature. Figure 1 illustrates the relationship between the different WAC activities and Bloom's taxonomy of higher-order cognitive thinking [37]. It could be argued that the beneficial influence of writing on learning is explained simply by the extra time spent on the course material. However, in 2007, Drabick, Weisenberg, Paul and Bubier [18] compared thinking with writing; the students who wrote about the subject for some time each week outperformed those who thought about the subject for the same amount of time, which suggests that it is the act of writing that promotes learning. This conclusion about writing versus thinking was confirmed in another study in 2011 [38].
Even if the consensus of the community is in favour of WAC activities and extols them as an important vehicle for learning, some works have reported discordant results. Armstrong, Wallace and Chang [39] reported that writing assignments had only a minimal impact on the overall content learning and Bangert-Drowns, Hurley and Wilkinson [16] even suggested that writing activities could have an adverse effect on learning since it "takes time away from other important educational activities and decreases content coverage" (ibid, p. 33). Libarkin and Ording [40] concluded that even if students' ability to discuss scientific concepts and draw reasonable conclusions based on data improved by using writing assignments, their ability to state a hypothesis or draw connections between human actions and environmental impacts did not improve. Sabrio, Sabrio and Tintera [41] could not detect any difference in performance between a writing and a non-writing class in mathematics and Bargate [21] reported that writing only improved students' grades on essay-type questions.
It has been recognized by the community that learning by writing does not occur automatically but only under certain conditions [16], and some evidence of the benefits may even be anecdotal [33]. For example, tasks that promote deep processing are more likely to enhance learning [42] and the extent to which instructors explain the learning benefits of writing also affects the learning outcome [34]. Klein [33] stressed the social context's influence on learning by writing.
Hence, writing-to-learn is a complex process influenced by a diversity of external (context) and internal (cognitive) factors and successful implementations require careful planning.

Definition of Concepts
WAC activities are generally considered to improve critical thinking and scientific literacy. To obviate any misunderstandings, we define these concepts here: By 'critical thinking', we refer to the ability to conceptualize, apply, analyze, synthesize and evaluate data acquired from observations, experiments, reasoning or communication with others, and to use this as a guide for decisions and actions [43]. It is "an intentional, self-regulated process that provides a mechanism for solving problems and making decisions based on reasoning and logic" [10, p. 141].
'Scientific literacy' implies a level of knowledge and understanding of scientific concepts, conjunctions and processes to such an extent that it benefits decision making in socio-scientific issues [44]. This entails an ability to describe, explain and predict phenomena from a natural science point of view and to identify scientific issues that are likely to have significant influence on society and/or individuals. A scientifically literate person is able to assess evidence, pose arguments and draw conclusions based on this evidence and previous knowledge [45].

Brief Review of WAC-WTL
WAC activities have been used for a long time in higher education but did not really draw scholars' attention until the 1970s [46,47]. Until then, WID had been the dominating WAC activity, but first Britton [46,47] and then Emig [22] acknowledged writing as "a unique mode of learning" [22, p. 122] since it is self-paced and scaffolds both analysis and synthesis. It has also been argued [48] that it is the process of writing, rather than the product of writing, that promotes learning. This position has been advocated repeatedly by others [14,42,49]. Moskovitz and Kellogg [50] stated that WTL activities "treat writing as a means, rather than an end" (ibid, p. 919). Applebee [1] advocated writing because it fosters reasoning skills; "good writing and careful thinking go hand in hand" (ibid, p. 577).
WTL can be, and has been, implemented in several different ways. The most common formats seem to be informal personal journals ("micro themes" or "five-minute essays") [33], writing informally about an experiment [31], blogs and chat forums [19], inquiry-based writing as response to external material [44,50], written arguments for/against research articles [39] and case studies [43]. In order to emphasize the informal character of WTL, Sabrio, Sabrio and Tintera [41] prompted students to "write a letter to a relative or a friend describing what you do to prepare for the upcoming exam" and "write an introduction to chapter x for an algebra student who has not yet studied it" (ibid, p. 422). Almer, Jones and Moeckel [51] showed that as little as "one-minute papers" improved students' performance on essay-like questions in an accounting exam. As a matter of fact, in 2019, Twitter (140 character text messages) was ranked as the 4th (out of 200) best technological tool for learning by the Center for Learning and Performance Technologies [52].

Introducing QTL
The novel approach to WAC in this work is the introduction of the QTL concept (Question-to-Learn). Rather than writing essay-like papers, we hypothesized that students' learning would benefit more from being prompted to design exam problems for their peers. Students solve each other's problems in pairs and afterwards discuss the problems and solutions (which gives the QTL concept an inherent peer-review process). It has been reported repeatedly that the most important prerequisite for successful WAC is that the writing assignments focus on the students' cognitive processes [15,16,27,42] and that they challenge students to define and explain concepts and expand on ideas [28]; we believed that the QTL writing task would meet both these prerequisites even better than WTL and WID. Prompting students to design exam problems is also consonant with other purported aspects of successful WAC: students must be induced to restructure knowledge [3] and to reflect on their own understanding [16]. We believe that designing a challenging exam problem, in combination with peer review, triggers both cognitive and metacognitive processes. We therefore hypothesized that the QTL approach would elicit even deeper thinking processes than plain writing, and that this would benefit content learning in general and improve exam performance on question items requiring higher-order thinking skills in particular. This is illustrated in Figure 1. The main objective of this study was to benchmark our QTL approach to WAC against traditional WTL assignments.

Method
The target group of this study was the "Introduction to Natural Science" program cohort at the University of Gothenburg. This program is for students who have a high-school diploma but are not eligible for third cycle studies in the STEM disciplines (or medicine). It is a one-year program consisting of introductory courses in mathematics, physics, chemistry and biology. A total of 179 students were accepted for the academic year 2019/2020, and approximately 130 students took the Electricity course during the second semester of fall 2019. The cohort was randomly divided into three groups: a QTL group, a WTL group and a reference group.
The project was first presented to the students in a plenary session where the conditions were explained: 1) participation is voluntary, but attendance implies tacit consent to the use of students' data for research purposes; 2) these extra-curricular activities do not have any (direct) influence on the grade in the course (but will hopefully have an indirect, positive influence); 3) we ask the students not to migrate between groups and/or participate in multiple activities, since that would corrupt the data; and 4) all data will be treated with professional secrecy (teacher-student privilege).
In order to establish a "baseline" of specific knowledge (in Electricity) and higher order thinking skills, all students took a pre-test consisting of a selection of test items from "Teaching Physics" by Edward Redish [53]. Neither the results nor the solutions were communicated to the students at this time. At the last lecture preceding the exam, the students took the same test again (in order to establish any progress) and this time the solutions were presented orally in plenum. In the pre-/post-test, students were also asked to indicate, for each question item, their degree of confidence in their answer ('How sure are you that your answer is correct?') on a 5-level Likert scale, ranging from 'Absolutely sure' to 'Just guessing'.
The pre-test and post-test scores were used to determine each student's relative improvement and the exam score was used to determine the students' absolute degree of learning. Finally, one month after the exam, a retention test was scheduled in order to measure any differences in the retention of knowledge between the groups.
The three groups met for one hour weekly for a total of five occasions. This may seem like a short time to register any improvements in higher order thinking skills, but Quitadamo and Kurtz [10] showed that higher order thinking skills can be measurably changed in weeks. The "one-minute paper" study by Almer, Jones and Moeckel [51] also supports this approach.

QTL Instructions
The QTL group was tasked with the following assignment: a) Based on what was covered in the last week's lectures, design a "good" exam problem, including an elaborate solution (on a separate paper). b) Exchange your problem sheet with one of your peers and solve the problem your peer designed. c) Exchange sheets again and correct your peer's solution. d) Finally, discuss your problems and solutions until you agree on the solutions. If necessary, correct the solution to your problem. In addition, the students were encouraged to keep this exercise in mind for the upcoming lectures and to try to figure out a good problem already during the lectures (and between lectures). The instructor collected the problem and solution sheets and they were subsequently "graded". The problems were assessed by considering legibility, originality, relevance, level of complexity, elucidating figures and "degree of effort" (do you only need to apply a formula or is higher order thinking required to solve it?). All problems were graded using a Likert scale from 1 to 5, where "5" represents a very creative problem that illustrates central concepts, demonstrates a profound understanding on the designer's part and can be expected to contribute significantly to the peer's learning. At the other end, grade "1" represents a substandard problem characterized by minimal effort, deficient (or faulty) assumptions and a negligible contribution to the peer's learning. (Problems copied directly from the textbook were also graded with "1".)

WTL Instructions
The WTL assignments were designed according to ideas proposed by Kovac and Sherwood [7,8] and Angelo and Cross [54]; short, informal and not intended for communication. The WTL group was instructed as follows: "Summarize what was presented on the lectures during the last week. Focus on central concepts, conjunctions and context and write no more than one page. In particular, try to explicate the parts you did not understand." This exercise was carried out using pen and paper in order to optimize the learning [35].
The sheets were collected for grading (1-5), where "5" represents a student who demonstrates an excellent ability to express his/her understanding (and/or lack of understanding) verbally, has a high level of self-awareness (of his/her deficiencies), uses excellent language and has correctly identified the gist of the course material. A "1" represents substandard work with severe deficiencies in language and verbal expression and an inability to catch the central concepts and conjunctions.
NB! In both the QTL and WTL groups, the grading was not disclosed to the students. According to MacKinnin-Slaney [55], students are more likely to focus on content and clarity of expression, rather than formal aspects of writing (spelling and grammar), if papers are not graded, and Bargate [21] reported that an ungraded paper group improved more than a graded group. All QTL/WTL works were graded by the same teacher.

The Reference (REF) Group
The reference group had 45 minutes of "traditional" activities in a STEM classroom; the teacher demonstrated problems on the whiteboard half the time and the rest of the time was dedicated to questions or self-paced problem-solving. (This is the "placebo" group; activities are only intended to even out differences in teacher time between groups.)

The NULL Group
There will inevitably be a group of students that ignore any extra-curricular activities. We will refer to this group as the "NULL" group and their contribution to this study is also important. This group's performance was registered and compared with the other three groups. This group is interesting because it has been suggested that WAC activities may not be the best way to spend students' time [16]. Even if this group is not randomly selected, their performance may be an indication of the credence of that suggestion.

Variables
The main objective of the study was to determine if the extra-curricular activities (X0) have any impact on the students' learning (Yi). The most fundamental tenet in scientific methodology states that a correlation between X and Y does not imply a causal relationship (causality); significant differences in exam scores between the groups could be explained by confounding factors. Figure 2 illustrates the confounding factors that were considered in this work. All students used the same textbook, did the same laboratory exercises and were lectured by the same teacher; these factors were constant during the experiment. Lecture attendance, Time-on-Task (how much did you study out-of-class?) and previous knowledge had to be measured in order to establish any differences between the groups. Lecture attendance and Time-on-Task were measured by a questionnaire at the end of the course and previous knowledge was measured by the pre-test. The extra-curricular group activity is the primary independent variable ('predictor') in this study; this is a categorical (nominal) variable (NULL/REF/WTL/QTL). This is the only independent variable that was manipulated in this experiment. Multiple dependent variables ('outcomes') were considered: the students' relative gain (post-test minus pre-test), the students' absolute gain (exam score) and the students' retention score (one month after the exam). Each of these three outcomes was considered in two variants: the total score and the score on the HOCS (higher-order cognitive skills) questions only.
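As a minimal sketch of how the variables described above could be organized, the following Python snippet stores one record per student and derives the relative gain (post-test minus pre-test) per group. The class name, field names and function name are hypothetical and are not taken from the study's actual analysis, which was carried out in SPSS.

```python
from dataclasses import dataclass
from statistics import mean

# Levels of the categorical predictor (extra-curricular activity).
GROUPS = ("NULL", "REF", "WTL", "QTL")


@dataclass
class StudentRecord:
    """One student's data; all field names are illustrative assumptions."""
    group: str        # one of GROUPS
    pre_test: float   # previous knowledge (pre-test score)
    post_test: float  # post-test score
    exam: float       # exam score ("absolute gain")

    @property
    def relative_gain(self) -> float:
        # Relative gain as defined in the text: post-test minus pre-test.
        return self.post_test - self.pre_test


def mean_relative_gain(records):
    """Return a dict mapping each group to its mean relative gain."""
    gains = {}
    for record in records:
        gains.setdefault(record.group, []).append(record.relative_gain)
    return {group: mean(values) for group, values in gains.items()}
```

Calling `mean_relative_gain` on a list of `StudentRecord` objects gives one mean gain per group present in the list; groups without any record are simply absent from the result.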

Hypotheses
Referring to Table 1, six null hypotheses were formulated:
H00: There is no difference in the total relative learning between the groups.
H01: There is no difference in the HOCS relative learning between the groups.
H02: There is no difference in the total absolute learning between the groups.
H03: There is no difference in the HOCS absolute learning between the groups.
H04: There is no difference in the total retention of knowledge between the groups.
H05: There is no difference in the HOCS retention of knowledge between the groups.

Proposed Analysis
From Table 1, it is obvious that the dependent variables depend on multiple variables and hence multi-way ANOVA will have to be applied in order to detect any significant differences. To this end, all variables will be defined in SPSS and a 'univariate, general linear model' will be applied. If any of the null hypotheses are rejected, we will perform a multiple comparison analysis to establish which groups are significantly different; this will be performed using Fisher's (improved) LSD (least significant difference) test with Holm [56] and/or Dunn [57] compensated p-values, or a simple Tukey test [58, p. 555]. If data does not follow a normal distribution, or if too few samples are collected, non-parametric methods will be applied.
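To illustrate what a Holm compensation of p-values involves, the sketch below implements the standard Holm step-down adjustment in plain Python. It is an illustration only: the function name is hypothetical, the p-values in the usage note are invented, and the study itself planned to rely on SPSS rather than on code like this.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order.

    The smallest raw p-value is multiplied by m, the next by m - 1, and so
    on. A running maximum keeps the adjusted values monotone, and every
    adjusted value is capped at 1.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = (m - rank) * p_values[index]
        running_max = max(running_max, candidate)
        adjusted[index] = min(1.0, running_max)
    return adjusted
```

For three hypothetical pairwise comparisons with raw p-values 0.01, 0.04 and 0.03, the adjusted values are 0.03, 0.06 and 0.06, so only the first comparison would remain significant at α=.05.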
We will also analyze individuals' improvements by analyzing the correlation between the quality of QTL/WTL works and post-test/exam/retention scores.

Screening
The students' participation in these extra-curricular activities was voluntary and attendance rates were lower than expected. In order to represent a 'valid sample' for the QTL, WTL or REF group, the following criteria were established: 1. The student must have attended at least two extra-curricular classes. 2. The student must have attended both the pre- and post-test. In order to represent a valid sample for the NULL group, the student must have attended both the pre- and post-test sessions but not attended any of the extra-curricular activities. Also, students must have provided information about their lecture attendance and their Time-on-Task (outside the classroom).
Hence, most students represented invalid samples because they either attended only one extra-curricular session or did not take both the pre- and post-test. One student attended sessions of more than one extra-curricular group and was therefore excluded. Two students did not provide information about their lecture attendance and time-on-task and were also excluded.
After the screening stage, a total of 43 students remained. Table 2 summarizes their scores on the dependent variables Y0-Y3. The retention test results (Y4-Y5) had to be excluded from the experiment since only five of the 43 students took the retention test. Figures 3-6 show box plots of the four outcome variables Y0-Y3; the boxes represent the middle 50% and the whiskers represent the top and bottom 25%.
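The screening rules can be summarized as a small classification function. The following is a sketch under stated assumptions: the dictionary keys are hypothetical, and the exclusion of students who attended more than one kind of activity reflects our reading of the request not to migrate between groups rather than a documented rule.

```python
def classify_student(student):
    """Return the group label of a valid sample, or None if invalid.

    `student` is a dict with these (hypothetical) keys:
      took_pre_test, took_post_test: bool
      lecture_attendance, time_on_task: questionnaire answer or None
      sessions: dict mapping "QTL"/"WTL"/"REF" to sessions attended
    """
    took_both_tests = student["took_pre_test"] and student["took_post_test"]
    answered_questionnaire = (
        student["lecture_attendance"] is not None
        and student["time_on_task"] is not None
    )
    if not (took_both_tests and answered_questionnaire):
        return None

    attended = {g: n for g, n in student["sessions"].items() if n > 0}
    if not attended:
        return "NULL"   # no extra-curricular activity at all
    if len(attended) > 1:
        return None     # assumption: mixing activities invalidates the sample
    group, count = next(iter(attended.items()))
    # At least two extra-curricular classes are required.
    return group if count >= 2 else None
```

A student who took both tests, answered the questionnaire and attended three QTL sessions is classified as "QTL", whereas the same student with a single session is rejected.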

Testing for Normality
First, data was investigated for normality. Figure 7 illustrates the histograms of the relative gain results for the four groups. (The histograms of the other output variables were very similar.) Due to the low number of samples in some of the groups, normality is hard to infer. Figure 8 illustrates a P-P plot of the pre-test gain for the merged sample of all groups. The meandering of data around the straight line indicates a minor skewness problem. A 1-sample Kolmogorov-Smirnov test of the merged data produced p=.083, so normality cannot be rejected at α=.05, but the margin is small. In addition, the low sample size indicates that non-parametric methods should be applied [58, pp. 283-284]. Both parametric and non-parametric methods were therefore applied for comparison.
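For readers unfamiliar with the one-sample Kolmogorov-Smirnov test, the sketch below computes its test statistic D, the largest distance between the empirical distribution of a sample and a normal distribution. The function name is hypothetical and only the statistic is computed, not the p-value. Note also that when the mean and standard deviation are estimated from the same sample, as is the default here, standard Kolmogorov-Smirnov p-values are only approximate; how SPSS handled this in the study is not stated.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample, mu=None, sigma=None):
    """Return the one-sample KS statistic D against a normal distribution.

    If mu or sigma is omitted, it is estimated from the sample.
    """
    xs = sorted(sample)
    n = len(xs)
    mu = mean(xs) if mu is None else mu
    sigma = stdev(xs) if sigma is None else sigma
    reference = NormalDist(mu, sigma)

    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = reference.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so the
        # distance is checked on both sides of the jump.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d
```

A small D means that the sample follows the reference distribution closely; a statistics package is needed to turn D and the sample size into a p-value.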

Parametric Methods
First, a 1-way ANOVA analysis was performed on output variables Y0-Y3 with respect to independent variables X0-X3. The result is shown in Table 3; no scores are significantly different. Table 4 reports the 2-way ANOVA between the extra-curricular activity (X0) and the other independent variables (X1-X4) for the different Yi. Since no significant effects were found, no post-hoc multiple comparison analyses were applied.
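The quantity behind a 1-way ANOVA is the F statistic, the ratio of the variance between group means to the variance within groups. The following sketch computes it in plain Python; the function name is hypothetical, the scores in the usage note are invented, and the p-value lookup in the F distribution is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

For three hypothetical groups with scores [1, 2, 3], [2, 3, 4] and [3, 4, 5], both sums of squares equal 6, giving F = 3.0 with 2 and 6 degrees of freedom.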

Non-parametric Methods
A Kruskal-Wallis 1-Way ANOVA test was applied to all the dependent variables (Y0-Y3) with the extra-curricular activity as the independent variable (X0). The result is shown in Table 5. No significant differences were reported across categories of extra-curricular activities.
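The Kruskal-Wallis test replaces the scores by their ranks in the pooled sample and compares the rank sums of the groups. The sketch below computes the H statistic in plain Python. It is a simplified illustration: tied scores receive their average rank, but the usual tie correction factor is omitted, the function names are hypothetical, and the p-value (from a chi-square distribution with k - 1 degrees of freedom) is not computed.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted_rank_sums = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_rank_sums += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)
```

For three hypothetical groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], the rank sums are 6, 15 and 24, which gives H = 7.2.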

Performance Predictors
We also investigated the possibility of using the quality of the QTL and WTL students' writing assignments as a predictor for Y0-Y3; each student's average score on the QTL/WTL assignments was plotted against his/her performance scores and a straight line was fitted to the data. Results show that there was no significant correlation between QTL/WTL scores and academic performance.
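Fitting a straight line and measuring the strength of a linear relationship can be sketched as follows. This is a plain least-squares fit together with the Pearson correlation coefficient; the function name is hypothetical, the numbers in the usage note are invented, and no significance test is included. The sketch assumes that neither variable is constant, since the correlation is undefined in that case.

```python
from math import sqrt


def fit_line(x, y):
    """Return (slope, intercept, r) of the least-squares line y = a*x + b.

    r is the Pearson correlation coefficient between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r
```

With hypothetical assignment grades [1, 2, 3, 4] and scores [3, 5, 7, 9], the fit returns a slope of 2, an intercept of 1 and r = 1, since the points lie exactly on a line.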

Conclusions
From the graphs in Figures 3-8 and the analyses above, it can be concluded that the extra-curricular activities implemented in this experiment did not have any significant impact on the students' learning, neither in terms of relative nor absolute knowledge, and regardless of whether the total score or only the HOCS questions are considered. None of the four null hypotheses that could be tested (H00-H03) can be rejected; the two retention hypotheses (H04-H05) could not be tested since the retention results were excluded. Also, this work cannot conclude that the quality of the students' extra-curricular writing assignments is a valid predictor of their exam performance.

Discussion
The most interesting result of this study is the NULL group's performance. Even if this group is not a random selection, it is generally accepted that the students who don't participate in any extra-curricular activities are the low achievers. The results of this work do not support that assumption; no significant differences between participants and non-participants were observed.
The fact that none of the hypotheses could be rejected could be taken as support for Bangert-Drowns, Hurley and Wilkinson's [16] contention that WAC activities may be a waste of students' time. There could, however, be other explanations for the absence of significant results. First of all, the 'dosage' was too small; each group was only scheduled for one extra-curricular event each week, a total of five occasions, and few students attended more than one occasion. Another reason for the meagre outcome could be that the low dosage effects reported by Almer, Jones and Moeckel [51] do not apply to STEM teaching (their work was in accounting). Quitadamo and Kurtz [10] showed that effects could be detected in only a few weeks (in a biology class) but they were specifically targeting critical thinking.
However, the main source of uncertainty in this work is of course the low number of samples in the four groups which makes any analysis and conclusions precarious.

Suggestions
Since the main issue in this work was the low participation rate, a new study should focus on increasing attendance. Extra-curricular classes should be mandatory or some other incentives for participation must be implemented. Also, the logistics and scheduling should be considered. Some of our extra-curricular classes were scheduled early in the morning or late in the afternoon and in classrooms far away from the students' usual premises.
In order to improve the quality of the QTL assignments, we also suggest that students should not design the exam problems in the classroom, but rather design them at home and bring them ready-made to the QTL classroom. That would save classroom time and probably increase the quality of the problems (and their learning).

Author Note
This work was conducted at the Department of Physics, University of Gothenburg, during the fall of 2019 with permission of the department head. The author would like to acknowledge Jonas Enger, Carlo Roberto and Andreas Johansson for their help in collecting the data.
The author has no other conflicts of interest to report.