Local Preferential Attachment of Posting Behaviour on MOOC Discussion Forum

Online education has become a popular educational paradigm and a foundation for future education. Compared with traditional offline classes, online education can support massive numbers of participants, but it also confronts engagement and retention problems. Many studies have analysed participants' behaviour to enhance engagement with respect to social, cognitive and technical factors, such as 1) the knowledge, cognitive-process and social dimensions of micro-collaborations; 2) the effects of rating visualization on reading behaviour. However, fewer studies have considered the organization of the discussion forum when analysing posting behaviour. Therefore, this study explores patterns of posting behaviour with consideration of forum organization. In detail, this study applied local preferential attachment (LPA) to the posting behaviour of MOOC discussion forums and discovered its patterns concerning social, emotional and technical factors. The LPA model measures message attributes on a thread page as the factual local environment of posting behaviour. With statistical tests on the collected data, we validated the significance and linearity of the LPA on measurements of responses number and message length. The results indicated that: 1) most LPA of posting behaviour are significant; 2) comment behaviour expressed sub-linearity on responses number, while the other measurements expressed super-linearity; 3) posting behaviour was affected by compound factors of self-identity, knowledge exchange and community citizenship. These significant preferences suggest that: 1) forum organization is significant for modelling posting behaviour; 2) re-organizing threads and marking message status could improve the engagement of participants.


Introduction
A MOOC discussion forum is often an open, optional and loosely structured platform [24]. On such a flexible platform, participants are motivated by psychological and cognitive factors, such as self-identity, reciprocity and preferential attachment. Those factors have been studied from the perspectives of knowledge sharing and social networking. For example, open reciprocity was analysed in terms of self-identity, social norms, trust and altruism [6,8], and preferential attachment was characterized through the phenomena of super-posters and extended threads [3]. Recent studies of discussion forums have concentrated on global networking and content analysis. Networking behaviours were modelled with ERGM and SIENA methods [16,24,11]; usually, following and posting actions form a forum network. Content analysis was applied to analyse the activities of participants, such as engagement, retention and responding to a message [5,10]. From those analyses, a posting action could be labelled as social, cognitive or affective behaviour. Recent studies generally serve two purposes. One is to reduce the burden of large content volume; hence, it is necessary to obtain response patterns for the classification of messages. The other is to find the reasons that enhance engagement and course completion, such as increasing feedback to retain participants [19,10].
Despite these discoveries about online course engagement, heterogeneity and ever-growing volume still challenge the efficiency of knowledge exchange [4,19]. That is, new characteristics have appeared as a consequence of platform evolution, such as novel organization and compound factors. For example, the EDX platform has emphasized forum guidance and attached questions under course videos. As for compound factors, course platforms have integrated the functions of an open course initiative, a social platform and information diffusion. Deng et al. also concluded that the integrated functions of online education have stabilized, with dependencies on social learning, intrinsic motivation and network externality [6]. Therefore, it is necessary to discover patterns under these novel characteristics.
To discover novel patterns under these characteristics, we proposed hypotheses based on the local mechanisms of network analysis. In conformity with platform organization, we measured the LPA with the variables of message length and responses number. The reason to apply LPA is that participants are confronted with local pages when posting a message. As in most course syllabi, lectures and assignments are usually organized by weeks and themes [18]. On the other hand, platforms have not aggregated messages into a global view. Hence, with high probability, participants will only consider local information when posting a message, such as an answer or comment. To accommodate these changes in MOOC platforms, we propose the LPA model to measure participants' propensity and growth pattern under the organization and compound factors.
In summary, the discussion forum is an important component that attracts participants through cognitive and social factors. This phenomenon guided previous studies to discover patterns of knowledge exchange and psychological activity as a primary theme of online education research. In those patterns, preferential attachment has been modelled from a global view rather than conditioned on threads. However, posting behaviour is usually undertaken on a local page. Therefore, we propose the LPA model to discover the propensity of participants when posting messages.
Our study contributes to online course engagement in two ways. Firstly, we introduced LPA modelling to posting behaviour and validated its utility. Secondly, we discovered patterns of preference under the compound motivations of self-identity, knowledge exchange and community citizenship. In a nutshell, we modelled posting behaviour under the novel characteristics of MOOC platforms.

Theory and Hypothesis
This paper constructs measurements to test the significance of the LPA of posting behaviour. To explain the results of the tests, this section introduces the theories and procedures of local preferential attachment.
Measuring LPA requires the assumption that the local page affects posting behaviour independently. The rationale for this assumption is as follows. Firstly, posting behaviour involves a large cognitive investment [14,5]. As a consequence, a large volume of messages forces participants to concentrate on partial information when posting responses. Secondly, the platform also divides messages into many web pages. Hence, the partial information roughly equals a single web page. For example, after a lecture, a participant may answer a question about this lecture without considering questions about other lectures. Besides lectures, questions can also be organized by assignments or weeks. Such flexible organization affects the activities of participants [19,3,2]. For example, an online discussion thread grows in one of three patterns: a short thread pattern, an extended thread pattern with an elongated structure, or a split thread pattern with a broad structure [3]. In addition, rating visualizations can affect the navigation and learning of participants [2].
Given the organization and the assumed context of a single web page, we measure LPA on two theoretical foundations. One is local preferential attachment, as used in modelling network growth. The other is the mechanism of knowledge exchange, which constitutes the local mechanisms of posting behaviour.

Local Preferential Attachment
Preferential attachment is a reasonable model for generating scale-free networks, such as online social networks and citation networks [22,17]. In those networks, new connections prefer nodes with high in-degree or other attributes. In other words, as the network grows, a node receives a new connection with a probability proportional to its attribute. For example, journals from the same discipline are more likely to form citation links than journals from different disciplines [16]. With the integration of social and cognitive factors, activities on a discussion forum may prefer particular message attributes as well [23].
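For reference, the linear rule described above, in which a node gains a new connection with probability proportional to its attribute, is commonly written as follows. The symbols $\Pi(i)$ (attachment probability of node $i$) and $k_i$ (its in-degree or other attribute) are notation assumed here for illustration and do not come from the text.

```latex
\Pi(i) = \frac{k_i}{\sum_{j} k_j}
```

Under this form, a node with twice the attribute value of another is twice as likely to receive the next connection; a local variant would restrict the sum to the nodes visible in the local context.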
Compared with preferential attachment on a global network, local mechanisms have improved network modelling with likelihood estimation [20,22]. Under the assumption that a network should attach preferentially to already well-connected nodes, extra local rules may stabilize the simulated degree distribution [22]. Generally, the local mechanism of preferential attachment is built on local structure, limited information and a copying strategy [20,22,16]. Those strategies are reflected on course platforms, i.e., they correspond to the cognitive energy invested in posting behaviour. When generating a new connection, posters do not consider the global distribution as in Price's model, but rely on local information to perform a posting action.

Mechanisms and Variables
Besides the primary purpose of exchanging knowledge, posting actions are also affected by forum organization and the honour code, which are designed for efficient knowledge exchange. Course platforms organize messages to facilitate the search and response actions of participants, for example by weeks or assignments. That is, the information on a single web page is sufficient to support a posting action, such as an answer or comment. Given a single web page, such as a thread page, all messages on the page form an ambient context that affects the posting action through the psychological and cognitive activities of participants. In other words, the posting behaviour of a participant could be fully affected by a single page.
Participants may post a message for various purposes, such as knowledge exchange as the cardinal purpose, self-identity, community citizenship and acquiring feedback. In detail, self-identity and self-efficacy are fundamental conditions for enacting posting actions and the reason participants modify their posting behaviour, i.e., prefer to respond to one message rather than another [10]. Besides self-identity, participants may be identified externally through community citizenship when perceiving the external prestige and distinctiveness of online support communities [4]. Through an open process and self-organization, participants can influence the online education platform with active knowledge exchange, rather than treating it as an auxiliary or impulsive learning platform. Feedback from the discussion forum can promote the engagement of online course learners [7,10]. Conversely, participants may respond to core questions in pursuit of more feedback. In a nutshell, when posting messages, participants compare local ambient information to fulfil their psychological and knowledge-exchange needs.
Given the above mechanisms, we measure LPA with two variables: message length and responses number. The variables were selected for two reasons. Firstly, both variables are visually salient enough to affect the posting action of a participant. Secondly, they can be extracted from archived data.
Message length affects posting behaviour for the following reasons. Message length correlates positively with the perplexity of the message, i.e., a longer message implies more information. Hence, with the expectation of reciprocity, a longer message will be considered to reflect more effort in exchanging knowledge and will attract feedback. That is, participants prefer to respond to high-quality messages to improve their community [2]. Beyond self-identity, this preference leads to community citizenship through the development of social identity [4]. If all participants prefer longer messages, the preference becomes super-linear through network externality. Therefore, as the major pattern, the variable of message length will achieve a positive attachment level and super-linearity.
As a minor pattern, sub-linearity could occur for the following reasons. Firstly, the preference for message length is sub-linear because long messages are more compressible than short messages. Quantitatively, the coefficient of linearity should lie in [0, 1] if only the coding length of a message is considered. That is, a message could be compressed to a much shorter length without losing information. Secondly, participants may respond to messages on a specific theme, since they have specific cognitive levels, habits and subjective perceptions [6]. In addition, themes of humour and technical issues could generate sub-linearity.
Among these contrary effects, super-linearity should dominate, because knowledge exchange, rather than social activity, is the essence of a course platform. Hence, given the purposes and primary perceptions, we propose the hypotheses on message length preference as follows.
Hypothesis 1. Participants prefer to respond to longer messages on a web page of discussion forum.
Hypothesis 2. For the preference of message length, the super-linear model has a higher likelihood than the linear model.
More precisely than message length, responses number functions identically to the in-degree of a global network model. That is, participants will prefer popular messages if the discussion forum conforms to the traditional network growth pattern [9]. However, given the same effects of community citizenship, social presence and feedback acquisition, participants may prefer messages without responses as well. As for linearity, insensitivity to large numbers will generate a sub-linear effect. On the other hand, complicated or top questions may attract more responses and consequently present super-linearity.
Hypothesis 3. Participants prefer to respond to messages which have lower responses number.
Hypothesis 4. For responses number, the sub-linear model has a higher likelihood than the linear model.
In a nutshell, to achieve efficient knowledge sharing, posting actions are affected by two opposing factors. One is herding behaviour, which prefers popular messages. The other is self-identity and reciprocity, which could equalize the responses. Our statistical tests explore the dominant factors of posting behaviour. To depict the factors, we also analysed messages with typical preferences qualitatively. Our measurements could indicate types of messages and participants, such as instructors who prefer equilibrium, and emotional contagion that transfers positive and negative emotions to others [21].

Methodology
To test the LPA, we reconstructed the local pages of posting actions. For example, we could extract the messages that a participant was confronted with when posting a comment. In detail, our study was performed as follows. We first collected data and extracted the hypothesized variables from the ambient context of each message, e.g., the lengths of the comments on the same page. Next, we tested the significance of the LPA of posting behaviour with relative attributes and Student's t-test. In addition, we analysed the linearity of the preference through a comparison of likelihoods. Finally, we selected prominent examples to explain their conditions on message types.

Data Collection
Forum data were collected from two online course platforms, edx.org and piazza.com. EDX supports institutions in creating online courses with open and plentiful components. On this platform, participants can take free online courses from universities and institutions around the world. Piazza is a free online gathering place where students can ask, answer and explore under the guidance of their instructors. Both platforms provide typical components and guidance for knowledge exchange, such as forums, surveys and honour codes. Hence, they sufficiently support studies of posting behaviour, message structures and the social activities of attachment and endorsement [1,18]. Given this sufficiency, we selected two courses on the platforms to test our hypotheses. Table 1 presents the courses we analysed, and the following introduces their characteristics. Similar to school education, online courses also include the components of syllabus, quiz, examination and discussion. In the discussion forums, the two courses generated fundamental thread types, such as greetings, quizzes, assignments and their technical issues. Also, in the process of knowledge exchange, networking and openness have truly diversified the participants and activities in those courses. That means that not only the syllabus but also technical issues or social activities could dominate the forum of a specific lecture. For example, a specific technical issue often dominates a class on the discussion forum. Hence, we use local preferential attachment to characterize the posting actions and explain their diversified patterns of knowledge exchange.
In discussion forums, typical activities are endorsements and the posting of questions, answers and comments. On each platform, a posted message carries the meta-information of publish date and participant's id. We recognized the messages and their meta-information from HTML tags and visual structures. The recognized data columns are denoted as follows.
1. A question ($q \in Q$) generates a thread page, with its answers ($a \in A$) listed under the question. For an answer ($a$), its comments ($c \in C$) are appended to the answer. By the convention of discussion forums, an answer should be a solution to the question, and a comment discusses the answer with supplementary information, such as links and requirements.
2. Each message belongs to one of the types $Q$, $A$, $C$, and carries the meta-information of publish date and poster's id. In addition, a question may have an upvote score, and a poster may have a reputation and an identifiable social status.
3. Statistical values. The essential attributes are message length and responses number. Optionally, a message may be assigned to topics or tags.
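The data columns listed above can be sketched as a small record type. This is a minimal illustration, not the authors' extraction code: the class name `Message`, its field names, and the choice to measure length in characters are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Message:
    """One forum message; all field names are illustrative assumptions."""
    msg_id: str
    kind: str                  # "question", "answer" or "comment"
    poster_id: str
    published: datetime
    text: str
    parent_id: Optional[str] = None  # question for an answer, answer for a comment

    @property
    def length(self) -> int:
        # Assumption: message length is the number of characters in the text.
        return len(self.text)


def responses_number(target: Message, messages: List[Message]) -> int:
    """Count the messages posted directly in response to `target`."""
    return sum(1 for m in messages if m.parent_id == target.msg_id)


# Hypothetical thread: one question, one answer, two comments on the answer.
thread = [
    Message("q1", "question", "u1", datetime(2020, 1, 1), "How is the bound derived?"),
    Message("a1", "answer", "u2", datetime(2020, 1, 2), "Apply the delta method.", "q1"),
    Message("c1", "comment", "u3", datetime(2020, 1, 3), "Thanks", "a1"),
    Message("c2", "comment", "u1", datetime(2020, 1, 4), "+1", "a1"),
]
answer_degree = responses_number(thread[1], thread)
```

Keeping the parent link on each message makes both essential attributes, message length and responses number, derivable from a flat list of records.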

Local Ambient Context
In a discussion forum, most pages are either a question list or a question detail page. On a question list page, the ambient context includes past questions. On a question detail page, the ambient context includes the question content and the past answers and comments of the question. The ambient information in our measurement excludes personal information and node attributes, such as gender, badge and location. Participants may also answer a question as a result of searching or a subscribed email; in that case, the ambient context would be a search result. Our measurement excludes these situations and only considers the question list within a time range or a specific class. In a nutshell, we postulate that posting behaviour is determined by the information on a single web page, and that ambient information consists of the messages on that page. The following schema of ambient information formalizes the variables of LPA. Figure 1 presents the structures of two thread pages, where each sub-figure denotes a question detail page and each rectangle denotes a message. On a web page, $t$ is the title of the current question and $q$ is the question's content; $a_i$ is an answer to $q$, and $c_{i,j}$ is a comment on $a_i$. When a new message is generated, e.g., $c_{i,j}$ in Figure 1, the local page is expanded with a rectangle inserted into the HTML page. Therefore, a posting action is conditioned on two elements $(M, m)$, where $M$ denotes the messages in the ambient context and $m$ is the responded message. For each message $m$, we measured the variables to test the preferential level of its posting action.

Variables to Measure LPA
The variables are message length and responses number, each measured as a relative attribute of the responded message $m$ against the ambient messages $M$. Let $(M, m)$ denote a pair (ambient messages, a responded message), and let $\mathrm{len}(\cdot)$, $\mathrm{deg}(\cdot)$ and $\mathrm{date}(\cdot)$ denote the functions of message length, responses number and publish date of a message respectively. For example, $\mathrm{len}(m)$ returns the text length of a message, and $\mathrm{len}(M)$ returns a list of message lengths. Given a message $m$, we define its ambient messages (1) and relative attribute (2) as follows.
Here $\mathrm{page}(m')$ denotes the local page of message $m'$. The condition $\mathrm{page}(m') = \mathrm{page}(m)$ means that messages $m'$ and $m$ are on the same local page, and $\mathrm{date}(m') < \mathrm{date}(m)$ means that message $m'$ was posted before message $m$; hence, it belongs to the ambient context of $m$. Compared with commenting behaviour, answering behaviour has a more varied local environment. On EDX, questions are appended to each lecture; hence, an answering action confronts a lecture page. In contrast, for Piazza, we constructed local question pages with time windows that each include 6 questions.
The relative attribute $r(M, m)$ of a responded message $m$ against its ambient messages $M$ indicates the preference of the participant when posting in response to $m$, and the distribution of these attributes indicates the LPA level of a measurement. This is similar to the view of citation preference [15]. If a relative attribute is greater than zero, we can say that the posting behaviour preferred the attribute positively, i.e., preferred longer or more popular messages. However, this conclusion does not account for the differing means of local pages, which cause a scaling issue for the relative attribute. Hence, we also compared the distribution of the measurements of responded messages with that of random selections.
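The two definitions in this subsection can be sketched in a few lines. The ambient filter follows the stated conditions (same local page, posted earlier). The exact form of the relative attribute is not recoverable from the text, so the sketch assumes the simplest form consistent with the sign interpretation above: the attribute of the responded message minus the mean over the ambient messages. The dictionary keys and function names are hypothetical.

```python
from statistics import mean


def ambient_messages(messages, new_msg):
    """Messages on the same local page that were posted before `new_msg`."""
    return [
        m for m in messages
        if m["page"] == new_msg["page"] and m["date"] < new_msg["date"]
    ]


def relative_attribute(ambient_values, responded_value):
    """ASSUMED form: responded value minus the mean over the ambient messages."""
    return responded_value - mean(ambient_values)


# Hypothetical records: two earlier comments on one thread page, one elsewhere.
page = [
    {"id": "c1", "page": "thread-7", "date": 1, "length": 0},
    {"id": "c2", "page": "thread-7", "date": 2, "length": 2},
    {"id": "c3", "page": "thread-9", "date": 2, "length": 40},
]
new_comment = {"id": "c4", "page": "thread-7", "date": 3, "length": 5}

ambient = ambient_messages(page, new_comment)
lengths = [m["length"] for m in ambient]
# Suppose the new comment responds to c2, whose length is 2.
r = relative_attribute(lengths, 2)
```

A positive value of `r` means the responded message is longer (or more popular) than the average message on its local page; under this assumed form, pages with different compositions can yield the same value, which is why a comparison against random selection is also needed.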

Preference and Linearity
The distribution of relative attributes may be biased by local aggregation when testing LPA levels. For example, it cannot distinguish the preference $(\{0, 2\}, 2)$ from $(\{0, 1, 2\}, 2)$, since both have $r = 1$. Therefore, we also applied a two-sample t-test between random selection and the real distribution.
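The comparison against random selection can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the random baseline draws one ambient message uniformly per local page, the relative attribute is taken as the value minus the page mean, and the pooled-variance Student form of the two-sample statistic is used. All names and the toy data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(sample_a), len(sample_b)
    pooled = ((na - 1) * variance(sample_a)
              + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))


def relative(ambient, value):
    # Assumed relative attribute: value minus the local-page mean.
    return value - mean(ambient)


# Each pair is (ambient attribute values, attribute of the responded message).
pairs = [([0, 2], 2), ([0, 1, 2], 2), ([1, 5, 3], 5), ([4, 4, 9], 4)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
real = [relative(amb, val) for amb, val in pairs]
baseline = [relative(amb, rng.choice(amb)) for amb, _ in pairs]

# A clearly positive t suggests the real choices exceed random selection.
t_value = student_t(real, baseline)
```

With real data, each variable (message length or responses number) and each behaviour (answer or comment) would get its own pair of samples, and the statistic would be compared against the t distribution with `na + nb - 2` degrees of freedom.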
For linearity analysis, we applied the Barabási-Albert (B-A) model to posting behaviour with a parameter of linearity. With modified B-A models, we could assess the linearity of each variable of preferential attachment. Consistent with the B-A, Price's and non-linear growth models [12], we calculated likelihoods and added a parameter $\alpha$ to adjust the probability as follows.
where $P_\alpha$ denotes the probability under the non-linear transformation, and $x$ denotes the relative attributes of the messages $M$, which function as the weights in the sum.
The rationale for constructing equation 4 is as follows. Firstly, the traditional analysis of degree distribution [13] may not be precise, because the small number of messages on a local page lowers statistical power, and the distributions of local pages do not follow an identical event, i.e., their partition functions are different. Hence, we directly compare the likelihoods of different linearity parameters. Secondly, the additive smoothing term in the non-linear transformation gives $\alpha$ the ability to indicate non-linearity. A previous study also assigned a non-zero attachment value even to nodes of zero degree [9]. As for dividing by the variance $\mathrm{Var}(x)$, it rescales the calculation without altering the original likelihood at $\alpha = 1$. In other words, the division of attributes does not affect the value of the likelihood.
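The likelihood comparison can be illustrated with a generic non-linear preferential attachment form. The paper's equation 4 is not reproduced here; the sketch assumes that each message on a local page is chosen with probability proportional to (1 + x) raised to the linearity parameter (called `alpha` below), where x is a raw non-negative attribute such as responses number, and it omits the variance scaling. Function names and data are hypothetical.

```python
from math import log


def log_likelihood(pairs, alpha):
    """Sum of log-probabilities of the observed responses under exponent alpha.

    ASSUMED model: P(m | page) = (1 + x_m)^alpha / sum_j (1 + x_j)^alpha,
    where the sum runs over the messages on the same local page.
    """
    total = 0.0
    for ambient, responded in pairs:
        normaliser = sum((1.0 + x) ** alpha for x in ambient)
        total += alpha * log(1.0 + responded) - log(normaliser)
    return total


def best_alpha(pairs, grid):
    """Grid value of alpha with the highest log-likelihood."""
    return max(grid, key=lambda a: log_likelihood(pairs, a))


# Each pair is (ambient attribute values, attribute of the responded message);
# the responded value must be one of the ambient values.
pairs = [([0, 2], 2), ([0, 1, 2], 2), ([1, 5, 3], 5)]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
alpha_hat = best_alpha(pairs, grid)
```

An estimate below 1 would indicate sub-linearity and one above 1 super-linearity. Because each local page has its own normaliser, the likelihoods are comparable across values of `alpha` even when pages differ in size, which matches the motivation for comparing likelihoods directly instead of fitting a degree distribution.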

Results
This section explains the results in three parts: 1) the distribution of relative attributes as a non-parametric analysis; 2) statistical tests to validate the significance of LPA; 3) qualitative analysis to explore message types along with different preferences. From the results, we conclude that most hypotheses are supported across the three parts. Figure 2 presents the distribution of the relative attributes $r(M, m)$ corresponding to equation 2. Each sub-figure measures a specific attribute on a discussion forum. In detail, each row denotes a dataset, and from left to right, the columns denote the number of comments, length of comment, number of answers and length of answer respectively. Most attributes are positively preferred, as Table 2 indicates, i.e., the proportion of $r > 0$ is greater than that of $r < 0$. The one exception, i.e., the responses number of EDX answers, shows that community citizenship has a primary effect over other factors.

Distributions of the Variables
However, the exception could be an intrinsic bias, because messages published early are encountered more often, which is similar to the sampling bias of a Poisson process. Table 3 shows the t-test results of the relative attributes. For a specific variable, the t-test was performed on the two distributions of random and real responding behaviours. Similar to Table 2, the t-test showed many significant LPA levels and excluded the effect of the Poisson distribution on the attributes. The preferences in Table 2 conform to the results in Table 3, such as the negative preference for the responses number of answers on the EDX platform.

T-test and Likelihood
The negative t-values of degree confirmed hypothesis H3 for part of the responses numbers, i.e., community citizenship improved knowledge exchange through feedback. However, the responses number of answers on Piazza had a significant positive t-value. This positive value could be ascribed to search behaviour or email subscription through which the participant answered a question.
All significant positive t-values support hypothesis H1, i.e., participants appreciate the knowledge effort of other participants. Hence, message length significantly affected the posting action of responding to a message. Figure 3 plots the likelihood curves given by equation 4. In the figure, the x-axis denotes the parameter $\alpha$, and the y-axis denotes the likelihood of the various models. Within the plotted range of $\alpha$, the only local optimum shown is for the degree of comments. In contrast, message length has strong super-linearity, which means that longer messages are appreciated by participants. However, we should note that messages can be compressed to quite different degrees. For answer degree, EDX expressed sub-linearity, which is consistent with the non-significance of its t-test results. Piazza expressed super-linearity. Together with its positive t-value, we conclude that several messages attracted a large number of responses for specific reasons.

Qualitative Analysis
To explore the specific reasons, this section analyses messages that were responded to with a very high preferential level, i.e., those located at the extremes in Figure 2. Even though such messages are relatively few, their patterns and purposes can still reflect the social-cognitive theories used to construct our hypotheses. To acquire comparable message types, we sampled EDX answers with a high preferential level of responses number and plenty of messages in the ambient context. Here, plenty means that the sum of $\mathrm{deg}$ is very large. Given these criteria, Table 4 lists the selected messages with four columns: degree of the selected message (dcv), mean $\mathrm{deg}$ (dpm), question content and answer content.
In Table 4, all messages are relatively urgent in different ways. In detail, messages about technical issues expressed a lower preferential level, and emotion-related or content-related messages expressed a higher preferential level. For example, approving a message, as in "+1" or "+2", implies a high level of psychological factors, and phrases such as "Strangely enough", "Thanks" and "marked as incorrect" imply emotional expression. Messages with these properties achieve a particular preferential level.

Table 4. Selected messages (columns: dcv, dpm, question content, answer content).
Question: Please take a look. Have an answer in slightly different notation, but it is marked as incorrect. Answer: used q(alpha/2)
Question: Please take a look. Have an answer in slightly different notation, but it is marked as incorrect. Answer: The delta method does not apply here, when 'p = 1/2', we have a 0 in a denominator. Can help me...; 2.11
Question: Please take a look. Have an answer in slightly different notation, but it is marked as incorrect. Answer: I used q(0.025) too! Thanks
Question: Please take a look. Have an answer in slightly different notation, but it is marked as incorrect. Answer: same

Theoretically, the samples conform to the studies of the behaviour patterns of self-identity, citizenship and emotional contagion in section 2. For a message, its first response may favour self-identity or cognitive achievement, and the following responses may express emotional traits. That is, the responses at the beginning could be reinforced with instructions [8], and the later ones may favour approval or emotional expression, such as confirmation or a summarizing process [3,1]. In the middle range, interactions related to knowledge exchange are the essential reason for sustained engagement [11]. In other words, interactive participants will continuously engage with the course and then post more messages through self-nourished interactivity.

Discussion
In this paper, we proposed a local attachment model to discover the novel patterns of posting behaviour in online course forums. Through t-tests and likelihood ratios, we validated hypotheses about the patterns on two collected datasets. The experimental results confirmed most of the hypotheses and the specificity of messages. Given the hypotheses and results, we discuss the concepts of local and preferential as follows to clarify our model.

Local Page and Comparative Analysis
The local page is an ambient context that independently affects the posting behaviour of a participant. As data volume increases, it becomes necessary for participants to concentrate on the local page when posting a message. On the other hand, the unified platform organization facilitates a network model with a local mechanism. That is, we can find out which ambient context has affected the posting behaviour of participants. Given this necessity and sufficiency, we can model posting behaviour with local preferential attachment.
When posting a message, participants compare the attributes of the messages on the local page. The final posting action is affected by this comparison and by the traits of the poster. In other words, the attributes of the final post reflect the social-cognitive pattern of the posting behaviour. In addition, the comparison may not happen in some cases, since participants may post a message instantly or arrive through search. This posting strategy generates the specific messages and preferential levels seen in our qualitative analysis.

Mechanism of the LPA
In networking behaviours, new connections are primarily affected by preference or purpose, e.g., the purpose of self-presentation may drive a connection to core nodes. When posting a message, the purpose of the participant likewise determines the posting behaviour. Since the purpose of a participant cannot be observed directly, we modelled the LPA to indicate the purpose. Conversely, our results validated that posting behaviour is affected by compound factors, such as self-identity and community citizenship.
These factors assimilate participants toward the conformity and criteria of the course platform. To achieve efficient knowledge exchange, participants follow the guidance, honour code and citizenship of the platform. Those factors shape the homogeneity of participants, which is the foundation of our hypotheses. On the platforms, the categorization and visualization of messages also affect the activities of participants [1,2]. This is another way in which participants are assimilated. Having invested a considerable amount of cognitive energy, participants care about those efforts. Hence, in the process of responding, they compare and act reasonably. In detail, a learner may passively mimic their peers (irrational herding) or engage in active observational learning according to their needs (rational herding) [23].

Research Comparison
This study shares several common points with previous studies. Firstly, there is a common purpose of measuring interactive behaviour and fulfilling the requirements of online education analysis [11]. Secondly, statistical tests, network models and content analysis are applied to online education analysis, such as ANOVA [2], likelihood ratio [14] and SIENA [24]. Thirdly, we examined similar objects and factors to model online education, such as Piazza [1], AOD forums [3] and Facebook Groups [6]. Most of those analyses are performed on the units of message, thread and participant. We drew on classical and specific factors that are significant on education platforms, such as community citizenship [4], cognitive engagement [5] and psychological needs [7].
This study also extends traditional methods to discover the patterns of posting behaviour with respect to forum organization. Firstly, previous studies focused on single aspects of posting behaviour, such as message types, response times and growth patterns [1,3]. Our study discovered compound factors that affect posting behaviour by analysing preferential attachments conditioned on message attributes. Secondly, we applied LPA [20] with respect to forum organization and the factual environment of posting behaviour, which differs from the network modelling of reading behaviour or community size [2,24]. In addition, we combined qualitative and quantitative analyses to validate the patterns of posting behaviour.

Conclusion
Through statistical tests on two datasets, we conclude our study as follows. Firstly, local mechanisms can be applied to network analysis on MOOC discussion forums. That is, we discovered significant local attachments through the LPA models. Secondly, posting behaviour is affected by compound factors, which include self-identity, knowledge exchange and community citizenship. Thirdly, participants preferred messages of specific types, such as technical issues or emotional expressions.
This study validated that forum organization is useful for modelling participant behaviour, because the organization determines the factual environment of posting behaviour. Therefore, we could improve forum organization and the online learning environment with the following strategies. Firstly, we should add guidance on the format of questions and answers to make them well articulated. Secondly, lecturers should address emotional and technical issues with quick responses. Otherwise, those threads could grow quickly and attract the attention of participants. Thirdly, we can mark the status and importance of posts to evoke the community citizenship and posting behaviour of participants.