A Multimodal Discourse Analysis of the National Publicity Film of China

Since the arrival of the information era, people have come to realize that meaning no longer resides only in language or words, but is also conveyed by non-verbal modalities such as image, sound, color, and gesture. These non-verbal means of communication are increasingly significant and are widely used in public media communication, especially in publicity films and short videos. Drawing mainly on Kress & Van Leeuwen's Grammar of Visual Design and Zhang Delu's Synthetic Theoretical Framework of Multimodal Discourse Analysis, this research conducts a multimodal discourse analysis of the National Publicity Film of China, China Steps Into a New Era. After analyzing the transcription of the film at both the macro and micro levels, the thesis concludes that the publicity film mainly involves two kinds of modalities: visual and auditory. The visual modalities include light, image, color, body movement and facial expression; the auditory modalities comprise language, music and sound. These modalities are not simply juxtaposed or superimposed; they interact with, blend with and supplement one another to evoke emotional resonance. In this way, the communicative purpose of the multimodal discourse is realized.


Introduction
In the information era, meaning is conveyed not only in language but in all forms of semiotic resources, such as image, color and sound. These non-verbal means of communication are increasingly significant and are widely used in public media communication, especially in publicity films and short videos. National image is regarded as a core part of presenting a country's comprehensive national strength and influence as well as its "soft power"; a national publicity film is therefore deemed a crucial means of building a favorable national image and of cultural export. Forty years of economic reform have brought China an economic boom and raised its international influence to a new level, so China's national image attracts much attention. Several studies have already applied multimodal discourse analysis (MDA) to different versions of the Publicity Film of China. Yang Ying [1] carried out a synergy analysis of China's National Image Publicity Film, shot in 2010; Zhang Dongmei [2] studied interactive meaning in the National Publicity Film of China, shot in 2011. Yang and Zhang thus chose different versions of the film and different methods and angles of MDA. This study carries out a multimodal discourse analysis of the latest National Publicity Film of China, China Steps Into a New Era, in order to identify the synergy among the different modalities or semiotic resources in it and to provide a reference for makers of dynamic multimodal discourse.

Theoretical Basis
Multimodal Discourse Analysis deals with various kinds of meaning units (language, image, sound, body language, and spatial arrangement) and focuses on how they work together to realize a communicative function. Scholars around the world have devoted much time and energy to building theories of multimodal discourse analysis. So far, the most influential contributors are Kress & Van Leeuwen [3]. Building on Halliday's three metafunctions [4], they proposed three meanings of images. Their contribution lies in applying Halliday's Systemic Functional Grammar to the visual layer and proposing the Grammar of Visual Design, which provides a theoretical foundation for analyzing and understanding the meaning of images. Other researchers such as O'Toole [5] brought Halliday's theory to bear on the study of paintings, sculpture and architecture, setting up a reference framework for further study of MDA in the 1980s and 1990s. In the 21st century, with the rapid development of information technology, and especially of multimedia communication, scholars began to study MDA in advertising, film, cartoons, teaching, new media and other fields. Baldry & Thibault [6] made an exploratory study of dynamic video discourse. To facilitate discourse analysis, they examined approaches to the transcription and annotation of film and television texts, dividing video discourse into two levels: display and depiction. Tan [7] presented a comprehensive systemic functional framework for the analysis of dynamic video discourse, attempting to clarify how a dynamic video discourse expresses its meaning potential through intra-semiotic and inter-semiotic interaction. Through the analysis of a teacher-recruitment advertisement, Lim & O'Halloran [8] examined the validity of the visual semantic layer in studying inter-frame relations in dynamic video discourse.
It cannot be denied that multimodal analysis of video discourse remains a challenging task, owing to the technical processes involved (the collection, compositing and playback of audio and video), the dynamic nature of video, and the complexity of the analysis. Since there is as yet no generally accepted theoretical framework for MDA of video discourse, this study combines Kress & Van Leeuwen's Grammar of Visual Design and Zhang Delu's Synthetic Theoretical Framework of MDA as its theoretical basis. As for the research method, it follows Baldry & Thibault's approach of multimodal transcription and text analysis: the film is first transcribed into image and text discourse, and a synergy analysis is then carried out to determine how these semiotic resources work together to accomplish the film's publicity purpose.

Baldry & Thibault's Approach of Multimodal Transcription and Text Analysis
Baldry & Thibault [6] set up a research pattern for MDA, which is usually conducted as follows: transcribe the video discourse into images and text, arrange the transcription in the sequence of the video, and then analyze it with tools from Systemic Functional Linguistics, such as register theory and cohesion theory, together with Kress & Van Leeuwen's Grammar of Visual Design. In this mode of analysis, video discourse is regarded in essence as a closely joined continuum: a video is created by a series of pictures or images played at a fixed frame rate and accompanied by music and sound. According to Baldry & Thibault, transcription is itself an act of analysis as well as a record of the analysis. They stratify video discourse into two levels: display and depiction. A visual image is, by nature, a delimited optic array that some kind of electronic equipment projects onto the screen. The display level is therefore made up of visual resources such as line, spot, light, shadow and color, along with the variations of these resources; the depiction level consists of a succession of visual scenes involving movement, event, person and object.
Baldry & Thibault's approach to transcription and analysis can be divided into macro-analysis and micro-analysis. Transcription for macro-analysis aims to access the fundamental structure of the video discourse and to describe the meaning-making process in terms of the connections between the sub-units that build up the dynamic multimodal discourse. These sub-units include cluster, phase and transitivity frame [6]. "Macro-analysis and transcription of the video discourse reveal the relation between sub-units and the three metafunctions" [9]. Transcription for micro-analysis, by contrast, elaborates all the semiotic resources involved in the meaning-making process.

Grammar of Visual Design
Kress & Van Leeuwen's theory serves as the fundamental theory in the macro-analysis process. By analogy with the three metafunctions in Systemic Functional Linguistics, Kress & Van Leeuwen [3] put forward three meanings of images: representational meaning, interactive meaning and compositional meaning.
Representational meaning is the counterpart of the ideational function in Systemic Functional Linguistics. It can be divided into narrative and conceptual representation. The former comprises three processes: action, speech (utterance) and mental processes; the latter can be classified as classificational, analytical and symbolic. An element in a given image that forms an oblique line is called a vector, and vectors are the sign of a narrative image. In narrative representation, the actor is the participant from which the vector emanates, or the actor may itself form the vector. Analyzing vectors is therefore a feasible way to explore representational meaning in the film, in which narrative techniques are frequently used.
Interactive meaning is made up of three sub-systems: contact, social distance and attitude. Contact refers to the effect that a represented figure has on the audience, that is, how the figure or image prompts the audience to respond through its visual message. Social distance indicates the distance between the audience and the performers, which reveals whether their relationship is close or distant. The attitude system comprises two dimensions, horizontal and vertical. The horizontal dimension measures the audience's degree of involvement, while the vertical dimension shows the power relation between the audience and the performers.
Corresponding to the textual metafunction in Systemic Functional Linguistics, Kress & Van Leeuwen proposed compositional meaning and analyzed three resources through which it is realized: the value of message (information value), salience, and scale division (framing). The value of message refers to the different values that semiotic resources take on according to their placement in the composition. Salience is the degree to which different resources attract the audience's attention, which is determined by factors such as brightness, color contrast, size, and position in the foreground or background. Scale division refers to whether the resources are separated from one another and whether that separation is distinct or vague.

Zhang Delu's Synthetic Theoretical Framework of Multimodal Discourse Analysis
Based on Halliday's Systemic Functional Linguistics, Zhang Delu [10] proposed a Synthetic Theoretical Framework for MDA, which is made up of five layers: cultural layer, contextual layer, meaning layer, pattern layer and media layer.
The cultural layer includes the ideology that is dominant in a culture and the potential styles or generic structures of discourse modes. In other words, every process of meaning-making takes place within a particular culture, with its mind-sets and culturally fixed preferences. The contextual layer refers to contextual features such as the field (scope) of discourse, the tenor of discourse, and the mode of discourse. The meaning layer covers the functions of discourse, which correspond to the three metafunctions: ideational, interpersonal and textual. In the pattern layer, the focus shifts to the different systems of meaning-making: the grammar and vocabulary of human language, and the form or grammar of the visual, auditory and tactile senses.
The media layer, a significant element in the meaning-making process, is the material form in which discourse exists in the world. It can be split into language and non-language systems. In the language system, voice and written words are the core elements, together with accompanying-language elements such as sound, tone, typeface and layout. These accompanying elements, as Zhang puts it, supplement and strengthen the meaning conveyed by language; they are as important as language itself in meaning-making and can even change the whole meaning. In China Steps Into a New Era, meaning is conveyed through the characters' voices and tone and is further shaped by the editing and rearrangement of the footage, so this thesis must examine the language system thoroughly. The non-language system is another important carrier of meaning and includes body language and non-body language. Body language includes body movement, facial expression, and so on. Non-body language is not discussed in this thesis because it concerns the environment and the meaning-expressing tools that a person uses, which have little significance in dynamic multimodal video.

Research Design
On 21 October 2017, the National Publicity Film of China, China Steps Into a New Era, was released on the People's Daily app. It spread across the major media and went viral in social media "circles of friends" shortly after its launch. The micro-video marks a milestone: China is no longer a country striving merely for food and clothing, but one that has grown rich and strong and now embraces the bright prospect of rejuvenation. The new era is reflected mainly in three respects: socialism with Chinese characteristics has crossed the threshold into a new era; China faces a new principal contradiction, which must be resolved to meet the people's need for a better life; and China has set new centenary goals of building a moderately prosperous society in all respects and fully building a modern socialist country. In this new era, China stands closer than ever to the center of the world stage, and the Belt and Road Initiative links China with most countries in the world and gives it a pivotal position. Against this background, the video China Steps Into a New Era serves as a valuable resource for MDA research and, at the same time, presents to the world a new image of what China is now and of the responsibilities of modern China.

Research Questions
The objective is to study the synergy of different modalities, such as speech, sound and image, in the film and to determine how these modalities work together to construct communicative meaning. Accordingly, the research questions of this thesis are as follows: (1) How many kinds of modalities are involved in the film as a multimodal discourse? (2) How do these modalities cooperate and coordinate with one another to create an integrated meaning and embody the three metafunctions of image?

Transcription and Annotation of the Film
As a publicity film, the National Publicity Film of China takes "China steps into a new era" as its theme. To embody this theme, the film chooses seven characters of different ages and identities. By narrating their own Chinese dreams, the seven characters show in detail how China is stepping into a new era in all walks of life. The film ends with a quotation from President Xi Jinping's report to the 19th National Congress of the Communist Party of China. The whole film lasts three minutes; at 24 frames per second, it contains 4320 frames in total (180 s × 24). It is neither necessary nor feasible to analyze every frame. Baldry & Thibault [6] also point out that transcription must be selective and should be confined to those visual (or other) features that are pertinent to the analysis, so as to avoid endlessly detailed description and aimless transcription. Because of limited space and for convenience of analysis, this study presents only the key frames and the integration of the modalities in them. In the transcription, it should be borne in mind that the seven characters play a significant part in meaning construction, because they carry the social semiotic function that makes up the theme of the film. Table 1 shows the basic information of the seven characters.
For the transcription at the macro level, this thesis divides the crucial parts of the film into display and depiction. According to Baldry & Thibault [6], display is made up of visual resources such as line, spot, light, shadow and color, and depiction is a succession of visual scenes involving movement, event, person and object. The sub-units include cluster, phase, and transitivity frame. This thesis divides the dynamic discourse into eight clusters: each of the seven characters forms one cluster, and the excerpt from Xi Jinping's speech at the end forms the last. A cluster comprises several phases that are edited and arranged by film technique. The transitivity frame refers to the last frame of each cluster. In the transcription, these semiotic resources are decoded into combinations of pictures and text. Table 2 shows the transcription of cluster 1; the remaining clusters are transcribed in the same way, and several typical ones are selected for further analysis.
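The frame count and the cluster and phase hierarchy described in this section can be sketched in code. The following is only a minimal illustration, not the transcription scheme actually used: the 24 frames per second, the three-minute length and the eight clusters come from the text, while the class names `Phase` and `Cluster`, the functions `total_frames` and `build_clusters`, and the placeholder labels "character 1" to "character 7" are hypothetical. The example phase is filled in from the description of phase 4 in the Representational Meaning section, on the assumption that it belongs to cluster 1.

```python
from dataclasses import dataclass, field

FPS = 24                    # frame rate stated in the text
DURATION_SECONDS = 3 * 60   # the film lasts three minutes


def total_frames(duration_seconds: int, fps: int) -> int:
    """Number of frames in a clip of the given length."""
    return duration_seconds * fps


@dataclass
class Phase:
    index: int
    display: list      # visual resources: line, spot, light, shadow, color
    depiction: list    # visual scene: movement, event, person, object


@dataclass
class Cluster:
    index: int
    label: str         # placeholder label, not the character's real name
    phases: list = field(default_factory=list)


def build_clusters() -> list:
    """Seven character clusters plus the closing speech, eight in total."""
    clusters = [Cluster(i, f"character {i}") for i in range(1, 8)]
    clusters.append(Cluster(8, "Xi Jinping's speech"))
    return clusters


clusters = build_clusters()
# Phase 4, assumed here to belong to cluster 1 (the girl).
clusters[0].phases.append(
    Phase(4,
          display=["sun", "lake", "Chinese title (white)", "English title (red)"],
          depiction=["girl faces the rising sun", "girl states her dream"])
)

print(total_frames(DURATION_SECONDS, FPS))  # 4320
print(len(clusters))                        # 8
```

Keeping display and depiction as separate fields of each phase mirrors the two levels of Baldry & Thibault's stratification, so a selective transcription only needs to fill in the phases whose key frames are actually analyzed.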

The Metafunctions of Image
According to Baldry & Thibault, transcription should be selective, and so should the analysis. In other words, the material analyzed should be typical and selective enough to serve the research objective. Therefore, to preserve typicality and simplify the analysis, this part draws on only several clusters and phases of the transcription to elucidate the metafunctions of images rather than covering all the clusters.

Representational Meaning
According to the Grammar of Visual Design, the representational meaning of video discourse can be divided into narrative and conceptual representation, which are distinguished by the presence of a vector. A narrative image presents the audience with a movement or an event in progress. In the transcription above, phase 1 in Table 2 can be defined as narrative discourse because the girl's gaze forms a vector, whose direction is the direction of her gaze. What the girl is gazing at is evidently the receiver of the video discourse, the audience. The film begins with a close-up of the girl's face, which strengthens the representational meaning of the vector and leads the audience to pay more attention to the vector and to the dynamic discourse that follows. What the vector points at indicates not only what the image represents but also the character's mental activity. Take phase 4 in Table 2 as an example. The display is made up of the sun, a lake, the Chinese title "我的中国梦" in white and, beneath it, the English title "my Chinese dream" in red. The depiction is the girl facing the rising sun while saying, "I want to go to Beijing when I grow up, to visit the Tian'anmen Square", which is exactly what she dreams of. The vector (the girl's gaze as she utters her dream) points at the sun, which stands for hope and dreams. This is how representational meaning is realized through the combination of the available semiotic resources to construct communicative meaning.
Generally speaking, understanding the representational meaning of an image does not require much cognitive effort; viewers can extract the meaning by relying on their subjective perception. This kind of meaning is the viewer's preliminary acquaintance with the image and is characteristically non-systematic and subjective. At the same time, this characteristic narrows the interpersonal distance between the audience and the video discourse, laying an emotional foundation for interaction between the audience and the film.

Interactive Meaning
Interactive meaning is reflected in three dimensions: contact, social distance and attitude. In the film, contact refers to the reaction evoked in, or the visual information received by, the audience through eye contact between the represented participants and the viewers. Contact is divided into "offer" and "demand", depending on whether there is direct contact between the participants and the audience. The key frame of phase 1, cluster 2 in Table 3 serves well as an example of "demand" contact. In this phase, the character Xie Yuanli, a welding machine operator, is at work welding. Sparks fly in all directions, and Xie's face can barely be seen. What we can decode from this phase is that Xie is devoting his full attention to the welding, which conveys to the audience the visual message that workers like Xie work carefully and conscientiously, dreaming of devoting every effort to producing "high-speed train featuring faster, more stable and safer so as to connect the world and benefit mankind". This visual message in turn provides interactive meaning for the audience in the meaning-making process.
Social distance depends on the way the image is framed. Generally speaking, a close-up stands for close social distance, while a long shot signifies far social distance. Kress & Van Leeuwen [3] discussed four categories of framing and distance (see Table 5). According to the table, the key frame of phase 5, cluster 1 gives the audience a sense of warmth and vivacity, because it shows the girl's head and shoulders and presents her as a lively figure. A close social distance can also be seen in phase 2, cluster 4 (see Table 3). In this phase, Ji Kuisheng is standing close behind his staff, smiling. This visual information conveys to the audience the interactive meaning that Mr. Ji, the boss of the company, gets along well with his employees. In this way, the film shortens the distance between the audience and the performers and thus succeeds in conveying interactive meaning in the meaning-making process. The technique is applied to every character in each cluster, especially in the part that introduces each character, so interactive meaning runs through the whole film. Attitude is determined by the angle from which the image is shot. In the Grammar of Visual Design, the angle of view reflects the subjective or objective attitude of the represented participants. Subjective attitude concerns a person, while objective attitude concerns an object. In the horizontal dimension, a frontal angle invites strong involvement, as if the viewer had experienced the scene personally. In the vertical dimension, a low angle, an eye-level angle and a high angle convey different meanings. In the film under analysis, most shots adopt an eye-level angle, which signals to the audience that even those great contributors are ordinary people like us. However, the guard of honor is shot from a low angle, so as to create a towering figure in front of the audience. The guard of honor generally represents the image of a country and thus provides a good platform for constructing the national image.

Compositional Meaning
Compositional meaning is reflected in the value of message, salience and scale division. Each element that appears in the film can be understood as a semiotic resource, and every resource closely concerned with meaning construction, including lines, subtitles and so on, is carefully placed in the film. As for salience, the film foregrounds the seven character clusters, making them stand out against the foreground and background elements. Scale division concerns whether the semiotic resources placed in the picture are separated distinctly or vaguely. In terms of compositional meaning, there are two kinds of arrangement. The first focuses on character portrayal. In this case, the figure itself is the most important element, while the other elements are subordinate to the figure or serve as a foil to it. To highlight the character, the figure is usually placed in the middle of the picture, with the background elements blurred. The audience can thus grasp the major information at once. For example, phase 3 in Table 2 and phases 1 and 3 in Table 3 are very effective in character portrayal. The second arrangement lays stress on the surroundings of the figure. In this case, the background is no longer indistinct but majestic, unified and bright. For example, phase 2 in Table 3, where Xie Yuanli is walking between two rows of trains, conveys a rigorous attitude toward train manufacturing. Compared with representational and interactive meaning, compositional meaning does not directly convey the communicative meaning of the multimodal discourse, but it is indispensable for decoding the representational and interactive meanings and thus for accessing the whole meaning of the discourse.

Media System
The analysis in the previous part mostly concerns color and image; however, other important modalities are involved, such as voice, words, sound, tone, body movement and facial expression. According to Zhang Delu [10], these modalities fall within the language system and the non-language system.

Language System
According to Zhang and the presentation of the film, the voices of the characters and other sounds from the surroundings can be classified as the pure language sub-system. The voices appear as voice-over or as live recording. They are delivered slowly, like storytelling, which holds the audience's attention. The other sub-system is accompanying language, which includes background music and other sounds, such as the sound of welding (phase 1, cluster 2 in Table 3) and the sound of a train running past (phase 5, cluster 2 in Table 3). It is worth mentioning that background music runs throughout the film, and that the background music, the voices and the other sounds supplement and strengthen one another. That is, the background music can be heard clearly, but when a voice comes in, the music recedes so as to highlight the meaning of the voice. The background music gradually builds as the film approaches its climax. Take the cluster of Xi Jinping's speech as an example. Xi's closing words form the climax of the publicity film: "...实现人民对美好生活的向往,继续奋斗！" ("...realize the people's aspiration for a better life, and keep striving!"). When he reaches the words "继续奋斗" ("keep striving"), Xi speaks louder, and the background music ends at almost the same time. It can be concluded from this analysis that, in the language system, voice (volume and tone) and background music stand in a juxtapositional or supplementary relation.

Non-language System
As for the non-language system, the study focuses only on body language, especially body movement and facial expression. Body movement appears in the film in forms such as a girl stretching out her arms and spinning (phase 2, Table 2), a man walking (phase 2, Table 3), and the parade step of the guard of honor. Most of these movements are edited and shown in slow motion, which adds aesthetic value to the whole discourse. Facial expression is a crucial element in conveying meaning. The theme of the film, "China steps into a new era", is realized through the achievement of seven ordinary people's "Chinese dreams". Therefore, when they narrate their "Chinese dreams", smiles appear on their faces.

Conclusion
This study has examined the National Publicity Film of China, China Steps Into a New Era, with a novel method. The findings reveal that the short film centers on "My Chinese dream" and makes use of abundant modalities in the meaning-making process, which fall mainly into two categories: visual and auditory. The visual modalities are light, image, color, body movement and facial expression. These modalities create different meanings in the film. For example, sunlight stands for hope or dreams, welding sparks for a rigorous working attitude, and smiles for the longing for a better life. The auditory modalities are voice, sounds and background music. The auditory modalities and the images supplement each other so that messages of high value can be highlighted.