Visual and Verbal Means to Attract our Clicks: Multimodality in YouTube Thumbnails

ABSTRACT Pictures, colors, tones, and motions have all been identified as modalities that contribute to the meaning-making process. A multimodal message arises when two or more modes work together to give meaning to the overall discourse. This article describes how visual and verbal signs work together to construct meaning in video thumbnails. The study uses a descriptive research method. The data are thumbnails of the most-viewed videos of the Close the Door podcast. They were analyzed by employing Kress and van Leeuwen's Visual Grammar and Halliday's Functional Grammar, especially the ideational meaning of the clause, transitivity. These frameworks explain the relationships between the images and the texts and elucidate the functions of images in meaning interpretation. Based on the analysis, all the thumbnails convey a clear message about the video; they are unambiguous and in line with the content of the video. Across the four data, four process types are used (relational, mental, material, and verbal); three titles are declarative sentences and one is imperative. In terms of visual multimodal discourse analysis (MDA), the thumbnails contain Lead, Display, and Emblem elements as the relationships between the images and the functions of images in meaning interpretation.

Multimodal texts usually aim to achieve communicative functions by combining verbal modes with other semiotic modes. According to van Leeuwen [10], multimodality deals with the communication and representation of thoughts in modes other than language. A multimodal text needs to make the audience understand an idea or thought without a thorough explanation. Kress and van Leeuwen [9] investigate the multimodal perspective in detail in order to reach the underlying meaning. It deals with movement, image, sound, speech, and music. These are all bases of interpersonal meaning, the concept conveyed and delivered by the meaning resources. The most investigated aspects are visual and verbal texts [9]. Multimodality needs to deal with ideational functions, which express the speaker's thoughts through language, and interpersonal functions, which use not only text but also visuals. Both have a role relationship within the unitary text [2]. Based on this theory, multimodality can be explained through its various objects and their relations, which together represent the intended meaning as an intact unit.
There have been several studies dealing with this topic. Ahour and Zaferani [1] focus on images used in ELT books. They applied visual grammar and interpreted the images in terms of gender representation; both females and males are represented in the books. Pratiwy and Wulan [17] analyse advertisements by applying a multimodal approach, combining several theories to produce a fairly complete analysis. Similarly, Rahayu, Lukmana, and Riesky [18] examine how an advertisement can communicate through spoken or written language as well as visuals, using visual grammar and transitivity. Yusuff [22] studies advertisements that use persuasive techniques, examining visual images and the functional implications of language for persuasion; the data are selected printed advertisements for consumables in Nigeria. Mahmudah [15] deals with a health campaign about Covid-19 infection in a comic strip on Instagram, examining its narrative and textual aspects together with its visual grammar.
The gap between the related studies and my analysis is that, although the theories are likewise Kress and van Leeuwen's visual grammar and Halliday's SFG, I am analyzing thumbnails of YouTube videos as the data. Thumbnails have not been analyzed much, and they are interesting because YouTube is one of the most used social media platforms, with billions of users. People can create videos and earn money from them. The present study takes video thumbnails as its object in order to see how they function to market internet content on social media. As a preview of the video, thumbnails are designed to persuade netizens to click and watch for a certain length of time, which in turn turns the public views into considerable royalties through monetization. Therefore, making a good and catchy advertisement, in this case the thumbnail, can make them want to watch the video.
Volume 04, Number 01, May 2022, p. 54-62
This study is important because YouTube has had a strong impact on our generation. Many people watch YouTube, and through it we can all learn new things. YouTube has become an important part of life, both for entertainment and for conveying thoughts to an audience. This is the reason why creators make videos.
This study investigates how visual images and textual aspects function in the production of discourse meaning to engage the audience. It also focuses on how visual images converge with verbal texts in multimodal discourse to reflect social reality and culture. Therefore, using Halliday's Systemic-Functional Grammar [6] and Kress and van Leeuwen's Visual Grammar [9] as the analytical tools, this article explores the use of visual signs and verbal signs in video thumbnails.
The term multimodal arises in the analysis of a multimodal text. Multimodality provides the tools and techniques to analyze texts that draw on more than one mode of discourse. Kress and van Leeuwen [9] state that multimodality refers to the way people communicate using different modes simultaneously. It is a medium of communication that accommodates how a person's thoughts and aspirations are conveyed to the general public. Kress and van Leeuwen [9] add that modality markers guide the reader to the truth or factuality of the message. Thus, multimodality is understood as the use of more than one mode, not only visual but also verbal, when a person communicates [9].
Kress and van Leeuwen have been using Halliday's systemic-functional grammar to examine visual images. Halliday's three metafunctions are regarded as an important tool for assessing any human communication system. Although the three metafunctions were initially used to examine language, they are not confined to linguistic signs. Kress and van Leeuwen, for example, broaden their discourse research to incorporate visual images. "The visual, like all semiotic modes, has to serve various communicational (and representational) requirements in order to function as a comprehensive system of communication" [9]. They create Visual Grammar as a tool for further research. Kress and van Leeuwen's Reading Images: The Grammar of Visual Design (2006) [9] presents Visual Grammar, a clearly multimodal approach to visual communication that delivers a precise and systematic exposition of the grammar of visual design. They identify representational meaning, interactive meaning, and compositional meaning, corresponding to Halliday's ideational, interpersonal, and textual metafunctions.
According to Halliday's Systemic-Functional Theory, language is considered a social semiotic. He created systemic-functional grammar in the 1960s as part of a broader social semiotic approach that sees language as a socially based semiotic system. He holds that language has three metafunctions. In summary, the three metafunctions are principles by which semiotic resources both construct ideational meaning and fulfil social interactions. According to Halliday [4], these ideas are used to construct functions related to representation, and the process of experience realized through the transitivity system is related to the field of the text. According to the interpersonal metafunction, people exchange their sentiments, attitudes, and judgments through mood and modality when speaking with others; the tenor and interactivity of a text are important considerations. The textual metafunction, which is concerned with mode, focuses on how to construct a cohesive text through thematic organization and information structure and on "building continuity in time and space" [4]. To put it another way, the three metafunctions establish the groundwork for confirming the functionality of semiotic resources. The main linguistic elements reveal and carry the essence of the meaning of each metafunction. Therefore, linguistic analysis is important for analyzing a multimodal text and its visual metafunctions.
Structure, in general linguistic terms, relates to the internal process of compiling or forming language units. In Halliday's [5] framework, the systems of transitivity, taxis, mood, and theme realize the ideational, interpersonal, and textual functions. The ideational function consists of experiential and logical functions. The experiential function is realized by the transitivity system of the clause, and the logical function is realized in the clause-complex system, that is, the taxis system. The textual function is realized by the theme-rheme system, and the interpersonal function is realized by the mood system. In analysis, each function carries its own meaning: ideational, interpersonal, or textual. The experiential function operates at the clause level, where the clause represents human experience of two realities: the reality outside a person and the reality within.
In the transitivity system, the clause as a grammatical unit has three components: process, participant, and circumstance. Processes are broadly classified into six types: material, mental, relational, behavioral, verbal, and existential [6].
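The three clause components and the six process types can be written down as a small annotation record. The following is only a minimal sketch in Python and is not part of the study's method; the names `ProcessType`, `Clause`, and `describe` are hypothetical, and the example clause with its labels is invented for illustration rather than taken from the data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class ProcessType(Enum):
    """The six process types listed in the text [6]."""
    MATERIAL = "material"
    MENTAL = "mental"
    RELATIONAL = "relational"
    BEHAVIORAL = "behavioral"
    VERBAL = "verbal"
    EXISTENTIAL = "existential"


@dataclass
class Clause:
    """Hypothetical record for one transitivity-annotated clause."""
    text: str
    process: str                # wording that realizes the process
    process_type: ProcessType
    # participant role -> wording, e.g. "Actor" -> "The host"
    participants: Dict[str, str] = field(default_factory=dict)
    # circumstance kind -> wording, e.g. "Time" -> "yesterday"
    circumstances: Dict[str, str] = field(default_factory=dict)


def describe(clause: Clause) -> str:
    """Return a one-line summary of the process and its participants."""
    roles = ", ".join(f"{role}: {word}" for role, word in clause.participants.items())
    return f"{clause.process_type.value} process '{clause.process}' ({roles})"


# Invented example clause (an assumption for illustration, not study data).
example = Clause(
    text="The host opened the door yesterday",
    process="opened",
    process_type=ProcessType.MATERIAL,
    participants={"Actor": "The host", "Goal": "the door"},
    circumstances={"Time": "yesterday"},
)
print(describe(example))
```

Keeping participants and circumstances as role-to-wording mappings lets the same record hold any of the six process types, since each type brings its own participant roles.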

II. METHODOLOGY
This research adopted a descriptive qualitative method. According to Ospina [16], the descriptive method can be categorized simply as empirical research, meaning that it uses natural data with no modification by the researcher. This is in line with Dornyei [2], who characterizes it by the nature of its data collection and analysis: the data are collected through open-ended, non-numerical, and non-statistical methods and analyzed in a holistic, rich, and nuanced way. Descriptive methods, according to Sugiyono [19], mean "describing what happened at that time." The aim is to know and understand the whole situation when answering the research question.
This research is a literature study. Kartono (1998, as cited in Syahputra, 2017 [20]) states that a literature study is done by reading materials on the subject, sorting them, and analyzing them critically. It is also explained as a series of activities related to collecting library data, reading and taking notes, and processing research materials [19]. In this case, the materials are written and visual data, which are viewed through two major theories. This study dealt only with "texts" as the raw materials.
The data in the study are four thumbnails of videos taken from the #ClosetheDoor Podcast on its YouTube channel. They were chosen based on view counts: they are the four most-viewed videos, with 13 to 23 million views. Each thumbnail, as a single text, contains two major elements: image and text. They are therefore multimodal texts with two kinds of semiotic systems. Hence, two different theories were applied: the transitivity concepts of SFG [5] to discern the verbal constituents, and visual grammar [9] to uncover the visual ones. In this way, the study describes how these two kinds of signs complement each other to construct a single meaning or message as a means of attracting viewers.

III. RESULT AND DISCUSSION
The analysis focuses on two aspects that serve as indicators in analyzing the videos of Deddy Corbuzier's #ClosetheDoor Podcast. The first is the visual elements of the thumbnail as a whole, including the chosen font and its style. The second is the linguistic aspect, which covers the sentence structure of the title as well as what is stated in the video as a whole and why this makes the video interesting to watch.
The title of the first video is "Goodbye Laura… Netizen, Dengar Saya Kali Ini" ("Goodbye Laura… Netizen, listen to me this time"). This video is a solo podcast by Deddy Corbuzier following the news of the death of a famous celebgram (Instagram celebrity), Laura Anna, after two years of living with a spinal cord injury. It was made just after Laura came to Corbuzier's #ClosetheDoor Podcast to share her story of fighting this injury after the car accident. Although the video was not supposed to be released, it was released because Laura Anna passed away soon after, on December 15th, 2021. The thumbnail shows the host mourning over a picture of Laura Anna. The video's 13M viewers were likely drawn to click knowing that it was posted just after the death of Laura Anna.
This video contains a great deal of information that can raise the audience's awareness. The host rarely creates 'kekinian' (trend-driven) or clickbait content just for the sake of views by inviting guests or making videos based on what is popular at the moment. Instead, he creates educational videos from which his viewers can draw a life lesson. Hence, this video not only attracts many viewers but also matches the educational purpose and the point of the conversation in the podcast, which are discussed in the following paragraphs.
The use of the imperative in the sentence "Netizen, dengar saya kali ini" creates an order or command to follow. According to Halliday's [6] SFG, the sentence can be divided into its transitivity components. The sentence creates a sense of abruptness or urgency, making the audience feel pressured if they do not do what the imperative sentence tells them to do. An imperative sentence sounds as if the speaker is ordering the reader around; this leaves the audience no room for doubt or questioning, even though the sentence uses a polite tone [3].
This video contains a rhetorical remark, expressed in a sentence that marks the specificity of the genre by introducing the information at stake, namely the message about Laura Anna's tragedy, together with a summary of what the video is about. The sentence is to the point, so that the audience has no second-guessing or ambiguity before clicking the title. With no ambiguity and a straight-to-the-point message, the headline of the video anchors the viewers so that they directly grasp the background and the main topic. This is in accordance with Kress and van Leeuwen [9], who hold that a coherent text or sentence implies cohesion that links internal and external elements to convey the real intentions of the writer or speaker. Thus, it is important to raise the audience's knowledge and imagination in order to arouse curiosity before they click on the video.
The video has another 'thumbnail'-worthy title, 'Selamat Jalan Laura, Tuk Laura, Tuk Kita' ('Farewell Laura, for Laura, for us'), as the cover of the video. Again, in line with Kress and van Leeuwen [9], this sentence on the cover is a coherent sentence that links to Deddy's message and conveys his real intentions to the audience.
Not only is the title put together as an effective imperative sentence, but the video itself is also straightforward. He opens the video with a clip and a voice note from Laura Anna recorded before her passing. Throughout the video, he speaks in an efficient manner to make the audience understand what happened to her and learn the lesson. In conclusion, the thumbnail's color pattern, font, and font style, combined with the sentence structure of the title, make this video worthy of its many viewers.
This video is a podcast of Deddy Corbuzier with a famous celebgram, Laura Anna, who had lived with a spinal cord injury for two years. The video was made to share her story of fighting this injury after the car accident and how she feels towards the person who caused it, her ex-boyfriend. In the picture, the host holds Laura Anna to help her sit. The video's 23M viewers were likely drawn to click knowing that it was posted just after Laura Anna's lawsuit against Gaga Muhammad.
In the video, Laura explains how she came to have this injury and what actually happened at the time. She therefore brings a great deal of information that can raise the audience's awareness. Again, as mentioned earlier, the host rarely creates 'kekinian' (trend-driven) or clickbait content just for the sake of views. Inviting Laura Anna is based on what was current at the moment, and it was Laura's own wish to come. Hence, this video was created as an educational video from which viewers can draw a life lesson, and it matches the educational purpose and the point of the conversation in the podcast about Laura's condition.

The title of the video is "Saya Di Hancurkan Dia Fisik dan Mental!!" Laura Edelenyi ("I was broken by him physically and mentally!!", Laura Edelenyi). The declarative sentence in this title declares the main idea of what happened to Laura. According to Halliday's [6] SFG, 'was broken by' is a passive material process, and the sentence can be divided into its transitivity components. The passive voice emphasizes the effect on the victim of being broken by 'dia' ('him'), which here refers to Gaga Muhammad, Laura Anna's ex-boyfriend. Using a declarative sentence in this title makes the audience focus on the effect that Laura Anna has suffered.
As with the first datum, this video also contains a rhetorical remark, expressed in the sentence that introduces the information at stake together with a summary of what the video is about. The title is to the point, so that the audience has no second-guessing when clicking it. There is no ambiguity, and the straight-to-the-point message anchors the viewers so that they directly grasp what happened to Laura Anna as the background and the main topic of this video. According to Kress and van Leeuwen [9], there are three aspects in this thumbnail, namely:
-Lead: the main focus of this thumbnail is the text
-Emblem: the identity of this video is the room of #ClosetheDoor Podcast and the fonts
-Display: Deddy holding Laura
Deddy Corbuzier also has a 'thumbnail'-worthy title, 'Saya di Cacatkan,' which means 'I am disabled by,' as the cover of the video. According to Van Dijk's [21] theory, when the actor who performed the action is placed after the object, the writer wants to emphasize the object, in this case Laura as the victim, so as to minimize the negative effect on Laura and to reflect negatively on Gaga Muhammad's face.
He and Laura Anna talk about her condition in an efficient manner to make the audience understand what is happening. Therefore, the thumbnail's color pattern, font, and font style, combined with the sentence structure, make this video worthy of its many viewers.
This video is a podcast of him with Indonesia's Minister of Defense, Prabowo Subianto. In the video, Deddy and Prabowo Subianto bring a great deal of information that helps the audience understand the situation at that moment. In the picture, Prabowo Subianto shows an 'explaining' expression as he raises his fingers, and Deddy Corbuzier seems to enjoy the talk as he smiles. The video's 17M viewers were likely drawn to click because this was Corbuzier's first complete and exclusive video with Prabowo. This increases the likelihood of the audience clicking the video because it makes them think that Prabowo Subianto talks about the matter only with the host, Corbuzier.
The title of the video is "Habis semua!! Prabowo Perdana Bicara!! Exclusive," which can be translated into English as "All finished!! Prabowo speaks for the first time!! Exclusive." According to Halliday's [6] SFG, the sentence "All finished!! An exclusive talk of Prabowo" can be divided into its transitivity components. As in the previous datum, this title uses a declarative sentence to declare the main idea of the video. The chosen word, 'exclusive,' which according to Macmillan Dictionary [14] means "published or reported by only one newspaper, magazine, television station etc," signals that this video is his first complete and exclusive one with Prabowo. Combining the declarative sentence in this title with the word 'exclusive' increases the likelihood of the audience clicking the video because it makes them think that Prabowo Subianto talks about the matter only with the host, Corbuzier.

According to Kress and van Leeuwen [9], there are three aspects in this thumbnail, namely:
-Lead: the main focus of this thumbnail is the picture of Prabowo and Deddy Corbuzier
-Emblem: the identity of this video is the room of #ClosetheDoor Podcast and the fonts
-Display: it shows the characteristics of the lead, which is the text in the thumbnail
Deddy Corbuzier also has a 'thumbnail'-worthy title, 'Akhirnya Saya Bicara,' which means 'Finally I speak up,' as the cover of the video. According to Van Dijk's [21] theory, putting the actor as the subject means that the writer wants to emphasize what the subject is doing; in this case the subject is Prabowo Subianto. Therefore, given the color pattern, font, and font style combined with the sentence structure, this video is worthy of its many viewers.

Picture 4. Video Deddy & Mongol's Thumbnail
This video is a podcast of him with Mongol, who brings a great deal of information about Satanism that is new to the audience and helps them understand the situation at that moment. The video's 21M viewers most likely got goosebumps when reading the thumbnail. This makes the audience want to click on the video after reading the words, to find out what the story is about.
According to Kress and van Leeuwen [9], there are three aspects in this thumbnail, namely:
-Lead: the main focus of this thumbnail is the text, as it covers the picture
-Emblem: the identity of this video is the room of #ClosetheDoor Podcast and the fonts
-Display: it shows the characteristics of the lead, which is Mongol and Deddy Corbuzier
The title of the video is "Merinding Gue Denger Ini, Gokil‼ Serem Abis‼," which can be translated into English as "I get goosebumps hearing this, Cool!! So scary!!" This title also uses a declarative sentence to declare and emphasize the main idea or topic of the video.

Ritual seks gereja setan, ini sih serem ("Satanic church sex ritual, this is really scary")
The choice of the word 'goosebumps,' which means "a state of the skin caused by cold, fear, or excitement," conveys what this video is about and makes the audience want to click on the video after reading it. As mentioned above, this video also uses a declarative sentence to declare its main idea. A declarative sentence in the title makes it easier for the audience to understand what it means and what the story is about, and it therefore also increases the likelihood of the audience clicking the video. Again, regarding the subject 'I' or 'gue,' Van Dijk [21] holds that putting the actor as the subject helps to emphasize what he is doing; in this case the actor is the host. Therefore, in this title, the color pattern, font, and font style are combined with the sentence structure to create the sense of goosebumps that people like most, which makes the video worthy of its many viewers.

IV. CONCLUSION
This analysis therefore holds that, when making YouTube thumbnails, it is very important to determine the aspects that can attract people's attention to watch the video. The purpose of a thumbnail, of course, is to make the cover of a video attractive to viewers. The thumbnail also functions as a title, and thumbnails do make a difference, especially in the number of viewers and subscribers. In the data taken in this study, namely four videos from Deddy Corbuzier's podcast #ClosetheDoor on YouTube, the thumbnails managed to earn these videos a great deal of engagement by arousing the audience's interest before clicking on them, in line, of course, with the news that was the topic of public discussion at that time.
Based on the analysis using Halliday's SFG theory, four process types are used across the four data: relational, mental, material, and verbal. Three of the titles are declarative sentences and one is imperative. In all of them, the color pattern, font, and font style are combined with the sentence structure to create a sense of the video that people like, which makes it worthy of many viewers. Based on Kress and van Leeuwen's theory, all the thumbnails contain the aspects of Lead, Display, and Emblem as the relationships between the images and the functions of images in meaning interpretation.