Analyzing Narratives in Social Networks: Taking Turing to the Arts
Ebook, 884 pages, 6 hours


About this ebook

This book uses literature as a wrench to pry open social networks and to ask questions about them that differ from those asked previously. The book emphasizes the storytelling aspect of social networks and connects narrative to social networks by incorporating narrative, dynamic networks, and time. It thus builds a bridge between literature, digital humanities, and social networks. This book is a pioneering work that attempts to express social and philosophical constructs in mathematical terms.

The material used to test the algorithms consists of texts intended for performance, such as plays, film scripts, and radio plays; mathematical representations of these texts, or “literature networks”, are then used to analyze the social networks found in them. By using literature networks and their accompanying narratives, along with their supporting analyses, this book offers a novel approach to social network analysis.

Language: English
Publisher: Springer
Release date: Aug 28, 2021
ISBN: 9783030682996

    Book preview

    Analyzing Narratives in Social Networks - Zvi Lotker

    © Springer Nature Switzerland AG 2021

    Z. Lotker, Analyzing Narratives in Social Networks, https://doi.org/10.1007/978-3-030-68299-6_1

    1. Overview of the Book

    Zvi Lotker¹

    (1) Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel

    Zvi Lotker, Email: zvi.lotker@gmail.com

    1.1 Introduction

    Figures such as the Golem, Frankenstein, and HAL 9000 have fascinated the public for centuries. As robots and intelligent machines have advanced, they have invaded our private spaces. The need for intelligent machines to understand our personal and cultural narratives is therefore more pressing than ever. Using the structure of the text rather than its content, we present simple algorithms, most of them accessible to a wide range of readers, that reveal the rhythm and structure of dialogues in film, theater, television, and radio scripts. Most of these algorithms can be implemented in a spreadsheet such as Microsoft Excel. We open an epistemological window onto the dialogue between man and machine, using tools from digital humanities and social networks.

    The fields of digital humanities, social networks, artificial intelligence (AI), and complex systems are central to modern culture and technology. They permeate our lives in ways ranging from privacy to quality of life. By integrating ideas from classical fields such as physics and mathematics, these newer fields have developed rapidly. Literature and literary analysis, twin classical fields with vast and highly influential knowledge bases, have not previously been integrated into these younger, emerging fields. Our intention is to pave a new path for incorporating literature into complex systems and social networks.

    To test the soundness of the algorithms presented, we use dramas. The term drama is a catch-all for any performance-based text, including plays, film scripts, radio plays, and other performative media. This classification comes from the Ancient Greeks, who defined drama as any text in which the author is hidden from the audience and the characters speak for themselves [66].

    In great dramas, characters are independent entities who represent their own psychological states of mind, and provide paradigms for society to discuss and teach emotions such as love, hate, and envy. Canonical texts, including dramas, serve as educational tools and mental maps for society to navigate these complex feelings.

    As machines become more perceptive, they will edge their way into the realm of emotions. We can begin inviting machines to participate in the discussion of complex emotions and concepts by transforming texts into mathematical objects. To accomplish this, we use mathematical representations (literature networks) to represent dialogues in dramas in a way machines can understand. Once we have such a representation of the text, we can analyze dramas with mathematical tools. This is analogous to the invention of the pixel, which allowed machines to see and understand images. Typically, in literature networks, characters in dramas are represented by nodes, and edges represent social connections between these characters.


    Fig. 1.1

    Can machines read?

    1.2 Brief Notes on Notation and Definitions

    We assume elementary knowledge of graph theory, linear algebra, and probability (for a rough overview of these topics, see Appendices B and C).

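    In general, we work with graphs and networks, which we denote by G(V, E). Since a social network on n nodes can be identified with its n × n (weighted) adjacency matrix, the set of all social networks on n nodes corresponds to the set of n × n matrices $$\mathbb {M}_{n,n}$$ :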

    $$\begin{aligned} G\in \mathbb {M}_{n,n}. \end{aligned}$$

    (1.1)

    We assume that the nodes

    $$V=(v_1,v_2,\ldots ,v_n)$$

    are sorted alphabetically, and that this order matches the rows and columns of the adjacency matrix or weighted adjacency matrix A. That is, the first row of A corresponds to the first node $$v_{1}$$ , and the last row of A corresponds to $$v_{n}$$ . The number of nodes in G is thus n.
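    A minimal Python sketch of this convention, assuming nodes are identified by character-name strings (the names below are only illustrative): sorting fixes the order v_1, ..., v_n, and a name's position in that order is its row and column in A. Python counts from 0, so 0-based row k corresponds to v_(k+1). The helper names `node_order` and `empty_adjacency` are ours.

```python
def node_order(names):
    """Return the distinct nodes sorted alphabetically and a name -> 0-based row index map."""
    nodes = sorted(set(names))  # v_1, ..., v_n (plain str sort, which is case-sensitive)
    index = {name: k for k, name in enumerate(nodes)}
    return nodes, index


def empty_adjacency(n):
    """Return an n x n zero matrix as a list of lists."""
    return [[0] * n for _ in range(n)]


nodes, index = node_order(["Romeo", "Juliet", "Nurse", "Juliet"])
A = empty_adjacency(len(nodes))
print(nodes)           # ['Juliet', 'Nurse', 'Romeo']
print(index["Romeo"])  # 2, i.e. the last row of A
```

    Python's default string sort compares code points, so it matches alphabetical order only when names are capitalized consistently; normalizing case first is one simple option.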

    Although graph notation is standard when discussing social networks, we move frequently between matrices and social networks, so we use matrix notation when speaking about social networks in general. There is thus no gatekeeper between matrices and social networks, and we can move freely between these two objects. Note also that when we discuss social networks, we are considering non-negative matrices (all entries ≥ 0).

    Matrices, networks, and graphs share many common features. All three describe relationships between pairs of objects, and each emphasizes specific properties of those relationships. At times, however, it is useful to discuss them together. In this book, we therefore use the term complex object to refer to any of the three. For further definitions of these three complex objects, see Appendix B.

    1.3 The Role of Interpretation in Literature and Math

    Many argue that dramas behave like mathematical formulas, some books going as far as providing formulas for writing good scripts (see [152, 201] for examples). In these formulas, the events and the characters are variables, and the formulas are the connections between characters and events, which guide the reader. This book takes this claim one step further, by writing these formulas in the rich and native language of formulas themselves: mathematics and algorithms. Once the variables are identified, and the interpretation process begins, the main question, thus, is whether or not the different interpretations are meaningful and valid.

    This raises the following issue. When translating literature into mathematics, a common trap is the dictatorship of truth present in mathematics. While objective truth is one of the primary attractions of mathematics, it is undesirable in literature. In fact, if a text has only one interpretation, it is more akin to a shopping list or a program than to Romeo and Juliet; it will never be considered great literature, rich with thick veins of conflict. Thus, there must always be space for multiple interpretations of a text, even when the text is transformed into a mathematical object. This book provides methods to avoid this trap.

    In mathematics as well, a polyphony of interpretations makes a theory stand out, since the theory can then be applied to many different situations. Consider linear algebra, a branch of mathematics that explores numbers arranged in a table. This arrangement turns out to be useful in essentially all scientific disciplines.

    While the above paragraph may seem contradictory to the classification of math as an exact science, the trick is that the proofs of a theorem have a single interpretation according to the definition of the object in the theorem. However, the definitions typically have multiple interpretations. For example, a matrix is a table of numbers. However, the numbers themselves enjoy the freedom of interpretation, i.e., they can be apples, oranges, people, etc.

    Conflict is equivalent to multiple interpretations. When we have more than one interpretation, there is inherently a conflict between them. Conversely, if we have a conflict, we have at least two interpretations. Conflict serves as a motor in a drama or a narrative and propels the plot along through time. Humans perceive and organize time through narrative; think of the phrase story of your life. Therefore, to understand time in social networks, narrative is an essential tool (as defined as a Mini-Turing Test in [176]). Time serves as an ambassador between social networks and literature, bringing trade and cooperation between the fields of digital humanities, computer science, artificial intelligence, literature, and literary theory. This opening of borders allows both machines and humans to participate in dialogues about literature.

    1.4 Social Clocks

    Time and narrative are interwoven concepts. Since time is conceptually slippery, we concentrate on clocks instead of time. Clocks are concrete and more straightforward objects. We explore the meaning and role of time by using clocks in social networks. Moving from time to clocks lets us set aside the more philosophical questions about time and instead examine its psychological and social aspects from an engineering perspective. One drawback of many social network models is that they are static and do not change over time. We develop and present natural mathematical algorithms that use the notion of clocks and incorporate them into dynamic social networks.

    A primary goal of AI is the ability of machines to imitate humans. One of the key aspects of humanity is the ability to perceive time differently. These differing perceptions of time depend heavily on level of interest, feeling, age, etc., and they generate varying causal relationships in personal narratives. For machines to experience similarly varying, asynchronous perceptions of time, psychology must be integrated into AI.

    1.5 Psychology-Driven Algorithms

    When reading, watching, or listening to a drama unfold, we often focus on understanding the psychological states of mind of the characters in the piece. The information divulging the psychological states of the individual characters is part of the texts themselves. If we want machines to have the ability to understand dramas, machines must be able to, on some level, understand these psychological variables. Moreover, it turns out that algorithms which imitate human psychology can be simple and extremely effective in analyzing dramas, as will be demonstrated.

    1.6 Looking Ahead

    In recent years, there has been a dramatic improvement in the ability of machines to understand images through deep learning. Perhaps the next goal is for machines to understand texts (see Fig. 1.1). Much has been written on this subject (see [109, 184] for some recent examples), concentrating on machine comprehension of language [176, 177].

    This book takes a different approach and defines the pixels of the text. Instead of concentrating on language, it focuses on the hidden structure of narratives in scripts. We claim that, typically, no cognitive understanding of the language of a text is needed to reveal who the main characters of a drama are, when the important plot points occur, who the criminals are in Sherlock Holmes, etc.

    In summary, we follow humanity’s desire to explore, as embodied by Star Trek Enterprise’s credo To boldly go where no man has gone before. The question What is narrative? is central to human thought and has been grappled with by many thinkers, such as Aristotle, Labov, Hemingway, Derrida, Reinhart, and numerous others. This book contributes to this tradition and offers a mechanical and mathematical method for illuminating the source and structure of narrative itself.

    Part I Static Literature Networks


    Z. Lotker, Analyzing Narratives in Social Networks, https://doi.org/10.1007/978-3-030-68299-6_2

    2. Graphs in Dramas


    2.1 Introduction

    We now lay the foundation to model narrative and conflict in social and literature networks. This chapter demonstrates how to mechanically construct graphs from scripts. In doing so, we invite machines to participate in the reading process.

    In order to make it possible for a computer to understand narrative, it is essential that a computer can recognize how conflict and plot manifest themselves in literature networks. Over the course of the next several chapters, we demonstrate how to partition a network into two or more parts along conflict lines derived from a narrative in order to model conflict and narrative in social and literature networks.

    To illustrate the impact narrative has on social networks, we ask the reader to refer to Fig. 2.2. A mathematician viewing this graph would call it a multigraph, defined as follows.

    Multigraph

    Definition 2.1

    A multigraph 

    $$G=(V,E,\mu _{G})$$

    is a graph or a directed graph where V is the set of nodes,

    $$E\subseteq V\times V$$

    is the set of edges, and

    $$\mu _{G}:E \rightarrow \mathbb {N}$$

    is a multiplicity function: $$\mu _{G}(e)$$ is the number of parallel (multiple) copies of the edge e.

    We use

    $$e=(v,u)=v\rightarrow u$$

    to denote the directed edge that connects node v to node u. The head of the edge e is u, and the tail of e is v. Formally,

    $$\begin{aligned} v \rightarrow u= (v,u). \end{aligned}$$

    (2.1)

    In the same way, we denote the undirected edge that connects the node v to the node u by

    $$e=v \circ \!\!\!-\!\!\!\circ u$$

    formally as

    $$\begin{aligned} v \circ \!\!\!-\!\!\!\circ u = \{(v,u),(u,v) \}. \end{aligned}$$

    (2.2)

    The degree $$d_{v}$$ of a node $$v\in V$$ is one of the most straightforward and powerful notions in graph theory. Formally:

    Degree

    Definition 2.2

    The degree of a node $$v\in V$$  is¹

    $$\begin{aligned} d_v=\sum _{(v,u)\in E} \mu _{G}((v,u)). \end{aligned}$$

    (2.3)
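    To make (2.3) concrete, here is a small Python sketch; the edge list is hypothetical. It assumes the multigraph is given as a list in which each directed edge appears once per parallel copy, so counting repetitions recovers the multiplicity function μ_G, and the degree of v sums μ_G over the edges whose tail is v. The function name `degrees` is ours.

```python
from collections import Counter


def degrees(multi_edges):
    """Compute d_v = sum of mu_G((v, u)) over the edges (v, u) leaving v.

    multi_edges lists every directed edge once per parallel copy, so the
    Counter below plays the role of the multiplicity function mu_G.
    """
    mu = Counter(multi_edges)  # mu[(v, u)] = number of parallel edges v -> u
    d = Counter()
    for (v, _u), m in mu.items():
        d[v] += m              # every copy of (v, u) adds 1 to the degree of its tail v
    return mu, d


edges = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A")]
mu, d = degrees(edges)
print(mu[("A", "B")])  # 2
print(d["A"], d["B"])  # 3 1
```

    Following (2.2), an undirected edge can be entered as both (v, u) and (u, v), so that it contributes to the degree of each endpoint.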

    A mathematician would describe the multigraph in Fig. 2.2 as having five nodes and 67 edges (Fig. 2.1).


    Fig. 2.1

    Do machines dream of reading?


    Fig. 2.2

    This figure shows a graph without narrative

    Another mathematician may respond to the figure by saying that this is a weighted graph, defined as follows.

    Weighted Graph

    Definition 2.3

    A graph (or a directed graph)

    $$G=(V,E,W)$$

    is called a weighted graph (or weighted directed graph) when V is the set of nodes,

    $$E\subseteq V\times V$$

    is the set of edges, and

    $$W:E \rightarrow \mathbb {R}$$

    is a weight function.

    This mathematician, always concerned with notation, will explain to us that we denote

    $$V(G):=V$$

    to be the set of nodes of G,

    $$E(G):=E$$

    to be the edges of graph G, and

    $$W(G):=W$$

    to be the weighted adjacency matrix of the graph G. Given a vector u with $$\dim (u)=n^2$$ , we use the notation

    $$\begin{aligned}{}[u]=[u_{i,j}] \end{aligned}$$

    (2.4)

    to transform the vector

    $$u=(u_1,\ldots ,u_{n^2})$$

    into a square n × n matrix, filled row by row:

    $$\begin{aligned} u_{i,j}= u_{(i-1)n + j}, \quad i,j\in \{1,\ldots ,n\}. \end{aligned}$$

    (2.5)
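    A short Python sketch of the vector-to-matrix transformation [u], assuming the components of u are read row by row (row-major order): with 1-based indices, entry (i, j) is component (i − 1)n + j of u, which becomes i·n + j in Python's 0-based indexing. The function name `vec_to_matrix` is ours.

```python
import math


def vec_to_matrix(u):
    """Reshape a vector of length n^2 into an n x n matrix, filling row by row.

    With 1-based indices, entry (i, j) is u_{(i-1)n + j}; with Python's
    0-based indices this is u[i * n + j].
    """
    n = math.isqrt(len(u))
    if n * n != len(u):
        raise ValueError("vector length must be a perfect square")
    return [[u[i * n + j] for j in range(n)] for i in range(n)]


print(vec_to_matrix([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

    Row-major order keeps each node's outgoing weights contiguous in u, which makes the inverse operation (flattening a weighted adjacency matrix) a simple concatenation of rows.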

    The two descriptions above show a small difference in perception. The first mathematician describes the graph discretely, while the second simplifies and moves from natural numbers to real numbers. Regardless, for most of us, the graph in Fig. 2.2 is not terribly exciting; there is no narrative, only mathematical definitions.

    We now move to Fig. 2.3. Suddenly, we see that the central node of the graph is Juliet, one of the main heroes of Western literature, fighting with her family and desperately in love. Although the two graphs are in fact the same graph, the meaning has been enriched from the first to the second. The first is without narrative, while the second exists within our collective imagination. Once we reveal the source of the graph to be Act III, Scene 5 of William Shakespeare’s play Romeo and Juliet and label the nodes with the respective characters, the story behind the graph suddenly comes into view. We see the arc of the story emerging out of the graph: Romeo and Juliet declaring their love for one another, Juliet arguing with her parents, the tragedy that closes the play, and so on. The graph in Fig. 2.2 is forgettable. The graph in Fig. 2.3 engages us and will remain in our memory.

    2.1.1 Chapter Overview

    Section 2.2 begins by contextualizing the chapter, and points to material which is outside the scope of this book but is valuable and relevant. We then move to some necessary definitions in Sect. 2.3, such as those of scripts, frequency graphs, and metric space. Sections 2.4 and 2.5 demonstrate how to construct graphs which represent narratives from dramas. There are three different graphs presented in these sections: the WW graph, the AB graph, and the ABA graph. Next, Sect. 2.6 discusses how to transform the different representations of literature graphs. Finally, Sect. 2.7 describes applications of frequency graphs, leading into some concluding remarks in Sect. 2.8.

    2.2 Related Work

    2.2.1 Natural Language Processing (NLP)

    Both neuroscientists and literary theorists have worked to discover how humans understand and construct narratives [119, 198, 221]. In the context of AI and computer science, our understanding is still in its infancy. It is clear, however, that computers require standard data structures such as numbers, linked lists, strings, graphs, matrices, images, functions, and so on (for more on data structures, see [61]).

    Thus, if we want computers to understand dramas, our task is to transform the information in a drama into a simple data structure that a computer can understand and manipulate. Moreover, we would like to do this mechanically. While humans can complete this process manually, it is too time-consuming and expensive to be practical. The question of how machines can accomplish this task is also of independent interest for the future of AI. This chapter presents a surprisingly simple, automatic method for building, from dramas, structures that computers can work with. This method can help reveal hidden structures within dramas, such as time dilation (see Chap. 14) and network partitions (see Chaps. 3, 4, and 9).

    Natural Language Processing (NLP) algorithms that identify the subject of a sentence already exist; an example is the Mathematica command TextStructure [217]. Additionally, there is a vast linguistic literature on syntax (see [52] for a pivotal work on this subject). However, when analyzing dialogues in dramas, NLP algorithms should look not at individual sentences but at whole responses or paragraphs, perhaps even complete scenes. Therefore, while NLP algorithms may produce the correct grammatical subject at the level of a sentence (see [15, 16]), they fall short when scaled to the level of a response or paragraph and cannot identify its semantic subject.

    For example, while poison may be the grammatical subject of a sentence, the death of Romeo may be the semantic and philosophical subject of the response or paragraph. The poison is of secondary importance, while the tragic death of Romeo is of primary importance [222]. This can be seen in Fig. 2.4, where the parts of speech in the sentence were computed by Mathematica. The drugs, or poison, are the grammatical subject of this sentence. Note that thy is a pronoun, not a proper noun as shown in the figure. While the drugs are the grammatical subject, these are Romeo’s last words, and the semantic subject is clearly his death. Without hesitating, we mentally substitute his death for the drugs. When describing this scene, we would say that these are the words with which Romeo takes his life, not that Romeo is talking about poison.

    Grammar and NLP algorithms may agree on the grammatical subject of a sentence, yet this may not be the most pertinent information in a dialogue analyzed by humans. An analysis may show that the grammatical subject of a sentence in The Lord of the Rings is the One Ring. In fact, the One Ring is a metaphor for our desire. What we really care about is what the One Ring does to Gollum, Frodo, and Bilbo, and what our desires do to ourselves. The semantic subject is not the One Ring but, in fact, us. In the context of dramas, the audience must be drawn in and connect to the text through personification. Any object with a will or desire becomes a character [222]. Following the tradition of the Enlightenment, Kant, and Hegel, the semantic subject at the level of a response or paragraph is always a character or a personified non-human [103, 186].

    Take, for example, Chekhov’s famous statement, One must never place a loaded rifle on the stage if it isn’t going to go off. It’s wrong to make promises you don’t mean to keep [98]. As noted, the rifle is not the semantic subject. In fact, Chekhov is presenting a formula which contains two variables: the character who uses the gun, and the character that the bullet is meant for. This could be the same character or several different characters. What is important in this scenario is not the presence of the gun, but the relationship between the variables/characters.


    Fig. 2.3

    This figure represents Act III, Scene 5 of Romeo and Juliet, a graph with a narrative. Compare this to Fig. 2.2 that shows the same graph without narrative


    Fig. 2.4

    This figure shows Romeo’s last words labeled with their respective parts of speech. This figure was generated by Mathematica

    2.2.2 Literature and Graphs

    The Who spoke to whom graph presented in this chapter is a popular model, though it is difficult to compute. Various papers, particularly in the field of digital humanities, have used this graph; for example, see [189, 204].

    When analyzing human communication, including dramas, many different factors are at play. In this book, we focus primarily on who is speaking and who is spoken to. From a social network perspective, who speaks and who is spoken to matters less than the existence of a relationship. How, then, do we model these relationships? The question of to whom a character is speaking can add further complexity. Consider Tartuffe, where Elmire speaks simultaneously to both Orgon and Tartuffe, and her utterances hold completely different meanings for each of them. In this chapter, we provide a simple data structure that shows the social relationships between characters in dramas, together with several simple heuristics that approximate these graphs.

    2.2.3 Markov Chains and Metric Space

    In Sect. 2.6, we use some advanced mathematics, such as Markov chains and metric spaces. Both topics could fill several books. In this chapter, we provide formal definitions for both. Readers who want more background can refer to [170] for Markov chains and [172] for metric spaces (for a general overview, the Wikipedia articles on each are also of value).

    We now move to some definitions, most importantly that of scripts.


    Fig. 2.5

    This table shows Act II, Scene 1 of Romeo and Juliet. The columns indicate the act number, the scene number, the response number, who is being spoken about, the speaker, who is spoken to, and the text of the response itself

    2.3 Definitions

    In order to harness the information found in drama scripts, it is necessary to have a way to organize the information so that it is accessible to machines. This chapter presents a simple way to mathematically and graphically organize and present information from dialogues and scripts.

    2.3.1 Scripts

    According to the Oxford English Dictionary, a script is “the written text of a play, film, television or radio programme, etc., typically including acting instructions, scene directions, and the like” [3]. An example scene from a script is presented in Fig. 2.5. The response number i, the character who speaks, $$c_{i}$$ , and the response, $$r_{i}$$ , are listed in columns 3, 5, and 7, respectively.

    In this book, we take the minimum information needed to understand the narrative, i.e., who is speaking and what is being said. This means that all acting instructions and scene descriptions are ignored. We represent a script formally as

    Script

    Definition 2.4

    A script Sc consists of a sequence of pairs

    $$\begin{aligned} Sc = (sc_{1},..., sc_{\tau }). \end{aligned}$$

    (2.6)

    Each element is a pair

    $$sc_{i} = (c_{i}, r_{i})$$

    , where i is the response index, $$c_{i}$$ is the character who speaks in the ith response, and $$r_{i}$$ is the text of the response itself. We denote the jth letter of the ith response by $$r_{i,j}$$ , and the number of letters in $$r_j$$ by $$|r_{j}|$$ .² The total number of responses in the script is denoted by $$\tau $$ .
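    Definition 2.4 maps directly onto a list of (speaker, text) pairs. The Python sketch below uses made-up placeholder lines rather than real script text, and it assumes that |r_i| counts every character of the response string; whether spaces and punctuation should count is an assumption here, not something fixed by the definition above.

```python
from typing import List, Tuple

# A script Sc = (sc_1, ..., sc_tau); each sc_i = (c_i, r_i) pairs the speaker
# c_i with the text r_i of the i-th response. The lines are placeholders.
Script = List[Tuple[str, str]]

sc: Script = [
    ("ROMEO", "Hypothetical first line."),
    ("JULIET", "Hypothetical reply."),
    ("ROMEO", "Hypothetical answer."),
]

tau = len(sc)                        # total number of responses
c = [speaker for speaker, _ in sc]   # c_1, ..., c_tau
r = [text for _, text in sc]         # r_1, ..., r_tau

# r_{i,j}: the j-th letter of the i-th response (1-based in the text, 0-based here)
first_letter_of_second_response = r[1][0]
lengths = [len(text) for text in r]  # |r_i|, counting every character (assumption)

print(tau, c)  # 3 ['ROMEO', 'JULIET', 'ROMEO']
```

    Keeping only the speaker sequence c is already enough for the frequency graphs of this chapter, which depend on who speaks and in what order, not on what is said.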

    Graphs that are derived from literature and represent the relationships between characters are called literature graphs or drama graphs. Literature graphs can take the form of either frequency graphs or metric graphs.

    We conclude this section with a simple exercise: Exercise 1.

    2.4 Graphs for Dramas

    We now define some elementary mathematical objects that are useful for analyzing scripts as mathematical objects themselves.

    We assume that the reader is familiar with basic set theory. Some formal definitions can be found in Appendix B. For more on set theory, [133] is an excellent resource for readers with a mathematical background. Reference [105] is a more introductory text which provides a solid grounding in set theory.

    Totally Ordered Set

    Definition 2.5

    A totally ordered set $$(S,\succeq )$$ is a set with a relation $$\succeq \subseteq S\times S $$ such that for all $$ x,y,z\in S$$ the following hold:

    1.

    Reflexivity: $$x\succeq x$$ .

    2.

    Antisymmetry: if $$x\succeq y$$ and $$y\succeq x$$ , then $$x=y$$ .

    3.

    Transitivity: if $$x\succeq y$$ and $$y\succeq z$$ , then $$x\succeq z$$ .

    4.

    Comparability: $$x\succeq y$$ or $$y\succeq x$$ .

    There are two natural examples of totally ordered sets. The first is $$x\le y$$ , which states that x is less than or equal to y. The second is the reverse, $$x\ge y$$ , which states that x is greater than or equal to y. Note that we sometimes refer to such a relation as a total order. For more on totally ordered sets, see [4, 133] or [160].
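    For a finite set, the conditions of Definition 2.5 can be checked by brute force. This Python sketch (the function name `is_total_order` is ours) tests reflexivity, antisymmetry, transitivity in the usual form (x ⪰ y and y ⪰ z imply x ⪰ z), and comparability, and confirms the two natural examples on {1, ..., 5}.

```python
from itertools import product


def is_total_order(S, geq):
    """Brute-force check of Definition 2.5 for a relation on a finite set S.

    geq(x, y) must return True exactly when x ⪰ y.
    """
    S = list(S)
    reflexive = all(geq(x, x) for x in S)
    antisymmetric = all(x == y for x, y in product(S, S) if geq(x, y) and geq(y, x))
    transitive = all(
        geq(x, z) for x, y, z in product(S, S, S) if geq(x, y) and geq(y, z)
    )
    comparable = all(geq(x, y) or geq(y, x) for x, y in product(S, S))
    return reflexive and antisymmetric and transitive and comparable


print(is_total_order(range(1, 6), lambda x, y: x <= y))      # True
print(is_total_order(range(1, 6), lambda x, y: x >= y))      # True
print(is_total_order(range(1, 6), lambda x, y: x < y))       # False: not reflexive
print(is_total_order(range(1, 7), lambda x, y: y % x == 0))  # False: 2 and 3 are incomparable
```

    The last line shows divisibility, which is reflexive, antisymmetric, and transitive but fails comparability, so it is an order on {1, ..., 6} without being a total one.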

    A strictly ordered set $$(S,<)$$ is a set with a binary relation < that is irreflexive, transitive, and antisymmetric.

    Directed pairs of numbers

    $$i,j \in \{1,...,n\}$$

    can be transformed into adjacency matrices. We denote this by

    $$[(i,j)]\in \mathbb {M}_{n,n}$$

    for the directed case, and

    $$[\{i,j\}]\in \mathbb {M}_{n,n}$$

    for the undirected case. Note that in both cases, we know the dimension of the matrix from the set $$\mathbb {M}_{n,n}$$ , and that we have a total order $$\succeq $$ on the set $$\{1,...,n\}.$$ We usually use the standard linear order of greater than or equal to, or smaller than or equal to.³

    $$\begin{aligned}{}[(i,j)]_{a,b}={\left\{ \begin{array}{ll} 1 & \text { if } a=i,b=j \\ 0 & \text { else } \end{array}\right. }. \end{aligned}$$

    (2.7)

    We use the above notation [(i, j)] and [(j, i)] to define the undirected case, as shown below.

    $$\begin{aligned}{}[\{i,j\}]=[(i,j)]+[(j,i)]. \end{aligned}$$

    (2.8)
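    A Python sketch of (2.7) and (2.8), with matrices as lists of lists and 1-based i and j as in the text; the function names are ours. Note that for i = j, (2.8) places a 2 on the diagonal, since [(i, i)] is added to itself.

```python
def pair_matrix(i, j, n):
    """[(i, j)] from (2.7): n x n matrix with a single 1 at row i, column j (1-based)."""
    M = [[0] * n for _ in range(n)]
    M[i - 1][j - 1] = 1
    return M


def undirected_pair_matrix(i, j, n):
    """[{i, j}] from (2.8): the entrywise sum [(i, j)] + [(j, i)]."""
    A, B = pair_matrix(i, j, n), pair_matrix(j, i, n)
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


print(pair_matrix(1, 3, 3))             # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(undirected_pair_matrix(1, 3, 3))  # [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
```

    Summing such matrices over a list of pairs yields an adjacency matrix whose entries count how often each pair occurs, which is exactly the kind of weight a frequency graph records.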

    In general, we can always transform a function

    $$\begin{aligned} W:V\times V \rightarrow \mathbb {R} \end{aligned}$$

    (2.9)

    into a matrix

    $$\begin{aligned} W=[w_{i,j}] \end{aligned}$$

    (2.10)

    where $$w_{i,j}$$ is defined by

    $$\begin{aligned} w_{i,j}=W(v_{i},v_{j}). \end{aligned}$$

    (2.11)
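    The transformation (2.9) to (2.11) in Python, assuming W is any callable on pairs of nodes and the node list fixes the order v_1, ..., v_n. The interaction counts below are hypothetical placeholders, not data from a play, and `function_to_matrix` is our name.

```python
def function_to_matrix(nodes, W):
    """Return [w_{i,j}] with w_{i,j} = W(v_i, v_j), rows and columns in the order of nodes."""
    return [[W(v_i, v_j) for v_j in nodes] for v_i in nodes]


# Hypothetical interaction counts; pairs that are not listed get weight 0.
counts = {("Juliet", "Nurse"): 4, ("Nurse", "Juliet"): 3}
nodes = sorted({"Juliet", "Nurse", "Romeo"})  # alphabetical order v_1, ..., v_n
M = function_to_matrix(nodes, lambda a, b: counts.get((a, b), 0))
print(M)  # [[0, 4, 0], [3, 0, 0], [0, 0, 0]]
```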

    2.4.1 Frequency Graphs

    Scripts provide a natural way to represent text mechanically, since we can easily count how often a pair of characters speak one after the other. We call these graphs frequency graphs because their weights measure the frequency of communication (i.e., a higher weight means two characters speak to one another more frequently). Formally:

    Frequency Graph

    Definition 2.6

    A weighted graph 

    $$G=(V,E,W_{f})$$

    with n nodes is called a frequency graph if the weighted adjacency matrix is

    $$W_{f}=[f_{i,j}]$$

    for all

    $$i,j\in \{1,2,...,n\}$$

    , $$v_{i},v_{j}\in V$$ :

    $$\begin{aligned} f_{i,j}= {\left\{ \begin{array}{ll} \ge 1 & \text { if } (v_{i},v_{j}) \in E \\ = 0 & \text { if } (v_{i},v_{j}) \notin E \end{array}\right. }. \end{aligned}$$

    (2.12)
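    Reading Definition 2.6 in the other direction, the edge set can be recovered from W_f: (v_i, v_j) is an edge exactly when f_{i,j} ≥ 1. This Python sketch assumes the weights are non-negative integer counts (frequencies of consecutive speech) and rejects anything else; the matrix is made up and `edges_of_frequency_graph` is our name.

```python
def edges_of_frequency_graph(W_f):
    """Return E as 1-based pairs (i, j) with f_{i,j} >= 1, checking (2.12) on the way.

    Assumes the weights are non-negative integer counts, so every entry is
    either 0 (no edge) or at least 1 (an edge).
    """
    E = set()
    for i, row in enumerate(W_f, start=1):
        for j, f in enumerate(row, start=1):
            if not isinstance(f, int) or f < 0:
                raise ValueError(f"f[{i},{j}] = {f!r} is not a non-negative integer count")
            if f >= 1:
                E.add((i, j))
    return E


W_f = [[0, 2, 0],
       [1, 0, 5],
       [0, 3, 0]]
print(sorted(edges_of_frequency_graph(W_f)))  # [(1, 2), (2, 1), (2, 3), (3, 2)]
```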

    We now present three different frequency graphs that will be used throughout the book. Our goal is to understand the social structure of dramas. This social structure is best captured by the Who spoke to whom graph (the WW graph), because characters tend to speak more frequently to their friends. This graph has previously been used in elementary Bible studies.

    However, the WW graph is difficult for machines to compute. We therefore use two simpler approximations of it instead: the AB graph (the Who spoke after whom graph) and the ABA graph (the Long conversation graph), which is a subgraph (see Sect. 2.5.4) of the AB graph (see Fig. 2.6). Both the AB and ABA graphs have their own benefits and limitations, as explained later in this chapter.

    We now briefly describe the ideas behind all three graphs; their details are explained throughout this section.

    The AB graph, or the Who spoke after whom graph, uses the chronological order of the script instead of the semantic order [140]: we connect two characters in the AB graph if they speak one after the other. In contrast, the WW graph, or the Who spoke to whom graph, connects characters who speak to one another.

    The ABA graph, or the Long conversation graph, also uses the chronological order. However, it adds an edge between consecutive speakers only when they are engaged in a long conversation, so short conversations are excluded. A conversation is short when a character speaks only once locally. For example, suppose character A speaks, then character B, then character C. Since C, rather than A, responds to B, the exchange between A and B is a short conversation. If A responds instead of C, we obtain the ABA pattern, i.e., a long conversation.


    Fig. 2.6

    This figure shows three different graphs of Act II, Scene 1 of Romeo and Juliet. The leftmost graph is the Who spoke to whom graph (computed manually), the middle graph is the AB graph, and the rightmost graph is the ABA graph

    To clarify the differences between the AB and ABA graphs, compare them in Fig. 2.6. The Who spoke to whom (WW) graph, on the left of the figure, has 10 edges (including 1 self-loop). The AB graph has only 9 edges, 2 of which are incorrect (the first and second edges; see Table 2.7). The ABA graph has 8 edges, 1 of which is incorrect (the first edge; see Table 2.7).

    Compared with the WW graph, about 78% of the AB edges are correct (7 of 9), while about 88% of the ABA edges are correct (7 of 8). Thus, per edge, the ABA graph is a better approximation of WW. However, it contains fewer edges than the AB graph, so it captures the social network less completely. Therefore, the AB graph may at times capture the social structure better than the ABA graph does. If we remove the self-loops and edge directions from the graphs, the AB graph captures the structure of the WW graph very accurately, while the ABA graph is less accurate (see Exercise 12).
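The per-edge percentages above amount to a precision measure: the fraction of an approximating graph's edges that also appear in the WW graph. The following is a minimal sketch. The function name `edge_precision` is hypothetical, and edges are assumed to be directed (ordered) pairs.

```python
def edge_precision(approx_edges, reference_edges):
    """Fraction of edges of an approximating graph that also occur in the reference graph."""
    approx = set(approx_edges)
    if not approx:
        return 0.0
    return len(approx & set(reference_edges)) / len(approx)


# Arithmetic behind the figures quoted for Fig. 2.6:
# AB has 9 edges (2 incorrect), ABA has 8 edges (1 incorrect).
print(round(7 / 9 * 100))  # 78
print(round(7 / 8 * 100))  # 88
```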

    Because scripts record who speaks each response, frequency graphs are easy to harvest from them, and the resulting graphs can be analyzed with standard procedures. The next section explains in detail how to generate these graphs.

    2.5 Construction of the WW, AB, ABA Graphs

    Our goal is to approximate the Who spoke to whom or WW graph, which best represents the social structure of the drama. To accomplish this, we use simple heuristics: specifically, two heuristics of the same flavor that exploit the structure and the rhythm of the drama. In doing so, we imitate the process of reading, watching, or listening to a drama, much as an audience experiences it.

    To ease the calculations, we use Act II, Scene 1 of Romeo and Juliet as a running example. This scene is well suited to this purpose since it is short and contains only three characters: Romeo, Benvolio, and Mercutio. In this scene, Benvolio and Mercutio tease Romeo. The text of the scene, along with other pertinent information about it, appears in Table 2.5.

    Appendix A discusses the technical details of script structure and where to find scripts on the Web.


    Fig. 2.7

    This figure shows the edges corresponding to Table 2.5.

    2.5.1 WW Graph

    We now explain how to compute the WW graph manually: after reading the script Sc carefully, we construct the WW graph as follows.

    For each response $$sc_{i}$$ , $$i=1,...,\tau $$ , determine the target character of the response $$sc_{i}$$ , i.e., the character to whom the response $$sc_{i}$$ is directed. Denote this target character by $$Tc_{i}$$ . After carefully reading the script Sc and determining $$Tc_{i}$$ for each response $$sc_{i}$$ , $$i=1,...,\tau $$ , we can construct the graph WW.

    First, the set of vertices of the graph WW(Sc) is

    $$\begin{aligned} V(WW(Sc))=\cup _{i=1}^{\tau } \{c_{i},Tc_{i}\}. \end{aligned} \tag{2.13}$$

    Having defined the set V(WW(Sc)), we define the edges of the graph WW to be

    $$\begin{aligned} E(WW(Sc))=\cup _{i=1}^{\tau } \{(c_{i},Tc_{i})\}. \end{aligned} \tag{2.14}$$

    Finally, define the weighted adjacency matrix $$W_{f}(WW(Sc))=[f_{i,j}]$$ of the graph WW(Sc) for all $$v_{i},v_{j}\in V(WW(Sc))$$ by the formula

    $$\begin{aligned} f_{i,j}= |\{k\in \mathbb {N}:c_{k}=v_{i},Tc_{k}=v_{j},1\le k\le \tau \}|. \end{aligned}$$
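Once the targets $$Tc_{i}$$ have been determined by hand, Eqs. (2.13), (2.14) and the weight formula reduce to counting. The following is a minimal sketch. It assumes the manual annotation is given as a list of (speaker, target) pairs. The function name `ww_graph` and the example annotation are hypothetical; they are not the actual annotation of Table 2.5.

```python
from collections import Counter


def ww_graph(responses):
    """Build WW(Sc) from manually annotated responses.

    responses: list of (c_i, Tc_i) pairs, the speaker and target of each response.
    Returns (V, E, W): the alphabetically sorted vertex list (Eq. 2.13), the
    edge set (Eq. 2.14), and W[i][j] = number of responses from v_i to v_j.
    """
    V = sorted({c for pair in responses for c in pair})
    counts = Counter(responses)  # multiplicity of each (speaker, target) pair
    E = set(counts)
    W = [[counts[(vi, vj)] for vj in V] for vi in V]  # missing pairs count as 0
    return V, E, W


# Hypothetical annotation, for illustration only.
example = [("Benvolio", "Romeo"), ("Mercutio", "Romeo"),
           ("Benvolio", "Mercutio"), ("Mercutio", "Romeo")]
V, E, W = ww_graph(example)
```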

    To avoid confusion, we assume that the nodes in V(WW(Sc)) are sorted alphabetically. This means that

    $$\begin{aligned} \text {if } V=\{v_1,v_2,...,v_n\} \text { then for all }1\le i<j \le n, \text {it follows that } v_{i}< v_{j}. \end{aligned} \tag{2.15}$$

    For example, in Fig. 2.6 this means that $$v_1=\text {Benvolio, }v_2=\text {Mercutio, }v_3=\text {Romeo}$$.

    We next explain the AB graph and the ABA graph in further detail.

    2.5.2 The AB Graph

    There are four versions of the AB graph, corresponding to two independent choices: the graph may be directed or undirected, and it may be weighted or unweighted. We define only the directed, weighted version; the definitions of the other versions can be derived from it.

    AB Directed Graph

    Definition 2.7

    Let $$Sc=\left( sc_{t}\right) _{t=1}^{\tau }$$ be a script with length $$\tau .$$ The AB directed graph $$AB(Sc)=(V,E,W_{f}=[f_{i,j}])$$ of the script Sc is a frequency graph

    1. whose vertex set is

    $$\begin{aligned} V(AB(Sc))=\cup _{i=1}^{\tau } \{c_{i}\}, \end{aligned}$$

    2. whose set of edges is

    $$\begin{aligned} E(AB(Sc))=\cup _{i=1}^{\tau -1} \{(c_{i},c_{i+1})\}, \end{aligned}$$

    3. and whose weighted adjacency matrix $$W_{f}(AB(Sc))=[f_{i,j}]$$ is given, for all $$v_{i},v_{j}\in V(AB(Sc))$$ , by the formula

    $$\begin{aligned} f_{i,j}= |\{k\in \mathbb {N}:c_{k}=v_{i},c_{k+1}=v_{j},1\le k\le \tau -1\}|. \end{aligned}$$

    When the script in question is clear from the context (as is typically the case), we write AB instead of AB(Sc). As noted, there are four versions of the AB graph. We denote the weighted undirected version by ABU, the simple unweighted undirected version by ABS, and the unweighted directed version by ABUD. We assume that the vertices in V(AB(Sc)) are sorted alphabetically.
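Definition 2.7 needs only the sequence of speakers $$c_{1},...,c_{\tau }$$. The following is a minimal sketch of the directed, weighted AB graph and of one way to derive the other three versions from it. Two choices here are my assumptions, not definitions from the text. First, ABU adds the weights of the two directions and keeps a self-loop's count unchanged. Second, ABS and ABUD replace every positive weight by 1. The function names are hypothetical.

```python
from collections import Counter


def ab_graph(speakers):
    """Directed, weighted AB graph following Definition 2.7.

    speakers: the sequence c_1, ..., c_tau of characters who speak the responses.
    Returns (V, W) with V sorted alphabetically and
    W[i][j] = |{k : c_k = v_i, c_{k+1} = v_j}|.
    """
    V = sorted(set(speakers))
    pairs = Counter(zip(speakers, speakers[1:]))  # consecutive speaker pairs
    W = [[pairs[(vi, vj)] for vj in V] for vi in V]
    return V, W


def ab_variants(W):
    """Derive (ABU, ABS, ABUD) from the directed, weighted matrix W."""
    n = len(W)
    # ABU, weighted undirected (assumption): add both directions off the diagonal.
    abu = [[W[i][j] + W[j][i] if i != j else W[i][i] for j in range(n)]
           for i in range(n)]
    # ABS, unweighted undirected: 1 wherever ABU has positive weight.
    abs_ = [[1 if abu[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    # ABUD, unweighted directed: 1 wherever W has positive weight.
    abud = [[1 if W[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    return abu, abs_, abud
```

For the hypothetical speaker sequence A, B, A, C, A, `ab_graph` returns V = [A, B, C] with one edge each for (A,B), (B,A), (A,C) and (C,A). The undirected ABU then has weight 2 between A and B and between A and C.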

    If we add an edge from the last character to speak in the drama, $$c_{\tau }$$ , to the first character to speak, $$c_{1}$$ , we close the Hamilton cycle in the graph AB. This means that we can visit all nodes in the order of the script and then return to the beginning of the script. Graphs with Hamilton cycles are more connected than the general AB graph; in particular, in the directed version every node can reach every other node. We denote this graph by $$AB^{\circ }$$ . Formally, its set of edges is $$E^{\circ }=E\cup \{ (c_{\tau },c_{1})\}$$ . There are several advantages to working with $$AB^{\circ }$$ , as can be seen in Exercise 19.
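Adding the edge $$(c_{\tau },c_{1})$$ turns the sequence of speakers into a closed walk through every node. As a result, the directed $$AB^{\circ }$$ is strongly connected. The following is a minimal sketch of $$E^{\circ }$$ together with a breadth-first check of strong connectivity. The function names are hypothetical, and the check is a standard textbook test, not a procedure from this book.

```python
from collections import defaultdict, deque


def ab_circ_edges(speakers):
    """Edge set E° = E ∪ {(c_tau, c_1)} of the directed AB° graph."""
    edges = set(zip(speakers, speakers[1:]))
    edges.add((speakers[-1], speakers[0]))  # close the walk
    return edges


def is_strongly_connected(nodes, edges):
    """True if every node reaches every other node along directed edges."""
    nodes = set(nodes)
    if not nodes:
        return True
    out_adj, in_adj = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_adj[u].add(v)
        in_adj[v].add(u)

    def reach(start, adj):
        # Breadth-first search from `start` following `adj`.
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u] - seen:
                seen.add(v)
                queue.append(v)
        return seen

    start = next(iter(nodes))
    # Strongly connected iff one node reaches all nodes and is reached by all.
    return reach(start, out_adj) >= nodes and reach(start, in_adj) >= nodes
```

For the hypothetical sequence A, B, C, B, the plain AB edges leave A unreachable from the other nodes. The closing edge (B, A) restores strong connectivity.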

    Thus, all that is needed to generate the AB graph is the script of the drama. A complete list of the edges of a scene appears, as an example, in the second column of Table 2.5.

    The next subsection explains how to improve the accuracy of the AB heuristic in determining whom a single response was meant for. This improves the accuracy of predicting the targets of responses in the WW graph by approximately 15–20%. The gain, however, comes at the cost of diluting the edges of the graph, i.e., of keeping fewer of them.


    Fig. 2.8

    This figure shows a dialogue between three cups, based on Chekhov’s Three Sisters. In this diagram, Cup A speaks to Cup B, and Cup B speaks to Cup C. If Cup A, instead of Cup C, had responded to Cup B, the exchange would follow the ABA pattern, i.e., a Long Conversation, and the directed ABA graph would contain an edge. The numbers in the voice bubbles represent the order in which the cups speak.

    2.5.3 The ABA Graph

    While the surprisingly simple AB graph approximates the Who spoke to whom graph with decent accuracy, the accuracy can be improved by looking at longer conversations between two characters. We model this using the ABA graph. A prolonged conversation provides evidence of an exchange of ideas; therefore, we call the ABA graph the Long conversation graph (see the coffee chat in Fig. 2.8). The ABA graph contains an edge from A to B if A speaks to B, B replies to A, and then A speaks again after B.

    The AB graph in Fig. 2.8 contains two edges, (A,B) and (B,C), while the ABA graph in the figure is empty. If Cup B then responded to Cup C, the sequence B, C, B would produce an edge from B to C, and the ABA graph would no longer be empty. Thus, to build the ABA graph, we consider every three consecutive responses.

    The ABA graph exploits the chronological order of scripts more than the AB graph does. Therefore, the ABA graph is a better predictor of the edges of the WW graph than the AB graph is. It captures roughly 90% of the targets of the responses in the script and misses the target in the remaining 10% or so of the cases.

    Note that the ABA graph can be defined in two other ways (see Exercise 2). These alternative definitions may yield slightly different graphs than the definition presented here.

    ABA Graph

    Definition 2.8

    Let Sc be a script with length $$\tau .$$ The ABA graph ABA(Sc) of the script Sc is a frequency digraph. In particular, ABA(Sc) is a graph where

    1. The vertex set is

    $$\begin{aligned} V(ABA(Sc))=\cup _{i=1}^{\tau } \{c_{i}\}. \end{aligned}$$

    2. The set of edges is

    $$\begin{aligned} E(ABA(Sc))= \{(c_{i},c_{i+1})\in E(AB(Sc)): c_{i}=c_{i+2}, 1\le i\le \tau -2\}. \end{aligned}$$

    3. The weighted adjacency matrix $$W_{f}(ABA(Sc))=[f_{i,j}]$$ is given, for all $$v_{i},v_{j}\in V(ABA(Sc))$$ , by the formula

    $$\begin{aligned} f_{i,j}= |\{k\in \mathbb {N}:c_{k}=v_{i},c_{k+1}=v_{j},c_{k+2}=v_{i},1\le k\le \tau -2\}|. \end{aligned}$$

    We use the same notation as for the AB graph. When the script in question is clear from the context, as is typically the case, we write ABA instead of ABA(Sc). As with the AB graph, there are four versions of the ABA graph. We denote the weighted undirected version by ABAU, the simple unweighted undirected version by ABAS, and the unweighted directed version by ABAUD. We assume that the vertices in V(ABA(Sc)) are sorted alphabetically.

    ABA has exactly the same nodes as AB. Sometimes it is useful to delete nodes with degree 0; whenever this book does so, it says so explicitly.
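Definition 2.8 can be computed by sliding a window of three consecutive speakers over the script. The following is a minimal sketch. The optional removal of degree-0 nodes is a flag here (an assumption of this sketch), because the text keeps such nodes unless stated otherwise. The function name `aba_graph` is hypothetical.

```python
from collections import Counter


def aba_graph(speakers, drop_isolated=False):
    """Directed, weighted ABA graph following Definition 2.8.

    W[i][j] counts the k with c_k = v_i, c_{k+1} = v_j and c_{k+2} = v_i.
    By default V is the vertex set of AB; with drop_isolated=True, nodes
    of degree 0 in ABA are removed.
    """
    triples = zip(speakers, speakers[1:], speakers[2:])
    counts = Counter((a, b) for a, b, c in triples if a == c)  # the ABA pattern
    V = sorted(set(speakers))
    if drop_isolated:
        touched = {v for edge in counts for v in edge}
        V = [v for v in V if v in touched]
    W = [[counts[(vi, vj)] for vj in V] for vi in V]
    return V, W
```

For the cups of Fig. 2.8 (speakers A, B, C), the ABA graph has no edges, so dropping isolated nodes leaves it empty. If B speaks again after C (A, B, C, B), the window B, C, B yields a single edge from B to C.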

    Note that the last edge of AB will never exist in
