Recommender Systems
Ebook, 366 pages, 4 hours

About this ebook

Embraced by online content platforms (books, music, movies) and auction sites, recommender systems are key elements of digital strategies. Although they were originally developed to improve the performance of information systems, the focus has now largely shifted to optimizing the customer relationship, with the main objective of maximizing potential sales.

Taking a transdisciplinary approach, Recommender Systems brings together contributions linking information and communication sciences, marketing, sociology, mathematics and computing. It explains the models underlying recommender systems and puts them in historical perspective. It also analyzes their development within content offerings and assesses their impact on user behavior.

Language: English
Publisher: Wiley
Release date: Dec 4, 2014
ISBN: 9781119054238

    Book preview

    Recommender Systems - Gérald Kembellec

    1

    General Introduction to Recommender Systems

    1.1. Putting it into perspective

    Before the emergence of modern information systems, individuals developed the habit of recommending products or services through word of mouth, sharing certain social or cultural affinities [OBR 77, SHA 95]. This approach, which can be qualified as social, pursued the principle of sharing an individual experience with others, in areas, at first, as wide as culture or handicraft and then industry. Beyond the reputation tied to the intrinsic quality of a product, there were assessments that emerged through the prism of sociocultural mediums which also improved products and services.

    Today, the supply of information and products offered on the Internet grows day by day. Beyond a certain threshold, too much information degrades the quality of the message, a phenomenon referred to as information overload [LEV 98, CHE 09]. For end users searching for information, it is therefore useful for the system to preprocess results and filter out the elements least relevant to their expectations. The development of automated recommender systems (RecSys) was thus a foreseeable response to information overload: such systems promote content and focus attention on the user in a context of overabundance.

    The first recommender systems, using collaborative filtering, had the aim of using the volume of community evaluations in order to propose personalized cultural advice, based on evaluation statistics and the correlation of user profiles [RES 94].

    As early as 2000, Burke remarked that many commercial websites such as Amazon and eBay had understood the value of offering peripheral hyperlinks contextualized by the pages the user consults [BUR 00]. Commercial search engines have even created related products, such as Google AdSense, to optimize advertising revenue through recommendations based on the content of queries, or even e-mails¹. The principle is simply to let advertisers place hyperlinks to their websites in the margin of the content selected by the user. This second method is called the content-based method.

    With the arrival of social networks, whether public or professional, the sharing and evaluation of content have become a worldwide mass phenomenon. As a result of this unprecedented generation of data, commercial abuses are common and have led AFNOR² to propose standards for controlling the phenomenon [AFN 13].

    1.2. An interdisciplinary subject

    The first notable papers establishing recommender systems as a dedicated area of study and research involved computer science specialists as well as economists interested in the emerging development of e-commerce. The two groups were brought together by the issue of information systems, which has become a decisive factor in the decision-making of organizations. Thus, the pioneering paper by Paul Resnick (AT&T) and Hal R. Varian (Berkeley School of Information Management) in 1997 offered a functional analysis of five early recommender systems, concentrating mostly on the business model and the risks of corruption of such systems [RES 97]. In 2000, Robin Burke, a researcher in computer science, emphasized the emergence of large catalogs and the assistance consumers need in making their choices; his articles focused on the design of algorithms and their performance [BUR 00]. E-commerce and recommendation algorithms were thus linked from the outset.

    The data in Table 1.1, collected from the digital library of the Association for Computing Machinery (ACM) by searching for Recommender System in the titles of articles, show the growing interest in this subject over the last 5 years. This count is only partial, since other publishers also released articles on this subject over the same period. The growth of information and the major development of online commercial platforms largely explain what is at stake; the development of the field goes hand in hand with the optimization of information systems and the needs of e-marketing.

    Table 1.1. Increase in the number of articles dedicated to recommender systems in the library of the ACM (http://dl.acm.org/)

    The ACM international conference on recommender systems (RecSys) was first held in 2007 and gathers many RecSys specialists. The 8th edition of the conference will be held in Silicon Valley at the end of 2014³.

    The literature shows that the computer science approach focuses on the performance of algorithms, their robustness, and the design and comparison of systems based on semantic, social and hybrid data. The evaluation proposed is often centered on interaction with the technical system, and does not take into account more qualitative, user-centered approaches. The computer science approach also considers questions related to transparency, explanation, trust and the measurement of recommendation diversity. Ongoing renewal mainly comes from combinations with other technologies, notably the Web of data, Big Data and automated sentiment analysis.

    E-commerce approaches are mostly focused on new techniques which can direct potential clients to targeted products and services. The combinations of different types of recommendation have been tested in fields such as tourism and cultural industries (selling of books, music, on-demand video). Recommender systems are considered to be marketing tools and technologies specific to business intelligence, a set of methods and technologies which transform data into useful information for decision-making in industry.

    From the point of view of information science, the works identified are more recent; they highlight the use of such systems for developing discovery functions in digital libraries and library catalogs [WAK 12]. Qualitative evaluations of recommendations, the perspective of users and psychological factors are all angles of analysis specific to recommender systems; they open up new areas of research in a field that already has an abundant literature on techniques and algorithms. Several conferences do, however, focus on the user experience of these recommender systems, assessing whether users accept or reject them in context. Visualization, explanation, transparency, trust and help in decision-making are notably the aspects investigated by researchers from various subject areas⁴.

    1.3. The fundamentals of algorithms

    Here, we introduce the foundations of recommender systems, their models and their methods, to provide context for the later chapters. This conceptual overview is intended to be neutral and factual; it paves the way for the more engaged points of view presented in the rest of this book.

    1.3.1. Collaborative filtering

    Historically, the first system proposed was based on collaborative filtering. This method assumes that users are authenticated on the content management platform and, of course, that they provide personal input. Once the system has proposed a document to the user, on the basis of criteria gathered when the profile was created and/or of the user's queries to an internal search engine, it offers the user the possibility of rating that document. This rating can be an intrinsic assessment of the document, or an assessment of its relevance to the context of the search and its main intentions.

    This rating is stored within the system to be reused. In memory-based (or heuristic) collaborative filtering, ratings help predict how a user α will assess an item, based on the assessment of another user β who has regularly rated items in a similar way. In order to determine which user β is most similar to user α, the Pearson correlation is often used [RES 94]. This method is also referred to as Word of Mouth [SHA 95] or People-to-People Correlation [SCH 99].

    Let r be the Pearson correlation coefficient, which in our case compares the ratings, from 0 to 10, given by two users to a collection of items. This function is built into modern spreadsheets⁵. The coefficient ranges from -1 to 1; the correlation is considered weak if the coefficient is less than 0.5 and strong if it tends toward 1.

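    Pearson correlation:

    \[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]    [1.1]

    where x_i and y_i are the ratings given to item i by the two users being compared, \bar{x} and \bar{y} are their mean ratings, and n is the number of items rated by both.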

    Example of the computation of the similarity between users having rated a set of items. Table 1.2 displays a collection of user assessments for certain items.

    Table 1.2. Example of a sample of ratings

    Table 1.3 displays the correlation coefficients computed two by two for the collection. The values in bold show strongly correlated users.

    Table 1.3. Similarity of users based on their Pearson correlation

    In the example, for the values presented in Table 1.2, the results displayed in Table 1.3 show that each user can benefit from the assessments of at least one other user with a similar profile to theirs (correlation close to 1).
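
    The computation described above can be sketched in a few lines of Python. This is only an illustration: the user names, the ratings and the helper functions (pearson, most_similar, predict) are hypothetical and are not the values of Table 1.2. The prediction step simply borrows the rating of the most similar user, which is one simple reading of the memory-based approach.

```python
from math import sqrt

# Hypothetical ratings on a 0-10 scale (NOT the values of Table 1.2).
ratings = {
    "alice": {"item1": 8, "item2": 2, "item3": 9, "item4": 4},
    "bob":   {"item1": 7, "item2": 1, "item3": 10, "item4": 3, "item5": 9},
    "carol": {"item1": 2, "item2": 9, "item3": 1, "item4": 8, "item5": 2},
}

def pearson(u, v):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # not enough shared ratings to correlate
    x = [ratings[u][i] for i in common]
    y = [ratings[v][i] for i in common]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who always gives the same rating carries no signal
    return cov / sqrt(var_x * var_y)

def most_similar(u):
    """Return the other user whose ratings correlate best with those of u."""
    return max((v for v in ratings if v != u), key=lambda v: pearson(u, v))

def predict(u, item):
    """Borrow the rating given to `item` by the most similar user who rated it."""
    candidates = [v for v in ratings if v != u and item in ratings[v]]
    if not candidates:
        return None  # nobody has rated this item yet
    best = max(candidates, key=lambda v: pearson(u, v))
    return ratings[best][item]

if __name__ == "__main__":
    print(round(pearson("alice", "bob"), 2))    # strongly positive
    print(round(pearson("alice", "carol"), 2))  # strongly negative
    print(most_similar("alice"), predict("alice", "item5"))
```

    In this made-up sample, alice and bob rate alike, so bob's opinion of item5 is used as the prediction for alice, while carol's opposite tastes are ignored.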

    Once the number of user ratings has reached a critical mass, it can support a more precise prediction method, referred to as model-based prediction, which relies on user profiles [BRE 98]. In this second method, profile types are established by grouping users who have given similar ratings. These profile types, or models, are then used to produce recommendations.
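
    As a rough illustration of the model-based idea, the sketch below groups users with similar ratings and uses the average ratings of each group as a profile type. It assumes a small k-means clustering as the grouping step, which is only one possible choice and is not necessarily the model used in [BRE 98]; the rating matrix and the function names are hypothetical.

```python
import numpy as np

# Hypothetical complete ratings (rows: users, columns: items), 0-10 scale.
R = np.array([
    [9, 8, 1, 2],
    [8, 9, 2, 1],
    [1, 2, 9, 8],
    [2, 1, 8, 9],
], dtype=float)

def build_profile_types(R, k=2, iterations=10):
    """Tiny k-means: returns (profile types, profile index of each user)."""
    # Deterministic start: first user, then the user farthest from those chosen.
    chosen = [0]
    while len(chosen) < k:
        d = np.linalg.norm(R[:, None, :] - R[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(np.argmax(d)))
    centroids = R[chosen].copy()
    for _ in range(iterations):
        dist = np.linalg.norm(R[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)          # nearest profile type per user
        for c in range(k):
            members = R[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)  # average ratings of the group
    return centroids, labels

def predict_from_profile(partial, centroids):
    """partial maps item index -> rating; returns predictions for unrated items."""
    idx = list(partial)
    vals = np.array([partial[i] for i in idx], dtype=float)
    dist = np.linalg.norm(centroids[:, idx] - vals, axis=1)
    best = centroids[int(np.argmin(dist))]
    return {j: float(best[j]) for j in range(centroids.shape[1]) if j not in partial}

centroids, labels = build_profile_types(R)

if __name__ == "__main__":
    print(labels)                                    # two groups of two users
    print(predict_from_profile({0: 9}, centroids))   # a user who liked item 0
```

    A new user who has rated only the first item is matched to the closest profile type, and that profile's average ratings serve as predictions for the remaining items.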

    1.3.1.1. Advantages and drawbacks of collaborative filtering

    The first advantage of recommendations based on collaborative filtering is that familiarity with the area of knowledge is not required for searching for information [BUR 02]. This system also allows recommendations to be extended to genres related to the area of knowledge, by drawing on the other interests of similar profiles. This induced serendipity is what Burke calls cross-genre niches [BUR 02]. According to Poirier et al., because it is independent of the representation of data, this technique can be applied to contexts where analysis of the content is difficult to automate [POI 10]. We also add that for image, audio and video documents, metadata is rarely available. In this context, apart from collaborative filtering (or a substantial prior effort of crowdsourced description), there would be no alternative recommendation method. The last positive aspect is that the quality of the recommendations produced by collaborative filtering increases with the use of the system.

    Claypool et al. have highlighted a number of problems in the initial recommendation methods [CLA 99]. For example, in its initial state, a recommender system based on collaborative filtering is unusable because of the cold start problem: without ratings, no recommendation is possible. This difficulty recurs every time an item or user is added. With too few evaluations for a vast corpus, the data will be too sparse to establish enough correlations. This phenomenon is referred to as sparsity [CLA 99].
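
    Both difficulties can be made concrete with a short sketch. The catalog, the ratings and the function names below are hypothetical; sparsity is measured here as the share of user-item pairs without a rating, which is a common convention rather than a definition taken from [CLA 99].

```python
# Hypothetical catalog and ratings (user -> {item: rating}).
catalog = ["a", "b", "c", "d", "e"]
ratings = {
    "u1": {"a": 7, "b": 3},
    "u2": {"a": 6},
    "u3": {},  # new user: no rating yet
}

def sparsity(ratings, catalog):
    """Share of user-item pairs that carry no rating (0 = dense, 1 = empty)."""
    cells = len(ratings) * len(catalog)
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - filled / cells

def cold_start(ratings, catalog):
    """Users and items for which collaborative filtering has nothing to work with."""
    new_users = [u for u, user_ratings in ratings.items() if not user_ratings]
    rated = {item for user_ratings in ratings.values() for item in user_ratings}
    new_items = [item for item in catalog if item not in rated]
    return new_users, new_items

if __name__ == "__main__":
    print(sparsity(ratings, catalog))
    print(cold_start(ratings, catalog))
```

    Here only 3 of the 15 possible user-item pairs are rated, one user has no ratings at all, and three items have never been rated, so none of them can be reached through correlations.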

    It has also been shown that collaborative filtering favors popularity. The more favorably an item is rated, the more it will be recommended and therefore rated again. This self-generated notoriety therefore seems to result from the age of an item rather than from its actual quality as perceived by users. This problem can be offset or, on the contrary, intensified by a weakness of social recommendation systems, namely rating fraud. It can be tempting, from a marketing perspective, to influence recommendations by leaving ratings under multiple identities. This technique is referred to as shilling and is the object of many studies [LAM 04, BUR 06].

    1.3.2. Content filtering

    The other classic filtering method is based on the description and analysis of the content proposed by the system. This process is mainly based on text analysis techniques, but can be extended to various forms of content containing metadata. Digital text documents which are already well equipped with a wealth of metadata and linked to catalog records illustrate this point.

    The content-based recommendation technique is based on the relationship between the user and metadata associated with the items stored in the knowledge base [BOU 04, LEE 06].

    Users can voluntarily enter their preferences when they sign up to the service, in which case the preferences are declared. The other possibility is to compute preferences by observing user behavior [ADO 05]. In this case, they are calculated and stored as vectors.

    User preferences are represented in the form of a vector containing the key terms most representative of the user. These key terms can be given a statistically determined value depending on their frequency in the documents of the corpus visited and/or rated by the user [BAL 97]. For example, it is possible to use the tf.idf weighting scheme to weight key terms from texts [SAL 88].

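    Frequency of a term in a document:

    \[ \mathrm{tf}(m, d) = \frac{n_{m,d}}{|d|} \]    [1.2]

    where n_{m,d} is the number of occurrences of the term m in document d and |d| is the total number of words in d.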

    EXAMPLE 1.1.– Let us consider a document d containing 100 words in which the term m appears n times with n = 3. The frequency of the term (tf) for m in document d is therefore the quotient between the number of occurrences n of the word m in document d and the total number of words in d. In this example, this gives 3/100 = 0.03.

    The inverse document frequency (idf) [JON 72] is then computed as the logarithm of the quotient between the cardinality of the whole corpus C and the cardinality of the sub-corpus C′ of documents of C containing term m. The number 1 is added to the denominator so that the function remains defined when the term is absent from the corpus.

    Inverse of the frequency of a word in the corpus:

    \[ \mathrm{idf}(m) = \log \frac{|C|}{1 + |C'|}, \qquad C' = \{ d \in C : m \in d \} \]    [1.3]

    EXAMPLE 1.2.– Suppose that we have 10 million documents in the corpus C and that the term m appears in one thousand of these. If we apply this to our example, using a base-10 logarithm and neglecting the 1 added to the denominator, the idf is log (10 000 000/1 000), thus 4. The value of tf.idf in our example is therefore 0.03 × 4 = 0.12. Thus, the term m will statistically be weighted with a coefficient of 0.12 in document d of corpus C.
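
    The arithmetic of Examples 1.1 and 1.2 can be checked with a few lines of Python. The function names are illustrative, and the sketch assumes a base-10 logarithm with 1 added to the denominator, as described in the text.

```python
from math import log10

def tf(occurrences, total_words):
    """Frequency of a term in a document: occurrences / document length."""
    return occurrences / total_words

def idf(corpus_size, docs_with_term):
    """Inverse document frequency, with 1 added to the denominator."""
    return log10(corpus_size / (1 + docs_with_term))

# Values of Examples 1.1 and 1.2.
term_frequency = tf(3, 100)
inverse_doc_frequency = idf(10_000_000, 1_000)
weight = term_frequency * inverse_doc_frequency

if __name__ == "__main__":
    print(term_frequency)                    # 0.03
    print(round(inverse_doc_frequency, 2))   # 4.0
    print(round(weight, 2))                  # 0.12
```

    With the 1 in the denominator, the idf is very slightly below 4, which does not change the rounded weight of 0.12.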

    This basic algorithm is rarely used on its own, and has been replaced by more recent and sophisticated combinations, such as those available in Terrier [OUN 05], notably Okapi BM25, but it remains the basis for the weighting of the representative terms of documents in text corpora.

    Methods based on the vectorization of queries show promising results. Berry et al. have suggested representing the query in matrix form through the popular latent semantic indexing (LSI) algorithm. The algorithm creates a vector space of reduced dimensions which offers a representation in n dimensions of a set of documents [DUM 88]. When a query is submitted, its numerical representation is compared with those of the documents in the database using the cosine measure, and the algorithm returns the closest documents. This method can be adapted to recommending documents according to the needs of users.
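
    A minimal sketch of this idea follows, assuming a truncated singular value decomposition as the dimension-reduction step and the usual folding-in of the query. The vocabulary, the term-document matrix and the function names are hypothetical.

```python
import numpy as np

# Hypothetical term-document matrix (rows: terms, columns: documents).
terms = ["film", "actor", "novel", "author"]
A = np.array([
    [2, 1, 0, 0],  # film
    [1, 2, 0, 0],  # actor
    [0, 0, 3, 1],  # novel
    [0, 0, 1, 3],  # author
], dtype=float)

def lsi(A, k):
    """Truncated SVD: keep the k strongest latent dimensions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T  # each row of the last matrix is a document

def fold_in(query, U_k, s_k):
    """Project a query (term-count vector) into the reduced space."""
    return (query @ U_k) / s_k

def cosine(a, b):
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm else 0.0

def rank(query, A, k=2):
    """Document indices sorted from most to least similar to the query."""
    U_k, s_k, docs = lsi(A, k)
    q = fold_in(query, U_k, s_k)
    scores = [cosine(q, d) for d in docs]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

if __name__ == "__main__":
    query = np.array([1, 0, 0, 0], dtype=float)  # the single term "film"
    print(rank(query, A))
```

    In this toy corpus, a query containing only "film" ranks the two cinema documents first, since "film" and "actor" end up on the same latent dimension.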

    1.3.2.1. Advantages and drawbacks of content filtering

    The advantages of content filtering are similar to those observed in collaborative filtering [BUR 00]. Thus, knowledge of the area is not required of the user, since recommendations are based on corpus data. The accuracy of the system's recommendations will also improve with the size of the corpus. However, a system based solely on corpus data will not be able to offer serendipity, in the absence of user correlations. Furthermore, as pointed out by Poirier, each user is completely independent of the others. Thus, a user who has properly filled in their profile with their interests will receive recommendations even if they are the only registered user [POI 10].

    The first drawback of a content-based recommender system, as with collaborative systems, is the case of new users, who have no established profile and therefore no observed reference data. Moreover, it is very difficult to index non-text-based data. Finally, users are typecast into a particular search context, the one already set as their area of interest. This problem is referred to as overspecialization; it eliminates any possibility of serendipity through the proposal of related subjects.

    1.3.3. Hybrid methods

    Trivially, the hybridization of recommender systems is the result of the combination of collaborative filtering and content-based methods. This vision for hybridization was refined by Burke and then by Adomavicius and Tuzhilin [BUR 02, ADO 05].

    Burke made a list of the following seven hybridization techniques [BUR 02]:

    – weighted: the recommendation score of an item is computed by combining the scores of all available methods. For example, P-Tango [CLA 99] initially gives equal weight to collaborative filtering and content-based filtering. These weights are then adjusted as users confirm the predictions;

    – switching: the system chooses to apply either a content-based method or collaborative filtering depending on the search context of the user;

    – mixed: recommendations from the different classic methods are presented together, with the aim of limiting the drawbacks of each method;

    – feature combination: this method offers the possibility of enriching data which has been integrated a priori into the system with the ratings of users, which enrich the database a posteriori. The computation of the recommendation is carried out over all of the data;

    – cascade: this process consists of a double analysis of user profiles. The first pass is used to highlight potential candidates, the second to refine the selection of users;

    – feature augmentation: this technique is similar to the previous one for the first pass-through. If the number of candidates is too high after the first pass-through, a second pass carries out a further discrimination by integrating the data of recommended items;

    – meta level: as with the two previous methods, it involves filtering users twice in order to determine similarities. The difference is that the first pass-through generates a model, or profile type, of the user.
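
    The first two techniques of this list can be sketched as follows. The scores are assumed to come from two existing recommenders, one collaborative and one content-based, both on the same 0-10 scale. The function names, the default weight and the switching criterion (number of ratings already given by the user) are illustrative assumptions, not the rules used by P-Tango or described by Burke.

```python
def weighted(cf_score, cb_score, w_cf=0.5):
    """Weighted hybrid: blend the two scores; w_cf is the collaborative weight."""
    return w_cf * cf_score + (1 - w_cf) * cb_score

def switching(cf_score, cb_score, n_user_ratings, min_ratings=5):
    """Switching hybrid: trust collaborative filtering only once the user
    has given enough ratings (assumed threshold), else use content."""
    return cf_score if n_user_ratings >= min_ratings else cb_score

if __name__ == "__main__":
    # Hypothetical scores for one item: 8 (collaborative), 4 (content-based).
    print(weighted(8, 4))                      # equal weights
    print(weighted(8, 4, w_cf=0.75))           # collaborative filtering favored
    print(switching(8, 4, n_user_ratings=2))   # new user: content-based score
    print(switching(8, 4, n_user_ratings=20))  # established user: collaborative
```

    With a switching rule of this kind, a new user still receives content-based suggestions while their rating history is too short for collaborative filtering.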

    Adomavicius and Tuzhilin have proposed a classification of hybrid recommendation methods based on four points of focus [ADO 05]:

    – combining separate recommenders: the collaborative method and the content-based method are applied separately, then their predictions are combined;

    – adding content-based characteristics to collaborative models: this system uses the classic collaborative People-to-People Correlation approach, to which it adds recommendations based on the classification of the content and the interests indicated by users;

    – adding collaborative characteristics to content-based models: the principle of this model is not to reverse the previous one, but to incorporate characteristics of the model-based group profile collaborative method into the content-based approach;

    – single unifying recommendation model: construction of a general model which incorporates the characteristics of both models within a single algorithm.

    1.3.4. Conclusion on historical recommendation models

    The timelines of the first two types of recommendation model overlapped in the 1990s.

    Collaborative filtering recommender systems are based on the statistical processing of opinions expressed by users. Content-based methods, for their part, rely on natural language processing techniques, namely automatic indexing and the weighting of representative terms. In order to mitigate the weaknesses inherent in these initial models, hybrid methods have emerged since the end of the 1990s. We will now examine the ways in which these different algorithms have been implemented in online applications.

    1.4. Content offers and recommender systems

    1.4.1. Culture and recommender systems

    1.4.1.1. Recommendation and cinema

    Historically, researchers (GroupLens) have mostly been interested in the application of recommender systems to the cultural domain with cinema and film ratings [ALS 97]. Film database interfaces are available to users in return for a rating. This method, used in MovieLens, is exactly that presented in section 1.3.1 [SCH 07]. Based on the ratings of each user, it is possible to provide recommendations.

    The French cinema listings website Allociné offers, alongside each film it presents, a contextual selection of films with similar ratings. This recommender system has been improved by letting Internet users award stars, which represent an evaluation, and by adding the popular Facebook Like mechanism as well as a Would you like to watch this film yes/no prompt (see Figure 1.1, top left). The website also offers the possibility of rating films in batches, on a scale of 1 to 10 if one has seen the film, or of indicating whether the user is interested or not (see Figure 1.1, bottom right). The principle is to have the user assess a large number of cinematographic works in succession, enabling the system to build the most accurate profile possible of their preferences in this area. Subsequent suggestions become more accurate as the number of rated films increases.

    Figure 1.1. Allociné’s rating context

    1.4.1.2. Recommendation and literature

    For the recommendation of literary works, we mention the social network for readers Goodreads and the French network Babelio⁶. Goodreads initially modeled its recommendation system on metadata sourced from Amazon. Filtering was therefore based on this data.
