Example: Merging Recommenders
Using matrix-factorization-based techniques for recommendation has some interesting side benefits. It means that models from different domains can be merged, if they share one type of entity, to create a new and potentially interesting model. That is, given a recommender based on associations between As and Bs, and one based on associations between Bs and Cs, it's possible to make a recommender that connects As and Cs.
This brief example again uses the GroupLens movie rating data set. It contains associations from users to movies (ratings), and also from movies to genres, because each movie is labeled with relevant genres. Each can be used independently to create a user-movie or a movie-genre recommender. But they can also be put together to create a user-genre recommender.
This is a toy example. It may not be of great interest to "recommend" genres, or label users with their most favored genres. But there are much more interesting examples of this form. For example, Ted Dunning writes about using user-query data, and user-video-play data on a video site to connect query terms and videos -- creating an amazingly effective search engine.
Experimental
Note that this is an experimental feature of Myrrix, and not generally available yet. From version 0.6, interested developers can try this by manually merging models with net.myrrix.online.generation.MergeModels in online-local. In version 0.6 of the Computation Layer, it will be possible to merge the model of a different instance by passing -Dmodel.merge.instanceID=[other instance ID] on the command line. This causes mergedY/ to be created in the output directory, containing the contents of Y/ post-multiplied by the other model.
Preparation
From the GroupLens 10M data set, the ratings.dat can be converted into a simple CSV format of "userID,movieID,rating". Call the file user-movie.csv. The movies.dat, with more parsing, can be turned into a CSV file of "movieID,genreName", called movie-genre.csv.
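The conversion described above can be sketched in a few lines of Python. This is only an illustration, not part of Myrrix: the function names are hypothetical, and it assumes the 10M file layout of "UserID::MovieID::Rating::Timestamp" for ratings.dat and "MovieID::Title::Genres" (genres separated by "|") for movies.dat. It also assumes that the "(no genres listed)" placeholder, if present, should be skipped.

```python
import csv
import io

def ratings_to_csv(lines, out):
    """Write userID,movieID,rating rows, dropping the timestamp field."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, rating, _timestamp = line.split("::")
        writer.writerow([user_id, movie_id, rating])

def movies_to_csv(lines, out):
    """Write one movieID,genreName row per genre attached to each movie."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Take the first and last fields only, so the title is never parsed.
        movie_id = line.split("::", 1)[0]
        genres = line.rsplit("::", 1)[1]
        for genre in genres.split("|"):
            if genre and genre != "(no genres listed)":  # assumed placeholder
                writer.writerow([movie_id, genre])

if __name__ == "__main__":
    # Illustrative sample lines; real use would pass open file handles, e.g.
    # ratings_to_csv(open("ratings.dat"), open("user-movie.csv", "w"))
    buffer = io.StringIO()
    ratings_to_csv(["1::122::5::838985046"], buffer)
    movies_to_csv(["1::Toy Story (1995)::Adventure|Animation"], buffer)
    print(buffer.getvalue(), end="")
```

Splitting movies.dat from both ends avoids depending on what characters appear in a movie title.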
To create a movie-genre recommender, send movie-genre.csv to the ingest method of a Myrrix Serving Layer instance. Note that genre names are not numeric. They can be used as-is if client-side translation is used, and this in turn requires creating a file containing all 19 genre names. Call this genres.txt. This can be accomplished using the Java client command line:
java -jar myrrix-client-x.y.jar [options] --translate genres.txt ingest movie-genre.csv
java -jar myrrix-client-x.y.jar [options] refresh
Both numeric movie IDs and non-numeric genre names will be translated (hashed). As a result, when creating the user-movie recommender, translation will also be necessary. And this in turn requires creating a file of all known movie IDs. Call it movieIDs.txt. This recommender can be created with:
java -jar myrrix-client-x.y.jar [options] --translate movieIDs.txt ingest user-movie.csv
java -jar myrrix-client-x.y.jar [options] refresh
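The two translation files, genres.txt and movieIDs.txt, can be derived from the CSV files themselves. The following is a minimal sketch with hypothetical helper names. It assumes that the translation file simply lists one value per line, which should be checked against the client documentation.

```python
import csv
import io

def distinct_values(csv_lines, column):
    """Return the sorted distinct values found in one column of CSV input."""
    seen = set()
    for row in csv.reader(csv_lines):
        if row:
            seen.add(row[column])
    return sorted(seen)

def write_id_file(values, out):
    """Write one value per line (assumed translation file format)."""
    for value in values:
        out.write(value + "\n")

if __name__ == "__main__":
    # Illustrative rows in the movieID,genreName layout of movie-genre.csv.
    movie_genre = ["1,Adventure", "1,Comedy", "2,Adventure"]
    genres = io.StringIO()
    write_id_file(distinct_values(movie_genre, 1), genres)    # genres.txt
    movie_ids = io.StringIO()
    write_id_file(distinct_values(movie_genre, 0), movie_ids)  # movieIDs.txt
    print(genres.getvalue(), end="")
    print(movie_ids.getvalue(), end="")
```

With real files, the same calls would take open("movie-genre.csv") as input and open("genres.txt", "w") as output.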
When done, each recommender has produced a model.bin file. To merge the two models, their matrices are simply multiplied together to make two new factored matrices.
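One way to picture the multiplication is sketched here. It is an illustration under stated assumptions, not necessarily what MergeModels does internally. Suppose the user-movie model factors as X1 times Y1 transposed, and the movie-genre model as X2 times Y2 transposed, with the movie rows of Y1 and X2 in the same order. Then the product of the two models can be written as X1 times the transpose of a merged Y, where merged Y = Y2 (X2^T Y1). All names below are hypothetical.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols]
            for row in a]

def merge_models(x1, y1, x2, y2):
    """Return (X, Y) such that X * Y^T == (X1 * Y1^T) * (X2 * Y2^T).

    x1: users x k, y1: movies x k, x2: movies x k2, y2: genres x k2.
    Assumes row i of y1 and row i of x2 describe the same movie.
    """
    # X2^T * Y1 is only k2 x k, so it is computed first to keep things small.
    merged_y = matmul(y2, matmul(transpose(x2), y1))
    return x1, merged_y

if __name__ == "__main__":
    # Tiny made-up factors: 2 users, 3 movies, 2 genres, 2 features.
    x1 = [[1, 0], [0, 1]]
    y1 = [[1, 0], [0, 1], [1, 1]]
    x2 = [[1, 0], [0, 1], [1, 0]]
    y2 = [[1, 0], [0, 1]]
    x, y = merge_models(x1, y1, x2, y2)
    # User-genre scores: one row per user, one column per genre.
    print(matmul(x, transpose(y)))
```

The merged pair has the same shape as an ordinary user-item model, with genres in the item role, so scoring a genre for a user is still a dot product of two feature vectors.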

Sample Results
When the new model.bin is used in a new Serving Layer, it can be used to recommend genres to users. The results can be evaluated anecdotally. For example, user 10010 in the data set rated the following movies highly:
- 400 Blows
- Breathless
- Oldboy
- Ratatouille
- WALL-E
He or she disliked movies like:
- Bio-Dome
- Black Sheep
- Romy and Michele's High School Reunion
- Sister Act
The user mostly prefers foreign dramas, along with two children's films, and has rated some mainstream comedies poorly. The "recommendations" (here, really an assessment of favored genres) are:
- Comedy
- Drama
- Crime
There is no "Foreign" genre in the data set, so this feature is not identified. The model does reach the reasonable conclusion that the user is mostly interested in Comedy and Drama.
Or consider user 10016, who liked:
- 12 Monkeys
- Se7en
- The Usual Suspects
- The Professional
These are dark thrillers and action movies, and indeed, the top genres come out as:
- Drama
- Crime
- Thriller
While quite anecdotal, these results suggest that this simple process does work, and that existing associations can be mined to infer associations for which no direct examples have been observed. (Here, nobody has tagged users with genre preferences, but we have been able to infer them anyway.)
