One of the main challenges for those who are working with User Generated Content on the social web is its incredible malleability, variety and idiosyncratic nature. In other words, UGC is a muddled up mess that rivals the Tower of Babel. You all know what I am talking about. Just look at a random MySpace profile and try to figure out what exactly is being expressed there.
If it is so difficult for a culturally aware human to make sense of what is said on social media, imagine how difficult it is for machines to understand and properly process this kind of data. Rule-based engines quickly run into difficulties when dealing with the shifting morass of language on the social web. The grammatical structure is harder to discern than in regular prose writing (e.g. Where does a sentence begin or end? What are the main phrases that constitute this text? What is the subject and what is the object?), but the semantic features, in other words the actual meaning of the text, are even more difficult to figure out. This is largely because of two features of language that come up far more often in social networks than in more traditional forms of writing: polysemy and synonymy. Polysemy is the state of one word having many different meanings, and synonymy is when many words mean the same thing.
For today, let's look at polysemy, and then in the next posting we’ll examine synonymy. Words often have many meanings. A great example is the word "mean", which can mean "not nice", "occupying a middle point" or "to denote". On social media, though, there tends to be a much wider range of meanings attributable to most words because of the number of proper names that are used to indicate bands, movies, brands, books and things of that nature.
For example, many names of bands are in fact common English words, so how do you know whether the band is being referred to rather than the regular word? This can get very problematic: think of the (somewhat) popular band "Yes" (one of my favorites, by the way; you should definitely check out their classic album Fragile from 1971 if you don't know it already) or "Tool" or "Live." These are bound to create a lot of problems when it comes to determining whether a word that appears in a user's profile refers to the band or carries its standard denotation. This is especially problematic because social networks tend not to contain long, well-formed sentences that could provide some context and tell the semantic processor whether the subject is music or something else (unless of course the word has been entered into a text box that has been assigned a category by the site owner, such as "movies," "music," "hobbies," etc.).
One of the advantages of the approach being taken by Peerset is that, by analyzing the relationships amongst these terms, we can disambiguate based upon our social psychographic profiling of the users. If somebody likes "pink" this could mean many things, foremost amongst them the colour pink or the musical artist Pink. By seeing what else this person likes, Peerset can figure out, based upon the principles of psychographic segmentation, whether in this instance the colour or the artist is intended. If other interests are "glitter, ballerinas, Victoria's Secret, lipgloss" then the word is likely to refer to the colour pink, but if the other interests are such things as Christina Aguilera, Nelly Furtado and Avril Lavigne then the word probably refers to the artist Pink. This process is very powerful for targeting advertisements based upon a proper understanding and disambiguation of people’s interests.
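To make the idea concrete, here is a minimal sketch in Python of disambiguating a term by looking at a user's other interests. This is not Peerset's actual method, which is not described in detail here. It is a simple overlap count, and the `SENSES` table, the sense labels and the `disambiguate` function are all hypothetical names invented for illustration.

```python
# Hypothetical sense inventory (an assumption for illustration only):
# each sense of an ambiguous term is described by a set of interests
# that might plausibly co-occur with it on a profile.
SENSES = {
    "pink": {
        "colour": {"glitter", "ballerinas", "lipgloss", "victoria's secret"},
        "artist": {"christina aguilera", "nelly furtado", "avril lavigne"},
    }
}


def disambiguate(term, other_interests, senses=SENSES):
    """Return the sense label whose related interests overlap most with
    the user's other interests, or None if the term is unknown or
    nothing overlaps."""
    candidates = senses.get(term.lower(), {})
    # Normalise the profile so "Glitter " and "glitter" compare equal.
    profile = {interest.strip().lower() for interest in other_interests}
    # Score each sense by how many of its related interests the user lists.
    scores = {label: len(related & profile) for label, related in candidates.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("pink", ["glitter", "ballerinas", "lipgloss"]))
    print(disambiguate("pink", ["Christina Aguilera", "Avril Lavigne"]))
```

A real system would presumably learn these associations from many profiles rather than from a hand-written table, and would need a rule for ties and for terms with more than two senses (a brand name, for instance).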
Comments
Pink
This sounds like some great stuff!! One comment, though: You write: "If other interests are "glitter, ballerinas, Victoria's Secret, lipgloss" then the word is likely to refer to the colour pink..." In this case, actually, I would imagine that it refers to Victoria's Secret's distinct brand of loungewear, Pink (vspink.com). I guess this simply underlines the need for the type of disambiguation you describe. I look forward to seeing how this works.
VS Pink?
Thank you for that insightful comment. You are certainly right that the Pink line of loungewear might very well be the answer here. In fact, our research has shown that specific brands and product names are discussed surprisingly often on Social Media sites. I will get to the bottom of this by doing some thorough research on Victoria's Secret products.