There have been lots of discussions lately about the need and opportunity for structuring data on Twitter to make it more useful. Presently, the main tool we have is hashtags. I decided to do a little research on exactly how users are using hashtags and whether there are any patterns to be seen.

First, the fact that I, personally, did this research is a testament to how easy Twitter’s API is to use when combined with Excel and some Visual Basic code found on the web. I’m sure the analysis isn’t perfect; it’s just designed to start new conversations. This is definitely the geekiest (and most difficult to read) post I’ve ever written.

I started by downloading 3,000 public tweets on Friday October 23. I’m sure there is a bit of bias in this small sample; I didn’t attempt to control for it. The first things I looked at were the number of tweets containing hashtags or mentions: of the 3,000 tweets, 13% contained at least one hashtag and 47% contained at least one mention.

Among the tweets containing hashtags, I found 258 unique hashtags. Each tag was used in between 1 and 1509 different status updates (in total, outside my original 3,000). In actuality, the ones reported as being used 1509 times were used at least 1509 times. The Twitter API shows only the first 1500 search results.

The first thing I wanted to look at was the frequency with which hashtags are used. I wanted to see whether a hashtag was likely to be used in 1 post or in 1500 different posts. The distribution was quite heavy in the head and in the tail and is shown below. The frequency distribution below is especially hard to read because I couldn’t convince Excel to give me the labels I wanted on the X and Y axes. The X axis shows the number of posts and the Y axis shows the number of tags. For example, there were 49 different hashtags used in 10 or fewer posts. There were 68 hashtags used in about 1500 posts. An example of the former is #hackedagain, used in two posts by two different authors.
An example of the latter is #Windows7, used in over 1500 posts by over 1225 different authors. Click here to view the distribution of hashtags by usage. My blog won't allow pics inline. :(

If we consider those near the right of the graph as having critical mass in the community, then 26% of the hashtags gained critical mass. The interesting area in the frequency distribution, to me, is between 500 and 1500. These tags make up the majority of tags in use. They have a low posts-per-author ratio and seem to be useful. For instance, #CHOCOLATE was used in 465 posts by 309 different authors. The hashtag has been mildly diffused. A more in-depth network science study of these tags would be interesting.

Next, I wanted to look at how widely distributed in authorship the tags that gained critical mass were. I calculated posts-per-author as a way of detecting spam. A posts-per-author ratio near one implies diffusion, whereas a ratio nearing 1500 implies spam. Of the 68 hashtags reaching widespread use, 50 had a posts-per-author ratio of 5 or less.

In summary:
1) Only 13% of the tweets used a hashtag.
2) Of the hashtags used, about 26% gained widespread use (the hashtag was used in at least 1500 posts). This suggests there is less than a 4% probability that any given tweet can be structured with critical, diffused means (0.13 * 0.26 is about 0.034).
3) About 75% of the hashtags used are for one’s personal reasons or in very small circles.
4) The ones that gained widespread adoption were mostly not spam.

Opportunities:
1) There is a bit of hashtag redundancy amongst the critical mass hashtags. For example, #FF, #FOLLOW, and #FollowFriday are each widely used.
2) The critical mass tags may not be very valuable for structured analysis as it stands now. They are very broad, by definition, and include things like #tech, #video, and #fail.
3) Those in the middle of the distribution, ~500 uses, may be of the most use for brands. Included are things like #Colts, #Edtech, and #Wvu.
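For anyone who would rather script this than use a spreadsheet, the posts-per-author ratio and the summary arithmetic above can be sketched in a few lines of Python. This is only a rough sketch, not the Excel and Visual Basic workflow behind the numbers in this post. The function names, the (hashtag, author) input shape, the case-insensitive matching of tags, and the use of 5 as a cutoff are all assumptions made for illustration.

```python
from collections import defaultdict


def posts_per_author(posts):
    """Summarise hashtag usage.

    `posts` is an iterable of (hashtag, author) pairs, one pair per post
    (a hypothetical input shape, not the Twitter API's format).
    Returns {hashtag: (post_count, unique_author_count, posts_per_author)}.
    """
    counts = defaultdict(int)
    authors = defaultdict(set)
    for tag, author in posts:
        key = tag.lower()  # assumption: treat #Spam and #spam as the same tag
        counts[key] += 1
        authors[key].add(author)
    return {
        tag: (counts[tag], len(authors[tag]), counts[tag] / len(authors[tag]))
        for tag in counts
    }


def looks_diffused(ratio, threshold=5.0):
    """True when the ratio is at or below the cutoff (5 is an assumed cutoff)."""
    return ratio <= threshold


# Figures quoted in the post.
chocolate_ratio = 465 / 309        # posts per author for #CHOCOLATE
share_with_hashtag = 0.13          # tweets containing at least one hashtag
share_critical_mass = 68 / 258     # hashtags used in about 1500 posts
p_structured = share_with_hashtag * share_critical_mass

print(f"#CHOCOLATE posts per author: {chocolate_ratio:.2f}")
print(f"Share of hashtags with critical mass: {share_critical_mass:.0%}")
print(f"Chance a tweet carries a critical-mass hashtag: {p_structured:.1%}")
```

A ratio near 1 means nearly every post came from a different author, while a ratio close to the total post count means a single account produced almost all of them.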
If anyone would like a copy of the Excel file I used, please email or @hellodelight me.

It's interesting to look at the banks' compensation schemes and the resultant fallout from a behavioral theory point of view. If ever there was an industry that believes in pay-for-performance, it's banking. This is true in the US and the UK; both have been blighted similarly.

Lots of psychologists believe that extrinsic motivators (e.g., rewards) don't create long-term change. At best, they create short-term compliance. Worse, they create perverse incentives. Blatant bribes, such as attaching a large part of one's compensation to performance, destroy relationships and any sense of community within a firm, cover up underlying reasons and logic, build on false assumptions, create an unnatural environment of risk-averse or risk-seeking behavior, and undermine longevity within a firm.

Most psychologists believe that extrinsic motivators are trumped by internal ones. Give someone a sense of meaning, autonomy, an understanding of the bigger picture, and the like, and you'll get much better employee performance. It seems so reasonable, yet we tend to ignore it.

Banking is based largely on reward-based compensation tied to extraordinarily difficult problem sets. There is no evidence that extrinsic motivation helps solve complex problems. In fact, the opposite is true. Rewards stifle creativity, as bodies of research show.

Alas, equity as we've practiced it is also not the answer. Options create many of their own perversions - there is little doubt about that. The answer isn't quite clear. Certainly it's a mix of an appropriate rewards baseline and intrinsic motivation factors. But in what combination do we sprinkle those? And is it appropriate to have multiple compensation schemes within a single firm? Lots of questions...not so many answers.
In the age of rampant media and mass publishing that we live in today, it's critical for a technology firm to time its innovation strategies well. Too many firms proceed to market with an assumption that their technology is too difficult, expensive, or time-consuming to copy.

Timing innovation is a function of (at least) two things: the probability of spillover of proprietary information into the market and the amount of headstart time you have. The headstart is essential if the firm is trying to establish a new standard, build a network of complementary partners, and create a brand. The spillover effect is crucial if the firm is trying to maintain its first mover advantage by holding onto proprietary information.

The question I've been pondering is how this works in the age of the blogosphere. Clearly, the spillover effect is difficult to control. With so many opportunities to share, it's quite likely that information will be shared. You can pursue an intellectual property defense in some cases, but that's often costly and time-consuming. The headstart is equally difficult to manage, as new technologies are often built on existing platforms. Much innovation is incremental rather than disruptive; a healthy headstart is nearly impossible to achieve. There are corner cases where a firm has acquired a majority of share due to its headstart, certainly, but these are the things that are most difficult to achieve.

Assuming these are the states of nature, how should a technology firm time its innovation? An upstart must weigh the risks of moving first and moving big against the potential to grab lots of share. It must truly evaluate its technology along these dimensions. If it doesn't believe it can grab appropriate share by managing spillover and maintaining a headstart, it must consider waiting until uncertainty (market, technical, etc.) dies down a bit.

Preference functions are a fantastic way to model things.
In lots of situations, it's useful to understand what factors constitute value to a party and how those things are weighted. If you know someone's preference function, you can predict their actions well.

For employees, preference functions seem to be changing. There appears to be a generational shift between Generations X and Y and the Baby Boomers that is as large as any shift we've seen before. The weight of income would seem to be slipping while factors such as impact and autonomy are increasing. The movement in this preference function seems to be taking hold rather quickly, compounded by the economic climate.

Yet on the corporate side, preference functions seem to be fairly static. Most of them are still most heavily weighted toward short-run profit. That's fine, but corporations' long-run prospects dwindle as their workforces and other factors of production refocus. Most of what we perceive to be preference function changes on the corporate side really are not. They're non-strategic and therefore non-sustainable.

Corporations have yet to reestablish value chains in the ways that will be necessary to be sustainable. This is beyond environmental impact, although that's part of it. It's definitely beyond cause marketing. This is about full value chain rejiggering to create strategic advantage by aligning yourself with your value network. It's the new school of capital. It's not optional.

I hope, and believe, that the forces underlying business are changing. I hope to be part of nudging businesses along as well.

Wharton has recently announced its Future of Advertising project. The project "seeks not only to collect case studies and data, and ask experts to debate the best new practices in the fast-changing world of marketing, but also to make its own use of New Media to build a broader audience for its work."
From the email blast today in Knowledge at Wharton: "Yoram (Jerry) Wind, director of the SEI Center, says the Future of Advertising Project is working toward the ambitious goal of creating what he calls "a portfolio model" of advertising that would allow companies to craft a unique strategy using a traditional mix of traditional and new media to make sure it's reaching the right customers. "When families create an investment portfolio, the idea is to figure out what is the objective of the investment and then make your decisions based on that," Wind says. "We want to bring that same approach to advertising" -- in other words, weighting the use of various techniques like TV spots compared to Internet advertising or communicating directly with consumers through social networking sites like Twitter or Facebook."

Not that this is anything new, of course. Maybe the answer is that Harry Markowitz didn't solve advertising, too. Addressing consumers as if they were securities leaves an advertiser with very blunt tools to use. You get things like frequency and amplitude. You don't solve for impact in a closed-loop equation. Assuming Mr. Wind perfectly constructs his portfolio, you end up with an ideal media mix. That doesn't solve too much, save spending less on the same old tired advertising plans.

The problem with portfolio theory (at least one of the problems, and the one I know that Warren Buffett subscribes to) is that it implies diversifying risk vis a vis a lack of impact by any single thing. It suggests you need to know very little about the underlying securities because they all function roughly the same. In advertising, that's not the case. An ad can matter and an ad can make a difference. Ads aren't simply variables in an equation unless we believe all ads are commoditized. And I don't believe that...yet.