Twitter retweet analysis

Together with Professor Lars Kai Hansen, I am presently looking into retweeting on Twitter. A 2010 scientific article, “Want to be retweeted? Large scale Analytics on factors impacting retweet in Twitter network” by Suh, Hong, Pirolli and Ed H. Chi, examined what effect the variables hashtag, “@”, number of followers, number of followees, age of account, number of tweets and number of favorited tweets have on whether a tweet is retweeted.

The article also points to Dan Zarrella’s previous writings. He has a blog as well as the slides “The Science of ReTweets”. Zarrella reports (on page 11 of the slides) statistics on the fraction of retweets with URLs, and it is well over 50%; Suh & Co. cite it as 56.69%, to be exact.

This fraction does not fit with what Suh & Co. find. They say only 28.4% of retweets have URLs.

To investigate this discrepancy I looked into the tweets I had downloaded. They were downloaded with the streaming API provided by Twitter, which I heard of through Bjarne Ørum Wahlgreen. For storage I am currently using the MongoDB NoSQL database (I used SQLite before). This means that the downloading and storing can be written as a single Unix command line (with a tip from Eliot):


curl http://stream.twitter.com/1/statuses/sample.json -u USER:PASSWORD | mongoimport -d twitter -c tweets

I have only slightly more than 330,000 tweets in my database at the moment, but my results align better with Suh & Co. than with Zarrella. The result depends on how a retweet is matched; with my broadest matching I find that 25.2% of retweets have URLs.
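The retweet figure depends on how a retweet is recognized. Below is a minimal sketch of a narrow and a broad matching rule, assuming each tweet is a Python dict parsed from one line of the streaming JSON, with a `text` field and, for native retweets, a `retweeted_status` field. The regular expressions and the names `is_retweet` and `has_url` are hypothetical illustrations, not necessarily the rules behind the numbers reported here.

```python
import re

# Hypothetical matching rules; the rules behind the reported counts may differ.
NARROW_RT = re.compile(r"^RT @\w+")                          # "RT @user" at the start
BROAD_RT = re.compile(r"\b(RT|via)\s*@\w+", re.IGNORECASE)   # "RT @user" or "via @user" anywhere
URL = re.compile(r"https?://\S+")


def is_retweet(tweet, broad=False):
    """Return True if the tweet dict looks like a retweet."""
    # Native retweets embed the original tweet under 'retweeted_status'.
    if "retweeted_status" in tweet:
        return True
    pattern = BROAD_RT if broad else NARROW_RT
    return bool(pattern.search(tweet.get("text", "")))


def has_url(tweet):
    """Return True if the tweet text contains an http(s) link."""
    return bool(URL.search(tweet.get("text", "")))


# Hypothetical example tweets.
tweets = [
    {"text": "RT @alice: great read http://example.com/a"},
    {"text": "Interesting post (via @bob)"},
    {"text": "Just had lunch"},
    {"text": "Check this http://example.com/b"},
]

for broad in (False, True):
    rts = [t for t in tweets if is_retweet(t, broad)]
    with_urls = sum(has_url(t) for t in rts)
    print("broad" if broad else "narrow", len(rts), "retweets,", with_urls, "with URLs")
```

A broader rule catches more retweets (here the "via @bob" tweet), which changes both the retweet fraction and the share of retweets carrying URLs.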

Furthermore, I find that the fraction of tweets with URLs is 19.1%, which agrees with both Zarrella and Suh & Co., who each report around 20%. I find the fraction of retweets among all tweets to be in the range 9-16%.

The detailed results are here:

Category                       Tweets    Share
Total                         330,000   100.0%
With URLs                      62,901    19.1% of total
Retweets                       52,633    15.9% of total
Retweets with URLs             13,239     4.0% of total
                                         25.2% of retweets
                                         21.0% of tweets with URLs
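The percentages in the table follow directly from the four counts. A small sketch that recomputes them, with the counts copied from the table and `pct` a hypothetical helper that rounds to one decimal as the table does:

```python
# Counts copied from the results table.
total = 330000
with_urls = 62901
retweets = 52633
retweets_with_urls = 13239


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


print(pct(with_urls, total))               # share of all tweets with URLs
print(pct(retweets, total))                # share of all tweets that are retweets
print(pct(retweets_with_urls, total))      # retweets with URLs, of all tweets
print(pct(retweets_with_urls, retweets))   # share of retweets that have URLs
print(pct(retweets_with_urls, with_urls))  # share of URL tweets that are retweets
```

Comparing the last lines with the first two shows that, in this sample, retweets contain URLs more often (25.2%) than tweets in general (19.1%).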

Suh & Co. found that hashtags were associated with increased retweeting. On a blog, one of the authors writes: “Want to be Retweeted? Add Hashtags to Your Tweets!” I doubt that the causal relationship is that simple. I think it is more likely that a common cause (e.g., that the tweet is informative and well-written) makes the tweet both get hashtag(s) and be retweeted.

Did bloggers bring down the German president?

The role of the mainstream media in Germany as the fourth estate was recently aided by a fifth(?), namely the bloggers, when the German president, Horst Köhler, resigned after some questionable statements in an interview.

In the interview, aired on May 22, Koehler appeared to imply that military operations like in Afghanistan were, in part, commercially motivated and necessary to protect Germany’s economic interests.

“[A] country of our size with its … export dependency should also know that, if in doubt, in an emergency, a military engagement is also necessary to defend our interests, for example free trade routes,” he told Deutschlandfunk.

(Source: Yahoo News)

The story was largely ignored by the mainstream media, but a number of bloggers, including Jonas Schaible of unpolitik.de, started debating it. It only hit the mainstream media after Schaible wrote a piece for Spiegel Online, over a week after the interview first aired.

“World Without Torture” campaign started on Facebook and Twitter

The “World Without Torture” campaign of the International Rehabilitation Council for Torture Victims (IRCT) seems to have started in May 2010. It is running on Facebook at http://www.facebook.com/WorldWithoutTorture as well as on Twitter at http://twitter.com/withouttorture

At the moment there are only 73 followers on Twitter but 1,832 members of the Facebook group, and the Facebook wall is much more active.

Previously it has been quite difficult to get data from Facebook, but it has now become easier with the search in the Graph API part of Facebook, e.g., http://graph.facebook.com/search?q=torture
The results are delivered in a straightforward JSON format. We have not yet begun to download data from Facebook systematically.

Robocare hoax?

Danish newspapers carry large ads from Robocare, seemingly marketing new high-tech robots that can replace human carers in homes for the elderly.

Speculation abounds that the campaign is a hoax; see, for example, the Amino blog. The contact address is fake, and the company website is owned by the media company Wasabi.

Today I checked the Robocare press release PDF file. The document owner is Mette Stuhr from Wasabi, so there may be something to this hoax theory… Wasabi is used by the Danish union FOA, which organizes personnel in the public care sector… and maybe by robot hijackers?

June 6, 2010: The social workers’ union FOA has launched a second wave of ads confirming its ownership of Robocare.

BP’s Global PR spill

According to a recent post on The Brandbuilder Blog, BP has been unable to control the global PR leak on Twitter and seems to have accepted the impersonator. The BP brand, including the logo, has been hijacked and turned into a bitter joke featuring tweets like: “Proud to announce that BP will be sponsoring the New Orleans Blues Festival this summer w/ special tribute to Muddy Waters.”

Sentiment analysis

One part of large-scale reputation management is the ability to assess public opinion automatically. We believe that automated tools are needed to analyse the myriad statements found in social media. Luckily, the natural language processing research community has focused on this problem in recent years.

The area is usually referred to as sentiment analysis or opinion mining, and covers tasks such as detecting sentences that express opinions, assessing whether an opinion is positive or negative, and summarizing opinions. Bo Pang and Lillian Lee have written a nice survey of the field, “Opinion mining and sentiment analysis”, which is freely downloadable.
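As a rough illustration of the polarity task (positive versus negative), here is a minimal lexicon-based scorer. The word lists and the rule that a negator flips only the next word are hypothetical toy assumptions; the methods covered in the survey are far more sophisticated.

```python
# Hypothetical toy lexicons; real systems use curated lexicons or trained classifiers.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "awful", "hate", "terrible", "sad"}
NEGATORS = {"not", "no", "never"}


def polarity(text):
    """Classify text as 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    score = 0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'()")
        if word in NEGATORS:
            negate = True  # flip the polarity of the next word
            continue
        hit = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        score += -hit if negate else hit
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("I love this great phone"))
print(polarity("This is not good"))
```

Counting words ignores context such as irony, which is one reason harder tasks like sarcasm detection have become research topics.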

The field has recently moved on to new and important tasks such as sarcasm detection, which was the subject of a paper presented last week at the International AAAI Conference on Weblogs and Social Media.

WeMedia

WeMedia is a report published in 2003, commissioned by the American Press Institute. It examines the relationship between traditional media and their audience in the Internet era.

Does reputation matter? Value destruction following the Tiger Woods scandal

In recent years there have been many publications claiming to establish a link between reputation and monetary value. A number of publications have suggested that changes in corporate reputation have a noticeable effect on financial value. However, such studies are fraught with difficulties. For example, it is often difficult to establish to what extent price movements on financial markets depend on changes in reputation and to what extent they depend on other factors. The ideal case is an event where something significant happens and where it is easy to establish the ‘before’ and ‘after’. Recently two American economists have conducted such an analysis of the aftermath of the Tiger Woods scandal. Comparing the performance of the shares of the companies that sponsored Tiger Woods with the average returns of the market as a whole and of competitors, they estimate that Tiger Woods’ behaviour created a negative reputational asset that destroyed $5 to 12 billion of shareholder wealth in the period between his car crash and his announcement of an ‘indefinite leave’ from golf. The paper is available here. Enjoy!
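The core idea of comparing a sponsor’s share price with the market can be sketched as a simple market-adjusted event study: the abnormal return on each day is the stock’s return minus the market’s return, and these are summed over the event window. This is a minimal sketch under that assumption; the paper’s actual estimation (which also benchmarks against competitors) may differ, and all returns and the market value below are hypothetical.

```python
def abnormal_returns(stock_returns, market_returns):
    """Market-adjusted abnormal return per day: stock return minus market return."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have equal length")
    return [s - m for s, m in zip(stock_returns, market_returns)]


def cumulative_abnormal_return(stock_returns, market_returns):
    """Sum of the daily abnormal returns over the event window (CAR)."""
    return sum(abnormal_returns(stock_returns, market_returns))


# Hypothetical daily returns for one sponsor and for the market over a 3-day window.
stock = [-0.010, -0.020, 0.005]
market = [0.001, -0.005, 0.002]

car = cumulative_abnormal_return(stock, market)
market_cap_before = 100e9  # hypothetical market value in dollars before the event
value_change = car * market_cap_before
print(f"CAR = {car:.3f}, implied value change = {value_change / 1e9:.1f} billion dollars")
```

Summing such value changes across all sponsors gives an estimate of total shareholder losses attributable to the event, provided nothing else moved those stocks relative to the market during the window.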

Morgan Stanley Internet Trends: The Role of Mobile Apps

Human Randomness

Free will is considered a hallmark of humanity. The ultimate exercise of free will is to create true randomness: you decide when, and nobody can predict you. Can humans be random if they want to?

Mathematically, randomness is defined as a process with some degree of unpredictability. The simplest random processes are so-called white noise processes, in which the present state carries no information about the future: Prob(future | past) = Prob(future). The statement reads: the probability of the future given the past is equal to the probability of the future.
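Writing x_t for the state at time t (notation introduced here for illustration), the statement says that the conditional distribution of future values given the past equals their unconditional distribution. A sketch of this in LaTeX, together with a sufficient condition; note that this is the independent-sequence reading, whereas in signal processing “white noise” is often defined more weakly as a sequence of uncorrelated values with zero mean and constant variance.

```latex
% "Prob(future | past) = Prob(future)"
p(x_{t+1}, x_{t+2}, \dots \mid x_t, x_{t-1}, \dots) = p(x_{t+1}, x_{t+2}, \dots)

% Sufficient condition: the samples are independent
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t)
```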

In physics, true randomness only occurs at the quantum level, where, e.g., the decay of an unstable isotope is modeled as a random process. So-called deterministic chaos ‘looks unpredictable’, unless you happen to know the underlying dynamics and can compute with very high precision. High-dimensional chaotic processes, e.g., systems at a finite temperature, are best approximated as random, although the unpredictability here may really be a result of limited memory or computing power. When we physically flip a coin, the assumption is that the dynamics are so complicated and high-dimensional (predicting them would require massive computation) that we can rely on the outcome being unpredictable.
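The point about deterministic chaos can be illustrated with the logistic map x → 4x(1 − x), a standard textbook example chosen here for illustration. The rule is fully deterministic, so the same start always gives the same trajectory, yet a starting difference of 1e-10 roughly doubles per step on average until the two trajectories are unrelated. The starting value 0.3 is an arbitrary assumption.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return all states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_trajectory(0.3, 60)
b = logistic_trajectory(0.3 + 1e-10, 60)  # almost the same initial state

# The gap between the trajectories grows from 1e-10 to order one.
for t in (0, 10, 20, 30, 40, 50, 60):
    print(t, abs(a[t] - b[t]))
```

Predicting step 60 would require knowing the initial state to far more than ten decimal places, which is the sense in which chaos "looks unpredictable".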

Randomness is often used to simulate complex processes in computers. So-called Monte Carlo methods are increasingly popular for evaluating important high-dimensional integrals in Bayesian statistics. In the computer, pseudo-random numbers are typically created by deterministic, chaos-like algorithms.
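A minimal Monte Carlo sketch: the expectation E[f(x)] for x drawn from a 10-dimensional standard normal distribution (a high-dimensional integral) is estimated by averaging f over pseudo-random samples. The choice f(x) = ‖x‖², whose exact expectation equals the dimension, is an assumption made so the answer can be checked. Python’s `random.Random` is a deterministic generator (Mersenne Twister), so the same seed reproduces exactly the same estimate.

```python
import random


def mc_expectation(f, dim, n, seed):
    """Estimate E[f(x)] for x ~ N(0, I) in `dim` dimensions by averaging n samples."""
    rng = random.Random(seed)  # deterministic pseudo-random generator
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += f(x)
    return total / n


def squared_norm(x):
    """f(x) = sum of squares; its exact expectation under N(0, I) is the dimension."""
    return sum(v * v for v in x)


est = mc_expectation(squared_norm, dim=10, n=20000, seed=1)
print(est)  # should be close to the exact value, 10
```

The error of such an estimate shrinks like one over the square root of the number of samples, largely independently of the dimension, which is why the approach is attractive for high-dimensional integrals.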

Why would a human aim for randomness? Well, unpredictability can be useful in certain types of games, like poker (when to bluff?) or football (where to place the penalty kick?). Generally speaking, whenever we want to keep our intentions secret from opponents, we would like to be random. Although randomness is useful when there is a conflict of interest, there is nothing inherently unethical about it; cf. the most ethical deed of all, the ‘random act of kindness’. But the problem we face when we want to benefit a random good cause is that mostly there is no coin at hand to flip. So the question is: Can humans flip a mental coin and exercise ultimate free will? The short answer is no!

Humans are not really good at randomness. In a 1972 meta-analysis carried out by Wagenaar, 14 out of 15 studies reported significant deviations from randomness in human-generated random sequences. A typical problem facing the free-willing human is the so-called negative recency effect: our random sequences are too regular. Imagine that you are asked to place ten points at random positions in an interval. Your solution would show negative recency if you chose to put them at roughly equal distances. It is as if you want them to be too random. But in a truly random Poisson point process, some points would be quite close and others more distant. Other problems for human randomness are lack of memory and boredom, which both lead to an overabundance of repeated patterns.
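The ten-point example can be made concrete by measuring how much the gaps between neighbouring points vary. In this sketch, evenly spaced points stand in for a “too regular” human answer, while ten independent uniform positions stand in for a Poisson process on [0, 1] conditioned on having ten points. The helper names, the seed and the example coin sequence are hypothetical; the alternation rate is one simple way (among others) to quantify negative recency in a binary sequence.

```python
import random
import statistics


def inner_gaps(points):
    """Distances between neighbouring points after sorting."""
    xs = sorted(points)
    return [b - a for a, b in zip(xs, xs[1:])]


def alternation_rate(seq):
    """Fraction of consecutive pairs that differ; about 0.5 for a fair random coin."""
    pairs = list(zip(seq, seq[1:]))
    return sum(a != b for a, b in pairs) / len(pairs)


n = 10
# A 'too regular' answer: ten points at equal distances in [0, 1].
even = [(i + 0.5) / n for i in range(n)]
# A Poisson process on [0, 1] conditioned on n points gives n independent uniform positions.
rng = random.Random(2010)
uniform = [rng.random() for _ in range(n)]

print("spread of gaps, even  :", statistics.pstdev(inner_gaps(even)))
print("spread of gaps, random:", statistics.pstdev(inner_gaps(uniform)))

# Negative recency in coin flips shows up as an alternation rate above 0.5.
print(alternation_rate("HTHTHHTHTH"))  # hypothetical sequence: 8 of 9 pairs differ
```

The evenly spaced answer has essentially zero spread in its gaps, whereas genuinely random points produce both short and long gaps.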

It seems it takes a superhuman effort to produce true randomness. In Daniel Dennett’s book ‘Freedom Evolves’, free will is based on unconscious randomness and is unequivocally denounced as an illusion. While human freedom is indeed rapidly evolving, Dennett only allows the notion of free will to be considered a useful metaphor or model of human behavior. Our inability to actively produce randomness can be seen as another manifestation of Dennett’s insight.