eprintid: 3555
rev_number: 8
eprint_status: archive
userid: 69
dir: disk0/00/00/35/55
datestamp: 2016-10-04 12:24:38
lastmod: 2016-10-04 12:24:38
status_changed: 2016-10-04 12:24:38
type: article
metadata_visibility: show
creators_name: Zhou, Wei-Xing
creators_name: Ranco, Gabriele
creators_name: Bordino, Ilaria
creators_name: Bormetti, Giacomo
creators_name: Caldarelli, Guido
creators_name: Lillo, Fabrizio
creators_name: Treccani, Michele
creators_id:
creators_id: gabriele.ranco@imtlucca.it
creators_id:
creators_id:
creators_id: guido.caldarelli@imtlucca.it
creators_id:
creators_id:
title: Coupling News Sentiment with Web Browsing Data Improves Prediction of Intra-Day Price Dynamics
ispublished: pub
subjects: QC
divisions: EIC
full_text_status: public
keywords: Finance, Forecasting, Twitter, Financial firms, Financial markets, Internet, Noise reduction
note: WOS ID: WOS:000369527800026
abstract: The new digital revolution of big data is deeply changing our ability to understand society and to forecast the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, severely affecting the predictive power of semantic and statistical methods. Here we show that the aggregated behavior of web users can be harnessed to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, our in-sample analysis shows that combining sentiment analysis of news with the browsing activity of Yahoo! Finance users greatly helps in forecasting intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012–2013. Either sentiment analysis or browsing activity, taken alone, has very little or no predictive power.
Conversely, when we consider a news signal given, for each time interval, by the average sentiment of the clicked news weighted by the number of clicks, we show that for nearly 50% of the companies this signal Granger-causes hourly price returns. Our result indicates a “wisdom-of-the-crowd” effect that makes it possible to exploit users’ activity to identify and properly weigh the relevant and surprising news, considerably enhancing the forecasting power of the news sentiment.
date: 2016
publication: PLOS ONE
volume: 11
number: 1
pagerange: e0146576
id_number: doi:10.1371/journal.pone.0146576
refereed: TRUE
issn: 1932-6203
official_url: http://doi.org/10.1371/journal.pone.0146576
citation: Zhou, Wei-Xing and Ranco, Gabriele and Bordino, Ilaria and Bormetti, Giacomo and Caldarelli, Guido and Lillo, Fabrizio and Treccani, Michele. Coupling News Sentiment with Web Browsing Data Improves Prediction of Intra-Day Price Dynamics. PLOS ONE, 11 (1). e0146576. ISSN 1932-6203 (2016)
document_url: http://eprints.imtlucca.it/3555/1/journal.pone.0146576.PDF
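The abstract's central signal is the click-weighted average sentiment of the news clicked in a time interval, S_t = sum_i(c_i * s_i) / sum_i(c_i). Together with a Granger-causality check of that signal against hourly returns, it can be sketched as below.

This is a minimal illustration under stated assumptions, not the authors' code. The following are all hypothetical:
- the input layout (interval, sentiment, clicks);
- the function names `click_weighted_sentiment`, `to_series` and `granger_f`;
- zero-filling intervals without clicks;
- the use of a plain bivariate OLS F-test.

The record does not state the paper's lag choice or model specification.

```python
# Hypothetical sketch: click-weighted news sentiment plus a bivariate Granger F-test.
# Input layout, names, zero-filling and the OLS test form are assumptions,
# not the method as implemented by the paper's authors.
from collections import defaultdict

import numpy as np


def click_weighted_sentiment(events):
    """events: iterable of (interval, sentiment, clicks) for one company.

    Returns {interval: sum(sentiment * clicks) / sum(clicks)}, skipping
    intervals whose total click count is zero.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for interval, sentiment, clicks in events:
        num[interval] += sentiment * clicks
        den[interval] += clicks
    return {t: num[t] / den[t] for t in num if den[t] > 0}


def to_series(signal, n_intervals, fill=0.0):
    """Dense array over intervals 0..n_intervals-1; missing intervals get `fill` (assumption)."""
    return np.array([signal.get(t, fill) for t in range(n_intervals)], dtype=float)


def _lags(x, p):
    """Column k-1 holds x[t-k] for t = p, ..., len(x)-1."""
    return np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])


def _rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(returns, signal, p=1):
    """F statistic for H0: lags of `signal` do not help predict `returns`.

    Restricted:   r_t ~ const + r_{t-1..t-p}
    Unrestricted: r_t ~ const + r_{t-1..t-p} + s_{t-1..t-p}
    Under H0 (and the usual OLS assumptions) F ~ F(p, n - 2p - 1).
    """
    r = np.asarray(returns, dtype=float)
    s = np.asarray(signal, dtype=float)
    y = r[p:]
    n = len(y)
    X_restricted = np.hstack([np.ones((n, 1)), _lags(r, p)])
    X_full = np.hstack([X_restricted, _lags(s, p)])
    rss_r, rss_u = _rss(X_restricted, y), _rss(X_full, y)
    return ((rss_r - rss_u) / p) / (rss_u / (n - X_full.shape[1]))


if __name__ == "__main__":
    # Synthetic data only: returns driven by the previous interval's signal.
    rng = np.random.default_rng(0)
    s = rng.normal(size=500)
    r = np.concatenate([[0.0], 0.8 * s[:-1] + 0.5 * rng.normal(size=499)])
    print(granger_f(r, s, p=1))                     # large: s helps predict r
    print(granger_f(rng.normal(size=500), s, p=1))  # typically small
```

Weighting by clicks lets news that many users actually read dominate the interval's sentiment. This is the "wisdom-of-the-crowd" filtering the abstract describes.

With real data, one would compare F against the F(p, n - 2p - 1) distribution, for example via `scipy.stats.f.sf`, to get a p-value. The lag p would typically be chosen by an information criterion.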