Scientists Analyse Millions of News Articles

A study led by academics at the University of Bristol’s Intelligent Systems Laboratory and the School of Journalism at Cardiff University has used Artificial Intelligence (AI) algorithms to analyse 2.5 million articles from 498 different English-language online news outlets over ten months. The researchers found that:

  • As expected, readability measures show that online tabloid newspapers are more readable than broadsheets and use more sentimental language. Among 15 US and UK newspapers, the Sun is the easiest to read, comparable to the BBC’s children’s news programme, Newsround, while the Guardian is the most difficult to read. ‘Sports’ and ‘Arts’ were the most readable topics, while ‘Politics’ and ‘Environment’ were the least readable.
  • The Sun is also the most likely to use adjectives with sentiment, while the Wall Street Journal uses the fewest emotional adjectives.

    CAPTION: Comparison of newspapers based on their readability and linguistic subjectivity.

    CAPTION: Comparison of topics based on their readability and linguistic subjectivity.
  • The study found that men dominate the contents of newspapers. Ranking topics by the gender bias of their articles showed that Sport and Financial articles were the most male-biased, with sports news mentioning men eight times more often than women. Fashion and Arts were the least biased, with Fashion being one of the few topics featuring equal proportions of men and women.

    CAPTION: Comparison of topics based on the ratio of male over female names that are mentioned.
  • The most appealing topics to online readers were Disasters, Crime, and the Environment, while the least appealing topics were Fashion, Markets and Prices. The researchers also found that popular articles tend to be more readable and more linguistically subjective.

    CAPTION: Comparison of topics based on their popularity.

Nello Cristianini, Professor of Artificial Intelligence at the University of Bristol, speaking about the research, said: “The automation of many tasks in news content analysis will not replace the human judgement needed for fine-grained, qualitative forms of analysis, but it allows researchers to focus their attention on a scale far beyond the sample sizes of traditional forms of content analysis.”

Professor Justin Lewis, Head of the School of Journalism, Media and Cultural Studies at Cardiff, said that “even some of the more predictable findings give us pause for thought. The extent to which news is male dominated shows how far we are from gender equity across most areas of public life. The fact that articles about politics are the least readable might also explain widespread public disengagement.”

Notes to editors:

Paper: Research methods in the age of digital journalism, Ilias Flaounas, Omar Ali, Thomas Lansdall-Welfare, Tijl De Bie, Nick Mosdell, Justin Lewis and Nello Cristianini, Digital Journalism, published online ahead of print 01 Nov 2012.

A copy of the paper is available for download from the following URL:

Professor Nello Cristianini is supported by the PASCAL 2 Network of Excellence, and the CompLACS FP7 project.

About the Intelligent Systems Laboratory (ISL)

The University of Bristol’s Intelligent Systems Laboratory (ISL) is in the Merchant Venturers School of Engineering. The University has a long tradition of excellence in Artificial Intelligence, with research groups in Engineering dating back to the 1970s and 1980s. Research activities at the ISL include foundational work in machine learning (many of the ISL members work in this central area of research), and applications to web intelligence, machine translation, bioinformatics, semantic image analysis, robotics, as well as natural intelligent systems.