4 Jul 2015

Potential deployments of video analytics, or “intelligent scene analysis” to use my own preferred term, are being worked on by some of the best minds at university campuses across the United Kingdom. Later this year I will visit Kingston University, whose Digital Imaging Research Centre is one of the largest computer vision groups in Europe.

Violence prevention through analytical intelligence

Here, I talk to Prof David Marshall of Cardiff University, who describes work by PhD candidate Kaelon Lloyd in a thesis that will include comparison of datasets representing different types of violence seen within city street environments. The aim is to improve on current multiple action recognition algorithms, and the ultimate goal is to predict (and so prevent or minimise) violent incidents through analytical intelligence derived from scrutiny of CCTV footage of previous public disorder incidents.

Prof Marshall said: “Kaelon is finding that violent footage has more chaotic patterns, and we now have measures of texture from looking at repeating patterns and densities within an image. It’s the large abrupt changes that give us the classification of violence while the smoother, slower change patterns are associated with normality.”
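
To make the idea of "large abrupt changes" concrete, here is a minimal Python sketch. It is not Kaelon Lloyd's method, which the interview does not detail. It simply scores how much each frame differs from the previous one and flags scores above a threshold. The function names, the synthetic frames and the threshold of 0.1 are all assumptions made for illustration.

```python
import numpy as np


def change_scores(frames):
    """Mean absolute pixel difference between consecutive greyscale frames.

    frames: array-like of shape (n_frames, height, width).
    Returns an array of n_frames - 1 scores.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))
    return diffs.mean(axis=(1, 2))


def flag_abrupt(scores, threshold):
    """Mark frame transitions whose change score exceeds the threshold."""
    return scores > threshold


# Synthetic clip (an assumption, not real CCTV footage):
# five frames that drift slowly, then five unrelated random frames.
rng = np.random.default_rng(0)
base = rng.random((32, 32))
smooth = [base + 0.01 * t for t in range(5)]       # slow, smooth change
chaotic = [rng.random((32, 32)) for _ in range(5)]  # large abrupt change
clip = np.stack(smooth + chaotic)

scores = change_scores(clip)
flags = flag_abrupt(scores, threshold=0.1)  # hypothetical threshold
```

In this toy clip the smooth transitions score about 0.01 and stay unflagged, while every transition involving the random frames is flagged. A real system would need texture and density measures of the kind Prof Marshall describes, and a threshold learned from labelled footage rather than chosen by hand.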

Crowd dynamics

The computer vision department at Cardiff University is also considering crowd dynamics. While the faculty has not researched the event itself, Prof Marshall points to the depressing regularity of crowd management problems at the annual Hajj pilgrimage in October, during which two million Muslims travel to Mecca. The loss of 1,424 lives to a stampede in 1990 and 363 fatalities in 2006 have prompted the Saudi government to turn to London-based Crowd Vision for alerts on high densities, pressure, turbulence, stop-and-go waves and anomalies. Crucially, the software reports on how people are moving in the most overcrowded areas and predicts how they may move next.

Where does money fit into this? It is cash-strapped councils (likely to face another round of spending cuts in the wake of the UK election result) who are tasked with providing CCTV footage of urban spaces, and police forces who have to respond to incidents spotted by video analytics or human intelligence. Explaining the concept of video analytics to a government auditor intent on imposing cuts would be a difficult task, and yet analytics holds out the prospect of either reducing wage bills for manned CCTV observation or freeing up those council staff to assist the public in other ways.

Inattentiveness of CCTV operators

University research in Johannesburg suggests that 23% of a sample (roughly half of whom were employed full-time in the CCTV sector) lost concentration in the first 30 minutes of observing footage on a video monitor. Prof Marshall is even more pragmatic, pointing to the wealth of research suggesting that even the most adept security guards cannot sustain the vigilance required to observe more than one screen effectively. Sony has reported that the ability of well-trained operators to notice an anomalous event requiring attention drops by 50% when they are asked to view a bank of nine monitors. A 2014 University of Portsmouth report draws the crucial distinction between failure to perceive or detect and failure to recognise significance within context.

So do Prof Marshall and his research associates at Cardiff work solely with analytics at ‘the core’? Do they discount the vogue for intelligence within the camera itself that features in promotional literature for many camera manufacturers? “We certainly don’t discount edge intelligence, but the work we’re doing tends to be at the core and will continue to be. That’s not to say we’re not aware of current camera developments, and we acknowledge that the basic processing for our work isn’t particularly intensive.”

Managing city centre environments

What is the possible route to market by which Cardiff University’s research will eventually be reflected in commercial video analytics offerings? “Ultimately the most significant end-users for the type of analytics that automates viewing of urban CCTV feeds will always be police forces. But the organisations footing the bill for the camera infrastructure that supports analytics are likely to be town and city councils. In the likely political climate after the General Election [in the United Kingdom], expenditure on municipal cameras will probably remain static with legacy and hybrid installations being the norm. This can only hold back the development of intelligent video surveillance.”

He continued: “Managing city centre environments is usually achieved through partnerships between local councils (who provide CCTV infrastructure), the police (who are on the ground managing people), the NHS (who attend incidents) and others. We estimate that a typical violent episode costs these partners around £33,000 (about $50,000), and even medium-sized cities can see 20 or more such incidents on a typical Friday night.

“If this technology can prevent just one or two incidents escalating to the point where victims end up in hospital then we will have saved the NHS money. The question is, would the outlay in bringing additional technology into the field represent value for money? This is an important step in this translational research stream and one that we are keen to address.”
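
The value-for-money question can be framed as simple arithmetic. The Python sketch below uses the two figures Prof Marshall quotes (around £33,000 per violent episode and 20 episodes on a typical Friday night). Everything else is an assumption for illustration: the function names, the £500,000 outlay, and the simplification that each prevented escalation avoids the full per-episode cost across all partners.

```python
import math

# Figures quoted by Prof Marshall in the interview.
COST_PER_INCIDENT_GBP = 33_000
INCIDENTS_PER_NIGHT = 20


def nightly_cost(incidents=INCIDENTS_PER_NIGHT, cost=COST_PER_INCIDENT_GBP):
    """Estimated combined cost to all partners of one night's incidents."""
    return incidents * cost


def savings(prevented, cost=COST_PER_INCIDENT_GBP):
    """Cost avoided if `prevented` incidents do not escalate.

    Assumes (simplistically) that prevention avoids the full cost.
    """
    return prevented * cost


def nights_to_break_even(outlay, prevented_per_night,
                         cost=COST_PER_INCIDENT_GBP):
    """Whole nights needed before cumulative savings cover the outlay."""
    return math.ceil(outlay / (prevented_per_night * cost))


print(nightly_cost())                    # 660000
print(savings(2))                        # 66000
# Hypothetical outlay of £500,000, one incident prevented per night.
print(nights_to_break_even(500_000, 1))  # 16
```

On these assumed numbers, a system that prevented one escalation per busy night would cover a £500,000 outlay in 16 such nights. The real answer depends on the actual outlay and prevention rate, neither of which the interview provides.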