The rapid spread of COVID-19 in the spring of 2020 brought a massive shock to societies and economies across the world. As governments began to realise the scale of the threat, they instituted widespread measures of social control: lockdowns, quarantines and enforced social distancing. In the UK, the government announced a national lockdown on March 23rd, followed by the quarantining of overseas travellers on June 8th. China acted much more quickly, imposing a full lockdown on the city of Wuhan on January 23rd and quarantining travellers from hotspot countries from March 3rd onward.
Similar measures in nearly a hundred countries meant that by early April, around half of the world’s population found themselves under lockdown, ushering in what seemed like environmental miracles in Venice and New Delhi. This global exercise in social confinement, unprecedented in scope, reflects an almost universal consensus: that the best short-term solution to controlling the virus is to control the people through whom it can spread.
However, monitoring large populations is no easy task. This is especially true in countries such as the UK where police forces face budget cuts, exacerbated by the pressure the pandemic has placed on public finances. These budgetary pressures, along with the difficulty of deploying law enforcement to monitor vast populations, have put a premium on automated systems that are able to gather copious amounts of information quickly and cheaply.
This is where artificial intelligence (AI) comes in. When coupled with extensive surveillance systems, such as those that keep a watchful eye over London, Moscow or Beijing, AI-powered software seems capable of enforcing social control more efficiently than any other technology. As such, the pandemic has given a new impetus to the development of AI systems with applications in social control, a process already well under way in the UK and Russia, as well as in China.
With practically all of these systems being developed by private firms, open-source information on their inner workings is sparse. However, this article will attempt to take a closer look at how AI has been applied to the specific set of tasks that these systems perform. In so doing, it seeks to examine the techniques involved and the potential pitfalls that these systems may face.
The harnessing of AI to enforce social control is truly an example of innovation – but what will be its consequences?
One of the most basic elements of social control is identifying individuals who are likely to have been infected. This is particularly important when it comes to airport travellers who may import the virus into the country from abroad. It is therefore not surprising that a number of airports have installed thermal imaging technology to identify individuals with a high temperature, one of the main symptoms of COVID-19.
As early as April, Bournemouth Airport endeavoured to become the first UK airport to screen passengers using this method. Before long, Heathrow followed suit, conducting trials of thermal screening equipment in the immigration halls of Terminal 2. With such systems in place, high-temperature individuals could be intercepted and quarantined if need be.
The system installed at Bournemouth Airport, developed by British IT company SSC, combines infrared cameras with face detection technology. By homing in on human faces, the system can avoid classifying other high-temperature objects, such as a hot beverage, as a person with a fever.
This combination of thermal imaging and facial detection can be found in several systems deployed in China. SenseTime, an AI company specialising in computer vision, has developed similar technology that can “detect a fever within an accuracy of 0.3°C” and additionally “identify any individual who is not wearing a face mask, with a success rate over 99%”. Its system, which has been deployed in Beijing’s airport and subway, is capable of screening up to ten people per second and sends an instant notification to security personnel when a person is spotted not wearing a mask. Guide Infrared, China’s largest provider of thermal imaging systems, has deployed similar “AI-activated facial detection technology” to screen for fever in the Wuhan Metro. As we can see, these systems are widespread – but how do they work?
Face detection methods have been around for several years, nowadays finding use in smartphone cameras and Snapchat filters. They range from more hard-coded techniques (e.g. classifying faces according to a predetermined set of knowledge-based rules) to machine learning algorithms such as artificial neural networks and support vector machines (Box 1). “Machine learning” algorithms are capable of finding patterns in copious amounts of data to effectively “learn” how to execute a certain task – such as face detection – sidestepping the need for a human to specify the steps required to fulfil that task. Judging by their claims, it is probably these algorithms that are employed by SSC, SenseTime and Guide Infrared.
Box 1: Machine Learning
Machine learning algorithms can typically be separated into two groups: those that undergo “supervised learning” and those that undergo “unsupervised learning”. As a subset of artificial intelligence, the benefit of machine learning is that no human has to explicitly spell out the instructions that the algorithm has to follow in order to convert a certain input into a desired output (e.g. if the input is an image, the output may be a determination of whether there is a human face present or not). Rather, these algorithms can be “trained” to fulfil a certain task. By being presented with numerous examples of ideal inputs and outputs (i.e. a large number of images, each matched with their corresponding correct output), the algorithm can eventually adapt itself to conduct whatever task is set by its programmers. During training, the algorithm seeks to minimise a “loss function” that is manually set. The choice of loss function determines the task to be fulfilled.
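The training principle described in Box 1 can be illustrated with a deliberately tiny sketch: a one-parameter model fitted by gradient descent on a squared-error loss. The data and learning rate below are invented for illustration; real systems train far richer models, but the loop is the same in spirit.

```python
# A minimal sketch of supervised learning: fit a single weight w so that
# predictions w * x match the labelled outputs y, by gradient descent on a
# squared-error loss function. Data here is invented (y = 3 * x).

examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, desired output) pairs

def loss(w):
    # Mean squared error between predictions and desired outputs.
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

w = 0.0                        # start from an arbitrary weight
learning_rate = 0.02
for _ in range(200):
    # Gradient of the loss with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
    w -= learning_rate * grad  # step "downhill" on the loss

print(round(w, 2))  # converges towards 3.0 - the pattern hidden in the data
```

The choice of loss function here (squared error) encodes the task; swap in a different loss and the same loop learns a different behaviour.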
Moreover, the problem is one of multiple face detection; the goal therefore is not to map an image onto a set of coordinates that define the location of the face, but to identify sub-regions of the image that contain a face. One of the most widely used methods for this, up to the present day, is a machine learning technique developed by Paul Viola and Michael Jones in 2001. As neatly explained by Dr Mike Pound of Computerphile, the algorithm essentially takes an image and computes a set of numbers (“features”) from its pixel intensities, which it then uses to determine the presence of a face. Different features are obtained via different calculations from these pixel values. The features are pre-set in advance, yet their numerousness makes it extremely inefficient to calculate them all for any particular image (using Viola and Jones’ method, a 24x24 pixel image would yield 180,000 different features).
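The reason so many features can be computed cheaply at all is the "integral image" trick from Viola and Jones' paper: each rectangle feature is a difference of rectangular pixel sums, and each sum takes only four array lookups. A toy sketch, using a made-up 4x4 image:

```python
# Sketch of a Viola-Jones-style "feature": the difference between pixel sums
# of adjacent rectangles, computed in constant time via an integral image.
# The image below is an invented 4x4 grid of pixel intensities.

image = [
    [1, 2, 1, 0],
    [3, 1, 0, 2],
    [2, 2, 1, 1],
    [0, 1, 3, 2],
]

def integral_image(img):
    """ii[y][x] = sum of all pixels above and to the left of (x, y), exclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

ii = integral_image(image)
# A "two-rectangle" feature: left half minus right half of the top 4x2 region.
feature = rect_sum(ii, 0, 0, 2, 2) - rect_sum(ii, 2, 0, 2, 2)
print(feature)  # 7 - 3 = 4
```

Because every feature reduces to a handful of such lookups, evaluating even thousands of selected features per sub-region remains fast.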
Selecting which of these features to use is where the “machine learning” aspect comes in. The principle is that some features are more pertinent to the detection of a face than others. A “weak” detection algorithm trained to use only one of these features will still achieve poor yet better-than-random performance, and this performance will depend on which individual feature is chosen. It is then possible to run a selection procedure that picks out those features most relevant to face detection (for this, Viola and Jones used a variant of the AdaBoost method originally developed by Freund and Schapire in 1995). Each such one-feature weak classifier can be implemented with the help of a support vector machine (Box 2). By utilising only the selected features, it is possible to construct a face detection algorithm that is simultaneously accurate and efficient. By scanning across different sub-regions of an image, the algorithm can then pinpoint the locations at which a face is present.
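The selection step can be sketched in miniature: score each candidate feature by the best accuracy a single-threshold "weak" classifier can achieve on it, and keep the winner. The feature values and labels below are invented, and this is a simplification of AdaBoost, which also re-weights examples between rounds.

```python
# Hedged sketch of feature selection: each candidate feature gives a
# one-feature "weak" classifier (a simple threshold), and we keep whichever
# feature separates the labelled examples best. All data here is invented.

# Each example: (values of 3 candidate features, label); 1 = "face", 0 = "not".
examples = [
    ((5.0, 1.2, 0.3), 1),
    ((4.1, 0.9, 0.8), 1),
    ((4.8, 1.1, 0.2), 1),
    ((1.2, 1.0, 0.7), 0),
    ((0.8, 1.3, 0.4), 0),
    ((1.5, 0.8, 0.9), 0),
]

def stump_accuracy(feature_index):
    """Best accuracy achievable by thresholding this single feature."""
    values = sorted(v[feature_index] for v, _ in examples)
    best = 0.0
    # Try a threshold between each pair of adjacent values, both polarities.
    for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
        for polarity in (1, -1):
            correct = sum(
                1 for v, label in examples
                if (polarity * (v[feature_index] - t) > 0) == (label == 1)
            )
            best = max(best, correct / len(examples))
    return best

accuracies = [stump_accuracy(i) for i in range(3)]
best_feature = max(range(3), key=lambda i: accuracies[i])
print(best_feature, accuracies[best_feature])  # feature 0 separates perfectly
```

Repeating this selection, while down-weighting examples the chosen features already handle, yields the small, potent feature set that makes the full detector efficient.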
Box 2: Support Vector Machines
The detection of a face in an image sub-region can be treated as a binary classification task – designating a sub-region as belonging either to the positive (“contains a face”) category or to the negative (“does not contain a face”) category. One machine learning technique suited to such tasks is the support vector machine (SVM). SVMs operate on data in which each datapoint can be expressed as a set of coordinates in N-dimensional Euclidean space. The algorithm then finds the optimal hyperplane in that space that most clearly separates datapoints of the two categories. An SVM that operates on one single feature – i.e. in a one-dimensional space – would therefore find the optimal threshold value above which a datapoint is classified into one category, and below which it is classified into the other.
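In one dimension, Box 2's "optimal hyperplane" reduces to something very concrete: a maximum-margin separator places the threshold midway between the closest examples of the two classes (the "support vectors"). A sketch with invented feature values:

```python
# Minimal 1-D illustration of Box 2: with a single feature, the "hyperplane"
# is just a threshold. A maximum-margin separator puts it midway between the
# closest examples of the two classes. Feature values here are invented.

positives = [4.1, 4.8, 5.0]   # feature values for "contains a face"
negatives = [0.8, 1.2, 1.5]   # feature values for "does not contain a face"

# Support vectors: the closest point of each class to the boundary.
margin_low, margin_high = max(negatives), min(positives)
threshold = (margin_low + margin_high) / 2

def classify(x):
    return "face" if x > threshold else "no face"

print(classify(3.5))   # face
print(classify(1.0))   # no face
```

In higher dimensions the same idea holds, with a separating plane (and margin) in place of the threshold.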
It is likely that similar algorithms are used in the fever detection technologies deployed in London and Beijing, and from all appearances it looks like they perform rather well. However, there are still certain drawbacks to this method for combatting COVID-19. Of course, fever detection systems can only identify infected persons if they do in fact have a fever – which would not be true for pre-symptomatic or asymptomatic individuals. Moreover, there is always the potential for false positives – as SSC itself admits, “the cause of the high temperature is not always immediately clear. While high temperature could indicate a fever, there are other reasons why a person may be hotter than normal, for example, if they have [to] run to catch a flight or cycled to work”. As such, the system is “not a silver bullet and should be deployed alongside robust safety protocols”.
Incidentally, I realised when passing through Singapore’s Changi Airport in early December that the most mundane of objects can obstruct the thermal cameras’ view of your forehead. In my case, it was my fringe, which I was told by staff to lift up at every camera checkpoint. Probably fine for a small trickle of travellers passing by, but what happens when one is faced with a crowd of people on whom the pandemic has inflicted ever-elongating hairlines?
As can be seen, the use of AI in fever detection is interesting but as yet limited in its potential. However, when paired with face recognition technology, face detection allows for more wide-ranging capabilities.
With more than 160,000 cameras watching 12 million inhabitants, Moscow has one of the most extensive surveillance systems in the world. Since January 2020, at least 105,000 of these cameras have been augmented with facial recognition technology. Boosting the capability of authorities to track citizens’ movements, this system was apparently originally designed to catch criminals – yet recently it has found a new target: people who have been put under quarantine.
By February 21st, around 2,500 travellers arriving in Moscow from China had been placed under two-week quarantine in their homes or hotels, according to Mayor Sergei Sobyanin. He also added that “automated facial-recognition systems and other technology will constantly monitor [their] compliance” during this period. What does this mean? For 32-year-old Vladimir Bykovsky, this meant that police turned up at his door thirty minutes after he stepped out of his building to throw out the trash – having just returned from South Korea. As it turns out, the authorities had been alerted by a camera set up on the front door of his apartment building. This surveillance system, developed by Russian AI firm NtechLab, identified more than 200 people who broke their self-isolation regimes in January alone.
Face recognition systems are distinct from face detection systems in that the former must be able not only to determine that a face is present, but also to identify whose face it is. The most widely used tool for facial recognition is the Convolutional Neural Network (CNN), a type of neural network architecture (Box 3). CNNs have been around since the 1990s, yet it was only in the 2010s that a string of developments cemented their position at the forefront of image recognition technology.
Box 3: Neural Networks
A neural network consists of a series of layers of “artificial neurons” – units that receive information from units of the previous layer and transmit information to units of the subsequent layer. This resembles, in an extremely abstract way, the axon-dendrite structure of a biological neuron. The network as a whole operates by relaying and transforming information from an “input layer” to an “output layer”. The manner that information is transmitted from one layer to the next is dependent on the “weights” of the connections between units of the two layers. It is these weights that are modified during training, such that inputs are transformed into outputs in a way that fulfils the stated goal of the algorithm.
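The layer-to-layer step in Box 3 can be written out directly: each unit in the next layer takes a weighted sum of the previous layer's activations and passes it through a non-linearity. The weights and inputs below are arbitrary illustrative values, not anything learned.

```python
# Sketch of one layer-to-layer step in a neural network: each unit in the
# next layer sums the previous layer's activations, scaled by connection
# weights, then applies a non-linearity. All values here are invented.
import math

def layer_forward(inputs, weights, biases):
    """One dense layer: out_j = sigmoid(sum_i inputs[i] * weights[j][i] + biases[j])."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(i * w for i, w in zip(inputs, w_row)) + b
        outputs.append(1 / (1 + math.exp(-z)))   # sigmoid "activation"
    return outputs

inputs = [0.5, -1.0, 2.0]                 # activations of the previous layer
weights = [[0.1, 0.4, -0.2],              # one row of weights per output unit
           [-0.3, 0.2, 0.5]]
biases = [0.0, 0.1]
print([round(o, 3) for o in layer_forward(inputs, weights, biases)])
# [0.321, 0.679]
```

Training (Box 1) amounts to nudging the numbers in `weights` and `biases` so that chaining many such layers maps inputs to the desired outputs.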
In 2012, Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton entered their CNN model into the ImageNet Large Scale Visual Recognition Challenge, an annual image recognition competition in which algorithms are trained on around 1.2 million labelled images depicting 1,000 object categories (fish, utensil, flower etc.). The goal was to classify an unseen set of images into their correct categories; for every image, the algorithm calculates a score for each category, and the objective is to assign the highest score to the correct category. Krizhevsky, Sutskever and Hinton’s CNN model achieved a winning top-5 test error rate of 15.3% – the proportion of images for which the correct category did not feature among the five highest-scoring categories. The second-best entry achieved 26.2%.
CNNs accomplish this kind of performance by encoding images into points in a low-dimensional Euclidean space (“low” here could mean a 256- or 512-dimensional space, but this is much lower than the dimensionality of a typical image, which is equivalent to the number of pixels it has). If the task is to classify images into different categories, the encoding procedure will be incentivised during training to cluster together images that correspond to a particular category. Hence, the task determines how images are encoded in this space.
One way of leveraging a CNN to conduct face recognition is to train it to cluster together the face images of a particular person in this encoding space. Ideally, images of one person’s face will cluster closely together while remaining distant from images of any other person’s face. This can be achieved using a “triplet loss function” (see Box 1) that penalises proximity between images of different people’s faces and incentivises proximity between images of the same person’s face. The CNN effectively “learns” to cluster different images of the same face together. Hence, when using face recognition to unlock a smartphone, the camera only has to register one’s face a small number of times before it becomes capable of recognising it at a variety of angles. Images of the same face at a multitude of angles and levels of lighting will still tend to cluster together in the encoding space – hence, by calculating the proximity of a new image with clusters of known individuals, the algorithm is effectively able to “identify” the person in the image. This is why travellers landing in Moscow were photographed three times before being entered into a watchlist.
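Once faces are encoded as points, "identification" is just a nearest-cluster lookup. The sketch below uses invented two-dimensional embeddings as stand-ins for a CNN's 256- or 512-dimensional ones; the names and coordinates are hypothetical.

```python
# Sketch of identification in an embedding space. A real CNN would produce
# the embeddings; here they are invented 2-D stand-ins. A new face image is
# "identified" as the enrolled person whose embedding cluster lies closest.
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]

# Each enrolled person: a few embeddings from different photos (compare the
# travellers in Moscow being photographed three times on arrival).
enrolled = {
    "alice": [[0.9, 0.1], [1.0, 0.2], [0.8, 0.15]],
    "bob":   [[0.1, 0.9], [0.2, 1.0], [0.15, 0.85]],
}
centroids = {name: centroid(pts) for name, pts in enrolled.items()}

def identify(embedding, threshold=0.5):
    name, d = min(((n, distance(embedding, c)) for n, c in centroids.items()),
                  key=lambda t: t[1])
    return name if d < threshold else "unknown"

print(identify([0.85, 0.2]))   # close to alice's cluster -> alice
print(identify([0.5, 0.5]))    # far from both clusters  -> unknown
```

The distance threshold is the crucial knob: set it too loose and strangers are "recognised"; too tight, and the enrolled person goes unmatched in bad lighting.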
The combination of face detection with face recognition allows for mass surveillance and tracking of individuals within an urban space. Such projects were already underway before the pandemic arrived in force. For instance, in February, London’s Metropolitan Police deployed van-mounted cameras to scan the faces of thousands of shoppers at the Stratford Centre complex, on the prowl for “people who are wanted for serious criminality”. The Met had been trialling its live facial recognition technology from 2016-2019 and eventually decided to operationalise it in the face of some criticism.
According to an independent report on these trials from Essex University’s Human Rights, Big Data and Technology Project, the system’s alerts were accurate only about a fifth of the time. Over the course of test deployments, the system generated 38 watchlist matches that were verified by police officers. Of these, only 8 were correct – that’s about 21%. The findings are ambivalent. On one hand, the system had actually shown itself to perform remarkably well; 21% is far higher than the percentage of watchlist individuals actually present in the general population – therefore, the system performs significantly better than random. However, one always has to weigh this against the inconvenience caused to the 79% who may be stopped or raided by police as the result of a mis-identification. With millions of faces to scan, an algorithm that is rarely wrong will still register some false positives.
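The arithmetic behind this "mostly false alerts" effect is worth making explicit. The rates below are illustrative assumptions, not the Met system's actual figures: even with a high hit rate and a tiny per-face error rate, rare watchlist matches mean most alerts are wrong.

```python
# Worked base-rate example: an accurate algorithm still produces mostly
# false alerts when watchlisted individuals are rare in the crowd.
# All four numbers below are invented assumptions for illustration.

population_scanned  = 100_000
watchlist_present   = 20         # assumed number actually on the watchlist
true_positive_rate  = 0.80       # chance a watchlisted face triggers an alert
false_positive_rate = 0.0005     # chance an ordinary face triggers an alert

true_alerts  = watchlist_present * true_positive_rate
false_alerts = (population_scanned - watchlist_present) * false_positive_rate
precision = true_alerts / (true_alerts + false_alerts)
print(round(precision, 2))  # ~0.24: only about a quarter of alerts are real
```

With these assumed numbers, roughly 50 of 66 alerts are false, in the same ballpark as the one-in-five accuracy the Essex report found.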
This problem makes it dangerous to over-rely on these systems. After all, who could forget the case of 42-year-old Robert Julian-Borchak Williams, who was held by Detroit police for thirty hours after a face recognition system wrongly decided he was the man who had shoplifted $3,800 worth of goods from an upscale boutique store? The fact that Mr Williams is a black man highlights another long-running criticism of face recognition algorithms: they may be susceptible to racial bias.
In December 2019, the US National Institute of Standards and Technology (NIST) released a report surveying the performance of 189 algorithms from 99 developers – a solid representation of the industry as a whole. Even though the performance of different algorithms varied dramatically, the report found that in general, there were “higher rates of false positives for Asian and African American faces relative to images of Caucasians … rang[ing] from a factor of 10 to 100 times, depending on the individual algorithm”. This was for one-to-one matching, where there is only one face to verify against, such as when unlocking a smartphone. For one-to-many matching, such as when checking against a watchlist, “the [NIST] team saw higher rates of false positives for African American females”, testing against an FBI database of 1.6 million mugshots. This has serious implications for the use of such technology in finding criminals.
However, the report notes that “there was no such dramatic difference in false positives in one-to-one matching between Asian and Caucasian faces for algorithms developed in Asia”, in contrast to those developed in the US. Lead author Patrick Grother suspects that this more equitable outcome may be achieved by using more diverse training data, with a sufficient quantity of examples for each ethnic group. This suggests that race, gender, age and other biases present in face recognition technology could be mitigated by paying special attention to the training data – though they may be impossible to purge completely (Box 4).
Box 4: Training Datasets
In machine learning, the performance of a trained algorithm can depend heavily on the dataset on which the algorithm was trained. Biases in an algorithm’s performance – for example, between different ethnic groups or genders – may reflect biases inherent in the training data. To take a crude example, if the representation of minority groups is lower than the representation of Caucasians in a face recognition dataset, the algorithm would generally perform better on Caucasians, given that it has “seen” more examples of this kind. These biases may be caused by something as mundane as choosing to take a random sample from the general population. Since minority groups do in fact account for a minority of the population, this algorithmic bias could be achieved without any personal bias on the part of the programmer.
The increasingly widespread use of this technology to conduct social control in places such as Russia and China inevitably brings new risks. However, as far as Russia is concerned, face recognition is far from the only task to which AI has been applied in dealing with the pandemic.
By March, Moscow had established a “coronavirus information centre”, integrating surveillance technology and artificial intelligence to aid authorities in keeping the pandemic under control. Deputy prime minister Dmitry Chernyshenko shed light on one of its most intriguing functions when he revealed that “the centre … also effectively fights so-called ‘fakes’ and rumours”, mentioning a false social media report that 32 people had been killed by the virus. He added that “artificial intelligence, the neural networks that were trained to do a semantic analysis, identified the anomaly”. By flagging and eliminating such “fake news”, the government could effectively grant itself a monopoly over the information shown to the wider public.
The Russian authorities are far from alone in this endeavour. Combatting misinformation has been a constant issue for social media giants such as Facebook and Twitter. Their platforms have helped spread vast amounts of information on COVID-19 from dubious sources, fuelling the alarming “infodemic” that has accompanied the pandemic itself. In August, Facebook revealed that between April and June, they took down 7 million posts spreading COVID-19 misinformation, slapping a warning note on an additional 98 million posts that were “misleading but not deemed harmful enough to remove”. Twitter adopted a similar approach in May, attaching a warning label to any content deemed “misleading”, “disputed” or “unverified”.
Intriguingly, back in May, Facebook revealed that it was using “AI systems” to speed up the search for misleading content – in particular, convolutional neural networks. The idea is that once a fact-checking human deems a particular post to be misleading, the network could scan any images found in the post and then trawl the feeds for duplicates of those images. When it finds one, the algorithm attaches a warning label to the post.
However, the system in Russia seems to work somewhat differently – from what little has been revealed about it. Chernyshenko mentioned that it carried out “semantic analysis”, which would seem to place the system in the realm of “natural language processing” (NLP), a form of AI that focuses on the interpretation of human language. There are various methods that can achieve this task, one of them being a variant of the neural networks encountered earlier called a “recurrent neural network” (RNN) (Box 5).
Box 5: Recurrent Neural Networks
These neural networks are particularly well-suited for analysing and processing sequential data (i.e. stock market fluctuations, weather data, human language). The basic idea is that, if our data is organised sequentially, such that each datapoint is associated with a timepoint (t0, t1, t2, …), the network accepts the input at each timepoint in a consecutive manner. At t0, the network accepts the input at t0. At t1, the network accepts the input at t1, but also combines this with whatever it outputted after processing the data at t0. In this way the network can “remember” past timepoints when analysing later ones. After the final timepoint, the network will have seen the entire sequence, and can therefore render an analysis of it.
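The recurrent step in Box 5 fits in a few lines: at each timepoint the network combines the current input with its own previous output (its "hidden state"). The weights below are arbitrary illustrative values, not trained ones.

```python
# Minimal sketch of the recurrent step described above: at each timepoint
# the network combines the current input with its previous hidden state.
# Weights are invented illustrative values, not learned ones.
import math

w_in, w_rec, bias = 0.5, 0.8, 0.0   # input weight, recurrent weight, bias

def rnn(sequence):
    hidden = 0.0                     # state before t0
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
    return hidden                    # a summary of the whole sequence

# Two sequences ending in the same input produce different outputs, because
# the network "remembers" the earlier timepoints.
print(rnn([1.0, 0.0, 1.0]))
print(rnn([0.0, 0.0, 1.0]))
```

In a real language model each input would be a vector representing a word, and the hidden state a vector too, but the time-stepping logic is the same.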
RNNs and their variants have been used for a variety of tasks involving sequential data, such as the detection of cardiac arrhythmia from electrocardiogram readings. One study has even found that RNNs could be employed to predict the lineage choice of hematopoietic stem cells “up to three generations before conventional molecular markers are observable”. RNNs have also displayed a flair for processing human language, as indicated by their use in “intelligent chatbots” and Google Translate.
Among this linguistic repertoire is a capacity for “sentiment analysis”, the classification of an item of text into one of several categories depending on its content. In one example, Dan Li and Jiang Qian carried out a study of the capability of RNNs to classify movie reviews as either “positive”, “negative” or “neutral”, achieving an accuracy rate of nearly 90%.
The following is an example from the “positive” category:
i loved this movie from beginning to end. i am a musician and I let drugs get in the way of my some of the things i used to love (skateboarding, drawing) but my friends were always there for me. music was like my rehab, life support, and my drug. it changed my life. i can totally relate to this movie and i wish there was more i could say. this movie left me speechless to be honest.
A “negative” review might sound like this:
This german horror film has to be one of the weirdest i have seen. i was not aware of any connection between child abuse and vampirism, but this is supposed based upon a true character. like i said, a very strange movie that is dark and very slow as werner pochath never talks and just spends his time drinking blood.
This is certainly not a straightforward task – whereas some snippets of text might be dependable give-aways (“I loved this movie”), the algorithm ought to be able to handle grammatical errors (“this is supposed based upon a true character”) and unknown words (who – or what – is “werner pochath”?). These difficulties place a premium on the ability to understand context, and this is one of the true challenges of natural language processing. Yet, if an algorithm such as an RNN is able to classify movie reviews, what’s to stop someone from training one to detect misinformation?
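To make the task concrete, here is a deliberately naive stand-in for the RNN approach: score a review by counting words from hand-picked sentiment lists. The word lists are invented for illustration; a trained model learns such associations (and, crucially, context) from data rather than having them spelled out.

```python
# Toy sketch of sentiment classification, a crude stand-in for the RNN
# approach described above: count words from hand-picked lists. The lists
# here are invented; real systems learn these associations from data.

positive_words = {"loved", "great", "wonderful", "relate", "speechless"}
negative_words = {"weirdest", "strange", "slow", "dark", "boring"}

def sentiment(text):
    words = text.lower().split()
    score = (sum(w in positive_words for w in words)
             - sum(w in negative_words for w in words))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("i loved this movie from beginning to end"))         # positive
print(sentiment("a very strange movie that is dark and very slow"))  # negative
```

The gap between this sketch and a usable system is exactly the context problem discussed above: word-counting cannot tell "not bad at all" from "bad", which is why sequence models such as RNNs are used instead.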
Such systems may be of valuable help in combatting potentially dangerous falsehoods concerning COVID-19. However, it is important to remember that “misinformation” is, in a general sense, whatever those in power determine it to be. This holds as much for political “truths” and “untruths” as it does in the medical realm – especially in polities that explicitly police the flow of information.
For instance, in April 2020, Chinese authorities announced that they had shut down more than 150 social media accounts “carrying articles suggesting that some neighbouring countries long to be reunited with China”. One such article, claiming that the people of Kazakhstan were eager to “return to China”, spurred the Kazakh foreign ministry to issue a complaint to Chinese ambassador Zhang Xiao. Hence, the decision to remove the misleading articles was motivated by a desire to preserve ties between China and Kazakhstan, a political aim.
The COVID-19 pandemic has accelerated a trend that was in place before and shall remain long after the crisis has passed. In dealing with this catastrophe, governments worldwide have implemented methods of social control, whether it’s control of movement, as in a quarantine, or the control of thought, as in the myriad efforts to combat misinformation. Many of these techniques are easily generalisable to fields outside the monitoring of public health – a fact that should keep us wary of how they will be employed. It would be all too easy for governments to summon the fear of COVID-19 to expand measures of social control that affect us in more ways than just keeping us safe from disease. It is up to us to determine where the boundaries should lie.