
Algorithms in search of missing persons

Author: Centre for Innovation – Leiden University

The International Commission on Missing Persons (ICMP) manages an online inquiry centre which makes it possible for families, government authorities, forensic professionals and others to access and provide information related to missing persons cases.

Due to the conflict in Syria, a growing number of missing persons from that country are being reported to the ICMP online inquiry centre. Currently tens of reports are submitted each week, and this number is expected to increase over the coming months.

Each request is currently checked and validated manually by ICMP staff. The question therefore arose: can (parts of) the validation process be supported by automated procedures?

Technical feasibility study

In the partnership between ICMP and the Centre for Innovation at Leiden University (CFI), the CFI conducted a technical feasibility study and developed prototype code to explore whether and how facial recognition software, combined with social media (Facebook and Twitter), can support the validation of missing person reports.

Whenever somebody reports a missing person, they can upload a photo of that person. This study checked whether facial recognition software can be used to validate the information given about the person.

Moreover, if the details of the person’s Twitter and Facebook accounts are collected, can these be used to monitor the accounts? The aim is to detect changes that might indicate activity by the missing person.

3 hypotheses on social media face verification and activity monitoring

Our study posed three hypotheses as follows:

Hypothesis 1: Existing facial recognition software can be used to verify the face, gender and age of a person.

Hypothesis 2: Social media (Twitter and Facebook) accounts of missing persons can be monitored, and alerts produced, to detect (possible) activity of the person reported missing.

Hypothesis 3: When no photo is added to the missing person report, available photos from social media accounts (Twitter and Facebook) can be used to validate the age and gender of the missing person through facial recognition.

This ultimately contributes to validating submitted information about the missing person.

Aims and planning

To test the three hypotheses on how publicly available social media data can be combined with facial recognition to aid the work of the ICMP, the partnership between ICMP and CFI aimed to analyse the key components of current facial recognition software as they apply to the needs of the ICMP. We did this by constructing a dummy dataset that would put the facial recognition software to the test and identify potential risks. The project ran for three months, between August and November 2018.

Subsequently, in workshops with ICMP staff, we discussed the results of the technical feasibility study and the possibilities for integrating facial recognition into ICMP systems. The prototype code was made available to ICMP.

Highlights and outcomes

Analysis of existing face recognition software

We used three commercially available facial recognition services, from Microsoft, IBM and Clarifai, to run a test on our dummy dataset. The technology was tested only insofar as it applies to the needs of the ICMP; this was not an exhaustive test of the services themselves.

A small test was conducted using 20 images of individuals with known age and gender to compare the output of the three services.

  • One image could not be processed by either Microsoft or Clarifai. Although IBM got the gender for that image correct, it was 23 years off on the age. That image was most likely difficult to process because of the unusual angle from which the individual was photographed.
  • IBM performed best on age, being on average 5.9 years off the correct age. Microsoft and Clarifai were both 7.3 years off on average.
  • IBM got all genders correct, while Microsoft and Clarifai each got one wrong.
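The "years off on average" figures above read as a mean absolute error over the images that could be processed. The sketch below shows one way such a score could be computed. It is an illustration only: the function names and the sample numbers are invented here and are not taken from the study's prototype or data.

```python
from statistics import mean


def mean_absolute_age_error(true_ages, predicted_ages):
    """Average absolute age difference in years.

    Images for which a service returned no prediction (None) are skipped,
    mirroring the one image that some services could not process.
    """
    pairs = [(t, p) for t, p in zip(true_ages, predicted_ages) if p is not None]
    if not pairs:
        raise ValueError("no predictions to score")
    return mean(abs(t - p) for t, p in pairs)


def gender_accuracy(true_genders, predicted_genders):
    """Fraction of processed images whose predicted gender was correct."""
    pairs = [(t, p) for t, p in zip(true_genders, predicted_genders) if p is not None]
    if not pairs:
        raise ValueError("no predictions to score")
    return sum(t == p for t, p in pairs) / len(pairs)


# Invented sample values, for illustration only.
known_ages = [30, 45, 22]
estimated_ages = [33, None, 20]  # second image could not be processed
print(mean_absolute_age_error(known_ages, estimated_ages))
```

Skipping unprocessed images keeps the average comparable between services, but the number of skipped images should then be reported alongside it, as the list above does.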

There are a few differences in the output of these services:

  • Microsoft gives a predicted age and gender
  • IBM gives a minimum and maximum age with a probability and a gender with a probability
  • Clarifai gives a probability distribution for both age and gender
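Because the three services return differently shaped results, comparing them requires reducing each to a single age and gender. The sketch below shows one possible reduction. The dictionary layouts and the `kind` field are assumptions made for illustration; they are not the actual response formats of the Microsoft, IBM or Clarifai APIs.

```python
def to_point_estimate(result):
    """Reduce one (hypothetical) service result to an (age, gender) pair."""
    kind = result["kind"]
    if kind == "point":
        # A single predicted age and gender.
        return float(result["age"]), result["gender"]
    if kind == "range":
        # A minimum and maximum age: use the midpoint of the range.
        midpoint = (result["age_min"] + result["age_max"]) / 2
        return midpoint, result["gender"]
    if kind == "distribution":
        # Probability distributions: use the probability-weighted mean age
        # and the most probable gender.
        ages = result["age_probs"]
        total = sum(ages.values())
        expected_age = sum(age * p for age, p in ages.items()) / total
        genders = result["gender_probs"]
        return expected_age, max(genders, key=genders.get)
    raise ValueError(f"unknown result kind: {kind!r}")


# Invented example results in the three assumed shapes.
examples = [
    {"kind": "point", "age": 31, "gender": "male"},
    {"kind": "range", "age_min": 28, "age_max": 34, "gender": "male"},
    {"kind": "distribution",
     "age_probs": {29: 0.25, 31: 0.5, 33: 0.25},
     "gender_probs": {"male": 0.9, "female": 0.1}},
]
for example in examples:
    print(to_point_estimate(example))
```

Taking the midpoint or the weighted mean discards the confidence information that two of the services provide. A fuller comparison might keep it as a third field.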

Once these three services had been tried and the usefulness of the technology was clear, we sought an offline solution in which images are not sent to external servers but kept on a machine controlled by the user (ICMP). We surveyed current open source projects and found a suitable candidate. The original project is located at https://github.com/BoyuanJiang/Age-Gender-Estimate-TF

We then re-ran the test with this software, which can be deployed locally on ICMP servers.

  • On average it was off by 3.2 years for age
  • All genders were correct
  • The image that troubled the commercial APIs also failed in this implementation

This implementation returns only the age and gender prediction, with no likelihood attached to that prediction.

Analysis of existing face comparison software

In follow-up discussions, the ICMP development team asked whether the ability to compare faces across images could also be included. Of the three commercial services used so far, only Microsoft offered this, and it required storing images on Microsoft’s servers. A further search for open source projects turned up a suitable one. The original project can be found at https://github.com/ageitgey/face_recognition.

Again, this allowed the data to be processed on ICMP’s own local server without compromising privacy standards.

We collected comparison images of the individuals already tested for gender and age and used them to test the comparison algorithm, which selected the correct individual almost every time. The only cases in which it could not give a correct result were those where the person was photographed side-on, which is a limitation of this algorithm.

The algorithm bases its decision on a distance metric. This means that comparing one image against a selection of images gives a positive response only for those images that are similar enough to it. The sensitivity can be controlled by a distance threshold, and the authors recommend a specific value.
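The thresholding idea can be sketched as follows, assuming each face has already been converted into a numeric vector (an "encoding") and that distance is Euclidean. This is a self-contained illustration and does not call the face_recognition library. The function names, the three-dimensional toy vectors and the threshold of 0.5 are placeholders, not the project's API or its recommended value.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equally long encodings."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_matches(query, candidates, threshold):
    """Return (name, distance) for every candidate within the threshold,
    closest first. An empty list means no candidate was similar enough."""
    scored = [(name, euclidean_distance(query, encoding))
              for name, encoding in candidates.items()]
    within = [(name, dist) for name, dist in scored if dist <= threshold]
    return sorted(within, key=lambda item: item[1])


# Toy encodings; real face encodings have many more dimensions.
known_faces = {
    "person_a": [0.1, 0.0, 0.0],
    "person_b": [1.0, 0.0, 0.0],
}
print(find_matches([0.0, 0.0, 0.0], known_faces, threshold=0.5))
```

A lower threshold makes the comparison stricter (fewer false matches, more missed matches) and a higher one makes it more permissive, so the value is best tuned against known image pairs.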

Name to gender comparison

A further feature added during development was a name-to-gender comparator. A list of names and their corresponding genders, taken from kaggle.com (a machine learning competition website and data repository), allows the application to suggest whether a person’s given name is likely to correspond to a particular gender. The name-gender list was compiled from literary references to names in Google Books from 1880 onwards.

During the workshop with ICMP staff, we extended the prototype code so that a name can be added to this list when it is not yet present. The ICMP team can also add pre-existing lists of names with genders to the comparator.
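A comparator of this kind can be sketched as a simple lookup table that can be extended at run time. The class and method names below are hypothetical and the sample names are invented; this is not the prototype's code, and it ignores names that are used for more than one gender.

```python
class NameGenderComparator:
    """Suggests a gender for a given name from an extendable lookup table."""

    def __init__(self, entries=None):
        self._genders = {}
        if entries:
            self.add_many(entries)

    @staticmethod
    def _normalise(text):
        # Ignore surrounding whitespace and letter case when matching.
        return text.strip().casefold()

    def add(self, name, gender):
        """Add a single name, or overwrite an existing entry."""
        self._genders[self._normalise(name)] = self._normalise(gender)

    def add_many(self, entries):
        """Add a pre-existing list of (name, gender) pairs."""
        for name, gender in entries:
            self.add(name, gender)

    def suggest(self, name):
        """Return the listed gender, or None if the name is not present."""
        return self._genders.get(self._normalise(name))

    def agrees(self, name, reported_gender):
        """True or False if the name is listed, None if it is unknown."""
        suggested = self.suggest(name)
        if suggested is None:
            return None
        return suggested == self._normalise(reported_gender)


comparator = NameGenderComparator([("Mary", "female"), ("John", "male")])
comparator.add("Layla", "female")  # name missing from the original list
print(comparator.agrees("layla", "Female"))
```

Returning None for unknown names, rather than False, lets the calling code distinguish "no information" from a genuine mismatch that a staff member should look at.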

Risk mitigation

Identified risks and how they have been mitigated:

Risk 1: Privacy and security of existing face recognition tools (e.g. what are the rights to the photos, and how are they stored?)

Most commercial facial recognition software is cloud-based, meaning that the picture is uploaded to the cloud and stored there.

Due to the sensitivity of the work ICMP does, the software needed to run on ICMP’s own servers, so that the photos would not have to go to the cloud and be (temporarily) stored by a commercial organisation.

For this reason we looked for open source software, and found software of similar quality to the commercial solutions. This risk was thus mitigated.

Risk 2: Possibility to integrate face recognition software in ICMP tools (technical, capacity & resources).

Before travelling to Bosnia, the CFI team deployed the application to one of its own servers to learn of any pitfalls during deployment. The only servers available to the CFI team are Linux servers. Windows servers may not ship with all the video and audio libraries that facial recognition algorithms require, in which case those libraries have to be built on the server. This was found to be the case at the ICMP. The Windows Server version there was also not recent enough to deploy other virtualisation solutions.

Risk 3: Limits of face recognition software (e.g. free versions are available, but is there a maximum amount of use?).

Because an open source solution is being used, which ICMP can install locally, there is no limit on the use of the face recognition or face comparison software.

Risk 4: Limits of the scrapability of Facebook and Twitter (e.g. is it possible to scrape profile pictures?)

The main limit is that only publicly available profiles can be easily accessed to obtain high-quality images. Accessing Twitter and Facebook requires providing varying levels of information to those companies. This should be kept in mind when using such services.


Risk 5: Validation: how should the result be interpreted if there is no match (e.g. false negatives)?

It is important to note that the tool is not intended to fully automate the validation of missing person reports, but only to support it. It can never take over the manual check entirely; it only offers guidance.

Conclusion

Hypothesis 1: Existing facial recognition software can be used to verify the face, gender and age of a person.

This hypothesis was confirmed. Both the closed and the open source software tested worked well for adults. During testing on actual ICMP data, we noted that the error range was much higher when the picture was of a child. We assume this is due to the lack of pictures of children available to train the software, bearing in mind the ethical dilemmas around using photos of children.

By using data from the current ICMP database to train the software, accuracy might be improved for both adults and children.

Hypothesis 2: Social media (Twitter and Facebook) accounts of missing persons can be monitored, and alerts produced, to detect (possible) activity of the person reported missing.

This hypothesis was partly confirmed. In the feasibility study we looked at Twitter: given the person’s Twitter handle, it is possible to check when a message was last posted or retweeted. Facebook was not tested in this feasibility study.
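Once the time of the last post or retweet is known, the alerting step reduces to a date comparison. The sketch below assumes the timestamps have already been retrieved and are timezone-aware. How they are fetched from Twitter is not shown, and the function name, field names and handle are hypothetical.

```python
from datetime import datetime, timezone


def activity_alerts(reports, last_activity_by_handle):
    """Return (handle, last_seen) for every account whose most recent
    activity is later than the time the person was reported missing."""
    alerts = []
    for report in reports:
        handle = report["handle"]
        last_seen = last_activity_by_handle.get(handle)
        # No known activity, or activity only before the report: no alert.
        if last_seen is not None and last_seen > report["reported_missing"]:
            alerts.append((handle, last_seen))
    return alerts


# Invented example data.
reports = [
    {"handle": "@example_user",
     "reported_missing": datetime(2018, 9, 1, tzinfo=timezone.utc)},
]
last_activity = {"@example_user": datetime(2018, 9, 15, 8, 30, tzinfo=timezone.utc)}
print(activity_alerts(reports, last_activity))
```

An alert of this kind only shows that the account was used, not that the missing person used it, so it is a prompt for a manual check rather than a conclusion.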

Hypothesis 3: When no photo is added to the missing person report, photos from social media accounts (Twitter and Facebook) can be used to validate the age and gender of the missing person. If both Facebook and Twitter have a picture of a person, face recognition can be used to see if it is the same person.

This hypothesis was partly confirmed. It was tested with Twitter and it works: a picture can be scraped from Twitter and then passed to the facial recognition software to validate age and gender.

One caveat is that people do not always use a picture of their face as their Twitter profile picture. This was not tested for Facebook.

Next steps

The collaboration between ICMP and CFI successfully highlighted a number of areas where algorithmic assistance to humans can be deployed in the validation of missing person reports. ICMP will integrate these processes into its form validation, including image comparison, name-gender comparison, and the suggestion of age and gender from images. The name-gender comparison could be developed further by allowing names in Arabic to be included. Twitter has been integrated into the application, making both image retrieval and latest-activity retrieval possible; Facebook connections could be investigated further to potentially provide the same functionality.

The success of this collaboration is to be followed up with another project, Mass Grave Pattern Analysis. The aim of this project is to assist the ICMP in implementing an algorithm that suggests areas where mass graves are likely to be found.

Further collaboration can be explored in the future, potentially marrying current CFI projects with the needs of the ICMP. CFI has a number of micro-services that could be used: Text Analysis (the core of the Chatbot micro-service), which could assist in the analysis of news feeds; Rule Based Modelling, which helps connect data analysts with domain experts by allowing domain experts to interact with the data and data models in an accessible way; and the Cash Voucher tracking system, which has social network analysis at its heart.