How Apple Recognizes People in Private Photos via Machine Learning



I explain Artificial Intelligence terms and news to non-experts.

In a recent publication, Apple explained how it uses machine learning to recognize people directly in the private photos on your iPhone and iPad, without ever accessing your images to train its algorithms.

For those of you with Apple products, you can actually search by person in the Photos app.

Indeed, using multiple machine learning-based algorithms that I will cover in this article, all running privately on your device, iOS 15 can accurately curate and organize your images and videos.

It recognizes the different people in your library and lets you search your pictures for the ones where a given person appears. If you have thousands of photos like I do, you will already have different clusters, each representing a different person.

For example, one such cluster could be all the photos your friend John appears in, so that you can name it “John” and then search for images of John to have them appear automatically.

Watch the video to learn more!


References:

► Read the Full article: https://www.louisbouchard.ai/how-apple-photos-recognizes-people/

► Apple, "Recognizing People in Photos Through Private On-Device Machine Learning", (2021), https://machinelearning.apple.com/research/recognizing-people-photos

► My Newsletter (a new AI application explained weekly in your inbox!): https://www.louisbouchard.ai/newsletter/

Video Transcript

00:00

In a recent publication, Apple explained how they used machine learning to directly recognize

00:04

people in private photos on your iPhones and iPads without having access to your images

00:09

to train their algorithms.

00:11

I personally do not have an iPhone, so I cannot test it myself, but I am looking for an iPad

00:15

to draw explanations and write math equations and stream it during calls.

00:19

If some of you guys use tablets to do that, please let me know what you think is the best

00:24

to get!

00:25

For those of you with Apple products, you can actually search by person in the

00:28

Photos app.

00:30

Indeed, using multiple machine learning-based algorithms that I will cover in this video,

00:34

running privately on your device, you are able to accurately curate and organize your

00:39

images and videos on iOS 15.

00:42

It will recognize the different people and let you search your pictures for where

00:45

the person appears.

00:47

If you have thousands of photos like I do, you will already have different clusters each

00:50

representing different people.

00:52

For example, one such cluster could be all the photos your friend John appears in, so

00:57

that you can name it "John" and then search for images of John in your pictures to have

01:02

them appear automatically.

01:04

It can even recognize photos where the same people frequently appear, even if it doesn't

01:08

know the people individually or hasn't been directly trained on them, and uses that to share

01:13

memories like the "Together" feature shown here.

01:16

This is a super cool built-in application by Apple, and the best is that it even works

01:21

when the face is occluded or sideways, as we will see.

01:24

As I said, it seems to work really well.

01:27

It entirely runs on your device privately, and they are always improving the algorithms,

01:32

but it's even cooler to know how it works, so let's dive into it!

01:36

This task of recognizing people in your own picture is extremely challenging because of

01:40

the variability your photos will have.

01:43

Different people, different angles, different scales, different lightings, occlusions because

01:47

your friend was catching a football, or even shots taken with different cameras.

01:52

If we relied strictly on the person's face, the results would be pretty incomplete

01:57

as most of our pictures taken on the spot during an event aren't perfect images with

02:02

your friends smiling in front of the camera.

02:04

When you type in John, you'd like to see these events where John won the game by catching

02:09

this ball.

02:10

To tackle this, they start by locating the faces and upper bodies of people visible in

02:15

the image using a first detection algorithm.

02:18

This algorithm was trained on many labeled human examples annotated with where the bodies

02:22

and the faces were.

02:24

This means they trained a deep neural network with images sent as inputs, and the outputs

02:28

were only the cropped version of the image with either the bodies or faces of the people.

02:34

This is done by feeding many examples to the network, helping it learn where to focus

02:38

its attention using the correctly identified sections.

02:42

This way, it can iteratively learn to find these body parts by itself if we show

02:47

it enough examples during training.
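One standard ingredient when training and evaluating such box detectors is intersection over union (IoU), which scores how well a predicted box matches an annotated one. Apple's post doesn't give its training details, so the sketch below is just the generic metric, not their actual loss:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1].
    1.0 means a perfect match with the annotation, 0.0 means no overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis (clamped at zero when the boxes are disjoint)
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A detector's predicted face or body box is typically counted as correct when its IoU with the labeled box exceeds some threshold (0.5 is a common choice).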


02:51

By the way, if you find this interesting, don't forget to subscribe, like the video,

02:53

and share it with your friends or colleagues, it helps a lot!

02:57

Thank you!

02:58

Then, they match the bodies and faces of each individual to have even more data about the

03:03

person in case only one of the two appears in a future image.
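Apple hasn't published the exact rule it uses to associate a detected face with a detected upper body, but a simple geometric heuristic conveys the idea. The containment test below is purely illustrative:

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

def pair_faces_to_bodies(
    faces: List[Box], bodies: List[Box]
) -> List[Tuple[Box, Optional[Box]]]:
    """Pair each face box with the first upper-body box containing its centre.
    Hypothetical heuristic: Apple's real pipeline learns this association."""
    pairs = []
    for fx, fy, fw, fh in faces:
        cx, cy = fx + fw / 2, fy + fh / 2  # face centre
        match = None
        for body in bodies:
            bx, by, bw, bh = body
            if bx <= cx <= bx + bw and by <= cy <= by + bh:
                match = body
                break
        pairs.append(((fx, fy, fw, fh), match))  # None if no body contains it
    return pairs
```

A face with no matching body (or vice versa) can still contribute on its own, which is exactly why both are kept.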

03:06

You can see here that both the body and face are sent into a separate model that encodes

03:12

the information, creating embeddings.

03:14

These embeddings are simply the most valuable information about the face and body of the

03:18

person.

03:19

Here, we use another network to encode the information because we want our embeddings

03:23

to be similar for the same person and different for different individuals.

03:27

This is again done with another model that looks like this, inspired by MobileNet,

03:32

which I talked about in my convolutional neural network video.

03:36

It is a lightweight convolutional neural network that can run extremely efficiently, made for

03:41

mobile devices instead of GPUs.

03:43

If you are not familiar with CNNs, I strongly invite you to watch the video I made explaining

03:48

them simply.

03:49

Basically, it takes the cropped images and compresses the information in a smaller space

03:53

focusing on the most interesting details about the individual.

03:57

This is possible because such a model was trained on a lot of images to do exactly that.
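The key property of these embeddings is that vectors from the same person end up close together and vectors from different people end up far apart, which is measured with a distance or similarity function. The post doesn't specify which metric Apple uses, so cosine similarity below is just a common, illustrative choice:

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity between two embedding vectors: 1.0 means identical
    direction, 0.0 means unrelated. Real embeddings would come from
    the MobileNet-style CNN encoder described above."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Two crops of John should score near 1.0 under such a metric, while a crop of John against a crop of a stranger should score much lower.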

04:02

Then, these embeddings are merged and saved in your phone's gallery unless they have poor

04:07

responses.

04:09

These poor responses may come from unclear faces or upper bodies and would be automatically

04:14

filtered out.

04:15

This is repeated with all your pictures to create clusters out of these embeddings.

04:20

These clusters will be the different people identified.

04:22

It merges all similar embeddings into small groups, where each group is a specific individual.

04:28

So this is the step where all the pictures where John was identified are put into a gallery.
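Apple doesn't describe which clustering algorithm it uses, but a simple leader-style clustering over the embeddings illustrates the idea: each embedding joins the nearest existing cluster if it is close enough, and otherwise starts a new one. The threshold value here is made up for illustration:

```python
import math
from typing import List

def cluster_embeddings(
    embeddings: List[List[float]], threshold: float
) -> List[List[List[float]]]:
    """Greedy leader clustering: join the closest cluster whose centroid
    is within `threshold` (Euclidean distance), else open a new cluster."""
    clusters: List[List[List[float]]] = []
    for emb in embeddings:
        best, best_dist = None, threshold
        for i, members in enumerate(clusters):
            # Centroid of the cluster, dimension by dimension
            centroid = [sum(dim) / len(members) for dim in zip(*members)]
            d = math.dist(emb, centroid)
            if d <= best_dist:
                best, best_dist = i, d
        if best is None:
            clusters.append([emb])      # a new person
        else:
            clusters[best].append(emb)  # same person seen again
    return clusters
```

Each resulting cluster corresponds to one person, ready to be named "John" or whoever it turns out to be.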

04:33

And what's cool is that this automatically runs at night while your phone charges

04:38

and keeps on improving as you take more pictures.

04:42

So once these clusters are created, your new photos containing people are sent to the same

04:46

deep network to create a new embedding per person in the image.

04:50

This new embedding will either join an existing cluster if a match is found or create a new one based

04:55

on the difference between the embeddings you have in your phone and the new picture's embeddings.

05:00

Here, to find whether it is the same person or not, they focus primarily on the face.

05:04

If it's occluded or sideways, it uses the upper body coupled with what we have from

05:09

the face and takes the time of the photo into account to measure if the clothing could be

05:13

the same or different.

05:15

As you may suspect, the upper body isn't always helpful.

05:18

As they say, "We’ve carefully tuned the set of face and upper body distance thresholds

05:23

to get the most out of the upper body embedding without negatively impacting overall accuracy."
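That matching logic, face distance first, upper body only as a fallback with its own stricter threshold, might look something like the sketch below. The threshold values are invented for illustration; Apple's tuned values are not public:

```python
import math
from typing import List, Optional, Tuple

FACE_THRESHOLD = 0.8  # illustrative only: max face-embedding distance to match
BODY_THRESHOLD = 0.5  # stricter, since the upper body is less reliable

Cluster = Tuple[List[float], List[float]]  # (face centroid, body centroid)

def assign_to_cluster(
    face_emb: Optional[List[float]],
    body_emb: Optional[List[float]],
    clusters: List[Cluster],
) -> Optional[int]:
    """Return the index of the matching cluster, or None to start a new one."""
    for i, (face_c, body_c) in enumerate(clusters):
        # Primary signal: the face embedding, when a face was visible
        if face_emb is not None and math.dist(face_emb, face_c) <= FACE_THRESHOLD:
            return i
        # Fallback when the face is occluded or sideways: the upper body
        if face_emb is None and body_emb is not None:
            if math.dist(body_emb, body_c) <= BODY_THRESHOLD:
                return i
    return None
```

A real system would also weigh the photo's timestamp, as mentioned above, since clothing is only a useful cue within a single day or event.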

05:29

And this is how Photos groups your friends within the application without you knowing

05:34

it!

05:36

Another concern is that they want to offer the same experience to all Apple users no

05:40

matter the photographic subject’s skin color, age, or gender.

05:44

It is great that they keep on improving the generalization and working to remove these

05:48

biases from their algorithm the best they can using the broadest datasets possible and

05:53

data augmentation to add variations to the training images.
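Data augmentation here just means synthetically varying the training images so the model sees more diverse examples. Real pipelines use rich transforms (crops, rotations, color jitter); the toy sketch below, on a grayscale image represented as a list of pixel rows, only shows the principle:

```python
import random
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1]; a stand-in for photos

def augment(image: Image, rng: random.Random) -> Image:
    """Toy augmentation: random horizontal flip plus a brightness shift."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]  # horizontal flip
    shift = rng.uniform(-0.1, 0.1)        # brightness jitter
    # Clamp every pixel back into the valid [0, 1] range
    out = [[min(1.0, max(0.0, p + shift)) for p in row] for row in out]
    return out
```

Feeding many such variations of each labeled photo during training helps the network generalize across lighting, pose, and camera differences.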

05:57

If you have an iPhone or iPad, please let me know what you think of this feature in

06:01

the Photos app and how well it works!

06:04

Thank you for watching!         
