#python #neural search #Jina #deep learning #clip #images

Image Encoders: BigTransfer vs CLIP

Published Sep 18, 2021 by Alex C-G


I’ve been mucking around with building a meme search engine using Jina. To do so I’m testing a couple of different image encoders.

In essence, these use a neural network to turn an image file into a vector embedding that can be compared for a similarity (“nearest neighbor”) search. Which one is best (at least for memes)? Let’s put them to the test. We’ll index 10,000 memes with each encoder and compare indexing time, index size, and search results.
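To make the “nearest neighbor” idea concrete, here’s a minimal sketch of how embeddings get compared, using cosine similarity over toy 2-D vectors. (This is just an illustration of the principle; Jina handles this step for you under the hood.)

```python
import numpy as np

def nearest_neighbors(query_vec, index_vecs, k=3):
    """Return indices of the k most similar vectors by cosine similarity."""
    index_vecs = np.asarray(index_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # Normalize so the dot product equals cosine similarity
    index_norm = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    sims = index_norm @ query_norm
    # Highest similarity first
    return np.argsort(-sims)[:k]

# Toy 2-D "embeddings": the first vector points almost the same way as the query
index = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(nearest_neighbors([1.0, 0.1], index, k=2))  # → [0 2]
```

Real encoders produce vectors with hundreds of dimensions, but the comparison works the same way.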

The code for testing this is in my simple Jina search examples repo. All I did was swap out some code to switch models. Full code is in the repo; these are just the swapped-out bits:

BigTransfer

flow = (
    Flow()
    .add(
        uses="jinahub+docker://BigTransferEncoder",
        uses_with={"model_name": "Imagenet1k/R50x1", "model_path": "model"},
    )
)

CLIP

flow = (
    Flow()
    .add(
        uses="jinahub+docker://CLIPImageEncoder",
    )
)
| Model/query image | CLIP | BigTransfer | Winner |
|---|---|---|---|
| Time to index (via `time python app.py index`) | 3:24 | 1:33 | BigTransfer |
| Index size (via `du -hs workspace`) | 111 MB | 113 MB | Too close to call |

(The per-query-image comparisons were shown as result images in the original post; of those rows, the winner cells read 🤷, 🤷, 🤷, and BigTransfer.)
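For the timing row I just wrapped the whole script in `time`. If you’d rather time only the indexing step from inside Python, a minimal sketch (the workload below is a stand-in, not the actual indexing code):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) — an in-process
    alternative to wrapping the whole script in `time python app.py index`."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload in place of the real indexing call
result, elapsed = timed(sum, range(1_000_000))
print(f"indexed in {elapsed:.2f}s")
```

Wall-clock numbers like these are rough (docker startup, disk caching, etc. all bleed in), so treat the 3:24 vs 1:33 gap as indicative rather than a benchmark.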

So, what have we learned?

Not much really.

This is just how well they perform on a folder of memes though - and memes with similar templates are very similar to each other (otherwise they wouldn’t really be memes, just random photos with some Impact font over the top). The models may perform very differently on a dataset of personal photos where variation would be greater.

Next time maybe I’ll test the meme dataset by searching with variations of the Doge meme. There are plenty of variations, and I haven’t seen any of them in the dataset so far. So whichever model matches the classic Doge meme more closely would be the winner.

Testing notes



*****

© 2018-2021, Alex Cureton-Griffiths | Pudhina Fresh theme for Jekyll.