Image Search Engine Using Python

Images often carry more information than audio or text, and image processing is a major field of research for robotics as well as search engines. In this article we will explore the concept of finding similarity between digital images using Python. Then we will use our program to find the top 10 search results inside a dataset of images for a given picture. It won't be as good as Google's search engine because of the technique we will be using to find similarity between images, but what we are going to make will still be pretty cool. So let's start.

Setting up the Environment

The code we are going to write requires a few tools which we need to install first. I will try to be as precise as I can, and if you get stuck installing a tool you can drop a comment below and I will help you sort out the problem. Here are the tools and the steps to install them on Ubuntu (16.04, but they should work on any version). The steps are not independent, so follow them in order.

  • Install virtualenv
    • sudo pip install virtualenv
  • Create environment directory and activate virtual environment.
    • mkdir env
    • virtualenv env
    • cd env
    • source bin/activate
  • Install Numpy
    • pip install numpy
  • Install Pillow (a fork of PIL, the Python Imaging Library)
    • pip install pillow
  • Install Tkinter (For GUI)
    • sudo apt-get install python-tk
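
Once the steps above are done, a quick sanity check can confirm that the packages are importable from inside the virtual environment. The following is only a minimal sketch: the helper name missing_modules is hypothetical, and the module names are assumptions about what the later code imports (Pillow is imported as PIL).

```python
import importlib

def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    # Tkinter is not listed here because its module name differs by version:
    # "Tkinter" on Python 2 and "tkinter" on Python 3.
    required = ["numpy", "PIL"]
    missing = missing_modules(required)
    print("Missing modules: %s" % (", ".join(missing) or "none"))
```

If the script reports a missing module, repeat the matching install step with the virtual environment activated.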

Our Algorithm

The approach we will be using finds the Euclidean distances between the color histograms of images. This is a very basic approach: it lets us search images by their colors and not by their features. So it is possible that with this approach the best search result for a zebra might be a yin-yang, but that depends on the dataset, and as a starting point this is not a bad approach at all. Let's see what the step-by-step process looks like:

  • Create the color histogram of each image in the dataset
  • Create the color histogram of the image to be searched
  • Calculate the Euclidean distance between the histogram of the image to be searched and the histogram of each image in the dataset.
  • Select the 10 smallest distances; those are the search results.

This might sound a little confusing because we haven't yet explained a couple of things in the algorithm, namely the color histogram and the Euclidean distance. Let us understand these things.

  • What is a color histogram?

    A color histogram counts, for each color channel of the image, how many pixels have each possible value of that channel. Consider a simple RGB image in which each pixel has one red, one green and one blue component, so each pixel can be represented by a tuple of 3 values (red, green, blue). Each of these values ranges from 0 to 255, and choosing a value for each of the 3 components forms a unique color. Thus (0,0,0) represents black and (255,255,255) represents white. There are 256 possible values for each color component, so we have a total of 768 possible component values (this is not the total number of colors, only the number of color component values).

    Now the color histogram for this image will be a table of 768 entries in which each entry contains the number of pixels having that component value.

  • What is Euclidean distance?

    It is nothing special, just the ordinary straight-line distance between 2 vectors. If you know about vectors then you probably know how the distance between 2 vectors is calculated. In case you don't, assume that we have 2 vectors p and q with N dimensions, so each vector has N components. The distance between these 2 vectors is the square root of the sum of the squared differences of their components:

    d(p, q) = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2}
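
To make the two ideas concrete, here is a small sketch in plain Python that builds a 768-entry histogram from a list of (red, green, blue) tuples and measures the distance between two histograms. The function names color_histogram and euclidean_distance and the two tiny "images" are made up for illustration. The layout (red counts first, then green, then blue) is chosen to match what Pillow's Image.histogram() returns for an RGB image.

```python
import math

def color_histogram(pixels):
    """Count channel values for a list of (red, green, blue) tuples.

    Returns 768 counts: entries 0-255 are red, 256-511 are green
    and 512-767 are blue.
    """
    hist = [0] * 768
    for red, green, blue in pixels:
        hist[red] += 1
        hist[256 + green] += 1
        hist[512 + blue] += 1
    return hist

def euclidean_distance(p, q):
    """Square root of the sum of squared component differences."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two tiny 4-pixel "images": one pure red, one pure blue.
hist_red = color_histogram([(255, 0, 0)] * 4)
hist_blue = color_histogram([(0, 0, 255)] * 4)

print(euclidean_distance(hist_red, hist_red))   # 0.0, identical histograms
print(euclidean_distance(hist_red, hist_blue))  # 8.0
```

The two histograms differ in four entries (red 255, red 0, blue 0 and blue 255), each by 4 pixels, which gives sqrt(4 * 16) = 8.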

How the code looks

So far we have set up our environment and learnt about the algorithm we are going to use. Now it's time to write the actual code. Create 2 files named hist.py and show_images.py. Below is the code for the hist.py file.

from PIL import Image
import numpy as np
import os
import show_images

DATASETDIR = '/path/to/your/dataset/directory/'

def perform_search(filename):
    # create an image object using Image.open for the given image
    im = Image.open(filename)

    # the histogram method of the image object builds the histogram for us;
    # we convert the resulting list to a numpy array to perform calculations
    search_histo = np.array(im.histogram())

    # create an empty list to store (name, distance) pairs
    dist = []

    # get all the files in the dataset directory
    files = os.listdir(DATASETDIR)

    # declare the structure of the data for the dist list;
    # it is only needed to sort by a named field using numpy
    # ('U100' keeps the file names as text on both Python 2 and Python 3)
    dtype = [('name', 'U100'), ('distance', float)]

    # now we calculate the Euclidean distance between search_histo and
    # the histogram of every image in the dataset
    for file in files:
        try:
            imob = Image.open(os.path.join(DATASETDIR, file))
            histo = np.array(imob.histogram())

            # Euclidean distance calculation
            diff = histo - search_histo
            sq = np.square(diff)
            total = np.sum(sq)
            result = np.sqrt(total)
            dist.append((file, result))
        except (IOError, ValueError):
            # IOError: the file could not be opened as an image
            # ValueError: histogram lengths differ (e.g. grayscale vs RGB)
            pass

    # convert our list to a numpy array with the given data type
    distance = np.array(dist, dtype=dtype)

    # sort the array in increasing order of distance
    sort_dist = np.sort(distance, order='distance')

    # take 11 entries and drop the first one: this assumes the searched image
    # is itself in the dataset, so it matches itself with distance 0
    top10 = sort_dist[:11]

    # show the result images in a window
    show_images.show_images(top10[1:])
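
The ranking step in hist.py can also be tried out without any image files or GUI. The sketch below is an illustration only, not the code above: the function name top_matches and the dictionary of ready-made histograms are hypothetical, and it uses the standard library's heapq.nsmallest instead of a numpy structured array to pick the smallest distances.

```python
import heapq
import math

def top_matches(query_hist, dataset_hists, k=10):
    """Return the k (name, distance) pairs closest to query_hist.

    dataset_hists maps an image name to its histogram (a list of counts).
    """
    scored = []
    for name, hist in dataset_hists.items():
        if len(hist) != len(query_hist):
            # skip histograms of a different length, as hist.py does
            continue
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(hist, query_hist)))
        scored.append((name, d))
    # nsmallest returns the results sorted from smallest to largest distance
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

# Toy 2-entry "histograms" instead of real 768-entry ones.
dataset = {"a": [0, 0], "b": [3, 4], "c": [1, 0], "d": [1, 2, 3]}
print(top_matches([0, 0], dataset, k=2))  # [('a', 0.0), ('c', 1.0)]
```

Here "d" is ignored because its histogram has a different length, and "b" (distance 5.0) falls outside the top 2.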

Let's build the GUI

We have our backbone ready. Now we just need to show the images, for which we require a GUI library. You can use PyQt or Tkinter, but for this article we are going to use Tkinter. If you followed the first section then you already have Tkinter installed on your system; if not, I suggest you install it first. Let's build our GUI program. The following code is for show_images.py.

try:
    from Tkinter import *  # Python 2
except ImportError:
    from tkinter import *  # Python 3
from PIL import Image, ImageTk
import os

DATASETDIR = '/path/to/your/dataset/directory/'

class MainFrame(Frame):
    def __init__(self, parent, *args, **kw):
        Frame.__init__(self, parent, *args, **kw)

        # create a canvas object and a vertical scrollbar for scrolling it
        vscrollbar = Scrollbar(self, orient=VERTICAL)
        vscrollbar.pack(fill=Y, side=RIGHT, expand=False)

        canvas = Canvas(self, bd=0, highlightthickness=0,
                        yscrollcommand=vscrollbar.set)
        canvas.pack(side=LEFT, fill=BOTH, expand=True)
        vscrollbar.config(command=canvas.yview)

        # reset the view
        canvas.yview_moveto(0)

        # create a frame inside the canvas which will be scrolled with it
        self.interior = interior = Frame(canvas)
        interior_id = canvas.create_window(0, 0, window=interior,
                                           anchor=NW)

        # track changes to the canvas and frame width and sync them,
        # also updating the scrollbar
        def _configure_interior(event):
            # update the scrollbars to match the size of the inner frame
            size = (interior.winfo_reqwidth(), interior.winfo_reqheight())
            canvas.config(scrollregion="0 0 %s %s" % size)
            if interior.winfo_reqwidth() != canvas.winfo_width():
                # update the canvas's width to fit the inner frame
                canvas.config(width=interior.winfo_reqwidth())
        interior.bind('<Configure>', _configure_interior)

        def _configure_canvas(event):
            if interior.winfo_reqwidth() != canvas.winfo_width():
                # update the inner frame's width to fill the canvas
                canvas.itemconfigure(interior_id, width=canvas.winfo_width())
        canvas.bind('<Configure>', _configure_canvas)


def show_images(top10, root=None):
    # create the main window unless the caller passed one in
    if root is None:
        root = Tk()
    root.title("Similar Images")
    root.imageframe = MainFrame(root)
    root.imageframe.pack(fill=BOTH, expand=True)

    # keep references to the PhotoImage objects; Tkinter does not hold them,
    # so without this list they could be garbage collected and show up blank
    imagetks = []

    size = 128, 128

    for r, image in enumerate(top10):
        # image is a (name, distance) record produced by hist.py
        img = Image.open(os.path.join(DATASETDIR, image[0]))
        img.thumbnail(size)
        imgtk = ImageTk.PhotoImage(img)
        imagetks.append(imgtk)
        panel = Label(root.imageframe.interior, image=imgtk)
        panel.grid(row=r, column=0)

    root.mainloop()

I wish I could explain why the GUI code looks the way it does, but that would take us off topic. I will consider writing another article on making GUI applications with Tkinter; until then you can use the above code as it is. Make sure you have specified the dataset directory in both program files. Now that we have our main program and our GUI program, it's time to test them.

  • Run the Python interpreter using the command python
  • Import our hist module using the statement import hist
  • Call hist.perform_search with the full path to the image to be searched
  • You should see a nice window showing the search results.

Additional Techniques

The technique we used above is pretty simple and does not take into account other features of images such as shape, orientation and scale. So you might be tempted to use a better search technique. I know 2 other methods, which involve using the following concepts.

That's it for now. If you encounter any problem or have a question, don't hesitate to drop a comment below. Your feedback is also valuable, so tell us your thoughts about the article in the comments.

Comments

  1. Hi. is this going to work on windows or only on ubuntu?

    1. The code will work on Windows too.
      Tkinter comes with Python, so there is no need to install it explicitly. Also, the sudo commands used for the environment setup won't work on Windows. But the code is good to go.

    2. Hi. Sorry for disturbing. Maybe i can get an email to further the conversation. Thank you

      Also, I tried running the code but i got an error message

      Traceback (most recent call last):
      File "C:\Users\DANIEL FAREMI\Desktop\CBIRNEW\show_images.py", line 7, in
      class MainFrame(Frame):
      File "C:\Users\DANIEL FAREMI\Desktop\CBIRNEW\show_images.py", line 59, in MainFrame
      for image in top10:
      NameError: name 'top10' is not defined

