Mini-AttraSeek: Image Search Engine for a Website

 http://attraseek.com

 

 

This whitepaper takes a detailed look at AttraSeek, Inc.’s product, Mini-AttraSeek, an image-recognition search engine that searches a company’s website via images directly (not keywords). Some images, such as graphs, charts, engineering/architectural designs, art, trademarks, patent drawings, maps, ads, and copyrighted images, cannot be accurately described by keywords. In these cases, searching via an image directly gives more relevant results than searching via keywords.

 

2008

 

AttraSeek, Inc., P. O. Box 13051, Savannah, GA, 31406, USA

9/1/2008

 


 


Contents

Executive Summary

1. Industry Trends

2. Business Challenge

3. Solution Description

3.1 About AttraSeek Technology

3.2 Mini-AttraSeek Subscription

3.3 Subscription Fee Structure

3.4 How Mini-AttraSeek Works

3.5 Matching Scores

3.6 Improvement

4. Solution Benefits

5. Preparing Mini-AttraSeek Deployment

5.1 Mini-AttraSeek Basics

5.2 Organize Your Web Contents

5.3 Image List & Image Signature

5.4 Image Matching Thread

5.5 Search Engine Web Page

6. Technical Specifications

6.1 Software and Hardware Requirements

6.2 Logistics

7. Scalability

7.1 Number of Servers

7.2 Sequential Search vs. Binary Search

8. Target Market

9. Customizations

9.1 Multiple Signatures for an Image

9.2 Image Variation

9.3 Caching

10. Summary

Contact Us

More Information

References

 


 

Executive Summary

 

AttraSeek, Inc. is an image-recognition search-engine company with proven core technology. It has two products: the AttraSeek search engine and the Mini-AttraSeek search engine.

Mini-AttraSeek is the right tool for you if your company deals with many images that your clients have trouble finding with a keyword search engine, such as graphs, charts, engineering/architectural designs, trademarks, patent drawings, maps, copyrighted images, ads, art, car wheel rims, and other auto parts.

Mini-AttraSeek searches a company’s website via images, not keywords. Many websites today have a “Mini-Google”, i.e. a mini-search engine just for a particular site. These sites might consider adding a “Mini-AttraSeek”.

Image search accounts for 10 percent of Google’s traffic and is growing 100 percent year-over-year (exponential growth), yet images have traditionally never been searched via the images themselves. Images are simply not keywords, and some images cannot be accurately described by keywords.

 

For a demo of AttraSeek, go to http://www.attraseek.com/.

To get a Test Image:

 

To make a Search:

 

AttraSeek, Inc. is a spinout company from Attrasoft, Inc., which developed the image recognition software.

Customer Quote: “We love Attrasoft.” Peter A. Andrell III, CTO, TNS Media Intelligence.

 

 

1. Industry Trends

 

Image search accounts for 10% of Google’s traffic and is growing 100% year-over-year (exponential growth), yet images have traditionally never been searched via the images themselves. The market for an image-recognition search engine is being driven by the projected exponential growth in the number of images and videos. There is broad agreement that sophisticated image-recognition products will play a major role in managing this growth. The search engine of the future will handle keywords, meanings specified by sentences, images, videos, audio, and combinations of all of these.

It is under these conditions that the Mini-AttraSeek search engine was created; it searches a website with images directly.

2. Business Challenge

 

Today, one out of ten searches is an image search, yet image search is still limited to keywords. The problems are:

 

AttraSeek’s main image recognition search engine is crawling the Internet to accumulate web content. Meanwhile, if your company needs an image recognition search engine now, Mini-AttraSeek is the solution ready for you. To order Mini-AttraSeek, contact gina@attrasoft.com.

3. Solution Description

3.1 About AttraSeek Technology

 

AttraSeek searches the Internet via images directly, instead of with keywords. AttraSeek is an image-recognition search-engine company with proven core technology. Google, Yahoo, MSN, and other search engines all search images via keywords. AttraSeek attempts to fill a technology void: searching images with images.

AttraSeek, Inc., founded in May 2008, is a privately owned, early-stage company offering a proprietary image-recognition search engine (patent pending), which searches the Internet via images instead of keywords. The company is a spinout from Attrasoft, Inc., which developed the image recognition software.

AttraSeek’s Competitive Advantages are:

 

3.2 Mini-AttraSeek Subscription

 

AttraSeek, Inc. offers these products/services:

  1. AttraSeek searches the Internet via images, not keywords.
  2. Mini-AttraSeek searches a company’s website via images, not keywords. Many websites today have a “Mini-Google”, i.e. a mini-search engine just for a particular site. These sites might consider adding a “Mini-AttraSeek”.
  3. Customized Mini-AttraSeek addresses a client’s special needs.
  4. Hosting Mini-AttraSeek helps a company to avoid image-recognition search-engine management problems.

 

This whitepaper takes a detailed look at AttraSeek, Inc.’s product, Mini-AttraSeek, an image-recognition search engine that searches a website via images directly (not keywords).

Mini-AttraSeek has an initial setup fee and first-year support fee, plus an annual subscription fee.  The annual subscription fee after the first year includes the annual support.

 

3.3 Subscription Fee Structure

 

Please contact gina@attrasoft.com.

 

3.4 How Mini-AttraSeek Works

 

From the user’s point of view, Mini-AttraSeek performs an image search in three clicks once the search page is open.

A user will:

(1) Start Mini-AttraSeek (Figure 3.1). (For example, http://www.attraseek.com)

Figure 3.1 AttraSeek.

(2) Click the Browse button to select an image (Figure 3.1).

Figure 3.2 Uploaded image of the Google logo.

 

(3) Click the Upload button to upload the image (Figure 3.2).

 

Figure 3.3 Display the results.

(4) Click the Search button to search (Figure 3.3).

There are three potential outcomes:

·         Find Match: display the page.

·         Find No Match: display the No Match page.

·         Time Out: display the timeout message.

 

3.5 Matching Scores

 

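Matching scores are illustrated by the following table (images not reproduced here).

Table columns: Score, Images, Comments. The first row shows the original image; the remaining rows show matched images with scores of 100, 96, 95, 94, 92, 88, 86, 85, 84, and 83.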

3.6 Improvement

 

Mini-AttraSeek’s accuracy can continue to improve after deployment via additional training. The client can submit pairs of images (a library image and an image that was falsely rejected as a match for it) to AttraSeek, Inc. for additional training.

4. Solution Benefits

 

AttraSeek, Inc.’s core technology consists of proprietary algorithms that are uniquely suited to image recognition. Today, the patent-pending recognition engine powers a number of production systems. One large-scale implementation has been in production at TNS Media Intelligence since January 2005. In this solution, the matching engine automatically matches ads in magazines and newspapers with ad images in a large-scale database. This technology reduced the number of employees needed for manual matching and significantly increased the productivity of the remaining employees. TNS was also able to enlarge its service areas and enter new markets that had been out of reach due to the prohibitive cost of manual labor. Today, TNS is worth $1.87 billion [1, 2].

The number of images on the Internet grows exponentially [3]. Image search accounts for 10% of Google’s traffic [4] and is growing 100% year-over-year [4], yet images have traditionally never been searched via the images themselves [5, 6, 7]. One can assume many applications need image searches:

 

 

Considering the size of the search engine market [8,9,10], searching the Internet with images, videos, audio, and semantics will logically be the next step.

5. Preparing Mini-AttraSeek Deployment

 

This section provides an overview of Mini-AttraSeek so that you have a basic idea of how everything fits together, along with a high-level description of the preparation work.

5.1 Mini-AttraSeek Basics

 

Mini-AttraSeek has the following components:

 

Search Engine Page

This module takes a user’s input image, submits the image, and waits for an output page. Once the output page is ready, it is returned to the user and the job is done. This module can be customized.

Matching Engine Thread

The matching engine thread watches a server location for new images. Once a new image is found, it performs a 1:N match against the existing library and produces an output.

Web Crawler

The web crawler is an independent module. It browses the World Wide Web in a methodical, automated manner and deposits the results on the hard drive in exactly the same layout as on the web server, with all virtual directories turned into real directories. The deposited files include HTML files and image files.

Web Crawler Interface (WCI)

The Web Crawler Interface (WCI) converts website data into an image list, which is a text file listing the images in the website. This is an intermediate file for computing image signatures.

 

Image Signature Computing

This module computes image signatures.

 

Figure 5.1 Organize your contents.

5.2 Organize Your Web Contents

 

Presumably, as a webmaster, you already have all of the contents of your website, so you do not need to crawl your own site. If you can avoid using the web crawler, do so, because web crawling can introduce some corrupted data. Once your contents are organized, they are first converted into an image list, which is an intermediate step.
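As a rough illustration of this intermediate step, the sketch below walks a content folder and writes one image path per line to a text file. It is not the Web Crawler Interface itself: the function names, the list of file extensions, and the one-path-per-line format are all assumptions made for the example.

```python
import os

# Hypothetical set of extensions treated as images; adjust to your site.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def build_image_list(content_root):
    """Return sorted paths (relative to content_root) of all image files."""
    images = []
    for folder, _subfolders, files in os.walk(content_root):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                full_path = os.path.join(folder, name)
                images.append(os.path.relpath(full_path, content_root))
    return sorted(images)


def write_image_list(content_root, list_path):
    """Write one image path per line to a text file; return the image count."""
    images = build_image_list(content_root)
    with open(list_path, "w", encoding="utf-8") as out:
        for path in images:
            out.write(path + "\n")
    return len(images)
```

Calling write_image_list on the folder that holds your organized contents would produce a text file that a later signature-computing step could read line by line.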

5.3 Image List & Image Signature

 

Mini-AttraSeek uses image signatures to find and match images. The unique attributes of an image (its “signature” or “fingerprint”) are compared against a library of signatures that has been loaded on the Internet server. All of the images in your website will need to be converted into image signatures via a fixed procedure.

 

Figure 5.2 Images are converted into image signatures.

Once you organize your content, you will get an image list from your website as an intermediate step. From this list, you compute the image signatures, which are stored in a library file.
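The signature algorithm itself is proprietary and is not described in this whitepaper. Purely to make the idea of "image in, compact signature out" concrete, the sketch below uses a well-known stand-in technique, an average hash over a small grayscale grid. The function names and the in-memory pixel format are assumptions for the example and say nothing about how Mini-AttraSeek actually computes its signatures.

```python
def average_hash(pixels):
    """Stand-in signature: one bit per pixel, set when brighter than the mean.

    `pixels` is a small grayscale image given as a list of rows of 0-255 ints.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def build_library(images):
    """Map each image name to its signature; `images` is {name: pixels}."""
    return {name: average_hash(pixels) for name, pixels in images.items()}
```

With this stand-in, a uniformly darker or lighter copy of an image tends to keep the same signature, which is the kind of robustness a signature is meant to provide.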

5.4 Image Matching Thread

 

The image matching algorithms that create and match the signatures are our strength (see Figure 5.3).

Figure 5.3 Image Matching.

An image matching thread continuously matches images (see Figure 5.4). This module performs the following steps:

·         Looks for an image;

·         Computes the image signature;

·         Matches the new signature against the library; and

·         Deposits results.
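The four steps above can be sketched as a single pass over an inbox folder. This is an illustration under stated assumptions, not the product's code: the folder names, the percentage-agreement score, the threshold, the tab-separated result format, and the injected compute_signature callable are all hypothetical.

```python
import os


def similarity(sig_a, sig_b):
    """Percentage (0-100) of positions where two equal-length signatures agree."""
    same = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return round(100 * same / len(sig_a))


def process_next_image(inbox, outbox, library, compute_signature, threshold=80):
    """Run one pass of the four steps; return the processed file name or None."""
    pending = sorted(os.listdir(inbox))
    if not pending:
        return None                                    # step 1: no image waiting
    name = pending[0]
    query_path = os.path.join(inbox, name)
    signature = compute_signature(query_path)          # step 2: compute signature
    hits = sorted(                                     # step 3: 1:N match
        ((similarity(signature, sig), key) for key, sig in library.items()),
        reverse=True,
    )
    hits = [(score, key) for score, key in hits if score >= threshold]
    result_path = os.path.join(outbox, name + ".txt")
    with open(result_path, "w", encoding="utf-8") as out:   # step 4: deposit results
        for score, key in hits:
            out.write(f"{score}\t{key}\n")
    os.remove(query_path)   # remove the query so it is not matched twice
    return name
```

A continuously running thread would call process_next_image in a loop, sleeping briefly whenever it returns None.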

 

Figure 5.4 Image Matching Thread.

5.5 Search Engine Web Page

 

The design of this page, Default.aspx, is given in Figure 3.1. The search page process is:

  1. Default.aspx uploads an image in response to a user query;
  2. The image matching thread finds the image;
  3. The image matching thread performs a 1:N match and removes the image;
  4. The image matching thread deposits a result;
  5. Default.aspx finds the result and returns it to the user.
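The last step of the process above, together with the three outcomes listed in Section 3.4 (match, no match, timeout), can be sketched as a polling loop. The actual page is an ASP.NET page; this Python sketch only illustrates the control flow, and the result-file convention (an empty file means no match), the timeout, and the polling interval are assumptions.

```python
import os
import time


def wait_for_result(result_path, timeout_seconds=10.0, poll_seconds=0.1):
    """Poll for the result file deposited by the matching thread.

    Returns ("match", lines), ("no_match", []), or ("timeout", []).
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if os.path.exists(result_path):
            with open(result_path, encoding="utf-8") as result:
                lines = [line.strip() for line in result if line.strip()]
            return ("match", lines) if lines else ("no_match", [])
        time.sleep(poll_seconds)
    return ("timeout", [])
```

A production version would also need to guard against reading a result file that is still being written, for example by having the matching thread write to a temporary name and rename it when done.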

6. Technical Specifications

6.1 Software and Hardware Requirements

 

To run Mini-AttraSeek, you must have:

 

Web applications require a server and a client. The client requests a page from the server and the server returns the page to the client, where the page is displayed inside Internet Explorer. Mini-AttraSeek is not PC software. To install Mini-AttraSeek, you must have a server.  Mini-AttraSeek requires Microsoft IIS (Internet Information Services).  Mini-AttraSeek also requires cookies enabled in the client’s Internet Explorer.

32-bit computing has a limit of 2 GB of RAM per process. The RAM requirement is 3 KB per image. Assuming the server software takes 0.5 GB, the remaining 1.5 GB allows about 500,000 images; therefore, whenever you have more than 500,000 images, you will need to consider 64-bit computing, for example using an Intel Xeon processor, which has a limit of 1024 GB. Adding 0.5 GB for the operating system, here are the requirements:

 

 

Images         RAM            Comments
500,000        2 GB           32-bit or 64-bit
1,000,000      3.5 GB         64-bit only
10,000,000     30 GB          64-bit only
50,000,000     128 GB         64-bit only
500,000,000    1024 GB        64-bit only
500,000,000    10 x 128 GB    64-bit only
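The sizing rule stated above (3 KB per image plus 0.5 GB of overhead, with a 2 GB ceiling for 32-bit) can be written as a small calculation. This is a sketch that reads "3K" as 3,000 bytes and "GB" as 10^9 bytes; the function and constant names are made up for the example. It reproduces the first two table rows exactly and gives 30.5 GB for 10,000,000 images; the larger rows do not follow directly from this formula (it gives 150.5 GB for 50,000,000 images), so treat them as the vendor's figures rather than outputs of this calculation.

```python
BYTES_PER_IMAGE = 3_000    # "3K per image" from the text, read as 3,000 bytes
OVERHEAD_GB = 0.5          # operating system / server software allowance
LIMIT_32BIT_GB = 2.0       # 32-bit limit quoted in the text


def ram_required_gb(image_count):
    """Estimated RAM in GB for a library of image_count signatures."""
    return OVERHEAD_GB + image_count * BYTES_PER_IMAGE / 1e9


def needs_64_bit(image_count):
    """True when the estimate exceeds the 32-bit limit."""
    return ram_required_gb(image_count) > LIMIT_32BIT_GB
```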

 

6.2 Logistics

 

Figure 6.1 Installation procedure consists of six steps.

 

 

There are two components to install to a server:

 

The installation procedure consists of the following steps:

 

·         Install Software

·         Configure the Web Server

·         Adjust Parameters

·         Organize Website Contents

·         Compute Image Signatures

·         Start Matching Thread(s)

 

The entire procedure is listed below:

1. Install Software

2. Set Internet Service Manager

Step 2A. Make Mini-AttraSeek an IIS Application

Step 2B. Select ASP.NET 2.0

Step 2C. Web Sharing

Step 2D. IIS Security

Step 2E. ASPNET Process Rights

3. Files To Be Adjusted

4. Organize Website Data

5. Signature Computing

Step 5A. Set Up

Step 5B. Get Image List

Step 5C. Compute Signatures

Step 5D. Move Signature File to the Correct Location

6. Start the Matching Engine Thread

AttraSeek, Inc. will work with you closely and walk you through the process step by step.

7. Scalability

 

The scalability issue is addressed in two ways:

·         Servers

·         Sorting

7.1 Number of Servers

 

The sole purpose of the Search Engine Web Page is:

·         To submit an image; and

·         To look for an output page. Once the output page is ready, it is returned to the user and the job is done.

 

In the back end, a matching thread looks for submitted images. Once the thread finds a new image, it performs a 1:N match against the existing library and produces an output. There is no limit on how many search engine computers can be run for this purpose.

There are three configurations:

·         Single Server

·         Multiple Threads

·         Multiple Layers

 

The Single-server configuration uses a single web server to host both the front web page and the matching engine (Figure 7.1).

 

The Multiple-thread configuration uses a one-to-many mapping between the front web page and the back-end folders.

 

In the Multiple-layer configuration, the text file outputs from the matching engines are combined on a server and returned to the front web page. The front page uses a one-to-many mapping between the front web page and the back-end threads.
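Combining the outputs of several matching engines, as in the Multiple-layer configuration, amounts to merging ranked lists. The sketch below assumes each engine's text output has one "score<TAB>image" line per hit; that format and the function names are hypothetical, since the whitepaper does not specify the file layout.

```python
def parse_result_text(text):
    """Parse 'score<TAB>image' lines into a list of (score, image) tuples."""
    hits = []
    for line in text.splitlines():
        if line.strip():
            score, image = line.split("\t", 1)
            hits.append((int(score), image))
    return hits


def merge_results(per_engine_results, top_n=10):
    """Combine (score, image) lists from several engines into one ranked list."""
    combined = [hit for results in per_engine_results for hit in results]
    combined.sort(key=lambda hit: hit[0], reverse=True)   # highest score first
    return combined[:top_n]
```

Because each engine searches its own part of the library, the merge only needs the per-engine hit lists, not the signatures themselves.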

 

 

Figure 7.1 Mini-AttraSeek configurations.

 

7.2 Sequential Search vs. Binary Search

 

Mini-AttraSeek supports four different search settings:

Sequential Search (Default)

 

Binary Search

 

Sequential Search is the default setting.

Binary Search can be significantly faster, however:

 

The computation time for both sequential search and binary search for each query is:

                T = T1 + T2 + T3.

Here (for both sequential search and binary search):

 

Example 1.  Sequential Search

Assume 500,000 images are in the library. Then:

                T1 = 0.5, T2 = 1, T3 = 0.

Added together, the computation time per query is:

T = 0.5 + 1 + 0 = 1.5 (seconds).

This is obviously slower than Google’s 0.2 seconds. However, Mini-AttraSeek’s 1.5 seconds buys a razor-sharp focus: it is a vast improvement on manually searching through the first 10 pages of keyword results (i.e., 10 to 20 minutes).

Example 2.  Binary Search.

Let’s assume:

50,000,000 images are in the library;

1% of the 50 million images are actually used, i.e. 500,000 images are actually used;

Now:

                T1 = 0.5, T2 = 1, T3 = 0.

Added together, the computation time per query is:

T = 0.5 + 1 + 0 = 1.5 (seconds).
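The whitepaper does not describe how its binary search is implemented. As a generic illustration of why a sorted library lets a query touch only a small fraction of the entries (1% in Example 2), the sketch below compares a full scan with two binary searches over a sorted list. The scalar sort key and the tolerance window are hypothetical; real signatures would need some such ordering to be defined.

```python
from bisect import bisect_left, bisect_right


def sequential_candidates(sorted_keys, query_key, tolerance):
    """Examine every entry: N comparisons."""
    return [
        index
        for index, key in enumerate(sorted_keys)
        if abs(key - query_key) <= tolerance
    ]


def binary_candidates(sorted_keys, query_key, tolerance):
    """Locate the same window with two binary searches: about 2*log2(N) steps."""
    low = bisect_left(sorted_keys, query_key - tolerance)
    high = bisect_right(sorted_keys, query_key + tolerance)
    return list(range(low, high))
```

Both functions return the same candidate indexes; only the candidates would then go through the full signature comparison.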

8. Target Market

 

The potential customers for Mini-AttraSeek are:

 

S&P 500 Companies

Example 1: Boeing has many engineering drawings, and might need an image-recognition search engine to search through designs.

Example 2: GM has many auto parts, and might need an image-recognition search engine for part identification (wheel rims, bumpers) and location.

Example 3: Google can display relevant advertisements based on images.

Top 500 Companies in the World

See the examples above.

 

US Top 500 Websites

Example: Some websites, such as flickr.com (owned by Yahoo), NASA, and map sites, have billions of images and might need queries based on images in addition to keywords.

 

Government

Example: Roughly two hundred countries in the world need to check existing trademarks before granting a new trademark.

 

Companies with a large number of Images

Companies with a large number of images include law firms with digitized legal documents and businesses dealing in auto parts (wheel rims, bumpers, …), sports cards (baseball cards, …), collectibles (stamps, coins, …), tombstones, drawings, designs, and photos.

 

Case Studies

One large-scale implementation has been in production at TNS Media Intelligence since January 2005. In this solution, the matching engine automatically matches ads in magazines and newspapers with ad images in a large-scale database. This technology reduced the number of employees needed for manual matching and significantly increased the productivity of the remaining employees. TNS was also able to enlarge its service areas and enter new markets that had been out of reach due to the prohibitive cost of manual labor. Today, TNS is worth $1.87 billion [1, 2].

AttraSeek, Inc. is a spinout company of Attrasoft, Inc.

9. Customizations

 

9.1 Multiple Signatures for an Image

 

Customization can accommodate multiple library signatures:

    

Figure 9.1 A single image can produce multiple signatures.

 

Customization can also accommodate searching for several variations of a single image.

        

Figure 9.2 Searching for several variations of a single image.

 

9.2 Image Variation

 

A customer might choose a customization package to deal with issues such as:

Customized Mini-AttraSeek can address:

 

9.3 Caching

 

Caching means once a 100% match is found, the matching engine returns the cached results rather than continuing to search the database. Caching is only available in a customized version.

10. Summary

 

Deploying the Mini-AttraSeek image recognition search engine can be beneficial to your company. It allows your customers to search for images with images, instead of searching for images with keywords.

Mini-AttraSeek (a subscription based product) searches a company’s website via images, not keywords. Many websites today have a “Mini-Google”, i.e. a mini-search engine just for a particular site. These sites might consider adding a “Mini-AttraSeek”.

AttraSeek, Inc. is an image-recognition search-engine company with proven core technology. AttraSeek and Mini-AttraSeek are based on image-recognition technology developed in-house. A large-scale system built on this technology has been in production since January 2005.

Searching via Images is cool!

Contact Us

 

Gina Porter, AttraSeek, Inc., P. O. Box 13051, Savannah, GA, 31406, USA

Email: gina@attrasoft.com   

Website: www.attraseek.com

Tel: US-912-484-1717

More Information

 

For the latest information about our product and services, please see the following resources:

http://www.attraseek.com/

http://www.attrasoft.com/

References

 

[1] Joe Mandese, http://publications.Mediapost.com, “TNS Rejects WPP, Will Nielsen be Next?”, May 5, 2008.

[2] http://www.Mediabuyerplanner.com, “TNS Rejects WPP, Will Nielsen be Next?”, May 5, 2008.

[3] Kodak White Paper, InfoImaging-A $225 Billion Industry Created by the Convergence of Image Science and Information technology, 2001.

[4] Pixsy.com

[5] Google, google.com.

[6] Yahoo, yahoo.com.

[7] MSN, msn.com.

[8] http://www.pcworld.com, Allison Taylor, ITWorldCanada.com, “Google’s Secret: ‘Cheap and fast’ Hardware”.

[9] “The Online Ad Market size - SEO market size too?”, http://greenisbetter.org/blogs/seo-internet-advertising-market-size/

[10] http://www.google-watch.org/cgi-bin/cookie.htm

 

Copyright ©2008 AttraSeek, Inc.  All Rights Reserved.