Mini-AttraSeek: Image Search Engine for a Website
http://attraseek.com
This whitepaper takes a detailed look at AttraSeek, Inc.’s product, Mini-AttraSeek, an image-recognition search engine that searches a company’s website directly via images rather than keywords. Some images, such as graphs, charts, engineering/architectural designs, art, trademarks, patent drawings, maps, ads, and copyrighted images, cannot be accurately described by keywords. In these cases, searching directly with an image gives more relevant results than searching with keywords.
2008 AttraSeek, Inc., P. O. Box 13051, Savannah, GA, 31406, USA 9/1/2008
Contents
3.1 About AttraSeek Technology
3.2 Mini-AttraSeek Subscription
3.3 Subscription Fee Structure
5. Preparing Mini-AttraSeek Deployment
5.2 Organize Your Web Contents
5.3 Image List & Image Signature
6.1 Software and Hardware Requirements
7.2 Sequential Search VS. Binary Search
9.1 Multiple Signatures for an Image
AttraSeek, Inc. is an image-recognition search-engine company with proven core technology. AttraSeek, Inc. has two products: the AttraSeek search engine and the Mini-AttraSeek search engine.
Mini-AttraSeek is the right tool for you if your company deals with many images that your clients have trouble finding with a keyword search engine, such as graphs, charts, engineering/architectural designs, trademarks, patent drawings, maps, copyrighted images, ads, art, car wheel rims, other auto parts, …
Mini-AttraSeek searches a company’s website via images, not keywords. Many websites today have a “Mini-Google”, i.e. a mini-search engine just for a particular site. These sites might consider adding a “Mini-AttraSeek”.
Image search is 10 percent of Google’s traffic and grows 100 percent year over year (exponential growth), yet images have traditionally never been searched via images themselves. Images are simply not keywords, and some images cannot be accurately described by keywords.
For a demo of AttraSeek, go to http://www.attraseek.com/.
To get a Test Image:
To make a Search:
AttraSeek, Inc. is a spinout company from Attrasoft, Inc., which developed the image recognition software.
Customer Quote: “We love Attrasoft.” Peter A. Andrell III, CTO, TNS Media Intelligence.
Image search is 10% of Google’s traffic and grows 100% year over year (exponential growth), yet images have traditionally never been searched via images themselves. The market for an image-recognition search engine is being driven by the projected exponential growth in the number of images and videos. There is broad agreement that sophisticated image-recognition products will play a major role in managing this growth. Future search engines will search by keywords, meanings specified by sentences, images, videos, audio, and combinations of all of these.
It is under these conditions that the Mini-AttraSeek search engine was created; it searches a website with images directly.
Today, one out of ten searches is an image search, yet image search is still limited to keyword queries. The problems are:
AttraSeek’s main image recognition search engine is crawling the Internet to accumulate web content. Meanwhile, if your company needs an image recognition search engine now, Mini-AttraSeek is the solution ready for you. To order Mini-AttraSeek, contact gina@attrasoft.com.
AttraSeek searches the Internet via images directly, instead of with keywords. AttraSeek is an image recognition search engine company with proven core technology. Google, Yahoo, MSN, and other search engines all search images via keywords. AttraSeek attempts to fill a technology void; i.e. searching images with images.
AttraSeek, Inc., founded in May 2008, is a privately owned, early-stage company offering a proprietary image-recognition search engine, which searches the Internet via images instead of keywords (patent pending). The company is a spinout from Attrasoft, Inc., which developed the image recognition software.
AttraSeek’s competitive advantages are:
AttraSeek, Inc. offers these products/services:
This whitepaper takes a detailed look at AttraSeek, Inc.’s product, Mini-AttraSeek, an image-recognition search engine that searches a website directly via images (not keywords).
Mini-AttraSeek has an initial setup fee and first-year support fee, plus an annual subscription fee. The annual subscription fee after the first year includes the annual support.
Please contact gina@attrasoft.com.
From the user’s point of view, Mini-AttraSeek completes an image search in three clicks (Browse, Upload, and Search) once the search page is open.
A user will:
(1) Start Mini-AttraSeek (Figure 3.1). (For example, http://www.attraseek.com)
Figure 3.1 AttraSeek.
(2) Click the Browse button to select an image (Figure 3.1).
Figure 3.2 Uploaded image of the Google logo.
(3) Click the Upload button to upload the image (Figure 3.2).
Figure 3.3 Display the results.
(4) Click the Search button to search (Figure 3.3).
There are three potential outcomes:
· Find Match: display the page.
· Find No Match: display the No Match page.
· Time Out: display the timeout message.
Matching scores are illustrated by the following tables.
| Score | Images | Comments |
| Original | | |
| 100 | | |
| 96 | | |
| 95 | | |
| 94 | | |
| 92 | | |
| 88 | | |
| 86 | | |
| 85 | | |
| 84 | | |
| 83 | | |
Mini-AttraSeek can continue to improve its accuracy after deployment through additional training. The client can submit pairs of images (a library image and an image that was falsely rejected) to AttraSeek, Inc. for additional training.
AttraSeek, Inc.’s core technology is its proprietary algorithms, which are uniquely suited to image recognition. Today, the patent-pending recognition engine powers a number of production systems. One large-scale implementation has been in production at TNS Media Intelligence since January 2005. In this solution, the matching engine automatically matches ads in magazines and newspapers with ad images in a large-scale database. This technology reduced the number of employees needed for manual matching and significantly increased the productivity of the remaining employees. TNS was also able to enlarge its service areas and enter new markets that had been out of reach because of the prohibitive cost of manual labor. Today, TNS is worth $1.87 billion [1, 2].
Images on the Internet grow exponentially [3]. Image search is 10% of Google’s traffic [4] and grows 100% year over year [4], yet images have traditionally never been searched via images themselves [5, 6, 7]. One can assume many applications need image searches:
Considering the size of the search engine market [8, 9, 10], searching the Internet with images, videos, audio, and semantics will logically be the next step.
The following will provide an overview of Mini-AttraSeek, so you will have some basic idea of how everything fits together. We will give a high level description of the preparation work.
Mini-AttraSeek has the following components:
Search Engine Page
This module takes a user’s input-image, submits the image, and looks for an output page. Once the output page is done, it returns the output page to the user and the job is done. This module can be customized.
Matching Engine Thread
The matching engine thread watches a server location for new images. Once new images are found, it performs a 1:N match against the existing library and produces an output.
Web Crawler
The web crawler module is an independent module. It browses the World Wide Web in a methodical, automated manner and deposits the results on the hard drive in exactly the same layout as on the web server, with all virtual directories turned into real directories. The deposited files include HTML files and image files.
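As a rough illustration of the mirroring described above, the following sketch maps a crawled URL to a local file path so that each virtual directory becomes a real directory. The function name `url_to_local_path`, the `mirror` root folder, and the use of `index.html` for directory-style URLs are assumptions made for this example; they are not taken from the Mini-AttraSeek crawler.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, unquote


def url_to_local_path(url, mirror_root="mirror"):
    """Map a URL to a path under mirror_root that mirrors the server layout."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    # Assumption: directory-style URLs are saved under a default page name.
    if not path or path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so a URL cannot escape the mirror root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return str(PurePosixPath(mirror_root, parts.netloc, *segments))


print(url_to_local_path("http://www.example.com/products/img/logo.gif"))
```

Each path segment of the URL becomes one directory level on disk, which is what allows the later image-list step to treat crawled content and locally organized content the same way.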
Web Crawler Interface (WCI)
Image Signature Computing
This module computes image signatures.
Figure 5.1 Organize your contents.
Presumably, as a webmaster, you already have all of the content for your website, so you do not need to crawl your own site. If you can avoid using the web crawler, do so, because web crawling can introduce some corrupted data. Once your content is organized, it is first converted into an image list, which is an intermediate step.
Figure 5.2 Images are converted into image signatures.
Once you organize your content, you will get an image list from your website as an intermediate step. From this list, you will compute the image signatures, which are stored in a library file.
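The two steps above (content to image list, image list to library file) can be sketched as follows. This is a minimal illustration only: the AttraSeek signature algorithm is proprietary and is not described in this whitepaper, so a SHA-256 digest of the file bytes is used purely as a stand-in. Unlike a real image-recognition signature, it can only identify byte-identical files. The function names, the extension set, and the tab-separated library format are all assumptions.

```python
import hashlib
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".bmp"}  # assumed set


def build_image_list(content_root):
    """Return the sorted paths of all image files under content_root."""
    found = []
    for folder, _dirs, files in os.walk(content_root):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                found.append(os.path.join(folder, name))
    return sorted(found)


def placeholder_signature(path):
    """Stand-in for the proprietary signature: a SHA-256 digest of the bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def write_library(image_list, library_path):
    """Write one 'path<TAB>signature' line per image to the library file."""
    with open(library_path, "w", encoding="utf-8") as out:
        for path in image_list:
            out.write(f"{path}\t{placeholder_signature(path)}\n")
```

A typical call would be `write_library(build_image_list("site_content"), "library.txt")`, where both names are placeholders for your own folder and file.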
The image matching algorithms that create and match the signatures are our strength (see Figure 5.3).
Figure 5.3 Image Matching.
An image matching thread will continuously match images (see Figure 5.4). This module performs the following steps:
· Looks for an image;
· Computes the image signature;
· Matches the new signature against the library; and
· Deposits the results.
Figure 5.4 Image Matching Thread.
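The four steps of the matching thread can be sketched as a simple polling loop. This is an illustrative outline, not the Mini-AttraSeek implementation: the signature and scoring functions are passed in as parameters because the real algorithms are proprietary, and the folder names, the result file format, and the score threshold of 80 are assumptions.

```python
import os
import time


def match_one_to_n(query_sig, library, score_fn, threshold):
    """Score the query against every library entry (1:N) and keep the hits."""
    hits = [(score_fn(query_sig, sig), name) for name, sig in library.items()]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)


def process_inbox(inbox, outbox, library, signature_fn, score_fn, threshold=80):
    """Run one pass over the inbox; return the number of images processed."""
    processed = 0
    for name in sorted(os.listdir(inbox)):          # look for an image
        path = os.path.join(inbox, name)
        query_sig = signature_fn(path)              # compute its signature
        hits = match_one_to_n(query_sig, library, score_fn, threshold)
        with open(os.path.join(outbox, name + ".txt"), "w") as out:
            for score, match in hits:               # deposit the results
                out.write(f"{score}\t{match}\n")
        os.remove(path)                             # mark the image as handled
        processed += 1
    return processed


def run_forever(inbox, outbox, library, signature_fn, score_fn, poll_seconds=0.5):
    """Poll the inbox continuously, as a matching thread would."""
    while True:
        if process_inbox(inbox, outbox, library, signature_fn, score_fn) == 0:
            time.sleep(poll_seconds)
```

Separating the single pass (`process_inbox`) from the endless loop (`run_forever`) keeps the matching logic testable without starting a long-running thread.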
The design of this page, defaults.aspx, is given in Figure 3.1. The search page process is:
To run Mini-AttraSeek, you must have:
Web applications require a server and a client. The client requests a page from the server and the server returns the page to the client, where the page is displayed inside Internet Explorer. Mini-AttraSeek is not PC software. To install Mini-AttraSeek, you must have a server. Mini-AttraSeek requires Microsoft IIS (Internet Information Services). Mini-AttraSeek also requires cookies enabled in the client’s Internet Explorer.
There are two components to install to a server:
The entire procedure is listed below:
1. Install Software
2. Set Internet Service Manager
Step 2A. Make Mini-AttraSeek an IIS Application
Step 2B. Select ASPX 2.0
Step 2C. Web Sharing
Step 2D. IIS Security
Step 2E. ASPNET Process Rights
3. Files To Be Adjusted
4. Organize Website Data
5. Signature Computing
Step 5A. Set Up
Step 5B. Get Image List
Step 5C. Compute Signatures
Step 5D. Move Signature File to the Correct Location
6. Start the Matching Engine Thread
AttraSeek, Inc. will work with you closely and walk you through the process step by step.
The scalability issue is addressed in two ways:
· Servers
· Sorting
The sole purpose of the Search Engine Web Page is:
· To submit an image; and
· To look for an output page. Once the output page is done, it returns the output page to the user and the job is done.
In the back end, a matching thread looks for submitted images. Once the thread finds a new image, it makes a 1:N matching against the existing library and produces an output. There is no limit on how many search engine computers can be run for this purpose.
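The hand-off between the search page and the back-end thread can be sketched as the front end polling for an output file until a deadline passes, which yields the three outcomes a user can see (match, no match, time out). The function name, the 30-second default timeout, and the convention that an empty output file means "no match" are assumptions for illustration. The sketch also assumes the back end makes the output file appear only once it is complete (for example, by writing to a temporary name and renaming).

```python
import os
import time


def wait_for_output(output_path, timeout_seconds=30.0, poll_seconds=0.2):
    """Poll for the output file; return (outcome, lines).

    outcome is 'match', 'no_match', or 'timeout'.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if os.path.exists(output_path):
            with open(output_path, encoding="utf-8") as handle:
                lines = handle.read().splitlines()
            # Assumption: an empty result file means nothing matched.
            return ("match" if lines else "no_match", lines)
        time.sleep(poll_seconds)
    return ("timeout", [])
```

Because the page and the matching engine communicate only through files, either side can be moved to another machine that shares the same folders without changing this logic.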
There are three configurations:
· Single Server
· Multiple Threads
· Multiple Layers
The single-server configuration uses a single web server to host both the front web page and the matching engine (Figure 7.1).
The multiple-thread configuration uses a one-to-many mapping between the front web page and the back-end folders.
In the multiple-layer configuration, the text file output from each matching engine is combined on a server and returned to the front web page. The front page uses a one-to-many mapping between itself and the back-end threads.
Figure 7.1 Mini-AttraSeek configurations.
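For the multiple-layer configuration, combining the text output of several matching engines can be sketched as a merge of score-sorted lists. The 'score<TAB>image' line format, the function names, and the `top_k` limit are assumptions for this example; the whitepaper does not specify the actual output format.

```python
import heapq


def parse_results(text):
    """Parse 'score<TAB>image' lines (an assumed format) into tuples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            score, image = line.split("\t", 1)
            rows.append((int(score), image))
    return rows


def combine_outputs(engine_outputs, top_k=10):
    """Merge the outputs of several matching engines, best scores first."""
    per_engine = [sorted(parse_results(t), reverse=True) for t in engine_outputs]
    # heapq.merge with reverse=True expects each input sorted high to low.
    merged = heapq.merge(*per_engine, reverse=True)
    return list(merged)[:top_k]


print(combine_outputs(["96\ta.png\n84\tb.png\n", "100\tc.png\n", ""]))
# prints [(100, 'c.png'), (96, 'a.png'), (84, 'b.png')]
```

An engine that finds nothing contributes an empty string and simply drops out of the merge.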
Mini-AttraSeek supports four different search settings:
Sequential Search (Default)
Binary Search
Sequential Search is the default setting.
Binary Search can be significantly faster, however:
The computation time for both sequential search and binary search for each query is:
T = T1 + T2 + T3.
Here (for both sequential search and binary search):
Example 1. Sequential Search
Assume 500,000 images are in the library, then:
T1 = 0.5, T2 = 1, T3 = 0.
Adding these together, the computation time per query is:
T = 0.5 + 1 + 0 = 1.5 (seconds).
This is obviously slower than Google’s 0.2 seconds. However, Mini-AttraSeek’s 1.5 seconds buys a razor-sharp focus, and it is a vast improvement over manually searching through the first 10 pages of keyword results (i.e., 10 to 20 minutes).
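To put the comparison with manual searching in numbers, the short calculation below uses only the figures quoted in Example 1 (T1 = 0.5, T2 = 1, T3 = 0, and 10 to 20 minutes of manual browsing). The function and variable names are illustrative.

```python
def query_time(t1, t2, t3):
    """Total time per query, T = T1 + T2 + T3, in seconds."""
    return t1 + t2 + t3


t = query_time(0.5, 1, 0)                    # values from Example 1
manual_low, manual_high = 10 * 60, 20 * 60   # 10 to 20 minutes, in seconds
speedup = (manual_low / t, manual_high / t)
print(t, speedup)
# prints 1.5 (400.0, 800.0)
```

On these figures, a 1.5-second image query is roughly 400 to 800 times faster than paging through keyword results by hand.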
Example 2. Binary Search.
Let’s assume:
50,000,000 images are in the library;
1% of the 50 million images are actually used, i.e. 500,000 images are actually used;
Now, with 500,000 images actually matched, the timings are the same as in Example 1:
T1 = 0.5, T2 = 1, T3 = 0.
Adding these together, the computation time per query is:
T = 0.5 + 1 + 0 = 1.5 (seconds).
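This whitepaper does not describe how the binary search setting is implemented. One possible reading of Example 2, shown here purely as an assumption, is that the library is pre-sorted by some scalar key derived from each signature, a binary search locates the small window of keys close to the query, and only that window is matched sequentially. The key, the tolerance, and all function names below are hypothetical.

```python
import bisect


def candidate_window(sorted_keys, query_key, tolerance):
    """Return the index range [lo, hi) of keys within tolerance of query_key."""
    lo = bisect.bisect_left(sorted_keys, query_key - tolerance)
    hi = bisect.bisect_right(sorted_keys, query_key + tolerance)
    return lo, hi


def search_window(sorted_keys, names, query_key, tolerance, score_fn):
    """Binary-search to a window, then score only the entries inside it."""
    lo, hi = candidate_window(sorted_keys, query_key, tolerance)
    scored = [(score_fn(query_key, sorted_keys[i]), names[i]) for i in range(lo, hi)]
    return sorted(scored, reverse=True)


# Toy library of 100 entries with hypothetical scalar keys 0, 10, ..., 990.
keys = list(range(0, 1000, 10))
names = [f"img{k}.png" for k in keys]
print(candidate_window(keys, 500, 5))
# prints (50, 51)
```

In this toy library the window holds 1 of 100 entries, which mirrors the 1% figure in Example 2: locating the window costs only a logarithmic number of comparisons, and the expensive matching runs on the small window alone.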
The potential customers for Mini-AttraSeek are:
S&P 500 Companies
Example 1: Boeing has many engineering drawings, and might need an image-recognition search engine to search through designs.
Example 2: GM has many auto parts, and might need an image-recognition search engine for part identification (wheel rims, bumpers, …) and location.
Example 3: Google can display relevant advertisements based on images.
Companies with a large number of Images
Companies with a large number of images include law firms with digitized legal documents and businesses dealing in auto parts (wheel rims, bumpers, …), sports cards (baseball cards, …), collectibles (stamps, coins, …), tombstones, drawings, designs, photos, …
Case Studies
One large-scale implementation has been in production at TNS Media Intelligence since January 2005. In this solution, the matching engine automatically matches ads in magazines and newspapers with ad images in a large-scale database. This technology reduced the number of employees needed for manual matching and significantly increased the productivity of the remaining employees. TNS was also able to enlarge its service areas and enter new markets that had been out of reach because of the prohibitive cost of manual labor. Today, TNS is worth $1.87 billion [1, 2].
AttraSeek, Inc. is a spinout company of Attrasoft, Inc.
Customization can accommodate multiple library signatures:
Figure 9.1 A single image can produce multiple signatures.
Customization can also accommodate searching for several variations of a single image.
Figure 9.2 Searching for several variations of a single image.
A customer might choose a customization package to deal with issues such as:
Customized Mini-AttraSeek can address:
Caching means that once a 100% match is found, the matching engine returns the cached results rather than continuing to search the database. Caching is only available in a customized version.
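One way to read the caching behavior described above is sketched below: the scan stops as soon as a score of 100 is found, the result is stored, and a repeated query with the same signature is answered from the cache without touching the library. This is an interpretation, not the customized product’s actual design; the function name, the dictionary cache, and the requirement that signatures be hashable are assumptions.

```python
def cached_search(query_sig, library, score_fn, cache):
    """Return cached results for a known exact match, else scan the library."""
    if query_sig in cache:                 # an earlier query matched 100%
        return cache[query_sig]
    results = []
    for name, sig in library.items():
        score = score_fn(query_sig, sig)
        results.append((score, name))
        if score == 100:                   # exact match: stop scanning
            break
    results.sort(reverse=True)
    if results and results[0][0] == 100:   # only exact matches are cached
        cache[query_sig] = results
    return results
```

The cache is passed in by the caller, for example as an empty `dict` shared across queries, so its lifetime and size stay under the caller’s control.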
Deploying the Mini-AttraSeek image recognition search engine can be beneficial to your company. It allows your customers to search for images with images, instead of searching for images with keywords.
Mini-AttraSeek (a subscription based product) searches a company’s website via images, not keywords. Many websites today have a “Mini-Google”, i.e. a mini-search engine just for a particular site. These sites might consider adding a “Mini-AttraSeek”.
AttraSeek, Inc. is an image-recognition search-engine company with proven core technology. AttraSeek and Mini-AttraSeek are based on image-recognition technology developed in-house. A large-scale system based on this technology has been in production since January 2005.
Searching via Images is cool!
Gina Porter, AttraSeek, Inc. P. O. Box 13051, Savannah, GA, 31406 USA
Email: gina@attrasoft.com
Website: www.attraseek.com
Tel: US-912-484-1717
For the latest information about our product and services, please see the following resources:
http://www.attraseek.com/
[1] Joe Mandese, http://publications.Mediapost.com, “TNS Rejects WPP, Will Nielsen be Next?”, May 5, 2008.
[2] http://www.Mediabuyerplanner.com, “TNS Rejects WPP, Will Nielsen be Next?”, May 5, 2008.
[3] Kodak White Paper, InfoImaging-A $225 Billion Industry Created by the Convergence of Image Science and Information technology, 2001.
[4] Pixsy.com
[5] Google, google.com.
[6] Yahoo, yahoo.com.
[7] MSN, msn.com.
[8] http://www.pcworld.com, Allison Taylor, ITWorldCanada.com, “Google’s Secret: ‘Cheap and fast’ Hardware”.
[9] “The Online Ad Market size - SEO market size too?”, http://greenisbetter.org/blogs/seo-internet-advertising-market-size/
[10] http://www.google-watch.org/cgi-bin/cookie.htm
Copyright ©2008 AttraSeek, Inc. All Rights Reserved.