Introduction

FSI Viewer offers a basic search option within the TIF images metadata, which gets filled with the relevant content when converting PDF files with FSI Pages converter. While this basic search functionality is satisfying for most image collections, it is nevertheless somewhat limited. FSI Pages Addon additionally offers the possibility to redirect search queries entered in the FSI Pages search window, to a server side script, which can then query a database and return the results found as XML formatted data. This way you can implement HTML formatted search results based on an existing database. In this tutorial we will demonstrate how this can be achieved using the popular scripting language PHP and a MySQL database.
Using the sample code from this tutorial requires:

  • basic PHP and SQL knowledge
  • a webserver running FSI Server, MySQL and PHP
  • a MySQL database

For this tutorial we will use a simple database structure. The table contains 3 data fields only: ID, PageNumber and Search. The Search column holds the entire text we will be searching through, making use of the advanced Full-Text search options of MySQL. If you want to learn more about this feature, please take a look at the MySQL Documentation. You can download the database structure and contents to import them to your MySQL database, eg. via phpMyAdmin, which contains searchable data for the first 7 pages of the catalog. Using a fulltext search in MySQL sorts the entire contents of the column alphabetically for faster searching.

Source: sql_dump.zip

Setting up FSI Viewer

As usual, we start with the needed configuration for FSI Viewer. Take a look at the sample FSI Viewer configuration file:

<fsi_parameter>
      <PLUGINS>
            <PLUGIN src="pages">

                  <!-- Other items removed for demonstration purposes -->

                  <!-- Edit line below to match your settings -->
                  <SearchCustomURL value="http://www.yourdomain.com/path/mysqlsearch.html" />
                  <SearchCustomValue value="sdfs35lp" />
            </PLUGIN>
</fsi_parameter>

 

The SearchCustomURL parameter contains the full link to the server side script, which receives the search phrases and returns valid XML to FSI Pages.
The second parameter, SearchCustomValue, is similar to the UniqueID used in the Notepad Plugin tutorial. It should be different in every configuration file if you have multiple image collections (catalogs). Do not use the following characters in SearchCustomValue: [Space] ~ % & ; : ” ‘ , < > ? #. Any combination of digits and letters will do.

Once you have added these lines to your configuration file, FSI Viewer will redirect all search phrases entered in the FSI Pages search window to your server side script instead of querying the image server.

The Server Side Script

The script must first connect to the MySQL database, then query the database table holding the information, and finally return some valid XML data representing the search result in the following form:

<?xml version="1.0" encoding="utf-8"?>
<results>
      <item page="10">
            Some <b>text</b> for this result...
      </item>
      <item page="20">
            Some <b>text</b> for this result...
      </item>
      <item page="30">
            Some <b>text</b> for this result...
      </item>
</results>
  • The required “page” attribute of each item specifies the page to display if the user clicks the item.
  • Each node may contain HTML formatted text describing the result.

For testing purposes, you can save this XML file as a fake server side “script” result to test whether the .xml configuration works as intended. Of course any search phrase you enter will return the same results in this case. In order to display the correct results, we need to build the XML dynamically using server side script.

Going dynamic

Now lets examine, what FSI viewer hands over to the script and replace the static .xml file with some real script:

<?php
ob_start();                                     // Start output buffer
var_dump($_POST);                           // Dump $_POST array
$buffer = ob_get_contents();           // Get Buffer contents
ob_end_clean();                                 // End and Clean Output buffer

$fp = fopen("./buffer.txt","w+");     // Create and open 'buffer.txt' for writing
fputs($fp,$buffer);                           // write buffer to file
fclose($fp);                                       // close file 'buffer.txt'

//Finally we print out some XML for FSI Viewer
echo "<?xml version="1.0" encoding="utf-8"?>";
echo "<results>";
echo "      <item page="1">";
echo "            You searched for <b>".$_POST['query']"</b>!";
echo "      </item>";
echo "</results>";
?>

Then launch FSI Pages and do a search. You will get 1 single result, pointing to page 1, with the information of what you just searched for. The script above writes the contents of the PHP $_POST-Array to buffer.txt so we can see, what FSI Viewer sends to the script. Now open up the buffer.txt and take a look:

array(9) {
  ["searchvalue"]=>
  string(11) "mySearchTerm*"
  ["query"]=>
  string(11) "mySearchTerm*"
  ["src"]=>
  string(23) "images/Catalogs/WeinFR/"
  ["customvalue"]=>
  string(8) "sdfs35lp"
  ["cols"]=>
  string(1) "4"
  ["rows"]=>
  string(3) "512"
  ["page"]=>
  string(1) "1"
  ["vtl"]=>
  string(20) "fsi/searchresult.xml"
  ["cmd"]=>
  string(6) "search"
}

What we need here, is the [‘searchvalue’] or [‘query’], both contain the phrase we entered in the searchbox. While both are the same, $_POST[‘searchvalue’] is for compatibility with different image servers. We will go with $_POST[‘query’] from now on. Note, how the search term has the ‘*’-wildcard added at its end.
The $_POST[‘customvalue’] may be interesting for your script, to identify the catalog the query comes from. You can use it to alter the SQL query, to query a different table depending on the $_POST[‘customvalue’] content or something similar. In this tutorial we assume a single FSI Pages instance to search and neglect this parameter.

Querying the database
Now we need to query the database with the query we get. Here is a little PHP class we will use:

<?php
/**************************************
  Copyright by NeptuneLabs 2006-2007
        (info@neptunelabs.com)
**************************************/

class sqlsearch{

      function sqlsearch(){ //Constructor

            //Edit the following parameters to match your settings
            $dbhost='localhost';
            $dbport='3306';
            $dbuser='username';
            $dbpass='password';
            $database='Articles';

            $this->dbconnect = @mysql_connect($dbhost.':'.$dbport, $dbuser, $dbpass);

            $result=@mysql_select_db($database);

      }


      function search($terms){
            $results = false;
            $terms = '"'.(str_replace('*', '', $terms)).'"'; //Remove trailing wildcard
                  // Edit query to match your database table name
                  $query = 'SELECT ID, Page, Search, MATCH (Search) AGAINST ('.$terms.' IN NATURAL LANGUAGE MODE) AS Score FROM Articles
                  WHERE MATCH (Search) AGAINST ('.$terms.' IN NATURAL LANGUAGE MODE) ORDER BY Score DESC, Page ASC';
                  $result = mysql_query($query);

                  if( mysql_affected_rows() == 0 ) { //Fallback if no result in searching as a phrase in natural human language
                        $query = "SELECT ID, Page, Search FROM Articles WHERE Search LIKE '%".str_replace('"','',$terms)."%'";
                        $result = mysql_query($query);
                  }

                  if ($result != false){
                        while ($tmp=mysql_fetch_assoc($result)){
                            // Edit the following line to match your database table field names
                              $results[]=array('page'=>$tmp['Page'], 'text' => substr($tmp['Search'], 0, 120));
                        }
                  }

            return $results;
      }


      function printResults($results){
            $xml[]='<results>';

            if ($results != false){
                  foreach ($results as $result){
                        $xml[]='<item page="'.$result['page'].'">';
                        $xml[]=$this->xmlencode($result['text']);
                        $xml[]='</item>';
                  }
            }
            $xml[]='</results>';
            $xml_full = implode("n", $xml);
            $this->print_xml($xml_full);
      }


      function xmlencode($rawcontent){
            return str_replace(array('<', '>', '"', "'", '&'), array('&lt;', '&gt;', '&quot;', '&apos;', '&amp;'), $rawcontent);
      }


      function print_xml($xmlraw){
            header ('Content-Type: application/xml');
            $xml="<?xml version="1.0" encoding="UTF-8" ?>n";
            $xml.=$xmlraw;
            echo $xml;
      }


}
?>

Some explanation about this class:
The constructor sqlsearch() connects to the database upon an instance of the class is created. Please alter the variables to fit your settings.
The search() function querys the database table and returns an array with the results. (Please refer to the MySQL Documentation for details about the query.)
The printResults() function takes an array of results, and prints out the full XML, using xmlencode() and print_xml()
Now we include the class in the following script:

<?php

require_once("search.inc");             //Load class

$sql = new sqlsearch();                    //Initialise and connect
$results = $sql->search($_POST['query']); //search Database
if (!empty($results))
      $sql->printResults($results);
}

?>

Actually these few lines of code do everything that has to be done, using the “sqlsearch” class from above.
If you save both files to the location defined in the .xml configuration file, your FSI Pages instance uses a MySQL based search.

Cross Domain considerations

Please note that Adobe Flash Player can – by default – not retrieve XML data from a foreign domain. This is: an FSI Viewer instance located at http://somedomain/fsi.swf is not allowed to read XML data from http://anotherdomain/data.xml. In addition to that, Adobe Flash Player does not resolve domains to IP addresses. So do not mix adressing via IP addresses and domains.
To work around that you need to request XML data from the domain that you are loading FSI Viewer from – or, if this is not possible – add a crossdomain policy file to the root directory of the domain your server side script resides in, e.g. http://anotherdomain/crossdomain.xml

<?xml version="1.0" ?>
<cross-domain-policy>
<allow-access-from domain="*" secure="false" />
</cross-domain-policy>

Final notes

This tutorial is just a basic demonstration of the required setup. You should enhance the scripts shown here and add security checks on the database, add some error logging and alike. It’s recommended not to display the search-column contents in the search results of FSI Pages, but rather link the results with the articles name and display these. You can also store the search results in some other database table to learn more about what users search for with enhanced scripts.