A client of mine wanted to use the power of Oracle Text to search for keywords in BLOBs stored within their database, using APEX as the front-end.

I’d never worked with Oracle Text before, and did not find too many posts about it’s integration with APEX, so I thought I’d write up how I went about it, as it may be of interest to others.

Step 1: Privileges for schema owner

The schema owner where your app resides will need some grants before being able to use the Oracle Text packages. Ensure you’ve granted the following:

GRANT CTXAPP to [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_CLS TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_DDL TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_DOC TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_THES TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO [schema-owner];

Step 2: Create the index on your BLOB table

create index hdocsx on [table-name]([blob-column]) indextype is ctxsys.context
  parameters ('filter ctxsys.auto_filter');

I used the auto-filter parameter as this is recommended for PDF documents, which is what we happened to be working with. Read the Oracle documentation to ensure you use the filter that is right for your requirements.

Important: ensure your table to be indexed has a primary key defined before you create your index or it will be unusable for some of the Oracle Text functions such as MARKUP or SNIPPET, for example.

Step 3: Create an APEX report page, with a regular Report Region on your BLOB table

You can now include Oracle Text functions such as CONTAINS. For example, with a blob field ‘CONTENTS’, and a form search field ‘:P1_SEARCH’, you could use

select file_name, doc_id, xxx, yyy

from BLOB_TABLE

where CONTAINS(contents,’:P1_SEARCH’)>0

Such a query would return any documents that contain the word or search term the user entered in the P1_SEARCH field. So simple, yet so powerful!

Declarative support for BLOBS  allows you to easily include a file download link for your documents.

Step 4: Create a marked up version of the document

My client wanted us to highlight the search terms within the document so that the user could easily see why the document was returned in the search results. To do this, I created a popup form that would open when the user would click on a ‘Preview’ link.

The popup form had a single Dynamic PL/SQL region, and a single (hidden) item containing the document’s ID. The PL/SQL region had the following Source:

declare

 v_clob_selected   CLOB;
 v_read_amount     integer;
 v_read_offset     integer;
 v_buffer          varchar2(32767);
 v_query           varchar(2000);
 v_cursor          integer;

 begin

   htp.p('<b>HTML version with highlighted terms</b>');

   begin
     ctx_doc.markup (index_name => 'IDX_DOCS',  -- name of Oracle Text index created in Step 2
                     textkey    => : P5_ID,      -- hidden item on page containing ID (primary key) of BLOB table
                     text_query => : P1_SEARCH,  -- item containing the term the user searched for on page 1
                     restab     => v_clob_selected,
                     starttag   => '<i><font color=red>',  -- I highlighted the terms in red, feel free to use any other style!
                     endtag     => '</font></i>');

     v_read_amount := 32767;
     v_read_offset := 1;
     begin
      loop
        dbms_lob.read(v_clob_selected,v_read_amount,v_read_offset,v_buffer);
        htp.p(v_buffer);
        v_read_offset := v_read_offset + v_read_amount;
        v_read_amount := 32767;
      end loop;
     exception
      when no_data_found then
         null;
     end;

     exception
      when others then
        null; --showHTMLdoc(p_id);
   end;
end;

As simple as that! This procedure converts the BLOB into HTML, with the search terms marked up. It does the best it can, and may not work well on ALL document types, but most of the PDFs we worked with rendered nicely.

One could also provide a ‘SNIPPET’ of information in the results list, using CTX_DOCS.SNIPPET, ¬†that would display a snippet containing the search term, so that a user could scan a larger result set without having to open each document.

More information on the Markup and Snippet functions can be found in Oracle Text Reference.