WebSQL


Written by Golan Weiss
02936205

Table of contents:

  1. WebSQL Presentation
  2. Why the current search engines are not enough?
  3. WWW Object Characteristics
  4. WWW as Distributed Objects
  5. Information on WWW
  6. Hyperlinks
  7. A Search Engine Query
  8. A DB Engine Query
  9. A webSQL Query
  10. Implementation
  11. Examples of WebSQL Web structure queries
  12. Anchor's usage in the WebSql
  13. A hypertext link Definition
  14. Download software



WebSQL Presentation

Why WebSQL? 



Why the current search engines are not enough?


A Typical WWW Spider


Lots More Info Out There



Search Engine


WWW as a DB Query Engine


WWW Objects


WWW Object Characteristics

WWW as Distributed Objects



Information on WWW

Information can be

WWW is ?

Hyperlinks

Yossi's home page has links

Assumption: link distance is related to conceptual relevance

A Search Engine Query

What are Yossi Matias's publications?

  1. Search on Yossi Matias.
  2. Refine search.
  3. Follow links and browse through the web.

A different search pattern.

  1. Search on Yossi Matias intersect publications
  2. Follow links

A DB Engine Query

Where are Yossi Matias's publications?

  SELECT * 
  FROM   PUBLICATIONS
  WHERE  AUTHOR = "Yossi Matias"

A DB Engine Query

Assume no PUBLICATIONS relation

  CREATE VIEW PUBLICATIONS AS
   SELECT *
   FROM   WWW_PAGES 
   WHERE  WWW_PAGES contains "publications" 

  SELECT d.url, d.title
  FROM   PUBLICATIONS d
  WHERE  d contains "Yossi Matias"
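
The view idea can be tried on a small in-memory database. The following is only a sketch: the www_pages table, its columns, and the sample rows are assumptions made for illustration, and LIKE stands in for the informal "contains" test used above.

```python
import sqlite3

# Hypothetical schema: one row per page. Table name, column names and
# sample rows are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE www_pages (url TEXT, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO www_pages VALUES (?, ?, ?)",
    [
        ("http://example.org/a", "Publications", "Publications of Yossi Matias"),
        ("http://example.org/b", "Home page", "Welcome to my home page"),
        ("http://example.org/c", "Papers", "Recent publications by other authors"),
    ],
)

# The view plays the role of the missing PUBLICATIONS relation.
conn.execute(
    "CREATE VIEW publications AS "
    "SELECT * FROM www_pages WHERE body LIKE '%publications%'"
)

# Query the view for pages that also contain the author's name.
rows = conn.execute(
    "SELECT url, title FROM publications WHERE body LIKE ? ORDER BY url",
    ("%Yossi Matias%",),
).fetchall()
print(rows)
```

Note that such a view presupposes that all pages are already stored in one relation, which is exactly what is not available on the Web.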

A webSQL Query

  SELECT d.url, d.title
  FROM   Document d SUCH THAT
         "http://www.infoseek.com" -> d, 
         Document p SUCH THAT
         d -> p
  WHERE  d.title contains "Yossi "  AND
         p.title contains "publications"

A webSQL Query, known location

  SELECT d.url, d.title
  FROM   Document d SUCH THAT
         "http://www.math.tau.edu" -> d, 
         Document p SUCH THAT
         d -> p
  WHERE  d.title contains "Yossi"  AND
         p.title contains "publications"
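
To make the meaning of the two range conditions concrete, here is a minimal sketch that evaluates a query of this shape over a toy, in-memory document network. The URLs, titles, and helper names are invented for illustration. All pages sit on one host, so every link is a local link, and "contains" is assumed to be a plain substring test.

```python
# Toy document network: url -> (title, outgoing link targets).
# All URLs and titles are made up for illustration.
WEB = {
    "http://start.example/": (
        "Start page",
        ["http://start.example/yossi", "http://start.example/dana"],
    ),
    "http://start.example/yossi": (
        "Yossi's home page",
        ["http://start.example/yossi/pubs", "http://start.example/yossi/cv"],
    ),
    "http://start.example/dana": (
        "Dana's home page",
        ["http://start.example/dana/pubs"],
    ),
    "http://start.example/yossi/pubs": ("Yossi's publications", []),
    "http://start.example/yossi/cv": ("Curriculum vitae", []),
    "http://start.example/dana/pubs": ("Dana's publications", []),
}


def title(url):
    """Title of the document at `url` (empty string if unknown)."""
    return WEB.get(url, ("", []))[0]


def links(url):
    """URLs that the document at `url` links to."""
    return WEB.get(url, ("", []))[1]


def query(start, d_word, p_word):
    """Nested-loop evaluation: d ranges over documents linked from `start`,
    p ranges over documents linked from d; the WHERE clause filters on titles.
    One tuple is produced per matching (d, p) pair."""
    result = []
    for d in links(start):
        for p in links(d):
            if d_word in title(d) and p_word in title(p):
                result.append((d, title(d)))
    return result


print(query("http://start.example/", "Yossi", "publications"))
```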



Implementation

This section presents the WebSQL compiler, query engine, and user interfaces.

Both the WebSQL compiler and query engine are implemented as a set of Java classes, which form the WebSQL class library. The library can be used from any Java program.

The WebSQL system architecture is depicted in the following figure.

  [Figure: The Architecture of the WebSQL System]

The Compiler and Virtual Machine. The WebSQL compiler parses the query and translates it into a nested loop program in a custom-designed object language. The object program is executed by an interpreter that implements a stack machine. The evaluation of the range specified in the FROM clause is done via specially designed operation codes whose results are vectors of Document or Anchor tuples.
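
The real object language and its operation codes are not described here, so the following is only a sketch of the general idea of a stack machine whose range operation yields a vector of documents. The opcode names (PUSH, RANGE, FILTER, EMIT) and the link table are invented for illustration.

```python
# Hypothetical link table: url -> list of linked urls (made-up data).
LINKS = {
    "http://a.example/": ["http://a.example/grad", "http://a.example/staff"],
    "http://a.example/grad": ["http://a.example/grad/list"],
    "http://a.example/staff": [],
}


def run(program):
    """Execute a list of (opcode, argument) pairs on a simple stack machine.
    The opcodes are invented for this sketch."""
    stack = []
    output = []
    for op, arg in program:
        if op == "PUSH":      # push a constant onto the stack
            stack.append(arg)
        elif op == "RANGE":   # pop a URL, push the vector of linked documents
            url = stack.pop()
            stack.append(list(LINKS.get(url, [])))
        elif op == "FILTER":  # pop a vector, keep the URLs that contain `arg`
            docs = stack.pop()
            stack.append([d for d in docs if arg in d])
        elif op == "EMIT":    # pop a vector and append its rows to the output
            output.extend(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


program = [
    ("PUSH", "http://a.example/"),
    ("RANGE", None),
    ("FILTER", "grad"),
    ("EMIT", None),
]
print(run(program))
```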

The Query Engine. Whenever the interpreter encounters an operation code corresponding to a range specifying condition, the query engine is invoked to perform the actual evaluation. Depending on the type of condition, this involves either sending a request to index servers or a depth-first traversal of a sub-part of the document network.
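
As an illustration of the traversal case, the sketch below performs a depth-first traversal that follows only links staying on the starting server, which is one way to read a ->* range condition. The graph is a made-up in-memory dictionary. A real engine would fetch and parse pages instead.

```python
from urllib.parse import urlsplit

# Made-up link graph: url -> list of linked urls.
GRAPH = {
    "http://s.example/": [
        "http://s.example/a",
        "http://other.example/",
        "http://s.example/b",
    ],
    "http://s.example/a": ["http://s.example/c", "http://s.example/"],
    "http://s.example/b": ["http://s.example/c"],
    "http://s.example/c": [],
    "http://other.example/": ["http://s.example/hidden"],
}


def same_host(a, b):
    """True if both URLs name the same server."""
    return urlsplit(a).netloc == urlsplit(b).netloc


def local_closure(graph, start):
    """Depth-first traversal from `start`, following only links to the same
    host. Returns the visited URLs in visit order, including `start` itself
    (the zero-length path)."""
    seen = set()
    order = []

    def visit(url):
        seen.add(url)
        order.append(url)
        for nxt in graph.get(url, []):
            if nxt not in seen and same_host(start, nxt):
                visit(nxt)

    visit(start)
    return order


print(local_closure(GRAPH, "http://s.example/"))
```

The page http://s.example/hidden is not reported, because in this toy graph it is linked only from the other server.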

There are three different interfaces that allow us to use the language interactively. The simplest interface is an HTML form connected to a CGI script. The user can either fill in the form to assemble a query or type a complete WebSQL query directly. When the Submit button is pressed, the query is sent to the CGI script, which invokes a stand-alone Java application running on our server. This application parses the query and, if no errors are found, hands it to the query execution engine, which produces the result as a list of tuples. The result is formatted into an HTML table and shipped back to the user. This interface, although slow and limited in user interaction, has the advantage that it can be used with any browser.
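
The last step, turning a list of result tuples into an HTML table, can be sketched as follows. The function name and the sample row are assumptions for illustration. The actual system is a Java application and its output format is not shown here.

```python
from html import escape


def to_html_table(columns, rows):
    """Format result tuples as an HTML table, escaping every cell."""
    header = "".join(f"<th>{escape(c)}</th>" for c in columns)
    lines = ["<table>", f"  <tr>{header}</tr>"]
    for row in rows:
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = to_html_table(
    ["d.url", "d.title"],
    [("http://www.digital.com/", "A & B")],
)
print(table)
```

Escaping matters because titles come from arbitrary pages and may contain characters such as & or <.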


Examples with possible output

Find all documents accessible from the "ISG Technologies" home page. Only documents on the same server are reachable, because ->* follows local links only.
select d.url, d.title, d.type, d.length, d.modif from Document d SUCH THAT "http://www.isgtec.com" ->* d ;

Example

Find documents about aluminum.
select d.url, d.title from Document d such that d mentions "aluminum";

Output:

d.url                                                                  d.title
http://www.cygnus.nb.ca/retail/universal/univrsl5.html                 UNIVERSAL SIGNS
http://altavista.software.digital.com/                                 AltaVista Software
http://www.drms.dla.mil/drmo/newengland/56800002.html                  56800002
http://www-cmrc.sri.com/CIN/sep-oct94/article02.html                   ALUMINUM CHEMICALS
http://westpasco.com/members/Aluminum.html                             West Pasco Chamber Of Commerce Member Directory
http://www.crisny.org/communities/colonie/government/gen.colonie.html  PURCHASING DEPARTMENT - Sand for Ice Control
http://www.digital.com/
http://www.rmc.com/divs/rasco/areas/rascogrm.html                      RASCO - Grand Rapids, Michigan
http://www.gassprings.com/mc-as_.htm                                   Guden Continuous Hinges -- Aluminum / Stainless Pin
http://www.metalogic.be/MatWeb/reading/mat-cor/al___ccc.htm            Aluminum Alloys : Corrosion Hazards Overview

Note: The result of this WebSQL query is constructed by sending the string pattern ("aluminum") to an index server. There is a default index server (currently AltaVista), but a different one can be selected by using the "define index" statement (see Language Reference).
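
What an index server contributes can be sketched with a toy inverted index. This is only a local stand-in built over made-up pages. It does not contact AltaVista or any other service, and the function names are assumptions.

```python
import re
from collections import defaultdict

# Made-up pages: url -> text.
PAGES = {
    "http://m.example/alloys": "Aluminum alloys and corrosion",
    "http://m.example/hinges": "Continuous hinges: aluminum or stainless pin",
    "http://m.example/sand": "Sand for ice control",
}


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def mentions(index, word):
    """URLs of the documents that mention `word`, in sorted order."""
    return sorted(index.get(word.lower(), set()))


index = build_index(PAGES)
print(mentions(index, "aluminum"))
```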

Anchor's usage in the WebSql

Since we are interested not only in the contents of individual documents but also in the hypertext structure they generate, we need to take into account the links between documents. A link is characterized by the anchor document URL, the link label and the destination document URL. Therefore, we can consider all the links as tuples in a relation:

Anchor(base, label, href)

where base is the URL of the anchor document, label is the link's label and href is the URL of the destination document, all represented as character strings. Now we can pose queries that refer to the links present in documents.

select x.url from document x such that "http://www.math.tau.ac.il/~matias" =>|-> x, anchor y such that base = x where y.label contains "publications";
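
A minimal sketch of how Anchor(base, label, href) tuples could be extracted from one HTML page, using only the Python standard library. The class name, the sample page, and its URLs are assumptions for illustration, and relative links are resolved against the base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorExtractor(HTMLParser):
    """Collects (base, label, href) tuples from the <a href=...> elements."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.anchors = []
        self._href = None   # href of the link currently being read, if any
        self._label = []    # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = urljoin(self.base, href)
                self._label = []

    def handle_data(self, data):
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._label).split())
            self.anchors.append((self.base, label, self._href))
            self._href = None


def extract_anchors(base, html_text):
    parser = AnchorExtractor(base)
    parser.feed(html_text)
    parser.close()
    return parser.anchors


page = (
    '<p>See my <a href="pubs.html">publications</a> and '
    '<a href="http://other.example/">a friend</a>.</p>'
)
anchors = extract_anchors("http://home.example/~user/index.html", page)
# Counterpart of: where y.label contains "publications"
print([a for a in anchors if "publications" in a[1]])
```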

A hypertext link Definition

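A hypertext link in an HTML document is said to be:

  interior, if the destination is within the source document;
  local, if the destination is a different document on the same server;
  global, if the destination is on a different server.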

If we assign an arrow-like symbol to each of the three link types, we can write path regular expressions in a compact, intuitive way. Therefore, let #> denote an interior link, -> a local link and => a global link. Also, let = denote the empty path. Path regular expressions are built from these symbols using concatenation, alternation (|) and repetition (*). For example, =|=>->* is a regular expression that represents the set containing the zero-length path and all paths that start with a global link and continue with zero or more local links.
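
The sketch below classifies a link by comparing source and destination URLs and checks a sequence of link types against a path regular expression by translating it into an ordinary Python regular expression. It assumes that an interior link points into the same document, a local link to another document on the same server, and a global link to another server. The one-letter codes and function names are invented for illustration, and whitespace inside an expression is not supported.

```python
import re
from urllib.parse import urldefrag, urlsplit


def link_type(source, target):
    """Classify a link as interior ('#>'), local ('->') or global ('=>')."""
    src_doc, _ = urldefrag(source)
    dst_doc, _ = urldefrag(target)
    if src_doc == dst_doc:
        return "#>"
    if urlsplit(src_doc).netloc == urlsplit(dst_doc).netloc:
        return "->"
    return "=>"


# Tokens of a path regular expression and their Python regex counterparts.
TOKEN = re.compile(r"#>|->|=>|=|\||\*|\(|\)")
CODES = {
    "#>": "I", "->": "L", "=>": "G",   # one letter per link type
    "=": "(?:)",                       # the empty path
    "|": "|", "*": "*", "(": "(?:", ")": ")",
}


def compile_path(expr):
    """Translate a path regular expression into a compiled Python regex."""
    pos, parts = 0, []
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        parts.append(CODES[m.group()])
        pos = m.end()
    return re.compile("".join(parts))


def path_matches(expr, link_symbols):
    """True if the sequence of link symbols is in the language of `expr`."""
    word = "".join(CODES[s] for s in link_symbols)
    return compile_path(expr).fullmatch(word) is not None


print(link_type("http://a.example/x.html", "http://b.example/"))
print(path_matches("=|=>->*", ["=>", "->", "->"]))
```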

Examples of WebSQL Web structure queries

Example

Find all documents directly accessible from the Computer Science department home page that refer to graduate studies.
select x.url from document x such that "http://www.cs.tau.edu/" ->|=> x where x.url contains "grad" and not (x.url contains "undergrad");

Note: The expression ->|=> is a path regular expression that means local link (->) in the same server or global link (=>) to a remote server.

Example

Find all the computer science graduate students interested in databases, on a remote or local server.
select x.url from document x such that "http://www.cs.tau.ca/homepages.html" =>|-> x where x.text contains "database";

Example

Find the documents directly accessible from documents related to Java, where the accessible documents are on a remote host and not on the local server.
select y.url from document x such that x mentions "java", document y such that x => y;






How to download the software?
The WebSQL site holds the publications on WebSQL, the software download, the list of project members, and general documentation. To try WebSQL, press the "Try it" button on the site.
Enjoy! Go to the site: http://www.cs.toronto.edu/~websql