Next: The servlet module Up: Implementation Previous: Implementation

The URL scrambler

In our prototype, the URL scrambler is written in Java and runs as a separate process. Running outside the Web server may cost some performance, but the flexibility of Java made it possible to develop the prototype quickly.

The scrambler communicates with the Apache server over sockets, using two channels. One channel connects to the document parser; the other connects to the "rewrite" module, which rewrites requested URLs.

For each user connected to the server, the scrambler maintains session information: the SSL session id and a hash table of URL-SURL pairs, which allows fast lookup of requested SURLs. The SSL session id is a string of 64 hexadecimal characters. It is created by the Apache SSL module and can be assumed to be unique.
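
The per-session information described above could be held in a record like the following. This is only a minimal sketch: the class name ScramblerSession, its method names, and the choice to key the table by SURL are assumptions made for illustration, not the prototype's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-session record; all names here are illustrative.
final class ScramblerSession {
    // SSL session id: 64 hexadecimal characters, issued by the SSL module.
    private final String sslSessionId;
    // Hash table of URL-SURL pairs, keyed by SURL so that a requested
    // SURL can be resolved with a single lookup.
    private final Map<String, String> surlToUrl = new HashMap<>();

    ScramblerSession(String sslSessionId) {
        if (!sslSessionId.matches("[0-9a-fA-F]{64}")) {
            throw new IllegalArgumentException("expected 64 hexadecimal characters");
        }
        this.sslSessionId = sslSessionId;
    }

    String sslSessionId() {
        return sslSessionId;
    }

    // Stores one URL-SURL pair for this session.
    void remember(String surl, String url) {
        surlToUrl.put(surl, url);
    }

    // Returns the URL stored for the SURL, or null if the SURL is unknown.
    String lookup(String surl) {
        return surlToUrl.get(surl);
    }
}
```

Keying the table by SURL matches the lookup direction needed when a request arrives; a second table keyed by URL would only be needed if the same URL should always map to the same SURL.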

Each time a URL is sent to the scrambler, it tries to find an existing session. If one is found, it is reused; otherwise, a new session is created. The SURL that is then created consists of the session id and a document id that is unique within this session.
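
The find-or-create step and the construction of an SURL might be sketched as follows. The class UrlScramblerSketch, the per-session counter used as document id, and the SURL layout "/<session id>/<document id>" are assumptions for illustration; the text only states that an SURL consists of the session id and a document id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the URL-to-SURL direction; names are illustrative.
final class UrlScramblerSketch {
    // Per-session state: the SURL -> URL table and a counter for document ids.
    private static final class Session {
        final Map<String, String> surlToUrl = new HashMap<>();
        int nextDocumentId = 0;
    }

    // All known sessions, keyed by SSL session id.
    private final Map<String, Session> sessions = new HashMap<>();

    // Returns an SURL for the URL, reusing the session if it already exists
    // and creating it otherwise.
    String scramble(String sslSessionId, String url) {
        Session session = sessions.computeIfAbsent(sslSessionId, id -> new Session());
        // Assumed layout: session id followed by a per-session document id.
        String surl = "/" + sslSessionId + "/" + session.nextDocumentId++;
        session.surlToUrl.put(surl, url);
        return surl;
    }

    int sessionCount() {
        return sessions.size();
    }
}
```

A plain counter is enough to make the document id unique within a session; it is not meant to be unpredictable, since in this sketch the session id is the part of the SURL that ties it to one user.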

Whenever an SURL is received, the scrambler extracts the session id and checks whether it is valid, i.e. whether it is identical to the current SSL session id. If so, the scrambler looks up the SURL in the corresponding hash table to find the URL belonging to it. If a URL is found, it is returned. Otherwise, the SURL is returned unchanged, which leads to an error message generated by the Web server.
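
The reverse direction can be sketched in the same spirit. The class SurlResolverSketch, the register helper, and the SURL layout "/<session id>/<document id>" are hypothetical. Returning the SURL unchanged when the session id does not match is also an assumption; the text states this behaviour only for the case where no URL is found.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the SURL-to-URL direction; names are illustrative.
final class SurlResolverSketch {
    // session id -> (SURL -> URL)
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    // Records one URL-SURL pair for a session (normally done while scrambling).
    void register(String sessionId, String surl, String url) {
        tables.computeIfAbsent(sessionId, id -> new HashMap<>()).put(surl, url);
    }

    // Returns the original URL, or the SURL unchanged if it cannot be resolved.
    String resolve(String currentSslSessionId, String surl) {
        // Assumed layout "/<session id>/<document id>" splits into
        // an empty string, the session id, and the document id.
        String[] parts = surl.split("/");
        if (parts.length != 3 || !parts[0].isEmpty()) {
            return surl; // not shaped like an SURL
        }
        String embeddedSessionId = parts[1];
        if (!embeddedSessionId.equals(currentSslSessionId)) {
            return surl; // session id is not the current SSL session id
        }
        Map<String, String> table = tables.get(embeddedSessionId);
        String url = (table == null) ? null : table.get(surl);
        return (url != null) ? url : surl; // unknown SURLs pass through unchanged
    }
}
```

Because an unresolved SURL is passed through as-is, the Web server receives a path that does not exist and produces its usual error message, so the scrambler itself needs no error channel.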

For example, the URL in an HTML document might look like
whereas the generated SURL the scrambler produces might look like

Tim Wellhausen