IVLE - System Architecture
==========================

Author: Matt Giuca
Date: 10/12/2007

This document describes the high-level system architecture of IVLE, specifically with respect to the "pluggable clients" interface.

Users and authorization
-----------------------

We need some way to authenticate users and store information about a logged-in user. Whether they are stored in a database local to our system remains to be seen.

Importantly, we need some way to send user information to the clients. This is discussed in the "pluggable clients" section.

Pluggable clients
-----------------

The IVLE system is largely just a collection of various components, called "clients", such as the file browser, text editor, console, tutorial sheets, etc. The architecture provides a common interface in which clients can be plugged in.

Firstly, we want all HTML pages on the site to be generated with a common header. The easiest way to do this is to write our own Python handler which is common to the entire application (this replaces the standard handlers such as Publisher). This top-level handler handles all authentication (for instance, checking the session to see if a user is logged in properly and if not, redirecting to the login page). It then outputs the header, and calls the appropriate client based on the URL.

The test of "whether the student is an Informatics student" is considered part of the authentication layer. (So students who are not enrolled in Informatics are treated the same way as a garbage username).

Note that some clients ("login" and "exec") do not require authentication. This will be one of the properties of the client in the global clients file.

Note that the handler does *not* perform authorization - that is left up to the clients.

One special feature of the handler will be the ability to write an XHTML header (which includes the user's name and links to profile page, IVLE logo, and tabs for all the clients).
This is important to keep a consistent interface between the clients. This header will be available upon request from the client. It is up to the client to NOT request a header for non-HTML content (or it will be ruined), and also not to request a header when executing student's code (ie. the exec module will never request a header).

### Plugin interface ###

The top-level handler will keep a Python file (or a text file, JSON file, etc) containing a list of valid clients. This is a dictionary mapping clients' internal names (the top-level directories, as described below in "URLs" and "Planned Clients") to some other data about the clients (such as a friendly name to display in the tabs, and a boolean as to whether or not to display the client in the tabs).

Part of the HTML header which the handler generates is a set of tabs linking to all of the clients in this list, or at least the ones with "show in tabs" turned on. Clients such as "exec" and "admin" will not have a tab.

Each client will be located physically in a directory "clients", in a subdirectory of the client's name. (eg. the console is located in "clients/console"). There *must* be a file in this directory called **client.py**. This file is called by the handler for most requests.

All requests will go through the handler. Note that there is some media (such as CSS, JavaScript and image files) which is directly part of the application itself, and which we do not want to pass through the handler. These files will be placed in a special top-level directory, which Apache will be told to serve directly. (eg. "/media").

This means that each client directory contains a Python program *only*, and no files accessible by the browser. It consists of client.py, plus any Python files imported by client.py (but none of these files will directly serve web content).

Inside client.py, there is a fixed interface which all clients must follow.
Firstly, there is a set of information which the handler must pass to the client in numerous calls - such as username, URL, and nicely split up parts of the URL such as the path, the GET variables, and also the POST data, as well as mod_python's low-level Request object. This information is encapsulated into an object and passed as a single argument to the client handling functions.

Note that as stated above, the handler may need to insert HTML contents into the output stream. Instead of having two separate function calls (a call to find the mime type and a call to get content), we'll simply provide a wrapper object on which the client can make callbacks.

To this end, the client receives an object containing all of the information, as well as an object with some methods to call. The handler passes both to a function in client.py, `handle`.

The callback object contains the following methods:

* set_mime_type(string) - Sets the output mime type. May be called any number of times (including 0, in which case it will default to HTML), but may not be called after any writing has been done.
* set_status(string) - Sets the HTTP response status. The string is a numeric code followed by a description, for example "404 Not Found". May not be called after any writing has been done.
* set_location(string) - Sets the Location field of the HTTP response to a new URL. For use with 300-level HTTP response codes. May not be called after any writing has been done.
* write_html_headers() - Writes the general site headers to the output stream. May not be called after any writing has been done.
* write(string) - Writes raw data to the output.

Note that this is very similar to the CGI interface, but much higher level (we have functions to call instead of writing strings, and we send the GET and POST data in a packaged object instead of environment variables and stdin).
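
The callback methods listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual handler code: the class name `CallbackObject`, the exception `HeadersAlreadySentError`, the in-memory output buffer and the dictionary used for the request information are all hypothetical, and a real handler would write through mod_python's Request object instead.

```python
import io


class HeadersAlreadySentError(Exception):
    """Raised when a header is changed after output has begun (hypothetical)."""


class CallbackObject:
    """Illustrative stand-in for the object the handler passes to `handle`."""

    def __init__(self, out=None):
        # Assumption: output goes to an in-memory buffer for demonstration.
        self.out = out if out is not None else io.StringIO()
        self.mime_type = "text/html"  # default if set_mime_type is never called
        self.status = "200 OK"
        self.location = None
        self.headers_sent = False

    def _require_headers_unsent(self):
        if self.headers_sent:
            raise HeadersAlreadySentError("output has already started")

    def set_mime_type(self, mime_type):
        self._require_headers_unsent()
        self.mime_type = mime_type

    def set_status(self, status):
        self._require_headers_unsent()
        self.status = status

    def set_location(self, url):
        self._require_headers_unsent()
        self.location = url

    def _send_headers(self):
        # Response headers are emitted exactly once, before any body data.
        self.out.write("Status: %s\n" % self.status)
        self.out.write("Content-Type: %s\n" % self.mime_type)
        if self.location is not None:
            self.out.write("Location: %s\n" % self.location)
        self.out.write("\n")
        self.headers_sent = True

    def write_html_headers(self):
        self._require_headers_unsent()
        self._send_headers()
        # Placeholder for the common site header (logo, tabs, user name).
        self.out.write("<!-- site header would go here -->\n")

    def write(self, data):
        if not self.headers_sent:
            self._send_headers()
        self.out.write(data)


def handle(req, resp):
    """Example client entry point; `req` is assumed here to be a dict."""
    resp.set_status("200 OK")
    resp.write_html_headers()
    resp.write("<p>Hello, %s</p>\n" % req["username"])
```

A client that serves non-HTML content would call `set_mime_type` and then `write` directly, never calling `write_html_headers`.
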
Note that, as with CGI, there is a "cutoff point" during the processing (immediately when the first call to `write` or `write_html_headers` is made) at which the response headers are written to the server.

### Help files ###

There will be a "help" app which is special in that it goes inside all of the other apps' directories looking for a help file. So aside from "client.py", another special file is "help.html", which is a static help file for each module, sitting in that app's top-level directory.

help.html is not to be served directly. The "help" app will embed it within another page. Therefore it is not a real HTML file - it should just be the inside of a body (it should not contain html or body tags).

### Application directory hierarchy ###

Due to the handler, we have a nice property that the application directory hierarchy is completely removed from the apparent hierarchy on the web.

This gives us two opportunities: we can give the applications (in their directory hierarchy) a different name than the URL suggests, and we can lay out the directory hierarchy with developers' interests in mind.

We capitalise on the first opportunity by mapping the "action" (url name) of a client to the actual name. (Clients are indexed by url-name so they can be looked up when a URL is requested).

The proposed application directory hierarchy is:

    /
    /clients            - All clients go in here
    /clients/myclient   - "actual" names of the clients
    /dispatch           - Code files for the top-level dispatch
    /dispatch.py        - Entrypoint for the top-level dispatch
    /media              - Publicly viewable files
                          (Note that this directory hierarchy maps onto the web site)
    /media/myclient     - media files specific to each client go in a subdir
    /media/dispatch     - media files for the top-level dispatch
    /conf               - Special .py files which hold configuration info
                          (for the admin to edit, not the programmers).

URLs
----

It would be good if we had full control of URLs and were able to make them "nice" at all times.
The criteria for "nice" URLs are as follows:

* The paths in the URLs reflect a sensible hierarchy of where you are in the program at the current time.
* The URLs do not contain any file extensions for the pages (no .html or .py), although linked files such as CSS, JavaScript and image files should have appropriate file extensions.
* The URLs do not contain unnecessary garbage arguments, and preferably no GET arguments at all (for instance, the file browser will specify the path to browse in the actual URL path, not the GET arguments).
* The URL does not contain the student's login name. This is implicit in the browser session. (This requirement allows us to link to URLs in documentation which will work for any student). (Note that URLs may contain other students' login names for browsing their work - this is determined by the individual clients).

The top-level directory given in the URL determines the client which the handler will pass off to. For instance,

    http://www.example.com/ivle/console

Since IVLE is located at `http://www.example.com/ivle`, it will consider the "top-level directory" to be "console", and therefore will call the client whose action is "console".

This may not be the actual name of the client. For example, the "edit" action maps onto the "editor" client, while the "serve" action maps onto the "exec" client. (Perhaps it is best for simplicity if these do in fact correspond).

For another example, consider the file browser (action name "files"). The URL may have subdirectories after it which indicate the path to explore. This will be detailed in the clients section below. An example of a browse URL is:

    http://www.example.com/ivle/files/jdoe/151/proj1/

In this instance, the handler will see the top-level directory as "files", and will therefore link to the file browser client. The file browser client will then receive the additional arguments passed to it in some way, which in this case are "jdoe/151/proj1/".
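
As a rough sketch of the lookup just described (not the actual dispatch code), the handler could strip the location of IVLE from the request path, take the first path segment as the action, and pass the rest to the client. The function names `split_url_path` and `find_client` and the `ACTIONS` table are hypothetical; the table holds only the action-to-client pairs named in this document, and "/ivle" is taken from the example URLs.

```python
# Action (URL name) -> client (directory name), as named in this document.
ACTIONS = {
    "files": "browser",
    "edit": "editor",
    "serve": "exec",
    "console": "console",
}


def split_url_path(url_path, site_root="/ivle"):
    """Return (action, remainder) for a request path below `site_root`.

    Returns None if the path is not located under `site_root`.
    """
    root = site_root.rstrip("/")
    if url_path != root and not url_path.startswith(root + "/"):
        return None
    rest = url_path[len(root):].lstrip("/")
    # The first segment is the action; everything after it goes to the client.
    action, _, remainder = rest.partition("/")
    return action, remainder


def find_client(url_path, site_root="/ivle"):
    """Return (client_name, remainder), or None if no client matches."""
    parts = split_url_path(url_path, site_root)
    if parts is None or parts[0] not in ACTIONS:
        return None
    action, remainder = parts
    return ACTIONS[action], remainder
```

With these assumptions, `find_client("/ivle/files/jdoe/151/proj1/")` yields the "browser" client together with the remaining path "jdoe/151/proj1/".
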
The file browser client will then handle this path and serve up the correct directory.

### Relative URLs inside HTML content ###

It is a requirement that the application can be placed anywhere in a web server's directory hierarchy, not just at the top level.

This means HTML should never contain hard-coded absolute URLs (beginning with, eg, "/browse"), because the application would then only work when installed at the web server's root.

To solve the problem of how to generate URLs, one of the fields the handler will pass into the clients (which it will read from a config file somewhere) will be the "site root". This may be "/ivle", for instance. Therefore all absolute URLs generated by the applications must be prepended with the "site root".

(In our case the site root will probably be "/", but it's a good feature to have).

### Student's directory hierarchy, common code ###

Many clients share the concept of exploring the student's directory hierarchy, as explained above for the browser module.

The common code for handling the student id or group name (etc) and authorization will be available as a separate module for all such clients (browser, editor, exec) to use.

Planned Clients
---------------

### File Browser, Text Editor and Executor ###

Three of the most important clients are the file browser ("browser"), text editor ("editor") and executor ("exec"). These three share a commonality in that they all access the student's directory hierarchy and files.

They share a lot of code, and in particular, there is a common server-side handler for file access, directory listings and Subversion.

Firstly, every file and directory is classified into one of the following categories (based on its inferred MIME type and possibly whether it contains invalid Unicode characters):

1. Directory
2. Image
3. Audio
4. Text file (unless it fits the above, eg, SVG files)
5. Any other binary file

How each of these is handled depends on which of the 3 clients is accessing the file.
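
The five-way classification above might be sketched like this. It is a minimal illustration, not the planned implementation: the function name `classify` and the returned category strings are made up for the example, the MIME type is inferred from the file name with Python's standard `mimetypes` module, and "invalid Unicode" is assumed here to mean "does not decode as UTF-8".

```python
import mimetypes
import os


def classify(path):
    """Return one of: 'directory', 'image', 'audio', 'text', 'binary'."""
    if os.path.isdir(path):
        return "directory"
    # Infer the MIME type from the file name alone.
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is not None:
        # Image and audio are checked before text, matching the list order.
        if mime_type.startswith("image/"):
            return "image"
        if mime_type.startswith("audio/"):
            return "audio"
        if mime_type.startswith("text/"):
            return "text"
    # Unknown or other type: treat as text only if the contents decode.
    # (Reads the whole file; a real version would likely sample a prefix.)
    try:
        with open(path, "rb") as f:
            f.read().decode("utf-8")
    except (OSError, UnicodeDecodeError):
        return "binary"
    return "text"
```

Each of the three clients would then branch on the returned category in its own way.
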
#### File Browser ####

Name: `browser`
Action name: `files`
Tab name: "Files"

1. Directory - Displays a directory listing (this is its primary purpose).
2. Image - Displays the image inside the main navigation interface.
3. Audio - (non-core) Provides a streaming audio player within the main navigation interface.
4. Text file - Redirects to edit.
5. Binary file - Provides a download link within the main navigation interface.

Note that no matter what, using browser will remain within the navigation interface so you will never be "lost" inside a raw image or something. It also will not throw binary files as downloads directly to you.

Note that the src of the image tag in (2) and the href of the download link in (5) will simply be links to the exec version of the same file.

File browser will include the Python file which serves up JSON responses to requests for directory hierarchies, and performs SVN and file access commands. This file will be used by the text editor (at least) and possibly exec.

#### Text Editor ####

Name: `editor`
Action name: `edit`
Tab name: "Edit"

No matter what, editor provides a text area (with advanced editing capabilities and syntax highlighting) for any file, even if it is binary. The only exception is directories, which redirect to browser.

Note that it will not be possible to click into the editor for a binary file (the browser will not offer an edit link). However, it will still be possible to navigate there manually, in which case you handle the shock yourself.

#### Executor ####

Name: `exec`
Action name: `serve`
Tab name: (not shown)

The executor is used to directly serve files out of a student's directory, as if it were a standard web server. (It can be thought of as a little web server inside IVLE).

This means that:

* A whitelist of file types is kept which simply are served up raw. This includes HTML, JavaScript, CSS, all reasonable image and audio formats, etc.
* Special "executable" file types (.py, .psp).
  Exec will call popen on a Python process which loads a mod_python handler, cgihandler or psphandler on the given file.
* HTTP errors for banned files.
* When presented with a directory, it first tries to execute `__init__.py` (the default item for the directory). It could also look for `index.html` or `index.psp` if that failed. Failing that, it returns an HTTP 403 Forbidden error.

### Console ###

Name: `console`
Action name: `console`
Tab name: "Console"

### Tutorial Pages ###

Name: `tutorial`
Action name: `tutorial`
Tab name: "Tutorial"

### Administration ###

Name: `admin`
Action name: `admin`
Tab name: (not shown)

Client checks authorization for admin status. Tab is not shown so students will not normally know about this (but even if they find it they will be denied access).

### Login ###

Name: `login`
Action name: `login`
Tab name: (not shown)

Authentication not required. Presents a login box.

Other similar clients are "logout" (which just immediately logs the current user out and redirects to the main page), and "profile" (user settings).
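
Taken together, the planned clients could be recorded in the global clients file along the following lines. This is a sketch only: the variable name `CLIENTS`, the field names and the `tabs` helper are hypothetical, the tab order is an assumption, and "logout" and "profile" are left out because their properties are not specified here. The names, actions, tab names and authentication flags come from the sections above.

```python
# Keyed by action name (the top-level URL directory).
CLIENTS = {
    "files": {"client": "browser", "tab_name": "Files",
              "show_in_tabs": True, "requires_auth": True},
    "edit": {"client": "editor", "tab_name": "Edit",
             "show_in_tabs": True, "requires_auth": True},
    "serve": {"client": "exec", "tab_name": None,
              "show_in_tabs": False, "requires_auth": False},
    "console": {"client": "console", "tab_name": "Console",
                "show_in_tabs": True, "requires_auth": True},
    "tutorial": {"client": "tutorial", "tab_name": "Tutorial",
                 "show_in_tabs": True, "requires_auth": True},
    "admin": {"client": "admin", "tab_name": None,
              "show_in_tabs": False, "requires_auth": True},
    "login": {"client": "login", "tab_name": None,
              "show_in_tabs": False, "requires_auth": False},
}


def tabs(clients=CLIENTS):
    """Return (action, tab_name) pairs for the clients shown in the header."""
    return [(action, info["tab_name"])
            for action, info in clients.items()
            if info["show_in_tabs"]]
```

The handler would use `tabs()` when writing the common header, and the `requires_auth` flag to decide whether to redirect an anonymous request to the login page.
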