IVLE Design - Python Execution Environment
==========================================

Author: Matt Giuca
Date: 3/12/2007

Python scripts can be executed from IVLE in two ways.

1. A persistent, interactive environment.
2. A stateless CGI environment.

These two methods are kept separate because a single environment cannot
really be both interactive (the user interacts with the program through
stdin) and able to run CGI programs.

The interactive environment will be presented in the web interface through
an Ajax-based console, rather than by printing output directly to the screen.

The CGI environment will be presented by executing the CGI program on the
server and sending its output directly to the browser.

Ways to execute code
--------------------

There are a number of different ways to run Python from inside the web
environment. Each method launches either the interactive environment or the
CGI environment.

Note: REPL refers to the interactive Read-Eval-Print Loop that Python gives
you when it is started without a script, or after a script finishes if you
run it with -i.

1. Navigate the browser directly to a .py file. This executes the program in
   the CGI environment.
2. Click "execute" on a .py file from the file browser. This links the browser
   to the .py file (possibly in a new tab), with the same result as #1. The
   Editor screen could also have a "run in browser" button which opens the
   file in CGI mode in a new browser tab.
3. Go to the Console screen. This launches the INTERACTIVE environment without
   loading any modules (with REPL).
4. Perhaps the file browser could also have a "run in console" button. This
   goes to the console screen, loading the file as an argument in INTERACTIVE
   mode (this could be with or without REPL - I'd say with).
5. The Editor screen should be divided in two, with an editor at the top and a
   console at the bottom. A "run" button would launch a new Python session in
   the console at the bottom, running in INTERACTIVE mode (without REPL).
6. The Reference / Tutorial screen consists of a number of small edit box /
   console pairs. The "run" button runs the application inside the small
   console window in INTERACTIVE mode (without REPL).

The Interactive Environment
---------------------------

The interactive environment will be:

* Wrapped in an Ajax console for interaction.
* Persistent. The user is given a Python environment which lets them
  interact with the Python interpreter console, as well as interact with their
  running programs through stdin.
* Optionally presented as a `python -i` environment (this is REPL mode;
  whether it is used depends on how it is run - see the use cases above).
* Not designed to handle CGI apps. If you run a CGI app in this environment,
  it runs as a normal program and you just see the raw CGI headers (see below
  for discussion).
* Given a large but finite CPU time limit (possibly a minute). We
  want to let the user continue interacting for some time, but kill the
  process if the user puts it into an endless loop and fails to terminate it.

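One way the CPU time limit could be enforced on a Unix server is with
RLIMIT_CPU, set in the child process just before the student's program starts.
This is only a sketch under that assumption: the helper name
`run_python_with_cpu_limit` is made up for illustration, and nothing here is a
decided part of IVLE.

```python
import resource
import subprocess
import sys


def run_python_with_cpu_limit(args, cpu_seconds):
    """Run the Python interpreter with `args` under a CPU time limit.

    Hypothetical helper for illustration only (Unix only).
    """

    def set_limit():
        # Runs in the child process, just before the interpreter is exec'd.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        new_hard = cpu_seconds + 1
        if hard != resource.RLIM_INFINITY:
            # An unprivileged process cannot raise its own hard limit.
            new_hard = min(new_hard, hard)
        soft = min(cpu_seconds, new_hard)
        # The kernel signals the process once it has used `soft` seconds of
        # CPU time, and kills it outright at the hard limit.
        resource.setrlimit(resource.RLIMIT_CPU, (soft, new_hard))

    return subprocess.run(
        [sys.executable] + list(args),
        preexec_fn=set_limit,
        capture_output=True,
    )
```

A negative `returncode` on the result means the process was killed by a
signal. Note that the limit counts CPU time rather than wall-clock time, so a
console that is sitting idle waiting on stdin does not use up its minute.
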
### Discussion: Persistent console vs controlled pickled sessions ###

There are two possible ways to give students an interactive Python session.

The important thing we need is a persistent session - that is, one that
remembers the loaded modules and variables (the environment) from one command
to the next, so if you type "a = 4" and then "a", it prints "4".

The simplest possibility is to just give each student their own Python process
which they can use however they want.

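The persistent behaviour described above can be seen with Python's standard
`code` module, which keeps one namespace alive between commands. This is a
minimal sketch, assuming one such object lives inside each student's process;
the `ConsoleSession` class is hypothetical and leaves out the Ajax transport
entirely.

```python
import code
import contextlib
import io


class ConsoleSession:
    """Hypothetical wrapper around one persistent interpreter namespace."""

    def __init__(self):
        # InteractiveConsole keeps its own locals dict for the whole session.
        self.console = code.InteractiveConsole()

    def push(self, line):
        """Feed one line of input; return (captured stdout, needs_more_input)."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            more = self.console.push(line)
        return buffer.getvalue(), more
```

With this, `push("a = 4")` followed by `push("a")` on the same session returns
the text `4` for the second call, because both lines run in the same
namespace. Error tracebacks go to stderr and are not captured here.
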
The other alternative discussed was to simulate a persistent environment but
actually create a new Python session each time a command is executed. This
would be done by pickling and unpickling the symbol table, storing the pickled
file as a "persistent environment" local to each user.

This has the following advantages compared to the persistent console:

* Better memory usage, as only students who are currently executing commands
  take up memory.
* Better load balancing, as each command could be run on a different machine.
* Ability for our software to differentiate between the interactive REPL
  environment and the running process. This would make it easy to do the CGI
  handling described below (which might be difficult or impossible with a
  dedicated Python process). Also, we could give each command 1 second of CPU
  time instead of giving the entire process 1 minute of CPU time, which is
  fairer and more sensible.

But it has the following disadvantages:

* Much harder to implement. Possibility of introducing bugs since we are
  effectively trying to emulate a Python REPL environment.
* All of a given student's Python windows would interfere with one another,
  as they would all share the one environment.
* Possibly huge time overhead for loading and unloading the environment each
  time.
* Possibly not that great a saving, as the main memory overhead of a
  persistent Python process is shared between all users anyway.

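To make the pickled-session alternative concrete, here is a rough sketch of
what running a single command might look like. Everything in it is an
assumption for illustration: the `run_command` name, the one-file-per-user
layout, and the choice to silently drop anything that cannot be pickled.

```python
import contextlib
import io
import os
import pickle


def run_command(source, env_path):
    """Run one console command against a pickled symbol table (hypothetical)."""
    env = {}
    if os.path.exists(env_path):
        # Only ever unpickle files that our own server wrote.
        with open(env_path, "rb") as f:
            env = pickle.load(f)

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # "single" mode echoes expression values, like the real REPL does.
        exec(compile(source, "<console>", "single"), env)

    # Save whatever survives pickling. Modules, open files and the like do
    # not, so they are lost between commands.
    saved = {}
    for name, value in env.items():
        if name == "__builtins__":
            continue
        try:
            pickle.dumps(value)
        except Exception:
            continue
        saved[name] = value
    with open(env_path, "wb") as f:
        pickle.dump(saved, f)
    return buffer.getvalue()
```

Running "a = 4" and then "a" against the same file prints "4" as required, but
an "import math" in one command is gone by the next, since modules cannot be
pickled. That is one example of the emulation bugs listed among the
disadvantages above.
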
I think we decided to go with a single persistent environment for each
student.

### Discussion: Handling CGI output ###

If the first thing the program outputs is CGI headers, and one of those is a
Content-Type header, why not handle it specially? Split the screen in half
horizontally, with the console at the top (still able to receive input from
stdin in the console). The bottom half would present the content rendered
correctly in the browser as HTML, image content, etc.

This feature should be optional, as you might want to see the raw CGI output.

It could also cause problems in REPL mode. Ideally, the program's stdout would
go to the CGI content handler while the stdout caused by the REPL environment
would go to the console. Students could then type print statements directly
into the REPL and see the results printed live into the browser.

A possibly-friendlier alternative is to do this the other way round. In the
CGI environment, if no Content-Type is detected, rather than just printing the
output to the browser, perhaps it should launch the full console (even for
public access visitors?). This way you can present interactive non-CGI
programs in a production environment.

Ultimately this could just mean that whether the environment is interactive or
CGI depends on whether a Content-Type is given, rather than how the session
was launched (though we also want to be able to have an interactive mode with
CGI programs that lets you see the raw output).

The CGI Environment
-------------------

The CGI environment will be:

* Sent directly to the browser through the CGI.
* Stateless. Programs are run directly and are expected to terminate
  immediately.
* Wrapped in a default content type. Because the output of programs executed
  in the CGI environment is sent straight to the browser, they are expected to
  produce CGI output. As a bare minimum, this must include a blank line at the
  start of the output to signal to the web server that the headers are
  finished.
  The upshot of the default content type is that non-CGI programs can be
  executed as well, only they are non-interactive and just print their output
  as plain text into the browser.
* Given a short CPU time limit (possibly a second). This is just for the
  execution of a single program, and it should not take long at all (after
  all, it is a web service).

### Discussion: What if the student needs more CPU time? ###

They should not need more CPU time when running a web process, because a slow
page isn't friendly to visitors anyway. If they need to, they can preprocess
the data using the interactive environment, which gives them a lot more CPU
time.

Perhaps there could be a way for students to request more CPU time on a
limited basis (for one run?), like "nice" but in the web interface. (Note:
this is infeasible for a web application, since we'll want to allow an
arbitrary number of accesses.)

### Discussion: How does CGI work when you don't give a content type? ###

The server gives a 500 Internal Server Error if non-header data is found
before the first blank line, so a non-CGI console program by itself will
usually give this error.

If there is a valid blank line but no Content-Type header, Apache will apply a
default Content-Type of text/plain (verified empirically). However, the CGI
specification seems to state that a Content-Type is required.

> A full document with a corresponding MIME type
>
> In this case, you must tell the server what kind of document you will be
> outputting via a MIME type.

[(source)](http://hoohoo.ncsa.uiuc.edu/cgi/primer.html)

Therefore we can write a wrapper which looks at the output and acts
accordingly:

* If it begins with data that does not form valid headers, assume this is a
  non-CGI program and insert Content-Type: text/plain.
* If it begins with a blank line (no headers), it would be safest to make the
  same assumption.
* If it begins with valid headers INCLUDING Content-Type, let it pass as-is
  (assume it is a valid CGI program).
* If it begins with valid headers but not including Content-Type, it is
  probably best to just let it through anyway and let Apache handle it.

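The four rules above could be sketched roughly as follows. This assumes the
wrapper has the program's whole output in hand as bytes; the function names
and the simple header pattern are illustrative only, and a real wrapper would
probably need to work on a stream instead.

```python
import re

# Rough pattern for a "Name: value" header line (illustrative, not a full
# implementation of the CGI header grammar).
HEADER_LINE = re.compile(rb"[A-Za-z0-9-]+:")
DEFAULT_HEADER = b"Content-Type: text/plain\n"


def split_headers(output):
    """Return (header_lines, body), or None if there is no valid header block."""
    lines = output.split(b"\n")
    headers = []
    # The last element of `lines` is whatever follows the final newline, so it
    # is not a complete line and cannot end the header block.
    for index, line in enumerate(lines[:-1]):
        line = line.rstrip(b"\r")
        if line == b"":
            # A blank line ends the header block.
            return headers, b"\n".join(lines[index + 1:])
        if not HEADER_LINE.match(line):
            return None
        headers.append(line)
    return None  # No blank line was ever found.


def wrap_cgi_output(output):
    """Apply the wrapper rules to a program's complete output."""
    parsed = split_headers(output)
    if parsed is None:
        # Invalid headers: treat as a non-CGI program, all output is body.
        return DEFAULT_HEADER + b"\n" + output
    headers, _body = parsed
    if not headers:
        # Starts with a blank line: put the default header in front of it.
        return DEFAULT_HEADER + output
    # Valid headers, with or without Content-Type: pass through unchanged.
    return output
```

One limitation of this kind of check is that a plain console program whose
first line happens to look like "Name: value" and is followed by a blank line
would be mistaken for a CGI program.
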