Commit 1fcaeea

Add documentation for evaluatePython evaluation function
Includes architecture overview, evaluation pipeline, security model, key commands, testing framework, environment variables, and deployment details.
1 parent 2f4fdca commit 1fcaeea

1 file changed

Lines changed: 103 additions & 0 deletions

CLAUDE.md

@@ -0,0 +1,103 @@
# evaluatePython

A [Lambda Feedback](https://lambda-feedback.github.io/user-documentation/) evaluation function that executes student Python code submissions, runs them against test cases, and returns structured formative feedback. Deployed as a Docker container on the Lambda Feedback platform.

## Architecture

All source lives in `evaluation_function/`:

| File | Role |
|------|------|
| `main.py` | IPC server entry point; registers `evaluation_function` and `preview_function` with lf_toolkit |
| `evaluation.py` | Core evaluation pipeline: security check → subprocess execution → output comparison → S3 plot upload → structured feedback |
| `preview.py` | AST-based pre-execution security validator (`_SecurityVisitor`) |
| `dev.py` | CLI wrapper for local manual testing |

### Evaluation pipeline (`evaluation.py`)

1. Run the AST security check on the student code
2. For each test case (or once if there are none):
   - Inject the matplotlib figure-capture preamble
   - Execute the student code in a subprocess with a 25-second timeout (`_TIMEOUT = 25`)
   - Compare stdout against `expected_output`
3. Upload any captured matplotlib figures to S3 (`_UPLOAD_FOLDER = "evaluatePython"`)
4. Return a `Result` with feedback tags: `pass`, `fail`, `hidden_pass`, `hidden_fail`, `error`, `output`, `summary`

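The execute-and-compare step can be sketched roughly as follows. This is a minimal illustration, not the code in `evaluation.py`: the helper names `run_case` and `check_case` are hypothetical, and exact string comparison of stdout is an assumption (the real pipeline may normalise whitespace, and it also performs the security check and figure capture, which are omitted here).

```python
import subprocess
import sys

_TIMEOUT = 25  # seconds; the value quoted in the pipeline description


def run_case(code: str, stdin_text: str = "", timeout: float = _TIMEOUT) -> dict:
    """Run `code` in a fresh interpreter and capture its output (hypothetical helper)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,       # fed to the child's stdin
            capture_output=True,    # collect stdout and stderr
            text=True,
            timeout=timeout,        # child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return {"status": "error", "stdout": "", "detail": f"timed out after {timeout}s"}
    if proc.returncode != 0:
        # A non-zero exit code is treated as a runtime error
        return {"status": "error", "stdout": proc.stdout, "detail": proc.stderr}
    return {"status": "ok", "stdout": proc.stdout, "detail": ""}


def check_case(code: str, test: dict) -> str:
    """Return 'pass', 'fail', or 'error' for one test case.

    Assumption: stdout must match `expected_output` exactly.
    """
    result = run_case(code, test.get("input", ""))
    if result["status"] == "error":
        return "error"
    return "pass" if result["stdout"] == test["expected_output"] else "fail"
```

Running each case in a separate interpreter keeps a crash or infinite loop in student code from taking down the evaluation process itself; the timeout turns a hang into an ordinary `error` result.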
### Request shape

```python
# params["tests"] is optional
{
    "tests": [
        {
            "input": "5\n",             # stdin fed to student code
            "expected_output": "25\n",  # expected stdout
            "hidden": False             # if True, suppress expected/actual in feedback
        }
    ]
}
```

41+
### Security model (`preview.py`)
42+
43+
`_SecurityVisitor` walks the AST before any execution and blocks:
44+
45+
- **Modules**: `os`, `sys`, `subprocess`, `socket`, `urllib`, `http`, `requests`, `shutil`, `pathlib`, `ftplib`, `smtplib`, `ctypes`, `multiprocessing`, `threading`, `importlib`, `pickle`, `builtins`
46+
- **Builtins**: `exec`, `eval`, `compile`, `open`, `__import__`, `input`
47+
- **Dunder attribute access**: any `__attr__` style attribute
48+
49+
## Key commands
50+
51+
```bash
52+
# Install dependencies
53+
poetry install
54+
55+
# Run all tests
56+
pytest
57+
58+
# Lint (critical errors fail CI; style/complexity are informational)
59+
flake8 ./evaluation_function --count --select=E9,F63,F7,F82 --show-source --statistics
60+
flake8 ./evaluation_function --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
61+
62+
# Manual local testing
63+
python -m evaluation_function.dev "print(5*5)" "25"
64+
# With params JSON as third argument:
65+
python -m evaluation_function.dev "print(5*5)" "" '{"tests":[{"input":"","expected_output":"25\n"}]}'
66+
67+
# Docker build
68+
docker build -t evaluatepython .
69+
# Cross-platform (CI uses linux/x86_64):
70+
docker build --platform=linux/x86_64 .
71+
72+
# Run the server locally (port 8080)
73+
docker run -it --rm -p 8080:8080 evaluatepython
74+
```
75+
76+
## Tests
77+
78+
Two test files, run with `pytest`:
79+
80+
- `evaluation_function/evaluation_test.py` — integration tests covering: all pass, partial fail, hidden test failure, runtime error, no test cases
81+
- `evaluation_function/preview_test.py` — unit tests covering: valid Python, syntax errors, dangerous imports, dangerous builtins, dunder access
82+
83+
CI runs on Python 3.12 and uploads JUnit XML results (`.github/workflows/test-lint.yml`).
84+
85+
## Environment
86+
87+
| Variable | Value | Purpose |
88+
|----------|-------|---------|
89+
| `VIRTUAL_ENV` | `/app/.venv` | Set in Dockerfile |
90+
| `MPLBACKEND` | `Agg` | Set at subprocess runtime to suppress GUI |
91+
| `FUNCTION_COMMAND` | `python` | lf_toolkit runner |
92+
| `FUNCTION_ARGS` | `-m,evaluation_function.main` | lf_toolkit runner |
93+
| `FUNCTION_RPC_TRANSPORT` | `ipc` | lf_toolkit transport |
94+
| `LOG_LEVEL` | `debug` | Logging verbosity |
95+
| `AWS_*` / boto3 credentials | Runtime env | Required for S3 plot uploads |
96+
97+
Dependencies managed via Poetry; `.venv` is created in-project (`poetry.toml`).
98+
99+
## Deployment
100+
101+
- Push to `main` triggers GitHub Actions (`.github/workflows/`) which builds and deploys to Lambda Feedback automatically
102+
- The function name is declared in `config.json` as `EvaluationFunctionName: "evaluatePython"` (lowerCamelCase)
103+
- The base Docker image is `ghcr.io/lambda-feedback/evaluation-function-base/python:test-sandbox-3.12`
