|
| 1 | +# evaluatePython |
| 2 | + |
| 3 | +A [Lambda Feedback](https://lambda-feedback.github.io/user-documentation/) evaluation function that executes student Python code submissions, runs them against test cases, and returns structured formative feedback. It is deployed as a Docker container on the Lambda Feedback platform. |
| 4 | + |
| 5 | +## Architecture |
| 6 | + |
| 7 | +All source lives in `evaluation_function/`: |
| 8 | + |
| 9 | +| File | Role | |
| 10 | +|------|------| |
| 11 | +| `main.py` | IPC server entry point; registers `evaluation_function` and `preview_function` with lf_toolkit | |
| 12 | +| `evaluation.py` | Core evaluation pipeline: security check → subprocess execution → output comparison → S3 plot upload → structured feedback | |
| 13 | +| `preview.py` | AST-based pre-execution security validator (`_SecurityVisitor`) | |
| 14 | +| `dev.py` | CLI wrapper for local manual testing | |
| 15 | + |
| 16 | +### Evaluation pipeline (`evaluation.py`) |
| 17 | + |
| 18 | +1. Run AST security check on student code |
| 19 | +2. For each test case (or once if no test cases are supplied): |
| 20 | + - Inject matplotlib figure-capture preamble |
| 21 | + - Execute student code in a subprocess with a 25-second timeout (`_TIMEOUT = 25`) |
| 22 | + - Compare stdout against `expected_output` |
| 23 | +3. Upload any captured matplotlib figures to S3 (`_UPLOAD_FOLDER = "evaluatePython"`) |
| 24 | +4. Return a `Result` with feedback tags: `pass`, `fail`, `hidden_pass`, `hidden_fail`, `error`, `output`, `summary` |
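
The execute-and-compare step above can be sketched roughly as follows. This is a minimal illustration, not the code in `evaluation.py`: the helper names `run_case` and `compare`, the shape of the returned dict, and the choice to ignore trailing whitespace when comparing are all assumptions. Only the 25-second timeout and the `MPLBACKEND=Agg` setting come from this README.

```python
import os
import subprocess
import sys

_TIMEOUT = 25  # seconds, the value quoted in the pipeline description


def run_case(code: str, stdin_text: str = "", timeout: float = _TIMEOUT) -> dict:
    """Run `code` in a fresh interpreter and capture its output (hypothetical helper)."""
    # Force a non-GUI matplotlib backend inside the child process.
    env = {**os.environ, "MPLBACKEND": "Agg"}
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,      # test-case "input" is fed on stdin
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )
    except subprocess.TimeoutExpired:
        return {"status": "error", "stdout": "", "detail": f"timed out after {timeout}s"}
    if proc.returncode != 0:
        # A non-zero exit code is treated as a runtime error.
        return {"status": "error", "stdout": proc.stdout, "detail": proc.stderr.strip()}
    return {"status": "ok", "stdout": proc.stdout, "detail": ""}


def compare(actual: str, expected: str) -> bool:
    """Compare stdout with the expected output, ignoring trailing whitespace (an assumption)."""
    return actual.rstrip() == expected.rstrip()
```

For example, `run_case("print(5*5)")` yields a result whose `stdout` can be passed to `compare(..., "25\n")`. Running each case in a separate interpreter keeps a crash or infinite loop in student code from taking down the evaluation process itself.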
| 25 | + |
| 26 | +### Request shape |
| 27 | + |
| 28 | +```python |
| 29 | +# params["tests"] is optional |
| 30 | +{ |
| 31 | + "tests": [ |
| 32 | + { |
| 33 | + "input": "5\n", # stdin fed to student code |
| 34 | + "expected_output": "25\n", # expected stdout |
| 35 | + "hidden": False # if True, suppress expected/actual in feedback |
| 36 | + } |
| 37 | + ] |
| 38 | +} |
| 39 | +``` |
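
As a rough illustration of how the optional `tests` list and the `hidden` flag might be consumed, the sketch below fills in defaults and picks a feedback tag per test. The function names `normalise_tests` and `tag_for`, the default values, and the message wording are hypothetical; only the field names and the tags `pass`, `fail`, `hidden_pass`, and `hidden_fail` are taken from this README.

```python
def normalise_tests(params: dict) -> list[dict]:
    """Return the test list with defaults filled in; empty if "tests" is absent."""
    return [
        {
            "input": t.get("input", ""),
            "expected_output": t.get("expected_output", ""),
            "hidden": bool(t.get("hidden", False)),
        }
        for t in params.get("tests", [])
    ]


def tag_for(test: dict, actual: str) -> tuple[str, str]:
    """Choose a feedback tag and message for one test (illustrative wording)."""
    expected = test["expected_output"]
    passed = actual.rstrip() == expected.rstrip()
    if test["hidden"]:
        # Hidden tests report only the outcome, never expected or actual output.
        tag = "hidden_pass" if passed else "hidden_fail"
        return tag, "Hidden test passed." if passed else "Hidden test failed."
    if passed:
        return "pass", "Test passed."
    return "fail", f"Expected {expected!r} but got {actual!r}."
```

With this shape, a request that omits `tests` entirely normalises to an empty list, which corresponds to the "run once" case in the pipeline.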
| 40 | + |
| 41 | +### Security model (`preview.py`) |
| 42 | + |
| 43 | +`_SecurityVisitor` walks the AST before any execution and blocks: |
| 44 | + |
| 45 | +- **Modules**: `os`, `sys`, `subprocess`, `socket`, `urllib`, `http`, `requests`, `shutil`, `pathlib`, `ftplib`, `smtplib`, `ctypes`, `multiprocessing`, `threading`, `importlib`, `pickle`, `builtins` |
| 46 | +- **Builtins**: `exec`, `eval`, `compile`, `open`, `__import__`, `input` |
| 47 | +- **Dunder attribute access**: any attribute of the form `__attr__` |
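
A check of this kind can be built on the standard library's `ast.NodeVisitor`. The sketch below shows the general technique only and is not the actual `_SecurityVisitor`: the class name `SketchVisitor`, the `check` helper, the violation messages, and the shortened module list are assumptions made for illustration.

```python
import ast

# Shortened, illustrative subset of the blocked modules listed above.
BLOCKED_MODULES = {"os", "sys", "subprocess", "socket"}
BLOCKED_BUILTINS = {"exec", "eval", "compile", "open", "__import__", "input"}


class SketchVisitor(ast.NodeVisitor):
    """Collect violations while walking the tree (hypothetical stand-in)."""

    def __init__(self) -> None:
        self.violations: list[str] = []

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            # "import os.path" is blocked via its top-level package "os".
            if alias.name.split(".")[0] in BLOCKED_MODULES:
                self.violations.append(f"import of '{alias.name}' is not allowed")
        self.generic_visit(node)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        root = (node.module or "").split(".")[0]
        if root in BLOCKED_MODULES:
            self.violations.append(f"import from '{node.module}' is not allowed")
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_BUILTINS:
            self.violations.append(f"call to '{node.func.id}' is not allowed")
        self.generic_visit(node)

    def visit_Attribute(self, node: ast.Attribute) -> None:
        if node.attr.startswith("__") and node.attr.endswith("__"):
            self.violations.append(f"access to '{node.attr}' is not allowed")
        self.generic_visit(node)


def check(code: str) -> list[str]:
    """Parse `code` and return a list of violations; empty means it looks safe."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    visitor = SketchVisitor()
    visitor.visit(tree)
    return visitor.violations
```

Because the check works on the parsed tree and never runs the code, it can reject a submission before any subprocess is started. A static check like this is a first line of defence rather than a complete sandbox.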
| 48 | + |
| 49 | +## Key commands |
| 50 | + |
| 51 | +```bash |
| 52 | +# Install dependencies |
| 53 | +poetry install |
| 54 | + |
| 55 | +# Run all tests |
| 56 | +pytest |
| 57 | + |
| 58 | +# Lint (critical errors fail CI; style/complexity are informational) |
| 59 | +flake8 ./evaluation_function --count --select=E9,F63,F7,F82 --show-source --statistics |
| 60 | +flake8 ./evaluation_function --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics |
| 61 | + |
| 62 | +# Manual local testing |
| 63 | +python -m evaluation_function.dev "print(5*5)" "25" |
| 64 | +# With params JSON as third argument: |
| 65 | +python -m evaluation_function.dev "print(5*5)" "" '{"tests":[{"input":"","expected_output":"25\n"}]}' |
| 66 | + |
| 67 | +# Docker build |
| 68 | +docker build -t evaluatepython . |
| 69 | +# Cross-platform (CI uses linux/x86_64): |
| 70 | +docker build --platform=linux/x86_64 . |
| 71 | + |
| 72 | +# Run the server locally (port 8080) |
| 73 | +docker run -it --rm -p 8080:8080 evaluatepython |
| 74 | +``` |
| 75 | + |
| 76 | +## Tests |
| 77 | + |
| 78 | +Two test files, run with `pytest`: |
| 79 | + |
| 80 | +- `evaluation_function/evaluation_test.py` — integration tests covering: all pass, partial fail, hidden test failure, runtime error, no test cases |
| 81 | +- `evaluation_function/preview_test.py` — unit tests covering: valid Python, syntax errors, dangerous imports, dangerous builtins, dunder access |
| 82 | + |
| 83 | +CI runs on Python 3.12 and uploads JUnit XML results (`.github/workflows/test-lint.yml`). |
| 84 | + |
| 85 | +## Environment |
| 86 | + |
| 87 | +| Variable | Value | Purpose | |
| 88 | +|----------|-------|---------| |
| 89 | +| `VIRTUAL_ENV` | `/app/.venv` | Set in Dockerfile | |
| 90 | +| `MPLBACKEND` | `Agg` | Set in the subprocess environment at runtime so matplotlib uses a non-GUI backend | |
| 91 | +| `FUNCTION_COMMAND` | `python` | lf_toolkit runner | |
| 92 | +| `FUNCTION_ARGS` | `-m,evaluation_function.main` | lf_toolkit runner | |
| 93 | +| `FUNCTION_RPC_TRANSPORT` | `ipc` | lf_toolkit transport | |
| 94 | +| `LOG_LEVEL` | `debug` | Logging verbosity | |
| 95 | +| `AWS_*` / boto3 credentials | Runtime env | Required for S3 plot uploads | |
| 96 | + |
| 97 | +Dependencies managed via Poetry; `.venv` is created in-project (`poetry.toml`). |
| 98 | + |
| 99 | +## Deployment |
| 100 | + |
| 101 | +- A push to `main` triggers the GitHub Actions workflows in `.github/workflows/`, which build and deploy the function to Lambda Feedback automatically |
| 102 | +- The function name is declared in `config.json` as `EvaluationFunctionName: "evaluatePython"` (lowerCamelCase) |
| 103 | +- The base Docker image is `ghcr.io/lambda-feedback/evaluation-function-base/python:test-sandbox-3.12` |