Commit 4cb996e

Add unit_test mode to evaluation function with examples and updated documentation
Implemented `unit_test` mode to allow execution of teacher-defined test functions or `unittest.TestCase` subclasses. Refactored evaluation pipeline, updated CLI to handle `unit_test` params, and expanded local testing support. Revised documentation across `README.md`, `CLAUDE.md`, `user.md`, and `dev.md` with detailed usage instructions, schemas, and examples for all modes.
1 parent 61a5b84 commit 4cb996e

5 files changed

Lines changed: 494 additions & 162 deletions

CLAUDE.md

Lines changed: 27 additions & 13 deletions
@@ -16,26 +16,38 @@ All source lives in `evaluation_function/`:
1616
### Evaluation pipeline (`evaluation.py`)
1717

1818
1. Run AST security check on student code
19-
2. For each test case (or once if none):
20-
- Inject matplotlib figure-capture preamble
21-
- Execute student code in a subprocess with 25-second timeout (`_TIMEOUT = 25`)
22-
- Compare stdout against `expected_output`
19+
2. Dispatch by `params["mode"]` (required):
20+
- **`demo`**: execute code with no stdin; return stdout/plots as `output` feedback (no pass/fail)
21+
- **`io_test`**: for each test in `params["tests"]`, execute with `test["input"]` as stdin and compare stdout against `test["expected_output"]`; upload matplotlib plots on pass or fail
22+
- **`unit_test`**: append `params["test_code"]` + unit-runner harness to student code; execute once; parse JSON results; supports plain `test_*` functions, `unittest.TestCase` subclasses, and Hypothesis-based tests
2323
3. Upload any captured matplotlib figures to S3 (`_UPLOAD_FOLDER = "evaluatePython"`)
24-
4. Return a `Result` with feedback tags: `pass`, `fail`, `hidden_pass`, `hidden_fail`, `error`, `output`, `summary`
24+
4. Return a `Result` with feedback tags: `pass`, `fail`, `hidden_fail`, `error`, `output`, `summary`
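
The dispatch in step 2 can be pictured roughly as below. This is a minimal sketch, not the code in `evaluation.py`: the handler names (`run_demo`, `run_io_test`, `run_unit_test`) and their return values are hypothetical placeholders, and only the three mode names and the "mode is required" rule come from the notes above.

```python
from typing import Any, Callable, Dict

# Hypothetical handlers: the real evaluation.py may be organised differently.
def run_demo(code: str, params: Dict[str, Any]) -> str:
    return "demo"

def run_io_test(code: str, params: Dict[str, Any]) -> str:
    return f"io_test with {len(params.get('tests', []))} test(s)"

def run_unit_test(code: str, params: Dict[str, Any]) -> str:
    return "unit_test"

HANDLERS: Dict[str, Callable[[str, Dict[str, Any]], str]] = {
    "demo": run_demo,
    "io_test": run_io_test,
    "unit_test": run_unit_test,
}

def dispatch(code: str, params: Dict[str, Any]) -> str:
    """Pick a handler from params["mode"]; a missing or unknown mode is an error."""
    mode = params.get("mode")
    if mode not in HANDLERS:
        raise ValueError(f"params['mode'] must be one of {sorted(HANDLERS)}, got {mode!r}")
    return HANDLERS[mode](code, params)
```

A table of handlers keeps the "mode is required" rule in one place, so adding a fourth mode would only mean adding one entry.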
2525

2626
### Request shape
2727

2828
```python
29-
# params["tests"] is optional
29+
# params["mode"] is required
30+
31+
# demo — run and show output, no pass/fail
32+
{"mode": "demo"}
33+
34+
# io_test — run against stdin/stdout test cases
3035
{
36+
"mode": "io_test",
3137
"tests": [
3238
{
33-
"input": "5\n", # stdin fed to student code
39+
"input": "5\n", # stdin fed to student code
3440
"expected_output": "25\n", # expected stdout
35-
"hidden": False # if True, suppress expected/actual in feedback
41+
"hidden": False # True = suppress input/output in feedback
3642
}
3743
]
3844
}
45+
46+
# unit_test — run student code then execute test functions/TestCases
47+
{
48+
"mode": "unit_test",
49+
"test_code": "def test_square():\n assert square(5) == 25\n"
50+
}
3951
```
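
A request of this shape could be checked before execution along the following lines. This is a sketch under assumptions: `validate_params` is a hypothetical helper, and the defaults it applies (`input` defaulting to an empty string, `hidden` defaulting to `False`) are guesses rather than documented behaviour.

```python
from typing import Any, Dict, List

def validate_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    mode = params.get("mode")
    if mode == "demo":
        return []
    if mode == "io_test":
        tests = params.get("tests")
        if not isinstance(tests, list):
            return ["io_test requires a 'tests' list"]
        problems = []
        for i, test in enumerate(tests):
            if not isinstance(test, dict):
                problems.append(f"tests[{i}] must be an object")
                continue
            # Assumed defaults: empty stdin and a visible (non-hidden) test.
            if not isinstance(test.get("input", ""), str):
                problems.append(f"tests[{i}].input must be a string")
            if not isinstance(test.get("expected_output"), str):
                problems.append(f"tests[{i}].expected_output must be a string")
            if not isinstance(test.get("hidden", False), bool):
                problems.append(f"tests[{i}].hidden must be a boolean")
        return problems
    if mode == "unit_test":
        if not isinstance(params.get("test_code"), str):
            return ["unit_test requires 'test_code' as a string"]
        return []
    return [f"unknown or missing mode: {mode!r}"]
```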
4052

4153
### Security model (`preview.py`)
@@ -59,10 +71,12 @@ pytest
5971
flake8 ./evaluation_function --count --select=E9,F63,F7,F82 --show-source --statistics
6072
flake8 ./evaluation_function --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
6173

62-
# Manual local testing
63-
python -m evaluation_function.dev "print(5*5)" "25"
64-
# With params JSON as third argument:
65-
python -m evaluation_function.dev "print(5*5)" "" '{"tests":[{"input":"","expected_output":"25\n"}]}'
74+
# Manual local testing (defaults to demo mode)
75+
python -m evaluation_function.dev "print(5*5)"
76+
# io_test mode with params JSON:
77+
python -m evaluation_function.dev "print(5*5)" "" '{"mode":"io_test","tests":[{"input":"","expected_output":"25\n"}]}'
78+
# unit_test mode:
79+
python -m evaluation_function.dev "def sq(n): return n*n" "" '{"mode":"unit_test","test_code":"def test_sq():\n assert sq(3)==9\n"}'
6680

6781
# Docker build
6882
docker build -t evaluatepython .
@@ -77,7 +91,7 @@ docker run -it --rm -p 8080:8080 evaluatepython
7791

7892
Two test files, run with `pytest`:
7993

80-
- `evaluation_function/evaluation_test.py` — integration tests covering: all pass, partial fail, hidden test failure, runtime error, no test cases
94+
- `evaluation_function/evaluation_test.py` — integration tests covering: all modes (demo, io_test, unit_test), all pass, partial fail, hidden test failure, runtime error, matplotlib plot capture, Hypothesis support
8195
- `evaluation_function/preview_test.py` — unit tests covering: valid Python, syntax errors, dangerous imports, dangerous builtins, dunder access
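
For orientation, the categories those preview tests cover (syntax errors, dangerous imports, dangerous builtins, dunder access) map naturally onto a walk over the parsed AST. The sketch below is illustrative only: `find_violations` is a hypothetical name and the two deny-lists are made up for the example, since the actual lists in `preview.py` are not shown in this diff.

```python
import ast
from typing import List

# Illustrative deny-lists: assumptions, not the real contents of preview.py.
BLOCKED_MODULES = {"os", "subprocess", "socket"}
BLOCKED_BUILTINS = {"eval", "exec", "open", "__import__"}

def find_violations(source: str) -> List[str]:
    """Parse the source and report blocked imports, builtins, and dunder access."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"import from {node.module}")
        elif isinstance(node, ast.Name) and node.id in BLOCKED_BUILTINS:
            violations.append(f"use of {node.id}")
        elif (isinstance(node, ast.Attribute)
              and node.attr.startswith("__") and node.attr.endswith("__")):
            violations.append(f"dunder access: {node.attr}")
    return violations
```

Because the check works on the syntax tree and never runs the code, it is safe to apply to untrusted submissions before any subprocess is started.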
8296

8397
CI runs on Python 3.12 and uploads JUnit XML results (`.github/workflows/test-lint.yml`).

README.md

Lines changed: 89 additions & 122 deletions
@@ -1,194 +1,161 @@
1-
# evaluatePython Evaluation Function
1+
# evaluatePython
22

3-
This repository uses an existing autograder to provide formative feedback on Python code.
3+
A [Lambda Feedback](https://lambda-feedback.github.io/user-documentation/) evaluation function that executes student Python code submissions in a secure sandbox, runs them against test cases, and returns structured formative feedback. Deployed as a Docker container on the Lambda Feedback platform.
44

55
## Deployment
6+
67
[![Create Release Request](https://img.shields.io/badge/Create%20Release%20Request-blue?style=for-the-badge)](https://github.com/lambda-feedback/evaluatePython/issues/new?template=release-request.yml)
7-
To deploy to production, update the README button above to point to the correct repository.
88

9+
Pushing to `main` triggers a GitHub Actions workflow that automatically builds and deploys the function to Lambda Feedback. See [`.github/workflows/`](.github/workflows/) for the CI/CD configuration.
910

1011
## Usage
1112

12-
You can run the evaluation function either using [the pre-built Docker image](#run-the-docker-image) or build and run [the binary executable](#build-and-run-the-binary).
13-
1413
### Run the Docker Image
1514

16-
The pre-built Docker image comes with [Shimmy](https://github.com/lambda-feedback/shimmy) installed.
17-
18-
> [!TIP]
19-
> Shimmy is a small application that listens for incoming HTTP requests, validates the incoming data and forwards it to the underlying evaluation function. Learn more about Shimmy in the [Documentation](https://github.com/lambda-feedback/shimmy).
20-
21-
The pre-built Docker image is available on the GitHub Container Registry. You can run the image using the following command:
22-
2315
```bash
24-
docker run -p 8080:8080 ghcr.io/lambda-feedback/evaluation-function-boilerplate-python:latest
16+
docker run -it --rm -p 8080:8080 ghcr.io/lambda-feedback/evaluatepython:latest
2517
```
2618

27-
### Run the Script
19+
The image includes [Shimmy](https://github.com/lambda-feedback/shimmy), which listens for HTTP requests on port 8080 and forwards them to the evaluation function.
2820

29-
You can choose between running the Python evaluation function itself, ore using Shimmy to run the function.
21+
### Evaluation Modes
3022

31-
**Raw Mode**
23+
The function supports three modes, set via `params.mode`.
3224

33-
Use the following command to run the evaluation function directly:
25+
**`demo`** — run student code and show output (no pass/fail):
3426

35-
```bash
36-
python -m evaluation_function.main
27+
```json
28+
{
29+
"response": "print(5 * 5)",
30+
"params": { "mode": "demo" }
31+
}
3732
```
3833

39-
This will run the evaluation function using the input data from `request.json` and write the output to `response.json`.
34+
**`io_test`** — compare stdout against expected output for each test case:
4035

41-
**Shimmy**
42-
43-
To have a more user-friendly experience, you can use [Shimmy](https://github.com/lambda-feedback/shimmy) to run the evaluation function.
36+
```json
37+
{
38+
"response": "n = int(input())\nprint(n * n)",
39+
"params": {
40+
"mode": "io_test",
41+
"tests": [
42+
{ "input": "5\n", "expected_output": "25\n" },
43+
{ "input": "3\n", "expected_output": "9\n", "hidden": true }
44+
]
45+
}
46+
}
47+
```
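
As a rough picture of what `io_test` does with each entry in `tests`, the sketch below feeds `input` to the code on stdin and compares stdout with `expected_output`. It is an approximation under assumptions: `run_io_tests` is a hypothetical name, the exact-match comparison and the timeout value are guesses, and the real pipeline also runs the security check and captures plots.

```python
import subprocess
import sys
from typing import Any, Dict, List

TIMEOUT_SECONDS = 25  # assumed value; the real limit is defined in evaluation.py

def run_io_tests(code: str, tests: List[Dict[str, Any]]) -> List[bool]:
    """Run the code once per test case and report pass/fail for each."""
    results = []
    for test in tests:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                input=test.get("input", ""),  # fed to the program as stdin
                capture_output=True,
                text=True,
                timeout=TIMEOUT_SECONDS,
            )
        except subprocess.TimeoutExpired:
            results.append(False)
            continue
        # Assumption: a test passes only on a clean exit and an exact stdout match.
        results.append(proc.returncode == 0 and proc.stdout == test["expected_output"])
    return results
```

With an exact comparison like this one, the trailing `\n` in `expected_output` matters, which is why the examples above include it.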
4448

45-
To run the evaluation function using Shimmy, use the following command:
49+
**`unit_test`** — run student code then execute `test_*` functions or `unittest.TestCase` subclasses (including Hypothesis tests):
4650

47-
```bash
48-
shimmy -c "python" -a "-m" -a "evaluation_function.main" -i ipc
51+
```json
52+
{
53+
"response": "def square(n): return n * n",
54+
"params": {
55+
"mode": "unit_test",
56+
"test_code": "def test_positive():\n assert square(5) == 25\ndef test_zero():\n assert square(0) == 0\n"
57+
}
58+
}
4959
```
5060

5161
## Development
5262

5363
### Prerequisites
5464

55-
- [Docker](https://docs.docker.com/get-docker/)
56-
- [Python](https://www.python.org)
65+
- [Python 3.12+](https://www.python.org)
66+
- [Poetry](https://python-poetry.org)
67+
- [Docker](https://docs.docker.com/get-docker/) (for container builds)
5768

5869
### Repository Structure
5970

60-
```bash
61-
evaluation_function/main.py # evaluation function entrypoint
62-
evaluation_function/evaluation.py # evaluation function implementation
63-
evaluation_function/evaluation_test.py # evaluation function tests
64-
evaluation_function/preview.py # evaluation function preview
65-
evaluation_function/preview_test.py # evaluation function preview tests
66-
67-
config.json # evaluation function deployment configuration file
6871
```
69-
70-
### Development Workflow
71-
72-
In its most basic form, the development workflow consists of writing the evaluation function in the `evaluation_function.wl` file and testing it locally. As long as the evaluation function adheres to the Evaluation Function API, a development workflow which incorporates using Shimmy is not necessary.
73-
74-
Testing the evaluation function can be done by running the `dev.py` script using the Python interpreter like so:
75-
76-
```bash
77-
python -m evaluation_function.dev <response> <answer>
72+
evaluation_function/main.py # IPC server entry point
73+
evaluation_function/evaluation.py # core evaluation pipeline (all three modes)
74+
evaluation_function/preview.py # AST-based security validator
75+
evaluation_function/dev.py # CLI wrapper for local testing
76+
evaluation_function/evaluation_test.py # integration tests
77+
evaluation_function/preview_test.py # preview/security tests
78+
config.json # deployment configuration
7879
```
7980

80-
> [!NOTE]
81-
> Specify the `response` and `answer` as command-line arguments.
82-
83-
### Building the Docker Image
84-
85-
To build the Docker image, run the following command:
81+
### Setup
8682

8783
```bash
88-
docker build -t my-python-evaluation-function .
84+
poetry install
8985
```
9086

91-
### Running the Docker Image
87+
### Local Testing
9288

93-
To run the Docker image, use the following command:
89+
The `dev.py` script calls the evaluation function directly (no Docker required). It defaults to `demo` mode if no params are supplied:
9490

9591
```bash
96-
docker run -it --rm -p 8080:8080 my-python-evaluation-function
97-
```
98-
99-
This will start the evaluation function and expose it on port `8080`.
100-
101-
## Deployment
102-
103-
This section guides you through the deployment process of the evaluation function. If you want to deploy the evaluation function to Lambda Feedback, follow the steps in the [Lambda Feedback](#deploy-to-lambda-feedback) section. Otherwise, you can deploy the evaluation function to other platforms using the [Other Platforms](#deploy-to-other-platforms) section.
104-
105-
### Deploy to Lambda Feedback
92+
# demo mode (default)
93+
python -m evaluation_function.dev "print(5 * 5)"
10694

107-
Deploying the evaluation function to Lambda Feedback is simple and straightforward, as long as the repository is within the [Lambda Feedback organization](https://github.com/lambda-feedback).
95+
# io_test mode
96+
python -m evaluation_function.dev "print(int(input())**2)" "" \
97+
'{"mode":"io_test","tests":[{"input":"5\n","expected_output":"25\n"}]}'
10898

109-
After configuring the repository, a [GitHub Actions workflow](.github/workflows/deploy.yml) will automatically build and deploy the evaluation function to Lambda Feedback as soon as changes are pushed to the main branch of the repository.
110-
111-
**Configuration**
112-
113-
The deployment configuration is stored in the `config.json` file. Choose a unique name for the evaluation function and set the `EvaluationFunctionName` field in [`config.json`](config.json).
114-
115-
> [!IMPORTANT]
116-
> The evaluation function name must be unique within the Lambda Feedback organization, and must be in `lowerCamelCase`. You can find a example configuration below:
117-
118-
```json
119-
{
120-
"EvaluationFunctionName": "compareStringsWithPython"
121-
}
99+
# unit_test mode
100+
python -m evaluation_function.dev "def square(n): return n*n" "" \
101+
'{"mode":"unit_test","test_code":"def test_sq():\n assert square(3)==9\n"}'
122102
```
123103

124-
### Deploy to other Platforms
125-
126-
If you want to deploy the evaluation function to other platforms, you can use the Docker image to deploy the evaluation function.
127-
128-
Please refer to the deployment documentation of the platform you want to deploy the evaluation function to.
129-
130-
If you need help with the deployment, feel free to reach out to the Lambda Feedback team by creating an issue in the template repository.
131-
132-
## FAQ
133-
134-
### Pull Changes from the Template Repository
135-
136-
If you want to pull changes from the template repository to your repository, follow these steps:
137-
138-
1. Add the template repository as a remote:
104+
### Running Tests
139105

140106
```bash
141-
git remote add template https://github.com/lambda-feedback/evaluation-function-boilerplate-python.git
107+
pytest
142108
```
143109

144-
2. Fetch changes from all remotes:
110+
### Linting
145111

146112
```bash
147-
git fetch --all
113+
# Critical errors (fail CI)
114+
flake8 ./evaluation_function --count --select=E9,F63,F7,F82 --show-source --statistics
115+
# Style/complexity (informational)
116+
flake8 ./evaluation_function --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
148117
```
149118

150-
3. Merge changes from the template repository:
119+
### Building the Docker Image
151120

152121
```bash
153-
git merge template/main --allow-unrelated-histories
122+
docker build -t evaluatepython .
123+
# Cross-platform (CI uses linux/x86_64):
124+
docker build --platform=linux/x86_64 -t evaluatepython .
154125
```
155126

156-
> [!WARNING]
157-
> Make sure to resolve any conflicts and keep the changes you want to keep.
158-
159-
## Troubleshooting
160-
161-
### Containerized Evaluation Function Fails to Start
127+
### Running the Docker Image
162128

163-
If your evaluation function is working fine when run locally, but not when containerized, there is much more to consider. Here are some common issues and solution approaches:
129+
```bash
130+
docker run -it --rm -p 8080:8080 evaluatepython
131+
```
164132

165-
**Run-time dependencies**
133+
## Deployment to Lambda Feedback
166134

167-
Make sure that all run-time dependencies are installed in the Docker image.
135+
The function name is declared in [`config.json`](config.json) as `"evaluatePython"` (lowerCamelCase). Pushing to `main` triggers automated deployment via GitHub Actions.
168136

169-
- Python packages: Make sure to add the dependency to the `pyproject.toml` file, and run `poetry install` in the Dockerfile.
170-
- System packages: If you need to install system packages, add the installation command to the Dockerfile.
171-
- ML models: If your evaluation function depends on ML models, make sure to include them in the Docker image.
172-
- Data files: If your evaluation function depends on data files, make sure to include them in the Docker image.
137+
> [!IMPORTANT]
138+
> The evaluation function name must be unique within the Lambda Feedback organization and must be in `lowerCamelCase`.
173139
174-
**Architecture**
140+
## Troubleshooting
175141

176-
Some package may not be compatible with the architecture of the Docker image. Make sure to use the correct platform when building and running the Docker image.
142+
### Containerized Function Fails to Start
177143

178-
E.g. to build a Docker image for the `linux/x86_64` platform, use the following command:
144+
- **Run-time dependencies**: ensure all packages are in `pyproject.toml` and installed via `poetry install` in the Dockerfile.
145+
- **Architecture**: some packages are platform-specific. Build with `--platform=linux/x86_64` to match the CI/production environment.
146+
- **Standalone check**: run the function directly inside the container to isolate startup errors:
179147

180148
```bash
181-
docker build --platform=linux/x86_64 .
149+
docker run -it --rm evaluatepython python -m evaluation_function.main
182150
```
183151

184-
**Verify Standalone Execution**
185-
186-
If requests are timing out, it might be due to the evaluation function not being able to run. Make sure that the evaluation function can be run as a standalone script. This will help you to identify issues that are specific to the containerized environment.
187-
188-
To run just the evaluation function as a standalone script, without using Shimmy, use the following command:
152+
### Pulling Changes from the Template Repository
189153

190154
```bash
191-
docker run -it --rm my-python-evaluation-function python -m evaluation_function.main
155+
git remote add template https://github.com/lambda-feedback/evaluation-function-boilerplate-python.git
156+
git fetch --all
157+
git merge template/main --allow-unrelated-histories
192158
```
193159

194-
If the command starts without any errors, the evaluation function is working correctly. If not, you will see the error message in the console.
160+
> [!WARNING]
161+
> Resolve conflicts carefully — template updates may overwrite evaluatePython-specific code.
