|
1 | | -# evaluatePython Evaluation Function |
| 1 | +# evaluatePython |
2 | 2 |
|
3 | | -This repository uses an existing autograder to provide formative feedback on Python code. |
| 3 | +A [Lambda Feedback](https://lambda-feedback.github.io/user-documentation/) evaluation function that executes student Python code submissions in a secure sandbox, runs them against test cases, and returns structured formative feedback. Deployed as a Docker container on the Lambda Feedback platform. |
4 | 4 |
|
5 | 5 | ## Deployment |
| 6 | + |
6 | 7 | [](https://github.com/lambda-feedback/evaluatePython/issues/new?template=release-request.yml) |
7 | | -To deploy to production, update the README button above to point to the correct repository. |
8 | 8 |
|
| 9 | +A push to `main` triggers a GitHub Actions workflow that automatically builds and deploys the function to Lambda Feedback. See [`.github/workflows/`](.github/workflows/) for the CI/CD configuration. |
9 | 10 |
|
10 | 11 | ## Usage |
11 | 12 |
|
12 | | -You can run the evaluation function either using [the pre-built Docker image](#run-the-docker-image) or build and run [the binary executable](#build-and-run-the-binary). |
13 | | - |
14 | 13 | ### Run the Docker Image |
15 | 14 |
|
16 | | -The pre-built Docker image comes with [Shimmy](https://github.com/lambda-feedback/shimmy) installed. |
17 | | - |
18 | | -> [!TIP] |
19 | | -> Shimmy is a small application that listens for incoming HTTP requests, validates the incoming data and forwards it to the underlying evaluation function. Learn more about Shimmy in the [Documentation](https://github.com/lambda-feedback/shimmy). |
20 | | -
|
21 | | -The pre-built Docker image is available on the GitHub Container Registry. You can run the image using the following command: |
22 | | - |
23 | 15 | ```bash |
24 | | -docker run -p 8080:8080 ghcr.io/lambda-feedback/evaluation-function-boilerplate-python:latest |
| 16 | +docker run -it --rm -p 8080:8080 ghcr.io/lambda-feedback/evaluatepython:latest |
25 | 17 | ``` |
26 | 18 |
|
27 | | -### Run the Script |
| 19 | +The image includes [Shimmy](https://github.com/lambda-feedback/shimmy), which listens for HTTP requests on port 8080 and forwards them to the evaluation function. |
28 | 20 |
|
29 | | -You can choose between running the Python evaluation function itself, ore using Shimmy to run the function. |
| 21 | +### Evaluation Modes |
30 | 22 |
|
31 | | -**Raw Mode** |
| 23 | +The function supports three modes, set via `params.mode`. |
32 | 24 |
|
33 | | -Use the following command to run the evaluation function directly: |
| 25 | +**`demo`** — run student code and show output (no pass/fail): |
34 | 26 |
|
35 | | -```bash |
36 | | -python -m evaluation_function.main |
| 27 | +```json |
| 28 | +{ |
| 29 | + "response": "print(5 * 5)", |
| 30 | + "params": { "mode": "demo" } |
| 31 | +} |
37 | 32 | ``` |
38 | 33 |
|
39 | | -This will run the evaluation function using the input data from `request.json` and write the output to `response.json`. |
| 34 | +**`io_test`** — compare stdout against expected output for each test case: |
40 | 35 |
|
41 | | -**Shimmy** |
42 | | - |
43 | | -To have a more user-friendly experience, you can use [Shimmy](https://github.com/lambda-feedback/shimmy) to run the evaluation function. |
| 36 | +```json |
| 37 | +{ |
| 38 | + "response": "n = int(input())\nprint(n * n)", |
| 39 | + "params": { |
| 40 | + "mode": "io_test", |
| 41 | + "tests": [ |
| 42 | + { "input": "5\n", "expected_output": "25\n" }, |
| 43 | + { "input": "3\n", "expected_output": "9\n", "hidden": true } |
| 44 | + ] |
| 45 | + } |
| 46 | +} |
| 47 | +``` |
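To make the `io_test` semantics concrete, here is a minimal sketch of how one test case could be checked: run the submission in a fresh interpreter, feed `input` on stdin, and compare stdout with `expected_output`. The helper name `run_io_test`, the exact-match comparison, and the 5-second timeout are assumptions for illustration; the repository's actual pipeline may differ, and a plain subprocess is not a secure sandbox.

```python
import subprocess
import sys


def run_io_test(code: str, stdin_text: str, expected: str, timeout: float = 5.0) -> bool:
    """Run `code` in a new interpreter and compare its stdout to `expected`.

    Hypothetical helper: exact string comparison and the timeout value are
    assumptions, not the behaviour of evaluatePython itself.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat a submission that never finishes as a failed test.
        return False
    return result.returncode == 0 and result.stdout == expected


student_code = "n = int(input())\nprint(n * n)"
tests = [
    {"input": "5\n", "expected_output": "25\n"},
    {"input": "3\n", "expected_output": "9\n", "hidden": True},
]
results = [run_io_test(student_code, t["input"], t["expected_output"]) for t in tests]
print(results)
```

The `hidden` flag is ignored here; it presumably only affects how much detail is shown to the student, not how the test is evaluated.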
44 | 48 |
|
45 | | -To run the evaluation function using Shimmy, use the following command: |
| 49 | +**`unit_test`** — run student code then execute `test_*` functions or `unittest.TestCase` subclasses (including Hypothesis tests): |
46 | 50 |
|
47 | | -```bash |
48 | | -shimmy -c "python" -a "-m" -a "evaluation_function.main" -i ipc |
| 51 | +```json |
| 52 | +{ |
| 53 | + "response": "def square(n): return n * n", |
| 54 | + "params": { |
| 55 | + "mode": "unit_test", |
| 56 | + "test_code": "def test_positive():\n assert square(5) == 25\ndef test_zero():\n assert square(0) == 0\n" |
| 57 | + } |
| 58 | +} |
49 | 59 | ``` |
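As a rough illustration of the `unit_test` idea, the sketch below executes the submission and the test code in one shared namespace, then calls every top-level `test_*` function and records whether it raised. The name `run_unit_tests` is hypothetical, and the sketch covers plain functions only; `unittest.TestCase` subclasses, Hypothesis tests, and sandboxing are out of scope here.

```python
def run_unit_tests(student_code: str, test_code: str) -> dict[str, bool]:
    """Return {test_name: passed} for every top-level `test_*` function.

    Hypothetical helper for illustration; it uses bare exec() with no
    sandboxing, so it must not be run on untrusted code.
    """
    namespace: dict = {}
    exec(student_code, namespace)  # defines the student's functions
    exec(test_code, namespace)     # defines test_* functions that call them
    outcomes: dict[str, bool] = {}
    for name, obj in list(namespace.items()):
        if name.startswith("test_") and callable(obj):
            try:
                obj()
                outcomes[name] = True
            except Exception:
                # AssertionError and any other runtime error count as a failure.
                outcomes[name] = False
    return outcomes


test_code = (
    "def test_positive():\n"
    "    assert square(5) == 25\n"
    "def test_zero():\n"
    "    assert square(0) == 0\n"
)
outcomes = run_unit_tests("def square(n): return n * n", test_code)
print(outcomes)
```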
50 | 60 |
|
51 | 61 | ## Development |
52 | 62 |
|
53 | 63 | ### Prerequisites |
54 | 64 |
|
55 | | -- [Docker](https://docs.docker.com/get-docker/) |
56 | | -- [Python](https://www.python.org) |
| 65 | +- [Python 3.12+](https://www.python.org) |
| 66 | +- [Poetry](https://python-poetry.org) |
| 67 | +- [Docker](https://docs.docker.com/get-docker/) (for container builds) |
57 | 68 |
|
58 | 69 | ### Repository Structure |
59 | 70 |
|
60 | | -```bash |
61 | | -evaluation_function/main.py # evaluation function entrypoint |
62 | | -evaluation_function/evaluation.py # evaluation function implementation |
63 | | -evaluation_function/evaluation_test.py # evaluation function tests |
64 | | -evaluation_function/preview.py # evaluation function preview |
65 | | -evaluation_function/preview_test.py # evaluation function preview tests |
66 | | - |
67 | | -config.json # evaluation function deployment configuration file |
68 | 71 | ``` |
69 | | - |
70 | | -### Development Workflow |
71 | | - |
72 | | -In its most basic form, the development workflow consists of writing the evaluation function in the `evaluation_function.wl` file and testing it locally. As long as the evaluation function adheres to the Evaluation Function API, a development workflow which incorporates using Shimmy is not necessary. |
73 | | - |
74 | | -Testing the evaluation function can be done by running the `dev.py` script using the Python interpreter like so: |
75 | | - |
76 | | -```bash |
77 | | -python -m evaluation_function.dev <response> <answer> |
| 72 | +evaluation_function/main.py # IPC server entry point |
| 73 | +evaluation_function/evaluation.py # core evaluation pipeline (all three modes) |
| 74 | +evaluation_function/preview.py # AST-based security validator |
| 75 | +evaluation_function/dev.py # CLI wrapper for local testing |
| 76 | +evaluation_function/evaluation_test.py # integration tests |
| 77 | +evaluation_function/preview_test.py # preview/security tests |
| 78 | +config.json # deployment configuration |
78 | 79 | ``` |
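The listing describes `preview.py` as an AST-based security validator. One common way to build such a check, shown here purely as a sketch, is to parse the submission with the standard `ast` module and reject imports of modules on a deny-list. The `BLOCKED_MODULES` set and the function `find_blocked_imports` are assumptions for illustration and are not taken from the repository.

```python
import ast

# Hypothetical deny-list; the real validator's rules are not shown in this diff.
BLOCKED_MODULES = {"os", "subprocess", "socket"}


def find_blocked_imports(code: str) -> list[str]:
    """Return the blocked top-level modules imported anywhere in `code`."""
    tree = ast.parse(code)
    found: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `node.module` is None for relative imports such as `from . import x`.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in BLOCKED_MODULES:
                found.append(root)
    return found


print(find_blocked_imports("import os\nprint(5 * 5)"))
print(find_blocked_imports("print(5 * 5)"))
```

A static check like this only catches literal `import` statements; dynamic imports through `__import__` or `importlib` would need additional rules or runtime isolation.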
79 | 80 |
|
80 | | -> [!NOTE] |
81 | | -> Specify the `response` and `answer` as command-line arguments. |
82 | | -
|
83 | | -### Building the Docker Image |
84 | | - |
85 | | -To build the Docker image, run the following command: |
| 81 | +### Setup |
86 | 82 |
|
87 | 83 | ```bash |
88 | | -docker build -t my-python-evaluation-function . |
| 84 | +poetry install |
89 | 85 | ``` |
90 | 86 |
|
91 | | -### Running the Docker Image |
| 87 | +### Local Testing |
92 | 88 |
|
93 | | -To run the Docker image, use the following command: |
| 89 | +The `dev.py` script calls the evaluation function directly (no Docker required). It defaults to `demo` mode if no params are supplied: |
94 | 90 |
|
95 | 91 | ```bash |
96 | | -docker run -it --rm -p 8080:8080 my-python-evaluation-function |
97 | | -``` |
98 | | - |
99 | | -This will start the evaluation function and expose it on port `8080`. |
100 | | - |
101 | | -## Deployment |
102 | | - |
103 | | -This section guides you through the deployment process of the evaluation function. If you want to deploy the evaluation function to Lambda Feedback, follow the steps in the [Lambda Feedback](#deploy-to-lambda-feedback) section. Otherwise, you can deploy the evaluation function to other platforms using the [Other Platforms](#deploy-to-other-platforms) section. |
104 | | - |
105 | | -### Deploy to Lambda Feedback |
| 92 | +# demo mode (default) |
| 93 | +python -m evaluation_function.dev "print(5 * 5)" |
106 | 94 |
|
107 | | -Deploying the evaluation function to Lambda Feedback is simple and straightforward, as long as the repository is within the [Lambda Feedback organization](https://github.com/lambda-feedback). |
| 95 | +# io_test mode |
| 96 | +python -m evaluation_function.dev "print(int(input())**2)" "" \ |
| 97 | + '{"mode":"io_test","tests":[{"input":"5\n","expected_output":"25\n"}]}' |
108 | 98 |
|
109 | | -After configuring the repository, a [GitHub Actions workflow](.github/workflows/deploy.yml) will automatically build and deploy the evaluation function to Lambda Feedback as soon as changes are pushed to the main branch of the repository. |
110 | | - |
111 | | -**Configuration** |
112 | | - |
113 | | -The deployment configuration is stored in the `config.json` file. Choose a unique name for the evaluation function and set the `EvaluationFunctionName` field in [`config.json`](config.json). |
114 | | - |
115 | | -> [!IMPORTANT] |
116 | | -> The evaluation function name must be unique within the Lambda Feedback organization, and must be in `lowerCamelCase`. You can find a example configuration below: |
117 | | -
|
118 | | -```json |
119 | | -{ |
120 | | - "EvaluationFunctionName": "compareStringsWithPython" |
121 | | -} |
| 99 | +# unit_test mode |
| 100 | +python -m evaluation_function.dev "def square(n): return n*n" "" \ |
| 101 | + '{"mode":"unit_test","test_code":"def test_sq():\n assert square(3)==9\n"}' |
122 | 102 | ``` |
123 | 103 |
|
124 | | -### Deploy to other Platforms |
125 | | - |
126 | | -If you want to deploy the evaluation function to other platforms, you can use the Docker image to deploy the evaluation function. |
127 | | - |
128 | | -Please refer to the deployment documentation of the platform you want to deploy the evaluation function to. |
129 | | - |
130 | | -If you need help with the deployment, feel free to reach out to the Lambda Feedback team by creating an issue in the template repository. |
131 | | - |
132 | | -## FAQ |
133 | | - |
134 | | -### Pull Changes from the Template Repository |
135 | | - |
136 | | -If you want to pull changes from the template repository to your repository, follow these steps: |
137 | | - |
138 | | -1. Add the template repository as a remote: |
| 104 | +### Running Tests |
139 | 105 |
|
140 | 106 | ```bash |
141 | | -git remote add template https://github.com/lambda-feedback/evaluation-function-boilerplate-python.git |
| 107 | +pytest |
142 | 108 | ``` |
143 | 109 |
|
144 | | -2. Fetch changes from all remotes: |
| 110 | +### Linting |
145 | 111 |
|
146 | 112 | ```bash |
147 | | -git fetch --all |
| 113 | +# Critical errors (fail CI) |
| 114 | +flake8 ./evaluation_function --count --select=E9,F63,F7,F82 --show-source --statistics |
| 115 | +# Style/complexity (informational) |
| 116 | +flake8 ./evaluation_function --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics |
148 | 117 | ``` |
149 | 118 |
|
150 | | -3. Merge changes from the template repository: |
| 119 | +### Building the Docker Image |
151 | 120 |
|
152 | 121 | ```bash |
153 | | -git merge template/main --allow-unrelated-histories |
| 122 | +docker build -t evaluatepython . |
| 123 | +# Cross-platform (CI uses linux/x86_64): |
| 124 | +docker build --platform=linux/x86_64 -t evaluatepython . |
154 | 125 | ``` |
155 | 126 |
|
156 | | -> [!WARNING] |
157 | | -> Make sure to resolve any conflicts and keep the changes you want to keep. |
158 | | -
|
159 | | -## Troubleshooting |
160 | | - |
161 | | -### Containerized Evaluation Function Fails to Start |
| 127 | +### Running the Docker Image |
162 | 128 |
|
163 | | -If your evaluation function is working fine when run locally, but not when containerized, there is much more to consider. Here are some common issues and solution approaches: |
| 129 | +```bash |
| 130 | +docker run -it --rm -p 8080:8080 evaluatepython |
| 131 | +``` |
164 | 132 |
|
165 | | -**Run-time dependencies** |
| 133 | +## Deployment to Lambda Feedback |
166 | 134 |
|
167 | | -Make sure that all run-time dependencies are installed in the Docker image. |
| 135 | +The function name is declared in [`config.json`](config.json) as `"evaluatePython"` (lowerCamelCase). Pushing to `main` triggers automated deployment via GitHub Actions. |
168 | 136 |
|
169 | | -- Python packages: Make sure to add the dependency to the `pyproject.toml` file, and run `poetry install` in the Dockerfile. |
170 | | -- System packages: If you need to install system packages, add the installation command to the Dockerfile. |
171 | | -- ML models: If your evaluation function depends on ML models, make sure to include them in the Docker image. |
172 | | -- Data files: If your evaluation function depends on data files, make sure to include them in the Docker image. |
| 137 | +> [!IMPORTANT] |
| 138 | +> The evaluation function name must be unique within the Lambda Feedback organization and must be in `lowerCamelCase`. |
173 | 139 |
|
174 | | -**Architecture** |
| 140 | +## Troubleshooting |
175 | 141 |
|
176 | | -Some package may not be compatible with the architecture of the Docker image. Make sure to use the correct platform when building and running the Docker image. |
| 142 | +### Containerized Function Fails to Start |
177 | 143 |
|
178 | | -E.g. to build a Docker image for the `linux/x86_64` platform, use the following command: |
| 144 | +- **Run-time dependencies**: ensure all packages are in `pyproject.toml` and installed via `poetry install` in the Dockerfile. |
| 145 | +- **Architecture**: some packages are platform-specific. Build with `--platform=linux/x86_64` to match the CI/production environment. |
| 146 | +- **Standalone check**: run the function directly inside the container to isolate startup errors: |
179 | 147 |
|
180 | 148 | ```bash |
181 | | -docker build --platform=linux/x86_64 . |
| 149 | +docker run -it --rm evaluatepython python -m evaluation_function.main |
182 | 150 | ``` |
183 | 151 |
|
184 | | -**Verify Standalone Execution** |
185 | | - |
186 | | -If requests are timing out, it might be due to the evaluation function not being able to run. Make sure that the evaluation function can be run as a standalone script. This will help you to identify issues that are specific to the containerized environment. |
187 | | - |
188 | | -To run just the evaluation function as a standalone script, without using Shimmy, use the following command: |
| 152 | +### Pulling Changes from the Template Repository |
189 | 153 |
|
190 | 154 | ```bash |
191 | | -docker run -it --rm my-python-evaluation-function python -m evaluation_function.main |
| 155 | +git remote add template https://github.com/lambda-feedback/evaluation-function-boilerplate-python.git |
| 156 | +git fetch --all |
| 157 | +git merge template/main --allow-unrelated-histories |
192 | 158 | ``` |
193 | 159 |
|
194 | | -If the command starts without any errors, the evaluation function is working correctly. If not, you will see the error message in the console. |
| 160 | +> [!WARNING] |
| 161 | +> Resolve conflicts carefully — template updates may overwrite evaluatePython-specific code. |