76 changes: 69 additions & 7 deletions README.md
@@ -14,6 +14,18 @@ ______________________________________________________________________

Run Stata's do-files with pytask.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Dependencies and Products](#dependencies-and-products)
- [Accessing dependencies and products in the script](#accessing-dependencies-and-products-in-the-script)
- [YAML Configuration Files](#yaml-configuration-files)
- [Command Line Arguments](#command-line-arguments)
- [Repeating tasks with different scripts or inputs](#repeating-tasks-with-different-scripts-or-inputs)
- [Configuration](#configuration)
- [Changes](#changes)

## Installation

pytask-stata is available on [PyPI](https://pypi.org/project/pytask-stata) and
@@ -99,9 +111,59 @@ def task_run_do_file(produces: Path = Path("auto.dta")):
### Accessing dependencies and products in the script

Dependencies and products registered in the task function signature are used by pytask
to order tasks and track whether they are up-to-date. They are not automatically passed
to the Stata script. Use the `options` argument of the decorator to pass paths or other
values as command line arguments to your Stata executable.
to order tasks and track whether they are up-to-date. pytask-stata offers two modes to
pass these paths and other task data to the Stata script.

1. Use the default YAML configuration file. This is the recommended mode if your Stata
   installation can use the user-written `yaml` package.
1. Use the `options` argument of the decorator to pass command line arguments. This is
   the compatibility mode for Stata installations where `yaml.ado` is not available or
   not supported.

Do not combine both interfaces. If `options` is supplied, pytask-stata assumes the
do-file receives all required values through command line arguments and does not create
a YAML configuration file.
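
The either/or rule above can be pictured as a small selection function. This is only an illustrative sketch, not pytask-stata's actual implementation: `build_script_arguments` and its parameters are hypothetical names, and it assumes that "supplied" means `options` is anything other than `None`.

```python
from __future__ import annotations

from pathlib import Path
from typing import Sequence


def build_script_arguments(
    options: Sequence[str] | None,
    config_path: Path,
) -> list[str]:
    """Return the arguments handed to the do-file (hypothetical helper)."""
    if options is not None:
        # Compatibility mode: only the user-supplied options are passed on,
        # and no YAML configuration file is involved.
        return [str(option) for option in options]
    # Default mode: the path to the generated YAML file is the first argument.
    return [config_path.as_posix()]
```

Under these assumptions, a task with `options` never sees a configuration path, which is why a do-file should read its inputs from one interface only.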

#### YAML Configuration Files

By default, pytask-stata serializes all task keyword arguments and passes the path to
the generated YAML file as the first argument to the do-file. To read the file inside
Stata, install the user-written `yaml` package.

```stata
ssc install yaml
```

Then read the configuration file in the Stata task.

```python
from pathlib import Path

from pytask import mark


@mark.stata(script=Path("script.do"))
def task_run_do_file(
    depends_on: Path = Path("input.dta"),
    produces: Path = Path("auto.dta"),
):
    pass
```

```do
args config
yaml read using "`config'", locals replace
local depends_on = r(yaml_depends_on)
local produces = r(yaml_produces)

use "`depends_on'", clear
save "`produces'"
```

#### Command Line Arguments

Use the `options` argument of the decorator to pass paths or other values as command
line arguments to your Stata executable. This mode does not require the `yaml` package.

For example, pass paths for the dependency and product with

@@ -119,22 +181,22 @@ def task_run_do_file(
    pass
```

And in your `script.do`, you can intercept the value with
And in your `script.do`, you can intercept the values with

```do
* Intercept command line arguments and save them to macros.
args depends_on produces

sysuse auto, clear
use "`depends_on'", clear
save "`produces'"
```

The relative path inside the do-file works only because pytask-stata switches the
current working directory to the directory of the task module before the task is
executed.

To make the task independent from the current working directory, pass the full path as
an command line argument. Here is an example.
To make the task independent from the current working directory, pass the full path as a
command line argument. Here is an example.

```python
# Absolute path to the build directory.
36 changes: 36 additions & 0 deletions docs/yaml.md
@@ -0,0 +1,36 @@
# YAML Data Passed to Stata

pytask-stata serializes task keyword arguments with PyYAML and passes the path to the
generated YAML file as the first command line argument to the do-file. Inside Stata,
read the file with the user-written `yaml` package.

```stata
args config
yaml read using "`config'", locals replace
```

The `yaml` package stores parsed YAML in a Stata dataset with `key`, `value`, `level`,
`parent`, and `type` columns. With `locals`, scalar leaves are also available as
`r(yaml_<key>)` macros. Nested keys are flattened with underscores.

## Supported Types

| Python value | YAML shape | Stata representation | Access pattern |
| -------------------------------- | ----------------------------- | -------------------------------------------------------------------- | ----------------------------------------- |
| Non-empty `str` | `name: hello` | `type=string`, `value=hello` | `r(yaml_name)` |
| `int` | `count: 42` | `type=numeric`, `value=42` | `r(yaml_count)` |
| `float` | `ratio: 3.14` | `type=numeric`, `value=3.14` | `r(yaml_ratio)` |
| `bool` | `enabled: true` | `type=boolean`, `value=1` or `0` | `r(yaml_enabled)` |
| `None` | `missing: null` | `type=null`, empty value | validate as `null`; no useful macro value |
| `pathlib.Path` | `path: build/out.dta` | `type=string`, POSIX-style path | `r(yaml_path)` |
| Flat `list` / `tuple` of scalars | `items:` plus `- value` lines | parent row plus `items_1`, `items_2`, ... rows with `type=list_item` | use flattened keys such as `items_1` |
| Nested `dict` with scalar leaves | nested mapping | flattened keys such as `config_child` | `yaml get config, attributes(child)` |
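
To make the key naming in the table concrete, the following Python sketch mimics the flattening on the Python side. It is an illustration of the naming convention described above, not the code of pytask-stata or of the Stata `yaml` package. The helpers `to_scalar` and `flatten` are hypothetical, and the sketch omits the parent row that the `yaml` package stores for lists.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def to_scalar(value: Any) -> Any:
    """Convert a Python scalar to the value shown in the table (illustrative)."""
    if isinstance(value, bool):
        return int(value)  # booleans appear as 1 or 0
    if isinstance(value, Path):
        return value.as_posix()  # paths appear as POSIX-style strings
    if value is None:
        return ""  # null has an empty value
    return value


def flatten(data: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested task data into underscore-joined keys."""
    flat: dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, (list, tuple)):
            # List items are numbered from 1: items_1, items_2, ...
            for index, item in enumerate(value, start=1):
                flat[f"{name}_{index}"] = to_scalar(item)
        else:
            flat[name] = to_scalar(value)
    return flat


task_kwargs = {
    "path": Path("build") / "out.dta",
    "enabled": True,
    "items": ["a", "b"],
    "config": {"child": 3.14},
}
flat = flatten(task_kwargs)
```

Here `flat` contains the keys `path`, `enabled`, `items_1`, `items_2`, and `config_child`, which correspond to the flattened names used in the access patterns above.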

## Recommended Limits

Keep the YAML bridge to configuration-like data: scalar values, paths, flat scalar
lists, and nested dictionaries with scalar leaves.

Avoid empty strings, lists of dictionaries, sets, bytes, decimals, and arbitrary Python
objects. PyYAML may emit YAML tags such as `!!set` or `!!binary`, or fail with a
`RepresenterError`; those forms are not useful as a stable Stata interface.
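
One way to enforce these limits before a task runs is a small check on the task's keyword arguments. The sketch below is a hypothetical helper, not part of pytask-stata. The names `is_scalar` and `find_unsupported` are assumptions, and it interprets the limits strictly: lists are accepted only at the top level, and nested dictionaries may hold only scalar leaves or further dictionaries.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Types treated as safe scalars for the YAML bridge.
SCALARS = (str, int, float, bool, type(None), Path)


def is_scalar(value: Any) -> bool:
    # Empty strings are on the "avoid" list, so they do not count as safe.
    return isinstance(value, SCALARS) and value != ""


def find_unsupported(data: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted keys whose values fall outside the recommended limits."""
    problems: list[str] = []
    for key, value in data.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if is_scalar(value):
            continue
        if isinstance(value, dict):
            problems.extend(find_unsupported(value, name))
        elif isinstance(value, (list, tuple)) and not prefix:
            # Top-level lists are fine as long as every item is a scalar.
            if not all(is_scalar(item) for item in value):
                problems.append(name)
        else:
            # Sets, bytes, decimals, empty strings, nested lists, other objects.
            problems.append(name)
    return problems
```

Calling `find_unsupported(kwargs)` and raising an error when the result is non-empty surfaces problematic values in Python, before they reach the Stata side.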
2 changes: 1 addition & 1 deletion justfile
@@ -12,7 +12,7 @@ test-ci:

# Run type checking
typing:
    uv run --group typing --group test --isolated ty check
    uv run --group typing --group test --group test-mock-stata --isolated ty check

# Run linting and formatting
lint: