12 changes: 4 additions & 8 deletions .github/workflows/tests.yaml
@@ -25,17 +25,13 @@ jobs:
     steps:
       - name: Checkout
         uses: actions/checkout@v3
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
         with:
           python-version: ${{ matrix.python-version }}
       - name: Install packages
-        run: |
-          python -m pip install --upgrade pip
-          pip install pytest
-          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
-          pip install .
+        run: uv sync --extra dev
       - name: pre-commit
         uses: pre-commit/action@v3.0.0
       - name: Test with pytest
-        run: pytest
+        run: uv run pytest
2 changes: 0 additions & 2 deletions MANIFEST.in

This file was deleted.

51 changes: 39 additions & 12 deletions README.md
@@ -26,40 +26,52 @@ Documentation: [dataherb.github.io/dataherb-python](https://dataherb.github.io/d

> Requires Python 3

The DataHerb cli provides tools to create dataset metadata, validate metadata, search dataset in flora, and download dataset.
The DataHerb CLI provides tools to create dataset metadata, validate metadata, search datasets in a flora, and download datasets.

### Command status

| Command | Status | Description |
|---------|--------|-------------|
| `version` | ✅ Stable | Print the installed version |
| `configure` | ✅ Stable | Set up or inspect the local configuration |
| `search` | ✅ Stable | Search datasets by keyword or id |
| `download` | ✅ Stable | Download a dataset from a flora by id |
| `create` | ✅ Stable | Create metadata for a local dataset folder |
| `add` | ✅ Stable | Add a remote dataset to the flora |
| `remove` | ✅ Stable | Remove a dataset from the flora |
| `upload` | ⚠️ Experimental | Upload the current folder to a remote (git sync is manual) |
| `validate` | ✅ Stable | Validate the `dataherb.json` in the current folder |
| `serve` | ✅ Stable | Launch a local MkDocs site for the flora |

### Search and Download

Search by keyword

```
dataherb search covid19
# Shows the minimal metadata
# Shows a rich table summary
```

Search by dataherb id

```
dataherb search -i covid19_eu_data
# Shows the full metadata
dataherb search --id covid19_eu_data
# Shows the dataset details
```

Download dataset by dataherb id

```
dataherb download covid19_eu_data
# Downloads this dataset: http://dataherb.io/flora/covid19_eu_data
# Downloads this dataset
```


### Create Dataset Using Command Line Tool

We provide a template for dataset creation.

Within a dataset folder where the data files are located, use the following command line tool to create the metadata template.

```bash
dataherb create
dataherb create .
```

### Upload dataset to remote
@@ -70,8 +82,16 @@ Within the dataset folder, run
dataherb upload
```

### UI for all the datasets in a flora
> **Note:** Git-based uploads (`source: git`) currently guide you to push manually.
> Pass `--experimental` to attempt an automatic git push.

### Validate dataset metadata

```bash
dataherb validate
```

### UI for all the datasets in a flora

```bash
dataherb serve
@@ -132,8 +152,15 @@ We desigined the following workflow to share and index open datasets.

## Development

1. Create a conda environment.
2. Install requirements: `pip install -r requirements.txt`
1. Install [uv](https://docs.astral.sh/uv/getting-started/installation/).
2. Install the project and dev dependencies:
```bash
uv sync --extra dev
```
3. Run tests:
```bash
uv run pytest
```

## Documentation

118 changes: 118 additions & 0 deletions dataherb/cmd/configure.py
@@ -0,0 +1,118 @@
"""
CLI command for configuring dataherb.
"""
import json
import sys
from pathlib import Path

import click
import inquirer
from loguru import logger

from dataherb.utils.configs import Config

logger.remove()
logger.add(sys.stderr, level="INFO", enqueue=True)


@click.command()
@click.option(
    "--show/--no-show", "-s/ ", default=False, help="Show the current configuration"
)
@click.option(
    "--locate/--no-locate",
    "-l/ ",
    default=False,
    help="Locate the folder that contains the configuration",
)
def configure(show, locate):
    """
    Configure dataherb, or inspect or locate the current configuration.

    :param show: if the flag is given, show the current configuration instead of starting
        the configuration process.
    :param locate: if the flag is given, locate the configuration folder
        and open it in the file manager.
    """

    home = Path.home()
    config_path = home / ".dataherb" / "config.json"

    if locate:
        click.launch(str(config_path.parent))  # click.launch expects a str, not a Path
    elif not show:
        if config_path.exists():
            is_overwrite = click.confirm(
                click.style(
                    f"Config file ({config_path}) already exists. Overwrite?", fg="red"
                ),
                default=False,
            )
            if is_overwrite:
                click.echo(click.style("Overwriting config file...", fg="red"))
            else:
                click.echo("Skipping...")
                raise SystemExit(0)

        if not config_path.parent.exists():
            config_path.parent.mkdir(parents=True)

        ###############
        # Ask questions
        ###############
        questions = [
            inquirer.Path(
                "workdir",
                message="Where should I put all the datasets and flora database? An empty folder is recommended.",
                # path_type=inquirer.Path.DIRECTORY,
                normalize_to_absolute_path=True,
            ),
            inquirer.Text(
                "default_flora",
                message="How would you name the default flora? Please keep the default value if this is not clear to you.",
                default="flora",
            ),
        ]

        answers = inquirer.prompt(questions)

        config = {
            "workdir": answers.get("workdir"),
            "default": {
                "flora": answers.get("default_flora"),
                "aggregrated": False,  # if false, we will use folders for each herb.
            },
        }

        flora_path_workdir = answers.get("workdir", "")
        if flora_path_workdir.startswith("~"):
            home = Path.home()
            flora_path_workdir = str(home / flora_path_workdir[2:])

        flora_path = (
            Path(flora_path_workdir) / "flora" / f"{answers.get('default_flora')}"
        )
        if not flora_path.exists():
            click.secho(
                f"{flora_path} doesn't exist. Creating {flora_path}...", fg="red"
            )
            flora_path.mkdir(parents=True)
        else:
            click.secho(f"{flora_path} exists, using the folder directly.", fg="green")

        logger.debug(f"config: {config}")

        with open(config_path, "w") as f:
            json.dump(config, f, indent=4)

        click.secho(f"The dataherb config has been saved to {config_path}!", fg="green")
    else:
        if not config_path.exists():
            click.secho(f"Config file ({config_path}) doesn't exist.", fg="red")
        else:
            c = Config()
            click.secho("The current config for dataherb is:")
            click.secho(
                json.dumps(c.config, indent=2, sort_keys=True, ensure_ascii=False)
            )
            click.secho(f"The above config is extracted from {config_path}")
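
Review note: to check the config layout without going through the interactive prompts, the non-interactive part of `configure` can be sketched on its own. This is only a sketch under stated assumptions: the helper names `build_config`, `flora_path_for`, and `write_config` are hypothetical and not part of dataherb, `Path.expanduser()` stands in for the manual `~` handling in the diff, and the config location is passed in as an argument instead of being fixed at `~/.dataherb/config.json`.

```python
import json
import tempfile
from pathlib import Path


def build_config(workdir: str, default_flora: str = "flora") -> dict:
    """Return a config dict with the same keys the diff writes to config.json."""
    return {
        "workdir": workdir,
        "default": {
            "flora": default_flora,
            # Key spelling copied from the diff; False means one folder per herb.
            "aggregrated": False,
        },
    }


def flora_path_for(config: dict) -> Path:
    """Derive <workdir>/flora/<default flora>, expanding a leading "~"."""
    workdir = Path(config["workdir"]).expanduser()
    return workdir / "flora" / config["default"]["flora"]


def write_config(config: dict, config_path: Path) -> None:
    """Create the config folder and the flora folder, then save the config as JSON."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    flora_path_for(config).mkdir(parents=True, exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)


if __name__ == "__main__":
    # Use a throwaway directory so the real ~/.dataherb is never touched.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = build_config(str(Path(tmp) / "work"))
        write_config(cfg, Path(tmp) / ".dataherb" / "config.json")
        print(flora_path_for(cfg))
```

Keeping the dictionary construction separate from the prompts would also make this path easy to cover with `uv run pytest`, since no terminal input is needed.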