Commit 74f8ade

Simplify the cache module and interface (#20)
- Supports tags
- Fixed issues with sqlalchemy and session handling
- Supports various compression methods in the hashlib library
- Cleanup expired resources in the cache
- Updated documentation & tests
1 parent 24b7702 commit 74f8ade

15 files changed

Lines changed: 963 additions & 489 deletions

CHANGELOG.md

Lines changed: 21 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,26 @@
11
# Changelog
22

3-
## Version 0.4 (development)
3+
## Version 0.5.0
4+
5+
- SQLAlchemy session management
6+
* Implemented proper session handling
7+
* Fixed `DetachedInstanceError` issues and added helper method `_get_detached_resource` for consistent session management
8+
* Improved transaction handling with commits and rollbacks
9+
10+
- New features
11+
* Added cache statistics with `get_stats()` method
12+
* Implemented resource tagging
13+
* Added cache size management
14+
* Added support for file compression
15+
* Added resource validation with checksums
16+
* Improved search
17+
* Added metadata export/import functionality
18+
19+
## Version 0.4.1
20+
21+
- Method to list all resources.
22+
23+
## Version 0.4
424

525
- Migrate the schema to match R/Bioconductor's BiocFileCache (check out [this issue](https://github.com/BiocPy/pyBiocFileCache/issues/11)). Thanks to [@khoroshevskyi](https://github.com/khoroshevskyi) for the PR.
626
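The changelog entry on resource validation with checksums does not show how such a check works. The following is a minimal sketch of the general technique, assuming SHA-256 from the standard `hashlib` module. The helper names `file_checksum` and `is_valid` are hypothetical and are not taken from pyBiocFileCache.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """Compare the file's current digest with the digest recorded earlier."""
    return file_checksum(path, algorithm) == expected
```

A cache could record the digest when a file is added and call a check like `is_valid` before handing the file back, so that silent corruption or manual edits are detected.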

README.md

Lines changed: 62 additions & 45 deletions
@@ -4,74 +4,91 @@
44

55
# pyBiocFileCache
66

7-
File system based cache for resources & metadata. Compatible with [BiocFileCache R package](https://github.com/Bioconductor/BiocFileCache)
7+
`pyBiocFileCache` is a Python package that provides a file caching system with resource validation, cache size management, file compression, and resource tagging. It is compatible with the [BiocFileCache R package](https://github.com/Bioconductor/BiocFileCache).
88

9-
***Note: Package is in development. Use with caution!!***
9+
## Installation
1010

11-
### Installation
11+
Install from [PyPI](https://pypi.org/project/pyBiocFileCache/):
1212

13-
Package is published to [PyPI](https://pypi.org/project/pyBiocFileCache/)
14-
15-
```
13+
```bash
1614
pip install pybiocfilecache
1715
```
1816

19-
#### Initialize a cache directory
17+
## Quick Start
2018

21-
```
22-
from pybiocfilecache import BiocFileCache
23-
import os
24-
25-
bfc = BiocFileCache(cache_dir = os.getcwd() + "/cache")
26-
```
19+
```python
20+
from pybiocfilecache import BiocFileCache
2721

28-
Once the cache directory is created, the library provides methods to
29-
- `add`: Add a resource or artifact to cache
30-
- `get`: Get the resource from cache
31-
- `remove`: Remove a resource from cache
32-
- `update`: update the resource in cache
33-
- `purge`: purge the entire cache, removes all files in the cache directory
22+
# Initialize cache
23+
cache = BiocFileCache("path/to/cache/directory")
3424

35-
### Add a resource to cache
25+
# Add a file to cache
26+
resource = cache.add("myfile", "path/to/file.txt")
3627

37-
(for testing use the temp files in the `tests/data` directory)
28+
# Retrieve a file from cache
29+
resource = cache.get("myfile")
3830

39-
```
40-
rec = bfc.add("test1", os.getcwd() + "/test1.txt")
41-
print(rec)
31+
# Use the cached file
32+
print(resource.rpath) # Path to cached file
4233
```
4334

44-
### Get resource from cache
35+
## Advanced Usage
4536

46-
```
47-
rec = bfc.get("test1")
48-
print(rec)
49-
```
37+
### Configuration
5038

51-
### Remove resource from cache
39+
```python
40+
from pybiocfilecache import BiocFileCache, CacheConfig
41+
from datetime import timedelta
42+
from pathlib import Path
5243

53-
```
54-
rec = bfc.remove("test1")
55-
print(rec)
44+
# Create custom configuration
45+
config = CacheConfig(
46+
    cache_dir=Path("cache_directory"),
47+
    max_size_bytes=1024 * 1024 * 1024,  # 1 GiB
48+
    cleanup_interval=timedelta(days=7),
49+
    compression=True
50+
)
51+
52+
# Initialize cache with configuration
53+
cache = BiocFileCache(config=config)
5654
```
5755

58-
### Update resource in cache
56+
### Resource Management
5957

60-
```
61-
rec = bfc.get("test1"m os.getcwd() + "test2.txt")
62-
print(rec)
63-
```
58+
```python
59+
# Add file with tags and expiration
60+
from datetime import datetime, timedelta
6461

65-
### purge the cache
62+
resource = cache.add(
63+
    "myfile",
64+
    "path/to/file.txt",
65+
    tags=["data", "raw"],
66+
    expires=datetime.now() + timedelta(days=30)
67+
)
6668

67-
```
68-
bfc.purge()
69+
# List resources by tag
70+
resources = cache.list_resources(tag="data")
71+
72+
# Search resources
73+
results = cache.search("myfile", field="rname")
74+
75+
# Update resource
76+
cache.update("myfile", "path/to/new_file.txt")
77+
78+
# Remove resource
79+
cache.remove("myfile")
6980
```
7081

82+
### Cache Statistics and Maintenance
7183

72-
<!-- pyscaffold-notes -->
84+
```python
85+
# Get cache statistics
86+
stats = cache.get_stats()
87+
print(stats)
7388

74-
## Note
89+
# Clean up expired resources
90+
removed_count = cache.cleanup()
7591

76-
This project has been set up using PyScaffold 4.1. For details and usage
77-
information on PyScaffold see https://pyscaffold.org/.
92+
# Purge entire cache
93+
cache.purge()
94+
```
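The README shows `list_resources(tag=...)` and `cleanup()` but not the logic behind them. Here is a minimal in-memory sketch of tag filtering and expiry cleanup, under the assumption that each resource carries a list of tags and an optional expiry time. The `Entry` class and both functions are hypothetical illustrations, not the package's implementation, which stores records through SQLAlchemy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a cached resource record."""
    rname: str
    tags: List[str] = field(default_factory=list)
    expires: Optional[datetime] = None  # None means the entry never expires


def list_by_tag(entries: Dict[str, Entry], tag: str) -> List[Entry]:
    """Return every entry that carries the given tag."""
    return [entry for entry in entries.values() if tag in entry.tags]


def cleanup(entries: Dict[str, Entry], now: Optional[datetime] = None) -> int:
    """Delete expired entries in place and return how many were removed."""
    now = now or datetime.now()
    expired = [
        name
        for name, entry in entries.items()
        if entry.expires is not None and entry.expires < now
    ]
    for name in expired:
        del entries[name]
    return len(expired)
```

Collecting the expired names first and deleting afterwards avoids mutating the dictionary while iterating over it. Returning the count mirrors the `removed_count = cache.cleanup()` usage shown in the README.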

docs/best_practices.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
# Best Practices
2+
3+
1. Use context managers for cleanup:
4+
```python
5+
with BiocFileCache("cache_directory") as cache:
6+
    cache.add("myfile", "path/to/file.txt")
7+
```
8+
9+
2. Add tags for better organization:
10+
```python
11+
cache.add("data.csv", "data.csv", tags=["raw", "csv", "2024"])
12+
```
13+
14+
3. Set expiration dates for temporary files:
15+
```python
16+
cache.add("temp.txt", "temp.txt", expires=datetime.now() + timedelta(hours=1))
17+
```
18+
19+
4. Regular maintenance:
20+
```python
21+
# Periodically clean up expired resources
22+
cache.cleanup()
23+
24+
# Monitor cache size
25+
stats = cache.get_stats()
26+
if stats["cache_size_bytes"] > threshold:  # threshold: a byte limit you define
27+
    pass  # Take action
28+
```
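Item 4 leaves both the threshold and the size measurement unspecified. As a sketch that does not depend on the cache API, the on-disk size of a cache directory can be measured with the standard library alone. The function names and the idea of checking the directory directly are assumptions for illustration, not part of pyBiocFileCache.

```python
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files below root, recursively."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def over_threshold(root: Path, threshold_bytes: int) -> bool:
    """Report whether the directory currently exceeds the given byte limit."""
    return directory_size_bytes(root) > threshold_bytes
```

A scheduled job could call `over_threshold(Path("cache_directory"), 1024 ** 3)` and, when it returns `True`, trigger the cleanup shown above.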

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ package_dir =
4747
# For more information, check out https://semver.org/.
4848
install_requires =
4949
importlib-metadata; python_version<"3.8"
50-
sqlalchemy>=2,<2.1
50+
sqlalchemy
5151

5252
[options.packages.find]
5353
where = src
