
Commit dcacf81: Update readme

1 parent 40bb961

1 file changed: README.md (9 additions, 6 deletions)
@@ -6,7 +6,7 @@ It allows you to index your content into the usual Nextcloud database.
 
 ## Compatibility
 
-The extension requires your Nextcloud database to be MySQL (tested) or PostgreSQL (currently untested). SQLite might work as well, but isn't yet implemented.
+The extension requires your Nextcloud database to be MySQL or PostgreSQL.
 
 ## Status
 
@@ -17,21 +17,24 @@ What works:
 * Indexing of text in PDF documents
 * This is done by extracting the text via [Smalot/PdfParser].
 * This app itself does *NOT* do optical character recognition (OCR)! If your files don't already contain the extracted text, maybe the [files_fulltextsearch_tesseract] app is for you. I haven't tested it together with this app.
-* MySQL
+* MySQL (tested in CI pipeline and in real world usage)
+* PostgreSQL (tested in CI pipeline)
+* Plainly assumes "english" configuration (which influences stopwords and normalization)
 * Basic searching
-* If the database is MySQL, it uses [Boolean Full-Text Searches], so you can use operators like `+` and `-`, as well as a trailing `*` wildcard
+* If the database is MySQL, it uses [Boolean Full-Text Searches], so you can use operators like `+` and `-`, as well as a trailing `*` wildcard
+* If the database is PostgreSQL, the query is converted using [`websearch_to_tsquery`], so you can use `-` for exclusions and quote text to enforce word groups
 * Passing the `occ fulltextsearch:test` harness
 
 [Smalot/PdfParser]: https://github.com/Smalot/PdfParser
 [files_fulltextsearch_tesseract]: https://github.com/nextcloud/files_fulltextsearch_tesseract
+[Boolean Full-Text Searches]: https://dev.mysql.com/doc/refman/8.4/en/fulltext-boolean.html
+[`websearch_to_tsquery`]: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
 
 What does *NOT* work:
 * Indexing of Office documents: The upstream [fulltextsearch_elasticsearch] app simply passes the files on to the [Elasticsearch Attachment processor], which in turn uses [Apache Tika] for processing. Since I want to keep this app lean, I don't want to pull in any Java dependencies.
-* "Advanced" features of the full text search framework. There are fields for tags, metatags, subtags, parts, excerpts and whatnot. I have no idea yet what they are used for. The app just stores them on indexing and returns them in search results, but doesn't search those fields.
-* PostgreSQL: Could work, but I haven't tested it. Might need small fixes, and plainly assumes "english" configuration (which influences stopwords and normalization).
+* "Advanced" features of the full text search framework. There are fields for tags, metatags, subtags, parts and whatnot. I have no idea yet what they are used for. The app just stores them on indexing and returns them in search results, but doesn't search those fields.
 * SQLite: Might be implementable, but I haven't spent more time than a quick search for "fulltext search sqlite"
 
 [fulltextsearch_elasticsearch]: https://github.com/nextcloud/fulltextsearch_elasticsearch
-[Boolean Full-Text Searches]: https://dev.mysql.com/doc/refman/8.4/en/fulltext-boolean.html
 [Elasticsearch Attachment processor]: https://www.elastic.co/docs/reference/enrich-processor/attachment
 [Apache Tika]: https://tika.apache.org/
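The updated README describes two backend-specific search paths: MySQL Boolean full-text search and PostgreSQL's `websearch_to_tsquery` with the hard-coded "english" configuration. The following minimal Python sketch shows what the corresponding SQL could look like. It is not taken from the app. The table `fulltext_index`, its columns `id` and `content`, and the function `build_search_sql` are all hypothetical, and the MySQL query assumes a FULLTEXT index on `content`.

```python
"""Sketch: backend-specific full-text search SQL.

Hypothetical schema, not the app's real one: a table `fulltext_index`
with an `id` column and a text column `content`.
"""

from typing import Tuple

# Hypothetical table/column names, for illustration only.
TABLE = "fulltext_index"
COLUMN = "content"


def build_search_sql(backend: str, user_query: str) -> Tuple[str, Tuple[str, ...]]:
    """Return (sql, params) for a full-text search on the given backend.

    The user's query is never interpolated into the SQL string. It is
    returned as a bound parameter using %s placeholders (the style used
    by e.g. psycopg2 and PyMySQL).
    """
    if backend == "mysql":
        # Boolean mode lets users type +required, -excluded and prefix* terms.
        sql = (
            f"SELECT id FROM {TABLE} "
            f"WHERE MATCH({COLUMN}) AGAINST (%s IN BOOLEAN MODE)"
        )
        return sql, (user_query,)
    if backend == "postgresql":
        # websearch_to_tsquery understands -exclusions and "quoted phrases";
        # the 'english' configuration controls stopwords and stemming.
        sql = (
            f"SELECT id FROM {TABLE} "
            f"WHERE to_tsvector('english', {COLUMN}) "
            f"@@ websearch_to_tsquery('english', %s)"
        )
        return sql, (user_query,)
    # SQLite and other backends are not covered by this sketch.
    raise ValueError(f"unsupported backend: {backend!r}")


if __name__ == "__main__":
    for backend, query in [("mysql", "+nextcloud -office pdf*"),
                           ("postgresql", '"full text" -office')]:
        sql, params = build_search_sql(backend, query)
        print(sql)
        print(params)
```

Because the user's text stays a bound parameter, quoting is left to the database driver, and the operator syntax (`+`, `-` and trailing `*` for MySQL; `-` and quoted phrases for PostgreSQL) is interpreted by the database's own full-text parser rather than by application code.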
