A Java-based toolkit for correcting interlinear glossed text and generating CoNLL-U output, with both:
- a CLI pipeline for document processing,
- and a desktop application for interactive rule editing, annotation settings, workspace management, and preview.
This project was originally made for the Chuj project at Université de Montréal and is designed for low-resource language workflows based on glossed data and CoNLL-U export.
The repository is organized as a multi-module Maven project:
- core: correction, alignment, parsing, annotation, CLI, and CoNLL-U generation
- backend: Spring Boot services, persistence, rule/settings management
- app: JavaFX desktop wrapper embedding the frontend and backend
- frontend: Vue/Vite frontend bundled into the desktop application
- correction of glossed interlinear entries using YAML rules
- annotation to CoNLL-U
- YAML-driven annotation configuration
- lexicon- and extractor-based annotation rules
- desktop UI for:
  - workspace entry management
  - correction preview
  - CoNLL-U preview
  - rule editing
  - annotation settings editing
- Java 21
- Maven 3.9+
- For desktop packaging on Windows:
  - a JDK including jpackage
- For frontend-only development:
  - Node.js and npm
.
├── app/ # JavaFX desktop application
├── backend/ # Spring Boot backend
├── core/ # CLI + core NLP pipeline
├── docs/ # Documentation for the project
├── frontend/ # Vue/Vite frontend
└── scripts/ # packaging helpers
The core module provides a command-line interface.
mvn -pl core -am clean package
This produces the following file:
core/target/nlp-studio-core-0.1.0-all.jar
- Prepare CoNLL-U from an input document
java -cp core/target/nlp-studio-core-0.1.0-all.jar org.titiplex.Main prepare input.docx correction.yaml annotation.yaml output.conllu
This command:
- reads a .docx or .txt input file,
- applies correction rules from correction.yaml,
- applies annotation settings from annotation.yaml,
- writes the resulting output.conllu.
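For readers unfamiliar with the output format, the sketch below shows what a single token line in a CoNLL-U file looks like. It follows the public CoNLL-U convention of ten tab-separated columns with `_` for unspecified values. The class `ConlluLineSketch`, its method names, and the sample values (including the `Gloss=` key in MISC) are assumptions made for illustration. They are not taken from this project's code, and the sketch leaves the syntactic columns empty without implying that the pipeline does the same.

```java
import java.util.StringJoiner;

/** Hypothetical helper, not part of this project: formats one CoNLL-U token line. */
public final class ConlluLineSketch {

    private ConlluLineSketch() {}

    /** Returns the value, or "_" (the CoNLL-U placeholder) when it is null or blank. */
    public static String orUnderscore(String value) {
        return (value == null || value.isBlank()) ? "_" : value;
    }

    /**
     * Builds the ten tab-separated CoNLL-U columns:
     * ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
     * XPOS, HEAD, DEPREL and DEPS are left as "_" in this sketch.
     */
    public static String tokenLine(int id, String form, String lemma,
                                   String upos, String feats, String misc) {
        StringJoiner columns = new StringJoiner("\t");
        columns.add(Integer.toString(id));   // ID
        columns.add(orUnderscore(form));     // FORM
        columns.add(orUnderscore(lemma));    // LEMMA
        columns.add(orUnderscore(upos));     // UPOS
        columns.add("_");                    // XPOS
        columns.add(orUnderscore(feats));    // FEATS
        columns.add("_");                    // HEAD
        columns.add("_");                    // DEPREL
        columns.add("_");                    // DEPS
        columns.add(orUnderscore(misc));     // MISC
        return columns.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only; real values come from the glossed input.
        System.out.println("# text = word");
        System.out.println(tokenLine(1, "word", "word", "NOUN", null, "Gloss=thing"));
        System.out.println(); // a blank line terminates each sentence
    }
}
```

Running `main` prints a sentence-level comment, one token line with ten columns, and the blank line that closes the sentence.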
- Generate a corrected DOCX
java -cp core/target/nlp-studio-core-0.1.0-all.jar org.titiplex.Main correct-docx input.docx correction.yaml corrected.docx
- Generate corpus statistics
java -cp core/target/nlp-studio-core-0.1.0-all.jar org.titiplex.Main stats input.docx correction.yaml stats.txt
The legacy 4-argument mode is still supported:
java -cp core/target/nlp-studio-core-0.1.0-all.jar org.titiplex.Main input.docx correction.yaml annotation.yaml output.conllu
The desktop application is a JavaFX container that starts an embedded Spring Boot backend and loads the bundled Vue frontend.
From the repository root:
mvn -pl core,backend,app -am clean install -DskipTests
mvn -f app/pom.xml javafx:run
If you want to work on the frontend separately:
cd frontend
npm install
npm run dev
Useful frontend commands:
npm run build
npm run test
npm run typecheck
Verify that the project builds correctly:
mvn clean verify
Build the desktop application JAR:
mvn -pl app -am -Pdesktop-prod clean package
Generated file:
app/target/nlp-studio-app-0.1.0-all.jar
This project can build and publish installers for each target OS. For details, see packaging/README.md.
The app build automatically:
- installs Node.js and npm through Maven,
- runs npm ci,
- runs the frontend build,
- copies the built frontend into the desktop application resources.
So in most cases, you do not need to build the frontend manually before packaging the desktop application.
The pipeline relies on YAML-based resources such as:
- correction rules
- annotation definitions
- POS and feature definitions
- lexicons
- extractors
- gloss mapping
This makes the system extensible and suitable for iterative linguistic work without hardcoding every rule in Java.
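The idea of keeping rules in data rather than in Java code can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the class `RuleTableSketch`, the literal find/replace rule shape, and the sample strings are all hypothetical, and the project's actual YAML schema is not shown here. In a real setup the rule table would be loaded from a YAML file instead of being built inline.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: rules are held as data and applied by generic code. */
public final class RuleTableSketch {

    private RuleTableSketch() {}

    /** Applies each literal find/replace rule to the text, in insertion order. */
    public static String applyRules(Map<String, String> rules, String text) {
        String result = text;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = result.replace(rule.getKey(), rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: these pairs stand in for rules read from a YAML resource.
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("1sg", "1SG"); // normalise a gloss label to upper case
        rules.put("--", "-");    // collapse a doubled morpheme separator

        System.out.println(applyRules(rules, "1sg--go")); // prints "1SG-go"
    }
}
```

Adding or changing a rule in this design means editing the table, not the `applyRules` method, which is the property that makes iterative linguistic work practical.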
Run all tests:
mvn test
# or
mvn clean verify "-Dskip.frontend=true"
# then
cd frontend
npm ci
npm run typecheck
npm run playwright:install
npm run test
npm run test:e2e
npm run test:coverage
npm run build
Run the full project build:
mvn clean package
This branch focuses on an integrated NLP studio workflow rather than only a standalone converter:
- CLI processing remains available
- desktop editing and preview are first-class
- backend-managed rules and annotation settings are part of the current architecture
The documentation is built using MkDocs and is available at this repository's GitHub Pages.
To preview the doc and edit it live:
mkdocs serve --livereload
To build the doc (generates files in site/):
mkdocs build
To publish the doc to GitHub Pages from your repository, in the gh-pages branch:
mkdocs gh-deploy
This project is released under the GPL-v3 license; see LICENSE.