Query Reformulation Methods

querygym provides several LLM-based query reformulation methods, each created through qg.create_reformulator.

Available Methods

GenQR (Generic Query Reformulation)

Simple keyword expansion using an LLM.

import querygym as qg

reformulator = qg.create_reformulator("genqr", model="gpt-4")
result = reformulator.reformulate(qg.QueryItem("q1", "neural networks"))

GenQR Ensemble

Ensemble of multiple keyword expansion prompts for better coverage.

reformulator = qg.create_reformulator(
    "genqr_ensemble",
    model="gpt-4",
    params={"repeat_query_weight": 3}
)

Parameters:

  • repeat_query_weight (int): Number of times to repeat original query (default: 3)
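To make the effect of repeat_query_weight concrete, here is a minimal sketch, assuming the expanded query is formed by repeating the original query and appending the generated keywords. The helper name `build_expanded_query` and the concatenation scheme are illustrative assumptions, not querygym's actual implementation.

```python
def build_expanded_query(
    query: str, keywords: list[str], repeat_query_weight: int = 3
) -> str:
    """Repeat the original query, then append expansion keywords.

    Hypothetical helper: it illustrates the weighting idea only.
    """
    if repeat_query_weight < 0:
        raise ValueError("repeat_query_weight must be non-negative")
    # Repeating the query raises the weight of its terms relative to the
    # expansion terms in bag-of-words retrievers such as BM25.
    parts = [query] * repeat_query_weight + keywords
    return " ".join(parts)


expanded = build_expanded_query(
    "neural networks", ["deep learning", "backpropagation"]
)
print(expanded)
```

A larger weight keeps the retriever anchored to the user's wording, while a smaller one lets the generated keywords dominate.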

Query2Doc

Generates pseudo-documents relevant to the query.

reformulator = qg.create_reformulator("query2doc", model="gpt-4")

Supports both zero-shot and chain-of-thought variants.

QA Expand

Decomposes the query into sub-questions, generates answers to them, and refines the answers.

reformulator = qg.create_reformulator("qa_expand", model="gpt-4")
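The decompose, answer, refine flow described above can be sketched without any library code. This is a toy version under stated assumptions: the prompts, the function `qa_expand_sketch`, and the stand-in `fake_llm` are all hypothetical and do not reflect querygym's prompts or internals.

```python
from typing import Callable


def qa_expand_sketch(
    query: str, llm: Callable[[str], str], num_questions: int = 3
) -> str:
    """Decompose, answer, refine: a toy QA-based expansion."""
    # Step 1: ask the model for sub-questions, one per line.
    raw = llm(f"List {num_questions} sub-questions for: {query}")
    sub_questions = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    sub_questions = sub_questions[:num_questions]

    # Step 2: answer each sub-question independently.
    answers = [llm(f"Answer briefly: {q}") for q in sub_questions]

    # Step 3: ask the model to refine the answers with the query in view.
    refined = llm(f"Query: {query}\nRefine these answers:\n" + "\n".join(answers))

    # Expanded query: the original query followed by the refined text.
    return f"{query} {refined}".strip()


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model call (illustration only)."""
    if prompt.startswith("List"):
        return "What is a neuron?\nHow is a network trained?"
    if prompt.startswith("Answer"):
        return "answer to: " + prompt.removeprefix("Answer briefly: ")
    return "neurons are units; training uses gradient descent"


print(qa_expand_sketch("neural networks", fake_llm))
```

Passing the model as a plain callable keeps the sketch testable; a real client would replace `fake_llm`.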

MuGI

Multi-granularity information expansion.

reformulator = qg.create_reformulator("mugi", model="gpt-4")

LameR

Context-based passage synthesis using retrieved documents.

# Load queries and the contexts from an initial retrieval run
queries = qg.load_queries("queries.tsv")
contexts = qg.load_contexts("contexts.jsonl")

# Create reformulator
reformulator = qg.create_reformulator("lamer", model="gpt-4")

# Reformulate with contexts
results = reformulator.reformulate_batch(queries, contexts=contexts)

Note: LameR requires contexts from an initial retrieval step.
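As a rough illustration of why the contexts are needed, the sketch below folds the top retrieved passages into the prompt the model sees before it writes a passage. The prompt wording, the function `build_context_prompt`, and the in-memory layout of `example_contexts` are assumptions made for this example, not the format querygym uses.

```python
def build_context_prompt(
    query: str, passages: list[str], max_passages: int = 3
) -> str:
    """Format a query and its top retrieved passages into one prompt."""
    lines = ["Passages retrieved for the query:"]
    # Number the passages so the model can tell them apart.
    for i, passage in enumerate(passages[:max_passages], start=1):
        lines.append(f"[{i}] {passage}")
    lines.append(f"Query: {query}")
    lines.append("Write a passage that answers the query.")
    return "\n".join(lines)


# Hypothetical layout: query id -> passages from a first-stage retriever.
example_contexts = {
    "q1": [
        "Neural networks are layered function approximators.",
        "Backpropagation computes gradients of the loss.",
    ]
}
print(build_context_prompt("neural networks", example_contexts["q1"]))
```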

Query2E

Query to entity expansion.

reformulator = qg.create_reformulator("query2e", model="gpt-4")

CSQE

Context-based sentence extraction from retrieved documents.

# Requires contexts from an initial retrieval run
queries = qg.load_queries("queries.tsv")
contexts = qg.load_contexts("contexts.jsonl")

reformulator = qg.create_reformulator("csqe", model="gpt-4")
results = reformulator.reformulate_batch(queries, contexts=contexts)

ThinkQE

Multi-round reasoning-based expansion with iterative corpus feedback.

# `searcher` must be created beforehand; its construction is not shown here
reformulator = qg.create_reformulator(
    "thinkqe",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    params={
        "searcher": searcher,
        "num_interaction": 3,
        "keep_passage_num": 5,
        "gen_num": 2,
        "accumulate": True,
        "use_passage_filter": True,
        "search_k": 1000,
    },
)
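The multi-round idea can be sketched as a loop that alternates retrieval and generation. The parameter names below mirror the example above, but what each one does here is an assumption for illustration; the toy searcher and generator are hypothetical stand-ins, and none of this is querygym's implementation.

```python
from typing import Callable

CORPUS = [
    "neural networks learn weights",
    "gradient descent updates weights",
    "convolution layers process images",
]


def toy_search(q: str, k: int) -> list[str]:
    """Rank passages by word overlap with the query (stand-in searcher)."""
    terms = set(q.lower().split())
    ranked = sorted(CORPUS, key=lambda p: -len(terms & set(p.split())))
    return ranked[:k]


def toy_generate(q: str, passages: list[str]) -> str:
    """Stand-in for an LLM: echo the first word of each passage."""
    return " ".join(p.split()[0] for p in passages)


def iterative_expansion(
    query: str,
    search: Callable[[str, int], list[str]],
    generate: Callable[[str, list[str]], str],
    num_interaction: int = 3,
    keep_passage_num: int = 5,
    gen_num: int = 2,
    accumulate: bool = True,
    use_passage_filter: bool = True,
    search_k: int = 1000,
) -> str:
    """Toy multi-round expansion loop with corpus feedback."""
    current_query = query
    seen: set[str] = set()
    expansions: list[str] = []
    for _ in range(num_interaction):
        # Retrieve with the current (possibly expanded) query.
        hits = search(current_query, search_k)
        if use_passage_filter:
            # Assumed behavior: skip passages shown in earlier rounds.
            hits = [p for p in hits if p not in seen]
        top = hits[:keep_passage_num]
        seen.update(top)
        # Generate gen_num expansions conditioned on the kept passages.
        round_expansions = [generate(query, top) for _ in range(gen_num)]
        if accumulate:
            expansions.extend(round_expansions)
        else:
            expansions = round_expansions
        # The next round searches with the query plus current expansions.
        current_query = " ".join([query] + expansions)
    return current_query


print(iterative_expansion("neural networks", toy_search, toy_generate,
                          num_interaction=2, keep_passage_num=1, gen_num=1))
```

In this sketch, filtering already-seen passages is what pushes later rounds toward new evidence instead of repeating the first round's results.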

Method Comparison

| Method | Requires Context | Type | Best For |
|---|---|---|---|
| genqr | No | Keyword expansion | General queries |
| genqr_ensemble | No | Keyword expansion | Robust expansion |
| query2doc | No | Pseudo-document | Dense retrieval |
| qa_expand | No | QA-based | Complex queries |
| mugi | No | Multi-granular | Diverse expansion |
| lamer | Yes | Context synthesis | Re-ranking |
| query2e | No | Entity expansion | Entity queries |
| csqe | Yes | Sentence extraction | Precision-focused |
| thinkqe | Yes | Iterative reasoning | Multi-round feedback |
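When choosing a method programmatically, the comparison above can be encoded as a small lookup. This is a sketch: the `METHODS` dictionary is transcribed from the comparison, and the helper `methods_for` is a hypothetical convenience, not part of querygym.

```python
# Transcribed from the method comparison: name -> (requires context, type).
METHODS = {
    "genqr": (False, "Keyword expansion"),
    "genqr_ensemble": (False, "Keyword expansion"),
    "query2doc": (False, "Pseudo-document"),
    "qa_expand": (False, "QA-based"),
    "mugi": (False, "Multi-granular"),
    "lamer": (True, "Context synthesis"),
    "query2e": (False, "Entity expansion"),
    "csqe": (True, "Sentence extraction"),
    "thinkqe": (True, "Iterative reasoning"),
}


def methods_for(has_contexts: bool) -> list[str]:
    """List methods usable with or without retrieval contexts."""
    return [
        name
        for name, (requires_context, _type) in METHODS.items()
        # Context-free methods always qualify; the rest need contexts.
        if has_contexts or not requires_context
    ]


print(methods_for(has_contexts=False))
```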

Custom Parameters

All methods accept custom parameters through params; the example below also passes LLM settings through llm_config:

reformulator = qg.create_reformulator(
    "genqr_ensemble",
    model="gpt-4",
    params={
        "repeat_query_weight": 5,
        "temperature": 0.7
    },
    llm_config={
        "temperature": 0.8,
        "max_tokens": 512
    }
)

Batch Processing

Process multiple queries efficiently:

queries = qg.load_queries("queries.tsv")
reformulator = qg.create_reformulator("genqr", model="gpt-4")

# Batch reformulation with progress bar
results = reformulator.reformulate_batch(queries)

Using Custom Prompts

See Prompt Bank for details on customizing prompts.

Next Steps