PhishVLM

Reference-based phishing detection without a predefined reference list — powered by Vision-Language Models.

An extension of our USENIX Security 2024 work "Less Defined Knowledge and More True Alarms: Reference-based Phishing Detection without a Pre-defined Reference List."

Read our Paper • Visit our Website • Download our Datasets • Cite our Paper

Introduction

Existing reference-based phishing detection:

❌ Relies on a predefined reference list, which lacks comprehensiveness and incurs a high maintenance cost.
❌ Does not fully exploit the textual semantics present on the webpage.

PhishVLM builds a reference-based phishing detector that is:

✅ Free of a predefined reference list — modern VLMs have encoded far more extensive brand–domain knowledge than any hand-curated list.
✅ Capable of chain-of-thought credential-taking prediction: whether a page takes credentials is reasoned step by step directly from its screenshot.

Framework

Input: a URL and its screenshot | Output: Phish / Benign and the phishing target brand.

Step 1 — Brand recognition. Input: cropped logo screenshot. Output: the VLM's predicted brand domain.
Step 2 — Credential-Requiring-Page (CRP) classification. Input: the webpage screenshot. The VLM chooses A. Credential-taking page or B. Non-credential-taking page. If A, go to Step 4; if B, go to Step 3.
Step 3 — CRP transition (only when Step 2 returns B). Input: screenshots of all clickable UI elements. The most likely login UI is clicked, and the pipeline returns to Step 1 with the updated webpage and URL (bounded by rank.depth_limit).
Step 4 — Decision. A page is flagged as phishing when all of the following hold:
1. the predicted brand's domain is inconsistent with the webpage's own domain; and
2. brand validation passes — by default the on-page logo is matched against Google Image search results for the predicted brand (brand_valid.activate: True); if validation is disabled, the predicted brand domain is instead required to be alive; and
3. the page is classified as a credential-taking page (Step 2 returns A).
If any of the three conditions fails, the page is reported as benign. Pages whose predicted brand is itself a web-hosting / cloud provider (listed in datasets/hosting_blacklists.txt) are also treated as benign.
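The control flow of Steps 1 to 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in scripts/pipeline/phishvlm.py: `Page`, `Verdict`, `decide` and `run_pipeline` are hypothetical names, the VLM calls are passed in as plain callables, and "domain inconsistency" is simplified to string inequality of registered domains.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Page:
    """Hypothetical stand-in for the page state (URL + registered domain)."""
    url: str
    domain: str


@dataclass
class Verdict:
    is_phish: bool
    target: Optional[str] = None


def decide(page_domain: str, brand_domain: Optional[str], brand_validated: bool,
           is_crp: bool, hosting_blacklist: Set[str]) -> Verdict:
    """Step 4 (sketch): phishing only if every condition holds."""
    if brand_domain is None or brand_domain in hosting_blacklist:
        return Verdict(False)  # no brand, or the brand is a hosting/cloud provider
    inconsistent = brand_domain != page_domain  # condition 1 (simplified equality)
    if inconsistent and brand_validated and is_crp:  # conditions 1, 2 and 3
        return Verdict(True, brand_domain)
    return Verdict(False)


def run_pipeline(page: Page,
                 recognize_brand: Callable[[Page], Optional[str]],
                 validate_brand: Callable[[Page, str], bool],
                 classify_crp: Callable[[Page], bool],
                 transition: Callable[[Page], Optional[Page]],
                 hosting_blacklist: Set[str],
                 depth_limit: int) -> Verdict:
    """Steps 1-3 in a loop, with at most `depth_limit` CRP transitions."""
    transitions = 0
    while True:
        brand = recognize_brand(page)                                  # Step 1
        validated = brand is not None and validate_brand(page, brand)
        if classify_crp(page):                                         # Step 2 returns A
            return decide(page.domain, brand, validated, True, hosting_blacklist)
        if transitions >= depth_limit:                                 # give up: benign
            return Verdict(False)
        next_page = transition(page)                                   # Step 3
        if next_page is None:                                          # no login UI to click
            return Verdict(False)
        page = next_page
        transitions += 1
```

Passing the VLM steps in as callables keeps the decision rule testable without network access; in the real pipeline each callable would wrap a prompt from prompts/ and a Selenium action.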

Repository Structure

PhishVLM/
├── param_dict.yaml                 # Pipeline hyper-parameters
├── requirements.txt
├── prompts/                        # VLM prompts (system + few-shot examples)
│   ├── brand_recog_prompt.json
│   ├── crp_pred_prompt.json
│   └── crp_trans_prompt.json
├── datasets/
│   ├── hosting_blacklists.txt      # Web-hosting / cloud-provider domains
│   ├── test_sites/                 # Bundled demo site (www.baidu.com)
│   ├── openai_key.txt              # (you create) OpenAI API key
│   └── google_api_key.txt          # (you create) Google Search key + engine id
├── figures/
└── scripts/
    ├── infer/
    │   └── run.py                  # Entry point (inference loop)
    ├── pipeline/
    │   └── phishvlm.py             # PhishVLM class — the 4-step pipeline
    ├── utils/                      # Web interaction, drawing, logging helpers
    │   ├── web_utils.py
    │   ├── draw_utils.py
    │   ├── logger_utils.py
    │   └── PhishIntentionWrapper.py
    └── phishintention/             # Logo detector / siamese / OCR backbones
        ├── model_config.py         # load_config(): builds the vision models
        ├── configs/                # *.yaml model configs
        ├── modules/                # detector + logo matching
        ├── ocr_lib/                # OCR-aided siamese encoder
        ├── utils/
        └── setup.sh                # Downloads pretrained weights

Setup

Tested on Ubuntu with an NVIDIA GPU and CUDA 11. A CPU-only run is possible but slow; the vision backbones fall back to CPU automatically.

Step 1: Install requirements

A new conda environment named phishvlm is created in this step.

conda create -n phishvlm python=3.10 -y
conda activate phishvlm

# Python dependencies
pip install -r requirements.txt

# PyTorch (must match your CUDA version — example below is CUDA 11.3)
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 \
    --extra-index-url https://download.pytorch.org/whl/cu113

# detectron2 (used by the logo / layout detector)
pip install --no-build-isolation git+https://github.com/facebookresearch/detectron2.git

# Download the pretrained vision-model weights into scripts/phishintention/models/
cd scripts/phishintention
chmod +x setup.sh
./setup.sh
cd ../..

setup.sh downloads the layout/logo detector, the OCR-aided siamese encoder and the supporting reference files via gdown and places them under scripts/phishintention/models/. Re-running it is safe — existing files are skipped.

Step 2: Install Google Chrome

PhishVLM drives a headless Chrome through Selenium. Only Chrome itself needs to be installed; the matching ChromeDriver is fetched automatically at runtime by webdriver-manager:

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb
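To confirm that Chrome, Selenium and webdriver-manager work together before running the full pipeline, a small smoke test like the one below can help. It is a sketch: the helper names and the Chrome flags in `chrome_arguments` are assumptions for a typical headless server, not necessarily the options PhishVLM uses internally. Selenium is imported lazily so the helpers can be imported without it.

```python
import shutil
from typing import List, Optional


def chrome_arguments(headless: bool = True) -> List[str]:
    """Assumed Chrome flags for a server run; not necessarily PhishVLM's own."""
    args = ["--no-sandbox", "--disable-dev-shm-usage", "--window-size=1920,1080"]
    if headless:
        args.append("--headless=new")  # new headless mode (Chrome 109+)
    return args


def find_chrome() -> Optional[str]:
    """Return the path of an installed Chrome/Chromium binary, or None."""
    for name in ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser"):
        path = shutil.which(name)
        if path:
            return path
    return None


def smoke_test(url: str = "https://www.example.com") -> str:
    """Open `url` in headless Chrome and return the page title."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments():
        options.add_argument(arg)
    # webdriver-manager downloads a ChromeDriver matching the installed Chrome.
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()
```

Inside the phishvlm environment, calling `print(find_chrome())` and then `print(smoke_test())` should print the Chrome path and the example page's title; an exception here usually points to the Chrome / driver issue listed under Troubleshooting.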

Step 3: Register two API keys

🔑 OpenAI API key — tutorial. Paste the key into ./datasets/openai_key.txt:
```
echo "sk-your-openai-key" > ./datasets/openai_key.txt
```
🔑 Google Programmable Search API key — tutorial. Put the API key on the first line and the Search Engine ID on the second line of ./datasets/google_api_key.txt:
```
[API_KEY]
[SEARCH_ENGINE_ID]
```

Both key files live under datasets/ and are git-ignored, so your secrets are never committed.
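The two key files can be read back as shown in this sketch, which also builds a Google Custom Search JSON API request restricted to image results (the kind of lookup brand validation relies on). The function names are hypothetical, ignoring blank lines in google_api_key.txt is an assumption, and PhishVLM's actual search query may be constructed differently.

```python
from pathlib import Path
from typing import Tuple
from urllib.parse import urlencode


def load_api_keys(datasets_dir: str = "./datasets") -> Tuple[str, str, str]:
    """Return (openai_key, google_api_key, search_engine_id) from the two key files."""
    root = Path(datasets_dir)
    openai_key = (root / "openai_key.txt").read_text(encoding="utf-8").strip()
    # Line 1: Google API key, line 2: Search Engine ID (blank lines ignored).
    lines = [ln.strip() for ln in
             (root / "google_api_key.txt").read_text(encoding="utf-8").splitlines()
             if ln.strip()]
    if not openai_key or len(lines) < 2:
        raise ValueError("key files are empty or google_api_key.txt has fewer than 2 lines")
    return openai_key, lines[0], lines[1]


def image_search_url(api_key: str, engine_id: str, query: str, num: int = 10) -> str:
    """Build a Custom Search JSON API request URL restricted to image results."""
    params = {"key": api_key, "cx": engine_id, "q": query,
              "searchType": "image", "num": num}  # the API returns at most 10 per call
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)
```

A quick `load_api_keys()` call from the project root is an easy way to catch a malformed key file before starting a long run.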

Prepare a Dataset

Organize the sites you want to test as one subfolder per website:

testing_dir/
├── aaa.com/
│   ├── shot.png   # webpage screenshot
│   ├── info.txt   # webpage URL
│   └── html.txt   # webpage HTML source
├── bbb.com/
│   ├── shot.png
│   ├── info.txt
│   └── html.txt
└── ...

A ready-to-run example is provided in datasets/test_sites/.
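The folder layout above can be produced and checked with a few lines of Python. This is an illustrative sketch with hypothetical helper names (`add_site`, `iter_sites`); skipping folders that lack any of the three files is an assumption, not documented PhishVLM behaviour.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Union

REQUIRED = ("shot.png", "info.txt", "html.txt")


@dataclass
class Site:
    name: str          # subfolder name, e.g. "aaa.com"
    url: str           # contents of info.txt
    screenshot: Path   # path to shot.png
    html: str          # contents of html.txt


def add_site(testing_dir: Union[str, Path], name: str, url: str,
             png_bytes: bytes, html: str) -> Path:
    """Create one <testing_dir>/<name>/ folder in the layout shown above."""
    folder = Path(testing_dir) / name
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "shot.png").write_bytes(png_bytes)
    (folder / "info.txt").write_text(url, encoding="utf-8")
    (folder / "html.txt").write_text(html, encoding="utf-8")
    return folder


def iter_sites(testing_dir: Union[str, Path]) -> Iterator[Site]:
    """Yield complete site folders in name order; incomplete ones are skipped."""
    for folder in sorted(Path(testing_dir).iterdir()):
        if not folder.is_dir():
            continue
        if not all((folder / f).is_file() for f in REQUIRED):
            continue  # assumption: skip folders missing any of the three files
        yield Site(
            name=folder.name,
            url=(folder / "info.txt").read_text(encoding="utf-8", errors="ignore").strip(),
            screenshot=folder / "shot.png",
            html=(folder / "html.txt").read_text(encoding="utf-8", errors="ignore"),
        )
```

Running `iter_sites` over a new dataset before launching PhishVLM is a cheap way to spot folders with a missing screenshot or URL.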

Run PhishVLM

Run from the project root as a module (this guarantees the scripts package is importable):

conda activate phishvlm
python -m scripts.infer.run --folder ./datasets/test_sites

Optional arguments:

| Argument | Default | Description |
|---|---|---|
| `--folder` | `./datasets/test_sites` | Folder of websites to test. |
| `--config` | `./param_dict.yaml` | Pipeline hyper-parameter file. |
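When PhishVLM is driven from another script (for example, to process several dataset folders in turn), the module-form command can be assembled as in this sketch. The helper names are hypothetical; `sys.executable` is only the phishvlm interpreter if the calling script itself runs inside the activated environment.

```python
import subprocess
import sys
from typing import List, Optional


def phishvlm_command(folder: str, config: Optional[str] = None) -> List[str]:
    """Build the module-form command; --config is only passed when given."""
    cmd = [sys.executable, "-m", "scripts.infer.run", "--folder", folder]
    if config is not None:
        cmd += ["--config", config]
    return cmd


def run_phishvlm(project_root: str, folder: str, config: Optional[str] = None) -> int:
    """Run PhishVLM with cwd set to the project root, so `scripts` is importable."""
    completed = subprocess.run(phishvlm_command(folder, config), cwd=project_root)
    return completed.returncode
```

Setting `cwd` to the project root is what makes the module form work regardless of where the calling script lives.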

Understand the Output

The console prints a live log, e.g.:


[PhishLLMLogger][DEBUG] Folder ./datasets/field_study/2023-09-01/device-...remotewd.com
[PhishLLMLogger][DEBUG] Time taken for LLM brand prediction: 0.97 Detected brand: sonicwall.com
[PhishLLMLogger][DEBUG] Domain sonicwall.com is valid and alive
[PhishLLMLogger][DEBUG] Time taken for LLM CRP classification: 2.92   CRP prediction: A. This is a credential-requiring page.
[❗️] Phishing discovered, phishing target is sonicwall.com

When a site is flagged as phishing, an annotated predict.png is saved inside that site's folder. In addition, a tab-separated results file named [today's date]_phishllm.txt is written to the working directory with the following columns:

| Column | Meaning |
|---|---|
| `folder` | Website subfolder name. |
| `phish_prediction` | `phish` or `benign`. |
| `target_prediction` | Predicted target brand domain (e.g. `paypal.com`). |
| `brand_recog_time` | Time spent on brand recognition + validation (s). |
| `crp_prediction_time` | Time spent on CRP prediction (s). |
| `crp_transition_time` | Time spent on CRP transition / ranking (s). |
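The results file can be post-processed with the standard csv module. This sketch assumes the columns appear in the order listed above and tolerates an optional header row; the helper names are hypothetical.

```python
import csv
from typing import Dict, List

# Column order as documented in the table above (assumed to match the file).
COLUMNS = ["folder", "phish_prediction", "target_prediction",
           "brand_recog_time", "crp_prediction_time", "crp_transition_time"]


def read_results(path: str) -> List[Dict[str, str]]:
    """Parse the tab-separated results file into one dict per row."""
    rows: List[Dict[str, str]] = []
    with open(path, newline="", encoding="utf-8") as fh:
        for record in csv.reader(fh, delimiter="\t"):
            if not record or record[0] == "folder":
                continue  # skip blank lines and a header row, if one is present
            rows.append(dict(zip(COLUMNS, record)))
    return rows


def phishing_targets(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each folder flagged as `phish` to its predicted target brand domain."""
    return {r["folder"]: r.get("target_prediction", "")
            for r in rows if r.get("phish_prediction") == "phish"}


def total_time(rows: List[Dict[str, str]]) -> float:
    """Sum the three timing columns over all rows; blank or non-numeric values count as 0."""
    total = 0.0
    for r in rows:
        for col in COLUMNS[3:]:
            try:
                total += float(r.get(col, "").strip() or 0)
            except ValueError:
                pass  # e.g. a placeholder string instead of a number
    return total
```

`phishing_targets` gives a quick list of flagged sites and their targets, which pairs naturally with the predict.png files saved in those folders.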

Configuration

All pipeline knobs live in param_dict.yaml, including:

VLM_model — the OpenAI vision model used (default gpt-4o-mini-2024-07-18).
brand_recog, crp_pred, rank — temperature, token limits and sleep/timeouts per step.
brand_valid — whether to validate the predicted brand via logo matching, and the top-k / similarity threshold to use.
rank.depth_limit — maximum number of CRP transitions (clicks) before giving up.

Model weights and detection thresholds are configured in scripts/phishintention/configs/configs.yaml.
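The dotted names above (brand_valid.activate, rank.depth_limit) refer to nested YAML sections. A small sketch for reading and overriding them, for instance to write a variant config with brand validation disabled and pass it via --config, could look like this. The helper names are hypothetical, and PyYAML is assumed to be available (it is imported lazily).

```python
from typing import Any, Mapping, MutableMapping

_MISSING = object()


def get_param(cfg: Mapping[str, Any], dotted_key: str, default: Any = _MISSING) -> Any:
    """Look up a nested key such as 'rank.depth_limit' or 'brand_valid.activate'."""
    node: Any = cfg
    for part in dotted_key.split("."):
        if isinstance(node, Mapping) and part in node:
            node = node[part]
        elif default is not _MISSING:
            return default
        else:
            raise KeyError(dotted_key)
    return node


def set_param(cfg: MutableMapping[str, Any], dotted_key: str, value: Any) -> None:
    """Override a nested key in place, creating intermediate sections if needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def load_params(path: str = "./param_dict.yaml") -> dict:
    """Read the YAML config (requires PyYAML; imported lazily)."""
    import yaml
    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)
```

After `set_param(cfg, "brand_valid.activate", False)`, the modified dict can be written out with `yaml.safe_dump` to a new file and used via `--config`, leaving param_dict.yaml untouched.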

Troubleshooting

ModuleNotFoundError: No module named 'scripts' — run the pipeline from the project root with the module form: python -m scripts.infer.run (not python scripts/infer/run.py).
FileNotFoundError: openai_key.txt / google_api_key.txt — create the key files under datasets/ as described in Step 3.
Chrome / driver errors — make sure Google Chrome is installed; webdriver-manager downloads the matching ChromeDriver on first run (requires network access).
CUDA / detectron2 build errors — verify that your installed PyTorch CUDA build matches the CUDA toolkit on your machine before installing detectron2.
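Most of the problems above can be caught before a run with a preflight check. The sketch below is not part of the repository; the function name and messages are hypothetical. It checks the two key files and, optionally, Chrome, detectron2 and CUDA availability.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import List


def preflight(project_root: str = ".", check_runtime: bool = True) -> List[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    root = Path(project_root)
    problems: List[str] = []

    openai_key = root / "datasets" / "openai_key.txt"
    if not openai_key.is_file() or not openai_key.read_text(encoding="utf-8").strip():
        problems.append(f"missing or empty {openai_key}")

    google_key = root / "datasets" / "google_api_key.txt"
    lines = ([ln for ln in google_key.read_text(encoding="utf-8").splitlines() if ln.strip()]
             if google_key.is_file() else [])
    if len(lines) < 2:
        problems.append(f"{google_key} needs the API key and the Search Engine ID on two lines")

    if check_runtime:
        if not any(shutil.which(n) for n in ("google-chrome", "google-chrome-stable")):
            problems.append("Google Chrome not found on PATH")
        if importlib.util.find_spec("detectron2") is None:
            problems.append("detectron2 is not installed")
        try:
            import torch
        except ImportError:
            problems.append("PyTorch is not installed")
        else:
            if not torch.cuda.is_available():
                problems.append("CUDA unavailable: vision backbones will run on CPU (slow)")
    return problems
```

Run from the project root inside the phishvlm environment, e.g. `for p in preflight(): print("-", p)`; the CUDA message is a warning rather than an error, since a CPU-only run still works.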

Citation

@inproceedings{299838,
  author    = {Ruofan Liu and Yun Lin and Xiwen Teoh and Gongshen Liu and Zhiyong Huang and Jin Song Dong},
  title     = {Less Defined Knowledge and More True Alarms: Reference-based Phishing Detection without a Pre-defined Reference List},
  booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
  year      = {2024},
  isbn      = {978-1-939133-44-1},
  address   = {Philadelphia, PA},
  pages     = {523--540},
  url       = {https://www.usenix.org/conference/usenixsecurity24/presentation/liu-ruofan},
  publisher = {USENIX Association},
  month     = aug
}

If you have any issues running our code, please open a GitHub issue or email us: liu.ruofan16@u.nus.edu, lin_yun@sjtu.edu.cn, dcsdjs@nus.edu.sg.
