This repository was archived by the owner on Mar 4, 2026. It is now read-only.

The variable "end_training" in Bert_Large training is wrongly used.  #170

@taotod

In the code below, the variable "end_training" is defined as a boolean flag that decides when training should end.

https://github.com/IntelAI/models/blob/cdd842a33eb9d402ff18bfb79bd106ae132a8e99/models/language_modeling/pytorch/bert_large/training/gpu/run_pretrain_mlperf.py#L838

In the code below, which calculates the time taken by one training iteration, the variable "end_training" is wrongly reused to record the end time of the iteration.
https://github.com/IntelAI/models/blob/cdd842a33eb9d402ff18bfb79bd106ae132a8e99/models/language_modeling/pytorch/bert_large/training/gpu/run_pretrain_mlperf.py#L1006

Line 1006 sets "end_training" to a non-zero value, which evaluates as true. As a result, after the first data file has been used for training, the training exits at the check below and never moves on to the next data file.
https://github.com/IntelAI/models/blob/cdd842a33eb9d402ff18bfb79bd106ae132a8e99/models/language_modeling/pytorch/bert_large/training/gpu/run_pretrain_mlperf.py#L1079
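
The behaviour described above can be reproduced with a minimal sketch. This is not the code from run_pretrain_mlperf.py: the function names `train_buggy` and `train_fixed`, the `steps_per_file` parameter, and the `iteration_end_time` variable are hypothetical and only illustrate how reusing a boolean flag as a timestamp makes the loop exit after the first data file, and how a separate variable name might avoid it.

```python
import time


def train_buggy(data_files, steps_per_file=2):
    """Reuses the name end_training for both a stop flag and a timestamp."""
    files_seen = []
    end_training = False  # boolean flag: should training stop?
    for data_file in data_files:
        for _ in range(steps_per_file):
            start = time.time()
            # ... one training step would run here ...
            end_training = time.time()  # bug: the flag is overwritten with a float
            _iteration_time = end_training - start
        files_seen.append(data_file)
        if end_training:  # a non-zero timestamp is truthy, so this always fires
            break
    return files_seen


def train_fixed(data_files, steps_per_file=2):
    """Keeps the stop flag and the timing value in separate variables."""
    files_seen = []
    end_training = False  # only ever holds a boolean
    for data_file in data_files:
        for _ in range(steps_per_file):
            start = time.time()
            # ... one training step would run here ...
            iteration_end_time = time.time()  # hypothetical separate name
            _iteration_time = iteration_end_time - start
        files_seen.append(data_file)
        if end_training:  # stays False until a real stop condition sets it
            break
    return files_seen
```

With three placeholder file names, `train_buggy` stops after the first one, while `train_fixed` visits all of them, because nothing in this sketch ever sets the flag to `True`.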
