Answer to: Decoder only model AI making repetitive responses
Score: 0
I think I know what is causing the problem in your code.
Cross-attention to itself
You build an nn.TransformerDecoder and pass memory=X, so every layer runs cross-attention over memory=X in addition to its self-attention. Because there is no encoder, the model effectively attends to its own input sequence through cross-attention. If that path is not causally masked, each position can also see the tokens it is supposed to predict, so the model tends to echo tokens, which leads to repetition. For a GPT-style model you want masked self-attention only, with no cross-attention.
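A minimal sketch of what "masked self-attention only" could look like in PyTorch. It assumes you are free to swap nn.TransformerDecoder for nn.TransformerEncoder, whose layers contain self-attention and a feed-forward block but no cross-attention; with a causal mask this behaves like a GPT-style stack. The class name TinyGPT and all hyperparameters are hypothetical, not taken from your code.

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Hypothetical decoder-only model: causal self-attention, no memory input."""

    def __init__(self, vocab_size, d_model=64, n_head=4, n_layer=2, block_size=32):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        # Encoder layers = self-attention + feed-forward, with no cross-attention.
        layer = nn.TransformerEncoderLayer(
            d_model, n_head, dim_feedforward=4 * d_model,
            dropout=0.0, batch_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        # idx: (batch, time) token ids, with time <= block_size
        _, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Boolean mask: True marks positions that must NOT be attended to,
        # i.e. everything strictly in the future of the current token.
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1
        )
        x = self.blocks(x, mask=causal)
        return self.head(x)  # (batch, time, vocab_size)
```

A quick sanity check for any model of this kind: change the last input token and confirm that the logits at all earlier positions stay the same. If they change, information is leaking from the future.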
During your training you pad sequences but your loss doesn't ignore pad tokens:
Your training:
loss = criterion(logits.view(-1, vocab_size), yb.view(-1))
How you build your batches:
if len(ids) < block_size + 1:
    pad_id = tokenizer.token_to_id('<pad>') or 0
    ids += [pad_id] * (block_size + 1 - len(ids))
x = ids[:block_size]
y = ids[1:block_size+1]
This means that for a short text chunk, the remaining tokens in y are all <pad>, so your model learns:
"When I see <pad>, the next token should be <pad>."
So to fix this, tell the loss to ignore padding.
pad_id = tokenizer.token_to_id('<pad>') or 0
criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
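To see what ignore_index changes, here is a small self-contained check. The value pad_id = 0, the vocabulary size, and the random logits are assumptions for illustration only; in your code pad_id comes from the tokenizer and the logits from the model.

```python
import torch
import torch.nn as nn

pad_id = 0        # assumption: the id your tokenizer assigns to '<pad>'
vocab_size = 10   # assumption: toy vocabulary

torch.manual_seed(0)
logits = torch.randn(2, 5, vocab_size)  # (batch, time, vocab)
yb = torch.tensor([
    [4, 7, 2, pad_id, pad_id],          # short chunk, padded at the end
    [3, 1, 9, 5, 6],                    # full-length chunk
])

# Padded targets are skipped and the mean is taken over real tokens only.
criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
loss = criterion(logits.view(-1, vocab_size), yb.view(-1))

# Equivalent by hand: drop the padded positions, then average.
keep = yb.view(-1) != pad_id
manual = nn.functional.cross_entropy(
    logits.view(-1, vocab_size)[keep], yb.view(-1)[keep]
)
```

Here loss and manual agree, which confirms that the padded positions no longer contribute. One thing to double-check: if '<pad>' is missing from the vocabulary, the `or 0` fallback makes pad_id equal to 0, and whatever real token has id 0 would then be ignored by the loss as well.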
During generation, try top-k/top-p sampling together with a repetition penalty. This will probably reduce looping when the model is uncertain.
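A rough sketch of how those three pieces could be combined for a single decoding step. The function name sample_next_token and its default values are hypothetical, and the repetition penalty shown is the common "divide positive logits, multiply negative ones" variant, which may differ from what a given library does.

```python
import torch

def sample_next_token(logits, prev_ids, top_k=50, top_p=0.9,
                      repetition_penalty=1.2, temperature=1.0):
    """logits: 1-D tensor (vocab,); prev_ids: 1-D LongTensor of tokens so far."""
    logits = logits.clone() / temperature

    # Repetition penalty: make already generated tokens less likely.
    if prev_ids.numel() > 0 and repetition_penalty != 1.0:
        seen = torch.unique(prev_ids)
        vals = logits[seen]
        logits[seen] = torch.where(
            vals > 0, vals / repetition_penalty, vals * repetition_penalty
        )

    # Top-k: keep only the k highest logits (top_k=0 or None disables this).
    if top_k is not None and top_k > 0:
        k = min(top_k, logits.numel())
        kth_value = torch.topk(logits, k).values[-1]
        logits[logits < kth_value] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose mass reaches top_p.
    if top_p is not None and top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        cum = torch.cumsum(probs, dim=-1)
        # Drop a token if the mass *before* it already exceeds top_p,
        # so the most likely token is always kept.
        remove = (cum - probs) > top_p
        logits[sorted_idx[remove]] = float("-inf")

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

In a generation loop you would call this with the logits of the last position and the ids produced so far, then append the returned id to the sequence. Note that sampling tricks only hide the symptom; the architecture and loss fixes above address the cause.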
Score: 2 • Views: 99
Site: stackoverflow