Andre Ye · Sep 22, 2020

Good point! I haven't seen much few-shot learning with BERT. That said, the GPT-3 paper is titled 'Language Models are Few-Shot Learners'; GPT-3 obviously differs from BERT in several ways, but both are transformer-based models.
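For context, "few-shot" in the GPT-3 paper means in-context learning: the worked examples live in the prompt itself, and the model continues the pattern with no gradient updates. A minimal sketch of what such a prompt looks like (the sentiment task, reviews, and labels here are made-up illustrations, not from the paper):

```python
# Hypothetical few-shot prompt in the GPT-3 sense: a handful of worked
# examples followed by a query, all in one input string. The model would
# be asked to complete the final "Sentiment:" line.
examples = [
    ("I loved this movie!", "positive"),
    ("Total waste of time.", "negative"),
    ("The plot was gripping.", "positive"),
]
query = "The acting was dreadful."

# Each "shot" is a Review/Sentiment pair; the query line is left unlabeled.
prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)
```

The point is that the "learning" happens purely through the prompt, which is why the same trick doesn't map cleanly onto BERT: BERT is an encoder trained with masked-token objectives, not an autoregressive model that completes text.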