Andre Ye · Sep 22, 2020

Good point! I haven't seen much few-shot learning with BERT. That said, the GPT-3 paper is titled 'Language Models are Few-Shot Learners'; GPT-3 obviously differs from BERT in several ways, but both are transformer-based models.
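For context, "few-shot" in the GPT-3 paper means in-context learning: the worked examples live in the prompt itself, and the model continues the pattern with no gradient updates. A minimal sketch of what such a prompt looks like (the sentiment task, reviews, and labels here are made-up illustrations, not from the paper):

```python
# Hypothetical few-shot prompt in the GPT-3 sense: a handful of worked
# examples followed by a query, all in one input string. The model would
# be asked to complete the final "Sentiment:" line.
examples = [
    ("I loved this movie!", "positive"),
    ("Total waste of time.", "negative"),
    ("The plot was gripping.", "positive"),
]
query = "The acting was dreadful."

# Each "shot" is a Review/Sentiment pair; the query line is left unlabeled.
prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)
```

The point is that the "learning" happens purely through the prompt, which is why the same trick doesn't map cleanly onto BERT: BERT is an encoder trained with masked-token objectives, not an autoregressive model that completes text.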