Good point! I haven't seen much few-shot learning done with BERT. That said, the GPT-3 paper was titled 'Language Models are Few-Shot Learners', and while GPT-3 differs from BERT in important ways (it's a decoder-only autoregressive model, whereas BERT is a bidirectional encoder), both are, at their core, transformers.
ML & CS enthusiast. Let’s connect: https://www.linkedin.com/in/andre-ye. Check out my podcast: https://open.spotify.com/show/0wUzfk9C6nnH9G0tKXudUe