Good point! I haven't seen much few-shot learning done with BERT. However, the title of the GPT-3 paper was 'Language Models are Few-Shot Learners' — and while GPT-3 obviously differs from BERT in several ways, both are technically transformers.

ML & CS enthusiast. Let’s connect: https://www.linkedin.com/in/andre-ye. Check out my podcast: https://open.spotify.com/show/0wUzfk9C6nnH9G0tKXudUe
