Hello Lucas,

I believe that for most people, the size of the dataset should be the main deciding factor, but there are some other guiding points:

  • If you feel constrained by a limitation in Pandas’ functionality, it may be worth checking whether SQL can do it. There’s probably a solution to whatever you are seeking (there are also SQL integrations for Python if you don’t want to write your queries in specialized software like DataGrip).
  • If your dataset has lots of specialized data types, like dates, it may be easier to handle it in SQL, depending on your confidence working with dates in Python (which can get quite thorny).
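To make the two points above concrete, here is a minimal sketch that runs SQL from Python and does the same date extraction both ways. It assumes SQLite through the standard-library `sqlite3` module; the `orders` table, its columns, and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, placed_on TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2021-01-15"), (2, "2021-01-20"), (3, "2021-02-03")],
)

# Date logic in SQL: SQLite's strftime extracts the month as text.
sql_months = [
    row[0]
    for row in conn.execute(
        "SELECT strftime('%m', placed_on) FROM orders ORDER BY id"
    )
]

# The same logic in Pandas: parse the strings, then use the .dt accessor.
df = pd.read_sql_query("SELECT * FROM orders ORDER BY id", conn)
df["placed_on"] = pd.to_datetime(df["placed_on"])
pandas_months = df["placed_on"].dt.month.tolist()

conn.close()
```

Which version reads more naturally is largely a matter of which tool you are more comfortable with. Note that date functions differ between SQL dialects, so `strftime` here is specific to SQLite.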

That said, I find that Pandas can usually handle what I need, and that preprocessing in SQL can slim down the dataset and hence the processing time, so my primary use case is a hybrid of the two. If you’re doing an analysis, then I wouldn’t recommend using SQL alone, as it’s definitely helpful to have Pandas’ integration with plotting and statistics libraries. If you’re just looking for some quick numbers, however, SQL is definitely the way to go.
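As a rough sketch of that hybrid workflow, the snippet below lets SQL filter and aggregate first, then hands the much smaller result to Pandas for further analysis. It assumes SQLite via the standard-library `sqlite3` module; the `sales` table, its columns, and its rows are made up for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for a large database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", 2019, 10.0),
        ("north", 2020, 20.0),
        ("north", 2020, 5.0),
        ("south", 2020, 7.5),
        ("south", 2018, 99.0),
    ],
)

# Step 1: SQL slims the data down (filter + aggregate) before it reaches Python.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE year = ?
    GROUP BY region
    ORDER BY region
"""
summary = pd.read_sql_query(query, conn, params=(2020,))
conn.close()

# Step 2: Pandas takes over for the analysis on the reduced result.
summary["share"] = summary["total"] / summary["total"].sum()
```

From here, `summary` is an ordinary DataFrame, so it can go straight into whatever plotting or statistics library you prefer.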

Hope this helped!

ML & CS enthusiast. Let’s connect: https://www.linkedin.com/in/andre-ye. Check out my podcast: https://open.spotify.com/show/0wUzfk9C6nnH9G0tKXudUe
