Hello Lucas,

I believe that for most people, the size of the dataset should be the main deciding factor, but there are a few other guiding points:

  • If you feel constricted by a limitation in Pandas’ functionality, it may be worth checking whether SQL can do it. There’s probably a solution to whatever you are looking for (there are also SQL integrations for Python if you don’t want to write your queries in specialized software like DataGrip).
  • If your dataset has lots of specialized data types, like dates, it may be easier to handle it in SQL, depending on your confidence working with dates in Python (which can get quite thorny).
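
To make the two points above a bit more concrete, here is a minimal sketch of querying SQL from Python and letting the database deal with the dates. It assumes SQLite through Python's built-in sqlite3 module; the orders table, its columns, and its values are made up purely for illustration.

```python
import sqlite3

# Hypothetical in-memory table; the names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, placed_on TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2020-01-15", 20.0),
        (2, "2020-01-30", 35.5),
        (3, "2020-02-03", 12.0),
    ],
)

# SQLite's strftime pulls the year-month out of an ISO-8601 date string,
# so the grouping by month happens entirely inside the database.
monthly_totals = conn.execute(
    """
    SELECT strftime('%Y-%m', placed_on) AS month, SUM(amount) AS total
    FROM orders
    GROUP BY month
    ORDER BY month
    """
).fetchall()
conn.close()

print(monthly_totals)
```

Other databases have their own date functions, so the exact query would differ outside SQLite.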

That said, Pandas can usually handle what I need, and running SQL first can slim down the dataset and hence the processing time, so my primary use case is a hybrid of the two. If you’re doing an analysis, I wouldn’t recommend using SQL alone, as Pandas’ integration with plotting and statistics libraries is very helpful. If you’re just looking for some quick numbers, however, SQL is the way to go.
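
As a rough sketch of that hybrid workflow: let SQL filter the data first, then hand only the smaller result to Pandas. This assumes SQLite through Python's built-in sqlite3 module and pandas.read_sql_query; the events table and its contents are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for a much larger dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "click", 1.0),
        (1, "purchase", 30.0),
        (2, "click", 1.0),
        (2, "purchase", 12.5),
        (3, "click", 1.0),
    ],
)

# SQL does the filtering, so only the relevant rows and columns reach Pandas.
purchases = pd.read_sql_query(
    "SELECT user_id, value FROM events WHERE kind = ?",
    conn,
    params=("purchase",),
)
conn.close()

# From here on it is ordinary Pandas: statistics, plotting, and so on.
summary = purchases["value"].describe()
print(summary)
```

With a real database, you would swap the in-memory connection for your own and keep the rest of the pattern the same.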

Hope this helped!

ML & CS enthusiast. Let’s connect: https://www.linkedin.com/in/andre-ye.
