Testing Python vs R

Python

Tutorial

Published

Jun 7, 2025

As data analysts and researchers, we often toggle between R and Python. But how do they compare in terms of raw speed—especially for common tasks like grouped aggregation?

In this post, I benchmark different approaches using both R and Python, including base R, tidyverse, data.table, pandas, and polars. I also experiment with calling Python’s Polars from within R using reticulate, just to see if inter-language calls come with performance penalties—or surprises.

Let’s dive in.

The Setup

The dataset I’m using is nba_all_elo.csv. You can get the data here. It contains Elo ratings and predictions for thousands of NBA games, with 126,314 rows and 23 columns.
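To make the benchmarked task concrete, here is a minimal standard-library sketch of a grouped mean: the average forecast per game_result. The four inline rows are made-up illustration values, not rows from nba_all_elo.csv, and grouped_mean is a hypothetical helper name, not part of any library used in this post.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; only the two column names come from the post.
SAMPLE_CSV = """game_result,forecast
W,0.75
L,0.25
W,0.50
L,0.50
"""


def grouped_mean(rows, key, value):
    """Return {group: mean of value} for an iterable of dict rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
result = grouped_mean(reader, "game_result", "forecast")
print(result)
```

Every approach benchmarked below computes this same quantity; they differ only in how the file is read and how the groups are formed.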

bench::mark(
  base_r = {
    nba <- read.csv("nba_all_elo.csv")
    aggregate(forecast ~ game_result, data = nba, FUN = mean)
  },
  iterations = 10,
  check = FALSE
)

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 1 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 base_r        1.79s    2.02s     0.491     121MB     1.32

bench::mark(
  tidyverse = {
    read_csv("nba_all_elo.csv") |>
      summarise(avg_points = mean(forecast), .by = game_result)
  },
  iterations = 10,
  check = FALSE
)

# A tibble: 1 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 tidyverse     645ms    651ms      1.51    50.7MB     3.53

import timeit  # required by the timeit.timeit calls below
polars_stmt = """
import polars as pl
df = pl.read_csv('nba_all_elo.csv')
df.group_by('game_result').agg(pl.col('forecast').mean().alias('avg_points'))
"""
polars_time = timeit.timeit(stmt=polars_stmt, number=10) / 10
print(f"Polars avg time over 10 runs: {polars_time:.6f} sec")

Polars avg time over 10 runs: 0.044341 sec

pandas_stmt = """
import pandas as pd
df = pd.read_csv('nba_all_elo.csv')
df.groupby('game_result', as_index=False)["forecast"].mean()
"""
pandas_time = timeit.timeit(stmt=pandas_stmt, number=10) / 10
print(f"Pandas avg time over 10 runs: {pandas_time:.6f} sec")

Pandas avg time over 10 runs: 0.439172 sec
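The R chunks report a minimum and a median per expression, while the Python snippets report a mean over 10 runs. For figures that line up more directly, one option is the standard library's timeit.repeat together with statistics.median. This is only a sketch: time_stmt is a hypothetical helper, and the sum(range(...)) workload is a stand-in so the example runs without the CSV file. Note also that timeit switches off garbage collection while timing by default, whereas bench::mark reports GC activity.

```python
import statistics
import timeit


def time_stmt(stmt, setup="pass", repeats=10):
    """Run stmt `repeats` times, one execution each; return (min, median) in seconds."""
    # timeit.repeat returns one total per repeat; with number=1 each entry
    # is the wall-clock time of a single execution of stmt.
    times = timeit.repeat(stmt=stmt, setup=setup, repeat=repeats, number=1)
    return min(times), statistics.median(times)


# Stand-in workload (an assumption for illustration, not the benchmark itself).
fastest, median = time_stmt("sum(range(10_000))")
print(f"min: {fastest:.6f} sec, median: {median:.6f} sec")
```

Passing polars_stmt or pandas_stmt as stmt would give a minimum and median for the same code timed above.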

bench::mark(
  r_polars = {
    df <- pl$read_csv("nba_all_elo.csv")
    df$group_by("game_result")$agg(
      pl$col("forecast")$mean()$alias("avg_points")
    )
  },
  iterations = 10,
  check = FALSE
)

# A tibble: 1 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 r_polars     63.7ms   69.3ms      13.9    1.52MB        0

bench::mark(
  py_polars_in_r = {
    py_run_string(
      "
import polars as pl
df = pl.read_csv('nba_all_elo.csv')
df.group_by('game_result').agg(
  pl.col('forecast').mean().alias('avg_points')
)
      "
    )
  },
  iterations = 10,
  check = FALSE
)

# A tibble: 1 × 6
  expression          min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 py_polars_in_r   37.3ms   40.9ms      23.0        0B        0

bench::mark(
  data_table = {
    dt <- fread("nba_all_elo.csv")
    dt[, .(avg_points = mean(forecast, na.rm = TRUE)), by = game_result]
  },
  iterations = 10,
  check = FALSE
)

# A tibble: 1 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 data_table    124ms    126ms      7.73    27.1MB     6.45
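Every benchmark above reads the CSV and aggregates inside the same timed expression, so each number reflects both steps. A rough sketch of timing the two phases separately is shown below. It uses only the standard library and a synthetic in-memory CSV (made-up values standing in for nba_all_elo.csv), so the absolute times say nothing about pandas or Polars; it only illustrates the split.

```python
import csv
import io
import time
from collections import defaultdict

# Synthetic stand-in data: 50,000 made-up rows with the post's two column names.
csv_text = "game_result,forecast\n" + "".join(
    f"{'W' if i % 2 else 'L'},{(i % 100) / 100}\n" for i in range(50_000)
)

# Phase 1: parse the CSV text into a list of dict rows.
start = time.perf_counter()
rows = list(csv.DictReader(io.StringIO(csv_text)))
read_seconds = time.perf_counter() - start

# Phase 2: grouped mean of forecast by game_result.
start = time.perf_counter()
sums, counts = defaultdict(float), defaultdict(int)
for row in rows:
    sums[row["game_result"]] += float(row["forecast"])
    counts[row["game_result"]] += 1
means = {group: sums[group] / counts[group] for group in sums}
agg_seconds = time.perf_counter() - start

print(f"read: {read_seconds:.4f} sec, aggregate: {agg_seconds:.4f} sec")
```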

Key Takeaways

Here’s what stood out from the benchmarks:

✅ Polars is the fastest R-native option here (69.3 ms median). data.table is also fast (126 ms median), though its syntax is one I am not comfortable with. Both comfortably outperform base R (2.02 s) and the tidyverse (651 ms).

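✅ Polars is blazingly fast: in these runs it was about 10x faster than pandas in Python (0.044 s vs 0.439 s) and about 9x faster than the tidyverse in R (69.3 ms vs 651 ms).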

✅ Surprise! When I invoked Python’s Polars from within R via reticulate, it was as fast as running the same code directly in Python (40.9 ms median vs a 44.3 ms average).
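The speedup figures behind these takeaways can be checked with a few lines of arithmetic. The timings below are copied from the benchmark output in this post (bench::mark medians for the R chunks, timeit averages for the Python chunks); speedup is a hypothetical helper name.

```python
# Timings in seconds, copied from the output shown in this post.
# R entries are bench::mark medians; pandas/polars are timeit averages over 10 runs.
timings = {
    "base_r": 2.02,
    "tidyverse": 0.651,
    "data_table": 0.126,
    "r_polars": 0.0693,
    "py_polars_in_r": 0.0409,
    "pandas": 0.439172,
    "polars": 0.044341,
}


def speedup(slow, fast):
    """Return how many times faster `fast` is than `slow`."""
    return timings[slow] / timings[fast]


print(f"polars vs pandas:       {speedup('pandas', 'polars'):.1f}x")
print(f"r_polars vs tidyverse:  {speedup('tidyverse', 'r_polars'):.1f}x")
print(f"r_polars vs data_table: {speedup('data_table', 'r_polars'):.1f}x")
```

Because the R and Python numbers use different statistics (median vs mean) and different timing tools, cross-language ratios are best read as rough indications.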

Final Thoughts

So, if performance is your priority:

Use Polars in R.
Use polars in Python—or even inside R if you’re already mixing languages.
Avoid the tidyverse for speed-critical tasks; reach for it when readability matters more than speed.

With tools like reticulate and quarto, we can blend strengths across ecosystems—without giving up speed. These tools help us to get the best of both worlds.

This document was built with R version 4.5.1 and Python version 3.11.

Session Info

R

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.1 (2025-06-13 ucrt)
 os       Windows 11 x64 (build 26100)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_India.utf8
 ctype    English_India.utf8
 tz       Asia/Calcutta
 date     2025-06-19
 pandoc   3.6.2 @ C:/Users/NITHIN~1/AppData/Local/Pandoc/ (via rmarkdown)
 quarto   1.6.40 @ C:\\PROGRA~1\\Quarto\\bin\\quarto.exe

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 bench       * 1.1.4   2025-01-16 [1] CRAN (R 4.5.0)
 data.table  * 1.17.4  2025-05-26 [1] CRAN (R 4.5.0)
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.5.0)
 pacman      * 0.5.1   2019-03-11 [1] CRAN (R 4.5.0)
 polars      * 0.22.4  2025-05-31 [1] https://r-multiverse.r-universe.dev (R 4.4.3)
 quarto      * 1.4.4   2024-07-20 [1] CRAN (R 4.5.0)
 readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.5.0)
 reticulate  * 1.42.0  2025-03-25 [1] CRAN (R 4.5.0)
 sessioninfo * 1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
 tictoc      * 1.2.1   2024-03-18 [1] CRAN (R 4.5.0)
 tidypolars  * 0.13.0  2025-05-28 [1] https://r-multiverse.r-universe.dev (R 4.4.3)

 [1] C:/Users/Nithin M/AppData/Local/R/win-library/4.4
 [2] C:/Program Files/R/R-4.5.1/library
 * ── Packages attached to the search path.

─ Python configuration ───────────────────────────────────────────────────────
 python:         C:/Users/Nithin M/OneDrive/Documents/GitHub/Websites and CV/websites/personal/nithinmkp.github.io/.venv/Scripts/python.exe
 libpython:      C:/Users/Nithin M/AppData/Roaming/uv/python/cpython-3.11.11-windows-x86_64-none/python311.dll
 pythonhome:     C:/Users/Nithin M/OneDrive/Documents/GitHub/Websites and CV/websites/personal/nithinmkp.github.io/.venv
 virtualenv:     C:/Users/Nithin M/OneDrive/Documents/GitHub/Websites and CV/websites/personal/nithinmkp.github.io/.venv/Scripts/activate_this.py
 version:        3.11.11 (main, Feb 12 2025, 14:49:02) [MSC v.1942 64 bit (AMD64)]
 Architecture:   64bit
 numpy:          C:/Users/Nithin M/OneDrive/Documents/GitHub/Websites and CV/websites/personal/nithinmkp.github.io/.venv/Lib/site-packages/numpy
 numpy_version:  2.2.6
 
 NOTE: Python version was forced by VIRTUAL_ENV

──────────────────────────────────────────────────────────────────────────────

Python

pandas: 2.3.0

polars: 1.30.0