Unlocking the Web: Data Scraping Techniques

Prerequisites

Python

from bs4 import BeautifulSoup # pip install beautifulsoup4
import requests # pip install requests
import re
import polars as pl # pip install polars
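
Before the session, it may help to confirm that these packages are importable. The snippet below is a minimal sketch using only the standard library; the `REQUIRED` list and the `check_packages` helper are illustrative names, not part of the workshop materials.

```python
from importlib.util import find_spec

# Import names from the list above (the import name, which can differ
# from the name used with pip). `re` ships with Python itself.
REQUIRED = ["bs4", "requests", "re", "polars"]


def check_packages(names):
    """Return {import_name: bool} telling whether each module can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

Any package reported as missing can be installed with the `pip install` command shown in the comment next to its import.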

R

library(datapasta)  # install.packages("datapasta")
library(rvest)      # install.packages("rvest")
library(httr)      # install.packages("httr")
library(dplyr)     # install.packages("dplyr")
library(stringr)   # install.packages("stringr")

IDE

  • VS Code or RStudio

Workfile

The demo HTML file is available in the resources folder of the GitHub repo.

Slides

You can view the slides below or open them in a new tab.