Understanding Pandas Returning Empty Data Frames
As a technical blogger, I have encountered numerous scenarios where a script runs without errors yet produces an empty data frame. In this article, we will dig into why pandas might return an empty data frame, using a real web-scraping question as our example.
Introduction to Pandas
Pandas is a powerful library used for data manipulation and analysis in Python. It provides data structures and functions designed to make working with structured data easy and efficient.
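As a minimal illustration of the symptom this article is about: building a DataFrame from empty columns succeeds silently, which is why a broken scraper "works" but yields no rows.

```python
import pandas as pd

# A DataFrame built from empty lists raises no error --
# pandas happily returns a frame with zero rows.
df = pd.DataFrame({"time": [], "sport": [], "description": []})

print(df.empty)   # True
print(len(df))    # 0
```

The `empty` attribute is the quickest way to detect this condition before going any further.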
Scrape Data Using BeautifulSoup and Selenium
In the provided question, BeautifulSoup and Selenium are used together to scrape data from a JavaScript-heavy website. The goal is to extract the contents of specific columns.
Importing Libraries
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
Setting Up the Browser
# Set up the browser
driver = webdriver.Chrome()
driver.get("https://www.canalplus.com/programme-tv/")
Parsing HTML Content
soup = BeautifulSoup(driver.page_source, 'html.parser')
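BeautifulSoup only sees whatever HTML string it is given. On a JavaScript-heavy site, if driver.page_source is captured before the scripts have rendered the listings, the parser has nothing to match. A small self-contained sketch (the HTML snippets here are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical snapshots of the same page before and after JS rendering.
before_js = "<html><body><div id='app'></div></body></html>"
after_js = "<html><body><div class='guide___1Ogg9'>19:00 Football</div></body></html>"

# Parsing the pre-render snapshot finds nothing...
print(BeautifulSoup(before_js, "html.parser").select(".guide___1Ogg9"))  # []

# ...while the post-render snapshot matches.
items = BeautifulSoup(after_js, "html.parser").select(".guide___1Ogg9")
print([i.get_text() for i in items])  # ['19:00 Football']
```

In practice this means waiting for the relevant elements (for example with Selenium's WebDriverWait) before grabbing page_source.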
Extracting Data
The provided code snippet attempts to extract data by passing Selenium-style XPath expressions to BeautifulSoup, which is where things go wrong:
time = []
sport = []
description = []

# Bug 1: "guide___1Ogg9" without a leading dot is read as a tag name,
# not a CSS class. It matches nothing, so the loop body never runs.
for item in soup.select("guide___1Ogg9"):
    # Bug 2: BeautifulSoup has no XPath support. It treats the keyword
    # argument find_element_by_xpath=... (a Selenium method name) as an
    # HTML attribute filter, so find_next() always returns None and the
    # else-branch appends "Nan" every time.

    # Programme time
    if item.find_next(find_element_by_xpath='//*[@id="landing_layers_1"]/div/div[2]/div[3]/div/ul/div[2]/ul/li/ul/li[2]/a/div[1]'):
        time.append(item.find_next(
            find_element_by_xpath='//*[@id="landing_layers_1"]/div/div[2]/div[3]/div/ul/div[2]/ul/li/ul/li[2]/a/div[1]').text.strip())
    else:
        time.append("Nan")

    # Sport name
    if item.find_next(find_element_by_xpath='//*[@id="landing_layers_1"]/div/div[2]/div[3]/div/ul/div[2]/ul/li/ul/li[1]/a/div/div[2]/div[1]/span'):
        sport.append(item.find_next(
            find_element_by_xpath='//*[@id="landing_layers_1"]/div/div[2]/div[3]/div/ul/div[2]/ul/li/ul/li[1]/a/div/div[2]/div[1]/span').text.strip())
    else:
        sport.append("Nan")

    # Programme info
    if item.find_next(find_element_by_xpath='//*[@id="landing_layers_1"]/div/div[2]/div[4]/div[2]/div/div/ul/div[2]/ul/li/ul/li[4]/a/div[2]'):
        description.append(item.find_next(
            find_element_by_xpath='//*[@id="landing_layers_1"]/div/div[2]/div[4]/div[2]/div/div/ul/div[2]/ul/li/ul/li[4]/a/div[2]').text.strip())
    else:
        description.append("Nan")
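The failure mode above is easy to reproduce in isolation. BeautifulSoup's find/find_next methods treat unknown keyword arguments as HTML attribute filters, so passing find_element_by_xpath=... simply searches for a tag whose find_element_by_xpath attribute equals that literal string — and returns None:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><span class='time'>19:00</span></div>", "html.parser")

# bs4 interprets the keyword as: find a tag whose
# find_element_by_xpath attribute equals this string.
result = soup.find(find_element_by_xpath='//*[@id="landing_layers_1"]')
print(result)  # None -- which is why every if-branch above is falsy

# The bs4-native equivalent is a CSS selector:
print(soup.select_one("span.time").get_text())  # 19:00
```

XPath expressions belong to Selenium (driver.find_element); with BeautifulSoup you must use CSS selectors or tag/attribute searches instead.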
Using the API to Fetch Data
In the provided answer, we skip the rendered page entirely and fetch the programme data directly from the site's JSON API.
Importing Libraries
import datetime

import pendulum  # third-party date/time library used below to parse timestamps
import requests
Making a Request to the API
api_url = "https://secure-webtv-static.canal-plus.com/metadata/cpfra/all/v2.2/globalchannels.json"
response = requests.get(api_url).json()
Parsing JSON Response
tv_programme = {
    channel["name"]: [
        [
            e["title"],
            e["subTitle"],
            pendulum.parse(e["timecodes"][0]["start"]).time().strftime("%H:%M"),
            str(datetime.timedelta(
                milliseconds=e["timecodes"][0]["duration"],
            )).split(".")[0],
        ]
        for e in channel["events"]
    ]
    for channel in response["channels"]
}
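The nested comprehension can be hard to follow. Here is the equivalent explicit loop, applied to a made-up sample event in the same shape as the API's JSON (and using only the standard library in place of pendulum):

```python
import datetime

# A hypothetical payload shaped like the API response.
response = {"channels": [{
    "name": "CANAL+",
    "events": [{
        "title": "Football",
        "subTitle": "Ligue 1",
        "timecodes": [{"start": "2023-09-06T19:00:00+02:00",
                       "duration": 5_400_000}],  # milliseconds
    }],
}]}

tv_programme = {}
for channel in response["channels"]:
    rows = []
    for e in channel["events"]:
        start = datetime.datetime.fromisoformat(e["timecodes"][0]["start"])
        duration = datetime.timedelta(milliseconds=e["timecodes"][0]["duration"])
        rows.append([
            e["title"],
            e["subTitle"],
            start.time().strftime("%H:%M"),
            str(duration).split(".")[0],  # drop fractional seconds, if any
        ])
    tv_programme[channel["name"]] = rows

print(tv_programme["CANAL+"])  # [['Football', 'Ligue 1', '19:00', '1:30:00']]
```

Either form builds a dict mapping each channel name to a list of [title, subtitle, start time, duration] rows, ready to feed into a DataFrame.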
Creating a Data Frame
df = pd.DataFrame(tv_programme["CANAL+"], columns=["Title", "Subtitle", "Time", "Duration"])
print("Here is your data. Right I am off to sleep then!")
print(df)
df.to_csv("canalPlusSport.csv", index=False)
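Tying back to the article's theme, it is worth guarding this final step: checking DataFrame.empty before writing the file turns a silently empty CSV into an explicit signal. A small sketch, with hypothetical rows standing in for the scraped data:

```python
import pandas as pd

rows = [["Football", "Ligue 1", "19:00", "1:30:00"]]  # hypothetical extracted rows
df = pd.DataFrame(rows, columns=["Title", "Subtitle", "Time", "Duration"])

if df.empty:
    # An empty frame here means the selectors or the API parsing found nothing.
    raise SystemExit("No rows extracted -- check your selectors or the API response.")

df.to_csv("canalPlusSport.csv", index=False)
print(f"Wrote {len(df)} rows")  # Wrote 1 rows
```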
Conclusion
In this article, we explored why pandas might return an empty data frame. We saw how combining BeautifulSoup and Selenium can silently produce empty results when selectors are wrong or Selenium-style XPath calls are passed to BeautifulSoup, and how fetching from a site's own API can be a simpler and more reliable way to get the data.
Last modified on 2023-09-06