Seamless Data Flow: Using Google Sheets API with pandas DataFrames // Coders Digest

Working with data often means juggling between different platforms and formats. Google Sheets is a powerful collaborative tool, but its real potential emerges when you can programmatically access and manipulate your spreadsheet data. In this post, I’ll walk through how to create a robust integration between Google Sheets and pandas DataFrames in Python.

Why Connect Google Sheets with pandas?

Before diving into the code, let’s consider why this combination is so powerful:

Collaborative Data Collection: Team members can input data through a familiar interface
Automated Processing: Transform, clean, and analyze spreadsheet data programmatically
Two-way Synchronization: Push analysis results back to shareable spreadsheets
Scheduled Operations: Set up data pipelines that pull from or update Google Sheets

Setting Up Authentication

The first step is authentication. Google uses OAuth 2.0 to allow your application to access its APIs. Here’s how to set it up:

import os
import pandas as pd
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.discovery import build

# Define the scopes
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']

# Path to your OAuth credentials file
SERVICE_ACCOUNT_FILE = 'path/to/google-sheets.json'

Creating a Google Sheets API Wrapper

Let’s create a class that handles authentication and provides methods for reading and updating spreadsheets:

class GoogleSheetsApi:
    def __init__(self) -> None:
        creds = None

        # Check if we have stored credentials
        if os.path.exists(SERVICE_ACCOUNT_FILE):
            creds = Credentials.from_authorized_user_file(
                SERVICE_ACCOUNT_FILE, SCOPES)

        # If no valid credentials available, authenticate
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                # Path to your OAuth client secrets file
                client_secrets_file = 'path/to/client_secrets.json'
                flow = InstalledAppFlow.from_client_secrets_file(
                    client_secrets_file, SCOPES)
                creds = flow.run_local_server(port=0)

            # Save credentials for future use
            with open(SERVICE_ACCOUNT_FILE, "w") as token:
                token.write(creds.to_json())

        # Initialize the Sheets API service
        self.service = build(
            'sheets', 'v4', credentials=creds, cache_discovery=False)

Reading Spreadsheet Data into a DataFrame

Now let’s create a method to read spreadsheet data into a pandas DataFrame:

def read_sheet_as_df(self, sheet_id, range_name='Sheet1!A1:Z100'):
    """
    Reads Google Sheet data into a pandas DataFrame.

    Args:
        sheet_id (str): The spreadsheet ID (found in the sheet's URL)
        range_name (str): The range to read in A1 notation

    Returns:
        pandas.DataFrame: DataFrame containing the sheet data
    """
    # Call the Sheets API
    sheet = self.service.spreadsheets()
    result = sheet.values().get(
        spreadsheetId=sheet_id, range=range_name).execute()
    values = result.get('values', [])

    # Convert to DataFrame if data is present
    if values:
        df = pd.DataFrame(values[1:], columns=values[0])
        return df
    else:
        print('No data found.')
        return pd.DataFrame()

Updating a Spreadsheet with DataFrame Data

Just as important as reading data is being able to write data back to Google Sheets:

def update_sheet_with_df(self, sheet_id, df, range_name='Sheet1!A1:Z'):
    """
    Updates a Google Sheet with DataFrame data.

    Args:
        sheet_id (str): The ID of the Google Sheet to update
        df (pandas.DataFrame): DataFrame containing the data
        range_name (str): The range to update in A1 notation

    Returns:
        bool: True if successful, False otherwise
    """
    try:
        # Convert DataFrame to list of lists (including header)
        values = [df.columns.tolist()] + df.values.tolist()
        body = {'values': values}

        # Update the sheet
        self.service.spreadsheets().values().update(
            spreadsheetId=sheet_id,
            range=range_name,
            valueInputOption='RAW',
            body=body).execute()
        return True
    except Exception as e:
        print(f"Error updating sheet: {str(e)}")
        return False

Handling Incremental Updates

A common scenario is needing to append new data without duplicating existing entries. Here’s a method that handles this case:

def append_new_rows(self, sheet_id, df, range_name='Sheet1!A1:Z'):
    """
    Appends new rows to a Google Sheet, avoiding duplicates based on the first column.

    Args:
        sheet_id (str): The ID of the Google Sheet to update
        df (pandas.DataFrame): DataFrame containing the new data
        range_name (str): The range in A1 notation

    Returns:
        bool: True if successful, False otherwise
    """
    try:
        # First, read existing data
        existing_df = self.read_sheet_as_df(sheet_id, range_name)

        if existing_df.empty:
            # If sheet is empty, write the entire DataFrame including headers
            return self.update_sheet_with_df(sheet_id, df, range_name)

        # Get the name of the first column (key column)
        key_col = existing_df.columns[0]

        # Find new rows by comparing key column
        existing_keys = set(existing_df[key_col].astype(str))
        new_rows = df[~df[key_col].astype(str).isin(existing_keys)]

        if new_rows.empty:
            print("No new data to add.")
            return True

        # Append new rows to the sheet
        start_row = len(existing_df) + 2  # +1 for header, +1 for next row
        append_range = f"{range_name.split('!')[0]}!A{start_row}"

        body = {'values': new_rows.values.tolist()}

        self.service.spreadsheets().values().append(
            spreadsheetId=sheet_id,
            range=append_range,
            valueInputOption='RAW',
            insertDataOption='INSERT_ROWS',
            body=body).execute()

        print(f"Added {len(new_rows)} new rows to the sheet.")
        return True

    except Exception as e:
        print(f"Error appending to sheet: {str(e)}")
        return False

Putting It All Together

Here’s how you might use this API wrapper in practice:

# Initialize the API wrapper
gsheets = GoogleSheetsApi()

# Read data from a Google Sheet
sheet_id = '1LQDvYuV87AJzgik-jBO9ZO5pjpNXfd5LvSoa9Vy6Knk'  # Replace with your sheet ID
df = gsheets.read_sheet_as_df(sheet_id, 'Raw!A1:D100')

# Process the data with pandas
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')
summary_df = df.groupby('Category').agg({'Value': 'sum'}).reset_index()

# Write the summary back to another sheet in the same spreadsheet
gsheets.update_sheet_with_df(sheet_id, summary_df, 'Summary!A1:B10')

Security Best Practices

When working with the Google Sheets API, it’s important to keep security in mind:

Never commit credentials: Keep your OAuth files in .gitignore
Use environment variables: For sensitive values like sheet IDs
Limit API scopes: Only request the permissions you need
Regularly rotate credentials: Update your client secrets periodically

Final Thoughts

The combination of Google Sheets and pandas DataFrames creates a powerful system for collaborative data workflows. The spreadsheet interface is accessible to team members of all technical levels, while the pandas integration enables sophisticated data processing and automation.

This approach works especially well for:

Small to medium-sized datasets
Collaborative data collection processes
Workflows requiring human review and input
Presenting results to non-technical stakeholders

By building a clean API wrapper like the one shown here, you can create maintainable, reusable code that bridges the gap between collaborative spreadsheets and programmatic data analysis.

Have you built interesting integrations between Google Sheets and Python? Share your experiences in the comments below!