Change Tracking

Change tracking compares the current content of a page against the last time you scraped it. Add changeTracking to your formats array to detect whether a page is new, unchanged, or modified, and optionally get a structured diff of what changed.

Works with /scrape, /crawl, and /batch/scrape
Two diff modes: git-diff for line-level changes, json for field-level comparison
Scoped to your team, and optionally scoped to a tag that you pass in

How it works

Every scrape with changeTracking enabled stores a snapshot and compares it against the previous snapshot for that URL.

Scrape	Result
First time	`changeStatus: "new"` (no previous version exists)
Content unchanged	`changeStatus: "same"`
Content modified	`changeStatus: "changed"` (diff data available)
Page removed	`changeStatus: "removed"`

The response includes these fields in the changeTracking object:

Field	Type	Description
`previousScrapeAt`	`string \| null`	Timestamp of the previous scrape (`null` on first scrape)
`changeStatus`	`string`	`"new"`, `"same"`, `"changed"`, or `"removed"`
`visibility`	`string`	`"visible"` (discoverable via links/sitemap) or `"hidden"` (URL works but is no longer linked)
`diff`	`object \| undefined`	Line-level diff (only present in `git-diff` mode when status is `"changed"`)
`json`	`object \| undefined`	Field-level comparison (only present in `json` mode when status is `"changed"`)

Basic usage

Include both markdown and changeTracking in the formats array. The markdown format is required because change tracking compares pages via their markdown content.

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=["markdown", "changeTracking"]
)

print(result.changeTracking)

Response

On the first scrape, changeStatus is "new" and previousScrapeAt is null:

{
  "success": true,
  "data": {
    "markdown": "# Pricing\n\nStarter: $9/mo\nPro: $29/mo...",
    "changeTracking": {
      "previousScrapeAt": null,
      "changeStatus": "new",
      "visibility": "visible"
    }
  }
}

On subsequent scrapes, changeStatus reflects whether content changed:

{
  "success": true,
  "data": {
    "markdown": "# Pricing\n\nStarter: $12/mo\nPro: $39/mo...",
    "changeTracking": {
      "previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
      "changeStatus": "changed",
      "visibility": "visible"
    }
  }
}

Git-diff mode

The git-diff mode returns line-by-line changes in a format similar to git diff. Pass an object in the formats array with modes: ["git-diff"]:

result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        {
            "type": "changeTracking",
            "modes": ["git-diff"]
        }
    ]
)

if result.changeTracking.changeStatus == "changed":
    print(result.changeTracking.diff.text)

Response

The diff object contains both a plain-text diff and a structured JSON representation:

{
  "changeTracking": {
    "previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
    "changeStatus": "changed",
    "visibility": "visible",
    "diff": {
      "text": "@@ -1,3 +1,3 @@\n # Pricing\n-Starter: $9/mo\n-Pro: $29/mo\n+Starter: $12/mo\n+Pro: $39/mo",
      "json": {
        "files": [{
          "chunks": [{
            "content": "@@ -1,3 +1,3 @@",
            "changes": [
              { "type": "normal", "content": "# Pricing" },
              { "type": "del", "ln": 2, "content": "Starter: $9/mo" },
              { "type": "del", "ln": 3, "content": "Pro: $29/mo" },
              { "type": "add", "ln": 2, "content": "Starter: $12/mo" },
              { "type": "add", "ln": 3, "content": "Pro: $39/mo" }
            ]
          }]
        }]
      }
    }
  }
}

The structured diff.json object contains:

files: array of changed files (typically one for web pages)
chunks: sections of changes within a file
changes: individual line changes with type ("add", "del", or "normal"), line number (ln), and content

JSON mode

The json mode extracts specific fields from both the current and previous version of the page using a schema you define. This is useful for tracking changes in structured data like prices, stock levels, or metadata without parsing a full diff. Pass modes: ["json"] with a schema defining the fields to extract:

result = firecrawl.scrape(
    "https://example.com/product/widget",
    formats=[
        "markdown",
        {
            "type": "changeTracking",
            "modes": ["json"],
            "schema": {
                "type": "object",
                "properties": {
                    "price": { "type": "string" },
                    "availability": { "type": "string" }
                }
            }
        }
    ]
)

if result.changeTracking.changeStatus == "changed":
    changes = result.changeTracking.json
    print(f"Price: {changes['price']['previous']} → {changes['price']['current']}")

Response

Each field in the schema is returned with previous and current values:

{
  "changeTracking": {
    "previousScrapeAt": "2025-06-05T08:00:00.000+00:00",
    "changeStatus": "changed",
    "visibility": "visible",
    "json": {
      "price": {
        "previous": "$19.99",
        "current": "$24.99"
      },
      "availability": {
        "previous": "In Stock",
        "current": "In Stock"
      }
    }
  }
}

You can also pass an optional prompt to guide the LLM extraction alongside the schema.

JSON mode uses LLM extraction and costs 5 credits per page. Basic change tracking and git-diff mode have no additional cost.

Using tags

By default, change tracking compares against the most recent scrape of the same URL scraped by your team. Tags let you maintain separate tracking histories for the same URL, which is useful when you monitor the same page at different intervals or in different contexts.

# Hourly monitoring (compared against last "hourly" scrape)
result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        { "type": "changeTracking", "tag": "hourly" }
    ]
)

# Daily summary (compared against last "daily" scrape)
result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        { "type": "changeTracking", "tag": "daily" }
    ]
)

Crawl with change tracking

Add change tracking to crawl operations to monitor an entire site for changes. Pass the changeTracking format inside scrapeOptions:

result = firecrawl.crawl(
    "https://example.com",
    limit=50,
    scrape_options={
        "formats": ["markdown", "changeTracking"]
    }
)

for page in result.data:
    status = page.changeTracking.changeStatus
    url = page.metadata.url
    print(f"{url}: {status}")

Batch scrape with change tracking

Use batch scrape to monitor a specific set of URLs:

result = firecrawl.batch_scrape(
    [
        "https://example.com/pricing",
        "https://example.com/product/widget",
        "https://example.com/blog/latest"
    ],
    formats=["markdown", {"type": "changeTracking", "modes": ["git-diff"]}]
)

Scheduling change tracking

Change tracking is most useful when you scrape on a regular schedule. You can automate this with cron, cloud schedulers, or workflow tools.

Cron job

Create a script that scrapes a URL and alerts on changes:

check-pricing.sh

#!/bin/bash
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://competitor.com/pricing",
    "formats": [
      "markdown",
      {
        "type": "changeTracking",
        "modes": ["json"],
        "schema": {
          "type": "object",
          "properties": {
            "starter_price": { "type": "string" },
            "pro_price": { "type": "string" }
          }
        }
      }
    ]
  }')

STATUS=$(echo "$RESPONSE" | jq -r '.data.changeTracking.changeStatus')

if [ "$STATUS" = "changed" ]; then
  echo "$RESPONSE" | jq '.data.changeTracking.json'
  # Send alert via email, Slack, etc.
fi

Schedule it with crontab -e:

0 */6 * * * /path/to/check-pricing.sh >> /var/log/price-monitor.log 2>&1

Schedule	Expression
Every hour	`0 * * * *`
Every 6 hours	`0 /6 * *`
Daily at 9 AM	`0 9 * * *`
Weekly on Monday at 8 AM	`0 8 * * 1`

Cloud and serverless schedulers

AWS: EventBridge rule triggering a Lambda function
GCP: Cloud Scheduler triggering a Cloud Function
Vercel / Netlify: Cron-triggered serverless functions
GitHub Actions: Scheduled workflows with schedule and cron trigger

Workflow automation

No-code platforms like n8n, Zapier, and Make can call the Firecrawl API on a schedule and route results to Slack, email, or databases. See the workflow automation guides.

Webhooks

For async operations like crawl and batch scrape, use webhooks to receive change tracking results as they arrive instead of polling.

job = firecrawl.start_crawl(
    "https://example.com",
    limit=50,
    scrape_options={
        "formats": [
            "markdown",
            {"type": "changeTracking", "modes": ["git-diff"]}
        ]
    },
    webhook={
        "url": "https://your-server.com/firecrawl-webhook",
        "headers": {"Authorization": "Bearer your-webhook-secret"},
        "events": ["crawl.page", "crawl.completed"]
    }
)

The crawl.page event payload includes the changeTracking object for each page:

{
  "success": true,
  "type": "crawl.page",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [{
    "markdown": "# Pricing\n\nStarter: $12/mo...",
    "metadata": {
      "title": "Pricing",
      "url": "https://example.com/pricing",
      "statusCode": 200
    },
    "changeTracking": {
      "previousScrapeAt": "2025-06-05T12:00:00.000+00:00",
      "changeStatus": "changed",
      "visibility": "visible",
      "diff": {
        "text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo"
      }
    }
  }]
}

For webhook configuration details (headers, metadata, events, retries, signature verification), see the Webhooks documentation.

Configuration reference

The full set of options available when passing a changeTracking format object:

Parameter	Type	Default	Description
`type`	`string`	(required)	Must be `"changeTracking"`
`modes`	`string[]`	`[]`	Diff modes to enable: `"git-diff"`, `"json"`, or both
`schema`	`object`	(none)	JSON Schema for field-level comparison (required for `json` mode)
`prompt`	`string`	(none)	Custom prompt to guide LLM extraction (used with `json` mode)
`tag`	`string`	`null`	Separate tracking history identifier

Data models

interface ChangeTrackingResult {
  previousScrapeAt: string | null;
  changeStatus: "new" | "same" | "changed" | "removed";
  visibility: "visible" | "hidden";
  diff?: {
    text: string;
    json: {
      files: Array<{
        from: string | null;
        to: string | null;
        chunks: Array<{
          content: string;
          changes: Array<{
            type: "add" | "del" | "normal";
            ln?: number;
            ln1?: number;
            ln2?: number;
            content: string;
          }>;
        }>;
      }>;
    };
  };
  json?: Record<string, { previous: any; current: any }>;
}

Important details

The markdown format must always be included alongside changeTracking. Change tracking compares pages via their markdown content.

Scoping: Comparisons are scoped to your team. Your first scrape of any URL returns "new", even if other users have scraped it.
URL matching: Previous scrapes are matched on exact source URL, team ID, markdown format, and tag. Keep URLs consistent between scrapes.
Parameter consistency: Using different includeTags, excludeTags, or onlyMainContent settings across scrapes of the same URL produces unreliable comparisons.
Comparison algorithm: The algorithm is resistant to whitespace and content order changes. Iframe source URLs are ignored to handle captcha/antibot randomization.
Caching: Requests with changeTracking bypass the index cache. The maxAge parameter is ignored.
Error handling: Monitor the warning field in responses and handle the changeTracking object potentially being absent (this can occur if the database lookup for the previous scrape times out).

Billing

Mode	Cost
Basic change tracking	No extra cost (standard scrape credits)
`git-diff` mode	No extra cost
`json` mode	5 credits per page

Get Started

New Features

Standard Features

Webhooks

Developer Guides

Use Cases

Contributing

How it works

Basic usage

Response

Git-diff mode

Response

JSON mode

Response

Using tags

Crawl with change tracking

Batch scrape with change tracking

Scheduling change tracking

Cron job

Cloud and serverless schedulers

Workflow automation

Webhooks

Configuration reference

Data models

Important details

Billing

Get Started

New Features

Standard Features

Webhooks

Developer Guides

Use Cases

Contributing

​How it works

​Basic usage

​Response

​Git-diff mode

​Response

​JSON mode

​Response

​Using tags

​Crawl with change tracking

​Batch scrape with change tracking

​Scheduling change tracking

​Cron job

​Cloud and serverless schedulers

​Workflow automation

​Webhooks

​Configuration reference

​Data models

​Important details

​Billing

How it works

Basic usage

Response

Git-diff mode

Response

JSON mode

Response

Using tags

Crawl with change tracking

Batch scrape with change tracking

Scheduling change tracking

Cron job

Cloud and serverless schedulers

Workflow automation

Webhooks

Configuration reference

Data models

Important details

Billing