Change Tracking

Change tracking compares the current content of a page against the last time you scraped it. Add changeTracking to your formats array to detect whether a page is new, unchanged, or modified, and optionally get a structured diff of what changed.
  • Works with /scrape, /crawl, and /batch/scrape
  • Two diff modes: git-diff for line-level changes, json for field-level comparison
  • Comparisons are scoped to your team, and optionally to a tag you pass in

How it works

Every scrape with changeTracking enabled stores a snapshot and compares it against the previous snapshot for that URL.
  • First time: changeStatus: "new" (no previous version exists)
  • Content unchanged: changeStatus: "same"
  • Content modified: changeStatus: "changed" (diff data available)
  • Page removed: changeStatus: "removed"
The response includes these fields in the changeTracking object:
  • previousScrapeAt (string | null): Timestamp of the previous scrape (null on the first scrape)
  • changeStatus (string): "new", "same", "changed", or "removed"
  • visibility (string): "visible" (discoverable via links/sitemap) or "hidden" (URL still works but is no longer linked)
  • diff (object | undefined): Line-level diff (only present in git-diff mode when status is "changed")
  • json (object | undefined): Field-level comparison (only present in json mode when status is "changed")
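Once you have a changeTracking object, most integrations branch on its changeStatus. A minimal sketch of that dispatch (the payload shape follows the fields above; the action labels are illustrative, not part of the API):

```python
def handle_change(change_tracking: dict) -> str:
    """Map a changeTracking object to an action label (illustrative)."""
    status = change_tracking.get("changeStatus")
    if status == "new":
        return "baseline stored"   # first scrape, nothing to compare against
    if status == "same":
        return "no action"         # content unchanged
    if status == "changed":
        return "process diff"      # diff/json data may be present
    if status == "removed":
        return "flag removal"      # page no longer exists
    return "unknown status"

# Example payload shaped like the fields above
sample = {"previousScrapeAt": None, "changeStatus": "new", "visibility": "visible"}
print(handle_change(sample))  # baseline stored
```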

Basic usage

Include both markdown and changeTracking in the formats array. The markdown format is required because change tracking compares pages via their markdown content.
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=["markdown", "changeTracking"]
)

print(result.changeTracking)

Response

On the first scrape, changeStatus is "new" and previousScrapeAt is null:
{
  "success": true,
  "data": {
    "markdown": "# Pricing\n\nStarter: $9/mo\nPro: $29/mo...",
    "changeTracking": {
      "previousScrapeAt": null,
      "changeStatus": "new",
      "visibility": "visible"
    }
  }
}
On subsequent scrapes, changeStatus reflects whether content changed:
{
  "success": true,
  "data": {
    "markdown": "# Pricing\n\nStarter: $12/mo\nPro: $39/mo...",
    "changeTracking": {
      "previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
      "changeStatus": "changed",
      "visibility": "visible"
    }
  }
}

Git-diff mode

The git-diff mode returns line-by-line changes in a format similar to git diff. Pass an object in the formats array with modes: ["git-diff"]:
result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        {
            "type": "changeTracking",
            "modes": ["git-diff"]
        }
    ]
)

if result.changeTracking.changeStatus == "changed":
    print(result.changeTracking.diff.text)

Response

The diff object contains both a plain-text diff and a structured JSON representation:
{
  "changeTracking": {
    "previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
    "changeStatus": "changed",
    "visibility": "visible",
    "diff": {
      "text": "@@ -1,3 +1,3 @@\n # Pricing\n-Starter: $9/mo\n-Pro: $29/mo\n+Starter: $12/mo\n+Pro: $39/mo",
      "json": {
        "files": [{
          "chunks": [{
            "content": "@@ -1,3 +1,3 @@",
            "changes": [
              { "type": "normal", "content": "# Pricing" },
              { "type": "del", "ln": 2, "content": "Starter: $9/mo" },
              { "type": "del", "ln": 3, "content": "Pro: $29/mo" },
              { "type": "add", "ln": 2, "content": "Starter: $12/mo" },
              { "type": "add", "ln": 3, "content": "Pro: $39/mo" }
            ]
          }]
        }]
      }
    }
  }
}
The structured diff.json object contains:
  • files: array of changed files (typically one for web pages)
  • chunks: sections of changes within a file
  • changes: individual line changes with type ("add", "del", or "normal"), line number (ln), and content
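Walking this nested structure to collect just the added and removed lines is a few loops. A sketch over a payload shaped like the example above (plain dictionaries, no SDK required):

```python
def summarize_diff(diff_json: dict) -> dict:
    """Collect added and removed lines from a structured git-diff payload."""
    added, removed = [], []
    for file in diff_json.get("files", []):
        for chunk in file.get("chunks", []):
            for change in chunk.get("changes", []):
                if change["type"] == "add":
                    added.append(change["content"])
                elif change["type"] == "del":
                    removed.append(change["content"])
    return {"added": added, "removed": removed}

# Payload shaped like the diff.json example above
sample = {
    "files": [{"chunks": [{"content": "@@ -1,3 +1,3 @@", "changes": [
        {"type": "normal", "content": "# Pricing"},
        {"type": "del", "ln": 2, "content": "Starter: $9/mo"},
        {"type": "add", "ln": 2, "content": "Starter: $12/mo"},
    ]}]}]
}
print(summarize_diff(sample))
# {'added': ['Starter: $12/mo'], 'removed': ['Starter: $9/mo']}
```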

JSON mode

The json mode extracts specific fields from both the current and previous version of the page using a schema you define. This is useful for tracking changes in structured data like prices, stock levels, or metadata without parsing a full diff. Pass modes: ["json"] with a schema defining the fields to extract:
result = firecrawl.scrape(
    "https://example.com/product/widget",
    formats=[
        "markdown",
        {
            "type": "changeTracking",
            "modes": ["json"],
            "schema": {
                "type": "object",
                "properties": {
                    "price": { "type": "string" },
                    "availability": { "type": "string" }
                }
            }
        }
    ]
)

if result.changeTracking.changeStatus == "changed":
    changes = result.changeTracking.json
    print(f"Price: {changes['price']['previous']} → {changes['price']['current']}")

Response

Each field in the schema is returned with previous and current values:
{
  "changeTracking": {
    "previousScrapeAt": "2025-06-05T08:00:00.000+00:00",
    "changeStatus": "changed",
    "visibility": "visible",
    "json": {
      "price": {
        "previous": "$19.99",
        "current": "$24.99"
      },
      "availability": {
        "previous": "In Stock",
        "current": "In Stock"
      }
    }
  }
}
You can also pass an optional prompt to guide the LLM extraction alongside the schema.
JSON mode uses LLM extraction and costs 5 credits per page. Basic change tracking and git-diff mode have no additional cost.
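Because every schema field carries a previous and a current value, picking out only the fields that actually changed is a simple comparison. A sketch over the response shape above (plain dictionaries, no SDK required):

```python
def changed_fields(json_changes: dict) -> dict:
    """Return only the fields whose previous and current values differ."""
    return {
        field: values
        for field, values in json_changes.items()
        if values["previous"] != values["current"]
    }

# Payload shaped like the json-mode response above
sample = {
    "price": {"previous": "$19.99", "current": "$24.99"},
    "availability": {"previous": "In Stock", "current": "In Stock"},
}
print(changed_fields(sample))
# {'price': {'previous': '$19.99', 'current': '$24.99'}}
```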

Using tags

By default, change tracking compares against the most recent scrape of the same URL scraped by your team. Tags let you maintain separate tracking histories for the same URL, which is useful when you monitor the same page at different intervals or in different contexts.
# Hourly monitoring (compared against last "hourly" scrape)
result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        { "type": "changeTracking", "tag": "hourly" }
    ]
)

# Daily summary (compared against last "daily" scrape)
result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=[
        "markdown",
        { "type": "changeTracking", "tag": "daily" }
    ]
)

Crawl with change tracking

Add change tracking to crawl operations to monitor an entire site for changes. Pass the changeTracking format inside scrapeOptions:
result = firecrawl.crawl(
    "https://example.com",
    limit=50,
    scrape_options={
        "formats": ["markdown", "changeTracking"]
    }
)

for page in result.data:
    status = page.changeTracking.changeStatus
    url = page.metadata.url
    print(f"{url}: {status}")

Batch scrape with change tracking

Use batch scrape to monitor a specific set of URLs:
result = firecrawl.batch_scrape(
    [
        "https://example.com/pricing",
        "https://example.com/product/widget",
        "https://example.com/blog/latest"
    ],
    formats=["markdown", {"type": "changeTracking", "modes": ["git-diff"]}]
)
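When monitoring many URLs, it can help to group the returned pages by status before acting on them. A sketch over plain page dictionaries shaped like the responses above (not an SDK call):

```python
from collections import defaultdict

def group_by_status(pages: list[dict]) -> dict:
    """Group page URLs by their changeTracking.changeStatus."""
    groups = defaultdict(list)
    for page in pages:
        status = page.get("changeTracking", {}).get("changeStatus", "unknown")
        groups[status].append(page["metadata"]["url"])
    return dict(groups)

pages = [
    {"metadata": {"url": "https://example.com/pricing"},
     "changeTracking": {"changeStatus": "changed"}},
    {"metadata": {"url": "https://example.com/blog/latest"},
     "changeTracking": {"changeStatus": "same"}},
]
print(group_by_status(pages))
# {'changed': ['https://example.com/pricing'], 'same': ['https://example.com/blog/latest']}
```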

Scheduling change tracking

Change tracking is most useful when you scrape on a regular schedule. You can automate this with cron, cloud schedulers, or workflow tools.

Cron job

Create a script that scrapes a URL and alerts on changes:
check-pricing.sh
#!/bin/bash
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://competitor.com/pricing",
    "formats": [
      "markdown",
      {
        "type": "changeTracking",
        "modes": ["json"],
        "schema": {
          "type": "object",
          "properties": {
            "starter_price": { "type": "string" },
            "pro_price": { "type": "string" }
          }
        }
      }
    ]
  }')

STATUS=$(echo "$RESPONSE" | jq -r '.data.changeTracking.changeStatus')

if [ "$STATUS" = "changed" ]; then
  echo "$RESPONSE" | jq '.data.changeTracking.json'
  # Send alert via email, Slack, etc.
fi
Schedule it with crontab -e:
0 */6 * * * /path/to/check-pricing.sh >> /var/log/price-monitor.log 2>&1
Common schedules:
  • Every hour: 0 * * * *
  • Every 6 hours: 0 */6 * * *
  • Daily at 9 AM: 0 9 * * *
  • Weekly on Monday at 8 AM: 0 8 * * 1

Cloud and serverless schedulers

  • AWS: EventBridge rule triggering a Lambda function
  • GCP: Cloud Scheduler triggering a Cloud Function
  • Vercel / Netlify: Cron-triggered serverless functions
  • GitHub Actions: Scheduled workflows using the schedule trigger with a cron expression

Workflow automation

No-code platforms like n8n, Zapier, and Make can call the Firecrawl API on a schedule and route results to Slack, email, or databases. See the workflow automation guides.

Webhooks

For async operations like crawl and batch scrape, use webhooks to receive change tracking results as they arrive instead of polling.
job = firecrawl.start_crawl(
    "https://example.com",
    limit=50,
    scrape_options={
        "formats": [
            "markdown",
            {"type": "changeTracking", "modes": ["git-diff"]}
        ]
    },
    webhook={
        "url": "https://your-server.com/firecrawl-webhook",
        "headers": {"Authorization": "Bearer your-webhook-secret"},
        "events": ["crawl.page", "crawl.completed"]
    }
)
The crawl.page event payload includes the changeTracking object for each page:
{
  "success": true,
  "type": "crawl.page",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "data": [{
    "markdown": "# Pricing\n\nStarter: $12/mo...",
    "metadata": {
      "title": "Pricing",
      "url": "https://example.com/pricing",
      "statusCode": 200
    },
    "changeTracking": {
      "previousScrapeAt": "2025-06-05T12:00:00.000+00:00",
      "changeStatus": "changed",
      "visibility": "visible",
      "diff": {
        "text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo"
      }
    }
  }]
}
For webhook configuration details (headers, metadata, events, retries, signature verification), see the Webhooks documentation.
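On the receiving side, a handler typically filters crawl.page payloads for pages whose status is "changed". A sketch of that filtering step over the payload shape above (wiring it into your web framework and verifying signatures is up to you):

```python
def changed_pages(event: dict) -> list:
    """Extract (url, diff text) for changed pages from a crawl.page webhook payload."""
    if event.get("type") != "crawl.page":
        return []
    results = []
    for page in event.get("data", []):
        tracking = page.get("changeTracking", {})
        if tracking.get("changeStatus") == "changed":
            results.append({
                "url": page["metadata"]["url"],
                "diff": tracking.get("diff", {}).get("text"),
            })
    return results

# Payload shaped like the crawl.page example above
event = {
    "type": "crawl.page",
    "data": [{
        "metadata": {"url": "https://example.com/pricing"},
        "changeTracking": {"changeStatus": "changed",
                           "diff": {"text": "-Starter: $9/mo\n+Starter: $12/mo"}},
    }],
}
print(changed_pages(event))
```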

Configuration reference

The full set of options available when passing a changeTracking format object:
  • type (string, required): Must be "changeTracking"
  • modes (string[], default []): Diff modes to enable: "git-diff", "json", or both
  • schema (object): JSON Schema for field-level comparison (required for json mode)
  • prompt (string): Custom prompt to guide LLM extraction (used with json mode)
  • tag (string, default null): Identifier for a separate tracking history

Data models

interface ChangeTrackingResult {
  previousScrapeAt: string | null;
  changeStatus: "new" | "same" | "changed" | "removed";
  visibility: "visible" | "hidden";
  diff?: {
    text: string;
    json: {
      files: Array<{
        from: string | null;
        to: string | null;
        chunks: Array<{
          content: string;
          changes: Array<{
            type: "add" | "del" | "normal";
            ln?: number;
            ln1?: number;
            ln2?: number;
            content: string;
          }>;
        }>;
      }>;
    };
  };
  json?: Record<string, { previous: any; current: any }>;
}

Important details

The markdown format must always be included alongside changeTracking. Change tracking compares pages via their markdown content.
  • Scoping: Comparisons are scoped to your team. Your first scrape of any URL returns "new", even if other users have scraped it.
  • URL matching: Previous scrapes are matched on exact source URL, team ID, markdown format, and tag. Keep URLs consistent between scrapes.
  • Parameter consistency: Using different includeTags, excludeTags, or onlyMainContent settings across scrapes of the same URL produces unreliable comparisons.
  • Comparison algorithm: The algorithm is resistant to whitespace and content order changes. Iframe source URLs are ignored to handle captcha/antibot randomization.
  • Caching: Requests with changeTracking bypass the index cache. The maxAge parameter is ignored.
  • Error handling: Monitor the warning field in responses and handle the changeTracking object potentially being absent (this can occur if the database lookup for the previous scrape times out).
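Given that the changeTracking object can be absent, it is safer to read it defensively and surface any warnings. A minimal sketch over the response dictionary shape (field names follow the docs; the print-based logging is illustrative):

```python
def read_change_tracking(data: dict):
    """Return the changeTracking object, surfacing warnings and absence."""
    if "warning" in data:
        print(f"warning from API: {data['warning']}")  # or route to your logger
    tracking = data.get("changeTracking")
    if tracking is None:
        print("changeTracking absent; previous-scrape lookup may have timed out")
    return tracking

# Absent case: returns None instead of raising
print(read_change_tracking({"markdown": "# Pricing"}))
```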

Billing

  • Basic change tracking: no extra cost (standard scrape credits)
  • git-diff mode: no extra cost
  • json mode: 5 credits per page