Readability Extractor

Fetch any URL and get back clean, readable article content — title, plain text, clean HTML, excerpt, byline, and word count. Powered by Mozilla Readability, the same engine as Firefox Reader View.

⚡ 2 credits per call

Why use this

AI agents can't fetch and clean web pages on their own. Here's why that matters:

How it works: The service fetches the URL server-side, parses the raw HTML, and runs it through Mozilla Readability — the open-source library that powers Firefox Reader View. Readability strips navigation menus, ads, footers, sidebars, and scripts, leaving only the article body. The result is clean, structured text ready to pass directly to an LLM or process programmatically.

Why AIs can't do this themselves: A language model has no HTTP access at inference time, so without a tool it can't retrieve a URL. Even when an agent can fetch the raw HTML, passing it to an LLM wastes context window on boilerplate (a typical news page is 80–90% non-content). And because the service runs server-side, it handles redirects, cookies, and compressed responses that would trip up a naive client.

When to use it:

  • Summarise a news article or blog post without passing raw HTML to your LLM
  • Pull developer documentation (MDN, GitHub READMEs, API docs) into agent context
  • Research pipelines that need clean article text before analysis
  • Citation and fact-checking tasks where the agent needs the article body
  • Any workflow where the agent is given a URL and needs to act on its content
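As an illustration of the summarisation use case, the sketch below turns an extraction response into an LLM prompt. The function name `build_summary_prompt`, the `max_words` cap, and the prompt wording are assumptions made for this example; only the response field names come from this page.

```python
from typing import Optional


def build_summary_prompt(result: dict, max_words: int = 1500) -> Optional[str]:
    """Build a summarisation prompt from an extraction response.

    Returns None when no article text was extracted (text is null).
    """
    text = result.get("text")
    if not text:
        return None

    # Cap the body so a very long article does not flood the context window.
    words = text.split()
    body = " ".join(words[:max_words])
    if len(words) > max_words:
        body += " [truncated]"

    # title and byline are nullable, so fall back to placeholders.
    title = result.get("title") or "(untitled)"
    byline = result.get("byline") or "(no byline)"

    return (
        "Summarise the following article.\n"
        f"Title: {title}\n"
        f"Byline: {byline}\n"
        f"Source: {result.get('url', '')}\n\n"
        f"{body}"
    )
```

Passing the plain `text` field rather than `html` keeps markup out of the prompt; the word cap is a crude stand-in for a real token budget.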

Pricing

Any URL: 2 credits

2 credits covers the server-side HTTP fetch, HTML parse, and Readability extraction. The call is charged whenever the fetch succeeds, even if the page has no extractable article content (null fields returned). Failed requests that return a 400 error are not charged.
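For budgeting a batch of calls, the charging rule can be written as a small helper. This is a sketch that assumes a successful call comes back with HTTP status 200 from the API (this page does not state the success status) and that any other status is not charged; only the 400 case is documented as free.

```python
CREDITS_PER_CALL = 2  # price stated on this page


def credits_charged(status_code: int) -> int:
    """Credits for one call: 2 on success (assumed HTTP 200), 0 otherwise.

    A 200 with all-null article fields is still charged.
    """
    return CREDITS_PER_CALL if status_code == 200 else 0


def total_credits(status_codes: list) -> int:
    """Sum the credits charged across a batch of API responses."""
    return sum(credits_charged(code) for code in status_codes)
```

For example, a batch with two successful extractions and one 400 error would cost 4 credits under these assumptions.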

Request format

{ "url": "https://example.com/article" }

Field   Type     Notes
url     string   Required. Must be a valid URL including protocol (https://).
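Since a URL without a protocol is rejected, it can be worth validating on the client before spending a round trip. The sketch below uses Python's standard `urllib.parse`; the helper name `build_payload` is hypothetical, and accepting `http` as well as `https` is an assumption, since this page only shows `https://`.

```python
from urllib.parse import urlparse


def build_payload(url: str) -> dict:
    """Return the JSON request body, or raise ValueError for an unusable URL."""
    candidate = url.strip()
    parsed = urlparse(candidate)

    # Require an explicit protocol and a host, e.g. https://example.com/...
    # Assumption: plain http is accepted alongside https.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"URL must include a protocol and host: {candidate!r}")

    return {"url": candidate}
```

This check is deliberately loose; the service performs its own validation and remains the authority on what counts as a valid URL.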

Response

{
  "url": "https://example.com/article",
  "title": "Article Title",
  "byline": "Author Name",
  "excerpt": "First sentence or lede...",
  "html": "<div><p>Clean HTML body...</p></div>",
  "text": "Clean plain text body...",
  "wordCount": 842
}

Field       Notes
title       Page or article title. null if not detected.
byline      Author name or byline. null if not present.
excerpt     Short excerpt or lede. null if not detected.
html        Clean article HTML with navigation, ads, and boilerplate stripped. null if Readability found no article.
text        Plain text body. null if Readability found no article.
wordCount   Word count of the plain text body. null if text is null.

All fields except url are nullable. A null result means Readability couldn't identify an article — this is not an error and credits are still charged.
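Because every field except url can be null, client code benefits from making that explicit in its types. The following is a minimal sketch; the `Extraction` class, `parse_extraction` function, and `has_article` property are names invented for this example, while the JSON keys are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    url: str
    title: Optional[str] = None
    byline: Optional[str] = None
    excerpt: Optional[str] = None
    html: Optional[str] = None
    text: Optional[str] = None
    word_count: Optional[int] = None

    @property
    def has_article(self) -> bool:
        # text is null when Readability found no article.
        return self.text is not None


def parse_extraction(payload: dict) -> Extraction:
    """Map the JSON response onto a typed object, keeping nulls as None."""
    return Extraction(
        url=payload["url"],
        title=payload.get("title"),
        byline=payload.get("byline"),
        excerpt=payload.get("excerpt"),
        html=payload.get("html"),
        text=payload.get("text"),
        word_count=payload.get("wordCount"),
    )
```

Checking `has_article` before using `text` or `html` treats the no-article case as a normal outcome rather than an exception, which matches how the service reports it.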

Errors (400, no charge)


Error              Cause
Invalid URL        Missing protocol, malformed URL
Fetch failed       DNS error, connection refused, network timeout (10s)
Non-200 response   Target URL returned 4xx or 5xx
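Since errors are not charged, retrying is cheap, but only some of these errors are likely to be transient. The sketch below shows one possible retry policy keyed on the error labels from the table. The policy itself, the function names, and the backoff values are assumptions for illustration; this page does not document how the label appears in the 400 response body, so the caller is expected to extract it.

```python
# Error labels as listed in the table above.
INVALID_URL = "Invalid URL"
FETCH_FAILED = "Fetch failed"
NON_200 = "Non-200 response"


def should_retry(error_label: str, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether to try again after a failed call (assumed policy).

    attempt is the 1-based number of the attempt that just failed.
    Only "Fetch failed" is retried, since timeouts and refused connections
    may be temporary; a malformed URL or a 4xx/5xx from the target is
    treated as final.
    """
    if attempt >= max_attempts:
        return False
    return error_label == FETCH_FAILED


def backoff_seconds(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... for attempts 1, 2, 3, ..."""
    return base * (2 ** (attempt - 1))
```

A stricter client might also retry "Non-200 response" when the target returned a 5xx, but the label alone does not distinguish 4xx from 5xx.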

API Reference

Endpoint

POST https://api.lightningapi.tools/readability/extract

Required headers

Authorization: Bearer <apiKey>
Content-Type: application/json

Example request

{
  "url": "https://en.wikipedia.org/wiki/Node.js"
}

Example response

{
  "url": "https://en.wikipedia.org/wiki/Node.js",
  "title": "Node.js",
  "byline": null,
  "excerpt": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "html": "<div id=\"readability-page-1\"><p>Node.js is...</p></div>",
  "text": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "wordCount": 4821
}
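
The request above can be sent from Python using only the standard library. This is a minimal sketch: the endpoint and headers are taken from this page, while the function names and the 30-second client timeout are assumptions (chosen to exceed the service's 10s fetch timeout).

```python
import json
import urllib.request

ENDPOINT = "https://api.lightningapi.tools/readability/extract"


def make_request(api_key: str, url: str) -> urllib.request.Request:
    """Build the POST request with the required headers and JSON body."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract(api_key: str, url: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response.

    urllib raises urllib.error.HTTPError for 4xx/5xx statuses, so the
    400 errors described above surface as exceptions here.
    """
    request = make_request(api_key, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `extract(api_key, "https://en.wikipedia.org/wiki/Node.js")` would return a dictionary shaped like the example response, with `None` in place of any null field.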