Readability Extractor

Fetch any URL and get back clean, readable article content — title, plain text, clean HTML, excerpt, byline, and word count. Powered by Mozilla Readability, the same engine as Firefox Reader View.

⚡ 2 credits per call

Why use this

AI agents can't fetch and clean web pages on their own. This section explains what the service does and why an agent needs it.

How it works: The service fetches the URL server-side, parses the raw HTML, and runs it through Mozilla Readability — the open-source library that powers Firefox Reader View. Readability strips navigation menus, ads, footers, sidebars, and scripts, leaving only the article body. The result is clean, structured text ready to pass directly to an LLM or process programmatically.

Why AIs can't do this themselves: Language models have no HTTP access at inference time, so they can't retrieve a URL on their own. Even if an agent could fetch the raw HTML, passing it to an LLM wastes context window on boilerplate (a typical news page can be 80–90% non-content). And because the fetch happens server-side, the service handles redirects, cookies, and compressed responses that would trip up a naive client.

When to use it:

Summarise a news article or blog post without passing raw HTML to your LLM
Pull developer documentation (MDN, GitHub READMEs, API docs) into agent context
Feed research pipelines that need clean article text before analysis
Support citation and fact-checking tasks where the agent needs the article body
Handle any workflow where the agent is given a URL and needs to act on its content

Pricing

Request	Cost
Any URL	2 credits

The 2 credits cover the server-side HTTP fetch, HTML parse, and Readability extraction. A call is charged whenever the target URL is reachable and returns a 200 response, even if the page has no extractable article content (the article fields come back as `null`).

Request format

{ "url": "https://example.com/article" }

Field	Type	Notes
`url`	string	Required. Must be a valid URL including protocol (`https://`).

Response

{
  "url": "https://example.com/article",
  "title": "Article Title",
  "byline": "Author Name",
  "excerpt": "First sentence or lede...",
  "html": "<div><p>Clean HTML body...</p></div>",
  "text": "Clean plain text body...",
  "wordCount": 842
}

Field	Notes
`title`	Page or article title. `null` if not detected.
`byline`	Author name or byline. `null` if not present.
`excerpt`	Short excerpt or lede. `null` if not detected.
`html`	Clean article HTML with navigation, ads, and boilerplate stripped. `null` if Readability found no article.
`text`	Plain text body. `null` if Readability found no article.
`wordCount`	Word count of the plain text body. `null` if `text` is null.

All fields except `url` are nullable. A null result means Readability couldn't identify an article; this is not an error, and credits are still charged.
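
Because a successful call can still carry `null` article fields, callers will probably want to check `text` before handing the result to an LLM. The sketch below shows one way to do that; `to_context` and its output layout are illustrative choices, not part of the API.

```python
from typing import Optional


def to_context(result: dict) -> Optional[str]:
    """Format an extraction result for an LLM prompt, or return None if no article was found."""
    text = result.get("text")
    if text is None:
        # Readability found no article: a valid (and charged) response, not an error.
        return None
    title = result.get("title") or "(untitled)"
    lines = [f"Title: {title}"]
    if result.get("byline"):
        # byline is null on many pages, so only include it when present.
        lines.append(f"Author: {result['byline']}")
    lines.append(f"Source: {result['url']}")
    lines.append("")
    lines.append(text)
    return "\n".join(lines)
```

Returning `None` rather than an empty string lets the caller distinguish "no article detected" from an article whose body happens to be empty.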

Errors (400, no charge)

Error	Cause
Invalid URL	Missing protocol, malformed URL
Fetch failed	DNS error, connection refused, network timeout (10s)
Non-200 response	Target URL returned 4xx or 5xx
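
For agents that track their own credit budget, the charging rules can be captured in a few lines. This is a sketch under two assumptions: that a successful extraction is returned with HTTP status 200, and that any other status (for example an authentication failure) is not covered by this page. `credits_charged` is a hypothetical helper name.

```python
from typing import Optional

CREDITS_PER_CALL = 2  # from the Pricing section


def credits_charged(status: int) -> Optional[int]:
    """Return the credits billed for a response status, or None if this page does not say."""
    if status == 200:
        # Assumption: success is reported as 200. Charged even when article fields are null.
        return CREDITS_PER_CALL
    if status == 400:
        # Invalid URL, fetch failed, or non-200 response from the target: no charge.
        return 0
    # Other statuses are not documented here, so the helper declines to guess.
    return None
```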

API Reference

Endpoint

POST https://api.lightningapi.tools/readability/extract

Required headers

Authorization: Bearer <apiKey>
Content-Type: application/json

Example request

{
  "url": "https://en.wikipedia.org/wiki/Node.js"
}

Example response

{
  "url": "https://en.wikipedia.org/wiki/Node.js",
  "title": "Node.js",
  "byline": null,
  "excerpt": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "html": "<div id=\"readability-page-1\"><p>Node.js is...</p></div>",
  "text": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "wordCount": 4821
}
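
The endpoint and headers above can be put together with Python's standard library alone. The sketch below is one possible client, not an official SDK: `build_request` and `extract` are hypothetical names, and the 30-second client timeout is an assumption chosen to sit comfortably above the service's 10-second fetch timeout.

```python
import json
import urllib.request

ENDPOINT = "https://api.lightningapi.tools/readability/extract"


def build_request(api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the API reference."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract(api_key: str, url: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response.

    urllib raises urllib.error.HTTPError for 4xx/5xx statuses, including
    the uncharged 400 errors listed in the Errors table.
    """
    with urllib.request.urlopen(build_request(api_key, url), timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `extract(api_key, "https://en.wikipedia.org/wiki/Node.js")` would return a dictionary shaped like the example response above. Splitting request construction from sending keeps the header logic testable without network access.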