Readability Extractor
Fetch any URL and get back clean, readable article content — title, plain text, clean HTML, excerpt, byline, and word count. Powered by Mozilla Readability, the same engine as Firefox Reader View.
Why use this
AI agents can't fetch and clean web pages on their own. Here's how the service fills that gap, and why it matters:
How it works: The service fetches the URL server-side, parses the raw HTML, and runs it through Mozilla Readability — the open-source library that powers Firefox Reader View. Readability strips navigation menus, ads, footers, sidebars, and scripts, leaving only the article body. The result is clean, structured text ready to pass directly to an LLM or process programmatically.
Why AIs can't do this themselves: A language model has no HTTP access at inference time, so it can't retrieve a URL without a tool. Even when an agent can fetch the raw HTML, passing it to an LLM wastes context window on boilerplate (a typical news page is 80–90% non-content). And because the service runs server-side, it handles redirects, cookies, and compressed responses that would trip up a naive client.
When to use it:
- Summarise a news article or blog post without passing raw HTML to your LLM
- Pull developer documentation (MDN, GitHub READMEs, API docs) into agent context
- Research pipelines that need clean article text before analysis
- Citation and fact-checking tasks where the agent needs the article body
- Any workflow where the agent is given a URL and needs to act on its content
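As a rough illustration of the first use case, the sketch below turns an extraction result into a summarisation prompt. The function name `build_summary_prompt`, the `max_words` cap, and the prompt wording are hypothetical; only the field names (`url`, `title`, `byline`, `text`) come from the response format documented on this page.

```python
from typing import Any, Mapping, Optional


def build_summary_prompt(result: Mapping[str, Any], max_words: int = 2000) -> Optional[str]:
    """Build an LLM prompt from an extraction result.

    Returns None when the result has no article text (the `text` field is null).
    """
    text = result.get("text")
    if not text:
        # Readability found no article body, so there is nothing to summarise.
        return None

    # Cap the article length so it cannot consume the whole context window.
    body = " ".join(text.split()[:max_words])

    # title and byline are nullable, so fall back to placeholders.
    title = result.get("title") or "(untitled)"
    byline = result.get("byline") or "unknown author"

    return (
        "Summarise the following article.\n"
        f"Title: {title}\n"
        f"Author: {byline}\n"
        f"Source: {result.get('url')}\n\n"
        f"{body}"
    )
```

The prompt carries only the cleaned text, never the raw HTML, which is the point of running the extraction first.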
Pricing
| Request | Cost |
|---|---|
| Any URL | 2 credits |
The 2 credits cover the server-side HTTP fetch, HTML parse, and Readability extraction. The call is charged whenever the target URL is fetched successfully, even if the page has no extractable article content (null fields returned). Requests that fail with one of the errors listed under Errors are not charged.
Request format
{ "url": "https://example.com/article" }
| Field | Type | Notes |
|---|---|---|
| url | string | Required. Must be a valid URL including protocol (https://). |
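Checking the URL on the client before sending can catch an obviously malformed value early. The sketch below is one way to do that with Python's standard library. `build_request_body` is a hypothetical helper, and accepting `http://` alongside `https://` is an assumption, since this page only shows `https://` examples.

```python
import json
from urllib.parse import urlsplit


def build_request_body(url: str) -> bytes:
    """Validate `url` and return the JSON request body as UTF-8 bytes.

    Raises ValueError if the URL lacks a protocol or a host.
    """
    cleaned = url.strip()
    parts = urlsplit(cleaned)

    # Assumption: the service accepts http as well as https.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"URL must include a protocol and host: {url!r}")

    return json.dumps({"url": cleaned}).encode("utf-8")
```

The service still performs its own validation, so this check is a convenience rather than a substitute for handling the Invalid URL error.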
Response
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "byline": "Author Name",
  "excerpt": "First sentence or lede...",
  "html": "<div><p>Clean HTML body...</p></div>",
  "text": "Clean plain text body...",
  "wordCount": 842
}
| Field | Notes |
|---|---|
| title | Page or article title. null if not detected. |
| byline | Author name or byline. null if not present. |
| excerpt | Short excerpt or lede. null if not detected. |
| html | Clean article HTML with navigation, ads, and boilerplate stripped. null if Readability found no article. |
| text | Plain text body. null if Readability found no article. |
| wordCount | Word count of the plain text body. null if text is null. |
All fields except url are nullable. A null result means Readability couldn't identify an article — this is not an error and credits are still charged.
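Because every field except `url` can be null, client code has to treat a missing article as a normal outcome. The following sketch models the response as a dataclass with optional fields. The `Extraction` class, its `has_article` property, and `parse_extraction` are illustrative names, not part of the service.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    url: str
    title: Optional[str]
    byline: Optional[str]
    excerpt: Optional[str]
    html: Optional[str]
    text: Optional[str]
    word_count: Optional[int]

    @property
    def has_article(self) -> bool:
        # A null text field means Readability did not find an article.
        return self.text is not None


def parse_extraction(raw: str) -> Extraction:
    """Parse a JSON response body into an Extraction."""
    data = json.loads(raw)
    return Extraction(
        url=data["url"],  # the only field documented as non-nullable
        title=data.get("title"),
        byline=data.get("byline"),
        excerpt=data.get("excerpt"),
        html=data.get("html"),
        text=data.get("text"),
        word_count=data.get("wordCount"),
    )
```

A caller can then branch on `has_article` instead of treating a null result as a failure.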
Errors (400, no charge)
| Error | Cause |
|---|---|
| Invalid URL | Missing protocol, malformed URL |
| Fetch failed | DNS error, connection refused, network timeout (10s) |
| Non-200 response | Target URL returned 4xx or 5xx |
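The charging rules can be expressed as a small budgeting helper. This is a sketch under one assumption: a successful extraction returns HTTP 200, which this page implies but does not state. The names `credits_charged` and `total_credits` are hypothetical.

```python
from typing import Iterable

CREDITS_PER_CALL = 2  # documented price for any URL


def credits_charged(status: int) -> int:
    """Return the credits charged for one call, given its HTTP status."""
    if status == 200:
        # Assumed success status; charged even when the article fields are null.
        return CREDITS_PER_CALL
    if status == 400:
        # Invalid URL, fetch failed, or non-200 from the target: no charge.
        return 0
    # Other statuses are not covered by this page, so refuse to guess.
    raise ValueError(f"Charging for HTTP {status} is not documented")


def total_credits(statuses: Iterable[int]) -> int:
    """Sum the credits charged over a batch of calls."""
    return sum(credits_charged(s) for s in statuses)
```

For a batch of three calls where one fails with a 400, `total_credits([200, 400, 200])` comes to 4 credits.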
API Reference
Endpoint
POST https://api.lightningapi.tools/readability/extract
Required headers
Authorization: Bearer <apiKey>
Content-Type: application/json
Example request
{
  "url": "https://en.wikipedia.org/wiki/Node.js"
}
Example response
{
  "url": "https://en.wikipedia.org/wiki/Node.js",
  "title": "Node.js",
  "byline": null,
  "excerpt": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "html": "<div id=\"readability-page-1\"><p>Node.js is...</p></div>",
  "text": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "wordCount": 4821
}
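Putting the endpoint, headers, and body together, a minimal client could look like the sketch below, which uses only Python's standard library. The helper names `build_request` and `extract` and the 30-second client timeout are assumptions. The format of the error body is not documented here, so the sketch surfaces it as raw text.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "https://api.lightningapi.tools/readability/extract"


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request with the two required headers."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract(url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Call the endpoint and return the decoded JSON response."""
    request = build_request(url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Error body format is an unknown here, so pass it through as text.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"Extraction failed with HTTP {err.code}: {detail}") from err
```

Calling `extract("https://en.wikipedia.org/wiki/Node.js", api_key)` with a valid key should return a dictionary shaped like the example response above.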