Text Extractor

Fetch any URL and get back clean, readable text: title, plain text, clean HTML, excerpt, byline, and word count. Powered by Mozilla Readability, the same library that drives Firefox Reader View.

⚡ 2 credits per call

Why use this

Language models can't fetch URLs by themselves at inference time, and raw HTML is mostly noise: a typical news page is 80-90% navigation, ads, and scripts.

This service fetches the URL server-side, strips the boilerplate with Mozilla Readability, and returns only the article content. It handles redirects, cookies, and compressed responses with no extra work on your end.

Good for:

Summarising articles without feeding raw HTML to your model
Pulling developer docs (MDN, GitHub READMEs, API references) into context
Research pipelines that need article text before analysis
Fact-checking tasks where the model needs the actual page content

Pricing

Any URL	2 credits

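Charged whenever the target URL is fetched successfully, including pages where Readability finds no extractable content (`html` and `text` are `null`). Requests that fail with one of the errors below are not charged.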
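As a rough budgeting aid, the sketch below estimates the credits spent on a batch of calls from the HTTP status each one returned. It assumes every 200 response costs 2 credits (even when `text` is `null`) and every 400 error is free, per the rules on this page. `credits_charged` is a hypothetical helper name, and billing for other status codes (for example authentication failures) isn't documented here, so the sketch treats them as free.

```python
from typing import Iterable

CREDITS_PER_CALL = 2  # "2 credits per call" from the pricing table

def credits_charged(statuses: Iterable[int]) -> int:
    """Estimate credits spent on a batch of calls (hypothetical helper).

    Assumption: only HTTP 200 responses are charged; 400 errors are free,
    and any other status is treated as free because its billing is undocumented.
    """
    return sum(CREDITS_PER_CALL for status in statuses if status == 200)

# Three successful calls (one of them a page with no article) and one invalid URL:
print(credits_charged([200, 200, 200, 400]))  # 6
```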

Request format

{ "url": "https://example.com/article" }

Field	Type	Notes
`url`	string	Required. Must be a valid URL including protocol (`https://`).
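Because a URL without a protocol is rejected with a 400 (`Invalid URL`), it can be worth checking URLs client-side before making the round trip. A minimal sketch using Python's `urllib.parse`; `build_request_body` is a hypothetical name, and accepting plain `http://` as well as `https://` is an assumption this page doesn't confirm.

```python
from urllib.parse import urlparse

def build_request_body(url: str) -> dict:
    """Return the JSON body for a request, rejecting URLs with no protocol or host.

    Assumption: the service accepts both http and https schemes.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"URL must include a protocol such as https://, got {url!r}")
    return {"url": url}

print(build_request_body("https://example.com/article"))
# {'url': 'https://example.com/article'}
```

`build_request_body("example.com/article")` raises `ValueError` locally instead of costing a request.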

Response

{
  "url": "https://example.com/article",
  "title": "Article Title",
  "byline": "Author Name",
  "excerpt": "First sentence or lede...",
  "html": "<div><p>Clean HTML body...</p></div>",
  "text": "Clean plain text body...",
  "wordCount": 842
}

Field	Notes
`title`	Page or article title. `null` if not detected.
`byline`	Author name or byline. `null` if not present.
`excerpt`	Short excerpt or lede. `null` if not detected.
`html`	Clean article HTML with navigation, ads, and boilerplate stripped. `null` if Readability found no article.
`text`	Plain text body. `null` if Readability found no article.
`wordCount`	Word count of the plain text body. `null` if `text` is null.

All fields except `url` are nullable.
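Since every field except `url` can be `null`, client code should check a value before using it. One way to model the response in Python is with a dataclass whose nullable fields are `Optional`; the `Extraction` class and `parse_extraction` function are hypothetical names, and the field list is taken from the table above.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Extraction:
    url: str                   # the only non-nullable field
    title: Optional[str]
    byline: Optional[str]
    excerpt: Optional[str]
    html: Optional[str]        # None when Readability found no article
    text: Optional[str]        # None when Readability found no article
    word_count: Optional[int]  # None whenever text is None

def parse_extraction(body: str) -> Extraction:
    """Parse a response body into an Extraction."""
    data = json.loads(body)  # JSON null becomes None; .get() also tolerates a missing key
    return Extraction(
        url=data["url"],
        title=data.get("title"),
        byline=data.get("byline"),
        excerpt=data.get("excerpt"),
        html=data.get("html"),
        text=data.get("text"),
        word_count=data.get("wordCount"),
    )

# A page with no extractable article: still a successful (and charged) call.
empty = parse_extraction('{"url": "https://example.com/", "title": "Home", "byline": null, '
                         '"excerpt": null, "html": null, "text": null, "wordCount": null}')
print(empty.text is None)  # True
```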

Errors (400, no charge)

Error	Cause
Invalid URL	Missing protocol or malformed URL
Fetch failed	DNS error, connection refused, network timeout (10s)
Non-200 response	Target URL returned 4xx or 5xx

API Reference

Endpoint

POST https://api.lightningapi.tools/extract-text

Required headers

Authorization: Bearer <apiKey>
Content-Type: application/json
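Putting the endpoint and headers together, here is a minimal client sketch using only Python's standard library (`urllib.request`). The function names, the 30-second client timeout (chosen to sit above the service's 10s fetch timeout), and returning `None` on a 400 are all assumptions. The error response body isn't documented on this page, so the sketch doesn't try to parse it.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

ENDPOINT = "https://api.lightningapi.tools/extract-text"

def make_request(url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request with the two required headers."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_text(url: str, api_key: str, timeout: float = 30.0) -> Optional[dict]:
    """Call the endpoint; return the parsed JSON, or None on a (free) 400 error."""
    try:
        with urllib.request.urlopen(make_request(url, api_key), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # Invalid URL, fetch failure, or non-200 from the target; not charged.
            return None
        raise  # anything else (e.g. an auth problem) is surfaced to the caller
```

Usage: `result = extract_text("https://en.wikipedia.org/wiki/Node.js", api_key="YOUR_API_KEY")`. Returning `None` keeps the happy path simple. If you need to tell the three error causes apart, catch the `HTTPError` yourself and inspect its body.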

Example request

{
  "url": "https://en.wikipedia.org/wiki/Node.js"
}

Example response

{
  "url": "https://en.wikipedia.org/wiki/Node.js",
  "title": "Node.js",
  "byline": null,
  "excerpt": "Node.js is a cross-platform, open-source JavaScript runtime environment that executes JavaScript code outside a web browser.",
  "html": "<div id=\"readability-page-1\"><p>Node.js is...</p></div>",
  "text": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "wordCount": 4821
}
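The example response reports a `wordCount` of 4821, which may be more than you want to pass to a model in one go. Below is a sketch of a hypothetical `context_snippet` helper. It takes a parsed response dict, returns `None` when there is no article text, and otherwise trims the plain-text body to a word budget. The 1,000-word default is an arbitrary assumption, and splitting on whitespace may not match exactly how the service computes `wordCount`.

```python
from typing import Optional

def context_snippet(response: dict, max_words: int = 1000) -> Optional[str]:
    """Return the article text trimmed to max_words, or None if there is no article."""
    text = response.get("text")
    if text is None:
        return None  # Readability found no article (the call is still charged)
    words = text.split()  # whitespace split; may differ slightly from wordCount
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."

print(context_snippet({"url": "https://example.com/a", "text": "one two three four"}, max_words=2))
# one two ...
```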