Text Extractor

Fetch any URL and get back clean readable text: title, plain text, clean HTML, excerpt, byline, and word count. Powered by Mozilla Readability, the same library that drives Firefox Reader View.

⚡ 2 credits per call

Why use this

Language models can't retrieve URLs at inference time, and raw HTML is mostly noise. A typical news page is 80-90% navigation, ads, and scripts.

This service fetches the URL server-side, strips the boilerplate with Mozilla Readability, and returns just the article content. It handles redirects, cookies, and compressed responses without any extra work on your end.

Good for:

  • Summarising articles without feeding raw HTML to your model
  • Pulling developer docs (MDN, GitHub READMEs, API references) into context
  • Research pipelines that need article text before analysis
  • Fact-checking tasks where the model needs the actual page content

Pricing

Any URL    2 credits

You are charged for any reachable URL, including pages with no extractable content. Requests that fail with one of the errors listed under Errors are not charged.

Request format

{ "url": "https://example.com/article" }

Field  Type    Notes
url    string  Required. Must be a valid URL including the protocol (https://).
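
Since a malformed URL is rejected by the service, it can be convenient to catch it before sending. The following is a minimal sketch, not part of the service itself: build_payload is a hypothetical helper name, and accepting http:// alongside https:// is an assumption, since only https:// is shown above.

```python
import json
from urllib.parse import urlsplit


def build_payload(url: str) -> str:
    """Return the JSON request body, or raise ValueError for an unusable URL."""
    parts = urlsplit(url)
    # Assumption: both http and https are acceptable; the docs only show https.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"URL must include a protocol and host: {url!r}")
    return json.dumps({"url": url})
```

A URL without a protocol, such as example.com/article, raises ValueError here instead of costing a round trip.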

Response

{
  "url": "https://example.com/article",
  "title": "Article Title",
  "byline": "Author Name",
  "excerpt": "First sentence or lede...",
  "html": "<div><p>Clean HTML body...</p></div>",
  "text": "Clean plain text body...",
  "wordCount": 842
}

Field      Notes
title      Page or article title. null if not detected.
byline     Author name or byline. null if not present.
excerpt    Short excerpt or lede. null if not detected.
html       Clean article HTML with navigation, ads, and boilerplate stripped. null if Readability found no article.
text       Plain text body. null if Readability found no article.
wordCount  Word count of the plain text body. null if text is null.

All fields except url are nullable.
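
Because every field except url may be null, client code should check for a missing article before using text or wordCount. One way to model this is sketched below; the Extraction class and parse_extraction function are hypothetical names chosen for illustration, and the field names are taken from the response shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extraction:
    url: str
    title: Optional[str]
    byline: Optional[str]
    excerpt: Optional[str]
    html: Optional[str]
    text: Optional[str]
    word_count: Optional[int]

    @property
    def has_article(self) -> bool:
        # text is null when Readability found no article on the page.
        return self.text is not None


def parse_extraction(payload: dict) -> Extraction:
    """Map the decoded JSON response onto a typed record."""
    return Extraction(
        url=payload["url"],  # the only non-nullable field
        title=payload.get("title"),
        byline=payload.get("byline"),
        excerpt=payload.get("excerpt"),
        html=payload.get("html"),
        text=payload.get("text"),
        word_count=payload.get("wordCount"),
    )
```

Checking has_article before passing text to a model avoids sending an empty prompt for pages where nothing could be extracted.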

Errors (400, no charge)

Error             Cause
Invalid URL       Missing protocol or malformed URL
Fetch failed      DNS error, connection refused, or network timeout (10s)
Non-200 response  Target URL returned 4xx or 5xx
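
A client can use the status code to decide whether a failed call was billed. The sketch below is an illustration only: ExtractionFailure and classify_failure are hypothetical names, the error body is treated as opaque text because its format is not documented here, and any status other than 400 is marked as unknown rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionFailure:
    status: int
    detail: str
    charged: Optional[bool]  # None means this page does not say


def classify_failure(status: int, body: str) -> ExtractionFailure:
    """Summarise a failed call using only what the error table states."""
    if status == 400:
        # The errors listed above return 400 and are not charged.
        return ExtractionFailure(status, body, charged=False)
    # Assumption: other statuses (for example auth failures) are not
    # covered by this page, so the billing outcome is left unknown.
    return ExtractionFailure(status, body, charged=None)
```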

API Reference

Endpoint

POST https://api.lightningapi.tools/extract-text

Required headers

Authorization: Bearer <apiKey>
Content-Type: application/json

Example request

{
  "url": "https://en.wikipedia.org/wiki/Node.js"
}

Example response

{
  "url": "https://en.wikipedia.org/wiki/Node.js",
  "title": "Node.js",
  "byline": null,
  "excerpt": "Node.js is a cross-platform, open-source JavaScript runtime environment that executes JavaScript code outside a web browser.",
  "html": "<div id=\"readability-page-1\"><p>Node.js is...</p></div>",
  "text": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "wordCount": 4821
}
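
Putting the endpoint and headers together, a call from Python might look like the sketch below. It uses only the standard library; build_request and extract are hypothetical helper names, and the 30-second client timeout is an assumption chosen to sit above the service's 10s fetch timeout.

```python
import json
import urllib.request

ENDPOINT = "https://api.lightningapi.tools/extract-text"


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare the POST request with the two required headers."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract(url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses,
    including the 400 errors described above.
    """
    request = build_request(url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling extract("https://en.wikipedia.org/wiki/Node.js", api_key) with a valid key should return a dictionary shaped like the example response above.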