Text Extractor
Fetch any URL and get back clean readable text: title, plain text, clean HTML, excerpt, byline, and word count. Powered by Mozilla Readability, the same library that drives Firefox Reader View.
Why use this
Language models can't retrieve URLs at inference time, and raw HTML is mostly noise. A typical news page is 80-90% navigation, ads, and scripts.
This service fetches the URL server-side, strips the boilerplate with Mozilla Readability, and returns only the article content. Redirects, cookies, and compressed responses are handled for you.
Good for:
- Summarising articles without feeding raw HTML to your model
- Pulling developer docs (MDN, GitHub READMEs, API references) into context
- Research pipelines that need article text before analysis
- Fact-checking tasks where the model needs the actual page content
Pricing
| Request | Cost |
|---|---|
| Any URL | 2 credits |
Charged for every reachable URL, even when the page has no extractable content. Requests that fail with a 400 error are not charged.
Request format
{ "url": "https://example.com/article" }
| Field | Type | Notes |
|---|---|---|
| url | string | Required. Must be a valid URL including protocol (https://). |
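A client-side check can catch a missing protocol before the request is sent. The sketch below is one way to do that in Python. `is_valid_target_url` is a hypothetical helper, and accepting `http://` as well as `https://` is an assumption; the service's own validation may be stricter or looser.

```python
from urllib.parse import urlsplit


def is_valid_target_url(url: str) -> bool:
    """Return True if url has an http(s) scheme and a host.

    Hypothetical helper: the service's real validation rules are not
    documented here, so treat this as a cheap pre-check only.
    """
    if not isinstance(url, str):
        return False
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. broken IPv6 literals.
        return False
    # Assumption: plain http is accepted alongside https.
    return parts.scheme in ("http", "https") and bool(host)
```

A URL that fails this check would most likely come back as an Invalid URL error, so skipping the call saves a round trip.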
Response
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "byline": "Author Name",
  "excerpt": "First sentence or lede...",
  "html": "<div><p>Clean HTML body...</p></div>",
  "text": "Clean plain text body...",
  "wordCount": 842
}
| Field | Notes |
|---|---|
| title | Page or article title. null if not detected. |
| byline | Author name or byline. null if not present. |
| excerpt | Short excerpt or lede. null if not detected. |
| html | Clean article HTML with navigation, ads, and boilerplate stripped. null if Readability found no article. |
| text | Plain text body. null if Readability found no article. |
| wordCount | Word count of the plain text body. null if text is null. |
All fields except url are nullable.
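Because every field except url can be null, callers should check for missing content before passing text on to a model. The following is a minimal sketch of that handling in Python; the `Extraction` class and the `parse_extraction` and `to_context` helpers are hypothetical names, not part of the service.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extraction:
    """Typed view of the response body; only url is guaranteed."""
    url: str
    title: Optional[str]
    byline: Optional[str]
    excerpt: Optional[str]
    html: Optional[str]
    text: Optional[str]
    word_count: Optional[int]

    @property
    def has_article(self) -> bool:
        # text is null when Readability found no article.
        return self.text is not None


def parse_extraction(body: str) -> Extraction:
    """Decode a JSON response body into an Extraction."""
    data = json.loads(body)
    return Extraction(
        url=data["url"],
        title=data.get("title"),
        byline=data.get("byline"),
        excerpt=data.get("excerpt"),
        html=data.get("html"),
        text=data.get("text"),
        word_count=data.get("wordCount"),
    )


def to_context(doc: Extraction) -> str:
    """Build a context string, falling back when fields are null."""
    if not doc.has_article:
        return f"[no article content extracted from {doc.url}]"
    header = doc.title or doc.url
    if doc.byline:
        header += f" (by {doc.byline})"
    return f"{header}\n\n{doc.text}"
```

Checking `has_article` first keeps a page with no extractable content from silently turning into an empty prompt.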
Errors (400, no charge)
| Error | Cause |
|---|---|
| Invalid URL | Missing protocol, malformed URL |
| Fetch failed | DNS error, connection refused, network timeout (10s) |
| Non-200 response | Target URL returned 4xx or 5xx |
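One way to surface these errors in client code is to turn any non-200 status into an exception. The sketch below assumes nothing about the shape of the error body, which is not documented here, and treats it as opaque text. `ExtractionError` and `check_response` are hypothetical names.

```python
from typing import Optional


class ExtractionError(Exception):
    """Raised when the service returns a non-200 status (hypothetical)."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"extract-text failed with HTTP {status}: {detail}")
        self.status = status
        self.detail = detail
        # 400 errors are documented as not charged; billing for any
        # other status is not documented, so it is left as None.
        self.charged: Optional[bool] = False if status == 400 else None


def check_response(status: int, body: str) -> str:
    """Return body unchanged on 200, otherwise raise ExtractionError."""
    if status == 200:
        return body
    # Assumption: the error body is human-readable text or JSON; it is
    # passed through untouched rather than parsed.
    raise ExtractionError(status, body.strip() or "<empty body>")
```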
API Reference
Endpoint
POST https://api.lightningapi.tools/extract-text
Required headers
Authorization: Bearer <apiKey>
Content-Type: application/json
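The endpoint and headers above can be assembled with Python's standard library alone. This is a minimal sketch rather than an official client: `build_request` and `extract` are hypothetical helpers, and the 30-second client timeout is an assumption.

```python
import json
import urllib.request

ENDPOINT = "https://api.lightningapi.tools/extract-text"


def build_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract(api_key: str, target_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for non-2xx statuses, so a
    400 from the service surfaces as an exception here.
    """
    request = build_request(api_key, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the headers and payload easy to inspect in tests without touching the network.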
Example request
{
  "url": "https://en.wikipedia.org/wiki/Node.js"
}
Example response
{
  "url": "https://en.wikipedia.org/wiki/Node.js",
  "title": "Node.js",
  "byline": null,
  "excerpt": "Node.js is a cross-platform, open-source JavaScript runtime environment that executes JavaScript code outside a web browser.",
  "html": "<div id=\"readability-page-1\"><p>Node.js is...</p></div>",
  "text": "Node.js is a cross-platform, open-source JavaScript runtime environment...",
  "wordCount": 4821
}