A self-hosted data engine that collects from 764 Texas public data sources — Secretary of State, Comptroller, county appraisal districts, Ethics Commission, Railroad Commission, TCEQ, TDLR, DPS, and more. Business filings, property records, campaign finance, environmental permits, licensing data, tax records.
Built on FeedEater — a modular TypeScript data engine with NATS messaging and PostgreSQL storage.
Each data source is a module you can toggle on or off. Deploy with docker compose up. Manage via web UI, CLI, or REST API.
$ docker compose up -d
[+] Running 5/5
✔ Container feedeater-nats Started
✔ Container feedeater-postgres Started
✔ Container feedeater-engine Started
✔ Container feedeater-api Started
✔ Container feedeater-ui Started
$ curl localhost:3000/api/health
{
  "status": "healthy",
  "engine": "feedeater",
  "modules_loaded": 47,
  "sources_available": 764,
  "sources_enabled": 0,
  "database": "connected",
  "nats": "connected"
}
Five containers: NATS (message bus), PostgreSQL (storage), FeedEater engine (module runner), REST API (query + control), Web UI (dashboard). Everything talks over NATS. Modules publish data, the engine stores it, you query it.
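As a sketch of that flow: a module hands each collected record to NATS, and the engine's storage subscriber picks it up. The subject naming and envelope shape below are assumptions for illustration, not the actual FeedEater wire protocol.

```typescript
// Hypothetical sketch of a module publishing a record over NATS.
// Subject convention and envelope fields are assumed, not FeedEater's real ones.

interface Publisher {
  publish(subject: string, data: Uint8Array): void; // minimal NATS-like surface
}

interface RecordEnvelope {
  source: string;      // module id, e.g. "texas-sos-business"
  collectedAt: string; // ISO timestamp
  payload: Record<string, unknown>;
}

// Build the message a module would hand to the engine for storage.
export function buildRecordMessage(
  source: string,
  payload: Record<string, unknown>,
  now: Date = new Date()
): { subject: string; envelope: RecordEnvelope } {
  return {
    subject: `feedeater.records.${source}`, // assumed subject convention
    envelope: { source, collectedAt: now.toISOString(), payload },
  };
}

// With a live connection, publishing is one line per record.
export function publishRecord(
  nats: Publisher,
  source: string,
  payload: Record<string, unknown>
): void {
  const { subject, envelope } = buildRecordMessage(source, payload);
  nats.publish(subject, new TextEncoder().encode(JSON.stringify(envelope)));
}
```

Because every module speaks the same envelope over the bus, the engine stores data from all 764 sources with one subscriber.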
Bulk: sources with downloadable datasets. Toggle one on and the module downloads the full dataset and stores it in Postgres. Comptroller Open Data (1,360 datasets), SOS bulk filings, and more.
Crawl: sources with enumerable key spaces. The module iterates through ID ranges (entity numbers, license numbers, permit IDs), collecting everything. Rate-limited, resumable, and runs in the background.
Query: sources that require search parameters. Each query module gets its own page in the web UI with the right input fields. Search on demand and save results to your local database.
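The crawl mode can be sketched as a rate-limited loop over an ID range with a checkpoint, so an interrupted job resumes where it left off. The callback and checkpoint interfaces below are illustrative assumptions, not FeedEater's actual module API.

```typescript
// Hypothetical crawl-mode sketch: walk an enumerable ID range with a fixed
// delay between requests and persist a checkpoint after each success.

interface Checkpoint {
  load(): number;       // last completed ID, 0 if the job is fresh
  save(id: number): void;
}

export async function crawlRange(
  start: number,
  end: number,
  fetchEntity: (id: number) => Promise<unknown>, // one request per ID
  store: (id: number, record: unknown) => void,  // write to local storage
  checkpoint: Checkpoint,
  delayMs = 1000 // default 1 req/sec, matching the module default
): Promise<number> {
  let done = 0;
  // Resume one past the last completed ID, never before the range start.
  const resumeFrom = Math.max(start, checkpoint.load() + 1);
  for (let id = resumeFrom; id <= end; id++) {
    store(id, await fetchEntity(id));
    checkpoint.save(id); // a restart picks up from here
    done++;
    if (id < end) await new Promise((r) => setTimeout(r, delayMs));
  }
  return done;
}
```

Saving the checkpoint only after a successful store means a crash can re-fetch at most one ID, never skip one.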
Open localhost:3000 in your browser. See all 764 sources. Toggle them on or off. Trigger manual scrapes.
Search your collected data. Each query source has a dedicated page with the right input fields for that source.
Monitor collection status, see what's running, what's completed, what's failed.
# List available sources
$ curl localhost:3000/api/sources
# Enable a source
$ curl -X POST localhost:3000/api/sources/texas-sos-business/enable
# Trigger an on-demand scrape
$ curl -X POST localhost:3000/api/sources/texas-hcad/scrape
# Query collected data
$ curl "localhost:3000/api/data/texas-hcad?search=austin&limit=10"
# Export to JSON
$ curl "localhost:3000/api/data/texas-sos-business?format=json" > export.json
The same API that powers the web UI. Point your AI agent, your RAG pipeline, or your scripts at it. No authentication required on localhost.
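A script or agent can drive those same endpoints with a few lines of TypeScript. This mirrors the curl examples above; any query parameter beyond `search` and `limit` is an assumption.

```typescript
// Hypothetical client sketch against the local REST API.
// Paths match the curl examples; response shapes are not assumed here.

const BASE = "http://localhost:3000";

// Build a query URL for collected data, e.g. /api/data/texas-hcad?search=austin&limit=10
export function dataUrl(source: string, params: Record<string, string>): string {
  const qs = new URLSearchParams(params).toString();
  return `${BASE}/api/data/${source}${qs ? `?${qs}` : ""}`;
}

// Enable a source, then search it (requires the stack to be running).
export async function enableAndSearch(source: string, search: string): Promise<unknown> {
  await fetch(`${BASE}/api/sources/${source}/enable`, { method: "POST" });
  const res = await fetch(dataUrl(source, { search, limit: "10" }));
  return res.json();
}
```

No auth on localhost means the loop from "enable" to "query" is two HTTP calls, which is what makes the API practical as an agent tool.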
Secretary of State: business filings, entity search, assumed names, UCC filings, notary records.
Comptroller: 1,360 datasets on the Open Data Portal. Tax records, franchise tax, sales tax permits, state expenditures.
County appraisal districts: property records across 254 counties. Assessed values, ownership, tax amounts, legal descriptions.
Ethics Commission: campaign finance reports, lobbyist registrations, personal financial statements, political committee filings.
Railroad Commission: oil and gas well data, pipeline permits, operator records, production reports.
TCEQ, TDLR, DPS, and more: environmental permits, professional licenses, criminal records, vehicle registrations, and dozens more state agencies.
| Spec | Details |
| --- | --- |
| Engine | FeedEater (TypeScript) |
| Messaging | NATS |
| Database | PostgreSQL 14+ |
| Containerization | Docker + Docker Compose |
| API | REST with OpenAPI docs at /api/docs |
| Web UI | Lightweight dashboard at localhost:3000 |
| Data Sources | 764 across state, county, and municipal levels |
| Collection Modes | Bulk, Crawl, Query |
| Output Formats | JSON, CSV, PostgreSQL |
| Rate Limiting | Configurable per module (default 1 req/sec) |
| Retry Logic | Exponential backoff with configurable retries |
| Resume | Crawl jobs resume from last checkpoint |
| Logging | Structured JSON logs, configurable verbosity |
| License | MIT — modify, resell, do whatever you want |
| Updates | 12 months included with purchase |
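The retry row above is the standard exponential-backoff pattern: delays double per attempt up to a cap. A minimal sketch, with base delay, retry count, and cap standing in for the per-module configuration (the actual option names are not assumed):

```typescript
// Compute the backoff schedule: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
export function backoffDelays(baseMs: number, retries: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(baseMs * 2 ** attempt, capMs));
  }
  return delays;
}

// Run fn once, then retry after each delay until it succeeds or retries run out.
export async function withRetries<T>(
  fn: () => Promise<T>,
  baseMs = 500,
  retries = 5,
  capMs = 30_000
): Promise<T> {
  let lastErr: unknown;
  for (const delay of [0, ...backoffDelays(baseMs, retries, capMs)]) {
    if (delay > 0) await new Promise((r) => setTimeout(r, delay));
    try {
      return await fn();
    } catch (err) {
      lastErr = err; // wait out the next delay, then retry
    }
  }
  throw lastErr;
}
```

The cap keeps a long outage from stretching a single retry into minutes, which matters when a crawl job is churning through thousands of IDs.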
Is this legal?
Yes. All sources are public records under Texas law. This tool accesses the same data available on government websites.
Why 764 sources?
Texas has data spread across dozens of state agencies, 254 counties, and hundreds of municipalities, and many agencies have multiple portals. We map every accessible source and wrap each one in a module.
What do I need to run it?
Docker. That's it. docker compose up brings up the entire stack — engine, database, message bus, API, and web UI.
Can AI agents use this?
Yes. The REST API is designed for programmatic access. Point your agent at localhost:3000/api — enable sources, trigger scrapes, query data. No authentication on localhost.
What happens after 12 months?
You keep everything forever. After 12 months, you stop receiving module updates when source sites change. Renew for another $13.37/year, or maintain the modules yourself — it's MIT licensed.
Do I need an API key?
No. Zero external API dependencies. All modules scrape public websites directly.
Can I modify the code?
Yes. MIT licensed. Modify it, add your own modules, integrate it into your pipeline, resell it — no restrictions.
How is this different from BatchData or PropStream?
They charge $50-$500/month for access to their servers and limit you to property data. This is $13.37 one-time for the entire Texas public data landscape — property, business, environmental, licensing, campaign finance, and more. Runs on your machine. You own everything.
Will there be other states?
Yes. The FeedEater module architecture scales to any state. Texas is first. Same engine, different modules.
764 Texas public data sources. One command.
Buy on Gumroad — $13.37