🛑️ How to Secure Public APIs from Data Scraping on Your Own Website

In today's data-driven web, many developers expose APIs to power client-side visualizations, dashboards, or maps. But just because an API is publicly reachable to serve your own site doesn't mean you want anyone to extract all of your data.

Let's explore practical ways to protect your public APIs from scraping, abuse, and misuse, while keeping them functional for legitimate users.


🚨 The Problem: "Public, But Not Open"

Say you have API endpoints like:

```http
GET /api/themes
GET /api/files/:theme
GET /api/attributes/:theme/:fileName
GET /api/values/:theme/:fileName/:attribute
```

These power dropdowns or filters in your frontend. They're read-only, sure, but a scraper could still write a simple script to extract all your data.
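
For a sense of what you're up against, here is a rough sketch of such a scraper. It's hypothetical: it assumes Node 18+ for the global fetch, that the endpoints return JSON arrays, and it only walks the first two endpoints.

```js
// Hypothetical scraper: walks the public endpoints and dumps whatever it finds.
const fs = require('fs');

async function scrapeAll(baseUrl) {
  const themes = await fetch(`${baseUrl}/api/themes`).then((r) => r.json());
  const dump = {};
  for (const theme of themes) {
    // A real scraper would keep drilling down into attributes and values too.
    dump[theme] = await fetch(`${baseUrl}/api/files/${theme}`).then((r) => r.json());
  }
  fs.writeFileSync('dump.json', JSON.stringify(dump, null, 2));
}

scrapeAll('https://yourdomain.com').catch(console.error);
```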

So what can you do?


πŸ” 1. Restrict Origins Using CORS

Only allow your own frontend to make API calls:

```js
const cors = require('cors');

app.use(cors({
  origin: 'https://yourdomain.com', // only allow your frontend
  methods: ['GET'],
}));
```

💡 This won't stop curl or Node scripts, but it blocks abuse from other browser-based frontends.


🔑 2. Require a Lightweight API Key

Even for public endpoints, you can enforce token-based access:

```js
app.use('/api', (req, res, next) => {
  const token = req.headers['x-api-key'];
  if (token !== process.env.PUBLIC_READ_TOKEN) {
    return res.status(403).json({ error: 'Unauthorized' });
  }
  next();
});
```

🧠 Use .env to store the key, and keep it minimal. This slows down bots and adds friction.
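
On the server, the dotenv package can load that key from a .env file, and the frontend simply sends it back in the x-api-key header the middleware above checks. A minimal sketch; the token value is a placeholder, and since the key ships with your frontend bundle, treat it as friction rather than secrecy:

```js
// Server: load PUBLIC_READ_TOKEN from .env before the API-key middleware runs.
// .env  ->  PUBLIC_READ_TOKEN=some-long-random-string
require('dotenv').config();

// Frontend: include the key on every request to your API.
fetch('/api/themes', {
  headers: { 'x-api-key': 'some-long-random-string' },
})
  .then((res) => res.json())
  .then((themes) => console.log(themes));
```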


🧱 3. Rate Limit to Throttle Scrapers

Add request limits per IP:

```js
const rateLimit = require('express-rate-limit');

const apiLimiter = rateLimit({
  windowMs: 1 * 60 * 1000, // 1 minute
  max: 10, // 10 requests per minute per IP
});

app.use('/api/', apiLimiter);
```

⏱️ Legit users won't notice, but mass scraping scripts will hit a wall.
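
If you want bots to get a clear signal and well-behaved clients a chance to back off, express-rate-limit also lets you shape the 429 response. A sketch of a tweaked limiter, assuming v6+ of the package; the message text is just an example:

```js
const apiLimiter = rateLimit({
  windowMs: 1 * 60 * 1000,
  max: 10,
  standardHeaders: true,  // send RateLimit-* headers so clients can back off
  legacyHeaders: false,   // drop the older X-RateLimit-* headers
  message: { error: 'Too many requests, slow down.' }, // body of the 429 response
});
```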


🕵️ 4. Use a Backend-Only Proxy to Hide Real Endpoints

Instead of letting your frontend call:

```http
GET /api/values/:theme/:fileName/:attribute
```

...route all requests through your own backend:

```js
app.get('/public-data', async (req, res) => {
  // Internally call your private API
  const result = await getValuesSafely(req.query);
  res.json(result);
});
```

Now scrapers can't see your internal endpoint structure, which makes your DB and API layout much harder to reverse-engineer.
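
getValuesSafely above isn't a library function; it's whatever internal helper you write to validate the query and read from your own data layer. A hypothetical sketch:

```js
// Hypothetical helper: validate the query first, then hit your own data layer.
const allowedThemes = ['roads', 'buildings'];

async function getValuesSafely({ theme, fileName, attribute }) {
  if (!allowedThemes.includes(theme)) {
    throw new Error('Invalid theme');
  }
  // db.getValues is a placeholder for however you actually store the data.
  return db.getValues(theme, fileName, attribute);
}
```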


📉 5. Whitelist & Validate Inputs

Prevent enumeration attacks by validating inputs:

```js
const allowedThemes = ['roads', 'buildings'];

if (!allowedThemes.includes(theme)) {
  return res.status(400).json({ error: 'Invalid theme' });
}
```

Don't let users guess table names or column values freely.
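
The same idea extends to the other path parameters. Here's a sketch that only accepts plain identifiers; the regex is an assumption about what your file and attribute names look like:

```js
// Reject anything that doesn't look like a simple identifier (letters, digits, _ and -).
const SAFE_NAME = /^[\w-]{1,64}$/;

app.get('/api/values/:theme/:fileName/:attribute', (req, res) => {
  const { theme, fileName, attribute } = req.params;
  if (!allowedThemes.includes(theme) || !SAFE_NAME.test(fileName) || !SAFE_NAME.test(attribute)) {
    return res.status(400).json({ error: 'Invalid parameters' });
  }
  // ...fetch the values however you normally do, then return them
  res.json({ theme, fileName, attribute, values: [] }); // placeholder response
});
```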


📈 6. Monitor Requests and Detect Abuse

Log access patterns to detect scraping behavior:

```js
app.use((req, res, next) => {
  // req.socket replaces the deprecated req.connection
  const ip = req.headers['x-forwarded-for'] || req.socket.remoteAddress;
  console.log(`[API ACCESS] ${ip} accessed ${req.originalUrl}`);
  next();
});
```

Then:

  • Block suspicious IPs (see the sketch after this list)

  • Alert on high-frequency usage

  • Track most-accessed endpoints
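
As a concrete example of the blocking idea, here's a minimal in-memory sketch; a real setup would persist counts in something like Redis, and the threshold is arbitrary:

```js
const hits = new Map();     // ip -> requests seen in the current window
const blocked = new Set();  // ips we've decided to cut off
const THRESHOLD = 1000;     // requests per minute before blocking (arbitrary)

setInterval(() => hits.clear(), 60 * 1000); // reset the counters every minute

app.use((req, res, next) => {
  const ip = req.headers['x-forwarded-for'] || req.socket.remoteAddress;
  if (blocked.has(ip)) {
    return res.status(403).json({ error: 'Blocked' });
  }
  const count = (hits.get(ip) || 0) + 1;
  hits.set(ip, count);
  if (count > THRESHOLD) blocked.add(ip);
  next();
});
```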


🧠 7. Understand: Nothing Is 100% Secure

Even with all protections, a determined scraper using headless browsers (like Puppeteer or Selenium) can simulate real usage.

But your job is to make scraping:

  • πŸ” Slower

  • 🧩 Harder

  • 🎯 Easier to detect

That alone discourages most casual abuse.


✅ Final Thoughts: Layered Security Wins

| Technique | Purpose |
| --- | --- |
| CORS + API keys | Basic access control |
| Rate limiting | Throttle abusers |
| Input validation | Prevent DB exposure |
| Backend-only proxies | Hide internal APIs |
| Monitoring & alerts | Detect scraping attempts |
