# Casino Affiliate Crawler

Headless browser crawler that scrapes casino affiliate ranking pages, stores extracted data in PostgreSQL, and provides a React backoffice dashboard for viewing results.

## Architecture

```
crawler/                          # Backend (Node.js / Express)
├── src/
│   ├── app.js                     # Express server entry point
│   ├── setup-db.js                # Database initialisation script
│   ├── db.js                      # PostgreSQL pool config
│   ├── middleware/auth.js         # JWT authentication middleware
│   ├── routes/
│   │   ├── auth.js               # Login, register, profile endpoints
│   │   └── crawler.js            # Crawl data & trigger endpoints
│   └── services/
│       ├── crawler.js             # Puppeteer crawl + DOM extraction
│       └── scheduler.js           # Periodic crawl job (every hour)
├── screenshots/                   # Full-page screenshots per crawl
└── package.json

casino-dashboard/                 # Frontend (React / Vite)
├── src/
│   ├── api.js                     # Axios client + auth helpers
│   ├── App.jsx                    # Router + AuthProvider wrapper
│   └── components/
│       ├── Login.jsx              # Sign-in form with JWT
│       ├── Dashboard.jsx          # Crawl history list + run button
│       ├── CrawlDetail.jsx        # Casino table, screenshot viewer
│       └── Sidebar.jsx            # Navigation shell
└── package.json
```

## Prerequisites

- **Node.js** 18+
- **Google Chrome** installed on the system
- **PostgreSQL** reachable at `192.168.21.197:5432` with user `postgres`

## Quick Start

### 1. Install dependencies

```bash
# Backend
cd crawler
npm install

# Frontend
cd ../casino-dashboard
npm install
```

### 2. Initialise the database

```bash
cd ../crawler
node src/setup-db.js
```

This creates the `casino_crawler` database and tables (`crawls`, `casinos`, `users`). A default admin user is seeded:

| Username | Password |
|----------|----------|
| `admin`  | `admin123` |

### 3. Start both servers

```bash
# Terminal 1 – Backend
cd crawler
npm start

# Terminal 2 – Frontend
cd casino-dashboard
npm run dev
```

- **Backend API**: http://localhost:3001
- **Frontend Dashboard**: http://localhost:5173
- The first crawl runs automatically ~5 s after the backend starts, then every hour.

## How It Works

### Crawler (`src/services/crawler.js`)

Uses Puppeteer + `puppeteer-extra-plugin-stealth` to bypass CloudFront bot detection. Each run:

1. Navigates to the target affiliate ranking page
2. Waits for network idle + 5 s buffer for lazy-loaded content
3. Takes a full-page screenshot stored in `screenshots/`
4. Extracts casino name, position, bonus offer, and affiliate link via site-specific DOM strategies
5. Inserts records into PostgreSQL
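
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/services/crawler.js`: the function name `crawlOnce`, its option names, the `networkidle2` wait condition, and the 60 s navigation timeout are all assumptions. The `browser` argument is expected to be a Puppeteer browser object, and the database insert (step 5) is left to the caller.

```javascript
// Hypothetical sketch of a single crawl run against a Puppeteer-compatible browser.
async function crawlOnce(browser, { url, screenshotPath, extract, settleMs = 5000 }) {
  const page = await browser.newPage();
  try {
    // Steps 1-2: navigate, wait for the network to go quiet, then a fixed buffer
    // so lazy-loaded content has time to render.
    await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
    await new Promise((resolve) => setTimeout(resolve, settleMs));

    // Step 3: full-page screenshot written to the given path.
    await page.screenshot({ path: screenshotPath, fullPage: true });

    // Step 4: run the site-specific extractor inside the page context.
    const casinos = await page.evaluate(extract);
    return { status: 'completed', casinos };
  } catch (err) {
    // Trim the message so it fits the VARCHAR(50) status column.
    return { status: `failed: ${err.message}`.slice(0, 50), casinos: [] };
  } finally {
    // Always release the tab, whether the crawl succeeded or not.
    await page.close();
  }
}
```

In the real service the browser would presumably be launched through `puppeteer-extra` with the stealth plugin registered; passing it in as an argument keeps the sketch independent of how it is created.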

Two targeted extractors are implemented:

| Site | Selector Strategy |
|------|------------------|
| **top10onlineslots.co.uk** | Finds divs containing "Get Bonus" text + logo `<img>`, pulls bonus from child spans |
| **ubet.co.uk** | Targets `.mainProduct.row-index-N` cards, reads `wss-vendorName-*` for name and `coupon-container` for the offer |

A generic fallback covers any future affiliate site.
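
One way to picture the URL-based dispatch with a generic fallback is the sketch below. The extractor bodies are placeholder stubs and the names `pickExtractor`, `extractTop10OnlineSlots`, `extractUbet`, and `extractGeneric` are hypothetical; the real extractors run DOM queries inside the page.

```javascript
// Placeholder extractors; each returns a label so the dispatch is easy to observe.
function extractTop10OnlineSlots() { return 'top10onlineslots'; }
function extractUbet() { return 'ubet'; }
function extractGeneric() { return 'generic'; }

// A Map keyed by hostname avoids accidental hits on Object.prototype keys.
const extractorsByHost = new Map([
  ['top10onlineslots.co.uk', extractTop10OnlineSlots],
  ['ubet.co.uk', extractUbet],
]);

// Pick the site-specific extractor for a page URL, or the generic fallback.
function pickExtractor(pageUrl) {
  const host = new URL(pageUrl).hostname.replace(/^www\./, '');
  return extractorsByHost.get(host) ?? extractGeneric;
}
```

Under this layout, supporting another site would mean adding one more entry to the map.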

### Scheduled Runs

Every hour the scheduler triggers crawls for all configured sites (see `src/services/scheduler.js`). A crawl can also be triggered manually via the button in the dashboard or with a POST to `/api/crawler/run-all`.
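
A scheduler with this behaviour (a first run shortly after startup, then one run per hour) could look roughly like the sketch below. It uses plain timers; `startScheduler`, its option names, and the overlap guard are assumptions rather than a description of what `src/services/scheduler.js` actually does.

```javascript
// Hypothetical scheduler: runs `runAll` once after a short delay, then at a fixed interval.
function startScheduler(runAll, { initialDelayMs = 5000, intervalMs = 60 * 60 * 1000 } = {}) {
  let running = false;

  const tick = async () => {
    if (running) return; // skip this tick if the previous crawl is still in progress
    running = true;
    try {
      await runAll();
    } catch (err) {
      // A failed crawl should not stop future scheduled runs.
      console.error('Scheduled crawl failed:', err.message);
    } finally {
      running = false;
    }
  };

  const first = setTimeout(tick, initialDelayMs);
  const repeat = setInterval(tick, intervalMs);

  // The returned function cancels both timers, e.g. on shutdown.
  return function stop() {
    clearTimeout(first);
    clearInterval(repeat);
  };
}
```

The guard matters because a crawl that takes longer than the interval would otherwise overlap with the next one.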

## Database Schema

### `crawls`

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL PK | Auto-increment |
| url | TEXT | Crawled page URL |
| site_name | VARCHAR(255) | Human-readable site label |
| crawled_at | TIMESTAMP | When the crawl ran |
| status | VARCHAR(50) | `completed` or `failed: ...` |
| screenshot_path | TEXT | Filename in `screenshots/` |

### `casinos`

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL PK | Auto-increment |
| crawl_id | INT FK → crawls.id | Which crawl this casino belongs to |
| position | INT | Rank on the page |
| casino_name | VARCHAR(255) | Casino brand name |
| url | TEXT | Affiliate redirect URL |
| bonus_offer | TEXT | Welcome bonus / free spins text |
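
As an illustration of how extracted rows might be written to this table, the sketch below builds a single parameterised multi-row `INSERT`. The helper name `buildCasinoInsert` is hypothetical; the returned `{ text, values }` object has the shape that node-postgres accepts in `pool.query()`.

```javascript
// Build one parameterised INSERT for all casinos found in a crawl.
// Returns null when there is nothing to insert.
function buildCasinoInsert(crawlId, casinos) {
  if (casinos.length === 0) return null;

  const columns = ['crawl_id', 'position', 'casino_name', 'url', 'bonus_offer'];
  const values = [];

  const rows = casinos.map((c) => {
    const row = [crawlId, c.position, c.casino_name, c.url ?? null, c.bonus_offer ?? null];
    // Number the placeholders $1, $2, ... across all rows.
    const placeholders = row.map((value) => {
      values.push(value);
      return `$${values.length}`;
    });
    return `(${placeholders.join(', ')})`;
  });

  return {
    text: `INSERT INTO casinos (${columns.join(', ')}) VALUES ${rows.join(', ')}`,
    values,
  };
}
```

Using placeholders rather than string interpolation keeps scraped text, which is untrusted, out of the SQL itself.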

### `users`

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL PK | Auto-increment |
| username | VARCHAR(100) UNIQUE | Login name |
| password_hash | VARCHAR(255) | bcrypt hash |
| role | VARCHAR(50) | Currently always `admin` |
| created_at | TIMESTAMP | Account creation time |

## API Endpoints

All authenticated endpoints require an `Authorization: Bearer <token>` header.

### Auth

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/auth/login` | Login, returns JWT + user object |
| POST | `/api/auth/register` | Create new admin user |
| GET | `/api/auth/me` | Current user profile |
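
A client-side sketch of the login flow and bearer header follows. It uses the global `fetch` available in Node.js 18+ instead of the Axios client in `api.js`, purely to stay self-contained. The request body `{ username, password }` and the response field name `token` are assumptions; the table above only says that login returns a JWT and a user object.

```javascript
const API_BASE = 'http://localhost:3001/api';

// Headers for any authenticated request.
function authHeaders(token) {
  return {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };
}

// Exchange credentials for a JWT (assumed response shape: { token, user }).
async function login(username, password) {
  const res = await fetch(`${API_BASE}/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password }),
  });
  if (!res.ok) throw new Error(`Login failed with HTTP ${res.status}`);
  return res.json();
}

// Fetch the current user's profile using the token from login().
async function getProfile(token) {
  const res = await fetch(`${API_BASE}/auth/me`, { headers: authHeaders(token) });
  if (!res.ok) throw new Error(`Profile request failed with HTTP ${res.status}`);
  return res.json();
}
```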

### Crawler

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/crawler/all` | All crawls with nested casino arrays |
| GET | `/api/crawler/:id` | Single crawl detail + screenshot path |
| POST | `/api/crawler/run-all` | Trigger immediate crawl of all sites |
| POST | `/api/crawler/run` | Crawl a single custom URL (body: `{url, siteName}`) |
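
To illustrate what "crawls with nested casino arrays" could mean for `/api/crawler/all`, the sketch below groups flat `casinos` rows under their parent crawl. The helper name `nestCasinos`, the `casinos` property name, and the ordering by `position` are assumptions about the response shape.

```javascript
// Attach each crawl's casino rows as a `casinos` array, ordered by rank.
function nestCasinos(crawls, casinos) {
  const byCrawl = new Map();
  for (const casino of casinos) {
    if (!byCrawl.has(casino.crawl_id)) byCrawl.set(casino.crawl_id, []);
    byCrawl.get(casino.crawl_id).push(casino);
  }

  return crawls.map((crawl) => ({
    ...crawl,
    // Crawls with no extracted rows (e.g. failed runs) get an empty array.
    casinos: (byCrawl.get(crawl.id) ?? []).slice().sort((a, b) => a.position - b.position),
  }));
}
```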

### Health

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/health` | DB connectivity check |

## Adding New Sites

1. Add the site config object to `src/services/scheduler.js` under `sites[]`.
2. Write a new extractor method in `src/services/crawler.js` and add a URL-based dispatch in `extractCasinoData()`.
3. Restart the backend.

## Screenshots

Full-page screenshots are saved as PNGs in `screenshots/` and served statically at `/screenshots/<filename>`. Each crawl writes one file named `<siteName>_<timestamp>.png`. The dashboard viewer loads them through the Vite proxy → Express static route.
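
The naming scheme can be sketched as below. The exact timestamp format and any sanitising of the site name are not specified above, so the ISO-based timestamp and the character replacement here are assumptions, as are the helper names.

```javascript
// Build a `<siteName>_<timestamp>.png` filename that is safe on common filesystems.
function screenshotFilename(siteName, date = new Date()) {
  const safeName = siteName.replace(/[^a-zA-Z0-9.-]+/g, '_');
  // ISO timestamps contain ':' and '.', which are awkward in filenames.
  const timestamp = date.toISOString().replace(/[:.]/g, '-');
  return `${safeName}_${timestamp}.png`;
}

// Path under which Express serves the file statically.
function screenshotUrl(filename) {
  return `/screenshots/${encodeURIComponent(filename)}`;
}
```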

## Production Build

```bash
cd casino-dashboard
npm run build   # outputs to dist/
```

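The `dist/` folder can be served by any static server or reverse-proxied behind Nginx alongside the Express API on port 3001. Set `VITE_API_URL=https://yourdomain.com/api` in the environment when running `npm run build` so the frontend talks to the correct backend; Vite inlines `VITE_`-prefixed variables at build time, so changing the value later requires a rebuild.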