Simplify to whitelist/blacklist model

- Rewrite merge_blocklists.py to sync a single blacklist from upstream
  and subtract the locally-maintained whitelist
- Replace whitelist contents with subtitle + webm seed
- Remove blacklist_permissive, whitelist_with_subtitles, and all
  .prev files that are no longer needed
- Rewrite README to reflect the two-file model and link to wiki
CodeX
2026-04-07 01:09:17 +02:00
parent 568d27bbf1
commit 14fd0cf511
8 changed files with 162 additions and 896 deletions
+106 -26
@@ -1,40 +1,120 @@
# ARR Stack Blocklists
# arr/blocklists
Automatically synchronized blocklists for use with Cleanuparr in the ARR media stack.
Curated blacklist and whitelist for the ARR media stack. The blacklist is
synced automatically from upstream Cleanuparr and stripped of anything
listed in the locally-maintained whitelist, so consumers like qBittorrent
and Cleanuparr can point at a single raw URL per list and stay in sync.
## Files
See the wiki for full technical reference:
- [Sync](https://git.hisp.no/arr/blocklists/wiki/Sync)
-- three-way merge, whitelist exclusion, `.prev` snapshot, edge cases
- [Lists](https://git.hisp.no/arr/blocklists/wiki/Lists)
-- the two-file model, pattern semantics, maintaining the whitelist
- [Consumers](https://git.hisp.no/arr/blocklists/wiki/Consumers)
-- qBittorrent and Cleanuparr integration, raw URLs, recommended modes
- [CI and Workflow](https://git.hisp.no/arr/blocklists/wiki/CI-and-Workflow)
-- scheduled Gitea Actions job, manual dispatch, commit behaviour
| File | Description |
|------|-------------|
| `blacklist` | Standard blocklist — blocks all known malicious and unwanted file types |
| `blacklist_permissive` | Permissive blocklist — blocks genuinely malicious types with fewer false positives |
| `whitelist` | Whitelist — only files matching these patterns are allowed |
| `whitelist_with_subtitles` | Whitelist with subtitle file types included |
| `*.prev` | Internal sync reference files — do not edit manually |
## How it works
The repository contains two data files:
| File | Role | Source |
|---|---|---|
| `blacklist` | Extensions blocked by downloaders and file cleaners | Synced from upstream, with the whitelist subtracted |
| `whitelist` | Extensions that must never be blocked or deleted | Locally maintained |
On every scheduled run the sync script:
1. Fetches the current upstream blacklist from Cleanuparr.
2. Detects any manual additions made directly to `blacklist` (three-way
merge against `blacklist.prev`).
3. Subtracts every entry listed in `whitelist`.
4. Writes the result back to `blacklist` and updates `blacklist.prev`.
The whitelist is the single source of truth for "what I want kept." Adding
an extension to `whitelist` removes it from `blacklist` on the next sync
and prevents consumers from blocking or deleting it. See
[Sync](https://git.hisp.no/arr/blocklists/wiki/Sync) for the full algorithm.
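
The four sync steps can be sketched with plain set operations. This is a minimal illustration, not the contents of `scripts/merge_blocklists.py`: the function names `parse_list` and `merge_blacklist` are hypothetical, and the sketch assumes that "manual additions" means entries present in the local `blacklist` but absent from the `blacklist.prev` snapshot.

```python
def parse_list(text: str) -> set[str]:
    """Return the non-empty, stripped lines of a list file as a set."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def merge_blacklist(
    upstream: set[str],
    local: set[str],
    prev: set[str],
    whitelist: set[str],
) -> list[str]:
    """Three-way merge sketch: upstream plus manual additions, minus whitelist."""
    # Assumption: anything in the local blacklist that the last upstream
    # snapshot did not contain was added by hand and should be kept.
    manual = local - prev
    # Upstream removals drop out automatically because only the current
    # upstream set and the manual additions are carried forward.
    merged = (upstream | manual) - whitelist
    return sorted(merged)


if __name__ == "__main__":
    upstream = parse_list("*.exe\n*.srt\n*.lnk\n*sample.srt\n")
    prev = parse_list("*.exe\n*.srt\n*.scr\n*sample.srt\n")
    local = parse_list("*.exe\n*.scr\n*.iso\n*sample.srt\n")
    whitelist = parse_list("*.srt\n")
    print(merge_blacklist(upstream, local, prev, whitelist))
    # prints ['*.exe', '*.iso', '*.lnk', '*sample.srt']
```

In this example `*.iso` survives as a manual addition, `*.scr` disappears because upstream dropped it, and `*.srt` is removed by the whitelist while `*sample.srt` stays, since the subtraction compares whole strings. After a real run, the fetched upstream set would be written to `blacklist.prev` as the new baseline.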
## Prerequisites
- A consumer that reads a remote text file of glob patterns (qBittorrent
excluded file names, Cleanuparr blacklist/whitelist sync, etc.)
- Network access from that consumer to `git.hisp.no`
## File structure
| Path | Purpose |
|---|---|
| `blacklist` | Merged output: upstream blacklist minus the whitelist. Consumer-facing |
| `blacklist.prev` | Snapshot of the last upstream fetch. Baseline for the three-way merge. Do not edit |
| `whitelist` | Locally-maintained allow list. Edit directly to add or remove entries |
| `scripts/merge_blocklists.py` | Sync script executed by the scheduled workflow |
| `.gitea/workflows/sync.yml` | Scheduled Gitea Actions workflow |
## Usage
Point Cleanuparr's Malware Blocker and Blacklist Sync at the raw URL of your chosen file:
Point your consumer at the raw URL of the file it should use.
### qBittorrent
qBittorrent has no whitelist feature, so it consumes the blacklist directly.
Set the excluded file names list (Options -> Downloads -> Excluded file
names) to:
```
https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist_permissive
https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist
```
## Sync
Because the whitelist is already subtracted from this file, any extension
you add to `whitelist` stops being blocked by qBittorrent on the next sync.
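
To see the effect from the consumer's side, the check below tests a file name against a list of patterns. It is only a sketch: `is_blocked` is a hypothetical helper, and it assumes the patterns behave like Python's `fnmatch` globs, which may not match qBittorrent's own matcher exactly.

```python
from fnmatch import fnmatchcase


def is_blocked(filename: str, patterns: list[str]) -> bool:
    """Return True if the file name matches any glob pattern in the list."""
    # Assumption: case-sensitive, fnmatch-style matching.
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Blacklist as it might look before and after `*.srt` is added to the whitelist.
before_sync = ["*.exe", "*.srt", "*sample.srt"]
after_sync = ["*.exe", "*sample.srt"]

print(is_blocked("Movie.srt", before_sync))        # prints True
print(is_blocked("Movie.srt", after_sync))         # prints False
print(is_blocked("movie-sample.srt", after_sync))  # prints True
```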
Files are automatically synchronized from the upstream [Cleanuparr](https://github.com/Cleanuparr/Cleanuparr) repository every 6 hours via Gitea Actions.
### Cleanuparr
The sync uses a three-way merge strategy:
- Upstream additions are automatically included
- Upstream removals are automatically removed
- Your custom additions are preserved across every sync
Cleanuparr supports both blacklist and whitelist modes. Use whichever
matches your setup:
## Custom Entries
- **Blacklist mode** -- point at the same `blacklist` raw URL as qBittorrent.
- **Whitelist mode** -- point at the `whitelist` raw URL:
To add your own entries, edit the relevant file directly in Gitea. Your additions will be detected as custom entries and preserved on every subsequent sync.
## Upstream Source
Blocklists are sourced from:
```
https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/
```
https://git.hisp.no/arr/blocklists/raw/branch/main/whitelist
```
See [Consumers](https://git.hisp.no/arr/blocklists/wiki/Consumers) for
recommended mode per feature.
## Maintaining the whitelist
Edit `whitelist` directly in Gitea or via a local clone. One glob pattern
per line, sorted, no blank lines. Patterns are matched against the blacklist
with exact-string set subtraction:
- `*.srt` in `whitelist` removes `*.srt` from `blacklist`.
- `*sample.srt` in `blacklist` is not affected by `*.srt` in `whitelist`.
Sample-file patterns are preserved because exact-string subtraction only
removes identical entries.
See [Lists](https://git.hisp.no/arr/blocklists/wiki/Lists) for the full
pattern rules and examples.
## Sync schedule
The Gitea Actions workflow runs every 7 days at 04:00 UTC and on manual
dispatch. Each run:
1. Executes `scripts/merge_blocklists.py`.
2. Commits `blacklist` and `blacklist.prev` if either changed.
3. Pushes the commit to `main`.
See [CI and Workflow](https://git.hisp.no/arr/blocklists/wiki/CI-and-Workflow)
for workflow details and manual dispatch instructions.
## Upstream source
The blacklist is sourced from:
```
https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist
```