Merge pull request 'Simplify to whitelist/blacklist model' (#1) from feat/whitelist-strip-blacklist into main

Reviewed-on: #1
This commit was merged in pull request #1.
2026-04-07 01:12:57 +02:00
8 changed files with 162 additions and 896 deletions
+106 -26
@@ -1,40 +1,120 @@
-# ARR Stack Blocklists
-Automatically synchronized blocklists for use with Cleanuparr in the ARR media stack.
-## Files
-| File | Description |
-|------|-------------|
-| `blacklist` | Standard blocklist — blocks all known malicious and unwanted file types |
-| `blacklist_permissive` | Permissive blocklist — blocks genuinely malicious types with fewer false positives |
-| `whitelist` | Whitelist — only files matching these patterns are allowed |
-| `whitelist_with_subtitles` | Whitelist with subtitle file types included |
-| `*.prev` | Internal sync reference files — do not edit manually |
+# arr/blocklists
+Curated blacklist and whitelist for the ARR media stack. The blacklist is
+synced automatically from upstream Cleanuparr and stripped of anything
+listed in the locally-maintained whitelist, so consumers like qBittorrent
+and Cleanuparr can point at a single raw URL per list and stay in sync.
+See the wiki for full technical reference:
+- [Sync](https://git.hisp.no/arr/blocklists/wiki/Sync)
+-- three-way merge, whitelist exclusion, `.prev` snapshot, edge cases
+- [Lists](https://git.hisp.no/arr/blocklists/wiki/Lists)
+-- the two-file model, pattern semantics, maintaining the whitelist
+- [Consumers](https://git.hisp.no/arr/blocklists/wiki/Consumers)
+-- qBittorrent and Cleanuparr integration, raw URLs, recommended modes
+- [CI and Workflow](https://git.hisp.no/arr/blocklists/wiki/CI-and-Workflow)
+-- scheduled Gitea Actions job, manual dispatch, commit behaviour
+## How it works
+The repository contains two data files:
+| File | Role | Source |
+|---|---|---|
+| `blacklist` | Extensions blocked by downloaders and file cleaners | Synced from upstream, with the whitelist subtracted |
+| `whitelist` | Extensions that must never be blocked or deleted | Locally maintained |
+On every scheduled run the sync script:
+1. Fetches the current upstream blacklist from Cleanuparr.
+2. Detects any manual additions made directly to `blacklist` (three-way
+merge against `blacklist.prev`).
+3. Subtracts every entry listed in `whitelist`.
+4. Writes the result back to `blacklist` and updates `blacklist.prev`.
+The whitelist is the single source of truth for "what I want kept." Adding
+an extension to `whitelist` removes it from `blacklist` on the next sync
+and prevents consumers from blocking or deleting it. See
+[Sync](https://git.hisp.no/arr/blocklists/wiki/Sync) for the full algorithm.
+## Prerequisites
+- A consumer that reads a remote text file of glob patterns (qBittorrent
+excluded file names, Cleanuparr blacklist/whitelist sync, etc.)
+- Network access from that consumer to `git.hisp.no`
+## File structure
+| Path | Purpose |
+|---|---|
+| `blacklist` | Merged output: upstream blacklist minus the whitelist. Consumer-facing |
+| `blacklist.prev` | Snapshot of the last upstream fetch. Baseline for the three-way merge. Do not edit |
+| `whitelist` | Locally-maintained allow list. Edit directly to add or remove entries |
+| `scripts/merge_blocklists.py` | Sync script executed by the scheduled workflow |
+| `.gitea/workflows/sync.yml` | Scheduled Gitea Actions workflow |
 ## Usage
-Point Cleanuparr's Malware Blocker and Blacklist Sync at the raw URL of your chosen file:
+Point your consumer at the raw URL of the file it should use.
+### qBittorrent
+qBittorrent has no whitelist feature, so it consumes the blacklist directly.
+Set the excluded file names list (Options -> Downloads -> Excluded file
+names) to:
 ```
-https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist_permissive
+https://git.hisp.no/arr/blocklists/raw/branch/main/blacklist
 ```
-## Sync
-Files are automatically synchronized from the upstream [Cleanuparr](https://github.com/Cleanuparr/Cleanuparr) repository every 6 hours via Gitea Actions.
-The sync uses a three-way merge strategy:
-- Upstream additions are automatically included
-- Upstream removals are automatically removed
-- Your custom additions are preserved across every sync
-## Custom Entries
-To add your own entries, edit the relevant file directly in Gitea. Your additions will be detected as custom entries and preserved on every subsequent sync.
-## Upstream Source
-Blocklists are sourced from:
+Because the whitelist is already subtracted from this file, any extension
+you add to `whitelist` stops being blocked by qBittorrent on the next sync.
+### Cleanuparr
+Cleanuparr supports both blacklist and whitelist modes. Use whichever
+matches your setup:
+- **Blacklist mode** -- point at the same `blacklist` raw URL as qBittorrent.
+- **Whitelist mode** -- point at the `whitelist` raw URL:
 ```
-https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/
+https://git.hisp.no/arr/blocklists/raw/branch/main/whitelist
 ```
+See [Consumers](https://git.hisp.no/arr/blocklists/wiki/Consumers) for
+recommended mode per feature.
+## Maintaining the whitelist
+Edit `whitelist` directly in Gitea or via a local clone. One glob pattern
+per line, sorted, no blank lines. Patterns are matched against the blacklist
+with exact-string set subtraction:
+- `*.srt` in `whitelist` removes `*.srt` from `blacklist`.
+- `*sample.srt` in `blacklist` is not affected by `*.srt` in `whitelist`.
+Sample-file patterns are preserved because exact-string subtraction only
+removes identical entries.
+See [Lists](https://git.hisp.no/arr/blocklists/wiki/Lists) for the full
+pattern rules and examples.
+## Sync schedule
+The Gitea Actions workflow runs every 7 days at 04:00 UTC and on manual
+dispatch. Each run:
+1. Executes `scripts/merge_blocklists.py`.
+2. Commits `blacklist` and `blacklist.prev` if either changed.
+3. Pushes the commit to `main`.
+See [CI and Workflow](https://git.hisp.no/arr/blocklists/wiki/CI-and-Workflow)
+for workflow details and manual dispatch instructions.
+## Upstream source
+The blacklist is sourced from:
+```
+https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist
+```
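The four sync steps and the exact-string rule described in the README reduce to plain set arithmetic. The sketch below is illustrative only: `sync_blacklist` is a hypothetical helper name and the sample patterns are invented, whereas the committed script works on files rather than in-memory sets.

```python
def sync_blacklist(upstream_new, upstream_prev, local, whitelist):
    """Hypothetical helper: apply the README's sync steps to in-memory sets."""
    # Entries in the committed blacklist that the previous upstream
    # snapshot did not contain are treated as manual additions.
    custom = local - upstream_prev
    # Current upstream plus the preserved manual additions.
    merged = upstream_new | custom
    # Exact-string subtraction: patterns are compared as literal strings,
    # they are not expanded as globs.
    return merged - whitelist


# Sample data, invented for illustration only.
upstream_prev = {"*.exe", "*.srt", "*sample.srt"}
upstream_new = {"*.exe", "*.srt", "*sample.srt", "*.scr"}
local = {"*.exe", "*.srt", "*sample.srt", "*.lnk"}
whitelist = {"*.srt"}

result = sync_blacklist(upstream_new, upstream_prev, local, whitelist)
print(sorted(result))  # ['*.exe', '*.lnk', '*.scr', '*sample.srt']
```

`*.srt` is dropped because it appears verbatim in the whitelist, `*sample.srt` survives because no glob expansion takes place, and `*.lnk` survives as a manual addition.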
-410
@@ -1,410 +0,0 @@
*.000
*.001
*.002
*.004
*.7z
*.7z.001
*.7z.002
*.a00
*.a01
*.a02
*.ace
*.ain
*.alz
*.ana
*.apex
*.apk
*.apz
*.ar
*.arc
*.archiver
*.arduboy
*.arh
*.ari
*.arj
*.ark
*.asice
*.ayt
*.b1
*.b64
*.b6z
*.ba
*.bat
*.bdoc
*.bh
*.bin
*.bmp
*.bndl
*.boo
*.bundle
*.bz
*.bz2
*.bza
*.bzabw
*.bzip
*.bzip2
*.c00
*.c01
*.c02
*.c10
*.car
*.cb7
*.cba
*.cbr
*.cbt
*.cbz
*.cdz
*.cit
*.cmd
*.com
*.comppkg.hauptwerk.rar
*.comppkg_hauptwerk_rar
*.conda
*.cp9
*.cpgz
*.cpt
*.ctx
*.ctz
*.cxarchive
*.czip
*.daf
*.dar
*.db
*.dd
*.deb
*.dgc
*.dist
*.diz
*.dl_
*.dll
*.dmg
*.dz
*.ecar
*.ecs
*.ecsbx
*.edz
*.efw
*.egg
*.epi
*.etc
*.exe
*.f
*.f3z
*.fcx
*.fp8
*.fzpz
*.gar
*.gca
*.gif
*.gmz
*.gz
*.gz2
*.gza
*.gzi
*.gzip
*.ha
*.hbc
*.hbc2
*.hbe
*.hki
*.hki1
*.hki2
*.hki3
*.hpk
*.hpkg
*.htm
*.htmi
*.html
*.hyp
*.iadproj
*.ice
*.ico
*.ini
*.ipg
*.ipk
*.ish
*.iso
*.isx
*.ita
*.ize
*.j
*.jar
*.jar.pack
*.jex
*.jgz
*.jhh
*.jic
*.jpg
*.js
*.jsonlz4
*.kextraction
*.kgb
*.ksp
*.kwgt
*.kz
*.layout
*.lbr
*.lemon
*.lha
*.lhzd
*.libzip
*.link
*.lnk
*.lpkg
*.lqr
*.lz
*.lz4
*.lzh
*.lzm
*.lzma
*.lzo
*.lzr
*.lzx
*.mar
*.mbz
*.md
*.memo
*.mint
*.mlproj
*.mou
*.movpkg
*.mozlz4
*.mpkg
*.msi
*.mxc
*.mzp
*.nar
*.nex
*.nfo
*.npk
*.nz
*.oar
*.odlgz
*.opk
*.osf
*.oz
*.p01
*.p19
*.p7z
*.pa
*.pack.gz
*.package
*.pae
*.pak
*.paq6
*.paq7
*.paq8
*.paq8f
*.paq8l
*.paq8p
*.par
*.par2
*.pax
*.pbi
*.pcv
*.pea
*.perl
*.pet
*.pf
*.php
*.pim
*.pima
*.pit
*.piz
*.pkg
*.pkg.tar.xz
*.pkg.tar.zst
*.pkz
*.pl
*.png
*.prs
*.ps1
*.psc1
*.psd1
*.psm1
*.psz
*.pup
*.puz
*.pvmp
*.pvmz
*.pwa
*.pxl
*.py
*.pyd
*.q
*.qda
*.r0
*.r00
*.r01
*.r02
*.r03
*.r04
*.r1
*.r2
*.r21
*.r30
*.rar
*.rb
*.readme
*.reg
*.rev
*.rk
*.rnc
*.rp9
*.rpm
*.rss
*.run
*.rz
*.s00
*.s01
*.s02
*.s09
*.s7z
*.sar
*.sbx
*.scr
*.sdc
*.sdn
*.sdoc
*.sdocx
*.sea
*.sen
*.sfg
*.sfm
*.sfs
*.sfx
*.sh
*.shar
*.shk
*.shr
*.sifz
*.sipa
*.sit
*.sitx
*.smpf
*.snagitstamps
*.snappy
*.snb
*.snz
*.spa
*.spd
*.spl
*.spm
*.spt
*.sqf
*.sql
*.sqx
*.sqz
*.srep
*.stg
*.stkdoodlz
*.stproj
*.sy_
*.tar.bz2
*.tar.gz
*.tar.gz2
*.tar.lz
*.tar.lzma
*.tar.xz
*.tar.z
*.tar.zip
*.taz
*.tbz
*.tbz2
*.tcx
*.text
*.tg
*.tgs
*.tgz
*.thumb
*.tlz
*.tlzma
*.torrent
*.tpsr
*.trs
*.tx_
*.txt
*.txz
*.tz
*.tzst
*.ubz
*.uc2
*.ufdr
*.ufs.uzip
*.uha
*.url
*.uue
*.uvm
*.uzed
*.uzip
*.vbs
*.vem
*.vfs
*.vib
*.vip
*.vmcz
*.vms
*.voca
*.vpk
*.vrpackage
*.vsi
*.vwi
*.wa
*.wacz
*.waff
*.war
*.wastickers
*.wdz
*.whl
*.wick
*.wlb
*.wot
*.wsf
*.wux
*.xapk
*.xar
*.xcf.bz2
*.xcf.gz
*.xcf.xz
*.xcfbz2
*.xcfgz
*.xcfxz
*.xez
*.xfp
*.xip
*.xmcdz
*.xml
*.xoj
*.xopp
*.xx
*.xz
*.xzm
*.y
*.yc
*.yz1
*.z
*.z00
*.z01
*.z02
*.z03
*.z04
*.zabw
*.zap
*.zed
*.zfsendtotarget
*.zhelp
*.zi
*.zi_
*.zim
*.zip
*.zipx
*.zix
*.zl
*.zoo
*.zpaq
*.zpi
*.zsplit
*.zst
*.zw
*.zwi
*.zz
-410
@@ -1,410 +0,0 @@
*.000
*.001
*.002
*.004
*.7z
*.7z.001
*.7z.002
*.a00
*.a01
*.a02
*.ace
*.ain
*.alz
*.ana
*.apex
*.apk
*.apz
*.ar
*.arc
*.archiver
*.arduboy
*.arh
*.ari
*.arj
*.ark
*.asice
*.ayt
*.b1
*.b64
*.b6z
*.ba
*.bat
*.bdoc
*.bh
*.bin
*.bmp
*.bndl
*.boo
*.bundle
*.bz
*.bz2
*.bza
*.bzabw
*.bzip
*.bzip2
*.c00
*.c01
*.c02
*.c10
*.car
*.cb7
*.cba
*.cbr
*.cbt
*.cbz
*.cdz
*.cit
*.cmd
*.com
*.comppkg.hauptwerk.rar
*.comppkg_hauptwerk_rar
*.conda
*.cp9
*.cpgz
*.cpt
*.ctx
*.ctz
*.cxarchive
*.czip
*.daf
*.dar
*.db
*.dd
*.deb
*.dgc
*.dist
*.diz
*.dl_
*.dll
*.dmg
*.dz
*.ecar
*.ecs
*.ecsbx
*.edz
*.efw
*.egg
*.epi
*.etc
*.exe
*.f
*.f3z
*.fcx
*.fp8
*.fzpz
*.gar
*.gca
*.gif
*.gmz
*.gz
*.gz2
*.gza
*.gzi
*.gzip
*.ha
*.hbc
*.hbc2
*.hbe
*.hki
*.hki1
*.hki2
*.hki3
*.hpk
*.hpkg
*.htm
*.htmi
*.html
*.hyp
*.iadproj
*.ice
*.ico
*.ini
*.ipg
*.ipk
*.ish
*.iso
*.isx
*.ita
*.ize
*.j
*.jar
*.jar.pack
*.jex
*.jgz
*.jhh
*.jic
*.jpg
*.js
*.jsonlz4
*.kextraction
*.kgb
*.ksp
*.kwgt
*.kz
*.layout
*.lbr
*.lemon
*.lha
*.lhzd
*.libzip
*.link
*.lnk
*.lpkg
*.lqr
*.lz
*.lz4
*.lzh
*.lzm
*.lzma
*.lzo
*.lzr
*.lzx
*.mar
*.mbz
*.md
*.memo
*.mint
*.mlproj
*.mou
*.movpkg
*.mozlz4
*.mpkg
*.msi
*.mxc
*.mzp
*.nar
*.nex
*.nfo
*.npk
*.nz
*.oar
*.odlgz
*.opk
*.osf
*.oz
*.p01
*.p19
*.p7z
*.pa
*.pack.gz
*.package
*.pae
*.pak
*.paq6
*.paq7
*.paq8
*.paq8f
*.paq8l
*.paq8p
*.par
*.par2
*.pax
*.pbi
*.pcv
*.pea
*.perl
*.pet
*.pf
*.php
*.pim
*.pima
*.pit
*.piz
*.pkg
*.pkg.tar.xz
*.pkg.tar.zst
*.pkz
*.pl
*.png
*.prs
*.ps1
*.psc1
*.psd1
*.psm1
*.psz
*.pup
*.puz
*.pvmp
*.pvmz
*.pwa
*.pxl
*.py
*.pyd
*.q
*.qda
*.r0
*.r00
*.r01
*.r02
*.r03
*.r04
*.r1
*.r2
*.r21
*.r30
*.rar
*.rb
*.readme
*.reg
*.rev
*.rk
*.rnc
*.rp9
*.rpm
*.rss
*.run
*.rz
*.s00
*.s01
*.s02
*.s09
*.s7z
*.sar
*.sbx
*.scr
*.sdc
*.sdn
*.sdoc
*.sdocx
*.sea
*.sen
*.sfg
*.sfm
*.sfs
*.sfx
*.sh
*.shar
*.shk
*.shr
*.sifz
*.sipa
*.sit
*.sitx
*.smpf
*.snagitstamps
*.snappy
*.snb
*.snz
*.spa
*.spd
*.spl
*.spm
*.spt
*.sqf
*.sql
*.sqx
*.sqz
*.srep
*.stg
*.stkdoodlz
*.stproj
*.sy_
*.tar.bz2
*.tar.gz
*.tar.gz2
*.tar.lz
*.tar.lzma
*.tar.xz
*.tar.z
*.tar.zip
*.taz
*.tbz
*.tbz2
*.tcx
*.text
*.tg
*.tgs
*.tgz
*.thumb
*.tlz
*.tlzma
*.torrent
*.tpsr
*.trs
*.tx_
*.txt
*.txz
*.tz
*.tzst
*.ubz
*.uc2
*.ufdr
*.ufs.uzip
*.uha
*.url
*.uue
*.uvm
*.uzed
*.uzip
*.vbs
*.vem
*.vfs
*.vib
*.vip
*.vmcz
*.vms
*.voca
*.vpk
*.vrpackage
*.vsi
*.vwi
*.wa
*.wacz
*.waff
*.war
*.wastickers
*.wdz
*.whl
*.wick
*.wlb
*.wot
*.wsf
*.wux
*.xapk
*.xar
*.xcf.bz2
*.xcf.gz
*.xcf.xz
*.xcfbz2
*.xcfgz
*.xcfxz
*.xez
*.xfp
*.xip
*.xmcdz
*.xml
*.xoj
*.xopp
*.xx
*.xz
*.xzm
*.y
*.yc
*.yz1
*.z
*.z00
*.z01
*.z02
*.z03
*.z04
*.zabw
*.zap
*.zed
*.zfsendtotarget
*.zhelp
*.zi
*.zi_
*.zim
*.zip
*.zipx
*.zix
*.zl
*.zoo
*.zpaq
*.zpi
*.zsplit
*.zst
*.zw
*.zwi
*.zz
+51 -33
@@ -1,49 +1,67 @@
+"""Sync the blacklist from upstream Cleanuparr, preserving manual local
+additions and stripping entries listed in the locally-maintained whitelist.
+See the wiki (Sync) for the full algorithm and rationale.
+"""
 import urllib.request
-import os
-files = {
-    "blacklist": "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist",
-    "blacklist_permissive": "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist_permissive",
-    "whitelist": "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/whitelist",
-    "whitelist_with_subtitles": "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/whitelist_with_subtitles",
-}
-def merge_blocklist(filename, url):
-    prev_file = f"{filename}.prev"
-    # Fetch new upstream
-    with urllib.request.urlopen(url) as r:
-        upstream_new = set(line.strip() for line in r.read().decode().splitlines() if line.strip())
-    # Read previous upstream (empty set if first run)
+UPSTREAM_URL = "https://raw.githubusercontent.com/Cleanuparr/Cleanuparr/main/blacklist"
+BLACKLIST = "blacklist"
+BLACKLIST_PREV = "blacklist.prev"
+WHITELIST = "whitelist"
+def read_lines(path):
+    """Read a file into a set of non-empty stripped lines. Empty set if missing."""
     try:
-        with open(prev_file) as f:
-            upstream_prev = set(line.strip() for line in f if line.strip())
+        with open(path) as f:
+            return set(line.strip() for line in f if line.strip())
     except FileNotFoundError:
+        return set()
+def main():
+    # Fetch the current upstream blacklist
+    with urllib.request.urlopen(UPSTREAM_URL) as r:
+        upstream_new = set(
+            line.strip() for line in r.read().decode().splitlines() if line.strip()
+        )
+    # Previous upstream snapshot: baseline for detecting local additions.
+    # On first run (no snapshot on disk), use the current upstream as the
+    # baseline so nothing is treated as a local addition.
+    upstream_prev = read_lines(BLACKLIST_PREV)
+    if not upstream_prev:
         upstream_prev = upstream_new.copy()
-    # Read current local file
-    try:
-        with open(filename) as f:
-            local = set(line.strip() for line in f if line.strip())
-    except FileNotFoundError:
-        local = set()
-    # Three-way merge
+    # Current committed blacklist (may contain manual local additions)
+    local = read_lines(BLACKLIST)
+    # Locally-maintained whitelist (exclusion source)
+    whitelist = read_lines(WHITELIST)
+    # Three-way merge: anything in local but not in the previous upstream
+    # snapshot is a manual local addition that must be preserved.
     custom = local - upstream_prev
-    result = upstream_new | custom
-    print(f"[{filename}] Custom preserved: {sorted(custom)}")
-    print(f"[{filename}] Upstream added: {sorted(upstream_new - upstream_prev)}")
-    print(f"[{filename}] Upstream removed: {sorted(upstream_prev - upstream_new)}")
-    # Write merged result sorted
-    with open(filename, "w") as f:
+    merged = upstream_new | custom
+    # Strip whitelist entries from the merged result.
+    result = merged - whitelist
+    # Reporting for the workflow log
+    print(f"[{BLACKLIST}] Upstream added: {sorted(upstream_new - upstream_prev)}")
+    print(f"[{BLACKLIST}] Upstream removed: {sorted(upstream_prev - upstream_new)}")
+    print(f"[{BLACKLIST}] Custom preserved: {sorted(custom)}")
+    print(f"[{BLACKLIST}] Whitelist stripped: {sorted(merged & whitelist)}")
+    # Write the merged blacklist, sorted for deterministic diffs
+    with open(BLACKLIST, "w") as f:
         f.write("\n".join(sorted(result)) + "\n")
-    # Store new upstream as prev for next run
-    with open(prev_file, "w") as f:
+    # Store the new upstream snapshot for the next run
+    with open(BLACKLIST_PREV, "w") as f:
         f.write("\n".join(sorted(upstream_new)) + "\n")
-for filename, url in files.items():
-    merge_blocklist(filename, url)
+if __name__ == "__main__":
+    main()
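One consequence of the baseline fallback in `main()` is easy to miss: when `blacklist.prev` is missing or empty, anything in `blacklist` that the current upstream lacks still counts as a manual addition. A minimal sketch of just that branch follows; the helper names `choose_baseline` and `custom_entries` and the sample patterns are hypothetical and not part of the committed script.

```python
def choose_baseline(upstream_prev, upstream_new):
    """Fall back to the current upstream when no snapshot is available."""
    # An empty set covers both a missing and an empty snapshot file,
    # because read_lines() in the script returns set() in either case.
    return upstream_prev if upstream_prev else set(upstream_new)


def custom_entries(local, upstream_prev, upstream_new):
    """Entries that would be treated as manual additions for this run."""
    return local - choose_baseline(upstream_prev, upstream_new)


# Sample data, invented for illustration only.
upstream_new = {"*.exe", "*.scr"}
local = {"*.exe", "*.old", "*.mine"}

# First run: no snapshot, so the baseline is the current upstream.
first_run = custom_entries(local, set(), upstream_new)

# Later run: the snapshot still lists "*.old", which upstream has since dropped.
later_run = custom_entries(local, {"*.exe", "*.old"}, upstream_new)

print(sorted(first_run))  # ['*.mine', '*.old']
print(sorted(later_run))  # ['*.mine']
```

With a snapshot, `*.old` is recognised as an upstream removal and dropped. Without one, it is kept as if it were a manual addition.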
+5
@@ -1,3 +1,8 @@
+*.ass
 *.avi
 *.mkv
 *.mp4
+*.srt
+*.ssa
+*.sub
+*.webm
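The README asks for one glob pattern per line, sorted, with no blank lines in `whitelist`. A small check along those lines could be run before committing an edit. This is a sketch under assumptions: `check_list_format` is a hypothetical helper, and "sorted" is taken to mean Python's default string ordering, as used by the `sorted()` calls in the sync script.

```python
def check_list_format(text):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    lines = text.splitlines()
    # Whitespace-only lines are treated as blank lines.
    if any(not line.strip() for line in lines):
        problems.append("blank line")
    entries = [line.strip() for line in lines if line.strip()]
    # Assumption: "sorted" means Python's default string ordering.
    if entries != sorted(entries):
        problems.append("not sorted")
    return problems


# The eight entries of the whitelist as it stands after this change.
whitelist_text = "*.ass\n*.avi\n*.mkv\n*.mp4\n*.srt\n*.ssa\n*.sub\n*.webm\n"
print(check_list_format(whitelist_text))      # []
print(check_list_format("*.srt\n\n*.ass\n"))  # ['blank line', 'not sorted']
```

The sync script itself tolerates blank lines and unsorted input because it reads the file into a set, so a check like this only keeps hand edits tidy and diffs small.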
-3
@@ -1,3 +0,0 @@
*.avi
*.mkv
*.mp4
-7
@@ -1,7 +0,0 @@
*.ass
*.avi
*.mkv
*.mp4
*.srt
*.ssa
*.sub
-7
@@ -1,7 +0,0 @@
*.ass
*.avi
*.mkv
*.mp4
*.srt
*.ssa
*.sub