GitHub - tgys/cryobankscraper: scrapes donors from fairfax cryobank and displays them in a gallery · GitHub
cryobankscraper
C++ scraper for Fairfax Cryobank donor profile pages. It fetches donor URLs, extracts photos from each profile, and writes a JSON gallery you can browse in the Qt app.
Build
Requires Qt 6 (Core, Gui, Network, Widgets, Concurrent) and CMake 3.20+.
cmake -S . -B build
cmake --build build
Binaries:
- build/scrapeff: default CLI scraper (Chromium-style HTTP, parallel workers)
- build/scrape_requests: same scraper with Python requests-compatible HTTP (single-threaded)
- build/gallery_qt: desktop gallery; runs a scrape on launch if gallery_data.json is missing
How it scrapes
1. Build a URL list in the working directory:
   - Reads base_urls.txt (default: https://fairfaxcryobank.com/search/donorprofile.aspx?number=).
   - For Fairfax profiles, also harvests links from the “meet our newest donors” listing page.
   - Generates candidate URLs for donor IDs 0 to 9999 (override with SCRAPEFF_FAIRFAX_MAX_ID), trying both padded (0428) and unpadded (428) number= values.
   - Merges any extra URLs from target_urls.txt.
   - Writes the final list to target_urls.txt.
2. Fetch each profile page over HTTPS (retries, optional pacing via SCRAPEFF_HTTP_SPACING_MS).
3. Parse HTML with Gumbo: find images inside div.main → div.foto (or #main / #foto).
4. Download each image and verify it decodes as a real image. One gallery tile per donor profile ID is kept.
5. Write outputs to the working directory:
   - gallery_data.json: main gallery data (page, src, did, profile_number)
   - gallery.html: static HTML gallery
   - image_urls.txt, fetch_failed.txt, profile_urls_before_image_extract.txt
   - childhood_photo_did_counts.json: counts per ChildhoodPhoto.ashx?did= id
Donors with a real photo use ChildhoodPhoto.ashx?did=…. Donors without one show a shared placeholder (search-temp-01.jpg).
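The candidate-generation step can be sketched roughly as below. This is an illustration, not the repository's code: candidate_urls is a hypothetical helper, the four-digit zero padding is inferred from the 0428 example, and treating the upper bound as exclusive is an assumption.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the repository): builds candidate profile
// URLs for donor IDs in [0, max_id). Assumes padded IDs are zero-filled to
// four digits and that the upper bound is exclusive.
std::vector<std::string> candidate_urls(const std::string& base, int max_id) {
    std::vector<std::string> urls;
    for (int id = 0; id < max_id; ++id) {
        const std::string unpadded = std::to_string(id);
        char buf[16];
        std::snprintf(buf, sizeof buf, "%04d", id);
        const std::string padded = buf;
        urls.push_back(base + padded);
        // IDs with four or more digits look the same in both forms,
        // so the unpadded variant is added only when it differs.
        if (padded != unpadded) {
            urls.push_back(base + unpadded);
        }
    }
    return urls;
}
```

With the default base URL, ID 428 would yield both ...?number=0428 and ...?number=428, which matches the two forms described in step 1.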
Gallery
gallery_qt loads gallery_data.json, or runs a scrape first if the file is missing. By default it shows only donors with a real profile photo (a ChildhoodPhoto URL), sorted by donor ID in ascending order. Use Filter → All donors to include donors that only have the placeholder image.
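A minimal sketch of that default view, under stated assumptions: the Tile struct mirrors the four gallery_data.json fields, but its C++ types, the helper names has_real_photo and default_view, and the use of profile_number as the sort key are all hypothetical and not taken from the repository.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical in-memory form of one gallery_data.json entry.
// Field names follow the README; the types are assumptions.
struct Tile {
    std::string page;
    std::string src;
    std::string did;
    int profile_number;
};

// A tile counts as a real photo when its src points at ChildhoodPhoto.ashx?did=.
bool has_real_photo(const Tile& t) {
    return t.src.find("ChildhoodPhoto.ashx?did=") != std::string::npos;
}

// Drops placeholder tiles, then sorts the rest by donor ID, lowest first.
std::vector<Tile> default_view(std::vector<Tile> tiles) {
    tiles.erase(std::remove_if(tiles.begin(), tiles.end(),
                               [](const Tile& t) { return !has_real_photo(t); }),
                tiles.end());
    std::stable_sort(tiles.begin(), tiles.end(),
                     [](const Tile& a, const Tile& b) {
                         return a.profile_number < b.profile_number;
                     });
    return tiles;
}
```

The "All donors" filter would correspond to skipping the erase step and sorting the full list.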
Useful environment variables
| Variable | Purpose |
| --- | --- |
| SCRAPEFF_THREADS | Parallel workers for scrapeff (default 4) |
| SCRAPEFF_FAIRFAX_MAX_ID | Upper bound for brute-force donor IDs (default 10000) |
| SCRAPEFF_FAIRFAX_SKIP_BRUTE_IDS=1 | Only scrape URLs found on the listing page |
| SCRAPEFF_TARGET_URLS_ONLY=1 | Use target_urls.txt only, skip generation |
| SCRAPEFF_HTTP_PROFILE=urllib | Python-requests-style client (or use scrape_requests) |
Run the CLI from the directory where you want output files written (target_urls.txt, gallery_data.json, etc.).
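One way such variables could be read with their documented defaults is sketched below. The helpers env_int and env_flag are hypothetical, and falling back to the default on empty, non-numeric, or non-positive values is an assumption; the actual parsing in the source tree may differ.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: reads a positive integer from the environment and
// returns `fallback` when the variable is unset, empty, or not a valid
// positive int.
int env_int(const char* name, int fallback) {
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') {
        return fallback;
    }
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    if (errno == ERANGE || *end != '\0' || value <= 0 || value > INT_MAX) {
        return fallback;
    }
    return static_cast<int>(value);
}

// Hypothetical helper: a flag is on only when the variable is exactly "1".
bool env_flag(const char* name) {
    const char* raw = std::getenv(name);
    return raw != nullptr && std::string(raw) == "1";
}
```

Under these assumptions, env_int("SCRAPEFF_THREADS", 4) and env_int("SCRAPEFF_FAIRFAX_MAX_ID", 10000) would reproduce the defaults in the table, and env_flag("SCRAPEFF_TARGET_URLS_ONLY") would gate URL generation.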