Convert blog (blogspot.com, wordpress.com...) or any website to epub using GUI, CLI or Python.

blog2epub

License: MIT

Convert a blog to epub using the command line or GUI.

My main goal in creating this app is to preserve the legacy of the blogosphere for future generations.

Supported blogs:

  • *.blogspot.com
  • *.wordpress.com
  • multiple other blogs and even some webpages

Main features

  • command line (CLI) and graphical user interface (GUI),
  • downloads all text content of the selected blog into an epub file,
  • includes post comments where possible,
  • images are downsized (to a maximum of 800×600 px) and converted to grayscale,
  • one post = one epub chapter,
  • chapters are sorted by date, ascending,
  • the cover is generated automatically from downloaded images.
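The image and chapter rules above can be illustrated with a small, dependency-free sketch. It is not blog2epub's own code: the `Post` dataclass and the helper names are hypothetical, and it assumes "maximum 800×600 px" means fitting each image inside an 800×600 box while keeping its aspect ratio (never upscaling). The grayscale weights are the ITU-R 601-2 luma coefficients, the same ones Pillow documents for `Image.convert("L")`.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Post:
    """Hypothetical stand-in for one blog post (= one epub chapter)."""
    title: str
    date: datetime
    images: list = field(default_factory=list)


def fit_within(width: int, height: int, max_w: int = 800, max_h: int = 600) -> tuple:
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio.

    The 1.0 cap means images that already fit are left unchanged.
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_gray(r: int, g: int, b: int) -> int:
    """ITU-R 601-2 luma transform: one 0-255 gray level from an RGB pixel."""
    return (r * 299 + g * 587 + b * 114) // 1000


def chapter_order(posts: list) -> list:
    """One post per chapter, oldest post first (date ascending)."""
    return sorted(posts, key=lambda p: p.date)


if __name__ == "__main__":
    print(fit_within(1600, 1200))  # (800, 600)
    print(fit_within(1000, 2000))  # (300, 600): height is the limiting side
    print(fit_within(640, 480))    # (640, 480): already fits, not upscaled
    print(to_gray(255, 255, 255))  # 255
```

Sorting on the post date alone is enough for chapter order because every post becomes exactly one chapter.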

Example covers

Installation

Check out the latest available builds on the GitHub releases page.

Running from sources

Easiest way

pip install git+https://github.com/bohdanbobrowski/blog2epub.git
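To confirm that the install worked, the installed version can be read with the standard library. This is a minimal sketch that assumes the distribution is named `blog2epub` (the repository name); `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "blog2epub") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"blog2epub {found}" if found else "blog2epub is not installed in this environment")
```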

Developer environment

git clone git@github.com:bohdanbobrowski/blog2epub.git
cd blog2epub
python -m venv venv

Windows:

venv\Scripts\activate
pip install -e .[dev]

macOS/Linux (the quotes keep zsh, the default macOS shell, from treating [dev] as a glob pattern):

source ./venv/bin/activate
pip install -e ".[dev]"

Building your own executable

Windows

python blog2epub_build_windows.py

Finally, you can run NSIS to build the Windows installer:

"C:\Program Files (x86)\NSIS\makensis" blog2epub_windows_installer.nsi

macOS

python blog2epub_build_macos.py

Then, to create a DMG image containing the app:

./make_macos_dmg.sh

Android

Before you start, you'll need to install buildozer by following its official installation documentation.

buildozer -v android

Screenshots of GUI

Android (Google Pixel 6a)

Windows (11)

Linux (Manjaro Gnome)

macOS (Sonoma 14.4.1)

CLI

blog2epub --help
usage: Blog2epub Cli interface [-h] [-l LIMIT] [-s SKIP] [-q QUALITY] [-o OUTPUT] [-d] url

Convert blog (blogspot.com, wordpress.com or another based on Wordpress) to epub using CLI or GUI.

positional arguments:
  url                   url of blog to download

options:
  -h, --help            show this help message and exit
  -l LIMIT, --limit LIMIT
                        articles limit
  -s SKIP, --skip SKIP  number of skipped articles
  -q QUALITY, --quality QUALITY
                        images quality (0-100)
  -o OUTPUT, --output OUTPUT
                        output epub file name
  -d, --debug           turn on debug
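The help text above maps directly onto Python's `argparse`. The sketch below re-creates the listed options to show how they parse, including the `-l=2` form used in the example that follows. It is a hypothetical reconstruction, not the project's parser; in particular the defaults (None for the numeric options) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the options listed by `blog2epub --help`."""
    parser = argparse.ArgumentParser(
        prog="Blog2epub Cli interface",
        description="Convert blog (blogspot.com, wordpress.com or another based on "
                    "Wordpress) to epub using CLI or GUI.",
    )
    parser.add_argument("url", help="url of blog to download")
    # Assumed defaults: None means "not set" for every numeric option
    parser.add_argument("-l", "--limit", type=int, help="articles limit")
    parser.add_argument("-s", "--skip", type=int, help="number of skipped articles")
    parser.add_argument("-q", "--quality", type=int, help="images quality (0-100)")
    parser.add_argument("-o", "--output", help="output epub file name")
    parser.add_argument("-d", "--debug", action="store_true", help="turn on debug")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["starybezpiek.blogspot.com", "-l=2", "-o=example.epub"])
    print(args)
```

argparse accepts `-l=2`, `-l 2` and `--limit=2` interchangeably, which is why both styles appear in the examples.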

Example:

blog2epub starybezpiek.blogspot.com -l=2 -o=example.epub
Starting blogger.com crawler
Found 54 articles to crawl.
Downloading.
1. 10 lat kremlowskiej propagandy, czyli RT ujawnia swoje sekrety
Downloading.
2. "Komunę obaliliśmy, a nadal jest źle. Dlaczego?" Czyli 1984 Orwella właściwie odczytany
Locale set as en_US.UTF-8
Generating cover (800px*600px) from 1 images.
Cover generated: .\starybezpiek.blogspot.com\example.epub.jpg
Epub created: .\example.epub

Examples

blog2epub starybezpiek.blogspot.com
blog2epub velosov.blogspot.com -l=10
blog2epub poznanskiehistorie.blogspot.com -q=100
blog2epub classicameras.blogspot.com --limit=10
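To convert several blogs in one go, the CLI can be driven from Python with `subprocess`. This sketch assumes the `blog2epub` executable is on `PATH` and uses only options shown in the help output; `build_command` is a hypothetical helper.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(url: str, limit: Optional[int] = None, quality: Optional[int] = None,
                  output: Optional[str] = None) -> List[str]:
    """Assemble a blog2epub command line from the documented long options."""
    cmd = ["blog2epub", url]
    if limit is not None:
        cmd.append(f"--limit={limit}")
    if quality is not None:
        cmd.append(f"--quality={quality}")
    if output is not None:
        cmd.append(f"--output={output}")
    return cmd


if __name__ == "__main__":
    jobs = [
        build_command("velosov.blogspot.com", limit=10),
        build_command("poznanskiehistorie.blogspot.com", quality=100),
    ]
    if shutil.which("blog2epub") is None:
        print("blog2epub not found on PATH")
    else:
        for cmd in jobs:
            # check=True raises CalledProcessError if a conversion fails
            subprocess.run(cmd, check=True)
```

Passing a list (rather than a single string) avoids shell quoting issues with URLs and file names.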

Running tests

pytest ./tests
pytest --cov=blog2epub ./tests
pytest --cov=blog2epub --cov-report=html ./tests
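pytest collects plain functions whose names start with `test_` from `test_*.py` files under `./tests`, so a new unit test needs no boilerplate. In the example below, the file name and the `blog_domain` helper are hypothetical and are defined inline only so the example runs on its own.

```python
# tests/test_example.py (hypothetical file; pytest collects test_*.py files)
from urllib.parse import urlparse


def blog_domain(url: str) -> str:
    """Hypothetical helper: normalise a blog address to its lower-case host name."""
    # urlparse only fills .netloc when a scheme is present, so add one if missing
    if "://" not in url:
        url = "https://" + url
    return urlparse(url).netloc.lower()


def test_blog_domain_without_scheme():
    assert blog_domain("starybezpiek.blogspot.com") == "starybezpiek.blogspot.com"


def test_blog_domain_with_scheme_and_path():
    assert blog_domain("https://Velosov.blogspot.com/2020/01/post.html") == "velosov.blogspot.com"
```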

Current version

v1.5.0 - Release Candidate 2

  • integration testing
  • increase unit test coverage
  • use sitemaps.xml for scraping
  • crawlers refactor
    • use data models
    • more common methods in crawler class
    • expand crawler abstract
  • CLI interface refactor
  • Greek alphabet support
  • image download and attachment bug solved (e.g. modernistyczny-poznan.blogspot.com)
  • improved resistance to HTTP errors
  • dedicated crawler class for zeissikonveb.de
  • (GUI) the skip value is increased by the limit value (if one is set)
  • download progress is much more verbose; in the GUI, downloads can be cancelled at any time
  • removed Poetry, as it was overcomplicated for this use case
  • Windows installer!
  • results of cancelled downloads can be converted to epub
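Two items in this list lend themselves to a short illustration: reading post URLs from a sitemap and paging through them with skip and limit. The sketch below is not the crawler's code; it assumes a standard sitemaps.org `<urlset>` document, and all function names are hypothetical. `next_skip` models the GUI behaviour described above, where the skip value grows by the limit after each run.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> List[str]:
    """Return the <loc> value of every <url> entry in a sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def select_window(urls: List[str], skip: int = 0, limit: Optional[int] = None) -> List[str]:
    """Drop the first `skip` URLs, then keep at most `limit` of the rest."""
    remaining = urls[skip:]
    return remaining if limit is None else remaining[:limit]


def next_skip(skip: int, limit: Optional[int]) -> int:
    """Advance skip by limit (when a limit is set) so the next run continues."""
    return skip + limit if limit else skip


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.blogspot.com/a.html</loc></url>"
        "<url><loc>https://example.blogspot.com/b.html</loc></url>"
        "<url><loc>https://example.blogspot.com/c.html</loc></url>"
        "</urlset>"
    )
    urls = urls_from_sitemap(sitemap)
    skip, limit = 0, 2
    print(select_window(urls, skip, limit))  # first two posts
    skip = next_skip(skip, limit)
    print(select_window(urls, skip, limit))  # the remaining post
```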

Complete change log: CHANGELOG.md

Project backlog

And finally, a list of known bugs and future plans for new features and enhancements: BACKLOG.md

Project road map:

  • 1.0 - somewhat working
  • 2.0 - fully working project, 90% unit tested, with available builds for Android/Windows/Linux/macOS