<p align="center">
<img src="./assets/blog2epub_256px.png" width="256" height="256" />
</p>

# blog2epub

[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/bohdanbobrowski/blog2epub/graphs/commit-activity) [![MIT license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/) ![GitHub all releases](https://img.shields.io/github/downloads/bohdanbobrowski/blog2epub/total) ![GitHub release (with filter)](https://img.shields.io/github/v/release/bohdanbobrowski/blog2epub) ![GitHub Release Date - Published_At](https://img.shields.io/github/release-date/bohdanbobrowski/blog2epub)

Convert a blog to an epub file using the command line or the GUI.

> My main goal in creating this app is to preserve the legacy of the blogosphere for future generations.

### Supported blogs:

- *.blogspot.com
- *.wordpress.com
- many other blogs, and even some regular web pages

### Main features

- command line interface (CLI) and graphical user interface (GUI),
- downloads all text content of the selected blog into an epub file,
- includes post comments where possible,
- images are downsized (to a maximum of 800/600px) and converted to grayscale,
- one post = one epub chapter,
- chapters are sorted by date, ascending,
- the cover is generated automatically from downloaded images.
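
The image handling in the feature list can be illustrated with a little arithmetic. The sketch below assumes that "800/600px" means fitting each image inside an 800x600 box while keeping its aspect ratio, and that grayscale uses the common ITU-R BT.601 luma weights. Both are assumptions made for illustration, not blog2epub's actual implementation, and `fit_within` and `to_gray` are hypothetical names.

```python
from typing import Tuple

# Limits quoted in the feature list; treating them as a bounding box is an assumption.
MAX_WIDTH, MAX_HEIGHT = 800, 600


def fit_within(width: int, height: int) -> Tuple[int, int]:
    """Scale (width, height) down to fit the box, never enlarging the image."""
    scale = min(MAX_WIDTH / width, MAX_HEIGHT / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_gray(red: int, green: int, blue: int) -> int:
    """Convert one RGB pixel (0-255 per channel) to a single gray level."""
    # ITU-R BT.601 luma weights (assumed here, not taken from blog2epub).
    return round(0.299 * red + 0.587 * green + 0.114 * blue)
```

For example, `fit_within(1600, 1200)` returns `(800, 600)`, while an image that already fits, such as 400x300, is left at its original size.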

### Example covers

<table style="width:100%;text-align:center;"><tr><td>
<img src="assets/v1.5.0/archaia-ellada_blogspot_com_2014.11.01-2014.12.01.epub.jpg" width="400" style="margin:0 10px 10px 0" />
</td><td>
<img src="assets/v1.5.0/boston1775_blogspot_com_2024.11.10-2024.11.14.epub.jpg" width="400" style="margin:0 10px 10px 0" />
</td></tr><tr><td>
<img src="assets/v1.5.0/velosov_blogspot_com_2013.02.22-2013.03.10.epub.jpg" width="400" style="margin:0 10px 10px 0" />
</td><td>
<img src="assets/v1.5.0/zeissikonveb_de_2021.04.10-2024.10.19.epub.jpg" width="400" style="margin:0 10px 10px 0;" />
</td></tr></table>

## Installation

Check out the latest available [builds](https://github.com/bohdanbobrowski/blog2epub/releases).

### Running from sources

#### Easiest way

    pip install git+https://github.com/bohdanbobrowski/blog2epub.git

#### Developer environment

    git clone git@github.com:bohdanbobrowski/blog2epub.git
    cd blog2epub
    python -m venv venv

##### Windows:

    venv\Scripts\activate
    pip install -e .[dev]

##### macOS/Linux:

    source ./venv/bin/activate
    pip install -e ".[dev]"

The quotes keep shells such as zsh from treating `[dev]` as a glob pattern.
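
To confirm that the editable install will land in the virtual environment rather than in the system Python, a quick check can be run after activation. This is a minimal sketch using only the standard library. `in_virtualenv` is a hypothetical helper name and is not part of blog2epub.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```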

### Building your own executable

#### Windows

    python blog2epub_build_windows.py

Finally, you can run NSIS to build the Windows installer:

    "C:\Program Files (x86)\NSIS\makensis" blog2epub_windows_installer.nsi

#### macOS

    python blog2epub_build_macos.py

Then, to create a dmg image with the app:

    ./make_macos_dmg.sh

#### Android

Before you start, install buildozer by following its [installation documentation](https://buildozer.readthedocs.io/en/latest/installation.html).

    buildozer -v android

## Screenshots of GUI

### Android (Google Pixel 6a)

<p align="center">
<img src="assets/v1.5.0/blog2pub_android_pixel6a_screenshot1.png" width="200px" />
<img src="assets/v1.5.0/blog2pub_android_pixel6a_screenshot2.png" width="200px" />
<img src="assets/v1.5.0/blog2pub_android_pixel6a_screenshot3.png" width="200px" />
<img src="assets/v1.5.0/blog2pub_android_pixel6a_screenshot4.png" width="200px" />
</p>

### Windows (11)

<p align="center">
<img src="assets/v1.5.0/blog2epub_win11_screenshot.png" width="600px" />
</p>

### Linux (Manjaro Gnome)

<p align="center">
<img src="assets/v1.3.0/blog2epub_linux_screenshot.png" width="600px" />
</p>

### macOS (Sonoma 14.4.1)

<p align="center">
<img src="assets/v1.3.0/blog2epub_macos_screenshot.png" width="600px" />
</p>

## CLI

    blog2epub --help
    usage: Blog2epub Cli interface [-h] [-l LIMIT] [-s SKIP] [-q QUALITY] [-o OUTPUT] [-d] url

    Convert blog (blogspot.com, wordpress.com or another based on Wordpress) to epub using CLI or GUI.

    positional arguments:
      url                   url of blog to download

    options:
      -h, --help            show this help message and exit
      -l LIMIT, --limit LIMIT
                            articles limit
      -s SKIP, --skip SKIP  number of skipped articles
      -q QUALITY, --quality QUALITY
                            images quality (0-100)
      -o OUTPUT, --output OUTPUT
                            output epub file name
      -d, --debug           turn on debug

Example:

    blog2epub starybezpiek.blogspot.com -l=2 -o=example.epub
    Starting blogger.com crawler
    Found 54 articles to crawl.
    Downloading.
    1. 10 lat kremlowskiej propagandy, czyli RT ujawnia swoje sekrety
    Downloading.
    2. "Komunę obaliliśmy, a nadal jest źle. Dlaczego?" Czyli 1984 Orwella właściwie odczytany
    Locale set as en_US.UTF-8
    Generating cover (800px*600px) from 1 images.
    Cover generated: .\starybezpiek.blogspot.com\example.epub.jpg
    Epub created: .\example.epub
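
The option set in the `--help` output can be mirrored with Python's standard `argparse` module, for instance when scripting around the tool. The following is a sketch reconstructed from the help text only. It is not blog2epub's actual parser, and the `build_parser` name, the `int` types and the `None` defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the options listed in `blog2epub --help`."""
    parser = argparse.ArgumentParser(
        prog="blog2epub",
        description="Convert blog to epub using CLI or GUI.",
    )
    parser.add_argument("url", help="url of blog to download")
    # Types and defaults below are assumptions; the help text does not state them.
    parser.add_argument("-l", "--limit", type=int, default=None, help="articles limit")
    parser.add_argument("-s", "--skip", type=int, default=None, help="number of skipped articles")
    parser.add_argument("-q", "--quality", type=int, default=None, help="images quality (0-100)")
    parser.add_argument("-o", "--output", default=None, help="output epub file name")
    parser.add_argument("-d", "--debug", action="store_true", help="turn on debug")
    return parser


if __name__ == "__main__":
    # Same arguments as the example run in this section.
    args = build_parser().parse_args(
        ["starybezpiek.blogspot.com", "-l=2", "-o=example.epub"]
    )
    print(args.url, args.limit, args.output)
```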

## Examples

    blog2epub starybezpiek.blogspot.com
    blog2epub velosov.blogspot.com -l=10
    blog2epub poznanskiehistorie.blogspot.com -q=100
    blog2epub classicameras.blogspot.com --limit=10 --no-images
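
Several blogs can be converted in one go by driving the CLI from a short script. The sketch below assumes the `blog2epub` command is installed and available on `PATH`, and that it reports failure through a non-zero exit status. `build_command` and `convert_all` are hypothetical helper names, and only the `--limit` and `--quality` flags from the `--help` output are used.

```python
import subprocess
from typing import List, Optional


def build_command(url: str, limit: Optional[int] = None,
                  quality: Optional[int] = None) -> List[str]:
    """Assemble the argument list for one blog2epub run."""
    command = ["blog2epub", url]
    if limit is not None:
        command.append(f"--limit={limit}")
    if quality is not None:
        command.append(f"--quality={quality}")
    return command


def convert_all(urls: List[str], limit: Optional[int] = None) -> None:
    """Run blog2epub once per blog, one after another."""
    for url in urls:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(build_command(url, limit=limit), check=True)


if __name__ == "__main__":
    convert_all(["starybezpiek.blogspot.com", "velosov.blogspot.com"], limit=10)
```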

## Running tests

    pytest ./tests
    pytest --cov=blog2epub ./tests
    pytest --cov=blog2epub --cov-report=html ./tests

## Current version

### [v1.5.0 - Release Candidate 2](https://github.com/bohdanbobrowski/blog2epub/releases/tag/v1.5.0_RC2)

- [X] integration testing
- [X] increase unit test coverage
- [X] use sitemaps.xml for scraping
- [X] crawlers refactor
- [X] use data models
- [X] more common methods in crawler class
- [X] expand crawler abstract
- [X] CLI refactor
- [X] Greek alphabet support
- [X] image download and attachment bug solved (e.g. modernistyczny-poznan.blogspot.com)
- [X] improved resistance to HTTP errors
- [X] dedicated crawler class for zeissikonveb.de
- [X] (in the GUI) the skip value is increased by the limit value (if one is set)
- [X] download progress is much more verbose, and in the GUI it can be cancelled at any time
- [X] remove poetry, as it's overcomplicated for this case
- [X] Windows installer!
- [X] results of cancelled downloads can be converted to epub

[&raquo; Complete Change Log here &laquo;](https://github.com/bohdanbobrowski/blog2epub/blob/master/CHANGELOG.md)

## Project backlog

A list of known bugs and plans for new features and enhancements is kept in [BACKLOG.md](https://github.com/bohdanbobrowski/blog2epub/blob/master/BACKLOG.md).

## Project road map:

- 1.0 - somewhat working
- 2.0 - fully working project, 90% unit tested, with builds available for Android/Windows/Linux/macOS