Hi! I'm one of the programmers at Gutenberg.
We've been improving the site a lot over the past few months (and more is coming!).
If you haven't visited the page recently, it's worth checking out again: https://www.gutenberg.org/
> All Project Gutenberg metadata are available digitally in the XML/RDF format. This is updated daily (other than the legacy format mentioned below). Please use one of these files as input to a database or other tools you may be developing, instead of crawling or roboting the website.
Perhaps you can find the information you are looking for there.
However if you plan on scraping or otherwise hitting them with a ton of traffic, consider at least to donate a good amount for the traffic you cause them. It ain't free after all.
While PG has probably gotten a lot of use and growth with the growth/maintreaming of the Internet since the 1990s, (TIL) it started back in 1971:
> Michael S. Hart began Project Gutenberg in 1971 with the digitization of the United States Declaration of Independence.[5] Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. […] This computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed one day the general public would be able to access computers and decided to make works of literature available in electronic form for free. […]
I'm surprised no eBook Reader vendor has a Project Gutenberg "Store." Where you can just browse Gutenberg, find a book, and just grab it down to the reader. Instead, they either are actively hostile (Kindle), or require the use of Calibre (which itself is good, it is just the friction).
Project Gutenberg had (has?) a tendency toward plaintext that always put me off. (And it has been over a decade I'm sure since I explored the site—so I am no doubt now misinformed.)
I like a styled formatted book—would prefer PDFs. (I know, not a popular format apparently.)
I like the idea of Project Gutenberg but guess I found book scans on archive.org my preference.
My go-to example is Lewis Carroll's "Through the Looking Glass" with the fantastic art of John Tenniel and Carroll's sometimes creative formatting of the prose…
I see they (Project Gutenberg) have ePub now, which can be good if well done.
(If not well done it can be a kind of mess. Re-flowable "HTML", paginated… Anyone ever try to print a long web page and did you enjoy the result? Perhaps that is as much on the ePub reader though.)
As others here have mentioned, https://standardebooks.org/ is excellent and my understanding is that they use Gutenberg books as a source for theirs but done up much nicer.
We're supporting EPUB3 for the vast majority of books! At the same time we also have a "Plain Text" version for each as in a sense it's the most robust. PdFs are in the works!
As a Kindle user, I still miss the old version of the site. The new one looks great on normal desktop, but the old one was simple enough to load and directly download books on the device's built-in browser.
Keep up the good work!
(I can’t quite tell if that’s an egregious abuse of the site or you’re perfectly fine to share without human eye balls hitting your www?)
> All Project Gutenberg metadata are available digitally in the XML/RDF format. This is updated daily (other than the legacy format mentioned below). Please use one of these files as input to a database or other tools you may be developing, instead of crawling or roboting the website.
And strongly consider a donation! (My addition)
https://www.gutenberg.org/ebooks/offline_catalogs.html#the-p...
https://www.gutenberg.org/ebooks/offline_catalogs.html
Perhaps you can find the information you are looking for there.
However if you plan on scraping or otherwise hitting them with a ton of traffic, consider at least to donate a good amount for the traffic you cause them. It ain't free after all.
Don't hit the site with agent. The section furtherst bottom machine readable.
> Michael S. Hart began Project Gutenberg in 1971 with the digitization of the United States Declaration of Independence.[5] Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. […] This computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed one day the general public would be able to access computers and decided to make works of literature available in electronic form for free. […]
* https://en.wikipedia.org/wiki/Project_Gutenberg
Technically, I can also just directly pull the epub from Project Gutenberg, but sometimes the formatting leaves a lot to be desired.
Once you get an e-reader that runs a semi-capable OS (ex - stock android, even an older version), it's hard to go back to something like a kindle.
but yes, generally I agree with your point. Library of 75k books seems pretty valuable to have direct access to.
I like a styled formatted book—would prefer PDFs. (I know, not a popular format apparently.)
I like the idea of Project Gutenberg but guess I found book scans on archive.org my preference.
My go-to example is Lewis Carroll's "Through the Looking Glass" with the fantastic art of John Tenniel and Carroll's sometimes creative formatting of the prose…
I see they (Project Gutenberg) have ePub now, which can be good if well done.
(If not well done it can be a kind of mess. Re-flowable "HTML", paginated… Anyone ever try to print a long web page and did you enjoy the result? Perhaps that is as much on the ePub reader though.)
You can download books in most browsers. I know Amazon have done things to make life difficult for other stores in the past.
could be a trick to ease that fear :D