I was thinking about how easily data is lost. For example, Nyaa almost disappeared overnight, and some MyAnimeList functions were unavailable in May and June 2018 because they were late to comply with GDPR.
So this post is about anime databases, and how resilient (?) they are. I wanted to know how feasible it is to make a complete copy of each database. To this end, I found this Anime & Manga Stack Exchange question that asks: “What databases and listing sites exist for anime, manga, etc?”. From that list, I retained AniDB, AniList, the Anime News Network Encyclopaedia, Anime-Planet, Kitsu, and MyAnimeList.
Here are my findings:
AniDB explicitly tells you in their policies not to make a copy of their database. I used to diss MyAnimeList for being all proprietary and such, but AniDB is not much better.
Sure, its content is licenced under CC-BY-NC-SA 4.0, but that’s still quite restrictive. Or rather, the -NC modifier is a bit controversial, as it doesn’t qualify as a free culture licence. That’s why Wikipedia is licenced under CC-BY-SA and not CC-BY-NC-SA.
Did you know that databases have rights too? They’re referred to as sui generis database rights. That’s “of its own kind” in Latin. They protect the investment put into creating the database: you need permission from its maker to extract or reuse a substantial part of its contents.
It just so happens that version 4.0 of the Creative Commons licences makes it clear that those rights are covered too, and licensed under the same conditions, for CC-licenced databases. Unfortunately for us, iron-fisted AniDB forbids that.
That aside, they also prohibit scraping the HTML or the API. However, they do have a working API for developers and data dumps of the list of titles; just not the free-to-use kind.
While writing this blog post, I found something else to trash. I am fully conscious that I’m just speaking ill of that project. But why would you paste a 792-line, 6684-word XML document straight into a wiki page? The API response for information about an anime is a mess composed of general information, recommendations, reviews, ratings, further information about characters (and their descriptions), the episode list, tags and their definitions, and staff.
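To make that complaint a bit more concrete, here is a minimal sketch of how one might boil such a response down to the few fields actually wanted. The sample XML and its element names (`title`, `startdate`, `episodes`, `reviews`) are invented for illustration and are not AniDB's actual schema; `summarise` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT the real AniDB response format.
SAMPLE = """\
<anime id="1">
  <title>Example Show</title>
  <startdate>2018-05-25</startdate>
  <episodes>
    <episode><epno>1</epno><title>First</title></episode>
    <episode><epno>2</epno><title>Second</title></episode>
  </episodes>
  <reviews><review>Very long text...</review></reviews>
</anime>
"""


def summarise(xml_text: str) -> dict:
    """Keep a handful of fields and ignore everything else in the response."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        # "title" only matches direct children, so episode titles are not picked up here.
        "title": root.findtext("title"),
        "start": root.findtext("startdate"),
        "episodes": [ep.findtext("title") for ep in root.iterfind("episodes/episode")],
    }
```

Calling `summarise(SAMPLE)` returns a small dictionary with the id, title, start date, and episode titles, and silently drops the reviews.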
AniList doesn’t really say much about how to use their database. They do have an API which can be used to retrieve the data.
They prohibit “‘Hoarding’ or mass collection of data” which puts a curb on our cause.
I don’t have much to say about the quality and completeness of the dataset, as it isn’t a reference the way MyAnimeList or Anime News Network can be.
Anime News Network’s Encyclopaedia is by far the nicest.
To use the content, you simply have to mention and link back to ANN.
They provide dumps of the anime titles just like AniDB. And just like AniDB they provide more info about each title. But unlike AniDB, they caringly recommend that you cache the data and only request it as needed; they also specify the rate limits.
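The "cache it, and only request it as needed" advice can be sketched roughly like this. It is only an illustration under assumptions: `CachedFetcher` is a hypothetical helper, the `fetch` callable stands in for whatever actually talks to the API, the one-second default interval is a placeholder rather than ANN's real rate limit, and keys are assumed to be simple ids that are safe to use as file names.

```python
import time
from pathlib import Path
from typing import Callable, Optional


class CachedFetcher:
    """Fetch each record at most once, and never faster than min_interval seconds."""

    def __init__(
        self,
        fetch: Callable[[str], str],   # hypothetical: performs the real request
        cache_dir: Path,
        min_interval: float = 1.0,     # placeholder, not ANN's documented limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.fetch = fetch
        self.cache_dir = cache_dir
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_request: Optional[float] = None
        cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, key: str) -> str:
        # Assumes key is a simple id (e.g. digits) that is safe as a file name.
        path = self.cache_dir / f"{key}.txt"
        if path.exists():
            # Cache hit: no request, no waiting.
            return path.read_text(encoding="utf-8")
        if self.last_request is not None:
            # Wait out whatever remains of the minimum interval.
            wait = self.min_interval - (self.clock() - self.last_request)
            if wait > 0:
                self.sleep(wait)
        body = self.fetch(key)
        self.last_request = self.clock()
        path.write_text(body, encoding="utf-8")
        return body
```

Asking for the same id twice only hits the network once; asking for a new id right after a request sleeps until the interval has passed.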
Now it wouldn’t be fair if I didn’t point out that ANN’s API also spews out a ton of information about a title, although they do warn you about that. The drawback of ANN is that there isn’t any API documentation, so there isn’t much you can do with it. The data structure for the anime information isn’t documented either, but it’s self-descriptive enough.
I hate to admit it, but AniDB has much more information about anime when it comes to dates or tags. ANN has the advantage that it’s more oriented towards the industry, so when information about the anime’s production team(s) exists, it’s quite detailed.
Anime-Planet has absolutely no API. The development of the website is quite closed off too.
That’s unlike Kitsu, which prides itself on being open source. As such, they have an API that can be used to retrieve data.
Though, pet peeve of mine, the documentation is awful to use and not clear. It feels as though it exists just for the sake of it, and not because documentation is important.
Finally, MyAnimeList, the reference for anime watchers…
has absolutely no API. None.
However, as it is popular, someone has created an API which scrapes the webpages for information.
- Anime News Network would let you copy their database, but you have to make sense of the data yourself. And some information (like company descriptions) is unavailable through the API.
- AniDB is the most likely contender for possible duplication, but their docs may or may not be usable, and more importantly, they forbid it.
- Same goes for AniList.
- Kitsu is probably doable, but I don’t even want to try.
- Anime-Planet is completely closed off.
- MyAnimeList is the same, theoretically speaking.
This doesn’t sound very encouraging, but hey, when’s the last time you wanted to back up a database? (fading nervous laughter)