Categorize book authors as fiction vs non-fiction

Question:

For personal use, I have a list of about 300 authors (full names) of various books. I want to partition this list into “fiction authors” and “non-fiction authors”. If an author writes both, the majority decides.

I looked at the Amazon Product Search API: I can search by author (in Python), but I see no way to find the book category (fiction vs. everything else):

>>> node = api.item_search('Books', Author='Richard Dawkins')
>>> for book in node.Items.Item:
...     print(book.ItemAttributes.Title)

What are my options? I prefer to do this in Python.

Asked By: Sridhar Ratnakumar

||

Answers:

Well, you can try another service: the Google Book Search API. To use it from Python, have a look at gdata-python-api. In its protocol, the result feed contains a <dc:subject> node, which is probably what you need:

<?xml version="1.0" encoding="UTF-8"?>
<feed>
  <id>http://www.google.com/books/feeds/volumes</id>
  <updated>2008-08-12T23:25:35.000</updated>

<!-- a lot of information here; those nodes were removed to save space -->

    <dc:creator>Jane Austen</dc:creator>
    <dc:creator>James Kinsley</dc:creator>
    <dc:creator>Fiona Stafford</dc:creator>
    <dc:date>2004</dc:date>
    <dc:description>
      If a truth universally acknowledged can shrink quite so rapidly into 
      the opinion of a somewhat obsessive comic character, the reader may reasonably feel ...
    </dc:description>
    <dc:format>382</dc:format>
    <dc:identifier>8cp-Z_G42g4C</dc:identifier>
    <dc:identifier>ISBN:0192802380</dc:identifier>
    <dc:publisher>Oxford University Press, USA</dc:publisher>
    <dc:subject>Fiction</dc:subject>
    <dc:title>Pride and Prejudice</dc:title>
    <dc:title>A Novel</dc:title>
  </entry>
</feed>

Of course, this protocol also returns some extra information about the book (such as whether or not it is viewable on Google Books).
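As a rough sketch of how the <dc:subject> values could feed the majority vote from the question: the snippet below parses a feed with the standard library and tallies one vote per book. The namespace URIs, the hand-written sample feed (including the placeholder titles), and the helper names subjects_per_book, is_fiction_subject and classify_author are all assumptions made for illustration; fetching the real feed for each author is left out.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URIs; the feed quoted above has them stripped.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms",
}

# Hypothetical, hand-written feed used only to exercise the code.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/terms">
  <entry>
    <dc:creator>Jane Austen</dc:creator>
    <dc:subject>Fiction</dc:subject>
    <dc:title>Pride and Prejudice</dc:title>
  </entry>
  <entry>
    <dc:creator>Jane Austen</dc:creator>
    <dc:subject>Fiction</dc:subject>
    <dc:title>Placeholder Novel</dc:title>
  </entry>
  <entry>
    <dc:creator>Jane Austen</dc:creator>
    <dc:subject>Literary Criticism</dc:subject>
    <dc:title>Placeholder Essay Collection</dc:title>
  </entry>
</feed>
"""


def subjects_per_book(feed_xml):
    """Return one list of <dc:subject> strings per <entry> in the feed."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    books = []
    for entry in root.findall("atom:entry", NS):
        subjects = [(s.text or "").strip() for s in entry.findall("dc:subject", NS)]
        books.append(subjects)
    return books


def is_fiction_subject(subject):
    """True for subjects such as 'Fiction' or 'Juvenile Fiction', not 'Non-Fiction'."""
    normalized = subject.lower().replace("-", "").replace(" ", "")
    return "fiction" in normalized and "nonfiction" not in normalized


def classify_author(subject_lists):
    """Majority vote over books; books without any subject do not vote."""
    votes = Counter()
    for subjects in subject_lists:
        if not subjects:
            continue
        if any(is_fiction_subject(s) for s in subjects):
            votes["fiction"] += 1
        else:
            votes["non-fiction"] += 1
    if votes["fiction"] == votes["non-fiction"]:
        return "unknown"  # tie or no data: needs a manual decision
    return "fiction" if votes["fiction"] > votes["non-fiction"] else "non-fiction"


if __name__ == "__main__":
    print(classify_author(subjects_per_book(SAMPLE_FEED)))
```

With the sample feed, two of the three books are tagged Fiction, so the author is classified as "fiction". Treating every subject other than fiction as non-fiction is a simplification; real subject values may need a more careful mapping.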

Answered By: Maxym

Did you look at BrowseNodes? I have not used this API before, but BrowseNodes seem to correspond to Amazon’s product categories. Maybe you will find more information there.

Answered By: Reiner Gerecke

After spending some time messing with the Amazon API, it looks like they don’t provide the kind of information you want.

They don’t mention categories of that type in their documentation, and if you serialise what the API sends you, there is not a single mention of fiction or non-fiction categories.

You can use this to print out a nicely formatted XML string (you might want to redirect it to a file for easy reading) with everything the API sends.

from lxml import etree

node = api.item_search('Books', Author='Richard Dawkins')

# encoding='unicode' makes lxml return text rather than bytes
print(etree.tostring(node, pretty_print=True, encoding='unicode'))

Answered By: thomas