Sunday, 22 April 2007

Python, Amazon, graphs, oh my! : Part 3

In previous posts here and here, I looked at a reading list on Digital History Hacks, examined the links between the books, and suggested new books using Amazon's API.

I wrote my scripts with the idea of looking at some other reading lists. One such list is from the Long Now Foundation (list). Read about the foundation here, and listen to their superb seminar series here.

The majority of the code work has been done, so we can take the original script (scrape1.py) and tweak it for the new list.
All that's required is the new source of data (URL), an appropriate regular expression pattern to identify the ASINs, and a new prefix to keep the results separate from previous tinkerings.
Here are the changes...

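# the page we want to scrape
URL = 'http://www.longnow.org/shop/books/'

# the pattern we want to look for: "ASIN/" followed by digits and any trailing X
# (a raw string keeps the regular expression free of string-escape surprises)
PAT = r"ASIN/([0-9]+[X]*)"

# the prefix for the filename
filename = 'ln'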
This produces a new pickle of ASINs, and the file name is usefully prefixed: ln_asins.pik.
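As a rough sketch of what the tweaked scrape step amounts to, the snippet below applies the pattern to a page and pickles the matches under the prefixed file name. The helper names (extract_asins, save_asins), the de-duplication step, and the sample HTML with its made-up ASINs are assumptions for illustration; scrape1.py itself may be organised differently, and a real run would fetch the page from URL rather than use a hard-coded string.

```python
import os
import pickle
import re
import tempfile

# Same pattern as in the post, written as a raw string.
PAT = r"ASIN/([0-9]+[X]*)"


def extract_asins(html, pattern=PAT):
    """Return the ASINs found in html, without repeats, in order of first appearance."""
    found = []
    for asin in re.findall(pattern, html):
        if asin not in found:  # assumption: repeated links to one book count once
            found.append(asin)
    return found


def save_asins(asins, prefix, directory="."):
    """Pickle the list to <prefix>_asins.pik and return the path written."""
    path = os.path.join(directory, prefix + "_asins.pik")
    with open(path, "wb") as handle:
        pickle.dump(asins, handle)
    return path


# Hypothetical page fragment; the ASINs are invented placeholders.
sample_html = (
    '<a href="http://www.amazon.com/exec/obidos/ASIN/0000000001/">first</a>'
    '<a href="http://www.amazon.com/exec/obidos/ASIN/000000002X/">second</a>'
    '<a href="http://www.amazon.com/exec/obidos/ASIN/0000000001/">first again</a>'
)

asins = extract_asins(sample_html)

# Write to a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_asins(asins, "ln", directory=tmp)
    saved_name = os.path.basename(path)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(saved_name, restored)
```

Keeping the prefix as a parameter is what lets the same code serve each reading list without overwriting earlier results.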
Now all we have to do is alter the filename prefix in graph4.py, and that will produce the following SVG (bitmap below).

I thought the list would be more connected, and I was initially surprised at how few connections there are. On reflection, the Long Now covers a broad range of subjects, and the connection between them is the foundation.

We can still search through Amazon's similar product suggestions and see what recommendations we can offer.
Change the prefix in graph5.py and Amazon offers up 280 items; we'll use any suggestion with more than a third of the connections of the most connected original.
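The cut-off can be sketched roughly as follows. This assumes each item's "connections" is simply a count of its edges in the graph, and the function name, dictionaries, and counts are all invented for illustration; graph5.py may compute and store these differently.

```python
def filter_suggestions(original_counts, suggestion_counts, denominator=3):
    """Keep suggestions with more than 1/denominator of the top original's connections."""
    top = max(original_counts.values())
    # Integer comparison (count * 3 > top) avoids floating-point edge cases.
    return {
        asin: count
        for asin, count in suggestion_counts.items()
        if count * denominator > top
    }


# Hypothetical connection counts keyed by placeholder ASINs.
original_counts = {"orig-a": 6, "orig-b": 2, "orig-c": 0}
suggestion_counts = {"sugg-1": 3, "sugg-2": 2, "sugg-3": 1}

kept = filter_suggestions(original_counts, suggestion_counts)
print(kept)
```

With these made-up numbers the most connected original has 6 connections, so only suggestions with strictly more than 2 survive; a suggestion with exactly a third is dropped.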
Here are the results. (SVG)
Well, there's plenty going on here, too much maybe.
The neat and tidy cluster in the top right is by the historian and author Daniel J. Boorstin.
There's another cluster where three titles, previously unconnected, become connected.
So if you're interested in these...
  • When Good Companies Do Bad Things: Responsibility and Risk in an Age of Globalization
  • Built to Last: Successful Habits of Visionary Companies (Hardcover)
  • Guns, Germs, and Steel: The Fates of Human Societies (Paperback)
...then consider these...

  • The World Is Flat [Updated and Expanded]: A Brief History of the Twenty-first Century (Hardcover)
  • Good to Great and the Social Sectors: A Monograph to Accompany Good to Great (Paperback)
  • Blink: The Power of Thinking Without Thinking (Paperback)
  • The Tipping Point: How Little Things Can Make a Big Difference (Paperback)

The large node is actually a duplicate of another title, but in an alternate format.
Perhaps a future version will include some dupe checking; LibraryThing's API provides for matching ISBNs and titles.


2 comments:

William J. Turkel said...

Matt, I really like what you're doing with this series. I'm surprised, too, that the Long Now books aren't more highly connected. Looking through the list, I realize that I've read a number of them, but wouldn't really think to put them in the same category. Like many nerds, however, my career has been shaped to a degree by Stewart Brand's various projects, so it's not surprising that his list seems familiar. Bill

Matt Joyce said...

Ah, but they are all connected by the list, just as the works on your reading list are connected by you. I think that's the next step for this series, dig around and find some lists. Matt.