Every so often a nice little utility comes along, and it makes you stop and wonder why the OS vendors didn't include it as part of the OS.
Teracopy is a small Windows utility from Code Sector which replaces the tediously slow native file-copying process.
Essentially it provides file copy queues.
Find the file you want to copy, right-click, choose Teracopy from the context menu (Copy to... etc.), and start the copy process.
Want to copy another file to the same place?
Just drag it to the existing copy dialog and it will be added to the queue.
So simple and intuitive.
I love it, and it will be part of my essential installs.
Saturday, 2 February 2008
Wednesday, 4 July 2007
Australian product recalls feed
I'm a subscriber of Choice magazine (Australia) and a supporter of their campaigns.
Towards the back of each issue there are a couple of pages listing product recalls. I typically scan through them, but there are a lot and the format is not great.
The same information is provided online by the Australian government, here.
It's not the most attractive website, but it probably supports every browser, and let's face it, how slick does it need to be?
If the style doesn't date it, the lack of RSS does, and that's what I was after: a low-volume feed.
I emailed the address on the contact page, but I haven't heard back, so I thought I'd just scrape the pages for data and fabricate my own feed.
I tried using feed43.com, a service specifically designed to scrape content from HTML to produce RSS feeds. However, there's no facility to pull content from multiple sources, so the feed it produced was not much more than a teaser, no good for my offline reader.
I decided to write my own, using Python.
A quick Google for a python module located PyRSS2Gen.
Essentially, this is the process:
- Download the web page with a list of product recalls in the last 30 days.
- Locate and extract the individual recalls using regular expressions.
- Download each recalled item's information page.
- Extract the details of the recall.
- Construct an RSS feed from the scraped content.
A cron job runs every day, and updates the feed.
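A minimal sketch of that pipeline, using only the standard library (the real script uses PyRSS2Gen, and the link pattern and URLs here are hypothetical, since they depend on the actual page markup):

```python
import re
import xml.etree.ElementTree as ET

def scrape_recalls(html):
    """Extract (link, title) pairs from a recall listing page.
    The pattern is made up - it would need to match the page's real markup."""
    return re.findall(r'<a href="(recall\d+\.html)">([^<]+)</a>', html)

def build_feed(recalls):
    """Build a minimal RSS 2.0 document from the scraped recalls."""
    rss = ET.Element('rss', version='2.0')
    channel = ET.SubElement(rss, 'channel')
    ET.SubElement(channel, 'title').text = 'Australian product recalls'
    ET.SubElement(channel, 'link').text = 'http://www.recalls.gov.au/'
    ET.SubElement(channel, 'description').text = 'Recalls in the last 30 days'
    for link, title in recalls:
        item = ET.SubElement(channel, 'item')
        ET.SubElement(item, 'title').text = title
        ET.SubElement(item, 'link').text = link
    return ET.tostring(rss, encoding='unicode')

sample = '<li><a href="recall001.html">Toy widget</a></li>'
feed = build_feed(scrape_recalls(sample))
```

The cron job would run a script like this daily and write the output where the web server can serve it.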
Thursday, 26 April 2007
What is a book ?

The novel Days, by James Lovegrove, is an old favourite of mine, and a significant section of the plot revolves around an escalating conflict between a bookshop and a computer shop.
What sticks in my mind is this lovely piece about books, spoken by Miss Dalloway the Luddite bookshop manager.
"A book. As a source of easily retrievable information, portable, needing no peripheral support systems, instantly accessible to anyone on the planet old enough to read and turn a page, a book is without peer. A book does not come with an instruction manual. A book is not subject to constant software upgrades. A book is not technologically outmoded after five years. A book will never ‘go wrong’ and have to be repaired by a trained (and expensive) technician. A book cannot be accidentally erased at the touch of a button or have its contents corrupted by magnetic fields. Is it possible to think of an object on this earth more – horrible term – user-friendly than a book?"
I've long been waiting for the arrival of e-ink; I have wanted a good ebook reader for years, and this post on Signal vs Noise gives an encouraging review of the Sony Reader PRS-500 (massive image), encouraging enough for me to look around this room for something to sell.
Books they be changing.
Sunday, 22 April 2007
Python, Amazon, graphs, oh my! : Part 3
In previous posts here and here, I looked at a reading list on Digital History Hacks, and examined the links between the books and suggested new books, using Amazon's API.
I wrote my scripts with the idea of looking at some other reading lists; one such list is from the Long Now Foundation (list). Read about the foundation here, and listen to their superb seminar series here.
The majority of the code work has been done; we can take the original script (scrape1.py) and tweak it for the new list.
All that's required is a new source of data (URL), an appropriate regular expression pattern to identify the ASINs, and a new prefix to keep the results separate from previous tinkerings.
Here are the changes...
# the page we want to scrape
URL='http://www.longnow.org/shop/books/'
# the pattern we want to look for
PAT="ASIN\/([0-9]+[X]*)"
# the prefix for the filename
filename='ln'
This produces a new pickle of ASINs, and the file name is usefully prefixed: ln_asins.pik.
Now all we have to do is alter the filename prefix in graph4.py, and that will produce the following SVG, bitmap below.

I thought the list would be more connected, and I was initially surprised how few connections there are. On reflection, the Long Now covers a broad range of subjects, and the connection between them is the foundation itself.
Still, we can search through Amazon's similar-product suggestions and see what recommendations we can offer.
Change the prefix in graph5.py and Amazon offers up 280 items; we'll use any suggestion with more than a third of the connections of the most connected original.
Here are the results. (SVG)

Well, there's plenty going on here, too much maybe.
The neat and tidy cluster in the top right is by the historian and author Daniel J. Boorstin.
There's another cluster where three titles, previously unconnected, become connected.
So if you're interested in these...
- When Good Companies Do Bad Things: Responsibility and Risk in an Age of Globalization
- Built to Last: Successful Habits of Visionary Companies (Hardcover)
- Guns, Germs, and Steel: The Fates of Human Societies (Paperback)
- The World Is Flat [Updated and Expanded]: A Brief History of the Twenty-first Century (Hardcover)
- Good to Great and the Social Sectors: A Monograph to Accompany Good to Great (Paperback)
- Blink: The Power of Thinking Without Thinking (Paperback)
- The Tipping Point: How Little Things Can Make a Big Difference (Paperback)
The large node is actually a duplicate of another title in an alternate format.
Perhaps a future version will include some dupe checking; LibraryThing's API provides for matching ISBNs and titles.
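The ASIN pattern can be exercised against a sample product link (the HTML snippet here is made up, but the pattern is the one used in the scraper):

```python
import re

# the pattern from the scraper: digits, optionally ending in X (ISBN-10 style)
PAT = r"ASIN\/([0-9]+[X]*)"

# hypothetical snippet of the shop page's HTML
sample = ('<a href="/shop/ASIN/080507089X/">one</a> '
          '<a href="/shop/ASIN/0465045669/">two</a>')

asins = re.findall(PAT, sample)
# asins is now ['080507089X', '0465045669']
```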
Tuesday, 17 April 2007
Python, Amazon, graphs, oh my! : Part 2
In this post we looked at Digital History Hacks' proposed reading list and visualised it using Graphviz's SVG output.
Now let's finish what was started and add in the book titles which are considered similar by Amazon.
From the original list, Amazon suggests nearly 780 titles; that's clearly not useful.
I've chosen to limit it to those which have at least a third of the number of connections of the most connected title.
The new books are added as new nodes and have a double-circle node shape.
The Long Tail: Why the Future of Business Is Selling Less of More
Wikinomics: How Mass Collaboration Changes Everything
Everyware: The Dawning Age of Ubiquitous Computing
The Victorian Internet
Here's the code, graph5.py, and the SVG, bitmap below.
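The cut-off step might look something like this (a sketch with made-up pair data; the real script works from the pickled Amazon pairs):

```python
# (original ASIN, suggested ASIN) pairs - hypothetical sample data
pairs = [('A1', 'S1'), ('A2', 'S1'), ('A3', 'S1'), ('A4', 'S1'),
         ('A1', 'S2'), ('A2', 'S3'), ('A3', 'S3')]

# count how often each suggested title appears
counts = {}
for original, suggested in pairs:
    counts[suggested] = counts.get(suggested, 0) + 1

# keep suggestions with at least a third of the best-connected one's count
threshold = max(counts.values()) / 3.0
keep = [asin for asin, n in counts.items() if n >= threshold]
# S1 (4 links) and S3 (2 links) survive; S2 (1 link) is dropped
```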

Saturday, 14 April 2007
Python, Amazon, graphs, oh my!
I'm a regular reader of William J Turkel's blog, Digital History Hacks.
His recent posts (here and here) about analysing his course's reading list inspired some tinkering of my own.
In William's first post on the subject, he uses the ASIN (Amazon Standard Identification Number) from each of the books in his reading list and Amazon's API to request a list of similar books.
This generates a list of ASIN pairs, up to 10 per original title.
Original Title ASIN 1, Similar Title ASIN 1
Original Title ASIN 1, Similar Title ASIN 2
Original Title ASIN 1, Similar Title ASIN 3
Original Title ASIN 2, Similar Title ASIN 1
Original Title ASIN 2, Similar Title ASIN 2
Original Title ASIN 2, Similar Title ASIN 3
And as William shows these pairs can be used to construct a graph, which can be used to visualise the degree of connectivity of the reading list.
My code uses parts of William's, so if it looks familiar, that's why.
First, scrape the ASINs from the webpage; my version of this script is here: scrape1.py.
So we have our list of ASINs; let's query Amazon and get some similar titles.
You'll need to get your own Amazon Web Service ID which is a trivial enough process.
I have a couple of simple calls to help extract the data we want, in a module called amazon.py.
The process is simple enough.
- Load the pickled ASINs
- Query Amazon
- Compile a list of ASIN pairs
- Pickle and save the list of pairs.
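The steps above can be sketched with the standard library (the filename and sample ASINs are hypothetical):

```python
import pickle

# (original ASIN, similar ASIN) pairs as returned by the Amazon queries
pairs = [('0262195615', '0262062194'), ('0262195615', '0465026567')]

# pickle and save the list of pairs
with open('dhh_pairs.pik', 'wb') as f:
    pickle.dump(pairs, f)

# later scripts load them straight back
with open('dhh_pairs.pik', 'rb') as f:
    loaded = pickle.load(f)
```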
I found this article, Python MiniDom, useful when playing with the XML data returned by Amazon, and there's a more comprehensive Amazon wrapper project for Python, pyAWS.
OK, so now we have a list of pairs, we can start to play with Graphviz.
There's a lovely Python interface for Graphviz, pydot which I think is great.
Basically, using pydot, graphs, nodes and edges become objects, and this makes it very easy to use.
Let's start by graphing all the original ASINs, and show links where they are similar to each other.
- Load the pickled ASINs
- Load the pickled ASIN pairs
- Create a graph object
- Create a node object for each ASIN and add it to the graph
- Create an edge object for each linked original ASIN
- Save the graph
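pydot wraps all this up in objects, but the DOT text it ultimately produces is simple; conceptually, the steps above boil down to something like this (a standard-library sketch with made-up ASINs, not the pydot calls themselves):

```python
# hypothetical original ASINs and similarity pairs
asins = ['0262195615', '0465026567', '0743264738']
pairs = [('0262195615', '0465026567')]

lines = ['graph asins {']
for asin in asins:
    lines.append('  "%s";' % asin)          # one node per original title
for a, b in pairs:
    if a in asins and b in asins:           # only edges between originals
        lines.append('  "%s" -- "%s";' % (a, b))
lines.append('}')
dot = '\n'.join(lines)                      # Graphviz renders this to an image
```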

It's quite clear that many of these books are considered similar by Amazon, and some stand out much more than others as hubs in a network.
We can quantify that connectedness and use the data to alter the graph's appearance.
If we count the number of nodes which connect to or from other nodes, we can apply that to the font size, as is often done in tag clouds.
Doing this is quite simple: we'll use a dictionary to assign a value to each ASIN, and as we cycle through the pairs we'll increment the value.
Then, when we create the node, we'll set the font size.
Here's the code to calculate the weights:
# for each ASIN associate a value
weight = {}
for asin in asins:
    weight[asin] = 0

# for each pair
for pair in pairs:
    # only if they are one of the originals
    if pair[1] in asins:
        # increment the weight
        weight[pair[1]] += 1
        weight[pair[0]] += 1
And here is where that value is used; while we're at it, let's make the nodes circles.
# add a node for each original title
for asin in asins:
    node = pydot.Node(asin, shape='circle', fontsize=8 + weight[asin])
    g.add_node(node)
The new code, graph2.py, and the results...

The larger circles make quite a difference, but for some reason my system doesn't have the right fonts. I think it still illustrates how, by calculating and using the weight data, we can make the overall picture clearer.
At this point the jpg files are getting large, and the quality is not that great. An alternative graphics format is Scalable Vector Graphics (SVG). Read about it at the W3C and at Wikipedia. Firefox has native support for SVG, and Adobe provides a viewer.
This format will provide nice fonts and smooth lines and curves, and some other useful features, which we'll get to later.
OK, let's use that weight value again, this time to add some colour and view the result as an SVG.
I have a helper module called gradi.py, which has functions that generate colour gradients; we can use it to calculate colours for our nodes.
Starting with yellow for nodes with no connections and red for those with the most, everything in between we'll make orangish.
First we define our colour range and gradient.
loColor = gradi.HTMLColorToRGB('FFCC00')
hiColor = gradi.HTMLColorToRGB('FF0000')
colourgradient = 1.0 / max(weight.values())
Now when we define our node, we'll calculate the colour and fill the circle.
color = gradi.RGBToHTMLColor(gradi.RGBinterpolate(loColor, hiColor, colourgradient * weight[asin]))
node = pydot.Node(asin, shape='circle', style='filled', fillcolor=color, fontsize=8 + weight[asin])
And then save as SVG:
g.write(output_filename + '.svg', format='svg')
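gradi.py itself isn't listed in the post; helper functions along these lines would behave the way the snippet above assumes (a sketch under those assumptions, not the original module):

```python
def HTMLColorToRGB(s):
    """'FFCC00' -> (255, 204, 0)"""
    return (int(s[0:2], 16), int(s[2:4], 16), int(s[4:6], 16))

def RGBToHTMLColor(rgb):
    """(255, 204, 0) -> '#FFCC00'"""
    return '#%02X%02X%02X' % rgb

def RGBinterpolate(lo, hi, fraction):
    """Linear interpolation between two RGB triples, fraction in 0..1."""
    return tuple(int(l + (h - l) * fraction) for l, h in zip(lo, hi))

lo = HTMLColorToRGB('FFCC00')   # yellow, for unconnected nodes
hi = HTMLColorToRGB('FF0000')   # red, for the most connected
mid = RGBToHTMLColor(RGBinterpolate(lo, hi, 0.5))   # an orange in between
```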
Here's the modified script, graph3.py, and the results; click the image or here for the SVG.
I recommend this Firefox Zoom and Pan extension for SVG files.

I prefer the look of the SVG files, and the format provides some useful features that bitmaps can't: for instance, we can have Graphviz add HTTP links to the nodes and edges, and include tooltips too. Hovering over a node will show the title, and clicking will open the Amazon product page.
To add this functionality we just query Amazon for the title and include that data as a node attribute.
node.set_URL('http://www.amazon.com/gp/product/' + asin)
node.set_tooltip(amazon.getelement(amazon.AmazonAPI(asin), 'Title'))
Here's the adjusted script, graph4.py, and a link to the new interactive SVG.
I think this is a great improvement over a bitmap of product codes.
In part two we'll add the titles which were suggested by Amazon.
Monday, 12 March 2007
PowerChute Network Shutdown agent for VMWare ESX v3.01