Tom Says: Code something crazy every day you feel like it!
As an exercise in writing inefficient Ruby code, I wrote these scripts to build a Markov model [1] of a body of text, then use it to generate realistic-looking quotations! If you follow me on Twitter, you've unfortunately been subjected to the results for the past day.
I used the clearest, most space-inefficient method I could imagine: a hash that maps each word to an array of every word seen directly after it, repeats included.
The hash looks like this one, generated from the sentence "save the store from the storm":
db = { ""      => ["save"],
       "save"  => ["the"],
       "the"   => ["store", "storm"],
       "store" => ["from"],
       "from"  => ["the"]
     }
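To make the structure concrete, here is a minimal sketch that builds the same hash in memory from a string rather than from $stdin. The helper name `build_model` is made up for this example and is not part of the scripts in this post.

```ruby
# Hypothetical helper (not from the original scripts): build the
# word => followers hash from a string, one chain per line.
def build_model(text)
  model = {}
  text.each_line do |line|
    preceding = ""                       # each line starts from the empty string
    line.split.each do |word|
      (model[preceding] ||= []) << word  # record word as a follower of preceding
      preceding = word
    end
  end
  model
end

p build_model("save the store from the storm")
```

Running it on the example sentence should print the hash shown above, with "the" mapping to both "store" and "storm".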
When this hash is saved to disk (using Marshal), it ends up being bigger than the original text, but the format is incredibly easy to use to produce text, so it was worth it to me as a quick hack. This is the code for generating it:
# usage: ruby learn.rb db-file-name
file = ARGV[0]
# load the existing model from disk (binary read: Marshal data is not text)
db = Marshal.load(File.binread(file)) rescue {}
# read words from $stdin; yield each with the one before
def get_words
  while $stdin.gets do
    preceding = ""
    $_.split.each do |word|
      yield preceding, word
      preceding = word
    end
  end
end
# add words to the model
get_words do |preceding, word|
  db[preceding] ||= []
  db[preceding] << word
end
# save back to disk (binary mode keeps the dump intact on every platform)
File.open(file, "wb") { |f| f.write Marshal.dump(db) }
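If you want to check the size overhead for yourself, something like the following sketch compares the byte size of a text with the byte size of its marshalled model. The helper name `marshal_report` and the returned keys are invented for this example; only `Marshal.dump`, `Marshal.load` and `String#bytesize` come from Ruby itself.

```ruby
# Hypothetical helper: compare a training text's size with its dumped model
# and confirm the dump loads back to an equal hash.
def marshal_report(text, model)
  dumped = Marshal.dump(model)
  {
    text_bytes:  text.bytesize,
    dump_bytes:  dumped.bytesize,
    round_trips: Marshal.load(dumped) == model
  }
end

text  = "save the store from the storm"
model = {
  ""      => ["save"],
  "save"  => ["the"],
  "the"   => ["store", "storm"],
  "store" => ["from"],
  "from"  => ["the"]
}
p marshal_report(text, model)
```

The exact dump size depends on the Ruby version, so no figure is quoted here, but for this sentence it should come out larger than the text.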
One important thing to note: the word at the beginning of each line is stored as occurring after "", the empty string. These are the possible starting points for generated chains…
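As a small illustration of that point, this sketch lists the words that would end up under the "" key for a given text. `starting_words` is a made-up name for this example, and it assumes, like learn.rb does, that words are separated by whitespace.

```ruby
# Hypothetical helper: the first word of every non-blank line, i.e. the
# words learn.rb would store as followers of the empty string.
def starting_words(text)
  text.each_line.map { |line| line.split.first }.compact
end

p starting_words("save the store\n\nfrom the storm\n")
```

Blank lines contribute nothing, because `split` on them returns an empty array.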
Generating text is easy! I may as well show you the code first:
# usage: ruby produce.rb db-file-name max-characters
file = ARGV[0]
count = ARGV[1].to_i
# load the model (binary read: Marshal data is not text)
db = Marshal.load(File.binread(file)) rescue {}
# define convenience method for getting a random element of an array
# (Ruby 1.9 and later ship Array#sample, which does the same job)
class Array
  def rand
    self[Kernel.rand(length)]
  end
end
words = []
last = ""
loop do
  # pick a random follower; nil when the current word has none
  last = db[last].rand rescue nil
  break if last.nil?
  break if (words + [last]).join(" ").length >= count
  words << last
  # break if last[/[\.\?\!]$/] # stop at an end-of-sentence marker
end
puts words.join(" ")
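It's easy to explain: start from the empty string, pick a random word from the array of words that followed it, then repeat from that word until it has no followers or adding it would reach the character limit.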
Using the training sentence from before ("save the store from the storm"), you can produce such glorious sentences as: "save the storm," "save the store from the storm," and, "save the store from the store from the store from the store from the storm."
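Here is a sketch of the same walk as a function, so you can try it on the training sentence without a db file. The name `generate` and the optional `rng` argument are assumptions made for this example; it also uses `Array#sample` instead of the `Array#rand` helper from produce.rb.

```ruby
# Hypothetical helper: walk the model from "" until a word has no
# followers or the joined output would reach max_chars.
def generate(db, max_chars, rng = Random.new)
  words = []
  last = ""
  loop do
    followers = db[last]
    break if followers.nil? || followers.empty?
    last = followers.sample(random: rng)   # random follower of the current word
    break if (words + [last]).join(" ").length >= max_chars
    words << last
  end
  words.join(" ")
end

db = {
  ""      => ["save"],
  "save"  => ["the"],
  "the"   => ["store", "storm"],
  "store" => ["from"],
  "from"  => ["the"]
}
3.times { puts generate(db, 60) }
```

Passing a seeded generator, for example `Random.new(42)`, makes a run repeatable, which is handy when you want to compare models.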
The first thing I did, of course, was train it on my plain-text copy of the book of Genesis. Perhaps you do not have this text available. Why not try it with your own copy of the whole Bible [2]? I've gotten choice "quotations" like:
Then I trained it on a database dump of a forum I frequent full of h4x0rz, g4m3rz, and nubc4k3z:
Get creative.
Using this power, you can become nearly as annoying a tweeter as I am! This script downloads a user's Twitter RSS feed, parses it with Hpricot [3] (so you need the gem), then outputs new tweets to stdout (old tweets are cached). This output is meant to be piped into the learn.rb script … for generating tweets that are not entirely unlike those the poor victim wrote themselves.
The example code has _why [4]'s Twitter feed information hard-coded near the top, for ultimate randomness. The ID number is taken from the URL for his RSS feed on Twitter -- feel free to change this to the ID number for any user you'd like to mimic.
# usage: ruby getwhy.rb | ruby learn.rb why
require "rubygems"
require "hpricot"
require "open-uri"
@id = "3573501"
@file = "_why_tweets"
rss = Hpricot(open("http://twitter.com/statuses/user_timeline/#{@id}.rss"))
new_statuses = (rss/"item title").map { |i| i.inner_html }.map { |i| i.split(" ", 2).last }
saved_statuses = Marshal.load(File.binread(@file)) rescue []
never_before_seen = new_statuses - saved_statuses
statuses = new_statuses | saved_statuses
File.open(@file, "wb") { |f| f.write Marshal.dump(statuses) }
puts never_before_seen.join("\n")
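The caching step boils down to two array operations, sketched below without any network access. The helper names `strip_prefix` and `merge_statuses` are invented for this example, and the "username: text" title format is an assumption inferred from the `split(" ", 2).last` call in the script.

```ruby
# Hypothetical helper: drop everything up to the first space, which is
# assumed to be a "username:" prefix on each feed title.
def strip_prefix(title)
  title.split(" ", 2).last
end

# Hypothetical helper: return the statuses not seen before, plus the
# combined list that would be written back to the cache.
def merge_statuses(fetched, saved)
  never_before_seen = fetched - saved   # array difference
  all = fetched | saved                 # union without duplicates
  [never_before_seen, all]
end

fetched = ["someone: chunky bacon", "someone: more foxes"].map { |t| strip_prefix(t) }
fresh, cache = merge_statuses(fetched, ["chunky bacon"])
puts fresh
```

Only the statuses in `fresh` get printed, so piping the output into learn.rb never trains on the same tweet twice.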
Posted Jun 25, 2008, in the morning.