Crawling and Scraping Techniques in Ruby

Problem to solve: crawl any given site (URL) and scrape each unique page for the content matched by a given CSS selector or XPath expression.

Available options for crawling and scraping:

- [Nokogiri][2], built on [libxml2][1], is one of the fastest HTML/XML parsers. Almost all scraping tools written in Ruby, such as Mechanize, Wombat and Anemone, use Nokogiri as their base for scraping.
- [Mechanize][3] is a beast in this category. It uses Nokogiri efficiently for HTML scraping. Do see Abi's post on [Automating browser navigation's through Script][4].
- [Wombat][5] takes scraping to the next level,
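The problem statement (crawl a site, scrape each page once by selector or XPath) can be sketched as a small breadth-first loop. This is a minimal sketch using only Ruby's standard library so it runs without network access. `crawl`, its keyword arguments and `max_pages` are hypothetical names, not the API of Mechanize, Anemone or Wombat. Fetching, link extraction and scraping are passed in as lambdas. In practice you might fetch with `Net::HTTP.get(URI(url))`, extract links with `Nokogiri::HTML(html).css('a[href]').map { |a| a['href'] }`, and scrape with `doc.css(selector)` or `doc.xpath(expression)`.

```ruby
require 'set'
require 'uri'

# Hypothetical breadth-first crawler sketch (not taken from any gem).
#   fetch:         ->(url)  { html string, or nil if the page could not be fetched }
#   extract_links: ->(html) { array of raw href strings found in the page }
#   scrape:        ->(html) { whatever content you want from the page }
# Returns a Hash of url => scraped content, in crawl order.
def crawl(start_url, fetch:, extract_links:, scrape:, max_pages: 100)
  start   = URI(start_url)
  queue   = [start.to_s]
  seen    = Set[start.to_s] # every URL ever queued, so each page is visited once
  results = {}

  until queue.empty? || results.size >= max_pages
    url  = queue.shift
    html = fetch.call(url)
    next if html.nil?

    results[url] = scrape.call(html)

    extract_links.call(html).each do |href|
      begin
        link = URI.join(url, href) # resolve relative links against the current page
      rescue URI::Error
        next # skip malformed hrefs
      end
      # Stay on the given site; also drops mailto:, javascript:, etc.
      next unless link.is_a?(URI::HTTP) && link.host == start.host

      link.fragment = nil # "page#a" and "page#b" are the same page
      normalized = link.to_s
      queue << normalized if seen.add?(normalized) # add? returns nil if already seen
    end
  end

  results
end
```

The `Set` of normalized URLs (resolved against the current page, fragment stripped) is what keeps each page scraped only once, and the host check keeps the crawl on the given site. For example, feeding it an in-memory hash of `url => html` as the `fetch` lambda lets you exercise the logic offline before pointing it at a real site.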

Swapnil Abnave | November 25, 2013 | Tags: ruby, ruby-gem, rails_3, scraping, crawling