crawling Blogs
Written by Kiprosh, team of passionate and disciplined craftsmen turning your ideas into reality.
Kiprosh is now part of LawLytics
Written by Kiprosh, team of passionate and disciplined craftsmen turning your ideas into reality.
Problem case to solve : Crawl any given site(url), scrape each page uniquely for content from given selector or Xpath Available options for Crawling and Scraping : With [libxml2][1] [Nokogiri][2] is one of the fastest HTML/XML parser. Almost all of the scraping tools written in Ruby such as Mechanize, Wombat, Anemone, etc. uses Nokogiri as there base DSL for sraping. [Mechanize][3] - Is a beast in this category. It efficiently uses Nokogiri for HTML scrapping. Do see Abi's post on [Automating browser navigation's through Script][4] [Wombat][5] takes the scraping term to a next higher level,