wezm.net/v1/content/technical/2009/05/spider-a-site-with-wget-using-sitemap-xml.html

On a number of sites at work we employ a static file caching extension that creates static files, which are served until the cache is invalidated. One of the things that invalidates the cache is deploying a new release of the code. This means that many of the pages requested after a deploy need to be generated from scratch, often causing the full Rails stack to be started (via Passenger) each time. To get around this I came up with the following command, which uses `wget` to spider each of the URLs listed in the `sitemap.xml`. This ensures each of the major pages has been cached, so most requests will be cache hits.
wget --quiet http://www.example.com/sitemap.xml --output-document - | grep -Eo "http://www\.example\.com[^<]+" | wget --spider -i - --wait 1
That should all be executed on one line. There's a one-second wait in there (`--wait 1`) to spread out the requests a bit, but you can remove it if you like.
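If you want to reuse this across several sites, the same pipeline could be wrapped in a couple of shell functions. The following is only a rough sketch: the function names `extract_urls` and `warm_cache` are made up for illustration, and it assumes each `<loc>` element in the sitemap sits on a single line and that the URLs contain no XML entities that would need decoding.

```shell
#!/bin/sh
# Sketch only: extract_urls and warm_cache are hypothetical helper names.

# Read sitemap XML on stdin and print the text of each <loc> element,
# one URL per line. Assumes every <loc>...</loc> pair is on one line.
extract_urls() {
    grep -Eo '<loc>[^<]+</loc>' | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# Fetch the sitemap at $1 and request each URL it lists without saving
# the pages. $2 is the pause between requests in seconds (default 1).
warm_cache() {
    wget --quiet --output-document - "$1" \
        | extract_urls \
        | wget --spider --quiet --wait "${2:-1}" --input-file -
}
```

With those defined, `warm_cache http://www.example.com/sitemap.xml` should behave much like the one-liner above. Matching on `<loc>` rather than on the hostname means the site name only has to be given once.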