forked from wezm/wezm.net
QA /technical/2009/05/spider-a-site-with-wget-using-sitemap-xml/
parent 0ba542043d
commit ed25b6b650
1 changed file with 2 additions and 2 deletions
@@ -1,5 +1,5 @@
-On a number of sites at work we employ a static file caching extension to do just that: create static files that are served until the cache is invalidated. One of the things that will invalidate the cache is deploying a new release of the code. This means that many of the requests after deploying will need to be generated from scratch, often causing the full Rails stack to be started (via Passenger) each time. To get around this I came up with the following to use <code>wget</code> to spider each of the URLs listed in the <code>sitemap.xml</code>. This ensures each of the major pages has been cached so most requests will be cache hits.
+On a number of sites at work we employ a static file caching extension to do just that: create static files that are served until the cache is invalidated. One of the things that will invalidate the cache is deploying a new release of the code. This means that many of the requests after deploying will need to be generated from scratch, often causing the full Rails stack to be started (via Passenger) each time. To get around this I came up with the following to use `wget` to spider each of the URLs listed in the `sitemap.xml`. This ensures each of the major pages has been cached so most requests will be cache hits.

-<p style="text-align: left;"><code>wget --quiet http://www.example.com/sitemap.xml --output-document - | egrep -o "http://www\.example\.com[^<]+" | wget --spider -i - --wait 1</code></p>
+wget --quiet http://www.example.com/sitemap.xml --output-document - | egrep -o "http://www\.example\.com[^<]+" | wget --spider -i - --wait 1

 That should all be executed on one line. There's a one second wait in there to spread out the requests a bit but you can remove it if you like.
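
For readers who want to reuse the one-liner in the post, the same pipeline can be sketched as a small script. This is only an illustration, not part of the commit: the function names `extract_sitemap_urls` and `warm_cache` are made up here, and the sketch assumes each `<loc>` element of the sitemap sits on a single line. It matches `<loc>` elements instead of the site's hostname, so the pattern does not need editing per site.

```shell
#!/bin/sh
# Sketch: request every URL listed in a sitemap so the static cache is rebuilt.
# extract_sitemap_urls and warm_cache are hypothetical helper names.

# Print the contents of each <loc> element read from stdin, one URL per line.
# Assumes every <loc>...</loc> pair is on a single line.
extract_sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's|<loc>||' -e 's|</loc>||'
}

# Fetch the sitemap at $1 and spider each listed URL, waiting one second
# between requests, as in the original one-liner.
warm_cache() {
  wget --quiet --output-document - "$1" \
    | extract_sitemap_urls \
    | wget --spider --wait 1 --input-file -
}

# Usage (example.com stands in for the real site):
# warm_cache http://www.example.com/sitemap.xml
```

Splitting the extraction into its own function makes it possible to check the URL list on its own (for example by piping a saved `sitemap.xml` into `extract_sitemap_urls`) before sending any requests.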