Scrape a site with wget using a sitemap.xml

The following wget command will create a file called urls.txt containing all the URLs found in a site’s sitemap.xml file.

$ wget -qO- "http://YOURSITE.com/sitemap.xml" --no-check-certificate | grep -Po '<loc>\K.+?(?=</loc>)' > urls.txt
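
The -P option (Perl-compatible regular expressions) is a GNU grep feature and may be missing elsewhere, for example in the BSD grep shipped with macOS. As a rough alternative, the sketch below pulls out the <loc> values with only tr and sed. The function name extract_locs is made up for this example, and it assumes plain <loc>URL</loc> elements with no CDATA wrappers or whitespace padding.

```shell
# Hypothetical helper (not part of wget or grep): read sitemap XML on stdin
# and print the text of each <loc> element, one URL per line.
extract_locs() {
  # Break the input at every "<" so each tag starts its own line,
  # then keep only the lines that begin with "loc>" and drop that prefix.
  # Namespaced tags such as <image:loc> are skipped because their
  # lines start with "image:loc>" instead.
  tr '<' '\n' | sed -n 's/^loc>\(..*\)/\1/p'
}
```

It would slot into the pipeline in place of grep:

$ wget -qO- "http://YOURSITE.com/sitemap.xml" --no-check-certificate | extract_locs > urls.txt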

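Scrape the site using the urls.txt file. Here -m mirrors recursively, -k converts links for local viewing, -E adds .html extensions where needed, and -e robots=off tells wget to ignore robots.txt. The log file name uses the shell variable site_name, which must be set before running the command:

$ wget -mkE -e robots=off -i urls.txt --no-check-certificate > "${site_name}-wgetoutput.txt" 2>&1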
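
The site_name variable in the log file name has to exist before the command runs. One possible way to set it, shown here only as a sketch, is to derive it from the sitemap URL with shell parameter expansion. The variable names sitemap_url and host are assumptions made for this example, and a URL containing a port or user info would carry those along into the name.

```shell
# Assumed input: the same sitemap URL used in the first command.
sitemap_url="http://YOURSITE.com/sitemap.xml"

# Drop the scheme ("http://" or "https://") from the front.
host=${sitemap_url#*://}

# Drop everything from the first "/" onward, leaving the host name.
site_name=${host%%/*}

echo "$site_name"
```

With the placeholder URL above, the wget log would be written to YOURSITE.com-wgetoutput.txt.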