-
You have a file that contains the URLs you want to download? Use the
`-i' switch:
wget -i file
If you specify `-' as the file name, the URLs will be read from
standard input.
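Because `-' makes Wget read the list from standard input, the list can come from any command. A minimal sketch, assuming a hypothetical file `urls.txt' that happens to contain a duplicate entry; `sort -u' removes it before the list would reach Wget:

```shell
# urls.txt is a hypothetical list of URLs, one per line.
cat > urls.txt <<'EOF'
http://www.gnu.org/
http://www.gnu.org/software/wget/
http://www.gnu.org/
EOF

# Print the de-duplicated list. In real use, pipe it straight into Wget:
#   sort -u urls.txt | wget -i -
sort -u urls.txt
```

The same pattern works with any filter (grep, sed, a script) placed in front of `wget -i -'.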
-
Create a mirror image of the GNU web site, five levels deep (the default
maximum recursion depth), with the same directory structure as the
original, with only one try per document, and save the log of the
activities to `gnulog':
wget -r -t1 http://www.gnu.org/ -o gnulog
-
The same as the above, but convert the links in the HTML files to
point to local files, so you can view the documents off-line:
wget --convert-links -r -t1 http://www.gnu.org/ -o gnulog
-
Retrieve only one HTML page, but make sure that all the elements needed
for the page to be displayed, such as inline images and external style
sheets, are also downloaded. Also make sure the downloaded page
refers to the downloaded copies of those elements:
wget -p --convert-links http://www.server.com/dir/page.html
The HTML page will be saved to `www.server.com/dir/page.html', and
the images, stylesheets, etc., somewhere under `www.server.com/',
depending on where they were on the remote server.
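The default mapping from URL to local file name can be approximated with a small shell function. This is a simplification for illustration only: the hypothetical `local_path' below assumes the default directory layout of a recursive or `-p' retrieval, ignores ports, query strings and the directory options shown in the next example, and maps a URL ending in `/' to `index.html':

```shell
# local_path: hypothetical helper approximating where a recursive or -p
# retrieval saves a URL under the default directory layout.
local_path() {
  p=${1#*://}                  # drop the scheme, e.g. "http://"
  case $p in
    */) p="${p}index.html" ;;  # directory URLs are saved as index.html
  esac
  printf '%s\n' "$p"
}

local_path http://www.server.com/dir/page.html   # www.server.com/dir/page.html
local_path http://www.lycos.com/                 # www.lycos.com/index.html
```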
-
The same as the above, but without the `www.server.com/' directory.
In fact, I don't want to have all those random server directories
anyway; just save all those files under a `download/'
subdirectory of the current directory:
wget -p --convert-links -nH -nd -Pdownload \
http://www.server.com/dir/page.html
-
Retrieve the index.html of `www.lycos.com', showing the original
server headers:
wget -S http://www.lycos.com/
-
Save the server headers with the file, perhaps for post-processing:
wget -s http://www.lycos.com/
more index.html
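With the headers saved, post-processing usually starts by separating them from the body. The headers end at the first empty line, and HTTP terminates header lines with CR LF. A minimal sketch, assuming a file laid out like the `index.html' produced above; `saved_headers' is a hypothetical helper name:

```shell
# saved_headers FILE: print the HTTP headers stored at the top of FILE,
# i.e. every line before the first empty one, with any trailing CR removed.
saved_headers() {
  awk '{ sub(/\r$/, "") } $0 == "" { exit } { print }' "$1"
}
```

For example, `saved_headers index.html | grep -i '^Content-Type:'' would show the type the server reported for the saved page.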
-
Retrieve the first two levels of `wuarchive.wustl.edu', saving them
to `/tmp':
wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
-
You want to download all the GIFs from a directory on an HTTP
server. You tried `wget http://www.server.com/dir/*.gif', but that
didn't work because HTTP retrieval does not support globbing. In
that case, use:
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
This is more verbose than the glob you tried, but the effect is the
same. `-r -l1' means to retrieve recursively (see section Recursive
Retrieval), with a maximum depth of 1. `--no-parent' means that
references to the parent directory are ignored (see section
Directory-Based Limits), and `-A.gif' means to download only the GIF
files. `-A "*.gif"' would have worked too.
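The quotes in `-A "*.gif"' matter. Without them, your own shell expands `*.gif' against the current local directory before Wget runs, so Wget may receive a list of local file names instead of the pattern. A small demonstration, using a hypothetical scratch directory `globdemo' and a hypothetical helper `count_args' that merely reports how many arguments it received:

```shell
# count_args: hypothetical helper; prints the number of arguments it got.
count_args() { echo "$#"; }

# Scratch directory that happens to contain two local GIF files.
mkdir -p globdemo
touch globdemo/a.gif globdemo/b.gif
cd globdemo || exit 1

count_args -A *.gif     # 3: the shell replaced *.gif with a.gif b.gif
count_args -A "*.gif"   # 2: the pattern reaches the command untouched
```

`-A.gif' has no wildcard, so it is passed through unchanged whether or not it is quoted.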
-
Suppose you were in the middle of downloading when Wget was
interrupted, and you do not want to clobber the files already present.
In that case, use:
wget -nc -r http://www.gnu.org/
-
If you want to encode your own username and password in an HTTP or
FTP URL, use the appropriate URL syntax (see section URL Format):
wget ftp://hniksic:mypassword@unix.server.com/.emacs
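If the username or password contains characters with a special meaning in URLs, such as `@', `:' or `/', they are safest %-encoded (`@' becomes `%40', `:' becomes `%3A'). A minimal sketch of an encoder in portable shell, assuming plain ASCII input; the helper `urlencode', the password and the host `ftp.example.com' are all hypothetical:

```shell
# urlencode STRING: hypothetical helper that %-encodes every byte outside
# the unreserved set [A-Za-z0-9._~-]. Assumes ASCII input.
urlencode() {
  s=$1 out=
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}   # first character of s
    s=${s#?}          # rest of s
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'p@ss:w0rd'   # p%40ss%3Aw0rd
```

The encoded value can then be placed in the URL, for instance:
wget "ftp://hniksic:$(urlencode 'p@ss:w0rd')@ftp.example.com/.emacs"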
-
You would like the output documents to go to standard output instead of
to files?
wget -O - http://jagor.srce.hr/ http://www.srce.hr/
You can also combine `-O -' with `-i -' to build pipelines that
retrieve the documents linked from remote hotlists:
wget -O - http://cool.list.com/ | wget --force-html -i -