I’m using FileUtils.copyURLToFile(URL, File) from Apache Commons IO 2.4 to download a file and save it on my computer. The problem is that some sites refuse the connection when the request carries no referrer or user agent data.
My questions:
- Is there any way to specify the user agent and referrer for the copyURLToFile method?
- Or should I use another approach to download the file and then save the resulting InputStream to a file?
Answer
I’ve re-implemented the functionality using HttpComponents for the HTTP request instead of the Commons IO copyURLToFile call (Commons IO is still used to write the stream to disk). This code downloads a file in Java from its URL and saves it to the specified destination.
The final code:
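import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public static boolean saveFile(URL imgURL, String imgSavePath) {
    HttpGet httpGet = new HttpGet(imgURL.toString());
    // Headers that some sites require before they will serve the file
    httpGet.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.11 Safari/537.36");
    httpGet.addHeader("Referer", "https://www.google.com");
    // try-with-resources closes the response and the client, releasing the connection
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
        HttpEntity imageEntity = httpResponse.getEntity();
        if (imageEntity != null) {
            FileUtils.copyInputStreamToFile(imageEntity.getContent(), new File(imgSavePath));
        }
        return true;
    } catch (IOException e) {
        return false;
    }
}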
Of course, the code above is longer than a single line of code:
FileUtils.copyURLToFile(imgURL, new File(imgSavePath), URLS_FETCH_TIMEOUT, URLS_FETCH_TIMEOUT);
but it gives you more control over the process and lets you specify not only timeouts but also the User-Agent and Referer values, which are critical for many websites.
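If pulling in HttpComponents is not an option, the same two headers can probably be set with the JDK alone. The following is only a minimal sketch of that alternative, not the approach used in the answer above: the class name HeaderDownload and the methods openWithHeaders and download are hypothetical names chosen for illustration, and the header values and timeout are whatever the caller passes in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: downloads a URL with custom User-Agent and Referer headers
// using only java.net and java.nio.
final class HeaderDownload {

    // Prepares a connection with the two headers and timeouts set.
    // url.openConnection() does not contact the server yet.
    static URLConnection openWithHeaders(URL url, String userAgent, String referer,
                                         int timeoutMillis) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        connection.setRequestProperty("Referer", referer);
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        return connection;
    }

    // Streams the response body into the target file, replacing it if it exists.
    static void download(URL url, Path target, String userAgent, String referer,
                         int timeoutMillis) throws IOException {
        URLConnection connection = openWithHeaders(url, userAgent, referer, timeoutMillis);
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A call would look like HeaderDownload.download(imgURL, Paths.get(imgSavePath), userAgent, "https://www.google.com", URLS_FETCH_TIMEOUT). Unlike saveFile, this sketch lets the IOException propagate instead of returning a boolean, so the caller decides how to report a failed download.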