How to specify User Agent and Referer in FileUtils.copyURLToFile(URL, File) method?

I’m using FileUtils.copyURLToFile(URL, File), part of Apache Commons IO 2.4, to download a file and save it on my computer. The problem is that some sites refuse the connection when the request carries no Referer or User-Agent header.

My questions:

  1. Is there any way to pass a User-Agent and a Referer to the copyURLToFile method?
  2. Or should I use another approach to download the file and then save the resulting InputStream to a file?

Answer

I re-implemented the functionality with HttpComponents instead of Commons IO. The following code downloads a file from a given URL and saves it to the specified path.

The final code:

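import java.io.File;
import java.io.IOException;
import java.net.URL;

import org.apache.commons.io.FileUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public static boolean saveFile(URL imgURL, String imgSavePath) {

    HttpGet httpGet = new HttpGet(imgURL.toString());
    // Some sites refuse requests without a browser-like User-Agent and a Referer
    httpGet.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.11 Safari/537.36");
    httpGet.addHeader("Referer", "https://www.google.com");

    // try-with-resources closes the response and the client, which releases the connection
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {

        // Don't save an error page (403, 404, ...) as if it were the requested file
        if (httpResponse.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
            return false;
        }

        HttpEntity imageEntity = httpResponse.getEntity();
        if (imageEntity == null) {
            return false; // no response body, nothing to save
        }

        // copyInputStreamToFile also closes the entity's content stream
        FileUtils.copyInputStreamToFile(imageEntity.getContent(), new File(imgSavePath));
        return true;

    } catch (IOException e) {
        return false;
    }
}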

Of course, the code above takes more space than the single line of code it replaces:

FileUtils.copyURLToFile(imgURL, new File(imgSavePath),
                        URLS_FETCH_TIMEOUT, URLS_FETCH_TIMEOUT);

but it gives you more control over the request and lets you specify not only timeouts but also User-Agent and Referer values, which many websites require before they will serve a file.
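If you'd rather not add HttpComponents, the second option from the question also works with the JDK alone. As far as I can tell, the timeout overload of copyURLToFile in Commons IO 2.4 essentially opens a URLConnection, sets the connect and read timeouts, and copies the connection's input stream into the file, so the same steps can be done by hand with the two headers added. This is a minimal sketch under that assumption; the class and method names (HeaderAwareDownloader, openWithHeaders, download) are hypothetical, and java.nio's Files.copy stands in for FileUtils.copyInputStreamToFile.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; the names are illustrative, not part of Commons IO or HttpComponents.
class HeaderAwareDownloader {

    // Opens a connection and configures timeouts and headers; no network traffic happens yet.
    static URLConnection openWithHeaders(URL url, String userAgent, String referer,
                                         int timeoutMillis) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        connection.setRequestProperty("User-Agent", userAgent);
        connection.setRequestProperty("Referer", referer);
        return connection;
    }

    // Connects, then streams the response body into dest, overwriting any existing file.
    static void download(URL url, Path dest, String userAgent, String referer,
                         int timeoutMillis) throws IOException {
        URLConnection connection = openWithHeaders(url, userAgent, referer, timeoutMillis);
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A call would look like `HeaderAwareDownloader.download(imgURL, Paths.get(imgSavePath), "Mozilla/5.0", "https://www.google.com", URLS_FETCH_TIMEOUT)`. For http(s) URLs, HttpURLConnection.getInputStream() throws an IOException on 4xx/5xx responses, so an error page is not silently written to disk. With Commons IO already on the classpath, `FileUtils.copyInputStreamToFile(connection.getInputStream(), new File(imgSavePath))` can replace the Files.copy call.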
