How do you fetch a web page's source code with a Java program?

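If fetching the source is left entirely to a program, it should be much faster than doing it by hand. The program is as follows:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HtmlParser {

        // Downloads the page at the given URL and decodes it with the given encoding.
        // Returns null if the request fails.
        public static String getHtmlContent(URL url, String encode) {
            StringBuilder contentBuffer = new StringBuilder();
            int responseCode = -1;
            HttpURLConnection con = null;
            try {
                con = (HttpURLConnection) url.openConnection();
                // Identify as Internet Explorer when downloading
                con.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
                con.setConnectTimeout(60000);
                con.setReadTimeout(60000);
                // Get the HTTP status code returned for the page
                responseCode = con.getResponseCode();
                if (responseCode == -1) {
                    System.out.println(url.toString() + " : connection is failure...");
                    return null;
                }
                if (responseCode >= 400) { // request failed
                    System.out.println("Request failed: get response code: " + responseCode);
                    return null;
                }
                // try-with-resources closes the stream and reader even on error
                try (InputStream inStr = con.getInputStream();
                     BufferedReader buffStr = new BufferedReader(new InputStreamReader(inStr, encode))) {
                    String str;
                    // readLine() strips line terminators, so the page is joined into one line
                    while ((str = buffStr.readLine()) != null) {
                        contentBuffer.append(str);
                    }
                }
                return contentBuffer.toString();
            } catch (IOException e) {
                e.printStackTrace();
                System.out.println("error: " + url.toString());
                // Return null here rather than calling toString() on a null buffer
                return null;
            } finally {
                // con stays null if openConnection() itself threw
                if (con != null) {
                    con.disconnect();
                }
            }
        }

        // Convenience overload: accepts the URL as a string and adds "http://" if no scheme is given.
        public static String getHtmlContent(String url, String encode) {
            String lower = url.toLowerCase();
            if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
                url = "http://" + url;
            }
            try {
                URL rUrl = new URL(url);
                return getHtmlContent(rUrl, encode);
            } catch (Exception e) {
                e.printStackTrace();
                return null;
            }
        }

        public static void main(String[] args) {
            // The arguments of the original call are not given; reading the URL and
            // encoding from the command line is an assumption made here.
            if (args.length < 2) {
                System.out.println("usage: java HtmlParser <url> <encoding>");
                return;
            }
            System.out.println(getHtmlContent(args[0], args[1]));
        }
    }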
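The reading loop in the program above appends each line without its line terminator, so the fetched source comes back as one long line. If the original line breaks matter, one possible variation is sketched below. The class name `PageReaderSketch` and the helper `readAll` are hypothetical names that do not appear in the original program, and the sketch reads from an in-memory stream instead of a live connection so that it runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class PageReaderSketch {

    // Hypothetical helper (not part of the original program): decodes the
    // stream with the given encoding and writes one '\n' after every line read.
    static String readAll(InputStream in, String encode) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, encode))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for con.getInputStream(): a small page held in memory
        byte[] page = "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page), "UTF-8"));
    }
}
```

In the original program the same helper could, under these assumptions, be handed `con.getInputStream()` in place of the in-memory stream.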

This article was compiled from online sources; the views expressed in it do not represent the position of 淘大白.