Android Web Crawler: Multithreaded Implementation

A web crawler automates the task of indexing website pages. It is also referred to as web spidering and is used extensively by search engines to provide relevant, up-to-date results for user search queries.

Typical uses of a web crawler include, but are not limited to:

  1. Page Indexing for Search Engines
  2. Page Content Scraping
  3. Dictionary Words Processing
  4. Syntax and Structure Validation

Given a root URL, the web crawler fetches the content of the current page and adds the URLs extracted from it to a processing queue of uncrawled URLs. Once a page is crawled, its data is stored in a database for later processing as per requirement. The task is time consuming if hyperlinks are crawled sequentially, so we will create an example Android web crawler application that executes the crawling tasks in parallel. An SQLite database will be used for saving records of crawled URLs.
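
Conceptually, the crawl is a breadth-first traversal of the link graph. The sketch below is a minimal, single-threaded illustration of that loop; the helper names download, store and extractLinks are placeholders of my own, not methods from this article's code, and the real application parallelizes the loop with a thread pool and persists pages to SQLite.

import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Minimal, single-threaded sketch of the crawl loop described above.
// download(), store() and extractLinks() are illustrative placeholders.
public class CrawlLoopSketch {

	public static void crawl(String rootUrl) {
		Queue<String> uncrawled = new ArrayDeque<>();
		Set<String> crawled = new HashSet<>();
		uncrawled.add(rootUrl);

		while (!uncrawled.isEmpty()) {
			String url = uncrawled.remove();
			if (!crawled.add(url)) {
				continue; // already visited, skip
			}
			String html = download(url);             // fetch page content
			store(url, html);                        // save for later processing
			for (String link : extractLinks(html)) {
				if (!crawled.contains(link)) {
					uncrawled.add(link);             // queue newly discovered URL
				}
			}
		}
	}

	// Placeholders; the article implements these with HttpURLConnection,
	// SQLite and jsoup respectively.
	private static String download(String url) { return ""; }
	private static void store(String url, String html) { }
	private static Iterable<String> extractLinks(String html) { return new ArrayDeque<>(); }
}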

Android Web Crawler Example Application

 

Download Full Source Code: https://github.com/androidsrc/WebCrawlerUsingJsoup

Video demo: https://www.youtube.com/watch?v=W4FxInOcUME

1. Create an Android Application

2. Preparing Application Manifest File

Add the INTERNET permission, since crawling requires fetching URLs with HttpURLConnection. The final AndroidManifest.xml will be as below.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.android.webcrawler" android:versionCode="1" android:versionName="1.0" >

    <uses-sdk android:minSdkVersion="14" android:targetSdkVersion="21" />

    <uses-permission android:name="android.permission.INTERNET" />

    <application android:allowBackup="true" android:icon="@drawable/ic_launcher" android:label="@string/app_name" android:theme="@style/AppTheme" >
        <activity android:name=".MainActivity" android:label="@string/app_name" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

3. Preparing Layout Files

Our layout comprises a Button to start and another to stop crawling, an EditText to take the URL to be crawled as user input, a ProgressBar shown while crawling is running, and a TextView that updates the user with the count of crawled pages. The final layout /res/layout/activity_main.xml will be as below.

[Screenshot: web crawler application layout]

<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
              xmlns:tools="http://schemas.android.com/tools"
              android:layout_width="match_parent"
              android:layout_height="match_parent"
              android:orientation="vertical"
              android:padding="20dp"
              tools:context="com.android.webcrawler.MainActivity">

    <EditText
        android:id="@+id/webUrl"
        android:layout_width="fill_parent"
        android:layout_height="wrap_content"
        android:layout_marginTop="20dp"
        android:hint="Enter URL"/>

    <Button
        android:id="@+id/start"
        android:layout_width="fill_parent"
        android:layout_height="wrap_content"
        android:layout_marginTop="20dp"
        android:onClick="onClick"
        android:text="Start Crawling"/>

    <LinearLayout
        android:id="@+id/crawlingInfo"
        android:layout_width="fill_parent"
        android:layout_height="wrap_content"
        android:orientation="vertical"
        android:visibility="invisible">

        <ProgressBar
            android:id="@+id/progressBar"
            style="?android:attr/progressBarStyleLarge"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_gravity="center_horizontal"
            android:layout_marginTop="20dp"/>

        <TextView
            android:id="@+id/progressText"
            style="?android:attr/textAppearanceLarge"
            android:layout_width="fill_parent"
            android:layout_height="wrap_content"
            android:gravity="center_horizontal"/>

        <Button
            android:id="@+id/stop"
            android:layout_width="fill_parent"
            android:layout_height="wrap_content"
            android:layout_marginTop="20dp"
            android:onClick="onClick"
            android:text="Stop Crawling"/>
    </LinearLayout>

</LinearLayout>

4. Web Crawler Implementation

Create a class WebCrawler that defines the routines and objects needed to crawl a page. Define an interface CrawlingCallback to provide callbacks for crawling events: page crawling completed, page crawling failed, and all crawling completed.

/**
 * Interface for crawling callback
 */
interface CrawlingCallback {
	void onPageCrawlingCompleted();

	void onPageCrawlingFailed(String Url, int errorCode);

	void onCrawlingCompleted();
}

Define a private class CrawlerRunnable that implements Runnable. The constructor of CrawlerRunnable takes a CrawlingCallback and the URL to be crawled as parameters. Inside this class, we have a method to download the HTML body for a given URL. Once we have the raw HTML content, we process it to extract hyperlinks. For extracting URLs, I use jsoup, which is a very good HTML parser library. Download jsoup library version 1.8.1 and add it to the libs folder of your Android project.
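
If your project is built with Gradle, an alternative to copying the jar is to add the line compile 'org.jsoup:jsoup:1.8.1' (jsoup's standard Maven coordinates, same version as above) to the dependencies block of your module's build.gradle. With jsoup on the classpath, the final code for CrawlerRunnable will be as below.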

	/**
	 * Runnable task which performs the crawling of a page and adds encountered
	 * URLs to the crawling list
	 *
	 * @author CLARION
	 *
	 */
	private class CrawlerRunnable implements Runnable {

		CrawlingCallback mCallback;
		String mUrl;

		public CrawlerRunnable(CrawlingCallback callback, String Url) {
			this.mCallback = callback;
			this.mUrl = Url;
		}

		@Override
		public void run() {
			String pageContent = retreiveHtmlContent(mUrl);

			if (!TextUtils.isEmpty(pageContent.toString())) {
				insertIntoCrawlerDB(mUrl, pageContent);
				synchronized (lock) {
					crawledURL.add(mUrl);
				}
				mCallback.onPageCrawlingCompleted();
			} else {
				mCallback.onPageCrawlingFailed(mUrl, -1);
			}

			if (!TextUtils.isEmpty(pageContent.toString())) {
				// START
				// JSoup Library used to filter urls from html body
				Document doc = Jsoup.parse(pageContent.toString());
				Elements links = doc.select("a[href]");
				for (Element link : links) {
					String extractedLink = link.attr("href");
					if (!TextUtils.isEmpty(extractedLink)) {
						synchronized (lock) {
							if (!crawledURL.contains(extractedLink))
								uncrawledURL.add(extractedLink);
						}

					}
				}
				// End JSoup
			}
			// Send msg to handler that crawling for this url is finished
			// start more crawling tasks if queue is not empty
			mHandler.sendEmptyMessage(0);

		}

		private String retreiveHtmlContent(String Url) {
			URL httpUrl = null;
			try {
				httpUrl = new URL(Url);
			} catch (MalformedURLException e) {
				e.printStackTrace();
			}

			int responseCode = HttpStatus.SC_OK;
			StringBuilder pageContent = new StringBuilder();
			try {
				if (httpUrl != null) {
					HttpURLConnection conn = (HttpURLConnection) httpUrl
							.openConnection();
					conn.setConnectTimeout(5000);
					conn.setReadTimeout(5000);
					responseCode = conn.getResponseCode();
					if (responseCode != HttpStatus.SC_OK) {
						throw new IllegalAccessException(
								" http connection failed");
					}
					BufferedReader br = new BufferedReader(
							new InputStreamReader(conn.getInputStream()));
					String line = null;
					while ((line = br.readLine()) != null) {
						pageContent.append(line);
					}
				}

			} catch (IOException e) {
				e.printStackTrace();
				mCallback.onPageCrawlingFailed(Url, -1);
			} catch (IllegalAccessException e) {
				e.printStackTrace();
				mCallback.onPageCrawlingFailed(Url, responseCode);
			}

			return pageContent.toString();
		}

	}
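
One caveat worth noting about the extraction step above: link.attr("href") returns the attribute value exactly as it appears in the page, so relative links such as /about.html are queued as-is and will later fail in new URL(). jsoup can resolve links to absolute form if it is given the page's own URL as the base URI. Below is a minimal sketch of such a helper; the method name extractAbsoluteLinks and the List return type are my own additions, not part of the original code, and would need java.util.List and java.util.ArrayList imports.

	// Sketch: extract absolute URLs by giving jsoup the page's own URL as base URI.
	// 'html' is the downloaded page content, 'pageUrl' the URL it was fetched from.
	private List<String> extractAbsoluteLinks(String html, String pageUrl) {
		List<String> result = new ArrayList<>();
		Document doc = Jsoup.parse(html, pageUrl);
		for (Element link : doc.select("a[href]")) {
			String absoluteLink = link.attr("abs:href"); // resolved against pageUrl
			if (!TextUtils.isEmpty(absoluteLink)) {
				result.add(absoluteLink);
			}
		}
		return result;
	}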

To process crawling tasks in parallel, we will use ThreadPoolExecutor, which manages a work queue of runnable tasks and executes them on its pool of threads. You can define the core (starting) and maximum sizes of the thread pool. To manage the crawler runnables, create a new private class RunnableManager that defines the methods needed to add tasks to the pool and to cancel them.

	/**
	 * Helper class to interact with ThreadPoolExecutor for adding and removing
	 * runnable in workQueue
	 *
	 * @author CLARION
	 *
	 */
	private class RunnableManager {

		// Sets the amount of time an idle thread will wait for a task before
		// terminating
		private static final int KEEP_ALIVE_TIME = 1;

		// Sets the Time Unit to seconds
		private final TimeUnit KEEP_ALIVE_TIME_UNIT = TimeUnit.SECONDS;

		// Sets the initial threadpool size to 5
		private static final int CORE_POOL_SIZE = 5;

		// Sets the maximum threadpool size to 8
		private static final int MAXIMUM_POOL_SIZE = 8;

		// A queue of Runnables for crawling url
		private final BlockingQueue<Runnable> mCrawlingQueue;

		// A managed pool of background crawling threads
		private final ThreadPoolExecutor mCrawlingThreadPool;

		public RunnableManager() {
			mCrawlingQueue = new LinkedBlockingQueue<>();
			mCrawlingThreadPool = new ThreadPoolExecutor(CORE_POOL_SIZE,
					MAXIMUM_POOL_SIZE, KEEP_ALIVE_TIME, KEEP_ALIVE_TIME_UNIT,
					mCrawlingQueue);
		}

		private void addToCrawlingQueue(Runnable runnable) {
			mCrawlingThreadPool.execute(runnable);
		}

		private void cancelAllRunnable() {
			mCrawlingThreadPool.shutdownNow();
		}

		private int getUnusedPoolSize() {
			return MAXIMUM_POOL_SIZE - mCrawlingThreadPool.getActiveCount();
		}

		private boolean isShuttingDown() {
			return mCrawlingThreadPool.isShutdown()
					|| mCrawlingThreadPool.isTerminating();
		}

	}

For processing URLs, we will manage a queue of URLs still to be crawled and a HashSet of already crawled URLs to avoid re-crawling a page, along with a few methods for queueing runnable tasks and for deleting and inserting database content. The final complete code for WebCrawler.java will be as below.

package com.android.webcrawler;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.http.HttpStatus;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import android.content.ContentValues;
import android.content.Context;
import android.database.sqlite.SQLiteDatabase;
import android.os.Handler;
import android.os.Looper;
import android.text.TextUtils;

public class WebCrawler {

	/**
	 * Interface for crawling callback
	 */
	interface CrawlingCallback {
		void onPageCrawlingCompleted();

		void onPageCrawlingFailed(String Url, int errorCode);

		void onCrawlingCompleted();
	}

	private Context mContext;
	// SQLiteOpenHelper object for handling crawling database
	private CrawlerDB mCrawlerDB;
	// Set containing already visited URLs
	private HashSet<String> crawledURL;
	// Queue for unvisited URLs
	BlockingQueue<String> uncrawledURL;
	// For parallel crawling execution using ThreadPoolExecutor
	RunnableManager mManager;
	// Callback interface object to notify UI
	CrawlingCallback callback;
	// For sync of crawled and yet to crawl url lists
	Object lock;

	public WebCrawler(Context ctx, CrawlingCallback callback) {
		this.mContext = ctx;
		this.callback = callback;
		mCrawlerDB = new CrawlerDB(mContext);
		crawledURL = new HashSet<>();
		uncrawledURL = new LinkedBlockingQueue<>();
		lock = new Object();
	}

	/**
	 * API to add crawler runnable in ThreadPoolExecutor workQueue
	 *
	 * @param Url
	 *            - Url to crawl
	 * @param isRootUrl
	 */
	public void startCrawlerTask(String Url, boolean isRootUrl) {
		// If it's the root URL, we clear previous lists and DB table content
		if (isRootUrl) {
			crawledURL.clear();
			uncrawledURL.clear();
			clearDB();
			mManager = new RunnableManager();
		}
		// If ThreadPoolExecutor is not shutting down, add runnable to workQueue
		if (!mManager.isShuttingDown()) {
			CrawlerRunnable mTask = new CrawlerRunnable(callback, Url);
			mManager.addToCrawlingQueue(mTask);
		}
	}

	/**
	 * API to shut down the ThreadPoolExecutor
	 */
	public void stopCrawlerTasks() {
		mManager.cancelAllRunnable();
	}

	/**
	 * Runnable task which performs the crawling of a page and adds encountered
	 * URLs to the crawling list
	 *
	 * @author CLARION
	 *
	 */
	private class CrawlerRunnable implements Runnable {

		CrawlingCallback mCallback;
		String mUrl;

		public CrawlerRunnable(CrawlingCallback callback, String Url) {
			this.mCallback = callback;
			this.mUrl = Url;
		}

		@Override
		public void run() {
			String pageContent = retreiveHtmlContent(mUrl);

			if (!TextUtils.isEmpty(pageContent.toString())) {
				insertIntoCrawlerDB(mUrl, pageContent);
				synchronized (lock) {
					crawledURL.add(mUrl);
				}
				mCallback.onPageCrawlingCompleted();
			} else {
				mCallback.onPageCrawlingFailed(mUrl, -1);
			}

			if (!TextUtils.isEmpty(pageContent.toString())) {
				// START
				// JSoup Library used to filter urls from html body
				Document doc = Jsoup.parse(pageContent.toString());
				Elements links = doc.select("a[href]");
				for (Element link : links) {
					String extractedLink = link.attr("href");
					if (!TextUtils.isEmpty(extractedLink)) {
						synchronized (lock) {
							if (!crawledURL.contains(extractedLink))
								uncrawledURL.add(extractedLink);
						}

					}
				}
				// End JSoup
			}
			// Send msg to handler that crawling for this url is finished
			// start more crawling tasks if queue is not empty
			mHandler.sendEmptyMessage(0);

		}

		private String retreiveHtmlContent(String Url) {
			URL httpUrl = null;
			try {
				httpUrl = new URL(Url);
			} catch (MalformedURLException e) {
				e.printStackTrace();
			}

			int responseCode = HttpStatus.SC_OK;
			StringBuilder pageContent = new StringBuilder();
			try {
				if (httpUrl != null) {
					HttpURLConnection conn = (HttpURLConnection) httpUrl
							.openConnection();
					conn.setConnectTimeout(5000);
					conn.setReadTimeout(5000);
					responseCode = conn.getResponseCode();
					if (responseCode != HttpStatus.SC_OK) {
						throw new IllegalAccessException(
								" http connection failed");
					}
					BufferedReader br = new BufferedReader(
							new InputStreamReader(conn.getInputStream()));
					String line = null;
					while ((line = br.readLine()) != null) {
						pageContent.append(line);
					}
				}

			} catch (IOException e) {
				e.printStackTrace();
				mCallback.onPageCrawlingFailed(Url, -1);
			} catch (IllegalAccessException e) {
				e.printStackTrace();
				mCallback.onPageCrawlingFailed(Url, responseCode);
			}

			return pageContent.toString();
		}

	}

	/**
	 * API to clear previous content of crawler DB table
	 */
	public void clearDB() {
		try {
			SQLiteDatabase db = mCrawlerDB.getWritableDatabase();
			db.delete(CrawlerDB.TABLE_NAME, null, null);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	/**
	 * API to insert crawled url info in database
	 *
	 * @param mUrl
	 *            - crawled url
	 * @param result
	 *            - html body content of url
	 */
	public void insertIntoCrawlerDB(String mUrl, String result) {

		if (TextUtils.isEmpty(result))
			return;

		SQLiteDatabase db = mCrawlerDB.getWritableDatabase();
		ContentValues values = new ContentValues();
		values.put(CrawlerDB.COLUMNS_NAME.CRAWLED_URL, mUrl);
		values.put(CrawlerDB.COLUMNS_NAME.CRAWLED_PAGE_CONTENT, result);

		db.insert(CrawlerDB.TABLE_NAME, null, values);
	}

	/**
	 * Handler on the main looper; when a page finishes crawling, it pulls
	 * more URLs from the uncrawled queue and schedules new crawler tasks
	 */
	private Handler mHandler = new Handler(Looper.getMainLooper()) {
		public void handleMessage(android.os.Message msg) {

			synchronized (lock) {
				if (uncrawledURL != null && uncrawledURL.size() > 0) {
					int availableTasks = mManager.getUnusedPoolSize();
					while (availableTasks > 0 && !uncrawledURL.isEmpty()) {
						startCrawlerTask(uncrawledURL.remove(), false);
						availableTasks--;
					}
				}
			}

		};
	};

	/**
	 * Helper class to interact with ThreadPoolExecutor for adding and removing
	 * runnable in workQueue
	 *
	 * @author CLARION
	 *
	 */
	private class RunnableManager {

		// Sets the amount of time an idle thread will wait for a task before
		// terminating
		private static final int KEEP_ALIVE_TIME = 1;

		// Sets the Time Unit to seconds
		private final TimeUnit KEEP_ALIVE_TIME_UNIT = TimeUnit.SECONDS;

		// Sets the initial threadpool size to 5
		private static final int CORE_POOL_SIZE = 5;

		// Sets the maximum threadpool size to 8
		private static final int MAXIMUM_POOL_SIZE = 8;

		// A queue of Runnables for crawling url
		private final BlockingQueue<Runnable> mCrawlingQueue;

		// A managed pool of background crawling threads
		private final ThreadPoolExecutor mCrawlingThreadPool;

		public RunnableManager() {
			mCrawlingQueue = new LinkedBlockingQueue<>();
			mCrawlingThreadPool = new ThreadPoolExecutor(CORE_POOL_SIZE,
					MAXIMUM_POOL_SIZE, KEEP_ALIVE_TIME, KEEP_ALIVE_TIME_UNIT,
					mCrawlingQueue);
		}

		private void addToCrawlingQueue(Runnable runnable) {
			mCrawlingThreadPool.execute(runnable);
		}

		private void cancelAllRunnable() {
			mCrawlingThreadPool.shutdownNow();
		}

		private int getUnusedPoolSize() {
			return MAXIMUM_POOL_SIZE - mCrawlingThreadPool.getActiveCount();
		}

		private boolean isShuttingDown() {
			return mCrawlingThreadPool.isShutdown()|| mCrawlingThreadPool.isTerminating();
		}
	}
}

Once we have finished crawling a page, we add the crawling info for this page to our records. Create a class CrawlerDB that extends SQLiteOpenHelper to manage the database. The final CrawlerDB.java file will be as below.

package com.android.webcrawler;

import android.content.Context;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;

/**
 * Helper class to manage crawler database creation and version management.
 * @author CLARION
 *
 */
public class CrawlerDB extends SQLiteOpenHelper{

	public static final String DATABSE_NAME = "crawler.db";
	public static final int DATABSE_VERSION = 1;
	public static final String TABLE_NAME = "CrawledURLs";
	private static final String TEXT_TYPE = " TEXT";

	interface COLUMNS_NAME{
		String _ID = "id";
		String CRAWLED_URL = "crawled_url";
		String CRAWLED_PAGE_CONTENT = "crawled_page_content";
	}

	public static final String SQL_CREATE_ENTRIES =
		    "CREATE TABLE " + TABLE_NAME + " (" +
		    COLUMNS_NAME._ID + " INTEGER PRIMARY KEY," +
		    COLUMNS_NAME.CRAWLED_URL + TEXT_TYPE+"," +
		    COLUMNS_NAME.CRAWLED_PAGE_CONTENT + TEXT_TYPE+
		    " )";

	public static final String SQL_DELETE_ENTRIES =
		    "DROP TABLE IF EXISTS " + TABLE_NAME;

	public CrawlerDB(Context context) {
		super(context, DATABSE_NAME, null, DATABSE_VERSION);
	}

	@Override
	public void onCreate(SQLiteDatabase db) {
		db.execSQL(SQL_CREATE_ENTRIES);
	}

	@Override
	public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
		db.execSQL(SQL_DELETE_ENTRIES);
		onCreate(db);
	}
}

Finally, let's move on to MainActivity.java. It implements the OnClickListener interface for view click callbacks. When the start button is clicked, we start the crawling task with the help of a WebCrawler object, and the CrawlingCallback is used to update the user with the crawled page count. Since crawling is space and time consuming, we stop it after one minute if the user does not stop it manually. Once the crawling task is finished, you can check the output in LogCat, where the crawled URLs are displayed after querying the crawler database. The final code for MainActivity.java, after adding these functionalities, will be as below.

package com.android.webcrawler;

import android.app.Activity;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;
import android.os.Bundle;
import android.os.Handler;
import android.text.TextUtils;
import android.util.Log;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.EditText;
import android.widget.LinearLayout;
import android.widget.TextView;
import android.widget.Toast;

public class MainActivity extends Activity implements OnClickListener {

	private LinearLayout crawlingInfo;
	private Button startButton;
	private EditText urlInputView;
	private TextView progressText;

	// WebCrawler object will be used to start crawling on root Url
	private WebCrawler crawler;
	// count variable for url crawled so far
	int crawledUrlCount;
	// state variable to check crawling status
	boolean crawlingRunning;
	// For sending message to Handler in order to stop crawling after 60000 ms
	private static final int MSG_STOP_CRAWLING = 111;
	private static final int CRAWLING_RUNNING_TIME = 60000;

	@Override
	protected void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_main);

		crawlingInfo = (LinearLayout) findViewById(R.id.crawlingInfo);
		startButton = (Button) findViewById(R.id.start);
		urlInputView = (EditText) findViewById(R.id.webUrl);
		progressText = (TextView) findViewById(R.id.progressText);

		crawler = new WebCrawler(this, mCallback);
	}

	/**
	 * callback for crawling events
	 */
	private WebCrawler.CrawlingCallback mCallback = new WebCrawler.CrawlingCallback() {

		@Override
		public void onPageCrawlingCompleted() {
			crawledUrlCount++;
			progressText.post(new Runnable() {

				@Override
				public void run() {
					progressText.setText(crawledUrlCount
							+ " pages crawled so far!!");

				}
			});
		}

		@Override
		public void onPageCrawlingFailed(String Url, int errorCode) {
			// TODO Auto-generated method stub
		}

		@Override
		public void onCrawlingCompleted() {
			stopCrawling();
		}
	};

	/**
	 * Callback for handling button onclick events
	 */
	@Override
	public void onClick(View v) {
		int viewId = v.getId();
		switch (viewId) {
		case R.id.start:
			String webUrl = urlInputView.getText().toString();
			if (TextUtils.isEmpty(webUrl)) {
				Toast.makeText(getApplicationContext(), "Please input web Url",
						Toast.LENGTH_SHORT).show();
			} else {
				crawlingRunning = true;
				crawler.startCrawlerTask(webUrl, true);
				startButton.setEnabled(false);
				crawlingInfo.setVisibility(View.VISIBLE);
				// Send delayed message to handler for stopping crawling
				handler.sendEmptyMessageDelayed(MSG_STOP_CRAWLING,
						CRAWLING_RUNNING_TIME);
			}
			break;
		case R.id.stop:
			// remove any scheduled messages if user stopped crawling by
			// clicking stop button
			handler.removeMessages(MSG_STOP_CRAWLING);
			stopCrawling();
			break;
		}
	}

	private Handler handler = new Handler() {
		public void handleMessage(android.os.Message msg) {
			stopCrawling();
		};
	};

	/**
	 * API to handle post crawling events
	 */
	private void stopCrawling() {
		if (crawlingRunning) {
			crawler.stopCrawlerTasks();
			crawlingInfo.setVisibility(View.INVISIBLE);
			startButton.setEnabled(true);
			startButton.setVisibility(View.VISIBLE);
			crawlingRunning = false;
			if (crawledUrlCount > 0)
				Toast.makeText(getApplicationContext(),
						printCrawledEntriesFromDb() + " pages crawled",
						Toast.LENGTH_SHORT).show();

			crawledUrlCount = 0;
			progressText.setText("");
		}
	}

	/**
	 * API to output crawled urls in logcat
	 *
	 * @return number of rows saved in crawling database
	 */
	protected int printCrawledEntriesFromDb() {

		int count = 0;
		CrawlerDB mCrawlerDB = new CrawlerDB(this);
		SQLiteDatabase db = mCrawlerDB.getReadableDatabase();

		Cursor mCursor = db.query(CrawlerDB.TABLE_NAME, null, null, null, null,
				null, null);
		if (mCursor != null && mCursor.getCount() > 0) {
			count = mCursor.getCount();
			mCursor.moveToFirst();
			int columnIndex = mCursor
					.getColumnIndex(CrawlerDB.COLUMNS_NAME.CRAWLED_URL);
			for (int i = 0; i < count; i++) {
				Log.d("AndroidSRC_Crawler",
						"Crawled Url " + mCursor.getString(columnIndex));
				mCursor.moveToNext();
			}
		}

		return count;
	}
}

5. Build and Run Application

Now, try building and running your application. Input the URL of the website you want to crawl; when crawling finishes, you can check the visited URLs in your LogCat output. The information saved in the crawler database can then be used as per requirement.
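
As a small example of such later processing, the sketch below reads the stored pages back out of the crawler database and uses jsoup to extract their visible text. The method name printWordCounts and the rough word count are purely illustrative additions of mine, and the snippet assumes access to a Context (for example, call it from an Activity).

	// Sketch: post-process pages stored by CrawlerDB; the word count is illustrative.
	private void printWordCounts(Context context) {
		CrawlerDB dbHelper = new CrawlerDB(context);
		SQLiteDatabase db = dbHelper.getReadableDatabase();
		Cursor cursor = db.query(CrawlerDB.TABLE_NAME, null, null, null, null, null, null);
		if (cursor != null) {
			int urlIdx = cursor.getColumnIndex(CrawlerDB.COLUMNS_NAME.CRAWLED_URL);
			int contentIdx = cursor.getColumnIndex(CrawlerDB.COLUMNS_NAME.CRAWLED_PAGE_CONTENT);
			while (cursor.moveToNext()) {
				String url = cursor.getString(urlIdx);
				String text = Jsoup.parse(cursor.getString(contentIdx)).text(); // strip markup
				int words = text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length;
				Log.d("AndroidSRC_Crawler", url + " -> roughly " + words + " words");
			}
			cursor.close();
		}
		db.close();
	}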

[Screenshot: crawled URLs in LogCat output]
