Android Basic JSOUP Tutorial

In this tutorial, you will learn how to use JSOUP, an open source Java library, in your Android application. JSOUP provides a convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods. It can fetch and parse HTML from a URL, a file, or a string. We will create three buttons on the main view, and each button will perform a different task: showing the website's title, its description, or its logo. So let's begin…

Before you proceed with this tutorial, download the latest JSOUP library (a .jar file) from the JSOUP website, jsoup.org.

Create a new project in Eclipse: File > New > Android Application Project. Fill in the details and name your project JsoupTutorial.

Application Name : JsoupTutorial

Project Name : JsoupTutorial

Package Name : com.androidbegin.jsouptutorial

Open your MainActivity.java and paste the following code.

MainActivity.java

package com.androidbegin.jsouptutorial;

import java.io.IOException;
import java.io.InputStream;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import android.os.AsyncTask;
import android.os.Bundle;
import android.app.Activity;
import android.app.ProgressDialog;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.ImageView;
import android.widget.TextView;

public class MainActivity extends Activity {

	// URL Address
	String url = "http://www.androidbegin.com";
	ProgressDialog mProgressDialog;

	@Override
	public void onCreate(Bundle savedInstanceState) {
		super.onCreate(savedInstanceState);
		setContentView(R.layout.activity_main);

		// Locate the Buttons in activity_main.xml
		Button titlebutton = (Button) findViewById(R.id.titlebutton);
		Button descbutton = (Button) findViewById(R.id.descbutton);
		Button logobutton = (Button) findViewById(R.id.logobutton);

		// Capture button click
		titlebutton.setOnClickListener(new OnClickListener() {
			public void onClick(View arg0) {
				// Execute Title AsyncTask
				new Title().execute();
			}
		});

		// Capture button click
		descbutton.setOnClickListener(new OnClickListener() {
			public void onClick(View arg0) {
				// Execute Description AsyncTask
				new Description().execute();
			}
		});

		// Capture button click
		logobutton.setOnClickListener(new OnClickListener() {
			public void onClick(View arg0) {
				// Execute Logo AsyncTask
				new Logo().execute();
			}
		});

	}

	// Title AsyncTask
	private class Title extends AsyncTask<Void, Void, Void> {
		String title;

		@Override
		protected void onPreExecute() {
			super.onPreExecute();
			mProgressDialog = new ProgressDialog(MainActivity.this);
			mProgressDialog.setTitle("Android Basic JSoup Tutorial");
			mProgressDialog.setMessage("Loading...");
			mProgressDialog.setIndeterminate(false);
			mProgressDialog.show();
		}

		@Override
		protected Void doInBackground(Void... params) {
			try {
				// Connect to the web site
				Document document = Jsoup.connect(url).get();
				// Get the html document title
				title = document.title();
			} catch (IOException e) {
				e.printStackTrace();
			}
			return null;
		}

		@Override
		protected void onPostExecute(Void result) {
			// Set title into TextView
			TextView txttitle = (TextView) findViewById(R.id.titletxt);
			txttitle.setText(title);
			mProgressDialog.dismiss();
		}
	}

	// Description AsyncTask
	private class Description extends AsyncTask<Void, Void, Void> {
		String desc;

		@Override
		protected void onPreExecute() {
			super.onPreExecute();
			mProgressDialog = new ProgressDialog(MainActivity.this);
			mProgressDialog.setTitle("Android Basic JSoup Tutorial");
			mProgressDialog.setMessage("Loading...");
			mProgressDialog.setIndeterminate(false);
			mProgressDialog.show();
		}

		@Override
		protected Void doInBackground(Void... params) {
			try {
				// Connect to the web site
				Document document = Jsoup.connect(url).get();
				// Using Elements to get the Meta data
				Elements description = document
						.select("meta[name=description]");
				// Locate the content attribute
				desc = description.attr("content");
			} catch (IOException e) {
				e.printStackTrace();
			}
			return null;
		}

		@Override
		protected void onPostExecute(Void result) {
			// Set description into TextView
			TextView txtdesc = (TextView) findViewById(R.id.desctxt);
			txtdesc.setText(desc);
			mProgressDialog.dismiss();
		}
	}

	// Logo AsyncTask
	private class Logo extends AsyncTask<Void, Void, Void> {
		Bitmap bitmap;

		@Override
		protected void onPreExecute() {
			super.onPreExecute();
			mProgressDialog = new ProgressDialog(MainActivity.this);
			mProgressDialog.setTitle("Android Basic JSoup Tutorial");
			mProgressDialog.setMessage("Loading...");
			mProgressDialog.setIndeterminate(false);
			mProgressDialog.show();
		}

		@Override
		protected Void doInBackground(Void... params) {

			try {
				// Connect to the web site
				Document document = Jsoup.connect(url).get();
				// Select the <img> with a src attribute inside <h1 class="image-logo">
				Elements img = document.select("h1[class=image-logo] img[src]");
				// Read the src attribute of the first match
				String imgSrc = img.attr("src");
				// Download image from URL
				InputStream input = new java.net.URL(imgSrc).openStream();
				try {
					// Decode Bitmap
					bitmap = BitmapFactory.decodeStream(input);
				} finally {
					// Always release the network stream
					input.close();
				}

			} catch (IOException e) {
				e.printStackTrace();
			}
			return null;
		}

		@Override
		protected void onPostExecute(Void result) {
			// Set downloaded image into ImageView
			ImageView logoimg = (ImageView) findViewById(R.id.logo);
			logoimg.setImageBitmap(bitmap);
			mProgressDialog.dismiss();
		}
	}
}
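
Each of the three tasks above calls Jsoup.connect(url).get() on its own, so the page is downloaded again on every button press. One possible way to avoid this is to load the document once and reuse it. The sketch below is not part of the original project: CachedLoader is a hypothetical helper written in plain Java, and it only assumes that the expensive call can be wrapped in a java.util.concurrent.Callable.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of JSOUP or Android): loads a value once and reuses it. */
final class CachedLoader<T> {
	private final Callable<T> loader;
	private T value;
	private boolean loaded;

	CachedLoader(Callable<T> loader) {
		this.loader = loader;
	}

	/** Returns the cached value; the loader runs only until its first successful call. */
	synchronized T get() throws Exception {
		if (!loaded) {
			// If the loader throws, nothing is cached and the next call retries
			value = loader.call();
			loaded = true;
		}
		return value;
	}
}
```

In the activity, the Callable could wrap Jsoup.connect(url).get() with Document as the type parameter, and each doInBackground could call get() on one shared CachedLoader instead of connecting again. The trade-off is freshness: the page is not downloaded again until a new CachedLoader is created.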

In this activity, we have created three buttons, each of which starts a different AsyncTask. Before going into further explanation, follow the steps below to see how to view the HTML source code of a website.

Step 1 : Visit http://www.androidbegin.com with any preferred Internet browser on your PC


Step 2 : Right-click on an open space and select “View page source”


Step 3 : The website source code is displayed


A website's source code determines how its pages appear. However, the page source only shows the HTML that the server sends to the browser; code that is processed on the server is not visible in it.

The first button retrieves the website title. Calling document.title() returns the text of the <title> element in the page's <head>.

Java Code

@Override
protected Void doInBackground(Void... params) {
	try {
		// Connect to the web site
		Document document = Jsoup.connect(url).get();
		// Get the html document title
		title = document.title();
	} catch (IOException e) {
		e.printStackTrace();
	}
	return null;
}

Website Source Code

<head>
	<meta http-equiv="x-ua-compatible" content="IE=8" >  
	<meta charset="UTF-8">
	<title>AndroidBegin - Android Tutorials, Samples, Guides and Tips</title>
	<link rel="icon" href="http://www.androidbegin.com/wp-content/uploads/2013/07/favicon.png" type="image/x-icon" />

 

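The second button retrieves the website description. The select() method takes a CSS-style selector, here meta[name=description], and returns the matching Elements; attr("content") then reads the content attribute of the first match.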

Java Code

@Override
protected Void doInBackground(Void... params) {
	try {
		// Connect to the web site
		Document document = Jsoup.connect(url).get();
		// Using Elements to get the Meta data
		Elements description = document.select("meta[name=description]");
		// Locate the content attribute
		desc = description.attr("content");
	} catch (IOException e) {
		e.printStackTrace();
	}
	return null;
}

Website Source Code 

<meta name="description" content="AndroidBegin.com provides useful Android tutorials, samples, guides and tips for Android Developers. Learn to create your own Android Apps and Android Games."/>

 

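The third button retrieves the website logo. The selector h1[class=image-logo] img[src] matches an img element that has a src attribute and sits inside the h1 whose class is image-logo; the src value is then used to download the image.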

Java Code

@Override
protected Void doInBackground(Void... params) {

	try {
		// Connect to the web site
		Document document = Jsoup.connect(url).get();
		// Select the <img> with a src attribute inside <h1 class="image-logo">
		Elements img = document.select("h1[class=image-logo] img[src]");
		// Read the src attribute of the first match
		String imgSrc = img.attr("src");
		// Download image from URL
		InputStream input = new java.net.URL(imgSrc).openStream();
		try {
			// Decode Bitmap
			bitmap = BitmapFactory.decodeStream(input);
		} finally {
			// Always release the network stream
			input.close();
		}

	} catch (IOException e) {
		e.printStackTrace();
	}
	return null;
}

Website Source Code

<div id="header">
				<div class="head-left">
																			<h1 id="logo" class="image-logo">
									<a href="http://www.androidbegin.com"><img src="http://www.androidbegin.com/wp-content/uploads/2013/07/HD-Logo.gif" alt="AndroidBegin"></a>
								</h1><!-- END #logo -->
											</div>
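
In the source above the src value is already an absolute URL, so it can be passed straight to java.net.URL. On other sites src is often relative (for example /images/logo.gif), and then it has to be resolved against the page address first. JSOUP documents an abs: attribute prefix (for example attr("abs:src")) and an absUrl method for this purpose. The sketch below only illustrates the underlying idea with the standard java.net.URI class; LogoUrlResolver is a hypothetical name, and the relative path is an assumed example, not taken from the website.

```java
import java.net.URI;

/** Hypothetical helper: resolves an image src against the URL of the page it came from. */
final class LogoUrlResolver {

	/** Returns src unchanged if it is absolute, otherwise resolves it against pageUrl. */
	static String resolve(String pageUrl, String src) {
		return URI.create(pageUrl).resolve(src).toString();
	}

	public static void main(String[] args) {
		String page = "http://www.androidbegin.com/";
		// An absolute src is returned as it is
		System.out.println(resolve(page,
				"http://www.androidbegin.com/wp-content/uploads/2013/07/HD-Logo.gif"));
		// A root-relative src (assumed example) gets the scheme and host of the page
		System.out.println(resolve(page, "/wp-content/uploads/2013/07/HD-Logo.gif"));
	}
}
```

Both calls in main print http://www.androidbegin.com/wp-content/uploads/2013/07/HD-Logo.gif.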

 

Next, create an XML graphical layout for the MainActivity. Go to res > layout > Right Click on layout > New > Android XML File

Name your new XML file activity_main.xml and paste the following code.

activity_main.xml

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent" >

    <TextView
        android:id="@+id/titletxt"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:gravity="center" />

    <Button
        android:id="@+id/titlebutton"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_below="@+id/titletxt"
        android:text="@string/Title" />

    <TextView
        android:id="@+id/desctxt"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_below="@+id/titlebutton"
        android:layout_centerInParent="true"
        android:gravity="center" />

    <Button
        android:id="@+id/descbutton"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_below="@+id/desctxt"
        android:text="@string/Description" />

    <ImageView
        android:id="@+id/logo"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_below="@+id/descbutton"
        android:layout_centerInParent="true" />

    <Button
        android:id="@+id/logobutton"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_below="@+id/logo"
        android:text="@string/Logo" />

</RelativeLayout>

Next, change the application name and texts. Open your strings.xml in your res > values folder and paste the following code.

strings.xml

<?xml version="1.0" encoding="utf-8"?>
<resources>

    <string name="app_name">Basic Jsoup Tutorial</string>
    <string name="action_settings">Settings</string>
    <string name="hello_world">Hello world!</string>
    <string name="Title">Website Title</string>
    <string name="Description">Website Description</string>
    <string name="Logo">Website Logo</string>

</resources>

Next, declare the INTERNET permission in your AndroidManifest.xml so that the application is allowed to connect to the Internet. Open your AndroidManifest.xml and paste the following code.

AndroidManifest.xml

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.androidbegin.jsouptutorial"
    android:versionCode="1"
    android:versionName="1.0" >

    <uses-sdk
        android:minSdkVersion="8"
        android:targetSdkVersion="17" />

    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@drawable/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme" >
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

  • Aravind Asthme

    nice article

  • Pierre Lel Mustang

    Helped a lot . thanks

  • Shuai Wang

    great!

  • Zhubarb

    Thank you, this is very helpful. However, I noticed that it is quite slow. I guess due to having to fetch the entire document separately for all Async Tasks? Wouldn’t it be faster to run “Document document = Jsoup.connect(url).get();” once and then let the buttons access the different parts of the ‘document’?

  • André Felipe

    It’s a good article, but I have a problem. I did everything written in the article, but the app doesn’t work. Using the debugger, I found that the problem occurs when Jsoup tries to open a connection to a URL (Jsoup.connect(url).get()), but I don’t know how to fix it.

    • Patrik

      Make sure you export jsoup (properties –> Java build path –> Order and export)

      • ethanchan

        i have the same problem too…there is an error in “doInBackground”

  • http://www.AndroidBegin.com/ AndroidBegin

    Hi developers, this tutorial may not be working at the moment because of the changes I made to this website. I will update the current tutorial as soon as possible.

  • zurche

    This is an AWESOME tutorial. Thank you very much for sharing this.

  • AFRODESCENDIENTE

    Hello, what if i want to create a xml layout with the same structure of a div from a website, maybe ‘ Hello ‘ …. how can i add each data into the corresponding field on my xml layout ‘ image.jpgtext ???

  • AFRODESCENDIENTE

    Hello,
    My emulator is not showing the app and throws an error! I tried with another website but the same thing happens… what should I do?