How to do web scraping in Java – Part I

Web scraping is exciting. I never really used it until I needed it for one of my projects.

In this tutorial we will use Java to get data from the internet, with a library called jsoup to get the job done.

Step I : (Creating the project)

  • Create a new Maven Java project. We use Maven because it makes it easy to add external dependencies to the project.
  • Create a new class, say Scrap, and add the main() method.
    • public static void main(String[] args) {}
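As a rough sketch of what the class from Step I might look like once it is filled in a little: the targetUrl helper and the DEFAULT_URL placeholder below are hypothetical names introduced only for illustration, and no scraping happens yet.

```java
public class Scrap {

    // Hypothetical placeholder; replace it with the page you want to scrape.
    static final String DEFAULT_URL = "https://example.com/";

    // Returns the first command-line argument if one was given,
    // otherwise falls back to DEFAULT_URL.
    static String targetUrl(String[] args) {
        return (args != null && args.length > 0) ? args[0] : DEFAULT_URL;
    }

    public static void main(String[] args) {
        // Nothing is downloaded here; this only confirms that the project runs.
        System.out.println("Scrap is set up. Target page: " + targetUrl(args));
    }
}
```

Running the class without arguments prints the placeholder URL; passing a URL as the first argument prints that one instead.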

Step II : (Integrating Jsoup in the project)

  • To use jsoup, go to its site, https://jsoup.org/. Open the download section and copy the Maven dependency:
    • <dependency>
        <!-- jsoup HTML parser library @ https://jsoup.org/ -->
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.3</version>
      </dependency>
  • Head back to your project and open the file named pom.xml. Add a <dependencies> </dependencies> tag inside the <project> element and paste the jsoup dependency inside it.
  • Build the project so that Maven downloads jsoup.
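One way to confirm that the build actually picked up the dependency is to check whether jsoup's entry-point class, org.jsoup.Jsoup, can be loaded. The sketch below is only a sanity check under that assumption; the JsoupCheck class and its isOnClasspath helper are hypothetical names, not part of jsoup.

```java
public class JsoupCheck {

    // Returns true if a class with the given fully qualified name can be loaded.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.jsoup.Jsoup is the class the library exposes as its main entry point.
        if (isOnClasspath("org.jsoup.Jsoup")) {
            System.out.println("jsoup is available.");
        } else {
            System.out.println("jsoup was not found; check pom.xml and rebuild.");
        }
    }
}
```

If the second message appears, the usual suspects are a <dependency> block pasted outside <dependencies>, or a project that has not been rebuilt since pom.xml changed.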

Step III : (Understanding the webpage structure)

I have made a video to help you understand the webpage structure.

You can watch it here: https://youtu.be/4XdIFbM1fEw

We will end this part here. In the next part, we will see how to use Jsoup to get data from the internet onto your computer.

 
