How to do web scraping in Java – Part I

Web scraping is exciting. I had never really used it until I needed it for one of my projects.

We will be using Java in this tutorial to get data from the internet, with a library called jsoup to get the job done.

Step I : (Creating the project)

  • Create a new Maven Java project. Maven is a good choice because it makes it easy to add external dependencies to the project.
  • Create a new class, say Scrap, and add the main() method.
    • public static void main(String[] args){}
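As a rough sketch, the class from this step might look like the following. The status() helper and its message are hypothetical placeholders added only for illustration; the actual scraping logic comes in the next part.

```java
public class Scrap {

    // Hypothetical placeholder method: it only reports that the project
    // skeleton is in place. It is not part of jsoup or of the tutorial code.
    static String status() {
        return "Scrap project is set up";
    }

    public static void main(String[] args) {
        // Entry point of the program; for now it just prints the status.
        System.out.println(status());
    }
}
```

Running the class from the IDE or from Maven should print the status line, which confirms the project compiles before any dependency is added.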

Step II : ( Integrating Jsoup in the project )

  • Now, to use jsoup, go to its site, open the download section and copy the Maven dependency:
    • <dependency>
        <!-- jsoup HTML parser library @ -->
  • Head back to your project and open the file named pom.xml. Add a <dependencies> </dependencies> tag and paste the jsoup dependency inside it.
  • Build the project.
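Once the dependency is in pom.xml and the project has been built, one quick way to check that the library is actually available is to try loading its main class by name. The sketch below assumes nothing beyond the standard library; the class name JsoupCheck and the helper isOnClasspath are made up for this example, while org.jsoup.Jsoup is the library's entry-point class.

```java
public class JsoupCheck {

    // Returns true when the named class can be loaded from the classpath.
    // Hypothetical helper written for this example only.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.jsoup.Jsoup is the main class of the jsoup library.
        boolean found = isOnClasspath("org.jsoup.Jsoup");
        System.out.println(found
                ? "jsoup is available"
                : "jsoup not found; check pom.xml and rebuild");
    }
}
```

If the check reports that jsoup is missing, the usual suspects are a dependency pasted outside the <dependencies> tag or a project that has not been rebuilt since pom.xml changed.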


Step III : (Understanding the webpage structure )

I have made a video to help you understand the webpage structure.

You can see the video here.

We will end this part here. In the next part, we will see how to use jsoup to get data from the internet down to your computer.
