Skywalk Developers Blog

This guide is for those interested in learning how to build a basic web scraper that can download and parse HTML pages. The reader is expected to have an understanding of tools like Java and an IDE. The steps in this guide were performed only on my own website; please note that it is an offence to perform them on anyone else's website.

The tools I am using are Java JDK 1.7.0_79, Eclipse Mars, and lastly Jsoup.jar (a library for parsing HTML documents).

Step 1: Create Project

Open the IDE, select the File menu, then New, then Java Project.

Click Finish.

Step 2: Download Jsoup Library
URL: http://jsoup.org/download (jsoup-1.8.3.jar)
Once we have downloaded the Jsoup library, we need to add it as a dependency on the build path.

Right-click the project folder, select the Build Path option, then Configure Build Path.

Make sure the dialog opens on the Libraries tab and not one of the others.

Select the option to add an external JAR. Once the file dialog opens, find the JAR you downloaded and click OK.

Step 3: Writing Scraper Code

To write the code you need to create a new Java class. It can be called anything; mine is called RunScraper. Create the

public static void main(String[] args) {

}

method in your class. This will be the entry point.

Code Snippet:

The rest of the code is pretty self-explanatory. Most of the work is done by the Jsoup library: once we have connected to and downloaded a specific page, Jsoup creates a Document object that lets us access the HTML DOM structure much as we would in the browser using JavaScript or CSS selectors.
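As a rough illustration of the Jsoup calls described here, a minimal sketch might look like the following. It assumes the Jsoup jar from Step 2 is on the build path; the URL is a hypothetical placeholder and the extractLinks helper is my own name for illustration, not something from the original project.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RunScraper {

    // Hypothetical helper: collects the absolute URL of every <a href="..."> in the page.
    public static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<String>();
        // select() takes a CSS selector, much like querySelectorAll in the browser.
        for (Element link : doc.select("a[href]")) {
            // "abs:href" resolves relative links against the document's base URI.
            links.add(link.attr("abs:href"));
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address (an assumption); replace it with a page on a site you own.
        String url = "http://example.com/";

        // Connect, download the page, and parse it into a Document.
        Document doc = Jsoup.connect(url).get();

        System.out.println("Title: " + doc.title());
        for (String link : extractLinks(doc)) {
            System.out.println(link);
        }
    }
}
```

Keeping the link extraction in its own method means it can also be tried on a Document built from a string with Jsoup.parse, without touching the network.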

This tutorial covers only very basic usage of Jsoup and web scraping; however, it can be scaled up quite a bit by incorporating threads, a database, and some more code.
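One possible way to bring threads in is a fixed-size thread pool that fetches several pages at once. The sketch below is only an illustration under assumptions: ParallelScraper, PageFetcher and scrapeAll are hypothetical names, and the fetcher is a stub so the example runs without a network connection. In a real scraper the stub could be replaced by a call into Jsoup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelScraper {

    // Hypothetical hook: a real implementation could download the page with Jsoup.
    public interface PageFetcher {
        String fetchTitle(String url) throws Exception;
    }

    // Fetches every URL on a fixed-size thread pool and returns url -> title,
    // in the same order as the input list.
    public static Map<String, String> scrapeAll(List<String> urls,
                                                final PageFetcher fetcher,
                                                int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final String url : urls) {
                pending.add(pool.submit(new Callable<String>() {
                    @Override
                    public String call() throws Exception {
                        return fetcher.fetchTitle(url);
                    }
                }));
            }
            Map<String, String> titles = new LinkedHashMap<String, String>();
            for (int i = 0; i < urls.size(); i++) {
                // get() blocks until that page's task has finished.
                titles.put(urls.get(i), pending.get(i).get());
            }
            return titles;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scraping", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A page failed to download", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stub fetcher (an assumption) standing in for a real download.
        PageFetcher stub = new PageFetcher() {
            @Override
            public String fetchTitle(String url) {
                return "Title of " + url;
            }
        };
        List<String> urls = new ArrayList<String>();
        urls.add("http://example.com/a");
        urls.add("http://example.com/b");
        System.out.println(scrapeAll(urls, stub, 2));
    }
}
```

The results are collected in input order rather than completion order, which keeps the output predictable when pages finish at different times.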

This is a fun how-to guide. I have always wanted to see what certain applications look like under the hood. ADB, by Google's definition, is a versatile command-line tool that lets you communicate with an emulator instance or connected Android-powered device. So now that we know what ADB is, it's time for some fun. The steps in this guide were performed only on my own application; please note that it is an offence to perform them on anyone else's code.

Reverse Engineering Android Apps (Using Ubuntu Linux)

This guide assumes you have the Android SDK downloaded and installed. Once installation is complete, plug in your Android device and open a terminal.

Add ADB to the terminal PATH variable

export PATH=$PATH:/path/to/sdk/platform-tools

Run ADB from terminal with interactive shell

adb shell

Use the package manager to view all packages installed on the device

pm list packages - list all the packages on the device

pm list features - list all the features on the device

We are only interested in packages

pm path package_name - displays path to the apk for an app
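Both pm list packages and pm path print their results with a package: prefix on each line. If you would rather drive these steps from Java than type them by hand, a small sketch could look like this. The class and method names are hypothetical, and it assumes adb is on your PATH with a device connected.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

class AdbPackages {

    // Strips the "package:" prefix that pm puts in front of every result line.
    public static List<String> parsePmOutput(String output) {
        List<String> values = new ArrayList<String>();
        for (String line : output.split("\\r?\\n")) {
            line = line.trim();
            if (line.startsWith("package:")) {
                values.add(line.substring("package:".length()));
            }
        }
        return values;
    }

    // Runs a command on the host machine and returns everything it printed.
    public static String run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(process.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        process.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumes adb is on the PATH and a device is connected.
        String listing = run("adb", "shell", "pm", "list", "packages");
        for (String name : parsePmOutput(listing)) {
            System.out.println(name);
        }
    }
}
```

The same parsePmOutput helper works on the output of pm path, where the value after the prefix is the APK's location on the device.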

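Pulling the APK from the device (exit the ADB shell first; adb pull runs from your own terminal, using the path reported by pm path)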

adb pull /system/app/ApplicationName.apk

Decompiling Apk With APKTool

Download the two APKTool files:

apktool1.5.2.tar.bz2 (https://code.google.com/p/android-apktool/downloads/detail?name=apktool1.5.2.tar.bz2&can=1&q=)

apktool-install-linux-r04-brut1.tar.bz2 (https://code.google.com/p/android-apktool/downloads/detail?name=apktool-install-linux-r04-brut1.tar.bz2&can=1&q=)

Install APK Tool

Extract both archives anywhere

Move aapt, apktool and apktool.jar to /usr/local/bin

Make all three executable: sudo chmod +x filename

Decompile Target APK

apktool d ___.apk (whatever application you pulled from your device)

This tool lets us view all the resource files inside an APK. If we want to view the Java class files, we will use dex2jar to build a JAR from the APK.

Decompiling with Dex2Jar

Download Dex2Jar

https://code.google.com/p/dex2jar

Extract the zip file anywhere

Build Jar From APK

/home/user/dex2jar-version/d2j-dex2jar.sh /home/user/someApk.apk

Use a JAR viewer to browse the JAR

http://jd.benow.ca/

Extract the downloaded file

Run ./jd-gui in a terminal

Open the JAR created by dex2jar and start browsing the code

Optional:

Move jd-gui to /usr/local/bin for global access

That's it. You should now be able to view the decompiled source code; however, if the application you've decompiled uses ProGuard, the code will be obfuscated and you will need to remap the classes on your own.


Friday, 16 October 2015

Building A Web Scraper With Java

Thursday, 17 September 2015

Having Fun With Android ADB(Android Debug Bridge)