Friday 16 October 2015

Building A Web Scraper With Java

This guide is for those interested in learning how to build a basic web scraper that can download and parse HTML pages. The reader is expected to be familiar with Java and an IDE. The steps below were performed only on my own website; please note it is an offence to perform them on anyone else's website.

The tools I am using are Java JDK 1.7.0_79, Eclipse Mars, and Jsoup (jsoup.jar, a library for parsing HTML documents).

Step 1: Create Project

Open the IDE, select the File menu, and then New Java Project.



Click Finish.

Step 2: Download Jsoup Library
url: http://jsoup.org/download (jsoup-1.8.3.jar)
Once we have downloaded the Jsoup library, we need to add it to the build path as a dependency.

Right-click the project folder and select the Build Path option, then Configure Build Path.

Make sure the dialog that opens is showing the Libraries tab and not one of the others.



Select the option to add an external JAR. Once the dialog opens, find the JAR you downloaded and click OK.

Step 3: Writing Scraper Code

To write the code you need to create a new Java class. It can be called anything; for now mine is called RunScraper. Add the following method to your class. This will be the program's entry point.

public static void main(String[] args) {

}

Code Snippet:



I will paste the rest of the code, which is fairly self-explanatory. Most of the work is done by the Jsoup library: once we have connected to and downloaded a specific page, Jsoup creates a Document object that lets us access the HTML DOM structure much as we would in the browser with JavaScript or CSS selectors.
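As a rough illustration of the Document object described above, here is a minimal sketch of what a RunScraper class could look like. It is not the original listing: the extractLinks helper, the inline HTML and the example.com base URL are assumptions made for the example. It parses a string so it runs without a network connection; to download a real page from your own site you would use Jsoup.connect(url).get() instead, which can throw an IOException.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RunScraper {

    // Hypothetical helper: collects the absolute URL of every link in the page.
    public static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<String>();
        // "a[href]" is a CSS selector matching anchor tags that have an href.
        for (Element link : doc.select("a[href]")) {
            // "abs:href" resolves relative links against the document's base URI.
            links.add(link.attr("abs:href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Inline HTML stands in for a downloaded page (assumption for this sketch).
        String html = "<html><head><title>Demo</title></head><body>"
                + "<a href=\"/about\">About</a>"
                + "<a href=\"http://example.com/contact\">Contact</a>"
                + "</body></html>";

        // The second argument is the base URI used to resolve relative links.
        Document doc = Jsoup.parse(html, "http://example.com/");

        System.out.println("Title: " + doc.title());
        for (String url : extractLinks(doc)) {
            System.out.println("Link: " + url);
        }
    }
}
```

The jsoup jar added to the build path in Step 2 must be on the classpath for this to compile.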

This tutorial covers only very basic usage of Jsoup and web scraping; however, it can be scaled up quite a bit by incorporating threads, a database, and some more code.
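One way the threading part might be approached is with a fixed-size thread pool from java.util.concurrent. The sketch below is only an illustration: the class name, the scrapeAll helper and the placeholder scrape method are all hypothetical, and scrape just returns a string where a real scraper would download and parse the page. It avoids lambdas so that it compiles on JDK 1.7.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadedScraperSketch {

    // Placeholder for the real work (for example a Jsoup download and parse).
    public static String scrape(String url) {
        return "scraped " + url;
    }

    // Runs scrape() for every URL on a thread pool; results keep the input order.
    public static List<String> scrapeAll(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String url : urls) {
                futures.add(pool.submit(new Callable<String>() {
                    @Override
                    public String call() {
                        return scrape(url);
                    }
                }));
            }
            List<String> results = new ArrayList<String>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("scraping failed", e);
        } finally {
            pool.shutdown(); // stop accepting work so the JVM can exit
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://example.com/a", "http://example.com/b");
        for (String line : scrapeAll(urls, 2)) {
            System.out.println(line);
        }
    }
}
```

Keeping the pool small is a deliberate choice here: a scraper that opens many connections at once can put real load on the site being scraped.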


Thursday 17 September 2015

Having Fun With Android ADB (Android Debug Bridge)

This is a fun how-to guide. I have always wanted to see what certain applications look like under the hood. ADB, by Google's definition, is a versatile command line tool that lets you communicate with an emulator instance or connected Android-powered device. So now that we know what ADB is, it's time for some fun. The steps below were performed only on my own application; please note it is an offence to perform them on anyone else's code.

Reverse Engineering Android Apps (Using Ubuntu Linux)

This guide assumes you have the Android SDK downloaded and installed. Once installation is complete, plug in the Android device and open a terminal.

Add ADB to the terminal PATH variable
export PATH=$PATH:/path/to/sdk/platform-tools

Run ADB from terminal with interactive shell
adb shell

Use the package manager (pm) inside the shell to view the packages and features on the device
pm list packages - list all the packages on the device
pm list features - list all the features on the device

We are only interested in packages
pm path package_name - displays path to the apk for an app

Pulling the APK from the device (exit the adb shell first; adb pull runs on your computer)
adb pull /system/app/ApplicationName.apk
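The line printed by pm path starts with a package: prefix, so the path has to be trimmed before it is handed to adb pull. If you wanted to script that step, a small helper along these lines could do it. This is only a sketch: the class and method names are made up, and the sample path is a hypothetical example rather than output from a real device.

```java
public class ApkPathSketch {

    private static final String PREFIX = "package:";

    // Strips the "package:" prefix that pm path prints in front of the APK path.
    public static String apkPath(String pmPathLine) {
        String line = pmPathLine.trim();
        if (!line.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a pm path line: " + line);
        }
        return line.substring(PREFIX.length());
    }

    // Builds the adb pull command for the APK named in a pm path line.
    public static String pullCommand(String pmPathLine) {
        return "adb pull " + apkPath(pmPathLine);
    }

    public static void main(String[] args) {
        // Hypothetical pm path output for an installed app.
        String line = "package:/data/app/com.example.app-1/base.apk";
        System.out.println(pullCommand(line));
        // prints: adb pull /data/app/com.example.app-1/base.apk
    }
}
```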

Decompiling Apk With APKTool

Download the two APKTool files:
apktool1.5.2.tar.bz2 (https://code.google.com/p/android-apktool/downloads/detail?name=apktool1.5.2.tar.bz2&can=1&q=)
apktool-install-linux-r04-brut1.tar.bz2 (https://code.google.com/p/android-apktool/downloads/detail?name=apktool-install-linux-r04-brut1.tar.bz2&can=1&q=)

Install APKTool
Extract both archives anywhere
Move aapt, apktool and apktool.jar to /usr/local/bin
Make all three executable: sudo chmod +x filename

Decompile Target APK
apktool d ___.apk (whatever application you pulled from your device)
This lets us view all the resource files inside the APK. If we want to view the Java class files, we will use dex2jar to build a JAR from the APK.

Decompiling with Dex2Jar
Download Dex2Jar
https://code.google.com/p/dex2jar
Extract the zip file anywhere

Build Jar From APK
/home/user/dex2jar-version/d2j-dex2jar.sh /home/user/someApk.apk

Use a JAR viewer to browse the JAR
http://jd.benow.ca/
Extract the downloaded file
Run ./jd-gui in a terminal
Open the JAR created by dex2jar and start browsing the code

Optional:
Move jd-gui to /usr/local/bin for global access

That's it. You should now be able to view the source code of the application. However, if the application you've decompiled is using ProGuard, the code will be obfuscated and you will need to remap the classes on your own.