tag:blogger.com,1999:blog-84616810314243151642024-03-20T21:16:04.226-07:00Skywalk Developers BlogTyrone Adamshttp://www.blogger.com/profile/11507404086276822163noreply@blogger.comBlogger2125tag:blogger.com,1999:blog-8461681031424315164.post-40625621119533835192015-10-16T09:54:00.001-07:002015-10-20T15:35:55.754-07:00Building A Web Scraper With JavaThis guide is for those interested in learning how to build a basic web scraper that can download and parse HTML pages. The reader is expected to be familiar with Java and an IDE. <span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">The steps in this guide were performed only on my own website. Please note that it is an offence to perform them on anyone else's website.</span><br />
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">The tools I am using are Java JDK 1.7.0_79, Eclipse Mars, and lastly Jsoup.jar (a library for parsing HTML documents).</span><br />
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><b>Step 1: Create Project</b></span><br />
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><b><br /></b></span>
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">Open the IDE, select the File menu, then New, and then Java Project.</span><br />
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDEjgxTfCXJAUfdH2qEgiUz67hwVoy-4HMGntj7EA4IFR3VBRKDorc4dgb2nHBs3vaZhCn4EjdPyfPkMuDUshI1IsbG9-ULWd4ZuGV9deQoQIr-TMYSBiablXHnMziAxNx6ZkfALIl6gM/s1600/Screen+Shot+2015-10-16+at+18.32.23.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="395" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjDEjgxTfCXJAUfdH2qEgiUz67hwVoy-4HMGntj7EA4IFR3VBRKDorc4dgb2nHBs3vaZhCn4EjdPyfPkMuDUshI1IsbG9-ULWd4ZuGV9deQoQIr-TMYSBiablXHnMziAxNx6ZkfALIl6gM/s400/Screen+Shot+2015-10-16+at+18.32.23.png" width="400" /></a></div>
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">Click finish.</span><br />
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><br /></span>
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><b>Step 2: Download Jsoup Library</b></span><br />
<span style="background-color: white; color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;"><b>url: </b></span><span style="color: #333333; font-family: Georgia, Times New Roman, Bitstream Charter, Times, serif;"><span style="line-height: 24px;">http://jsoup.org/download (</span></span><b style="color: #7d0e71; font-family: Consolas, monospace; font-size: 15px; line-height: 15px; text-decoration: none;"><a href="http://jsoup.org/packages/jsoup-1.8.3.jar" style="color: #7d0e71; font-family: Consolas, monospace; font-size: 15px; line-height: 15px; text-decoration: none;">jsoup-1.8.3.jar</a>)</b><br />
Once we have downloaded the Jsoup library, we need to add it as a dependency on the build path.<br />
<br />
Right-click the project folder, select the Build Path option, and then Configure Build Path.<br />
<br />
Make sure the dialog that opens is showing the Libraries tab and not one of the others.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlMLZ9PtHLSvILhNYx_OzX1LqPmI1jz1cBYS-Zz433Xi0Vk3q-Ui5P9QygWlg8SqmXcV-vvwAtaDx_ETLpagS8ngMG-LoqE2DMNwJi-VgMsIz0lUDm6496nlOm0h0KKdlYYO523q2BTT0/s1600/Screen+Shot+2015-10-16+at+18.37.19.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="285" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlMLZ9PtHLSvILhNYx_OzX1LqPmI1jz1cBYS-Zz433Xi0Vk3q-Ui5P9QygWlg8SqmXcV-vvwAtaDx_ETLpagS8ngMG-LoqE2DMNwJi-VgMsIz0lUDm6496nlOm0h0KKdlYYO523q2BTT0/s400/Screen+Shot+2015-10-16+at+18.37.19.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Select the option to add an external jar. Once the dialog opens, find the jar you downloaded and click OK.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<b>Step 3: Writing Scraper Code</b></div>
<div class="separator" style="clear: both; text-align: left;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: left;">
To write the code, you need to create a new Java class. It can be called anything; mine is called RunScraper. Create the </div>
<div class="p1">
<span class="s1"><br /></span></div>
<div class="p1">
<span class="s1">public</span> <span class="s1">static</span> <span class="s1">void</span> main(String []<span class="s2">args</span>){</div>
<div class="p2">
<span class="Apple-tab-span"> </span><span class="Apple-tab-span"> </span></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="p1">
<br /></div>
<div class="p1">
<span class="Apple-tab-span"> </span>}</div>
<br />
method in your class. This will be the entry point.<br />
<br />
Code Snippet:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjt84kA-PwPCxb3BoMF8Da4Q_4E8gqjR6qzc06iKKQVSpQKKKnIHJbnoCsNNbmYUs7gKEh-PReSYiBROAJ-AGpZmTmpwOH8XvPevPHsEKk2RZ9uOV2J4D655587cY4PsFTBZoV6j3JVsYc/s1600/Screen+Shot+2015-10-16+at+18.52.40.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjt84kA-PwPCxb3BoMF8Da4Q_4E8gqjR6qzc06iKKQVSpQKKKnIHJbnoCsNNbmYUs7gKEh-PReSYiBROAJ-AGpZmTmpwOH8XvPevPHsEKk2RZ9uOV2J4D655587cY4PsFTBZoV6j3JVsYc/s400/Screen+Shot+2015-10-16+at+18.52.40.png" width="378" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
The rest of the code is pretty self-explanatory. Most of the work is done by the Jsoup library: once we have connected to a page and downloaded it, Jsoup creates a Document object that lets us access the HTML DOM structure much as we would in the browser using JavaScript or CSS selectors.<br />
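As a rough illustration of the Document idea described above, here is a minimal sketch. It is not the code from the screenshot: the class name RunScraperSketch, the extractLinks helper, and the example.com sample page are all assumptions made for illustration. It assumes the Jsoup jar is on the build path, and it only downloads a page when you pass a URL as an argument.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical class name; this is not the RunScraper class from the screenshot.
class RunScraperSketch {

    // Collects the absolute URL of every <a href="..."> in an already parsed document.
    static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<String>();
        for (Element link : doc.select("a[href]")) {   // CSS selector, as in the browser
            links.add(link.attr("abs:href"));           // "abs:" resolves against the base URI
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        Document doc;
        if (args.length > 0) {
            // Download and parse a live page; only point this at a site you own.
            doc = Jsoup.connect(args[0]).get();
        } else {
            // No URL given: parse a small inline sample so the sketch runs offline.
            String html = "<html><head><title>Sample</title></head>"
                    + "<body><a href=\"/about\">About</a></body></html>";
            doc = Jsoup.parse(html, "http://example.com/");
        }
        System.out.println("Title: " + doc.title());
        for (String url : extractLinks(doc)) {
            System.out.println("Link: " + url);
        }
    }
}
```

Keeping the parsing in its own method means the same selector logic works on a downloaded page or on an HTML string, which makes it easy to try out without touching the network.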
<br />
This tutorial covers only very basic usage of Jsoup for scraping the web. However, it can be scaled up quite a bit by incorporating threads, a database, and some more code.<br />
<br />
<br />Tyrone Adamshttp://www.blogger.com/profile/11507404086276822163noreply@blogger.com0tag:blogger.com,1999:blog-8461681031424315164.post-54730950367759950562015-09-17T01:00:00.003-07:002015-09-17T01:00:57.469-07:00Having Fun With Android ADB(Android Debug Bridge)<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
This is a fun how-to guide. I have always wanted to see what certain applications look like under the hood. ADB, by Google's definition, is a versatile command line tool that lets you communicate with an emulator instance or connected Android-powered device. So now that we know what ADB is, it is time for some fun. The steps in this guide were performed only on my own application. Please note that it is an offence to perform them on anyone else's code.</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
Reverse Engineering Android Apps (Using Ubuntu Linux)</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
This guide assumes you have the Android SDK downloaded and installed. Once installation is complete, plug in your Android device and open a terminal.</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Set ADB in terminal path variable</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
export PATH=$PATH:/path/to/sdk/platform-tools</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Run ADB from terminal with interactive shell</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
adb shell</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Use the package manager to list the packages and features on the device</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
pm list packages - list all the packages on the device</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
pm list features - list all the features on the device</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Only interested in Packages</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
pm path package_name - displays path to the apk for an app</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Pulling apk from device</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
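adb pull /system/app/ApplicationName.apk (run this from your computer's terminal, not inside adb shell, using the path that pm path reported)</div>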
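The path printed by pm path comes with a "package:" prefix, which has to be removed before the path is handed to adb pull. If you want to script that step, a minimal sketch could look like the following. The class name PmPathSketch, the apkPath helper, and the sample package path are hypothetical and used only for illustration; the sketch does not call adb itself.

```java
// Hypothetical helper; it only handles the text that pm path prints.
class PmPathSketch {

    // Returns the APK path from one line of `pm path` output,
    // or null if the line does not start with "package:".
    static String apkPath(String pmPathLine) {
        String prefix = "package:";
        String line = pmPathLine.trim();   // drop trailing newline or carriage return
        if (!line.startsWith(prefix)) {
            return null;
        }
        return line.substring(prefix.length());
    }

    public static void main(String[] args) {
        // Sample line in the format pm path uses; the package name is made up.
        String sample = "package:/data/app/com.example.app-1/base.apk";
        System.out.println("adb pull " + apkPath(sample));
    }
}
```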
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Decompiling Apk With APKTool</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Download the two APKTool files:</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
apktool1.5.2.tar.bz2 (https://code.google.com/p/android-apktool/downloads/detail?name=apktool1.5.2.tar.bz2&can=1&q=)</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
apktool-install-linux-r04-brut1.tar.bz2 (https://code.google.com/p/android-apktool/downloads/detail?name=apktool-install-linux-r04-brut1.tar.bz2&can=1&q=)</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Install APK Tool</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
Extract all folders anywhere</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
Move aapt, apktool and apktool.jar to /usr/local/bin</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
Make all three executable: sudo chmod +x filename</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Decompile Target APK</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
apktool d ___.apk (whatever application you pulled from your device)</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
This tool now lets us view all the resource files inside an APK. If we want to view the Java class files, we will use dex2jar to build a jar from the APK.</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Decompiling with Dex2Jar</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
Download Dex2Jar</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
https://code.google.com/p/dex2jar</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
Extract the zip file anywhere</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Build Jar From APK</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
/home/user/dex2jar-version/d2j-dex2jar.sh /home/user/someApk.apk</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<br /></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Use a jar viewer to browse the jar</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
http://jd.benow.ca/</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
extract downloaded file</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
run ./jd-gui in terminal</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
open jar created from dex2jar and start browsing code</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong><br /></strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
<strong>Optional:</strong></div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
Move jd-gui to /usr/local/bin for global access</div>
<div style="color: #333333; font-family: Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; font-size: 16px; line-height: 24px;">
That's it. You should now be able to browse the decompiled source code. However, if the application you've decompiled uses ProGuard, the code will be obfuscated and you will need to remap the classes on your own.</div>
Tyrone Adamshttp://www.blogger.com/profile/11507404086276822163noreply@blogger.com0