Saturday, November 21, 2015

To Freelance or Not to Freelance? That's the Question!

I’ve worked as a software engineer for the last 3 years, mostly building mobile apps and the backend services behind modern API-driven applications. Most of the applications I’ve worked on are still in use today by individuals and organizations. I've always looked for chances to try something new, solve a problem, or help fix bugs. This led me to collaborate with other people, which is how I got to learn a variety of technologies.

It's always been difficult for me to pick a favorite language! The ones I've been associated with the longest are C/C++ and Java, but I've gotten my hands dirty with plenty of other languages and technologies: Python, Ruby, Neo4j, Scala, and Go. I also have a lot of experience with web design (HTML, CSS, JavaScript), and lately I've caught the Node, Angular, and React bug, along with Big Data and cloud technologies.

Freelancing gives you the option to work for people who need a problem solved, and solved fast. I've come across a lot of freelancing platforms, but none of them worked out for me!
That's when I came across TopTal.com, and I must say it seems pretty interesting. These are the points I liked the most:
  • It’s the place where the best freelancers are – TopTal receives thousands of applications every month, but only about 3% of applicants get in. According to their website and various blog posts on the web, TopTal has developed a tough screening process to identify and accept the best engineers. Oh, and did I mention that the engineers who are accepted at TopTal work for clients such as Airbnb and ZenDesk? Need I say more?
  • Great blog posts – This is how I learned about TopTal in the first place. They have excellent posts on their Engineering Blog, written by freelance software engineers who work at TopTal. I’ve learned a lot from their blog, and I highly recommend that you try it for yourself.
  • Less stress – As a programmer, I just want to put my headphones on and get some work done, and working from home makes this 10x easier.
Anyway, I’ve just begun the interview process at TopTal.com, and I'd really like to get in and become one of the freelancers who work there. If you’re a software engineer looking for work, I recommend that you do the same.

Wednesday, June 20, 2012

Having fun web crawling with PhantomJS

A couple of weeks ago, a colleague of mine showed me this cool tool called PhantomJS.
It's a headless browser that can run JavaScript to do almost anything you would want from a regular browser, just without rendering anything to the screen.

This could be really useful for tasks like running UI tests on a project you created, or crawling a set of web pages looking for something.

...So, this is exactly what I did!
There's a great site I know of that has a ton of great ebooks ready to download, but the problem is that they show you only 2 results on each page, and the search never finds anything!

Realizing that this site has a very simple URL structure (e.g.: website/page/#), I just created a quick JavaScript file telling PhantomJS to go through the first 50 pages and search for a list of keywords that interest me. If the script finds something interesting, it saves the name of the book along with the page link into a text file so I can download them all later. :)

Here's the script:
var page;
var fs = require('fs');
var pageCount = 0;
 
scanPage(pageCount);
 
function scanPage(pageIndex) {
 // dispose of page before moving on
 if (typeof page !== 'undefined')
  page.release();
 
 // dispose of phantomjs once the first 50 pages have been crawled
 if (pageIndex >= 50) {
  phantom.exit();
  return;
 }
 
 pageIndex++;
  
 // start crawling...
 page = require('webpage').create();
 var currentPage = 'your-favorite-ebook-site-goes-here/page/' + pageIndex;
 page.open(currentPage, function(status) {
  if (status === 'success') {
   window.setTimeout(function() {
    console.log('crawling page ' + pageIndex);
     
    var booksNames = page.evaluate(function() {
     // there are 2 book titles on each page, just put these in an array
     return [ $($('h2 a')[0]).attr('title'), $($('h2 a')[1]).attr('title') ];
    });
    checkBookName(booksNames[0], currentPage);
    checkBookName(booksNames[1], currentPage);
     
    scanPage(pageIndex);
   }, 3000);
  }
  else {
   console.log('error crawling page ' + pageIndex);
   page.release();
   // stop here; otherwise phantomjs keeps running after a failed load
   phantom.exit();
  }
 });
}
 
// checks for interesting keywords in the book title,
// and saves the link for us if necessary
function checkBookName(bookTitle, bookLink) {
 // skip pages where a title could not be read
 if (!bookTitle)
  return;
 // keywords must be lowercase, since the title is lowercased before matching
 var interestingKeywords = ['c#','java','nhibernate','windsor','ioc','dependency injection',
  'inversion of control','mysql'];
 var lowerTitle = bookTitle.toLowerCase();
 for (var i=0; i<interestingKeywords.length; i++) {
  if (lowerTitle.indexOf(interestingKeywords[i]) !== -1) {
   // save the book title and link
   var a = bookTitle + ' => ' + bookLink + ';';
   fs.write('books.txt', a, 'a');
   console.log(a);
   break;
  }
 }
}

And this is what the script looks like when running: (screenshot of the console output)
Just some notes on the script:
  • I added comments to try to make it as clear as possible. Feel free to contact me if it isn't.
  • I hid the real website name from the script for obvious reasons. This technique could be useful for a variety of things, but you should check the legal issues first.
  • I also added an interval of 3 seconds between each page crawl. This is another precaution against putting too much load on their site.
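
The keyword check is the one part of the script that can be tried without PhantomJS or a website. Here is a minimal sketch of it as a standalone function, assuming plain JavaScript run in any engine. The name findKeyword and the sample titles are hypothetical and are not part of the script above. It lowercases both the title and the keyword, so a keyword like 'C#' still matches, and it returns null when a page yields no title.

```javascript
// Hypothetical helper, not part of the original script.
// Returns the first keyword found in the title, or null if none match.
function findKeyword(title, keywords) {
  if (typeof title !== 'string') {
    return null; // the page did not yield a title
  }
  var lowerTitle = title.toLowerCase();
  for (var i = 0; i < keywords.length; i++) {
    // compare in lowercase on both sides so the match is case-insensitive
    if (lowerTitle.indexOf(keywords[i].toLowerCase()) !== -1) {
      return keywords[i];
    }
  }
  return null;
}

// Sample titles are made up for illustration.
var keywords = ['C#', 'java', 'mysql'];
console.log(findKeyword('Pro C# 5.0 and the .NET Framework', keywords));
console.log(findKeyword('Learning Python', keywords));
console.log(findKeyword(undefined, keywords));
```

Keep in mind that this is a plain substring match, so 'java' will also match a title containing 'JavaScript'. For a quick crawl like this one, that is probably good enough.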

In order to use this script, or something like it, just go to the PhantomJS homepage, download it, and run this at the command line:
C:\your-phantomjs-lib\phantomjs your-script.js

Enjoy! :)