facebookProfileSpider

A Python spider that uses Selenium to crawl Facebook user profile information, such as first name, last name, work information, and education information, and writes the results to a CSV file.

About

Facebook pages are built largely by JavaScript running in the browser, so the data cannot simply be extracted with regular expressions or the Scrapy framework alone. Instead, Selenium drives a real web browser and the data is read from the rendered page. Selenium is slower, but it is an effective way to crawl script-heavy sites such as Facebook or Taobao.
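The approach above can be sketched as follows. This is a minimal illustration written for Python 3, not the project's own code: the helper names `fetch_rendered_html`, `extract_title`, and `run_with_firefox` are assumptions made for this example, and the standard-library HTML parser stands in for BeautifulSoup only to keep the sketch free of extra dependencies.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_rendered_html(driver, url):
    """Load url in the browser and return the page source as the browser sees it."""
    driver.get(url)
    return driver.page_source


def extract_title(html):
    """Parse the HTML string and return the page title with whitespace trimmed."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def run_with_firefox(url):
    """Assumption: Selenium and a Firefox driver are installed locally."""
    from selenium import webdriver  # imported lazily so the helpers work without it

    driver = webdriver.Firefox()
    try:
        return extract_title(fetch_rendered_html(driver, url))
    finally:
        driver.quit()
```

Because `fetch_rendered_html` only needs an object with `get` and `page_source`, the parsing step can be tried against a stub driver before a real browser is involved.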

This project is best run from Eclipse on Windows 7. Support for Ubuntu, including running from the Linux terminal, is planned.

Requirements

Python 2.7
Selenium 2.42.1
BeautifulSoup 4.3.2
urllib2 (part of the Python 2 standard library)
A stable VPN connection if you are in mainland China
JDK 1.6
Eclipse

Usage

First, make sure you can reach Facebook quickly and without restrictions. Then run facebookSpider.py; it logs in to Facebook automatically and crawls the specified URLs one by one.
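The crawl-then-export flow can be sketched roughly like this. It is a hedged illustration written for Python 3, not the code in facebookSpider.py: the column names in `FIELDS`, the function `crawl_to_csv`, and the `scrape` callable are all assumptions introduced for the example.

```python
import csv

# Hypothetical column names; the project's real CSV layout may differ.
FIELDS = ["first_name", "last_name", "work", "education"]


def crawl_to_csv(urls, scrape, out):
    """Call scrape(url) for each URL in order and write one CSV row per profile.

    scrape is any callable that returns a dict of profile fields.
    out is an open text file (or file-like object).
    Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for url in urls:
        try:
            record = scrape(url)
        except Exception as exc:
            # Keep going when a single profile fails to load or parse.
            print("skipped %s: %s" % (url, exc))
            continue
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({name: record.get(name, "") for name in FIELDS})
        written += 1
    return written
```

In Python 3 the output file would typically be opened with `open("profiles.csv", "w", newline="")` so that the csv module controls line endings.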

All URLs to crawl are listed in urls.py. All configuration items are in settings.py.
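The actual contents of these two files are not shown here, so the following is only a hypothetical layout, written for Python 3, of what such files might hold. Every name in it (`URLS`, `SETTINGS`, `REQUIRED_KEYS`, `check_config`) and every value is a placeholder assumed for illustration; check the real urls.py and settings.py for the names the spider expects.

```python
from urllib.parse import urlparse

# urls.py (hypothetical layout): one profile URL per entry.
URLS = [
    "https://www.facebook.com/example.profile.1",
    "https://www.facebook.com/example.profile.2",
]

# settings.py (hypothetical layout): placeholder values only.
SETTINGS = {
    "email": "you@example.com",
    "password": "change-me",
    "output_csv": "profiles.csv",
}

REQUIRED_KEYS = ("email", "password", "output_csv")


def check_config(urls, settings):
    """Return a list of problems found; an empty list means the config looks usable."""
    problems = []
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append("missing setting: " + key)
    for url in urls:
        host = urlparse(url).netloc.lower()
        if not (host == "facebook.com" or host.endswith(".facebook.com")):
            problems.append("not a Facebook URL: " + url)
    return problems
```

Running a check like this before starting the browser catches an empty setting or a mistyped URL early, which is cheaper than discovering it halfway through a crawl.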

Note

Some users have reported problems running this application. The usual cause is a User-Agent that has not been set correctly.

In facebookLogin.py, change the User-Agent value to match the browser you are using. Selenium runs normally only when this value is set correctly.

# Python 2 standard-library modules used by the constructor
import cookielib
import urllib2

def __init__(self):
    '''
    Constructor: build a URL opener that keeps cookies and sends browser-like headers.
    '''
    cookie = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
    # Replace the User-Agent below with the value sent by your own browser; do not use it as-is.
    opener.addheaders = [('Referer', 'http://login.facebook.com/login.php'),
                        ('Content-Type', 'application/x-www-form-urlencoded'),
                        ('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729)')]
    self.opener = opener

The User-Agent is sent with every request so that the spider does not have to log in to Facebook again each time it fetches data. If you do not know how to find your browser's User-Agent string, search the web for instructions.
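One way to find the right value is to ask the browser itself. The sketch below assumes a running Selenium WebDriver instance; `read_user_agent` and `build_headers` are illustrative helper names, not functions from this project, and the header list simply mirrors the one shown in the constructor above.

```python
def read_user_agent(driver):
    """Ask the running browser for the User-Agent string it sends."""
    return driver.execute_script("return navigator.userAgent;")


def build_headers(user_agent):
    """Build a header list in the same shape as opener.addheaders."""
    return [
        ("Referer", "http://login.facebook.com/login.php"),
        ("Content-Type", "application/x-www-form-urlencoded"),
        ("User-Agent", user_agent),
    ]
```

The result of `build_headers(read_user_agent(driver))` could then be assigned to `opener.addheaders` in place of the hard-coded list, so the value always matches the browser Selenium is driving.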
