Twitter’s favourite swear words*

* From a sample of 3.2 million tweets from 2927 users.

CAUTION

If you’re likely to get even the slightest bit offended, DO NOT READ the rest of this post.

 

As a minor diversion from some more serious/formal research, I thought it’d be interesting to see what the most popular swear words were, using the 400+ available at http://urbanoalvarez.es/blog/?download=badwords and looking through the 3.2ish million tweets in my CouchDB corpus. Those 3.2m tweets came from the public tweet streams of 2927 people.

I parsed the tweets and ended up with ass. Enjoy.

Tag Cloud of Twitter Swear words
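
For anyone wanting to try something similar, here is a rough sketch of how a count like this could be done. It is not the code used for the cloud above: the function name, the sample tweets and the (deliberately mild) two-word list are all made up for illustration, and it is written for Python 3.

```python
import re
from collections import Counter


def count_listed_words(tweets, word_list):
    """Count how often each word from word_list appears across the tweets."""
    wanted = {w.lower() for w in word_list}
    counts = Counter()
    for text in tweets:
        # Crude tokeniser: lower-case runs of letters, digits and apostrophes.
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


# Hypothetical stand-ins for the tweet corpus and the downloaded word list.
sample_tweets = [
    "Darn it, missed the bus",
    "heck of a day... darn",
    "nothing to see here",
]
sample_words = ["darn", "heck"]

print(count_listed_words(sample_tweets, sample_words).most_common())
```

Note that this only matches single tokens, so any multi-word phrases in a word list would need separate handling.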


Getting Twitter User Information with Tweepy and jsonpickle

Yesterday I shared some simple Python code to dump a Twitter user’s information. See post here.

I managed to recreate this in Tweepy using jsonpickle (see code snippet below).

For the interested, I still haven’t managed to figure out how to navigate the Python object Tweepy returns. There’s a <cough> remarkably </cough> similar problem posted on Stack Overflow here. Thanks to MarcW for summarizing what I’m attempting to do as: “so basically, I want to loop through user.__getstate__() and if I find an object which requires further iteration, loop through that too”

Ultimately I’ll want a nice JSON object to pipe into CouchDB, so I found jsonpickle. After a little playing around I have a solution (I’m still curious about the object iteration/navigation FWIW, as I’m sure it’s something stupid I’m missing).

Here’s the code. It converts the Python object to JSON, so there’s no need to patch Tweepy.

# -*- coding: utf-8 -*-

import json

import jsonpickle
import tweepy


api = tweepy.API()



def main():
    print "Starting."
    
    user = api.get_user('TheSuggmeister',include_entities=1)

    print "================ type ================="
    print type(user)

    print "================ dir ================="
    print dir(user) 


    print "================ user as JSON ================="
    pickled = jsonpickle.encode(user)
    print(json.dumps(json.loads(pickled), indent=4, sort_keys=True))   #you could just print pickled, but this makes it pretty
 
    print "================= end ================="



if __name__ == "__main__":
    main()
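
On the object-navigation question mentioned above, here is a minimal sketch of the “loop through it, and loop through anything nested too” idea. It is not Tweepy-specific and has not been run against a real Tweepy object: `to_plain`, `FakeUser` and `FakeStatus` are hypothetical names, `vars()` stands in for `__getstate__()`, and skipping attributes that start with an underscore is an assumption made to avoid private back-references. Written for Python 3.

```python
import datetime
import json


def to_plain(obj, depth=0, max_depth=10):
    """Recursively turn an object into dicts, lists and primitives."""
    # Primitives are already JSON-friendly.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # Stop descending if the structure is suspiciously deep (e.g. a cycle).
    if depth >= max_depth:
        return repr(obj)
    if isinstance(obj, dict):
        return {str(k): to_plain(v, depth + 1, max_depth) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_plain(v, depth + 1, max_depth) for v in obj]
    # Anything with an instance dict "requires further iteration".
    if hasattr(obj, "__dict__"):
        return {
            name: to_plain(value, depth + 1, max_depth)
            for name, value in vars(obj).items()
            if not name.startswith("_")
        }
    # Fallback for everything else (dates and so on): use the string form.
    return str(obj)


# Hypothetical stand-ins for the nested objects a client library might return.
class FakeStatus(object):
    def __init__(self):
        self.text = "hello"
        self.created_at = datetime.datetime(2012, 2, 27, 12, 0, 0)


class FakeUser(object):
    def __init__(self):
        self.screen_name = "example"
        self.followers_count = 3
        self.status = FakeStatus()
        self._api = object()  # private back-reference, skipped by to_plain


print(json.dumps(to_plain(FakeUser()), indent=4, sort_keys=True))
```

Compared with jsonpickle, this produces plain nested dicts without any type tags, at the cost of not being reversible back into the original objects.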


Simple Python code to get Twitter user information

I’d been searching high and low for a simple script to get Twitter user information and spit out the JSON object. Tweepy felt like a sledgehammer to crack a nut. More specifically, navigating the Tweepy class object it returns is going to take me some more work.

So if you’re looking for a quick way to dump a user’s information, here’s some code that works. I’ll start putting these on GitHub shortly…

# -*- coding: utf-8 -*-

import json
import urllib2

#
# A simple script to get twitter user information
#
# Authentication not required
# Twitter API call ref at https://dev.twitter.com/docs/api/1/get/users/show
#
# Author: Chris Sumner 27th Feb 2012
#


def main():
    print 'starting'
    # Use urllib2 to make our Twitter API call
    # For more information on urllib2, take a look at 'The Missing Manual' at http://www.voidspace.org.uk/python/articles/urllib2.shtml
    #
    req = urllib2.Request('https://api.twitter.com/1/users/show.json?screen_name=TheSuggmeister&include_entities=true')
    response = urllib2.urlopen(req)
    the_page = response.read()

    # print the parsed response (a Python dict, so this is the dict's repr rather than JSON text)
    print json.loads(the_page)


    # Neater way to print Twitter JSON data
    # From : http://stackoverflow.com/questions/9170288/need-to-pretty-print-twitter-json-data-to-a-file-using-python

    print(json.dumps(json.loads(the_page), indent=4, sort_keys=True))





if __name__ == "__main__":
    main()
