Bhaskar Karambelkar's Blog

How to use Twitter’s Search REST API most effectively.


Tags: REST Python Search Tweepy Twitter API

This blog post will discuss various techniques to use Twitter’s search REST API most effectively, given the constraints and limits of the said API. I’ll be using python for demonstration, but any native API which supports the Twitter REST API will do.


Twitter provides the REST search API for searching tweets from Twitter’s search index. This is different from using the streaming filter API, in that the latter is real-time and starts giving you results from the point of query, while the former is retrospective and will give you results from the past, as far back as the search index goes (usually the last 7 days). While the streaming API seems like the thing to use when you want to track a certain query in real time, there are situations where you may want to use the regular REST search API. You may also want to combine the two approaches, i.e. start two searches, one using the streaming filter API to go forward in time and one using the REST search API to go backwards in time, in order to get both ongoing and past context for your search term.

Either way, if the REST Search API is something you want to use, there are a few limitations you need to be aware of, and some techniques you can use to maximize the resources the API gives you. This post will explore approaches to using the REST search API optimally, in order to find as much information as fast as possible while remaining within the constraints of the API. To start with, the API Rate Limit page details the limits of the various Twitter APIs; per that page, the limit for the Search API is 180 requests per 15-minute window for per-user authentication. Now here’s the kicker: most code samples on the internet for the search API use the Access Token auth method, which is limited to the aforementioned 180 requests per 15 minutes, and each request can ask for at most 100 tweets, giving you a grand total of 18,000 tweets per 15 minutes. If you download 18K tweets before the 15 minutes are up, you won’t be able to get any more results until your 15-minute window expires and you can search again. You also need to be aware of the following limitations of the search API.

Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.


Before getting involved, it’s important to know that the Search API is focused on relevance and not completeness. This means that some Tweets and users may be missing from search results. If you want to match for completeness you should consider using a Streaming API instead.

What this means is, using the search API you are not going to get all the tweets that match your search criteria, even if they are present in your desired timeframe. This is an important point to keep in mind when drawing conclusions about the size of the dataset obtained from using the search REST API.

The problem

So given this background information, can we do something about the following points?

  • Could we query at a rate faster than 18K tweets/15 mins?
  • Could we maintain a search context across our API rate limit window, so as to avoid getting duplicate results when searching repeatedly over a long period of time?
  • Could we do something about the fact that not all tweets matching the search criteria will be returned by the API?

And the answer to all three questions is YES. There wouldn’t be a point to this blog post if the answers were no, would there?

The Solution

I’ll be using python and the excellent Tweepy API for this purpose, but any API in any programming language that supports Twitter’s REST APIs will do.

To start with our first question, about being able to search at a rate greater than 18K tweets per 15 minutes: the solution is to use Application-only Auth instead of Access Token Auth. Application-only auth has a higher limit of 450 requests per 15-minute window; again with a maximum of 100 tweets per request, this gives a rate of 45,000 tweets per 15 minutes, 2.5 times the Access Token limit.
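The arithmetic behind those caps, as a quick sanity check (the 180, 450, and 100 figures are from Twitter's documented limits quoted above):

```python
# Tweets obtainable per 15-minute window under each auth scheme:
# requests allowed per window, times 100 tweets max per request.
user_auth_cap = 180 * 100  # Access Token auth: 18,000 tweets/window
app_only_cap = 450 * 100   # Application-only auth: 45,000 tweets/window

print(app_only_cap / user_auth_cap)  # 2.5
```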

The code sample below shows how to use App Only Auth using the Tweepy API.

import sys
import tweepy

# Replace API_KEY and API_SECRET with your application's key and secret.
auth = tweepy.AppAuthHandler(API_KEY, API_SECRET)

api = tweepy.API(auth, wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)

if (not api):
    print("Can't Authenticate")
    sys.exit(-1)

# Continue with rest of code

The secret is the AppAuthHandler instead of the more common OAuthHandler which you find used in lots of code samples. This sets up App-only Auth and gives you the higher limits. As an added bonus, notice the wait_on_rate_limit & wait_on_rate_limit_notify flags set to True. These make the Tweepy API calls automatically wait (sleep) when they hit the rate limit and continue upon expiry of the window. This saves you from having to program that part manually, which, as you’ll shortly see, makes your program much simpler and more elegant.

Next we tackle the second question, about maintaining a search context when querying repeatedly over a long time frame. REST APIs by their very nature are stateless, i.e. there is no implicit context maintained by the server in between successive calls to the same API which could tell it what results have been sent to the client so far. So what we need is a way for the client to tell the API server where it is in the search result set, so that the server can send the next batch of results (this is called pagination). The search REST API allows this by accepting two input parameters, viz. max_id & since_id, which serve as the upper and lower bounds of the unique IDs that Twitter assigns to each tweet. By manipulating these two inputs during successive calls to the search API you can paginate your results. Below is a code sample that does just that.

import sys
import jsonpickle
import os
import tweepy

searchQuery = '#someHashtag'  # this is what we're searching for
maxTweets = 10000000  # Some arbitrary large number
tweetsPerQry = 100  # this is the max the API permits
fName = 'tweets.txt'  # We'll store the tweets in a text file.

# If results from a specific ID onwards are required, set since_id to that ID.
# else default to no lower limit, go as far back as the API allows.
sinceId = None

# If results only below a specific ID are required, set max_id to that ID.
# else default to no upper limit, start from the most recent tweet matching the search query.
max_id = -1

tweetCount = 0
print("Downloading max {0} tweets".format(maxTweets))
with open(fName, 'w') as f:
    while tweetCount < maxTweets:
        try:
            if (max_id <= 0):
                if (not sinceId):
                    new_tweets =, count=tweetsPerQry)
                    new_tweets =, count=tweetsPerQry,
                if (not sinceId):
                    new_tweets =, count=tweetsPerQry,
                                            max_id=str(max_id - 1))
                    new_tweets =, count=tweetsPerQry,
                                            max_id=str(max_id - 1),
            if not new_tweets:
                print("No more tweets found")
            for tweet in new_tweets:
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) +
            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            max_id = new_tweets[-1].id
        except tweepy.TweepError as e:
            # Just exit if any error
            print("some error : " + str(e))

print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))

The above code writes all the downloaded tweets to a text file, each line representing one tweet encoded in JSON format. The tweets in the file are in reverse chronological order, i.e. going from the most recent to the oldest. There’s probably some room for beautifying the above code, but it works and can download literally millions of tweets at the optimal rate of 45K tweets per 15 minutes. Just run the code in a background process and it will go back as far as the search API allows, until it has exhausted all the results. What’s more, using the initial values for max_id and/or since_id you can fetch results to and from arbitrary IDs. This is really helpful if you want to run the program repeatedly to fetch newer results since the last run: just look up the max ID (the ID of the first line) from the previous run and set that as since_id for the next run. If you have to stop your program before exhausting all the possible results and rerun it to fetch the remaining results, look up the min ID (the ID of the last line) and pass that as max_id for the next run to start from that ID and below.
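That bookkeeping can be automated. Here is a minimal sketch (boundary_ids is a hypothetical helper I'm introducing, not part of Tweepy; it assumes the one-JSON-object-per-line file produced by the loop above):

```python
import json

def boundary_ids(fname='tweets.txt'):
    """Return (newest_id, oldest_id) from a file of one JSON tweet per line.

    The download loop writes tweets newest-first, so the first line holds
    the max ID (pass as since_id on the next run to fetch only newer tweets)
    and the last line holds the min ID (pass as max_id to resume an
    interrupted run from where it stopped).
    """
    with open(fname) as f:
        lines = [line for line in f.read().splitlines() if line]
    newest = json.loads(lines[0])['id']
    oldest = json.loads(lines[-1])['id']
    return newest, oldest
```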

Now we look at our third question: given that the search results will not contain all possible matching tweets, can we do something about it? The answer is yes, but it gets a bit tricky. The idea is this: among the tweets you have fetched there will be quite a lot of retweets, and chances are that some of the original tweets behind those retweets are not in the results downloaded. But each retweet also embeds the entire original tweet object in its JSON representation. So if we pick out these original tweets from the retweets, we can augment our results by including the missing original tweets in the result set. We can do this easily because each tweet is assigned a unique ID, which allows us to use set operations to pick out only the missing tweets.

This approach is not as complicated as it sounds, and can be accomplished easily in any programming language. I have working code written in R (not shown here); I leave it as an exercise to the reader to implement it in python or the language of their choice. From my tests for various search queries, I get anywhere from 2% to 10% more tweets this way, so it’s a worthwhile exercise, and it completes your dataset in the sense that you have the original tweet of every retweet found in your dataset.


I highlighted some of the limitations of Twitter’s search REST API and showed how you can best use it at the fullest allowed rate limit. I also explained approaches to paginating results, as well as extending the result set by another 2% to 10% by extracting missing original tweets from the retweets. Using these approaches you should be able to download a whole lot more tweets at a much faster rate.

Technical Notes:

  • Tweepy also has an api.Cursor method which could possibly replace the whole while loop in the second code sample, but it seems the Cursor API suffers from a memory leak and will eventually crash your program. Hence my approach is based on a modification of this answer on stackoverflow.
  • For extracting the missing original tweets from retweets, think of the following pseudo-code.
    • Store all downloaded tweets in a set (say set A)
    • From this set filter out the retweets & extract the original tweet from these retweets (say set B)
    • Insert in set A all unique tweets from set B that are not already in set A
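The steps above can be sketched in python, working on the parsed tweet dicts stored by the download loop (retweets embed the original tweet under the 'retweeted_status' key of Twitter's tweet JSON; augment_with_originals is a name I'm introducing here):

```python
def augment_with_originals(tweets):
    """Return tweets plus any original tweets embedded in retweets
    that are missing from the result set.

    Each element of `tweets` is a tweet dict as written by the download
    loop; a retweet carries the full original tweet under 'retweeted_status'.
    """
    seen_ids = {t['id'] for t in tweets}               # IDs already in set A
    originals = [t['retweeted_status'] for t in tweets
                 if 'retweeted_status' in t]           # set B
    # Keep one copy per ID of each original not already present.
    missing = {t['id']: t for t in originals if t['id'] not in seen_ids}
    return tweets + list(missing.values())
```

Because tweet IDs are unique, membership tests against the `seen_ids` set are enough to find exactly the missing originals, and keying the candidates by ID de-duplicates originals that were retweeted more than once.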