From: Harmon
Date: Mon, 12 Apr 2021 18:28:24 +0000 (-0500)
Subject: Update and improve documentation for streaming
X-Git-Url: https://vcs.fsf.org/?a=commitdiff_plain;h=bf641965278d2486cebef45bb940e2d2e691d420;p=tweepy.git

Update and improve documentation for streaming
---

diff --git a/docs/index.rst b/docs/index.rst
index 8aa437b..a4d6618 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -16,11 +16,11 @@ Contents:
    auth_tutorial.rst
    code_snippet.rst
    cursor_tutorial.rst
-   streaming_how_to.rst
    api.rst
    stream.rst
    exceptions.rst
    extended_tweets.rst
+   streaming.rst
    running_tests.rst
    changelog.md

diff --git a/docs/streaming.rst b/docs/streaming.rst
new file mode 100644
index 0000000..4856c5e
--- /dev/null
+++ b/docs/streaming.rst
@@ -0,0 +1,100 @@
+.. _streaming_guide:
+
+.. currentmodule:: tweepy
+
+*********
+Streaming
+*********
+
+:class:`Stream` allows `filtering`_ and `sampling`_ of realtime Tweets using
+Twitter's API.
+
+.. _filtering: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/overview
+.. _sampling: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/sample-realtime/overview
+
+Streams utilize Streaming HTTP protocol to deliver data through
+an open, streaming API connection. Rather than delivering data in batches
+through repeated requests by your client app, as might be expected from a REST
+API, a single connection is opened between your app and the API, with new
+results being sent through that connection whenever new matches occur. This
+results in a low-latency delivery mechanism that can support very high
+throughput. For further information, see
+https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data
+
+Using :class:`Stream`
+=====================
+
+To use :class:`Stream`, an instance of it needs to be initialized with Twitter
+API credentials (Consumer Key, Consumer Secret, Access Token, Access Token
+Secret)::
+
+    import tweepy
+
+    stream = tweepy.Stream(
+        "Consumer Key here", "Consumer Secret here",
+        "Access Token here", "Access Token Secret here"
+    )
+
+Then, :meth:`Stream.filter` or :meth:`Stream.sample` can be used to connect to
+and run a stream::
+
+    stream.filter(track=["Tweepy"])
+
+Data received from the stream is passed to :meth:`Stream.on_data`. This method
+handles sending the data to other methods based on the message type. For
+example, if a Tweet is received from the stream, the raw data is sent to
+:meth:`Stream.on_data`, which constructs a :class:`Status` object and passes it
+to :meth:`Stream.on_status`. By default, the other methods, besides
+:meth:`Stream.on_data`, that receive the data from the stream, simply log the
+data received, with the `logging level`_ dependent on the type of the data.
+
+.. _logging level: https://docs.python.org/3/howto/logging.html#logging-levels
+
+To customize the processing of the stream data, :class:`Stream` needs to be
+subclassed. For example, to print the IDs of every Tweet received::
+
+    class IDPrinter(tweepy.Stream):
+
+        def on_status(self, status):
+            print(status.id)
+
+
+    printer = IDPrinter(
+        "Consumer Key here", "Consumer Secret here",
+        "Access Token here", "Access Token Secret here"
+    )
+    printer.sample()
+
+Threading
+=========
+Both :meth:`Stream.filter` and :meth:`Stream.sample` have a ``threaded``
+parameter. When set to ``True``, the stream will run in a separate `thread`_,
+which is returned by the call to either method. For example::
+
+    thread = stream.filter(follow=[1072250532645998596], threaded=True)
+
+.. _thread: https://docs.python.org/3/library/threading.html#thread-objects
+
+Handling Errors
+===============
+:class:`Stream` has multiple methods to handle errors during streaming.
+:meth:`Stream.on_closed` is called when the stream is closed by Twitter.
+:meth:`Stream.on_connection_error` is called when the stream encounters a
+connection error. :meth:`Stream.on_request_error` is called when an error is
+encountered while trying to connect to the stream. When these errors are
+encountered and ``max_retries``, which defaults to infinite, hasn't been
+exceeded yet, the :class:`Stream` instance will attempt to reconnect the stream
+after an appropriate amount of time. By default, all three of these methods log
+an error. To customize that handling, they can be overridden in a subclass::
+
+    class ConnectionTester(tweepy.Stream):
+
+        def on_connection_error(self):
+            self.disconnect()
+
+:meth:`Stream.on_request_error` is also passed the HTTP status code that was
+encountered. The HTTP status codes reference for the Twitter API can be found
+at https://developer.twitter.com/en/support/twitter-api/error-troubleshooting.
+
+:meth:`Stream.on_exception` is called when an unhandled exception occurs. This
+is fatal to the stream, and by default, an exception is logged.
diff --git a/docs/streaming_how_to.rst b/docs/streaming_how_to.rst
deleted file mode 100644
index 62cf5af..0000000
--- a/docs/streaming_how_to.rst
+++ /dev/null
@@ -1,125 +0,0 @@
-.. _streaming_how_to:
-.. _Twitter Streaming API Documentation: https://developer.twitter.com/en/docs/tweets/filter-realtime/overview
-.. _Twitter Streaming API Connecting Documentation: https://developer.twitter.com/en/docs/tutorials/consuming-streaming-data
-.. _Twitter Response Codes Documentation: https://dev.twitter.com/overview/api/response-codes
-
-*********************
-Streaming With Tweepy
-*********************
-Tweepy makes it easier to use the twitter streaming api by handling authentication,
-connection, creating and destroying the session, reading incoming messages,
-and partially routing messages.
-
-This page aims to help you get started using Twitter streams with Tweepy
-by offering a first walk through. Some features of Tweepy streaming are
-not covered here. See streaming.py in the Tweepy source code.
-
-API authorization is required to access Twitter streams.
-Follow the :ref:`auth_tutorial` if you need help with authentication.
-
-Summary
-=======
-The Twitter streaming API is used to download twitter messages in real
-time. It is useful for obtaining a high volume of tweets, or for
-creating a live feed using a site stream or user stream.
-See the `Twitter Streaming API Documentation`_.
-
-The streaming api is quite different from the REST api because the
-REST api is used to *pull* data from twitter but the streaming api
-*pushes* messages to a persistent session. This allows the streaming
-api to download more data in real time than could be done using the
-REST API.
-
-In Tweepy, an instance of **tweepy.Stream** establishes a streaming
-session and routes messages to **StreamListener** instance. The
-**on_data** method of a stream listener receives all messages and
-calls functions according to the message type. The default
-**StreamListener** can classify most common twitter messages and
-routes them to appropriately named methods, but these methods are
-only stubs.
-
-Therefore using the streaming api has three steps.
-
-1. Create a class inheriting from **StreamListener**
-
-2. Using that class create a **Stream** object
-
-3. Connect to the Twitter API using the **Stream**.
-
-
-Step 1: Creating a **StreamListener**
-=====================================
-This simple stream listener prints status text.
-The **on_data** method of Tweepy's **StreamListener** conveniently passes
-data from statuses to the **on_status** method.
-Create class **MyStreamListener** inheriting from **StreamListener**
-and overriding **on_status**.::
-
-  import tweepy
-  #override tweepy.StreamListener to add logic to on_status
-  class MyStreamListener(tweepy.StreamListener):
-
-      def on_status(self, status):
-          print(status.text)
-
-Step 2: Creating a **Stream**
-=============================
-We need an api to stream. See :ref:`auth_tutorial` to learn how to get an api object.
-Once we have an api and a status listener we can create our stream object.::
-
-  myStreamListener = MyStreamListener()
-  myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener)
-
-Step 3: Starting a Stream
-=========================
-A number of twitter streams are available through Tweepy. Most cases
-will use filter.
-For more information on the capabilities and limitations of the different
-streams see `Twitter Streaming API Documentation`_.
-
-In this example we will use **filter** to stream all tweets containing
-the word *python*. The **track** parameter is an array of search terms to stream. ::
-
-  myStream.filter(track=['python'])
-
-This example shows how to use **filter** to stream tweets by a specific user. The **follow** parameter is an array of IDs. ::
-
-  myStream.filter(follow=["2211149702"])
-
-An easy way to find a single ID is to use one of the many conversion websites: search for 'what is my twitter ID'.
-
-A Few More Pointers
-===================
-
-Async Streaming
----------------
-Streams do not terminate unless the connection is closed, blocking the thread.
-Tweepy offers a convenient **is_async** parameter on **filter** so the stream will run on a new
-thread. For example ::
-
-  myStream.filter(track=['python'], is_async=True)
-
-Handling Errors
----------------
-When using Twitter's streaming API one must be careful of the dangers of
-rate limiting. If clients exceed a limited number of attempts to connect to the streaming API
-in a window of time, they will receive error 420. The amount of time a client has to wait after receiving error 420
-will increase exponentially each time they make a failed attempt.
-
-Tweepy's **Stream Listener** passes error codes to an **on_error** stub. The
-default implementation returns **False** for all codes, but we can override it
-to allow Tweepy to reconnect for some or all codes, using the backoff
-strategies recommended in the `Twitter Streaming API Connecting
-Documentation`_. ::
-
-  class MyStreamListener(tweepy.StreamListener):
-
-      def on_error(self, status_code):
-          if status_code == 420:
-              #returning False in on_error disconnects the stream
-              return False
-
-          # returning non-False reconnects the stream, with backoff.
-
-For more information on error codes from the Twitter API see `Twitter Response Codes Documentation`_.
-