PycURL Quick Start
==================

Retrieving A Network Resource
-----------------------------

Once PycURL is installed we can perform network operations. The simplest
one is retrieving a resource by its URL. To issue a network request with
PycURL, the following steps are required:

    1. Create a ``pycurl.Curl`` instance.
    2. Use ``setopt`` to set options.
    3. Call ``perform`` to perform the operation.

Here is how we can retrieve a network resource in Python 2::

    import pycurl
    from StringIO import StringIO

    buffer = StringIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net/')
    c.setopt(c.WRITEDATA, buffer)
    c.perform()
    c.close()

    body = buffer.getvalue()
    # Body is a string in some encoding.
    # In Python 2, we can print it without knowing what the encoding is.
    print(body)

This code is available as ``examples/quickstart/get_python2.py``.

PycURL does not provide storage for the network response - that is the
application's job. Therefore we must set up a buffer (in the form of a
``StringIO`` object) and instruct PycURL to write to that buffer.

Most of the existing PycURL code uses WRITEFUNCTION instead of WRITEDATA
as follows::

    c.setopt(c.WRITEFUNCTION, buffer.write)

While the ``WRITEFUNCTION`` idiom continues to work, it is now unnecessary.
As of PycURL 7.19.3, ``WRITEDATA`` accepts any Python object with a
``write`` method.
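
Because ``WRITEDATA`` only needs a ``write`` method, the target does not have
to be a buffer or a file. The following is a minimal sketch, not code from the
PycURL distribution: ``DigestWriter`` and ``download_digest`` are hypothetical
names, and the writer hashes and counts the body as it arrives instead of
storing it.

```python
import hashlib


class DigestWriter(object):
    """Hypothetical write target: hashes the body instead of storing it."""

    def __init__(self):
        self.sha256 = hashlib.sha256()
        self.size = 0

    def write(self, data):
        # PycURL hands over each chunk of the response body as a byte string.
        self.sha256.update(data)
        self.size += len(data)


def download_digest(url):
    # Imported here so that DigestWriter can be used without PycURL installed.
    import pycurl

    writer = DigestWriter()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, writer)
    try:
        c.perform()
    finally:
        # Release the handle even if the transfer fails.
        c.close()
    return writer.sha256.hexdigest(), writer.size
```

Calling ``download_digest('http://pycurl.sourceforge.net/')`` would return the
SHA-256 hex digest and the size of the body without keeping the body in memory.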

The Python 3 version is slightly more complicated::

    import pycurl
    from io import BytesIO

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net/')
    c.setopt(c.WRITEDATA, buffer)
    c.perform()
    c.close()

    body = buffer.getvalue()
    # Body is a byte string.
    # We have to know the encoding in order to print it to a text file
    # such as standard output.
    print(body.decode('iso-8859-1'))

This code is available as ``examples/quickstart/get_python3.py``.

In Python 3, PycURL returns the response body as a byte string.
This is handy if we are downloading a binary file, but for text documents
we must decode the byte string. In the above example, we assume that the
body is encoded in iso-8859-1.
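
To make the decoding step concrete, the sketch below decodes a hand-written
byte string that stands in for a response body (no network request is made).
iso-8859-1 maps every byte to a character, so decoding with it never fails,
but the result is only correct if the server really used that encoding.

```python
# A byte string standing in for a response body encoded in UTF-8.
body = u'caf\u00e9'.encode('utf-8')

# Decoding with the right encoding recovers the original text.
text = body.decode('utf-8')

# iso-8859-1 accepts any byte sequence, so a wrong guess does not raise;
# it silently yields two wrong characters in place of the accented one.
wrong = body.decode('iso-8859-1')

# Stricter encodings reject byte sequences that are not valid for them.
try:
    b'\xe9'.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```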

The Python 2 and Python 3 versions can be combined. Doing so requires
decoding the response body as in the Python 3 version. The code for the
combined example can be found in ``examples/quickstart/get.py``.

Examining Response Headers
--------------------------

In reality we want to decode the response using the encoding specified by
the server rather than assuming an encoding. To do this we need to
examine the response headers::

    import pycurl
    import re
    try:
        from io import BytesIO
    except ImportError:
        from StringIO import StringIO as BytesIO

    headers = {}
    def header_function(header_line):
        # HTTP standard specifies that headers are encoded in iso-8859-1.
        # On Python 2, decoding step can be skipped.
        # On Python 3, decoding step is required.
        header_line = header_line.decode('iso-8859-1')
        
        # Header lines include the first status line (HTTP/1.x ...).
        # We are going to ignore all lines that don't have a colon in them.
        # This will botch headers that are split on multiple lines...
        if ':' not in header_line:
            return
        
        # Break the header line into header name and value.
        name, value = header_line.split(':', 1)
        
        # Remove whitespace that may be present.
        # Header lines include the trailing newline, and there may be whitespace
        # around the colon.
        name = name.strip()
        value = value.strip()
        
        # Header names are case insensitive.
        # Lowercase name here.
        name = name.lower()
        
        # Now we can actually record the header name and value.
        headers[name] = value

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net')
    c.setopt(c.WRITEFUNCTION, buffer.write)
    # Set our header function.
    c.setopt(c.HEADERFUNCTION, header_function)
    c.perform()
    c.close()

    # Figure out what encoding was sent with the response, if any.
    # Check against lowercased header name.
    encoding = None
    if 'content-type' in headers:
        content_type = headers['content-type'].lower()
        # Raw string, so that \S reaches the regex engine unchanged.
        match = re.search(r'charset=(\S+)', content_type)
        if match:
            encoding = match.group(1)
            print('Decoding using %s' % encoding)
    if encoding is None:
        # Default encoding for HTML is iso-8859-1.
        # Other content types may have different default encoding,
        # or in case of binary data, may have no encoding at all.
        encoding = 'iso-8859-1'
        print('Assuming encoding is %s' % encoding)

    body = buffer.getvalue()
    # Decode using the encoding we figured out.
    print(body.decode(encoding))

This code is available as ``examples/quickstart/response_headers.py``.

That was a lot of code for something very straightforward. Unfortunately,
as libcurl refrains from allocating memory for response data, it is up to
our application to perform this grunt work.
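
The header function above ignores continuation lines and keeps only the last
value of a repeated header. One possible refinement is sketched below for
Python 3. ``HeaderCollector`` and ``charset_of`` are hypothetical names, not
part of PycURL, and the charset lookup swaps the regular expression for the
standard library's ``email.message.Message`` so that quoted values such as
``charset="utf-8"`` are handled.

```python
from email.message import Message


class HeaderCollector(object):
    """Hypothetical header callback target that tolerates folded headers."""

    def __init__(self):
        self.headers = {}   # lowercased name -> list of values
        self._last = None   # name of the most recently seen header

    def add_line(self, header_line):
        line = header_line.decode('iso-8859-1')
        if line.startswith('HTTP/'):
            # A status line starts a new response (e.g. after a redirect).
            self.headers = {}
            self._last = None
        elif line[:1] in (' ', '\t') and self._last is not None:
            # Leading whitespace marks a continuation of the previous value.
            values = self.headers[self._last]
            values[-1] = values[-1] + ' ' + line.strip()
        elif ':' in line:
            name, value = line.split(':', 1)
            self._last = name.strip().lower()
            self.headers.setdefault(self._last, []).append(value.strip())


def charset_of(content_type, default='iso-8859-1'):
    # Let the standard library parse the Content-Type parameters.
    message = Message()
    message['content-type'] = content_type
    return message.get_content_charset() or default
```

A collector would be registered with
``c.setopt(c.HEADERFUNCTION, collector.add_line)``. After ``perform``,
``collector.headers['content-type'][0]`` can be passed to ``charset_of``.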

Writing To A File
-----------------

Suppose we want to save the response body to a file. This is actually easy
for a change::

    import pycurl

    # As long as the file is opened in binary mode, both Python 2 and Python 3
    # can write response body to it without decoding.
    with open('out.html', 'wb') as f:
        c = pycurl.Curl()
        c.setopt(c.URL, 'http://pycurl.sourceforge.net/')
        c.setopt(c.WRITEDATA, f)
        c.perform()
        c.close()

This code is available as ``examples/quickstart/write_file.py``.

The important part is opening the file in binary mode - then the response
body can be written bytewise without any decoding or encoding steps.

Following Redirects
-------------------

By default, libcurl (and therefore PycURL) does not follow redirects.
Changing this behavior involves using ``setopt`` like so::

    import pycurl

    c = pycurl.Curl()
    # Redirects to https://www.python.org/.
    c.setopt(c.URL, 'http://www.python.org/')
    # Follow redirect.
    c.setopt(c.FOLLOWLOCATION, True)
    c.perform()
    c.close()

This code is available as ``examples/quickstart/follow_redirect.py``.

As we did not set a write callback, the default behavior of libcurl and
PycURL takes effect: the response body is written to standard output.
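
Redirect following can also be bounded, and the final URL inspected
afterwards. The sketch below assumes the ``MAXREDIRS`` option and the
``EFFECTIVE_URL`` and ``REDIRECT_COUNT`` info constants, which mirror
libcurl's ``CURLOPT_MAXREDIRS``, ``CURLINFO_EFFECTIVE_URL`` and
``CURLINFO_REDIRECT_COUNT``. The helper names are hypothetical, and both
helpers take an existing ``pycurl.Curl`` handle.

```python
def follow_redirects(curl, max_redirects=5):
    # Follow Location headers, but give up after max_redirects hops.
    curl.setopt(curl.FOLLOWLOCATION, True)
    curl.setopt(curl.MAXREDIRS, max_redirects)


def redirect_summary(curl):
    # Call this after perform() and before close().
    return {
        'final_url': curl.getinfo(curl.EFFECTIVE_URL),
        'redirects': curl.getinfo(curl.REDIRECT_COUNT),
    }
```

Typical use would be ``follow_redirects(c)`` before ``c.perform()`` and
``redirect_summary(c)`` afterwards.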

Setting Options
---------------

Following redirects is one option that libcurl provides. There are many more
such options, and they are documented on the `curl_easy_setopt`_ page.
With very few exceptions, PycURL option names are derived from libcurl
option names by removing the ``CURLOPT_`` prefix. Thus, ``CURLOPT_URL``
becomes simply ``URL``.
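
The naming rule can be expressed as a small helper. This is an illustrative
sketch with hypothetical function names; it does not cover the few options
whose PycURL names differ from the rule, for which the attribute lookup fails.

```python
def pycurl_option_name(libcurl_name):
    # Strip the CURLOPT_ prefix: 'CURLOPT_URL' -> 'URL'.
    prefix = 'CURLOPT_'
    if not libcurl_name.startswith(prefix):
        raise ValueError('not a libcurl option name: %r' % (libcurl_name,))
    return libcurl_name[len(prefix):]


def setopt_by_libcurl_name(curl, libcurl_name, value):
    # Looks the constant up on the handle, e.g. curl.FOLLOWLOCATION.
    # Raises AttributeError if the handle has no constant of that name.
    option = getattr(curl, pycurl_option_name(libcurl_name))
    curl.setopt(option, value)
```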

.. _curl_easy_setopt: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html

Examining Response
------------------

We already covered examining response headers. Other response information is
accessible via the ``getinfo`` call as follows::

    import pycurl
    try:
        from io import BytesIO
    except ImportError:
        from StringIO import StringIO as BytesIO

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net/')
    c.setopt(c.WRITEDATA, buffer)
    c.perform()

    # HTTP response code, e.g. 200.
    print('Status: %d' % c.getinfo(c.RESPONSE_CODE))
    # Elapsed time for the transfer, in seconds.
    print('Time: %f' % c.getinfo(c.TOTAL_TIME))

    # getinfo must be called before close.
    c.close()

This code is available as ``examples/quickstart/response_info.py``.

Here we write the body to a buffer to avoid printing uninteresting output
to standard output.

The response information that libcurl exposes is documented on the
`curl_easy_getinfo`_ page. With very few exceptions, PycURL constants
are derived from libcurl constants by removing the ``CURLINFO_`` prefix.
Thus, ``CURLINFO_RESPONSE_CODE`` becomes simply ``RESPONSE_CODE``.
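
One common use of ``RESPONSE_CODE`` is turning HTTP error statuses into
exceptions, since by default libcurl treats a 4xx or 5xx response as a
completed transfer and ``perform`` returns normally. The sketch below is one
way to do this; ``HTTPStatusError`` and ``check_status`` are hypothetical
names, not part of PycURL.

```python
class HTTPStatusError(Exception):
    """Hypothetical exception for HTTP responses with status >= 400."""

    def __init__(self, status):
        Exception.__init__(self, 'HTTP status %d' % status)
        self.status = status


def check_status(curl):
    # Call this after perform() and before close().
    status = curl.getinfo(curl.RESPONSE_CODE)
    if status >= 400:
        raise HTTPStatusError(status)
    return status
```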

.. _curl_easy_getinfo: http://curl.haxx.se/libcurl/c/curl_easy_getinfo.html

Sending Form Data
-----------------

To send form data, use the ``POSTFIELDS`` option. Form data must be
URL-encoded beforehand::

    import pycurl
    try:
        # python 3
        from urllib.parse import urlencode
    except ImportError:
        # python 2
        from urllib import urlencode

    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net/tests/testpostvars.php')

    post_data = {'field': 'value'}
    # Form data must be provided already urlencoded.
    postfields = urlencode(post_data)
    # Sets request method to POST,
    # Content-Type header to application/x-www-form-urlencoded
    # and data to send in request body.
    c.setopt(c.POSTFIELDS, postfields)

    c.perform()
    c.close()

This code is available as ``examples/quickstart/form_post.py``.
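
To see what the URL-encoding step produces, the sketch below encodes two
made-up fields without issuing a request. The field names and values are
illustrative only.

```python
try:
    # Python 3
    from urllib.parse import urlencode
except ImportError:
    # Python 2
    from urllib import urlencode

# A list of pairs keeps the field order stable; a dict works as well.
fields = [('name', 'J. Doe'), ('query', 'a&b=c')]
postfields = urlencode(fields)

# Spaces become '+', and '&' and '=' inside values are percent-encoded,
# so the server can split the fields unambiguously.
print(postfields)
```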

``POSTFIELDS`` automatically sets the HTTP request method to POST. Other
request methods can be specified via the ``CUSTOMREQUEST`` option::

    c.setopt(c.CUSTOMREQUEST, 'PATCH')

File Upload
-----------

To upload a file, use the ``HTTPPOST`` option. To upload a file stored on
disk, use ``FORM_FILE`` as follows::

    import pycurl

    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net/tests/testfileupload.php')

    c.setopt(c.HTTPPOST, [
        ('fileupload', (
            # upload the contents of this file
            c.FORM_FILE, __file__,
        )),
    ])

    c.perform()
    c.close()

This code is available as ``examples/quickstart/file_upload_real.py``.

libcurl provides a number of options to tweak file uploads and multipart
form submissions in general. These are documented on the `curl_formadd page`_.
For example, to set a different filename and content type::

    import pycurl

    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net/tests/testfileupload.php')

    c.setopt(c.HTTPPOST, [
        ('fileupload', (
            # upload the contents of this file
            c.FORM_FILE, __file__,
            # specify a different file name for the upload
            c.FORM_FILENAME, 'helloworld.py',
            # specify a different content type
            c.FORM_CONTENTTYPE, 'application/x-python',
        )),
    ])

    c.perform()
    c.close()

This code is available as ``examples/quickstart/file_upload_real_fancy.py``.

If the file data is in memory, use ``FORM_BUFFER``/``FORM_BUFFERPTR`` as follows::

    import pycurl

    c = pycurl.Curl()
    c.setopt(c.URL, 'http://pycurl.sourceforge.net/tests/testfileupload.php')

    c.setopt(c.HTTPPOST, [
        ('fileupload', (
            c.FORM_BUFFER, 'readme.txt',
            c.FORM_BUFFERPTR, 'This is a fancy readme file',
        )),
    ])

    c.perform()
    c.close()

This code is available as ``examples/quickstart/file_upload_buffer.py``.

.. _curl_formadd page: http://curl.haxx.se/libcurl/c/curl_formadd.html