
boto

For the latest version of boto, see https://github.com/boto/boto3 -- Python interface to Amazon Web Services.

boto: A Python interface to Amazon Web Services (boto v2.38.0)


Downloading files from S3 recursively using boto in Python

I have a bucket in S3 with a deep directory structure, and I wish I could download all the files at once. My files look like this:

foo/bar/1 ...
foo/bar/100 ...

Is there any way to download these files recursively from the S3 bucket using the boto library in Python?

Thanks in advance.
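
A minimal sketch of one way to do this with boto 2.x, assuming credentials come from the environment or boto config and using a hypothetical bucket name and prefix: bucket.list() walks every key under the prefix, and the local directory tree is recreated before each download.

import os
import boto

conn = boto.connect_s3()                     # credentials from env / boto config
bucket = conn.get_bucket('my-bucket')        # hypothetical bucket name

for key in bucket.list(prefix='foo/'):       # iterates every key under foo/
    if key.name.endswith('/'):
        continue                             # skip zero-byte "directory" placeholder keys
    local_path = os.path.join('downloads', key.name)
    local_dir = os.path.dirname(local_path)
    if local_dir and not os.path.isdir(local_dir):
        os.makedirs(local_dir)               # mirror the key's "directory" structure locally
    key.get_contents_to_filename(local_path)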


Source: (StackOverflow)

How do I test a module that depends on boto and an Amazon AWS service?

I'm writing a very small Python ORM around boto.dynamodb.layer2. I would like to write tests for it, but I don't want the tests to actually communicate with AWS, as this would require complicated setup, credentials, network access, etc.

Since I plan to open source the module, including credentials in the source seems like a bad idea since I will get charged for usage, and including credentials in the environment is a pain.

Coupling my tests to the network seems like a bad idea, as it makes the tests run slower, or may cause tests to fail due to network errors or throttling. My goal is not to test boto's DynamoDB interface, or AWS. I just want to test my own code.

I plan to use unittest2 to write the tests and mock to mock out the parts of boto that hit the network, but I've never done this before, so my questions boil down to these:

  1. Am I going about this the right way?
  2. Has anyone else done this?
  3. Are there any particular points in the boto.dynamodb interface that would be best to mock out?
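
For what it's worth, a minimal sketch of the mock-based approach described above, assuming a hypothetical module myorm whose code calls boto.connect_dynamodb() at runtime; nothing here touches the network.

import unittest2
import mock


class TestMyORM(unittest2.TestCase):

    @mock.patch('boto.connect_dynamodb')
    def test_save_writes_item(self, mock_connect):
        # The patched connect function returns a Mock standing in for Layer2.
        fake_layer2 = mock_connect.return_value
        fake_table = fake_layer2.get_table.return_value

        import myorm                       # hypothetical module under test
        obj = myorm.Model(hash_key='abc')  # hypothetical ORM model
        obj.save()

        # Assert against the boundary with boto rather than against AWS itself.
        fake_layer2.get_table.assert_called_with('my-table')
        self.assertTrue(fake_table.new_item.called)


if __name__ == '__main__':
    unittest2.main()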

Source: (StackOverflow)

Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I am using an S3 storage backend across a Django site I am developing, both to reduce load on the EC2 server(s) and to allow multiple webservers (for redundancy and load balancing) to access the same set of uploaded media.

Sorl.thumbnail (v11) template tags are being used in our templates to allow flexible image resizing/cropping.

Performance on media-rich pages is not very good, and when a page whose thumbnails need to be generated for the first time is accessed, the requests even time out.

I understand that this is due to sorl.thumbnail checking/downloading the original image from S3 (which can be quite large and high resolution) and then rendering/checking/uploading the thumbnail.

What would you suggest is the best solution to this setup?

I have seen suggestions of storing a local copy of files in addition to the S3 copy (not so great when a couple of servers are being used for load balancing). I've also seen it suggested to store 0-byte files to fool sorl.thumbnail.

Are there any other suggestions or better ways of approaching this?
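
One mitigation that is sometimes suggested, sketched here under the assumption that sorl-thumbnail's get_thumbnail() is available and that pre-generating sizes outside the request/response cycle is acceptable: warm the thumbnails in a post_save handler (or a background task) so the S3 round trip happens at upload time rather than on the first page view. The Photo model and geometries below are hypothetical.

from django.db.models.signals import post_save
from django.dispatch import receiver
from sorl.thumbnail import get_thumbnail

from myapp.models import Photo  # hypothetical model with an `image` file field


@receiver(post_save, sender=Photo)
def warm_thumbnails(sender, instance, created, **kwargs):
    if not created:
        return
    # Each call downloads the original from S3 once, renders the thumbnail and
    # pushes it back to the storage backend, outside any page-rendering request.
    for geometry in ('102x102', '700x525'):
        get_thumbnail(instance.image, geometry, crop='center', quality=80)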


Source: (StackOverflow)

How to mix Django, Uploadify, and S3Boto Storage Backend?

Background

I'm doing fairly big file uploads on Django. File sizes are generally 10-100 MB.

I'm on Heroku and I've been hitting the request timeout of 30 seconds.

The Beginning

In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.

Amazon documents this by showing you how to write an HTML form to perform the upload.

Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.

This part is working fine! Hooray!

The Problem

The problem is in tying that data back to my Django model in a sane way. Right now the data comes back as a simple URL string, pointing to the file's location.

However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.

To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.

My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like I am, Django only receives the file name and URL, not the binary data, so I can't properly populate the FieldFile.

Cry For Help

Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?

Otherwise, what's the best way to manage an S3 file based only on its URL, including setting expiration, key ID, etc.?

Many thanks!
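
A minimal sketch of one workaround that may frame answers, assuming the model's FileField already uses S3BotoStorage and that Uploadify's callback posts back the S3 key path (the model, field and parameter names are hypothetical): since the bytes are already in the bucket, it may be enough to assign the storage-relative key to the FileField's name and save, with no content ever passing through Django.

from django.http import HttpResponse

from myapp.models import Video  # hypothetical model with `file = models.FileField(...)`


def uploadify_complete(request):
    # Uploadify is assumed to post back the S3 key of the uploaded object,
    # e.g. "uploads/2013/movie.mp4" (hypothetical parameter name).
    s3_key_path = request.POST['key']

    video = Video(owner=request.user)
    # The file already lives in the bucket, so just point the FileField at it;
    # S3BotoStorage resolves this name against the bucket on later access.
    video.file.name = s3_key_path
    video.save()
    return HttpResponse('ok')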


Source: (StackOverflow)

Upload image available at public URL to S3 using boto

I'm working in a Python web environment and I can simply upload a file from the filesystem to S3 using boto's key.set_contents_from_filename(path/to/file). However, I'd like to upload an image that is already on the web (say https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg:large).

Should I somehow download the image to the filesystem, and then upload it to S3 using boto as usual, then delete the image?

Ideally there would be a way to get boto's key.set_contents_from_file, or some other command, to accept a URL and neatly stream the image to S3 without having to explicitly download a file copy to my server.

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = "test"
        k.set_contents_from_file(url)
        k.make_public()
        return "Success?"
    except Exception, e:
        return e

Using set_contents_from_file, as above, I get a "string object has no attribute 'tell'" error. Using set_contents_from_filename with the URL, I get a "No such file or directory" error. The boto storage documentation leaves off at uploading local files and does not mention uploading files stored remotely.
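
A minimal sketch of one workaround, assuming Python 2 and boto 2.x: buffer the remote image in memory and hand boto a file-like object (set_contents_from_file needs something seekable with a tell() method, which is why passing the URL string fails). This still downloads the bytes, but only into memory, never onto disk.

import urllib2
from cStringIO import StringIO

import boto
from boto.s3.key import Key


def upload_from_url(url, bucket_name, key_name, access_key, secret_key):
    data = StringIO(urllib2.urlopen(url).read())  # whole image buffered in memory
    conn = boto.connect_s3(access_key, secret_key)
    bucket = conn.get_bucket(bucket_name)
    k = Key(bucket)
    k.key = key_name
    k.set_contents_from_file(data)                # works: StringIO supports tell()/seek()
    k.make_public()
    return k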


Source: (StackOverflow)

How can I copy files bigger than 5 GB in Amazon S3?

The Amazon S3 REST API documentation says there's a size limit of 5 GB for an upload in a single PUT operation. Files bigger than that have to be uploaded using the multipart API. Fine.

However, what I need in essence is to rename files that might be bigger than that. As far as I know there's no rename or move operation, so I have to copy the file to the new location and delete the old one. How exactly is that done with files bigger than 5 GB? Do I have to do a multipart upload from the bucket to itself? In that case, how does splitting the file into parts work?

From reading boto's source, it doesn't seem to do anything like this automatically for files bigger than 5 GB. Is there any built-in support that I missed?
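
A minimal sketch of a server-side multipart copy with boto 2.x, assuming the bucket and key names below (hypothetical) and a boto version that provides MultiPartUpload.copy_part_from_key. The byte range for each part is inclusive, and the part size is only illustrative.

import math
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')           # hypothetical bucket
src = bucket.get_key('path/old-name')           # hypothetical source key; HEAD fills in src.size
part_size = 1024 ** 3                           # 1 GiB per copied part (illustrative)

mp = bucket.initiate_multipart_upload('path/new-name')
part_count = int(math.ceil(float(src.size) / part_size))
for i in range(part_count):
    start = i * part_size
    end = min(start + part_size, src.size) - 1  # inclusive end byte of this part
    mp.copy_part_from_key(bucket.name, src.name, i + 1, start, end)
mp.complete_upload()

# The "rename" finishes by deleting the original object.
bucket.delete_key(src.name)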


Source: (StackOverflow)

git aws.push: No module named boto

I'm trying to follow the tutorial: deploy Django on AWS Elastic Beanstalk.

When I'm doing Step 6's substep 5:

git aws.push

I get an ImportError message:

(tryhasinenv)Lee-Jamess-MacBook-Pro:tryhasin h0925473$ git aws.push
Traceback (most recent call last):
  File ".git/AWSDevTools/aws.elasticbeanstalk.push", line 21, in <module>
    from aws.dev_tools import * 
  File "/Users/h0925473/tryhasin_root/tryhasin/.git/AWSDevTools/aws/dev_tools.py", line 5, in <module>
    import boto
ImportError: No module named boto

I have no idea what to do. Can somebody tell me what's wrong?


Source: (StackOverflow)

How to store data in GCS while accessing it from GAE and 'GCE' locally

There's a GAE project using GCS to store/retrieve files. These files also need to be read by code that will run on GCE (it needs C++ libraries, so it can't run on GAE).

In production, deployed on the actual GAE > GCS < GCE, this setup works fine. However, testing and developing locally is a different story that I'm trying to figure out.

As recommended, I'm running GAE's dev_appserver with GoogleAppEngineCloudStorageClient to access the (simulated) GCS. Files are put in the local blobstore. Great for testing GAE.

Since there is no GCE SDK to run a VM locally, whenever I refer to the local 'GCE' it's just my local development machine running Linux. On the local GCE side I'm just using the default boto library (https://developers.google.com/storage/docs/gspythonlibrary) with a Python 2.x runtime to interface with the C++ code and retrieve files from GCS. However, in development, these files are inaccessible from boto because they're stored in dev_appserver's blobstore.

Is there a way to properly connect the local GAE and GCE to a local GCS?

For now, I gave up on the local GCS part and tried using the real GCS. The GCE part with boto is easy. The GAE side can also be made to use the real GCS instead of the local blobstore by setting an access token:

cloudstorage.common.set_access_token(access_token)

According to the docs:

access_token: you can get one by run 'gsutil -d ls' and copy the
  str after 'Bearer'.

That token works for a limited amount of time, so that's not ideal. Is there a way to set a more permanent access_token?


Source: (StackOverflow)

boto issue with IAM role

I'm trying to use AWS' recently announced "IAM roles for EC2" feature, which lets security credentials automatically get delivered to EC2 instances. (see http://aws.amazon.com/about-aws/whats-new/2012/06/11/Announcing-IAM-Roles-for-EC2-instances/).

I've set up an instance with an IAM role as described. I can also get (seemingly) proper access key / credentials with curl.

However, boto fails to do a simple call like "get_all_buckets", even though I've turned on ALL S3 permissions for the role.

The error I get is "The AWS Access Key Id you provided does not exist in our records"

However, the access key listed in the error matches the one I get from curl.

Here is the failing script, run on an EC2 instance with an IAM role attached that gives all S3 permissions:

import urllib2
import ast
from boto.s3.connection import S3Connection

resp=urllib2.urlopen('http://169.254.169.254/latest/meta-data/iam/security-credentials/DatabaseApp').read()
resp=ast.literal_eval(resp)
print "access:" + resp['AccessKeyId']
print "secret:" + resp['SecretAccessKey']
conn = S3Connection(resp['AccessKeyId'], resp['SecretAccessKey'])
rs= conn.get_all_buckets()
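
For reference, a minimal sketch of what appears to be missing, assuming a boto 2.x release whose S3Connection accepts a security_token argument: the temporary credentials served by the instance metadata endpoint include a session token ('Token' in the JSON) that has to accompany the access key and secret, otherwise S3 reports the key as unknown. Newer boto releases are also reported to pick up IAM role credentials automatically when S3Connection() is constructed with no explicit keys.

conn = S3Connection(resp['AccessKeyId'],
                    resp['SecretAccessKey'],
                    security_token=resp['Token'])  # session token from the metadata JSON
rs = conn.get_all_buckets()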

Source: (StackOverflow)

Using the Amazon S3 boto library, how can I get the URL of a saved key?

I am saving a key to a bucket with:

    key = bucket.new_key(fileName)
    key.set_contents_from_string(base64.b64decode(data))
    key.set_metadata('Content-Type', 'image/jpeg')
    key.set_acl('public-read')

After the save is successful, how can I access the URL of the newly created file?
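
A minimal sketch, assuming boto 2.x: because the key was made public-read, an unsigned URL can be produced with generate_url by turning off query-string authentication (leave query_auth at its default to get a signed, expiring URL instead).

# With query_auth=False no Signature/Expires query string is appended.
url = key.generate_url(expires_in=0, query_auth=False)
# Typically of the form https://<bucket>.s3.amazonaws.com/<fileName>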


Source: (StackOverflow)

Using django-storages and the s3boto backend, how do I add caching info to the headers for an image so the browser will cache it?

I am using the s3boto backend, not the s3 backend.

The django-storages docs say to specify the AWS_HEADERS variable in your settings.py file:

AWS_HEADERS (optional)

If you’d like to set headers sent with each file of the storage:

# see http://developer.yahoo.com/performance/rules.html#expires
AWS_HEADERS = {
    'Expires': 'Thu, 15 Apr 2010 20:00:00 GMT',
    'Cache-Control': 'max-age=86400',
}

This is not working for me.

Here is my model:

class Photo(models.Model):
    """
        docstring for Photo
        represents a single photo.. a photo can have many things associated to it like
        a project, a portfolio, etc...
    """

    def image_upload_to(instance, filename):
        today = datetime.datetime.today()
        return 'user_uploads/%s/%s/%s/%s/%s/%s/original/%s' % (instance.owner.username, today.year, today.month, today.day, today.hour, today.minute, filename)

    def thumb_upload_to(instance, filename):
        today = datetime.datetime.today()
        return 'user_uploads/%s/%s/%s/%s/%s/%s/thumb/%s' % (instance.owner.username, today.year, today.month, today.day, today.hour, today.minute, filename)

    def medium_upload_to(instance, filename):
        today = datetime.datetime.today()
        return 'user_uploads/%s/%s/%s/%s/%s/%s/medium/%s' % (instance.owner.username, today.year, today.month, today.day, today.hour, today.minute, filename)



    owner = models.ForeignKey(User)
    # take out soon
    projects = models.ManyToManyField('Project', through='Connection', blank=True)
    image = models.ImageField(upload_to=image_upload_to)
    thumb = ThumbnailerImageField(upload_to=thumb_upload_to, resize_source=dict(size=(102,102), crop='center'),)
    medium = ThumbnailerImageField(upload_to=medium_upload_to, resize_source=dict(size=(700,525),))
    title = models.CharField(blank=True, max_length=300)
    caption = models.TextField(blank=True)
    can_view_full_res = models.BooleanField(default=False)
    is_portfolio = models.BooleanField(default=False)
    created_time = models.DateTimeField(blank=False, auto_now_add=True)
    disabled = models.DateTimeField(blank=True, null=True, auto_now_add=False)
    cost = models.FloatField(default=0)
    rating = models.IntegerField(default=0)
    mature_content = models.BooleanField(default=False)
    objects = ViewableManager()

    def get_absolute_url(self):
        return "/m/photo/%i/" % self.pk

    def get_prev_by_time(self):
        get_prev = Photo.objects.order_by('-created_time').filter(created_time__lt=self.created_time)
        try:
            return get_prev[0]
        except IndexError:
            return None

    def get_next_by_time(self):
        get_next = Photo.objects.order_by('created_time').filter(created_time__gt=self.created_time)
        try:
            return get_next[0]
        except IndexError:
            return None

    def __unicode__(self):
        return(self.title)

This is what's in my template where I display the image:

<img class='shadow' src='{{ object.medium.url }}'>

Here are the request and response headers:

Request URL:https://MY_UPLOAD_CONTAINER.s3.amazonaws.com/user_uploads/travismillward/2012/3/23/3/0/medium/_0677866898.jpg?Signature=s%2ByKsWDxrDJbyeVHd%2BDS3JlByts%3D&Expires=1332529522&AWSAccessKeyId=MY_ACCESS_KEYID
Request Method:GET
Status Code:200 OK
Request Headers
GET /user_uploads/travismillward/2012/3/23/3/0/medium/_0677866898.jpg?Signature=s%2ByKsWDxrDJbyeVHd%2BDS3JlByts%3D&Expires=1332529522&AWSAccessKeyId=MY_ACCESS_KEYID HTTP/1.1
Host: MY_UPLOAD_CONTAINER.s3.amazonaws.com
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.83 Safari/535.11
Accept: */*
Referer: http://localhost:8000/m/photo/1/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Query String Parameters
Signature:s+yKsWDxrDJbyeVHd+DS3JlByts=
Expires:1332529522
AWSAccessKeyId:MY_ACCESS_KEYID
Response Headers
HTTP/1.1 200 OK
x-amz-id-2: wOWRRDi5TItAdiYSPf8X4z4I4v5/Zu8XLhwlxmZa8w8w1Jph8WQkenihVJI/ZKnV
x-amz-request-id: THE_X_AMZ_REQUEST_ID
Date: Fri, 23 Mar 2012 18:05:24 GMT
Cache-Control: max-age=86400
Last-Modified: Fri, 23 Mar 2012 09:00:13 GMT
ETag: "6e34e718a349e0bf9e4aefc1afad3d7d"
Accept-Ranges: bytes
Content-Type: image/jpeg
Content-Length: 91600
Server: AmazonS3

When I paste the path to the image into the address bar it WILL cache the image and give me a 304... Here are those request and response headers:

Request URL:https://MY_UPLOAD_CONTAINER.s3.amazonaws.com/user_uploads/travismillward/2012/3/23/3/0/medium/_0677866898.jpg?Signature=evsDZiw3QGsjPacG4CHn6Ji2dDA%3D&Expires=1332528782&AWSAccessKeyId=MY_ACCESS_KEYID
Request Method:GET
Status Code:304 Not Modified
Request Headers
GET /user_uploads/travismillward/2012/3/23/3/0/medium/_0677866898.jpg?Signature=evsDZiw3QGsjPacG4CHn6Ji2dDA%3D&Expires=1332528782&AWSAccessKeyId=MY_ACCESS_KEYID HTTP/1.1
Host: MY_UPLOAD_CONTAINER.s3.amazonaws.com
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.83 Safari/535.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://localhost:8000/m/photo/1/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
If-None-Match: "6e34e718a349e0bf9e4aefc1afad3d7d"
If-Modified-Since: Fri, 23 Mar 2012 09:00:13 GMT
Query String Parameters
Signature:evsDZiw3QGsjPacG4CHn6Ji2dDA=
Expires:1332528782
AWSAccessKeyId:MY_ACCESS_KEYID
Response Headers
HTTP/1.1 304 Not Modified
x-amz-id-2: LfdHa10SdWnx4UH1rc62NfUDeiNVGRzBX73CR+6GDrXJgv9zo+vyQ9A3HCr1YLVa
x-amz-request-id: THE_X_AMZ_REQUEST_ID
Date: Fri, 23 Mar 2012 18:01:16 GMT
Last-Modified: Fri, 23 Mar 2012 09:00:13 GMT
ETag: "6e34e718a349e0bf9e4aefc1afad3d7d"
Server: AmazonS3

Source: (StackOverflow)

How to generate a temporary url to upload file to Amazon S3 with boto library?

I know how to download a file this way: key.generate_url(3600).

But when I tried to upload with key.generate_url(3600, method='PUT'), the URL didn't work. I was told: "The request signature we calculated does not match the signature you provided. Check your key and signing method."

I cannot find example code on the boto homepage for how to use generate_url(method='PUT'). Does anyone here know how to use it for uploading? How do I set the parameters for the path of the file to upload?
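
A minimal sketch with boto 2.x, using placeholder bucket and key names. A common cause of the SignatureDoesNotMatch error is a mismatch between the headers included when signing and the headers the client actually sends, so any Content-Type passed here must be sent byte-for-byte on the PUT.

import boto

conn = boto.connect_s3()                  # credentials from env / boto config
put_url = conn.generate_url(
    3600,                                 # URL valid for one hour
    'PUT',
    bucket='my-bucket',                   # hypothetical bucket
    key='uploads/test.png',               # the S3 path the client will upload to
    headers={'Content-Type': 'image/png'},
)
print put_url
# The client must then send the same header, e.g.:
#   curl -X PUT -H "Content-Type: image/png" --upload-file test.png "<put_url>"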


Source: (StackOverflow)

Upload resized image to S3

I'm trying to upload resized image to S3:

import urllib
import cStringIO

from PIL import Image
from boto.s3.connection import S3Connection
from boto.s3.key import Key

fp = urllib.urlopen('http://example.com/test.png')
img = cStringIO.StringIO(fp.read())

im = Image.open(img)
im2 = im.resize((500, 100), Image.NEAREST)  
AK = 'xx' # Access Key ID 
SK = 'xx' # Secret Access Key

conn = S3Connection(AK,SK) 
b = conn.get_bucket('example')
k = Key(b)
k.key = 'example.png'
k.set_contents_from_filename(im2)

but I get an error:

 in set_contents_from_filename
    fp = open(filename, 'rb')
TypeError: coercing to Unicode: need string or buffer, instance found
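
A minimal sketch of the likely fix, continuing from the snippet above and assuming PIL and boto 2.x: set_contents_from_filename() expects a path on disk, so the in-memory PIL image has to be serialized into a buffer first and passed to set_contents_from_file() instead.

out = cStringIO.StringIO()
im2.save(out, format='PNG')   # serialize the resized image into the buffer
out.seek(0)                   # rewind so boto reads from the beginning

k = Key(b)
k.key = 'example.png'
k.set_contents_from_file(out, headers={'Content-Type': 'image/png'})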

Source: (StackOverflow)

How do I get the file / key size in boto S3?

There must be an easy way to get the file size (key size) without pulling over the whole file. I can see it in the Properties of the AWS S3 browser, and I think I can get it off the "Content-Length" header of a HEAD request, but I'm not connecting the dots on how to do this with boto. Extra kudos if you post a link to some more comprehensive examples than are in the standard boto docs.

EDIT: So the following seems to do the trick (though from looking at the source code I'm not completely sure):

bk = conn.get_bucket('my_bucket_name')
ky = boto.s3.key.Key(bk)
ky.open_read()  ## This sends a GET request. 
print ky.size

For now I'll leave the question open for comments, better solutions, or pointers to examples.
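
For comparison, a minimal sketch of the HEAD-based route, assuming boto 2.x: Bucket.get_key() (or lookup()) issues a HEAD request and populates the key's metadata, including its size, without fetching the body.

bk = conn.get_bucket('my_bucket_name')
ky = bk.lookup('path/to/my/key')  # hypothetical key name; returns None if it doesn't exist
if ky is not None:
    print ky.size                 # bytes, taken from the Content-Length header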


Source: (StackOverflow)

Boto - Uploading file to a specific location on Amazon S3

This is the code I'm working from:

import sys
import boto
import boto.s3

# AWS ACCESS DETAILS
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

bucket_name = AWS_ACCESS_KEY_ID.lower() + '-mah-bucket'
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.create_bucket(bucket_name, location=boto.s3.connection.Location.DEFAULT)
uploadfile = sys.argv[1]

print 'Uploading %s to Amazon S3 bucket %s' % \
       (uploadfile, bucket_name)

def percent_cb(complete, total):
    sys.stdout.write('.')
    sys.stdout.flush()

from boto.s3.key import Key
k = Key(bucket)
k.key = 'my test file'
k.set_contents_from_filename(uploadfile, cb=percent_cb, num_cb=10)

On my S3 I have created "directories", like this "bucket/images/holiday". I know these are only virtual directories.

My question is, how can I modify this to upload specifically to the bucket/images/holiday virtual directory on S3 rather than the bucket root?
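
A minimal sketch of the change, reusing the connection, bucket and callback from the snippet above: S3 has no real directories, so the "folder" is simply a prefix on the key name, and setting the key to images/holiday/<filename> places the object in that virtual directory.

import os
from boto.s3.key import Key

k = Key(bucket)
k.key = 'images/holiday/' + os.path.basename(uploadfile)
k.set_contents_from_filename(uploadfile, cb=percent_cb, num_cb=10)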


Source: (StackOverflow)