[Blueprint servercloud-p-move-ec2-mirrors-to-s3] Move EC2 mirrors to S3

Ben Howard ben.howard at canonical.com
Fri Oct 28 17:07:50 UTC 2011


Blueprint changed by Ben Howard:

Whiteboard changed:
  Rationale:
  S3 is a faster, more scalable technology that allows us to reduce costs and increase availability for EC2 users.
  
  Assumption:
    * Running on-EC2 mirrors is expensive and presents availability challenges.
    * on-EC2 mirrors have limited bandwidth, so speeds can be affected by load
    * S3 has very high availability
    * S3 intra-region bandwidth is free
    * on-EC2 mirrors result in bandwidth charges when users use a mirror outside their availability zone
    * S3 is extremely fast; access speeds are generally near-native
    * Amazon has asked us to host our mirrors on S3
    * Uploads to S3 are free
  
- A  prototyped-S3 mirror will be shown at UDS.
+ A prototyped-S3 mirror will be shown at UDS.
+ 
+ ---
+ 
+ Hurdles: 
+ https://bugs.launchpad.net/ubuntu/+source/apt/+bug/882832 - S3 does not accept "+" characters when fetching files.
+   * May need to change the APT metadata to remove "+" characters
+   * May need to maintain separate metadata for S3 buckets and re-sign it
+ 
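  For illustration, one client-side workaround (not necessarily the one
  that will be chosen) is to percent-encode "+" as %2B before fetching a
  key; the package path below is hypothetical:

```python
from urllib.parse import quote

# APT requests object keys containing literal "+" characters, e.g. for
# packages like g++.  If S3 decodes "+" as a space, the fetch fails
# unless the client percent-encodes the key first.
key = "pool/main/g/gcc-defaults/g++_4.6.1_amd64.deb"  # hypothetical path
encoded = quote(key, safe="/")
print(encoded)
# pool/main/g/gcc-defaults/g%2B%2B_4.6.1_amd64.deb
```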
  ----
  
  Q. 'LAN' bandwidth access to the region mirrors is currently free already, no? -- Daviey
  A. All upload and intra-zone transit is free.
  
  Q. How do you keep people outside of EC2 from accessing the mirrors?
  A. Bucket policies enable restricting access to the Amazon CIDR address.
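  A minimal sketch of such a bucket policy, built as a Python dict. The
  bucket name and CIDR block are placeholders; a real policy would list
  Amazon's published EC2 address ranges for the region:

```python
import json

# Allow anonymous GETs on the mirror bucket, but only from the given
# CIDR block (placeholder 10.0.0.0/8; substitute the EC2 ranges).
policy = {
    "Version": "2008-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::us-east-1.ec2.archive.ubuntu.com/*",
        "Condition": {"IpAddress": {"aws:SourceIp": "10.0.0.0/8"}},
    }],
}
print(json.dumps(policy, indent=2))
```

  The JSON document printed above is what would be attached to the
  bucket (e.g. via boto's Bucket.set_policy).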
  
  Comments:
  - Can this whiteboard be pre-filed with some examples of how others have implemented this?
     *  AFAIK, there is only one implementation of S3-backed mirrors. Most examples that I know of use S3 as a storage backend while having an EC2 instance front the S3 storage. The Amazon Linux AMI is the only pure S3 backend/frontend solution.
  
  - How would this work?
    * There are a number of ways to push out a repository using existing tools. s3fuse (albeit rather buggy) allows you to mount an S3 bucket as a file system, and s3cmd allows for local-to-remote synchronization. However, in order to build something that scales well, you need something that uses multiple connections and pushes multiple files at once (i.e. threading and the boto Python library).
    * Example synchronization code can be found at lp:~utlemming/+junk/s3repo_tool (this code will be used to populate the prototyped-S3 mirror for UDS).
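  A threaded push along those lines might look like the sketch below.
  The `upload` callable is a stand-in for a boto call (e.g. uploading a
  key's contents); here it is a plain function so the threading pattern
  stands on its own:

```python
import threading
import queue

def sync_files(paths, upload, workers=8):
    """Push many files concurrently; `upload` is called once per path.

    In a real mirror push, `upload` would wrap a boto S3 upload.  A
    shared queue feeds a fixed pool of worker threads so multiple
    connections are in flight at once.
    """
    q = queue.Queue()
    for path in paths:
        q.put(path)

    def worker():
        while True:
            try:
                path = q.get_nowait()
            except queue.Empty:
                return  # queue drained; worker exits
            upload(path)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

uploaded = []
sync_files(["a.deb", "b.deb", "c.deb"], uploaded.append)
print(sorted(uploaded))  # ['a.deb', 'b.deb', 'c.deb']
```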
  
  --
  
  Examples:
  Amazon Linux AMI currently uses S3 for backing its mirrors. The design is more or less:
      - One bucket per region.
      - Buckets are named after the DNS CNAME, i.e. "us-east-1.ec2.archive.ubuntu.com"
      - A method of pushing pristine repositories up.
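  Under that naming scheme, the region-to-bucket mapping is purely
  mechanical; a minimal sketch (the region list is an illustrative
  subset, not the full set of EC2 regions):

```python
# Derive bucket names from the DNS CNAME convention, one per region.
regions = ["us-east-1", "us-west-1", "eu-west-1"]  # illustrative subset
buckets = {region: "%s.ec2.archive.ubuntu.com" % region for region in regions}
print(buckets["us-east-1"])  # us-east-1.ec2.archive.ubuntu.com
```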

-- 
Move EC2 mirrors to S3
https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-move-ec2-mirrors-to-s3


