| 1 | #+latex_header: \documentclass[12pt]{article} |
| 2 | #+latex_header: \usepackage[margin=1in]{geometry} |
| 3 | #+OPTIONS: ^:nil |
| 4 | |
| 5 | GNU MediaGoblin |
| 6 | |
| 7 | * About |
| 8 | |
| 9 | What is MediaGoblin? I'm shooting for: |
| 10 | |
| 11 | - Initially, a place to store all your photos that's as awesome as, |
| 12 | or more awesome than, existing proprietary solutions |
| 13 | - Later, a place for hosting all sorts of media, such as video, |
| 14 | music, etc. |
| 15 | - Federated, like statusnet/ostatus (we should use ostatus, in fact!) |
| 16 | - Customizable |
| 17 | - A place for people to collaborate and show off original and derived |
| 18 | creations |
| 19 | - Free, as in freedom. Under the GNU AGPL, v3 or later. Encourages |
| 20 | free formats and free licensing for content, too. |
| 21 | |
| 22 | Wow! That's pretty ambitious. Hopefully we're cool enough to do it. |
| 23 | I think we can. |
| 24 | |
| 25 | It's also necessary, for multiple reasons. Centralization and |
| 26 | proprietization of media on the internet is a serious problem and |
| 27 | makes the web go from a system of extreme resilience to a system |
| 28 | of frightening fragility. People should be able to own their data. |
| 29 | Etc. If you're reading this, chances are you already agree though. :) |
| 30 | |
| 31 | * Milestones |
| 32 | |
| 33 | Excepting the first, not necessarily in this order. |
| 34 | |
| 35 | ** Basic image hosting |
| 36 | ** Multi-media hosting (including video and audio) |
| 37 | ** API(s) |
| 38 | ** Federation |
| 39 | |
| 40 | Maybe this is 0.2 :) |
| 41 | |
| 42 | ** Plugin system |
| 43 | |
| 44 | * Technology |
| 45 | |
| 46 | I have a pretty specific set of tools that I expect to use in this |
| 47 | project. Those are: |
| 48 | |
| 49 | - *[[http://python.org/][Python]]:* because I love, and know well, the language |
| 50 | - *[[http://www.mongodb.org/][MongoDB]]:* a "document database". Because it's extremely flexible |
| 51 | (and scales up well, but I guess not down well) |
| 52 | - *[[http://namlook.github.com/mongokit/][MongoKit]]:* a lightweight ORM for mongodb. Helps us define our |
| 53 | structures better, does schema validation, schema evolution, and |
| 54 | helps make things more fun and pythonic. |
| 55 | - *[[http://jinja.pocoo.org/docs/][Jinja2]]:* for templating. Pretty much django templates++ (wow, I |
| 56 | can actually pass arguments into method calls instead of tediously |
| 57 | writing custom tags!) |
| 58 | - *[[http://wtforms.simplecodes.com/][WTForms]]:* for form handling, validation, abstraction. Almost just |
| 59 | like Django's forms. |
| 60 | - *[[http://pythonpaste.org/webob/][WebOb]]:* gives nice request/response objects (also somewhat djangoish) |
| 61 | - *[[http://pythonpaste.org/deploy/][Paste Deploy]] and [[http://pythonpaste.org/script/][Paste Script]]:* as the default way of configuring |
| 62 | and launching the application. Since MediaGoblin will be fairly |
| 63 | wsgi minimalist though, you can probably use other ways to launch |
| 64 | it, though this will be the default. |
| 65 | - *[[http://routes.groovie.org/][Routes]]:* for URL routing. It works well enough. |
| 66 | - *[[http://jquery.com/][jQuery]]:* for all sorts of things on the javascript end of things, |
| 67 | for all sorts of reasons. |
| 68 | - *[[http://beaker.groovie.org/][Beaker]]:* for sessions, because that seems like it's generally |
| 69 | considered the way to go I guess. |
| 70 | - *[[http://somethingaboutorange.com/mrl/projects/nose/1.0.0/][nose]]:* for unit tests, because it makes testing a bit nicer. |
| 71 | - *[[http://celeryproject.org/][Celery]]:* for task queueing (think resizing images, encoding |
| 72 | video) because some people like it, and even the people I know who |
| 73 | don't don't seem to know of anything better :) |
| 74 | - *[[http://www.rabbitmq.com/][RabbitMQ]]:* for sending tasks to celery, because I guess that's |
| 75 | what most people do. Might be optional, might also let people use |
| 76 | MongoDB for this if they want. |
| 77 | |
| 78 | ** Why python |
| 79 | |
| 80 | Because I (Chris Webber) know Python, love Python, am capable of |
| 81 | actually making this thing happen in Python (I've worked on a lot of |
| 82 | large free software web applications before in Python, including |
| 83 | [[http://mirocommunity.org/][Miro Community]], the [[http://miroguide.org][Miro Guide]], a large portion of |
| 84 | [[http://creativecommons.org/][Creative Commons' site]], and a whole bunch of things while working at |
| 85 | [[http://www.imagescape.com/][Imaginary Landscape]]). I know Python, I can make this happen in |
| 86 | Python, me starting a project like this makes sense if it's done in |
| 87 | Python. |
| 88 | |
| 89 | You might say that PHP is way more deployable, that rails has way more |
| 90 | cool developers riding around on fixie bikes, and all of those things |
| 91 | are true, but I know Python, like Python, and think that Python is |
| 92 | pretty great. I do think that deployment in Python is not as good as |
| 93 | with PHP, but I think the days of shared hosting are (thankfully) |
| 94 | coming to an end, and will probably be replaced by cheap virtual |
| 95 | machines spun up on the fly for people who want that sort of stuff, |
| 96 | and Python will be a huge part of that future, maybe even more than |
| 97 | PHP will. The deployment tools are getting better. Maybe we can use |
| 98 | something like Silver Lining. Maybe we can just distribute as .debs |
| 99 | or .rpms. We'll figure it out. |
| 100 | |
| 101 | But if I'm starting this project, which I am, it's gonna be in Python. |
| 102 | |
| 103 | ** Why mongodb |
| 104 | |
| 105 | In case you were wondering, I am not a NoSQL fanboy, and I do not go |
| 106 | around telling people that MongoDB is web scale. Actually my reason |
| 107 | for choosing MongoDB isn't scalability, though scaling up really nicely is a |
| 108 | pretty good feature and sets us up well in case large volume sites |
| 109 | eventually do use MediaGoblin. But there's another side of |
| 110 | scalability, and that's scaling down, which is important for |
| 111 | federation, maybe even more important than scaling up in an ideal |
| 112 | universe where everyone ran servers out of their own housing. As a |
| 113 | memory-mapped database, MongoDB is pretty hungry, so actually I spent |
| 114 | a lot of time debating whether its inability to scale down as nicely |
| 115 | as SQL does with something like sqlite meant that it was out. |
| 116 | |
| 117 | But I decided in the end that I really want MongoDB, not for |
| 118 | scalability, but for flexibility. Schema evolution pains in SQL are |
| 119 | almost enough reason for me to want MongoDB, but not quite. The real |
| 120 | reason is because I want the ability to eventually handle multiple |
| 121 | media types through MediaGoblin, and also allow for plugins, without |
| 122 | the rigidity of tables making that difficult. In other words, |
| 123 | something like: |
| 124 | |
| 125 | #+BEGIN_SRC javascript |
| 126 | {"title": "Me talking until you are bored", |
| 127 | "description": "blah blah blah", |
| 128 | "media_type": "audio", |
| 129 | "media_data": { |
| 130 | "length": "2:30", |
| 131 | "codec": "OGG Vorbis"}, |
| 132 | "plugin_data": { |
| 133 | "licensing": { |
| 134 | "license": "http://creativecommons.org/licenses/by-sa/3.0/"}}} |
| 135 | #+END_SRC |
| 136 | |
| 137 | Being able to just dump media-specific information in a media_data |
| 138 | hashtable is pretty great, and even better is having a plugin system |
| 139 | where you can just let plugins have their own entire key-value space |
| 140 | cleanly inside the document that doesn't interfere with anyone else's |
| 141 | stuff. If we were to let plugins deposit their own information |
| 142 | inside an SQL database, either we'd let plugins create their own tables |
| 143 | which makes SQL migrations even harder than they already are, or we'd |
| 144 | probably end up creating a table with a column for key, a column for |
| 145 | value, and a column for type in one huge table called "plugin_data" or |
| 146 | something similar. (Yo dawg, I heard you liked plugins, so I put a |
| 147 | database in your database so you can query while you query.) Gross. |
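
Here's a minimal sketch of that per-plugin key-value space, assuming plain Python dicts stand in for MongoDB documents. The helper names (set_plugin_data, get_plugin_data) and the "tagging" plugin are made up for illustration and aren't part of any existing MediaGoblin code.

```python
# Hypothetical helpers; plain dicts stand in for MongoDB documents.

def set_plugin_data(document, plugin_name, key, value):
    """Store a value under the plugin's own namespace in the document."""
    namespace = document.setdefault("plugin_data", {}).setdefault(plugin_name, {})
    namespace[key] = value
    return document


def get_plugin_data(document, plugin_name, key, default=None):
    """Read a value from the plugin's namespace, or return the default."""
    return document.get("plugin_data", {}).get(plugin_name, {}).get(key, default)


media_entry = {
    "title": "Me talking until you are bored",
    "media_type": "audio",
    "media_data": {"length": "2:30", "codec": "OGG Vorbis"},
}

# Each plugin only ever touches its own sub-dictionary.
set_plugin_data(media_entry, "licensing", "license",
                "http://creativecommons.org/licenses/by-sa/3.0/")
set_plugin_data(media_entry, "tagging", "tags", ["talk", "boring"])
```

Since every plugin writes under its own name, two plugins can both use a key like "license" or "tags" without stepping on each other, and nobody has to add a table or a column.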
| 148 | |
| 149 | I also don't want things to be so loose that we forget or lose the |
| 150 | structure of things, and that's one reason why I want to use MongoKit, |
| 151 | because we can cleanly define as much structure as we want and verify |
| 152 | that documents match that structure generally without adding too much |
| 153 | bloat or overhead (mongokit is a pretty lightweight wrapper and |
| 154 | doesn't inject extra mongokit-specific stuff into the database, which |
| 155 | is nice and nicer than many other ORMs in that way). |
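
To give a rough idea of what "define as much structure as we want and verify it" could look like, here's a hand-rolled sketch in plain Python. This is not MongoKit's API (MongoKit has its own way of declaring structure); the names MEDIA_ENTRY_STRUCTURE, REQUIRED_FIELDS, and validate are assumptions for illustration only.

```python
# Hand-rolled illustration only; MongoKit's real API differs.

MEDIA_ENTRY_STRUCTURE = {
    "title": str,
    "description": str,
    "media_type": str,
    "media_data": dict,   # free-form, media-specific information
    "plugin_data": dict,  # free-form, one key per plugin
}
REQUIRED_FIELDS = ("title", "media_type")


def validate(document, structure=MEDIA_ENTRY_STRUCTURE, required=REQUIRED_FIELDS):
    """Return a list of problems; an empty list means the document is valid."""
    problems = []
    for field in required:
        if field not in document:
            problems.append("missing required field: %s" % field)
    for field, value in document.items():
        if field not in structure:
            problems.append("unknown field: %s" % field)
        elif not isinstance(value, structure[field]):
            problems.append("wrong type for field: %s" % field)
    return problems
```

The top level stays strict (known fields, known types) while media_data and plugin_data stay open, which is the balance I'm after.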
| 156 | |
| 157 | ** Why wsgi minimalism / Why not Django |
| 158 | |
| 159 | As you may notice in the technology list above, I list a lot of components |
| 160 | that are very [[http://www.djangoproject.com/][Django-like]], but not actually Django components. What |
| 161 | can I say, I really like a lot of the ideas in Django! Which leads to |
| 162 | the question: why not just use Django? |
| 163 | |
| 164 | While I really like Django's ideas and a lot of its components, I also |
| 165 | feel that most of the best ideas I want from Django have been |
| 166 | implemented as well or even better outside of Django. I could just |
| 167 | use Django and replace the templating system with Jinja2, and the form |
| 168 | system with wtforms, and the database with MongoDB and MongoKit, but |
| 169 | at that point, how much of Django is really left? |
| 170 | |
| 171 | I also am sometimes saddened and irritated by how coupled all of |
| 172 | Django's components are. Loosely coupled yes, but still coupled. |
| 173 | WSGI has done a good job of providing a base layer for running |
| 174 | applications on and [[http://pythonpaste.org/webob/do-it-yourself.html][if you know how to do it yourself]] it doesn't take much effort or |
| 175 | many lines of code at all to bind them together without any framework |
| 176 | at all (not even say [[http://pylonshq.com/][Pylons]], [[http://docs.pylonsproject.org/projects/pyramid/dev/][Pyramid]], or [[http://flask.pocoo.org/][Flask]] which I think are still |
| 177 | great projects, especially for people who want this sort of thing but |
| 178 | have no idea how to get started). And even at this already really |
| 179 | early stage of writing MediaGoblin, that glue work is mostly done. |
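
For a sense of how little glue is involved, here's a minimal sketch of a framework-free WSGI application using only the standard library. It is not MediaGoblin's actual code: a plain dict stands in for Routes, bare tuples stand in for WebOb's request/response objects, and the names ROUTES, index, not_found, and call_app are hypothetical.

```python
# Standard library only; the route table and view names are hypothetical.

def index(environ):
    return "200 OK", "Welcome to MediaGoblin"


def not_found(environ):
    return "404 Not Found", "Nothing here"


# Maps a URL path to the view function that handles it.
ROUTES = {"/": index}


def application(environ, start_response):
    """A WSGI app: look up a view by path, call it, send its response."""
    view = ROUTES.get(environ.get("PATH_INFO", "/"), not_found)
    status, text = view(environ)
    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Call the app in-process, the way a WSGI server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application({"PATH_INFO": path}, start_response))
    return captured["status"], body.decode("utf-8")
```

Any WSGI server can run the application callable as-is, for example wsgiref.simple_server.make_server from the standard library, which is why the launcher can stay swappable.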
| 180 | |
| 181 | That's not to say I don't think Django is great for a lot of things. For |
| 182 | a lot of stuff, it's still the best, but not for MediaGoblin, I think. |
| 183 | |
| 184 | One thing that Django does super well though is documentation. It |
| 185 | still has some faults, but even with those considered I can hardly |
| 186 | think of any other project in Python that has documentation as nice |
| 187 | as Django's. It may be worth |
| 188 | [[http://pycon.blip.tv/file/4881071/][learning some lessons on documentation from Django]], on that note. |
| 189 | |
| 190 | I'd really like to have a good, thorough hacking-howto and |
| 191 | deployment-howto, with the former including some notes to |
| 192 | make it easier for Django hackers to get started. |