#+latex_header: \documentclass[12pt]{article}
#+latex_header: \usepackage[margin=1in]{geometry}
#+OPTIONS: ^:nil

GNU MediaGoblin

* About

What is MediaGoblin?  I'm shooting for:

- Initially, a place to store all your photos that's as awesome as,
  if not more awesome than, existing proprietary solutions
- Later, a place for hosting all sorts of media, such as video,
  music, etc.
- Federated, like statusnet/ostatus (we should use ostatus, in fact!)
- Customizable
- A place for people to collaborate and show off original and derived
  creations
- Free, as in freedom.  Under the GNU AGPL, v3 or later.  Encourages
  free formats and free licensing for content, too.

Wow!  That's pretty ambitious.  Hopefully we're cool enough to do it.
I think we can.

It's also necessary, for multiple reasons.  Centralization and
proprietization of media on the internet is a serious problem and
makes the web go from a system of extreme resilience to a system
of frightening fragility.  People should be able to own their data.
Etc.  If you're reading this, chances are you already agree though. :)

* Milestones

Excepting the first, not necessarily in this order.

** Basic image hosting
** Multi-media hosting (including video and audio)
** API(s)
** Federation

Maybe this is 0.2 :)

** Plugin system

* Technology

I have a pretty specific set of tools that I expect to use in this
project.  Those are:

- *[[http://python.org/][Python]]:* because I love, and know well, the language
- *[[http://www.mongodb.org/][MongoDB]]:* a "document database".  Because it's extremely flexible
  (and scales up well, but I guess not down well)
- *[[http://namlook.github.com/mongokit/][MongoKit]]:* a lightweight ORM for MongoDB.  Helps us define our
  structures better, does schema validation, schema evolution, and
  helps make things more fun and pythonic.
- *[[http://jinja.pocoo.org/docs/][Jinja2]]:* for templating.  Pretty much django templates++ (wow, I
  can actually pass arguments into method calls instead of tediously
  writing custom tags!)
- *[[http://wtforms.simplecodes.com/][WTForms]]:* for form handling, validation, abstraction.  Almost just
  like Django's forms.
- *[[http://pythonpaste.org/webob/][WebOb]]:* gives nice request/response objects (also somewhat djangoish)
- *[[http://pythonpaste.org/deploy/][Paste Deploy]] and [[http://pythonpaste.org/script/][Paste Script]]:* as the default way of configuring
  and launching the application.  Since MediaGoblin will be fairly
  WSGI minimalist, you can probably launch it in other ways, but
  this will be the default.
- *[[http://routes.groovie.org/][Routes]]:* for URL routing.  It works well enough.
- *[[http://jquery.com/][jQuery]]:* for all sorts of things on the JavaScript end,
  for all sorts of reasons.
- *[[http://beaker.groovie.org/][Beaker]]:* for sessions, because that seems like it's generally
  considered the way to go I guess.
- *[[http://somethingaboutorange.com/mrl/projects/nose/1.0.0/][nose]]:* for unit tests, because it makes testing a bit nicer.
- *[[http://celeryproject.org/][Celery]]:* for task queueing (think resizing images, encoding
  video) because some people like it, and even the people I know who
  don't like it don't seem to know of anything better :)
- *[[http://www.rabbitmq.com/][RabbitMQ]]:* for sending tasks to Celery, because I guess that's
  what most people do.  Might be optional, might also let people use
  MongoDB for this if they want.

** Why Python

Because I (Chris Webber) know Python, love Python, and am capable of
actually making this thing happen in Python (I've worked on a lot of
large free software web applications before in Python, including
[[http://mirocommunity.org/][Miro Community]], the [[http://miroguide.org][Miro Guide]], a large portion of
[[http://creativecommons.org/][Creative Commons' site]], and a whole bunch of things while working at
[[http://www.imagescape.com/][Imaginary Landscape]]).  I know Python, I can make this happen in
Python, and me starting a project like this makes sense if it's done
in Python.

You might say that PHP is way more deployable, that Rails has way more
cool developers riding around on fixie bikes, and all of those things
are true, but I know Python, like Python, and think that Python is
pretty great.  I do think that deployment in Python is not as good as
with PHP, but I think the days of shared hosting are (thankfully)
coming to an end, and will probably be replaced by cheap virtual
machines spun up on the fly for people who want that sort of stuff,
and Python will be a huge part of that future, maybe even more than
PHP will.  The deployment tools are getting better.  Maybe we can use
something like Silver Lining.  Maybe we can just distribute as .debs
or .rpms.  We'll figure it out.

But if I'm starting this project, which I am, it's gonna be in Python.

** Why MongoDB

In case you were wondering, I am not a NoSQL fanboy; I do not go
around telling people that MongoDB is web scale.  Actually, my reason
for choosing MongoDB isn't scalability, though scaling up really
nicely is a pretty good feature and sets us up well in case large
volume sites eventually do use MediaGoblin.  But there's another side
of scalability, and that's scaling down, which is important for
federation, maybe even more important than scaling up in an ideal
universe where everyone ran servers out of their own homes.  As a
memory-mapped database, MongoDB is pretty hungry, so I actually spent
a lot of time debating whether its inability to scale down as nicely
as SQL does with something like SQLite meant that it was out.

But I decided in the end that I really want MongoDB, not for
scalability, but for flexibility.  Schema evolution pains in SQL are
almost enough reason for me to want MongoDB, but not quite.  The real
reason is that I want the ability to eventually handle multiple
media types through MediaGoblin, and also allow for plugins, without
the rigidity of tables making that difficult.  In other words,
something like:

#+BEGIN_SRC javascript
  {"title": "Me talking until you are bored",
   "description": "blah blah blah",
   "media_type": "audio",
   "media_data": {
       "length": "2:30",
       "codec": "OGG Vorbis"},
   "plugin_data": {
       "licensing": {
           "license": "http://creativecommons.org/licenses/by-sa/3.0/"}}}
#+END_SRC

Being able to just dump media-specific information in a media_data
hashtable is pretty great, and even better is having a plugin system
where you can just let plugins have their own entire key-value space
cleanly inside the document that doesn't interfere with anyone else's
stuff.  If we were to let plugins deposit their own information
inside a SQL database, either we'd let plugins create their own tables,
which makes SQL migrations even harder than they already are, or we'd
probably end up creating a table with a column for key, a column for
value, and a column for type in one huge table called "plugin_data" or
something similar.  (Yo dawg, I heard you liked plugins, so I put a
database in your database so you can query while you query.)  Gross.
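
To make the "database in your database" problem concrete, here's a
minimal sketch of that catch-all table using Python's built-in sqlite3
module.  The table layout, the column names, and both helper functions
are hypothetical, made up for illustration; none of this is MediaGoblin
code.  Note how every value gets flattened to text and needs its type
recorded just to be read back, which a nested document gives you for
free.

```python
import sqlite3

# Hypothetical catch-all table: every plugin value becomes one row, and the
# "type" column records how to turn the stored text back into a Python value.
SCHEMA = """
CREATE TABLE plugin_data (
    media_id INTEGER NOT NULL,
    plugin   TEXT NOT NULL,
    key      TEXT NOT NULL,
    value    TEXT NOT NULL,
    type     TEXT NOT NULL,
    PRIMARY KEY (media_id, plugin, key)
)
"""

# Converters needed to undo the "everything is text" storage.
CASTS = {"str": str, "int": int, "float": float}


def set_plugin_value(conn, media_id, plugin, key, value):
    """Store one plugin value as a (key, value, type) row."""
    type_name = type(value).__name__
    if type_name not in CASTS:
        raise TypeError("unsupported type: %s" % type_name)
    conn.execute(
        "INSERT OR REPLACE INTO plugin_data VALUES (?, ?, ?, ?, ?)",
        (media_id, plugin, key, str(value), type_name))


def get_plugin_data(conn, media_id, plugin):
    """Rebuild a plugin's dict for one media entry, row by row."""
    rows = conn.execute(
        "SELECT key, value, type FROM plugin_data"
        " WHERE media_id = ? AND plugin = ?",
        (media_id, plugin))
    return {key: CASTS[type_name](value) for key, value, type_name in rows}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
set_plugin_value(conn, 1, "licensing", "license",
                 "http://creativecommons.org/licenses/by-sa/3.0/")
set_plugin_value(conn, 1, "licensing", "version", 3)
licensing = get_plugin_data(conn, 1, "licensing")
```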

I also don't want things to be so loose that we forget or lose the
structure of things, and that's one reason why I want to use MongoKit,
because we can cleanly define as much structure as we want and verify
that documents match that structure generally without adding too much
bloat or overhead (MongoKit is a pretty lightweight wrapper and
doesn't inject extra MongoKit-specific stuff into the database, which
is nice and nicer than many other ORMs in that way).
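
To show just the idea of "define some structure and verify documents
against it" without depending on MongoKit, here's a minimal sketch in
plain Python.  This is not MongoKit's API: MEDIA_ENTRY_STRUCTURE and
validate are hypothetical names, and the field list is simply borrowed
from the example document above.  A structure maps field names to
types (or to nested structures), and a bare dict type means "anything
goes in here", which is what leaves room for media and plugin data.

```python
# Hypothetical structure: field name -> expected type, or a nested structure.
# A bare `dict` type accepts any keys, leaving room for media-specific
# and plugin-specific data.
MEDIA_ENTRY_STRUCTURE = {
    "title": str,
    "description": str,
    "media_type": str,
    "media_data": dict,
    "plugin_data": dict,
}


def validate(document, structure, path=""):
    """Return a list of problems found; an empty list means the document fits."""
    problems = []
    for field, expected in structure.items():
        where = path + field
        if field not in document:
            problems.append("%s: missing" % where)
        elif isinstance(expected, dict):
            # A dict *instance* in the structure is a nested structure.
            if isinstance(document[field], dict):
                problems.extend(
                    validate(document[field], expected, where + "."))
            else:
                problems.append("%s: expected a nested document" % where)
        elif not isinstance(document[field], expected):
            problems.append("%s: expected %s" % (where, expected.__name__))
    # Keys the structure doesn't know about are flagged too, so a
    # document can't silently drift away from its definition.
    for field in document:
        if field not in structure:
            problems.append("%s: unexpected field" % (path + field))
    return problems


entry = {
    "title": "Me talking until you are bored",
    "description": "blah blah blah",
    "media_type": "audio",
    "media_data": {"length": "2:30", "codec": "OGG Vorbis"},
    "plugin_data": {"licensing": {
        "license": "http://creativecommons.org/licenses/by-sa/3.0/"}},
}
good = validate(entry, MEDIA_ENTRY_STRUCTURE)
bad = validate({"title": 42, "oops": True}, MEDIA_ENTRY_STRUCTURE)
```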

** Why WSGI minimalism / Why not Django

If you notice in the technology list above, I list a lot of components
that are very [[http://www.djangoproject.com/][Django-like]], but not actually Django components.  What
can I say, I really like a lot of the ideas in Django!  Which leads to
the question: why not just use Django?

While I really like Django's ideas and a lot of its components, I also
feel that most of the best ideas in Django that I want have been
implemented as well or even better outside of Django.  I could just
use Django and replace the templating system with Jinja2, and the form
system with WTForms, and the database with MongoDB and MongoKit, but
at that point, how much of Django is really left?

I also am sometimes saddened and irritated by how coupled all of
Django's components are.  Loosely coupled yes, but still coupled.
WSGI has done a good job of providing a base layer for running
applications on, and [[http://pythonpaste.org/webob/do-it-yourself.html][if you know how to do it yourself]] it's not hard,
and doesn't take many lines of code, to bind them together without any
framework at all (not even, say, [[http://pylonshq.com/][Pylons]], [[http://docs.pylonsproject.org/projects/pyramid/dev/][Pyramid]], or [[http://flask.pocoo.org/][Flask]], which I think
are still great projects, especially for people who want this sort of
thing but have no idea how to get started).  And even at this already
really early stage of writing MediaGoblin, that glue work is mostly done.
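
As a rough illustration of how little glue that takes, here's a sketch
of a framework-free WSGI application with a tiny routing table, using
only the standard library.  It deliberately does not use WebOb or
Routes, and the route table, view names, and URLs are all hypothetical;
this is not MediaGoblin's actual code.

```python
from wsgiref.util import setup_testing_defaults


def root_view(environ):
    return "200 OK", "Hello, goblins!"


def media_view(environ):
    return "200 OK", "Media listing goes here"


# Hypothetical routing table: exact path -> view function.
ROUTES = {
    "/": root_view,
    "/media/": media_view,
}


def application(environ, start_response):
    """A complete WSGI app: look up a view by path, call it, respond."""
    view = ROUTES.get(environ.get("PATH_INFO", "/"))
    if view is None:
        status, text = "404 Not Found", "Not found"
    else:
        status, text = view(environ)
    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Call the app in-process, the way a WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")


status, text = call_app("/media/")
```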

That's not to say I don't think Django is great for a lot of things.
For a lot of stuff, it's still the best, but not for MediaGoblin, I
think.

One thing that Django does super well though is documentation.  It
still has some faults, but even with those considered I can hardly
think of any other project in Python that has documentation as nice
as Django's.  It may be worth
[[http://pycon.blip.tv/file/4881071/][learning some lessons on documentation from Django]], on that note.

I'd really like to have a good, thorough hacking-howto and
deployment-howto, with the former including some notes on how to
make it easier for Django hackers to get started.