Tweekers-specs-en
First revision : 2009/09/11
Mathieu Comandon <strycore@gmail.com>
Introduction
Why this project ?
Having been around Open Source software for a few years now, I have realized that something vital is missing in the developer community. There are hundreds of amazing tools to manage source code and bugs, and there is a lot of freely available documentation on the Internet and in manpages. That's a good point for Open Source software, and one of its major strengths.
But when you take a step back and look at all of it, it's pretty much a big mess. This is true not only for me, a novice in Open Source development, but also for big, experienced companies like Google[1] or Adobe[2].
The Free Software definition lists as freedom #1 :
The freedom to study how the program works, and change it to make it do what you wish (freedom 1). Access to the source code is a precondition for this.
Having access to the source code is one of the greatest things ever. But have you ever picked a random project and tried to study it ? Noticed how hard it is ? In the end, if you spend a lot of time reading the source code and finding the right documentation, you might understand how the program works. But it's a lot of effort, and with all this freely distributable content, it's not hard to imagine a better way to organize all this mess. What's even worse is that many people will try to understand a given project, either to contribute to it or just to get better at programming. Most will give up given the complexity of a mature project, and the little bits they manage to understand will remain cryptic to the next person trying to dive into the project.
With all these software forges, bugtrackers, wikis and manpages, there is still no single tool to help you study the source code. For me, that's almost an insult to freedom #1 of Free Software. Yes, you can study the source code, but good luck with that !
We see a lot of social networks appearing, but websites such as Facebook won't do any more than let you know about the latest funny Youtube video or when you'll go out for a beer with your friends. Even professional social networks such as Linkedin won't make you more efficient in your work. Social networks have real potential to create a better way of working collaboratively, and yet that potential is almost unused. Of course, Launchpad is a great platform for maintaining projects, but it has its limits and, anyway, it's nothing more than a good software forge. Question-and-answer sites such as Stack Overflow do not focus on software projects : it's everybody on their own, working on their own pieces of code and helping each other.
All of this only makes life harder for new contributors. We want to make software projects really easy to understand so we can get more contributions from users who are not in the core development team. I see these difficulties as the main obstacle to better innovation in Open Source software, but luckily it's not something impossible to fix.
Staying focused on your goal
What really matters is not the source code itself, it's what you can do with its compiled (or interpreted) form. This is what I had in mind when I started the first version of Tweekers in 2005. Back then I didn't have Open Source software development in mind, but the initial goal still remains : finding solutions to problems in a top-down fashion. Working this way avoids sticking to a project that could be replaced with a better equivalent. The final goal stays in focus and everything must be done to find the simplest solution possible (following the KISS principle).
Traditional software documentation exposes every single feature of a project, resulting in a heavy, hard-to-understand document where the information you need is lost in an ocean of useless parts. In Tweekers, it must be really easy to hide the parts of a document that do not concern your current problem. This can be done by adding tags to parts of documents and implementing something similar to code folding in IDEs.
Why are the current tools imperfect ?
Free software developers have a lot of tools to communicate, share code, debug programs, get documentation, etc., but these tools are far from user-friendly, and contributing to a single project takes a lot of commitment and time just to understand it.
Let's review a few tools that people can use to collaborate on software projects:
- Forums : Forums are a really messy place and one of the most inefficient tools to work on a project. Most of the time the forum's search engine is unsatisfying. Relevant information is lost between misunderstandings, unrelated problems, trolling and wrong solutions. Threads remain untouched after they've been marked as solved.
- Mailing Lists : They suffer from the same problems as forums but are even less practical. The positive thing about mailing lists is their audience. You can often get in touch with someone directly involved with the project you're working on, or with experienced users. This means that you are more likely to get a good answer if you ask a meaningful question. But when it comes to finding information in the mailing list's archives, the situation is even worse than with forums. Most of the time you rely on Google to find information in these mailing lists. As with forums, solved problems remain unedited, with all the irrelevant information making it harder to find a solution. Nevertheless, mailing lists contain very valuable information and Tweekers will have a dedicated robot doing searches in them.
- IRC : IRC (or Jabber) can be very effective as a collaboration method due to its realtime aspect. That is, of course, if you are lucky enough to be online at the same time as an experienced user. The big drawback is that solutions found on IRC tend to stay there. A happy user will rarely post the solution to his problem on another platform. Nobody reads IRC logs to find a solution, you just go on the channel, hoping that someone will answer you.
- Wikis : They are maybe the best way to give good information on a project. Irrelevant information is rarely published in a wiki, and it can be edited by anyone, cleaned up, updated... There are some issues with wikis, though. They are almost anonymous, so you don't really know whether the person who published the information is experienced or not. And a wiki is not a platform where you usually ask for help : it's a place where you look for answers, not where you ask questions (which could remain unanswered forever).
- Plain documentation : It might be the best tool for learning about a project, but documentation is sometimes difficult to understand and lacks real-world examples. It's not a collaborative tool at all. There may be exceptions, like PHP's documentation, where you have comments below the doc pages with good examples. Documentation should be allowed to be dynamic without denaturing the original work.
Let's summarize what major problems these collaborative platforms have :
- Lack of organization : Search engines are not always efficient. Relevant information is lost among a lot of useless crap. Information is not tagged.
- Information redundancy : With these inefficient mediums, it's common to have the same questions asked over and over.
- Different platforms lead to different communities : Between forums, wikis and mailing lists, all interoperability is lost. Different users use different platforms, reducing the chances of getting good answers.
- Everybody is equal : Whether you're talking to Linus Torvalds himself or to a wannabe hacker who has learned C for 3 months at school, on the web it's all the same. If two users tell you two different things, who should you trust ? Now imagine a system built like PGP's trust mechanism. You could give Guido van Rossum the maximum level of trust on Python programming, so that users who don't know who Guido is at least know that he knows what he's talking about when Python is the topic. This system would also be similar to a role-playing game : you earn experience in various abilities by doing great stuff and giving smart answers.
Another thing is that we can't ask software developers to write (good) documentation if they don't want to. Tweekers' goal is to bring together people who want to understand a project and unite their efforts. Trying to understand a medium-sized project alone is almost impossible, but in a small group things become much easier. Every answered question is easily reachable, and you can read it without all the research that led to it. You have the question and the answer in their simplest forms. Any unnecessary part is shaved off, to make an analogy with Occam's razor.
Contact the right person for your problem.
One of Tweekers' main objectives is to attribute roles to users. The platform will be optimized to reach the people who are the best at what you're trying to do. It will also be easy to contact the core development team even if they don't have an account on the site. Robots will parse source code, readme and authors files, searching for email addresses and importing them automatically on a project's page. Members of Tweekers will then be able to attribute roles to the different people on the team. If you are a developer and also a member of Tweekers, you will be asked for confirmation when another user attaches you to a particular role. Note that it won't be possible to send private messages to developers via Tweekers : if this is what you really want to do, you'll have to email the developer yourself.
Project details
License and distribution
The software's backbone will be licensed with the GNU AGPL v3 License. Parts specific to the server such as the database or configuration files will not be covered by this license.
The distributable application will have minimal configuration settings, so that the end user can run it without hassle. In the Symfony framework, initial data for the database are called fixtures. These fixtures will contain some sample data, making it possible to test the application quickly.
Localization
Until Tweekers has reached an advanced state in features and usability, it will be entirely in English. The source code will remain in English; this includes variable names, comments and documentation. However, the documentation can be translated into other languages.
Technologies
The earliest draft planned to use Google Wave for real-time collaboration on documents. But since Google hasn't opened its code yet, we'll have to wait some time to be able to use this technology. Using Google's server is not OK : we need our own federation server, which will run a version of Wave customized for the needs of the project. Robots are an important part of the project. They will convey a large amount of data, so we can't use Google App Engine, and it's still not possible to host our own Wave robots. For the time being and until further notice, we'll do without Wave.
The web frontend will very likely be built with Django. It's an awesome Web framework and Python has an enormous number of libraries to deal with source code.
Robots will be built mainly with Python and shell scripts, interacting with software forges and a local Ubuntu repository mirror.
For a richer experience, Javascript will be mandatory and there might be heavy usage of jQuery, so browsers without Javascript won't be supported. Actually, the only browsers to get support might be the ones that implement HTML 5. No effort will be made to support Internet Explorer.
Data
The user
The user will have to fill in mandatory data when he subscribes. Mandatory data has to be kept to a minimum. A user does not have to subscribe to Tweekers to be in the database : any programmer who has worked on a project that is referenced in Tweekers will be in the users database. If a programmer who is already referenced in Tweekers wishes to get a real account, there should be a strong authentication mechanism to make sure the programmer really is who he claims to be. Currently, I can only think of PGP to authenticate a person reliably. The downside is that not every user has a PGP key, and for those who do, the key might not be signed by other users. Because of this, a user should be able to connect his existing entry with his email address. Because this is a weak protection (even if the email headers can be checked for fraud), the user has to remain in an unverified state until he gets a signed and reliable PGP key.
The mandatory data for a user will only consist of an email address and a password.
Skills
A user will be able to edit his different skills and give them a rating depending on how well he thinks he knows the subject. The skill level might be adjusted with some kind of tests. This should not be mandatory, but could help if the user doesn't really know his skill level.
After the skills have been set, they will be adjustable by other users and by the user himself.
Skills will have a value from 0 to 100 but will also be shown as stars. (1 star = 20%).
Reply fields will have a star icon with a tooltip “Give this user credit for his skills”. Users can give a lower or higher rating but can also confirm the current rating. A rating built from 100 users averaging 80% at a skill will be more trusted than a rating from only 2 users. So there are two values to take into account when looking at a skill rating : the rating itself, and its weight. A good way to represent this visually would be with alpha channels. Users with few ratings would have almost transparent stars, and users with a lot of ratings would have very bright, colorful stars.
We don't need to keep track of which user has given which rating. A simple weighted average calculation will be enough. It's still possible to keep track of each individual rating if we find a use for that; meanwhile, having pre-calculated ratings will be nicer to the database.
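The pre-calculated rating described above can be sketched as a running average that only stores the current value and its weight. This is a minimal sketch, not a definitive model : the class name, the assumption that every rater counts equally, and the `full_weight` threshold of 100 ratings for fully opaque stars are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillRating:
    """Pre-calculated rating for one skill of one user (hypothetical model)."""

    average: float = 0.0  # current rating, from 0 to 100
    weight: int = 0       # number of ratings received so far

    def add_rating(self, value: float) -> None:
        """Fold a new rating into the average without storing it individually."""
        if not 0 <= value <= 100:
            raise ValueError("rating must be between 0 and 100")
        self.weight += 1
        self.average += (value - self.average) / self.weight

    def stars(self) -> float:
        """Convert the 0-100 value to stars (1 star = 20%)."""
        return self.average / 20

    def alpha(self, full_weight: int = 100) -> float:
        """Opacity of the stars: 0.0 with no ratings, 1.0 at full_weight or more.

        full_weight is an assumed threshold, not something the spec defines.
        """
        return min(self.weight, full_weight) / full_weight
```

Only two numbers per skill are kept in the database, which is the point of not tracking individual ratings.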
User groups
Users should be able to create groups that gather people with the same interests (for example, hackers living in the same town) without being attached to any particular project.
Projects
A project can be an application (e.g., Firefox), a library (e.g., libgtk) or a collection of both (e.g., Gnome).
A project must have :
- home page
- documentation
- authors
- license
- type (library, application, …)
- section (networking, graphics, sound, web ... ) The section must not be another project (Gnome or Xorg for example)
It may also have :
- Wikipedia page
- Freshmeat project
- Launchpad page
- Sourceforge page
- page on other trackers
- a forum
- Wiki page
There are many webpages available for a single project and it's impossible to know how many, so the list above is only there to give you an idea of what's possible. There is no way that the database will have a “Wikipedia” field, for instance. Instead there will be a project_websites table containing the URL and the type of website (just for classification and cosmetic purposes). The website type will be guessed automatically when possible.
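Guessing the website type from the URL could look like the sketch below. The host table and the labels are assumptions made for illustration; anything that is not recognized simply falls back to "other".

```python
from urllib.parse import urlparse

# Hypothetical mapping from a host name to a website type label.
KNOWN_HOSTS = {
    "wikipedia.org": "wikipedia",
    "launchpad.net": "launchpad",
    "sourceforge.net": "sourceforge",
    "freshmeat.net": "freshmeat",
}


def guess_website_type(url: str) -> str:
    """Return a type label for a project URL, or "other" when unknown."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, label in KNOWN_HOSTS.items():
        # Match the host itself or any of its subdomains (en.wikipedia.org).
        if host == suffix or host.endswith("." + suffix):
            return label
    return "other"
```

Since the type is only used for classification and cosmetic purposes, a wrong or missing guess is harmless and can be corrected by hand.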
A project will usually have subprojects, and programming languages are themselves subprojects. This makes it possible to reach any element of the project (source code or documentation).
Forks should not be considered subprojects of the parent project, as there is no dependency on it.
Subprojects should optionally take into account project versions for the child and parent projects.
Project versions
Version identification is diverse. There is no standard that can identify versions for every project.
There are :
- Version numbers : 1.51 , 3.0 , 1.1.32
- Revisions (usually the commit number of a version control system)
- Code Names : Karmic Koala, Shiretoko, Sid, Europa
- Stability hints : RC, beta, alpha, trunk
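A rough way to sort an identifier into one of the four kinds above is sketched here. The rules are assumptions : a dotted number is treated as a version number, a bare number (optionally prefixed with "r") as a revision, and anything unrecognized is assumed to be a code name.

```python
import re

STABILITY_HINTS = {"rc", "beta", "alpha", "trunk"}
VERSION_NUMBER = re.compile(r"^\d+(\.\d+)+$")  # 1.51, 3.0, 1.1.32
REVISION = re.compile(r"^r?\d+$")              # 4512 or r4512


def classify_version(identifier: str) -> str:
    """Guess which kind of version identifier a string is."""
    text = identifier.strip()
    if VERSION_NUMBER.match(text):
        return "version number"
    if REVISION.match(text):
        return "revision"
    if text.lower() in STABILITY_HINTS:
        return "stability hint"
    # Fallback guess: free-form names such as "Karmic Koala".
    return "code name"
```

Real projects mix these forms (for example a version number followed by a stability hint), so this only covers the simple cases listed above.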
A new version comes with a changelog listing bugfixes, new features and modified features.
For more information about software versioning see Wikipedia : http://en.wikipedia.org/wiki/Software_versioning
Publishing content
Post titles should be kept short, but can be a bit longer than a forum thread or mail title. If we want to keep the possibility of posting Tweeks to identi.ca or Twitter, the total message should not exceed 140 characters, including a short URL to the Tweek itself (let's say 20 characters).
The Tweeks will follow the same convention for tagging as identi.ca : @ for users, ! for tags and # for projects. However, tags will be entered in a separate input field (as an option).
For example:
How do I make a copy of an existing disk image in #VirtualBox ?
Projects : VirtualBox Tags :
The project field has been automatically filled in with VirtualBox from the message title.
It should be possible to remove automatic tagging, for example when posting a message like this :
Can we delete the #include <stdlib.h> in somefile.c ?
Projects : C, someproject Tags : Code cleanup
The first example can also be entered as :
How do I make a copy of an existing disk image ?
Projects : VirtualBox Tags :
This makes it a bit shorter. We don't need to know that it's about VirtualBox since it will only appear on VirtualBox's page.
This will be seen on a microblogging site as :
How do I make a copy of an existing disk image ? !virtualbox http://bit.ly/bleh
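The examples above can be sketched in code as follows. The function names are hypothetical; the sketch only detects #project tags in a title and builds the microblogging form, refusing messages over 140 characters. Note that the #include example is detected as a project called "include", which is exactly why automatic tagging has to be removable.

```python
import re

MAX_LENGTH = 140  # limit for the whole message, short URL included

PROJECT_TAG = re.compile(r"#(\w+)")


def detect_projects(title: str) -> list[str]:
    """Return the project names written as #name in a Tweek title."""
    return PROJECT_TAG.findall(title)


def to_microblog(title: str, projects: list[str], short_url: str) -> str:
    """Build the message as it would appear on a microblogging site."""
    tags = " ".join("!" + project.lower() for project in projects)
    message = " ".join(part for part in (title, tags, short_url) if part)
    if len(message) > MAX_LENGTH:
        raise ValueError(f"message is {len(message)} characters long")
    return message
```

For instance, `to_microblog("How do I make a copy of an existing disk image ?", ["VirtualBox"], "http://bit.ly/bleh")` gives the microblogging line shown above.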
Easy document attachment and editing
Documents will be the data returned by Tweekers' robots. A document can be source code, documentation, a man page, a forum thread …
Not only can you link to a document, but you can also point to any single line, highlight certain parts of the document, hide other parts and link your annotations to it.
At the time I'm writing this document, I still don't have access to Google Wave technology. The process for collaborative editing is still not clear, but hopefully, once Wave is made available, it will become clear how to implement this functionality. I've made a few experiments with Javascript but this is not the way to go : document editing must not be done in Javascript but in Python (or Java) by a robot.
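Independently of how collaborative editing ends up being implemented, pointing at lines, highlighting and hiding can be modeled with plain line ranges. The sketch below is only one possible data structure; the `Annotation` fields, the ">>" marker and the "[...]" fold placeholder are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A note attached to a range of lines in a document (hypothetical model)."""

    start: int       # first line, 1-indexed, inclusive
    end: int         # last line, inclusive
    kind: str        # "highlight" or "hide"
    note: str = ""


def render(lines: list[str], annotations: list[Annotation]) -> list[str]:
    """Return the document with hidden ranges folded and highlights marked."""
    hidden: set[int] = set()
    highlighted: set[int] = set()
    for annotation in annotations:
        target = hidden if annotation.kind == "hide" else highlighted
        target.update(range(annotation.start, annotation.end + 1))

    output = []
    folded = False
    for number, line in enumerate(lines, start=1):
        if number in hidden:
            if not folded:
                output.append("[...]")  # one placeholder per folded run
                folded = True
            continue
        folded = False
        marker = ">> " if number in highlighted else "   "
        output.append(f"{number:>3} {marker}{line}")
    return output
```

Because annotations only reference line numbers, the document returned by a robot stays untouched, and a link to a single line is just an annotation where start equals end.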
Client software
Tweekers will have an optional client for the desktop, making some tasks easier.
Jabber support
Users will be able to receive notifications with their default Jabber client like in identi.ca. This functionality is independent from the main Tweekers client. Some functionality might only be available with the main client.
System configuration
The client will be able to send the main server many types of information about the user's configuration. It will be able to determine which hardware the client is running on and send some configuration files (located in /etc or in the /home folder). It will also manage authentication.
Project manager
The client will keep a list of the projects the user is working on. It will fetch tasks related to these projects from the main server. In more advanced versions of the client, the user will be able to select some text in a source file, make a note and upload it automatically to Tweekers.
Scenarios
Scenario 1 : Understanding a project
Real life scenario : etswitch is a program to switch from a fullscreen OpenGL or SDL game to the desktop. This program is written in C and is more or less maintained.
Nevertheless, this program is really useful for games, and I'd like to integrate this functionality in one of my projects (Lutris). I'm not really experienced with C, I'm more comfortable with Python and PHP. But at least I've learned C, so I can understand most of it.
What I have a harder time understanding is everything else :
- Some standard GNU C libraries (fcntl.h , ioctl.h, …)
- Xlib libraries (Xatom, xpm, Xmu, keysym, …)
- autotools (automake, autoconf, …)
Finding documentation for all of these libraries and tools is pretty much limited to the manpages which is not what I would expect these days. Nevertheless, these manpages contain precious information, so the manpage robot will be useful.
Let's see which subprojects (actually includes) etswitch has :
grep -hr "#include" *
gives (with a bit of cleanup, I removed comments and project specific includes)
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <features.h>
#include <float.h>
#include <libgen.h>
#include <linux/limits.h>
#include <linux/soundcard.h>
#include <malloc.h>
#include <pthread.h>
#include <pwd.h>
#include <sgtty.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <sys/ioctl.h>
#include <sys/param.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/systemcfg.h>
#include <sys/time.h>
#include <sys/types.h>
#include <termio.h>
#include <unistd.h>
#include <X11/extensions/record.h>
#include <X11/extensions/xf86vmode.h>
#include <X11/Intrinsic.h>
#include <X11/keysym.h>
#include <X11/Xatom.h>
#include <X11/X.h>
#include <X11/Xlib.h>
#include <X11/Xmu/WinUtil.h>
#include <X11/Xos.h>
#include <X11/xpm.h>
#include <X11/Xutil.h>
Okay, that's quite a lot of includes. Some of them are pretty standard, like stdio.h, but most of them are totally new to me (from a developer's point of view). You can't just type
man dirent.h
No manual entry for dirent.h
Looking at this webpage : http://www.delorie.com/gnu/docs/dirent/directory.3.html it seems that there is a dirent manpage, but I don't have it … damn... which package must I install to read it ?
Anyway, this is just one huge mess, and I'm not the only one to think this way. Google has struggled with Linux development with Chrome and the developer of the game Braid was working on a Linux port of his game until he realized how messy everything was. When did things stop evolving ? Many years ago, hardcore engineers made some awesome work on Unix, Linux and BSD and then it kinda slowed down, people preferred to move to high-level languages such as Python rather than clean up the low level ones and make them easier to use and understand.
Okay, back to etswitch. We saw that the dependencies (includes) can be extracted pretty easily, and I cleaned the output by hand only because I was too lazy to look for the perfect regexp that would give me the perfect result. I guess that this is what configure scripts do. If a configure script can do this for a Makefile, it can't be hard to link the list of includes to some docs.
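The hand cleanup of the grep output can be approximated with a regular expression. This is only a sketch of what a robot could do, with a hypothetical function name : it keeps angle-bracket includes (so project-specific "quoted" includes are dropped) and skips lines commented out with //, but it does not understand /* ... */ block comments or conditional compilation.

```python
import re
from pathlib import Path

# Matches lines such as "#include <X11/Xlib.h>"; quoted includes are ignored.
SYSTEM_INCLUDE = re.compile(r"^[ \t]*#[ \t]*include[ \t]*<([^>]+)>", re.MULTILINE)


def system_includes(source_dir: str) -> list[str]:
    """Return the sorted, de-duplicated system includes of a C source tree."""
    found = set()
    for path in Path(source_dir).rglob("*"):
        if path.is_file() and path.suffix in {".c", ".h"}:
            text = path.read_text(errors="replace")
            found.update(SYSTEM_INCLUDE.findall(text))
    return sorted(found)
```

Each header name in the result is then a candidate key for looking up documentation.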
Lessons learned from the scenarios
Coding guidelines
I would say that one of the most important aspects of understanding source code is how well it is written. Coding guidelines exist for every language and in most cases they are very similar between different languages.
It is very hard to understand a program that uses one-letter variable names, because you have to read the whole algorithm to determine what each variable is for. However, even with little knowledge of the language, you can help others by refactoring existing source code. Refactoring can be done easily in most IDEs, such as Eclipse. I don't know about emacs or vim but it should be possible since they are “the most powerful code editing tools on earth”. No source code editor would be considered complete without a simple variable renaming functionality, right ?
Refer to the coding guidelines of your language of choice :
for GNU C it's here : http://www.gnu.org/prep/standards/standards.html
and for Python it's mostly in PEP-8 : http://www.python.org/dev/peps/pep-0008/
Study these guidelines well before writing your code, because if you don't, you or someone else will have to make your code more readable later.
If you were writing closed source software, no one would care about messy code, but since we're talking about open source, you don't just have freedoms. You also have the duty to write readable code. And yes it also means that pretty much everything should be in English. Translate as much as you want but keep your code in English as it is the language that almost every programmer on Earth understands.
Readable code is the first step for an efficient collaboration. Writing cryptic, unreadable source code is pretty much the same as writing proprietary software.
Version control systems
No matter which one you choose, just pick one ! The CodeBot will know how to handle any version control system and most software forges, but if your code is only available as a tar.gz archive, this will make things harder for everyone.
Also, if you see a project that doesn't use a version control system, feel free to create a repository for it on a popular forge. You should still try to convince the upstream developers to use it, because you don't have access to the development code.
Don't be ashamed of the quality or the stability of your code. It's more important, for everyone and for you, to respect the “release early, release often” rule. By sharing the first stages of your application you are more likely to get help from contributors. And if you don't, then Tweekers won't be very useful for you. Usually, developers don't commit their work until it has reached a certain level of stability, but you can also commit code that breaks in order to have it analyzed by others. If you are aware of a bug, just let others know in the commit message.
Robots
Manpage-robot (manbot )
This robot is queried with a command name or a C function. It looks up the matching manpage and sends it back into the wave as HTML (using man2html).
Many times it's just too hard to get good documentation on a C library. Manpages are not as readable as HTML files and you really have to know what you're looking for. When typing man <name> in a terminal, you get the documentation for the function, not for the include file. What about the other functions in the include ?
The robot will be able to list every manpage of a particular package.
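A first version of the lookup could simply call the man command and wrap its text output. Note that this sketch does not use man2html, which the text above names : as a stand-in it escapes the plain text and puts it in a <pre> block. The function names are hypothetical, and the lookup assumes that man is installed on the robot's machine.

```python
import html
import subprocess


def read_manpage(name: str) -> str:
    """Return the manpage text for a command or C function, as printed by man."""
    result = subprocess.run(
        ["man", name], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        raise LookupError(f"No manual entry for {name}")
    return result.stdout


def manpage_to_html(name: str, text: str) -> str:
    """Wrap manpage text in minimal HTML (a stand-in for man2html)."""
    return f"<h1>{html.escape(name)}</h1>\n<pre>{html.escape(text)}</pre>"
```

A query would then chain the two calls, for example `manpage_to_html("opendir", read_manpage("opendir"))`.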
Mailing-list robot ( MLbot )
First, this robot has to get copies of several mailing list archives (the .tgz archives), then import them into a database so users can search the contents. The threads contained in this cache can then be imported into a wave.
MLbot can also be queried for the list of mailing lists available in the cache and can be asked to import new ones. As long as the web interface is powered by Mailman, it should be OK.
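The import step could be sketched with Python's standard mailbox and sqlite3 modules. This assumes the archive has already been downloaded and unpacked into an mbox file; the table layout and function names are hypothetical, and multipart messages are stored with an empty body to keep the example short.

```python
import mailbox
import sqlite3


def import_mbox(mbox_path: str, db: sqlite3.Connection, list_name: str) -> int:
    """Copy every message of an mbox archive into the cache; return the count."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(list_name TEXT, subject TEXT, sender TEXT, body TEXT)"
    )
    box = mailbox.mbox(mbox_path)
    count = 0
    for message in box:
        # Only single-part messages are decoded in this sketch.
        payload = b"" if message.is_multipart() else message.get_payload(decode=True)
        body = (payload or b"").decode("utf-8", errors="replace")
        db.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?)",
            (list_name, str(message.get("Subject", "")),
             str(message.get("From", "")), body),
        )
        count += 1
    box.close()
    db.commit()
    return count


def search(db: sqlite3.Connection, term: str) -> list[str]:
    """Return the subjects of cached messages mentioning the term."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT subject FROM messages WHERE subject LIKE ? OR body LIKE ?",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

A plain LIKE search is enough to show the idea; a real cache would probably need a full-text index.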
Documentation robot ( docbot )
As with MLbot, docbot builds a cache of existing documentation.
Code robot ( codebot )
Code robot is installed on a server with all the supported revision control software available. It can be queried either with a version control URL (http://dolphin-emu.googlecode.com/svn/trunk/) or with a forge name and a project name (dolphin-emu@google-code). Like many other robots, it caches the source code and can be queried for updates.
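Telling the two query forms apart is straightforward. In this sketch, anything with a URL scheme is treated as a version control URL and anything else must look like project@forge; the `CodeQuery` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class CodeQuery:
    """A parsed codebot query: either a URL or a project on a forge."""

    url: Optional[str] = None
    project: Optional[str] = None
    forge: Optional[str] = None


def parse_query(query: str) -> CodeQuery:
    """Parse "http://..." or "project@forge" into a CodeQuery."""
    text = query.strip()
    if urlparse(text).scheme:
        return CodeQuery(url=text)
    project, separator, forge = text.partition("@")
    if not separator or not project or not forge:
        raise ValueError(f"expected a URL or project@forge, got {query!r}")
    return CodeQuery(project=project, forge=forge)
```

The forge name would then be mapped to a checkout URL by forge-specific code, which is not shown here.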
Supported version control systems :
- git
- mercurial
- svn
- cvs
- bzr
- … (any other system is not a priority right now)
Supported forges :
- sourceforge
- launchpad
- github
- google-code
- alioth
- savannah
- … (other forges[3] might be supported when all the ones above are correctly implemented)
The codebot will also be able to get source code from any distribution (yes, it does mean that there should be as many codebots as there are Linux distros).
Codebots have to be able to extract author names and email addresses from source code, readmes and changelogs.
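For the common "Name <address>" form found in authors files and changelogs, the extraction could look like this. It is a sketch with a deliberately simple pattern and a hypothetical function name; obfuscated addresses ("name at example dot org") and bare addresses without angle brackets are not handled.

```python
import re

# "Some Name <user@example.org>", possibly preceded by list bullets.
AUTHOR = re.compile(r"([^<>\n]*?)\s*<([\w.+-]+@[\w-]+(?:\.[\w-]+)+)>")


def extract_authors(text: str) -> list[tuple[str, str]]:
    """Return unique (name, email) pairs in order of first appearance."""
    authors: list[tuple[str, str]] = []
    for name, email in AUTHOR.findall(text):
        entry = (name.strip(" \t-*"), email.lower())
        if entry not in authors:
            authors.append(entry)
    return authors
```

The name may come back empty when a file only lists addresses; such entries would have to be completed by Tweekers members.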
A new release from a project will be published on the project's page, with the changelog.
Package Builder robot ( BuildBot )
This one will try to build binary packages automatically from source code. It's very similar to Launchpad which can build binaries when you've uploaded source packages on your PPA.
Bug report robot ( BugBot )
The BugBots will be able to communicate with all kinds of bugtrackers (Launchpad, Bugzilla, etc.). If patches are submitted in a bug report, BugBot will propose them to CodeBot, which will validate them and send them to BuildBot.
Like CodeBot, BugBot will also be able to extract names and email addresses from bug reports.
IRC Robot ( IRCBot )
This one will connect a user to IRC and act as a bridge between IRC and a Wave. When the user closes his chat session, the log will be submitted for cleanup.