Derik in Minnesota
Oct. 21st, 2009
02:43 pm - Remote Wiki Backup
I am not a trusting soul.
When the Wiki I contribute to crashed a few months ago, we discovered that our backups were not being performed as billed. That was a great sadness, and a scramble to back up ~27,000 pages out of web-caches, plus re-creating some very complicated templates from scratch. It was not fun.
Several months later… TFWiki is safely ensconced in a new host and humming along. Our new host’s backups have been tested and verified to work.
I still worry. I mean… it’s not like the backups are off-site. A fire would wipe us out. A properly robust backup system must allow users to download their own backups.
(Quite aside from the fact that our ostensible philosophical commitment when departing from Wikia should compel us to make reasonable backups to allow others to leave from us.)
Complicating things… I don’t have database access, so any backup system must, perforce, run remotely.
With all that in mind, I’ve been noodling at a script to scrape the name of every page in every namespace from the wiki, then archive each page’s raw contents. No history, no user data, no IP addresses… just the essentials.
…it’s harder than you’d think. We’ve got plenty of articles with multibyte names (non-English characters) in addition to multibyte text. MySQL handles multibyte text easily enough… as does PHP if you beat it hard enough, but the default behavior of MySQL seems to be to deliver multibyte-encoded content as latin-1, scrambling foreign characters. Getting all the ducks lined up has been a back-burner project for a couple months.
(It was actually backburner since BEFORE the Bookworm Crash that wiped out TFWiki. I regret not having it worked out before then.)
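To make the encoding problem concrete, here is a minimal sketch in JavaScript (Node.js), not the PHP/MySQL code the backup script itself runs on. It shows what happens when UTF-8 bytes are read back as latin-1, and that the damage is reversible as long as no bytes were dropped along the way. The function names are made up for illustration. On the MySQL side the usual cure is to set the connection character set (for example with `SET NAMES utf8`), though whether that matches this particular setup is an assumption.

```javascript
// Hypothetical helpers, for illustration only.

// Encode text as UTF-8 bytes, then (wrongly) decode those bytes as latin-1.
// This mimics a client that receives UTF-8 but believes it is latin-1.
function misreadAsLatin1(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// Undo the damage: turn the garbled characters back into their original
// bytes, then decode those bytes as the UTF-8 they always were.
function repairLatin1Mojibake(garbled) {
  return Buffer.from(garbled, 'latin1').toString('utf8');
}

const garbled = misreadAsLatin1('é');
console.log(garbled);                       // two characters instead of one
console.log(repairLatin1Mojibake(garbled)); // back to the original
```

The round trip only works because every byte survives; once a layer in between replaces "invalid" characters with question marks, the original text is gone.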
After some debugging, this is what I have:
- A script that scrapes the names/namespaces of all the pages currently on the wiki– that’s about 37,000 pages.
- A script that will then query the wiki for the raw (pre-render) text of these pages and store them in a database.
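The two steps above can be sketched as a pair of URL builders. This is JavaScript, not the actual script, and it assumes the wiki exposes the standard MediaWiki `api.php` (`list=allpages`) and the `action=raw` view on `index.php`; the real script may well scrape the page list some other way. The host `example.org` and both function names are placeholders.

```javascript
// Hypothetical helpers; example.org stands in for the real wiki host.

// Step 1: URL for one batch of page titles in a given namespace.
// `continueFrom` is the title to resume from on later batches.
function allPagesUrl(apiUrl, namespace, continueFrom) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'allpages',
    apnamespace: String(namespace),
    aplimit: '500',
    format: 'json',
  });
  if (continueFrom) {
    params.set('apfrom', continueFrom);
  }
  return `${apiUrl}?${params}`;
}

// Step 2: URL for the raw (pre-render) wikitext of a single page.
function rawPageUrl(indexUrl, title) {
  const params = new URLSearchParams({ title: title, action: 'raw' });
  return `${indexUrl}?${params}`;
}

console.log(allPagesUrl('http://example.org/w/api.php', 0));
console.log(rawPageUrl('http://example.org/w/index.php', 'Template:Infobox'));
```

Going through `URLSearchParams` means multibyte titles get percent-encoded as UTF-8 without any hand-rolled escaping.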
This isn’t a great solution. Since I’m running on a remote web-server, it means making 37,000 individual queries to the server I’m querying. I could do this in minutes with database access, but since this is a live wiki, I’m throttling the queries to one every 10 seconds to prevent overloading the server, which means snapshotting all 37,000 pages will take… 10 days.
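The throttling itself is simple enough to sketch. Here is one way it might look in JavaScript, assuming a `fetchRaw(title)` function that returns a page’s wikitext and a `store(title, text)` function that writes it to the database; both are hypothetical stand-ins passed in by the caller, not part of the real script.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Snapshot pages strictly one at a time, pausing between requests so the
// live wiki never sees more than one query per `delayMs`.
// `fetchRaw` and `store` are hypothetical callbacks supplied by the caller.
async function snapshotAll(titles, fetchRaw, store, delayMs) {
  let done = 0;
  for (const title of titles) {
    const text = await fetchRaw(title);
    await store(title, text);
    done += 1;
    // No point waiting after the final page.
    if (done < titles.length) {
      await sleep(delayMs);
    }
  }
  return done;
}
```

The loop is deliberately sequential: firing requests in parallel would finish sooner, but that is exactly the load on the server the throttle is there to prevent.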
There is no guarantee that the version I snapshot won’t be a vandalized page, reverted seconds after I take a picture. Or with templates… that I’m not grabbing a micro-version that’s not working, or incompatible with an inter-dependent template snapshotted later. (My solution was to hard-code the templates to be snapshotted first, and simply monitor the recentChanges to make sure there were no edits to them while they were being scraped.)
And of course the results aren’t in an easily imported format– they’re BLOB fields in an associative database that doesn’t correspond to MediaWiki structure. You can’t really do much with them in this form.
…except hold onto them. If something goes wrong… they’re not in the best format– but they’re archived with no character-encoding issues, in original wikitext. It would take some custom-coding, but the text would get back into the wiki with 0 loss.
Well, no loss… except the pages which have been edited since they had a snapshot taken.
In an ideal world… this script would be crontab’d and monitor its own progress and execution-time to adjust its own throttle, it would monitor recentChanges, and it would import the edits onto its own wiki– live-mirroring the other. Oh– and it’d do something about the images, which this doesn’t back up at all.
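The self-adjusting throttle could be as small as one function. This is a sketch of one possible policy, not something the script does today: back off hard when the server answers slowly, creep back down when it answers quickly. Every threshold here is an invented default, and `nextDelayMs` is a hypothetical name.

```javascript
// Pick the next pause between requests from how long the last one took.
// All thresholds are assumptions chosen for illustration.
function nextDelayMs(currentDelayMs, responseMs, opts = {}) {
  const { slowMs = 2000, minMs = 1000, maxMs = 60000 } = opts;
  // Slow answer: double the pause. Fast answer: shave off 10%.
  const proposed = responseMs > slowMs
    ? currentDelayMs * 2
    : currentDelayMs * 0.9;
  // Never go below the floor or above the ceiling.
  return Math.min(maxMs, Math.max(minMs, Math.round(proposed)));
}

console.log(nextDelayMs(10000, 3000)); // server struggling: pause grows
console.log(nextDelayMs(10000, 500));  // server fine: pause shrinks a little
```

Doubling on trouble and easing off slowly is the cautious direction to err in when the server being polled is somebody else’s live site.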
That goes on the backburner though. Next up for me is a total rewrite of the site’s bot, using some of what I learned here… there are some links it stubbornly refuses to fix, and I think that a proper systematic script will get it working better.
For now… I can hold onto this and be content. Whatever else may happen… I know the site will not be wiped out.
Bird in the hand.
Sep. 14th, 2009
02:18 pm - ZX License: Key Ideas
It is September 14. I am crazy busy. All Hail Megatron #15 is released on the 16th.
Let’s do this thing.
I’ve been calling my proposal for TFWiki’s hypothetical “second license” intended to protect producers of official Transformers-related material from legal complications around the CC-BY-SA license which governs our content the “ZX License.” I’ve never really defined what that means. I do so now, shotgun style.
Given: Copyright exists
It is a given of this license that TFWiki has a basic copyright over its own value-added contributions which is separate from the raw facts of the universe which are indisputably Hasbro’s property.
(Whether or not Hasbro is able to exert copyright over those ‘bare facts’ is a question; people can publish “unauthorized guides,” which suggests that collections of fiction-fact may not be subject to copyright. However, such guides are written in a very specific “voice” to toe this legal line very carefully. TFWiki’s articles are not- and frequently include direct quotes from Hasbro’s material they indisputably can exert copyright control over. So for our purposes, we’ll just say ‘yes, Hasbro can exert ownership over its portion of our content.’)

The result is that TFWiki’s articles are under joint copyright, partly Hasbro’s property, partly our own. Neither party can legally publish them without the permission of the other. That is not the same thing as “Hasbro owns our articles,” rather it is “Hasbro’s partial ownership of our articles restricts our ability to do whatever we want with them.”
The corollary that many seem to miss is that our partial ownership of the article restricts Hasbro from doing whatever they want with them. Specifically, if Hasbro uses them, our portion is automatically licensed under CC-BY-SA, whose viral license terms (including the right for anyone to make or distribute copies) will then contaminate the resulting work.
Thus… the need for a second license.
Permissive vs. restrictive law
IP License grants tend to be very long and legally complicated things, because they wish to grant the licensee very specific, non-blanket usage rights, so every way in which those rights are granted-but-then-limited must be outlined in mind-numbing detail phrased in the most unambiguous way possible.
…we’re not doing that. The ZX License is permissive, not restrictive. It essentially boils down to “You can do literally anything as long as you limit the content used to a small amount. This license is only offered for small-amount usage.”
As a result, the legalese can be short.
Legal concepts
The “ZX” in the ZX License is rooted in two basic concepts of common or natural law.
Customary Freehold – The unwritten legal relationship between a landowner and tenants on his land whose right to be there has never been officially written down but stands from long tradition. These include the tenants’ responsibilities towards the landowner, their rights towards the land, etc. Customary Freehold has largely fallen out of use since the 19th century (when it became typical for all contracts to be written down) but is still recognized to be legally valid.
Xenia – The ritualized “guest-host” relationship that exists between a Household and the temporary guests within it. The guest’s responsibility not to abuse the Host’s hospitality and the Host’s not to make his guests feel a burden, as well as some social and even legal obligations. Somewhat notably, it includes an obligation for guests to defend their host from attack. (This last bit was actually the cause of the Trojan War… Helen of Troy may have been beautiful, but it was the honor-obligation from all the guests at her husband’s dinner party where she was stolen that launched the thousand ships…)
In both cases the obligations between Guest-Resident and Host center on ideas of “implied consent” and “duty to rescue” found in most modern Good Samaritan Laws. (Indemnification: not so much.)
Intellectual Property as Real Estate
The concepts above govern the relationship between a guest and host on physical property.
We hold that the same basic principles apply with intellectual property. Both recognize a difference between trespass and theft, concepts of adverse possession and (most importantly) have guests… not just in the form of business partners (who are granted explicit license to be on that property) but individuals who are invited to stay for a bit and play with the owned-concepts found there.
Broadly, you can divide this sort of “intellectual real estate” into two types, dependent on the landowner’s relationship to guests on their property.
* Closed Culture – Guests are allowed to visit, but strictly on a look-but-do-not-touch basis, like a tour. Melrose Place is a good example of a Closed Culture; the Spelling Entertainment Group has consistently (and notoriously) acted to shut down many types of “fan” activities, including fanfic archives and even discussion boards. Melrose Place is a Closed Culture entertainment where fans are welcome to visit and view, but not wander freely or create their own works. It is very much provided with “no user serviceable parts inside.”
* Open Culture – Guests are allowed to wander freely and actively encouraged to create their own derivations based on the IP found there, which are recognized as belonging to them. Dungeons & Dragons would be an extreme example of an Open Culture. Users are provided a sandbox kit and expected to make their own characters, settings and adventures that TSR will have no ownership of except for those elements which were drawn from the D&D lexicon.
Most properties fall somewhere between these extremes. Increasingly in recent years the fans-as-receivers closed model has fallen out of favor as a relic of a management culture that made no distinction between fandom-activity and piracy… but it is still sometimes practiced today by property owners who want to exert control over the manner in which fans interact with their product.
Transformers is something of a middle-ground. Cartoon, comics et al. are centrally produced, but (fairly uniquely at the time) every Transformers toy produced since 1984 has included a bio, and character stats including a ranking within the faction command hierarchy, and the play-pattern presented by its own commercials has consistently been one of narrative roleplay. Clearly children have been encouraged to create their own adventures, and each character comes with their own bio and stat-set as a starter kit to do so.
What this all essentially amounts to is the idea that if a Copyright owner has historically extended safe harbor to fannish activities, they cannot summarily retract it, in the same way a Landowner cannot summarily eject customary freeholders and similar to the way a trademark owner cannot seek to exert control over a trademark which they have allowed to fall into general use.
This means that the status fans occupy on the owner’s property is more akin to Homesteading than squatting. (Arguably this is legally important, because it characterizes many common fandom activities as a permitted or semi-permitted use rather than an ongoing copyright violation that weakens the general copyright on the core property.)
(There is something of a fandom-bill-of-rights-and-responsibiliti
Key Meta-concepts
Having identified underlying concepts and applied them to IP-as-property, it’s time to label our understanding of those ideas:
* Zeloxenia – The “ZX” in ZX License, this loosely translates as “fan guest-hospitality.” Zeloxenia encompasses Xenia, Customary Freehold, and homesteading as outlined above. Fans who have been invited to play on another’s property are recognized to have a right to do so, while also having significant responsibility toward the property-owner not to damage the landscape in the process.
Fans who choose to tarry or ‘camp’ in this property and build more complex derivations may have an understood right to do so… but in so doing they also become more responsible for the area they occupy. As their level of involvement deepens, fans bear an increasing burden to protect the value of the property; this may mean properly citing copyright so their use does not erode the owner’s property, and a basic responsibility to cultivate the property they are occupying. To a certain extent, resident-fans go from being visitors on this property to stewards or Yeomen of it.
Xenia includes a ritual exchange of gifts, which may be fulfilled here by a “good-neighbor” relationship; you may borrow one another’s hedgeclippers as long as you make sure to return them. This is similar to land-use rights that might be expected under customary freehold… a basic diffusion or interchange of rights and property occurs between both parties that is mutually-forgiven/freely-gifted as a natural part of their relationship.
Good faith is a prerequisite for Zeloxenia to exist, and in Closed Culture broadcast-receiver models where fans have no rights, it is understood to be very weak or not exist at all.
An important point to note here: This is all describing a relationship and understanding that has always existed between fans and the object of their fandom. (At least in media fandom; most non-media fandom qualifies as a Closed Culture.) As such, we hold that these principles (and the unwritten contract they encompass) have always been in effect, and fans’ past contributions to TFWiki were made under this understanding. (Asking people to explicitly ratify the license largely renders this distinction moot though.)
* Lagom measure – A Swedish/Norwegian concept of “just the right amount.” (If you know Norwegians… this explains a lot about their personality.) Under the ZX License, a licensee will be granted the right to re-use a Lagom measure of content– an amount which does not prohibit use, but also does not encourage it. This means the unit size of a Lagom measure is variable by situation… which heads off precedent-based license jailbreaking; just because X amount of re-use was Lagom for one situation does not mean it is automatically Lagom for another. The maximum amount of re-use in a Lagom measure is understood to be greater than that permitted by Fair Use.
You could use a dozen pages of legalese to try to define how that works… or you could just fall back on the pre-existing Swedish concept. Since our license is permissive and not restrictive… we can just cite the Swedish concept.
Basic bounds are set via a philosophical statement to the effect that: “We think it is bad for the vitality of the Transformers brand to be re-using our content because that may cause it to become fixed and cease to grow and change… but recognize that some re-use is inevitable and wish to permit that without consequence.” …and you are essentially granting a blank check for content re-usage that still strongly encourages such usage to be minimal: Gross abuse of the definition constitutes bad faith and causes the license to lapse.
How content is used
For a hypothetical content re-user (such as IDW) that wanted to use our content under the ZX license, their re-use can be broken down as follows:
* Use under the ZX license. That is to say– usage rights afforded to content generated after the date of the ZX License’s adoption, or to prior content which was generated by users ratifying the ZX License and explicitly re-releasing their contributions.
* Use under principle of Zeloxenia. The balance of past contributions not signed-off-on, but whose usage in combination with the prior category up to a Lagom measure is implicitly provided for under the operative principle of fan guest-hospitality.
* Use under Fair Use. A portion of any remaining content not covered by the ZX license or beyond the size of a Lagom measure may be used as fair use.
* Any use more extensive than a Lagom Measure + fair use of the remainder must be licensed under the terms governing the remainder. (In most wikis’ case, this is CC-BY-SA.)
In short, if a re-user (such as IDW) takes shallow sips of our content, they are protected by 3 levels of cascading protection before the question of CC-BY-SA license contamination can even come into play. You would have to drink a mighty gulp from our content to exceed all 3.
Structure
Finally, I believe the license structure would (roughly) break down into the following sections.
- License deed – (nonbinding) – plain-English statement of “what you get”
- Philosophical statement – (nonbinding) – Responsibilities fans have toward a brand
- Key ideas – (nonbinding) – Summary of legal concepts, as above
- Application of Key Ideas – (nonbinding) – How we understand them to apply to our situation
- License code – (binding) – “consistent with the principles granted above, the copyright holders grant you use to…” Because the license hangs on pre-existing legal principles, we do not need to outline them in legalese; and any deficiency in our description in the prior nonbinding sections is moot, because that deficiency would not be ‘consistent with the principles.’ Again, this is only possible because we’re drafting a permissive license, not a restrictive one.
- Clauses – (binding) – Future versions and enforceability caveats; cribbed from existing licenses.
- Adoption – (binding) – Proviso to allow individual users to ratify this license beyond just the community doing so and thus explicitly re-release their prior contributions under the ZX license “to remove any doubt.”
Ugh. Anyway– that’s the reasoning and skeleton I’m thinking of. Most of the license is essentially a “statement of understanding” of legal principles which exist outside this license… no more binding than the Creative Commons “GUI” deed. The actual binding legal code is kept to an absolute minimum and essentially boils down to “you can use it under the relationship described above, if you exceed this or act in bad faith, your use lapses and you may incur CC-BY-SA consequences.” Because it would require truly heroic abuse to do so, this provides more-than-reasonable protection for a content re-user.
If each of those sections can be boiled down to a few sentences (and I’m fairly sure they can) then you have the basis for a serviceable plain-English license that covers our ass, covers Hasbro’s and its licensees’ asses, does not give away the farm, recognizes the rights of fans to exist, and codifies or recognizes the moral imperative of good conduct and good faith that should underlie fans’ relationship to the underlying brand.
Oh– and it doesn’t violate the terms of CC-BY-SA. Important that.
And it doesn’t weaken CC-BY-SA3, since the core principle of Zeloxenia, by fundamental definition, only exists in relation to media fandom– so the principles of the ZX License cannot be used to pry other content out of CC-BY-SA– and actually (as outlined in a previous post) strengthens the CC license by relieving unresolved legal “pressure” at stress points where CC-BY-SA fails in relation to wikis.
But thank god most of that can go unwritten, because I’m exhausted just looking at it.
Sep. 9th, 2009
03:23 am - In which I yell at people kind enough to talk to me
I have not slept in some time, and perhaps have taken too many different kinds of cold medication at once… because everything is sort of pulsing, and my peripheral vision has developed a scanline flicker like a bad CRT monitor.
Anyway I’m high as a kite and catching up on blog comments and discussions relating to TFWiki’s current copyright issues.
I generalize these comments below: (Expect profanity. A lot of profanity.)
COMMENT: Nothing a fan produces can be owned by a fan, because it is an illegal use of another’s property, the whole of the new work (including the fan’s original additions) is ‘confiscated’ back to the original copyright owner.
RESPONSE: Horseshit. Go fuck yourself.
This argument is insane on its face. People get their head up their ass and think “ooh, fanfic, you don’t own it, no one can own it…” like it’s a tenet of uncritical belief. But that belief has no basis in fact.
There is no “special legal status” for fan fiction. That means there’s no special protected status… but there’s also no special persecuted status. It comes down to Copyright.
• A graphic designer creates a poster. By mistake he fails to clear the rights to one of the images used (a flower.) Following the logic of the above argument, the graphic designer would not be able to re-compose the poster, replacing the offending image with another one because the entire poster had become property of the flower’s copyright holder. “You used something that doesn’t belong to you, now it all belongs to us.”
• That sure must be a rude surprise to the owners of the other 5 flower images the graphic designer did pay to use. Suddenly their flowers belong to someone else?
Copyright does not work that way.
“Oh, but what about Bitter Sweet Symphony? The Verve used a Rolling Stones track, and the Stones ended up owning the whole thing!”
…that was a court settlement. Binding arbitration to repay damages for mis-representing the extent of the sample used when it was licensed (”you paid for an inch and took a yard”) awarded the Stones ownership over the song in lieu of financial damages. The Verve didn’t have the $$ to pay for their mistake, so the song rights were confiscated, just like repossessing someone’s belongings to pay debt.
This is like saying “I hit someone with a car, now he owns my house.” No, he sues you for hitting him with your car, and if he wins he gets your house.
It’s. Not. Automatic. The practical application of law is not the same thing as “the law says.”
I repeat; Horseshit. Go fuck yourself. I entertain no more repetition of this argument.
COMMENT: One-sided licenses cannot exist, the licensing party has to explicitly sign on to a license or it has no binding force.
RESPONSE: This argument is too stupid to live.
• Someone owns an image.
• They offer it for sale under certain terms and conditions.
• You don’t get to ignore those conditions just because you paid for a copy. If you paid the fee for “500 printed copies” you can’t print 5000. Nor can you give away copies of it for free, or resell more copies yourself.
If I buy a copy of Microsoft Word, am I allowed to make copies of it, give them to my friends, or sell copies to other people?
If I use an image that was free-for-use under “CC-BY”, that means I have to include attribution. If I don’t feel like it, I’m not allowed to just pretend I own the picture and not be bound by the license. It’s a pre-condition of use. If it is used, that pre-condition must be met!
There are some rumblings going on with software EULAs being invalidated. EULA’s are those crappy little contracts you have to sign to use software (and increasingly, DVD’s.) But they’re failing because you can’t sell someone a product and then spring more conditions on them. For that kind of agreement to be binding, it would have to be made before/when the person bought it.
But again– EULA’s are completely different than copyright. Just like Damages. This has nothing to do with the natural action of copyright law.
COMMENT: You’ll never be able to sue IDW, you’re crazy to try.
RESPONSE: Just… go away. You haven’t been paying attention, and now we’re at step 25 in the process. I’m not going back to explain how you got turned around at step 3. The class is not going to stop for Ralph Wiggum to catch up, because Ralph isn’t gonna catch up even if we do.
COMMENT: You really think you can claim Hasbro doesn’t have a right to use its own intellectual property?
RESPONSE: Okay– for starters “you really think you can claim” puts us back into ‘practical application of law’ territory, so this question isn’t about the natural action of copyright.
But let’s pretend it is. This situation is, in fact, very messy. So let’s break it down into cleaner examples highlighting the principles involved.

• “The original licensor has a right to use all derivations of their work.”
No. Not unless that was a condition they made when licensing that work. Look no further than various versions of The Tick. The original 12-issue comic book series was adapted into a cartoon series and a live action series. The cartoon created several characters not found in the comic. Years later, the live-action series wanted to use those characters… and found they couldn’t. They belonged to the producers of the cartoon series, and when they bought the rights to the 12 issue comic by Ben Edlund… those characters had not been included. They would also have to purchase the rights from the cartoon’s producers. (Instead they created new, similar characters.)
Popular fiction is littered with examples of this. Perhaps one of the strangest examples is the tendency of British sci-fi publishers to acquire the Dr. Who license, publish a few adventures with an original companion, lose the license… and then continue publishing solo adventures with the companion, carefully avoiding direct reference to the Who-elements while still taking place in the same continuity. Clearly copyright can be broken up into parts.
At the other extreme- properties like Star Trek or Star Wars have typically allowed licensees to do whatever they want- but Paramount has right-of-use over any new derivations. Indeed, the original owner does have a right to all derivations… but not because it’s an intrinsic right, but because they wrote their licensing contracts that way.
Transformers used to be like this. Up through the year 2003 appearances by Japanese or Euro-exclusive characters in US materials were limited to Easter Eggs because the US and Japanese Transformers licenses were separate, having many characters in common but also many characters unique to each side of the Pacific. The state of Transformers licensing seems to have changed in the year 2005 (possibly relating to Takara’s bankruptcy) and Hasbro and TakaraTomy now hold joint rights over the totality of Transformers fiction, able to reference anything they damn well please. (And oh, how they please…)
• Does Hasbro have a right to use copyrighted material they do not own?
No. I don’t think anyone’s claiming this, but I’m underlining it. Hasbro can’t use Flash Gordon or Harry Potter without paying for those rights- just like any other kind of intellectual property. That includes Wikipedia. Hasbro (or anyone else) wanting to use any amount of Wikipedia content beyond Fair Use would have to license it. In Wikipedia’s case, that means CC-BY-SA, and the “SA” clause means that the resulting product created from that use would also be CC-BY-SA.
So if IDW created a cover for All Hail Megatron #15 that was a homage to Da Vinci’s Vitruvian Man and quoted large chunks of the Wikipedia article on Da Vinci as background text… that cover would logically become CC-BY-SA.
So. If work can become contaminated, and this kind of re-use reaches the standard to cause contamination… the question is really “Does TFWiki have any copyright claim on its own content?” If it does, then our content can contaminate IDW. If it doesn’t…
Well, if it doesn’t, it could come as a shock to Wookieepedia, all of Wikia, not to mention thousands of other wikis operating under the exact same license we are. Because it would mean that all their content belongs to Lucasfilm/dozens of other planes, exclusively.
To some degree, this is a re-stating of the original “does a fan own any part of a fanfic” question, to which the answer appears to be yes, though it hasn’t been definitively settled. (Mostly because such issues can only be settled by court cases– and who goes to court over a fanfic?) There is a vocal minority of fanfic authors who believe the law says that fanfic cannot be copyrighted. They may think this makes it ‘pure.’ Unfortunately, I was on the internet in 1997 when FOX started targeting fansites, so I can’t help but remember this ideological orthodoxy suddenly springing up everywhere as a magic spell by fanfic archive mistresses terrified of being sued to ward off the demon-lawyers. I know this belief is deeply entrenched in its culture, but so are the teachings of Saul of Tarsus; that don’t make ‘em right.
20% of American adults think the sun revolves around the Earth. Belief does not make it so.
But in another (more important) way, it’s a different issue entirely. Because these are facts. Facts cannot be copyrighted, only a particular expression of those facts. Phone Numbers and trivia books are two seminal examples of what cannot be copyrighted and what defines fair use.
So if facts can’t be copyrighted… what about the facts of a fictional story? Could it possibly be legal to (for example) publish a book consisting of nothing but character bios, summaries of fiction and behind-the-scenes anecdotes about a series without having the license for that series?
YES GOD DAMNIT. THEY’RE CALLED “UNAUTHORIZED GUIDES.” THEY’VE BEEN WRITTEN FOR EVERY MAJOR FRANCHISE THAT EVER EXISTED FROM STAR WARS TO HARRY POTTER. IN THE MID-1990’s WHEN STAR TREK WAS AT ITS PEAK, AUTHORIZED AND UNAUTHORIZED GUIDES TO TNG WERE COMPETING HEAD-TO-HEAD. IF PARAMOUNT HAD A LEGAL BASIS TO BLOCK THAT KIND OF SHIT THEY WOULD HAVE! IT’S 320 PAGES OF SMALL-PRINT DOUBLE COLUMN TEXT SUMMARIZING EVERY TRANSFORMERS CARTOON OR COMIC STORY EVER PUBLISHED BROKEN DOWN BY SUMMARY, ANALYSIS, FEATURED CHARACTERS, TRIVIA AND GOOFS, MEMORABLE MOMENTS, AND CONTINUITY REFERENCES.
(Does that structure sound familiar? Almost identical to the Wiki’s story summary pages? Uh huh! I own a signed copy of this guide.)
If you want to discuss “but I don’t think we actually can have copyright to our own material” please go away. The discussion you want to have back at step 6 isn’t useful, we’re on step 25.
75% of the crap people keep bringing up isn’t copyright law. It’s punitive settlements, law-as-practiced, Trademark, EULAS, examples of bad faith licensing whose meaning someone has misinterpreted.
So 75% of you shut the fuck up. The remaining 25%, continue.
If you’re not sure which group you belong to… *sigh*, continue. But please be open to the idea you might be wrong.
As rage-tastic as this post is, I am open to the possibility that I might be wrong. I am willing to be convinced.
But any argument that hopes to convince me that you can’t create collections of Transformers facts without Hasbro owning the copyright on the result absolutely must somehow account for the demonstrated fact you $%^&*() can.
(Oh man… I am not a patient and reasonable person by nature, and it chafes. I will doubtless regret posting this when I am no longer high, but for right now… it’s like some kind of rage colonic. I feel cleansed in my everything.)
I declare the “you can’t own shit” discussion closed.
Suck my balls.
Aug. 27th, 2009
06:50 pm - 19 days, 351 Words and Patient Zero
When the Transformers Wiki was deciding to relicense its content under the Creative Commons, we spent a lot of time discussing it. I mean—an insane amount of time. The GFDL’s relicensing option offered a single up-or-down choice; stay with GFDL or switch to Creative Commons. Why bother?
Well, because the community is concerned about potential problems our license might cause for Hasbro. CC-BY-SA is not a ‘play license,’ it’s court-tested and carries significant consequences.
The question that we kept coming back to: So what if some official Transformers publication re-uses a portion of our content? Of course, what are the odds of that happening?
Aug. 16th, 2009
12:28 am - Translate Japanese with Wiki Templates
Everyone has “someday maybe” creative projects; the kind of thing they’d like to do if they got an opportunity, but rarely make time for. Sometimes these are passionate fever-dreams, more often they’re simple roads-not-taken… a possible path you glimpse while on your way to another destination, and regret not having the time to explore.
I’ve done a lot of (fairly complicated) Wikiparser template programming for The Transformers Wiki over the years. You can combine the conditional and transformative control-structures that come standard on wikis in surprisingly complicated ways. It’s a bit like a programming language where there are no variables.
For over a year I’ve been wanting to apply Wikiparser to the problem of translating Japanese; at least Hiragana and Katakana (their ‘phonetic’ alphabets.) I had the idea while gut-deep in another template… and I was pretty sure it could be done, the question was how well?
I got off work early last Friday and decided to find out.
( Read the rest of this entry » )
Aug. 8th, 2009
03:44 pm - JQuery assisted page moves on MediaWiki
On August 6, I was presented with a large-but-finite set of pages that needed to be moved on a MediaWiki project; the parenthetical disambig was changing from “Page Name (Object)” to “Page Name (Obj).”
Since there were more than 100 such pages (and the disambig change wasn’t actually quite so neat) I decided to streamline. All these pages belonged to the same MediaWiki category, which we’ll call “Things.”
I added the following code to my ‘execute when the page loads’ JQuery file:
if (wgPageName == "Category:Things"){
jQuery.each( jQuery('#mw-pages a'), function(i){
var href = jQuery(this).attr('href');
var uri = 'http://domain.com/w/index.php?title=Spe
uri += href.replace('/wiki/','');
uri += '&wpNewTitle=';
uri += href.replace('/wiki/','').replace('%28Ob
uri += '&wpReason=Mass+Diambig+change';
jQuery(this).after(' (move)');
});
}
Once the page is finished loading, the script checks if I’m on the page called “Category:Things”.
If so, it grabs every page link, and appends another link after it; (move).
Clicking on that link brings you to: http://domain.com/w/index.php?title=Spec
The result is a pre-populated move form, reducing the entire move process to two clicks that can be accomplished very quickly in tabs!
See, MediaWiki, being a well-written computationally-wasteful piece of software (the two are not mutually exclusive) tests for certain query values and uses them to populate the form values. There is no place in the MediaWiki software that actually uses these values. But because the query tests are there… it’s laughably easy to semi-automate a process like this.
Considerate of the programmers to include them, neh?
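The URL-building step can be pulled out into a small standalone function. This is a minimal sketch, not the code from the post: buildMoveUrl and its moveFormBase parameter are hypothetical names, and the base URL is passed in because the post’s own URL is truncated. Only the wpNewTitle and wpReason query values come from the original snippet.

```javascript
// Hypothetical helper: builds a pre-populated move-form URL for one page link.
// moveFormBase is assumed to be whatever URL opens the move form once the
// page name is appended to it (the post's own URL is cut off).
function buildMoveUrl(moveFormBase, href, oldTag, newTag, reason) {
  // Strip the /wiki/ prefix to get the URL-encoded page name.
  var page = href.replace('/wiki/', '');
  // Swap the old parenthetical disambiguation for the new one.
  var newTitle = page.replace(oldTag, newTag);
  return moveFormBase + page +
    '&wpNewTitle=' + newTitle +
    '&wpReason=' + encodeURIComponent(reason);
}

// Example call with made-up values:
var exampleUrl = buildMoveUrl(
  'http://example.org/move?page=',
  '/wiki/Widget_%28Object%29',
  '%28Object%29',
  '%28Obj%29',
  'Mass disambig change'
);
console.log(exampleUrl);
```

Keeping the string handling separate from the jQuery loop makes it easy to check by hand on one or two page names before clicking through a hundred of them.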
Before someone asks… this wiki does not include jQuery. I load it from offsite (on my own web-host) by adding the following lines to “Special:Mypage/monobook.js”:
/* Javascript Includes */
document.write('<' + 'script language="javascript" type="text/javascript" src="');
document.write('http://www.mydomain.com/j
document.write('">
document.write('http://www.mydomain.com/e
document.write('">
It’s like MAGIC! I get to run customized jQuery on a wiki that doesn’t have it– WITHOUT embedding the entire nasty code in my own JS file! I can make all the edits on my own FTP server without spamming the WikiProject’s RecentChanges page!
Note: While I don’t think it matters… I do not use jQuery’s $(document).ready(); instead I use addOnloadHook(doStuff); addOnloadHook() being a stock MediaWiki function that basically does the same thing. doStuff() (obviously) being the nesting function I stick the code I execute on pageload inside. (I don’t remember why I do it this way… so I thought I’d mention it in case it was important.)
Actual time… about 10-15 minutes. Since I had about 120 pages to move, and 6-9 seconds per-move (which involves typing) seems optimistic… I came out ahead-to-break-even. And if I ever have to do something similar in the future, I’m way ahead! (Writing this blog entry took longer than the code.)
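For comparison, the same offsite include can be done without document.write by creating the script element directly. This is a swapped-in alternative technique, not what the post uses. loadScript is a hypothetical name and the example URL is a placeholder. The document object is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: appends a <script> element pointing at src to the
// page's <head>. doc is expected to behave like the browser's document.
function loadScript(doc, src) {
  var script = doc.createElement('script');
  script.type = 'text/javascript';
  script.src = src;
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}

// In a browser this would be called as (placeholder URL):
// loadScript(document, 'http://example.org/custom.js');
```

One difference to keep in mind: a script added this way loads asynchronously, so code that depends on it should wait for it to finish loading, whereas scripts written with document.write during page parsing run in order.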
Aug. 6th, 2009
02:45 pm - TFWiki.net killed two people this year
Ages ago and for reasons which escape me at the moment, I was given access to TFWiki’s Google analytics account. (Probably because I once wrote custom software to track our downtime.) As a consequence, I make periodic traffic or architecture reports…. because no one else really cares to. (The last one prompted a redesign of the front page.)
I present my informal wrap-up (including the fatalities mentioned in the subject) below the cut:
( Read the rest of this entry » )
Jul. 7th, 2009
10:45 pm - The Bastards of Free Culture
I’m a Transformers fan. For the last few years I’ve been active on the Transformers Wiki. And the TFWiki is currently debating whether or not to migrate our site-wide licensing from GFDL to CC-BY-SA3. Essentially a platform change in our site’s underlying “legalese” from Linux to Mac.
Sounds boring, right?
The complete meltdown of Western Civilization, after the cut.
( Read the rest of this entry » )
Mar. 26th, 2009
04:33 pm - The Host with the Least
This is the new face of the Transformers wiki;

For an increasing number of hours per day, the site experiences network timeouts, database locks, service brownouts and straight-out “nothing there-ness.”
In the aftermath of our hosts accidentally deleting our database, the actual hosting TFWiki is receiving as we try to rebuild has actually gotten worse.
There is still no word on the server logs we need… Just a great big sucking silence coming from Texas.
For months they’ve done nothing– including clearing up our billing issues– and the one time they do do something for their own mysterious reasons, it shatters like a clay pot. And instead of springing into action to help clean up the mess they created– they revert back to a wall of silence and inaction while our site gets more and more unstable.
I can’t call them “worst hosts ever,” I’m sure there are companies out there that have done worse by their clients… possibly gotten them killed… but certainly the worst I’ve dealt with– and I’ve had a couple projects that went through hosting hell.
Urge to be nastily unprofessional… rising.
Mar. 23rd, 2009
02:48 pm - 25,000 Pages of Pain
The Transformers Wiki suffered a serious database fault on March 15th. It’s been 9 days.
We’ve had a tech-guy on standby try and reconstruct the database for 7 days. He’s a specialist, he does not come cheap, and he’s doing this on a volunteer basis. Our web-host is dragging its ankles, demanding teleconferences and approvals and all sorts of limp-wristed bullshit rather than give us what we need to try this.
Understand– no action is required on their part. All they have to do is give us access to some raw logs. Or dump us copies. And it’s taken them seven days, with no sign of action on the horizon yet.
I once had a web-host delete a dev site and database– including all back-ups– the only copy of this software we’d spent weeks developing. Totally their fault, they had a brain fart, thought “we didn’t want it anymore” and didn’t bother to back anything up before acting.
They at least had the grace to seem sorry about it.
I really get the impression that our host is simply dragging its feet, based on the logic that if we do enough recovery work– hand-remeshing 9-month-out-of-date files, and de-parsing HTML from internet caches for 25,000 pages… that it simply won’t be worth the effort for us to hold their feet to the fire– escaping the TFWiki’s righteous wrath simply because the diminishing returns offered by continued attempts to wring cooperation out of them will make us lose interest.
That pretty much pisses me off. They may be content to waste our high-priced tech-guru’s time, and our time (he’s already done some preparatory coding for the files they show no signs of giving over,) but I value my own time. And I don’t take kindly to putting in more than a hundred hours trying to re-create work that our host lost due to a series of screw-ups. If I do not see some indication our web-host has exerted themselves on our behalf, I am billing them for my time spent cleaning up their mistakes.
I was ballparking the amount of man-hours sunk into this recovery so far– and I came up with at least $30,000.00 worth of man-hours spent scrambling to recover from this disaster. 25,000 web-page caches tracked down and saved essentially by hand. Custom processing code written in Perl, Python and PHP to facilitate processing. Iteratively rolled-out tools developed in response to the community’s needs, dozens of hours spent scratch-rebuilding recursive templates– and the place still looks like a disaster area.
The reason for the unannounced upgrade that resulted in this disaster (we’re told) was a desire to reduce database load. The community hadn’t noticed anything, but apparently it was such a concern for the host that he went in and spontaneously upgraded our software- “unloading” the database in the process. He didn’t make a backup beforehand, then discovered that our nightly backup procedure (which had been knocking the entire site offline for an hour every night) was somehow misconfigured and not being saved. “Oops.”
Four days after the crash, our host repeatedly reset and then totally killed the site because it was being “hammered” by someone– which happens to line up exactly with one of our users downloading the 1.7gb image backup. That transfer maxed out at 150kb/s. Our host’s response was to kill the site for 6 hours while it was “moved to another server–” yet the Wiki continues to bog down at the same time as the sites it used to share space with. VPS? Maybe, but then why were DNS entries changed?
We really have no idea what setup TFWiki is running on right now. We were told we had dedicated hardware and pipe- but it behaves for all the world like a virtual machine on a throttled connection. A miswired, badly-configured non-load-balancing connection.
The site has been growing more unstable over the last several days, with database timeouts and network brownouts lasting several minutes occurring almost hourly. Part of this is probably due to traffic– all those people putting in time frantically trying to get the site up and running again. But it’s been a week, shouldn’t the host be able to adapt to this?
This whole thing stinks. We’re having smoke blown in our faces, and nothing feels right. I’m growing increasingly convinced that what we were sold when we signed up, and the actual set-up we were running on were two very, very different things. Most people only use a tiny fraction of their hosting capacity– we were sold the promise of a powerhouse web-host we could grow into, but after 9 months it’s feeling more and more like a 4 cylinder Geo Metro. And I’m no longer willing to assume in good faith that there’s a V-8 hiding in there, given the mounting evidence to the contrary.
The really disgusting thing is, we’d have installed our own backup extensions by now if we were given proper access. Despite 9 months of requests we were unable to get FTP access set up for one of our people.
It is a measure of how pissed off I am that I am not mentioning our host’s name. The point is not to drag their name through the mud– that would be juvenile and pointless. They have to do business under that name. This is about me venting, not trashing them in a way that will lead back to them in Google. I have been disappointed on basically every level of professional conduct by these guys. While I can’t imagine any circumstance under which TFWiki would choose to stay with this host… it is my hope that they learn from this experience, and become a better host for other people in the future.
I know who TFWiki was sharing our un-backed-up server with. Some of those people make their livelihoods by that content. The Wiki is at least mostly recoverable- with dozens of people putting literally thousands of hours into it- any individual losing all their stuff to this bungling wouldn’t have had that option! Let this be a learning experience for our host, so that this never happens again.
Remember the host I mentioned who once deleted a live dev-site with 3 weeks worth of our work? You better believe they learned from the mortification of that experience. I, on the other hand, had learned the lesson beforehand: I download a full mirror of the site every morning, and a backup of the database every 3 days, because I don’t trust people.
Walking into work that morning and seeing the ashen faces of my co-workers as they explained everything was gone– then being able to tell them I had a complete backup on my desktop just out of force of habit? That is a precious memory.
We are all professionals here, let us act like it.
And as a professional, the 112+ hours I’ve clocked in so far cleaning up their mess have a cash value of $2240.
Mar. 16th, 2009
09:46 pm - The Second Law of Thermodynamics and SQL
Late Monday TFWiki.net (the Transformers wiki) suffered a serious database fault in the midst of what should have been a routine software upgrade, completely wiping out about 9 months worth of data and fragging the user accounts.
Causes
The root cause of the fault remains obscure. It was accompanied by a spate of mass-vandalism. TFWiki attributed the problem to the “Bookworm Virus,” and a message was posted on an “underground hacker website” claiming responsibility for the hack.
The actual chain of events is more ambiguous. While the database crash and subsequent vandalism may have external causes, there are complicating factors that rest squarely with the TFWiki host. He failed to make a database backup prior to upgrading the software, so when the database imploded, there was nothing to revert back to. The next obvious choice– revert to one of the regularly-performed backups– failed when an increasingly-frantic ransacking of the host’s archives revealed that there were no regularly-performed backups; the TFWiki server hadn’t been set up for them. Oops.
(This is perhaps more a matter of concern for members of the Blank Label Comics whose Webcomic archives and subscription databases are hosted on the same server and presumably also not being backed up.)
This follows the Transformers Wiki’s semi-tradition of unfortunate timing: Wikia’s buggy Monaco “upgrade” was rolled out during the 2008 Transformers convention (while all the admins with the power to fix things were gone, but also a peak-traffic time), they migrated to their new host (located in Houston) just as Hurricane Ike was causing rolling blackouts throughout Texas, and the Bookworm Virus struck right after the TV-movie that kicked off Transformers Animated’s final season.
Effects
“Yesterday is yesterday, if we try to recapture it, we will only lose tomorrow.”
–Some Guy
TFWiki is moving on. With the prospects of any sort of database backup turning up looking increasingly remote, TFWiki is digging in, scraping the Google cache to recover page text (a process that even with scripts helping will take between 2 and 6 days) and resigning themselves to reformatting by hand more than 8000 pages of raw html.
Myself, I’m weeping, because I’m resigning myself to reformatting the templates.
It would be one thing if the loss was total– then you could give up. But the job ahead– recovering 9 months worth of work by hand– is simply dishearteningly large. All the King’s Horses and all the King’s Men will put Humpty Dumpty back together again. …at a cost of time and effort roughly equivalent to the original construction.
“In a system, a process that occurs will tend to increase the total entropy of the universe.”
–The Second Law of Thermodynamics
Order does not spontaneously arise from disorder. Ice cream doesn’t stay frozen in a hot room, piles of cards do not snap up into neat houses, and eggs do not spontaneously unscramble themselves.

And yet… while entropy will never decrease in a closed system, it can sometimes decrease locally. Like the unscrambling egg, this is merely phenomenally unlikely, not impossible.
Most of my friends call me a “tech-guy.” The dirty little secret is… every tech-guy has a tech-guy he calls for advice when he runs into something that’s out of his depth. Mine is Andrew Burton, he’s one of those guys who blogs about programming languages for fun. I wanted advice on customizing Warrick, a cache-scraper script written in Perl. Like all good tech-guys, Andy listens for about 5 minutes before veering off on a tangent and asking if we have access to the server’s SQL logs.
In the past Andy has, in desperate straits, had to re-create a deleted database without, well… a database, or a backup or export of any sort. This involves writing custom filters for the Server’s SQL activity logs and rebuilding it forward in time, piece by piece. Where a normal “page export with history” has dozens of revisions per page– this method has dozens of queries per revision. Reassembled in such a fashion, like building a human being from a map almost molecule by molecule, TFWiki’s 500,000 page revisions for 27,000 pages will translate into tens of millions of individual SQL queries. If every file is in place and intact, and every filter properly coded… this method can be used to grow a new functioning database architecture, atom by atom, from… nothing really. And this has worked for him in the past.
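The core of that log-replay idea can be sketched in a few lines. This is only an illustration under loud assumptions, not Andy’s actual filters: the log entry shape ({ time, sql }) and the function name replayableQueries are made up here, and a real server log would need its own parsing before it looked anything like this.

```javascript
// Hypothetical sketch: pick out the queries that change data and put them
// in chronological order, ready to be run against an empty database.
// Assumed entry shape: { time: Number, sql: String }.
function replayableQueries(entries) {
  var writes = /^\s*(INSERT|UPDATE|DELETE|REPLACE)\b/i;
  return entries
    .filter(function (entry) { return writes.test(entry.sql); })
    .sort(function (a, b) { return a.time - b.time; })
    .map(function (entry) { return entry.sql; });
}

// Example with made-up log entries:
var sampleLog = [
  { time: 3, sql: 'UPDATE page SET page_len = 10 WHERE page_id = 1' },
  { time: 1, sql: 'INSERT INTO page (page_id) VALUES (1)' },
  { time: 2, sql: 'SELECT * FROM page' }
];
console.log(replayableQueries(sampleLog));
```

Reads are dropped because they never changed the database. Everything else has to run in its original order, which is why a single missing or corrupted log file can sink the whole attempt.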
I give it about 1 in 3 odds of success. It pretty much depends on everything being perfect and on prying the logs out of our less-than-impressive host.
A 33% shot of seeing an egg unscramble is pretty good odds.
So yeah. Andrew Burton. He performs thermodynamic miracles.
(I’ll let you know how this one turns out.)



