The Life of …

January 18th, 2009

There was an interesting article in Le Monde today referring to a piece published in Le Tigre, an independent French magazine. The article in Le Tigre was simply the reconstructed biography of “Marc L” – a person, the paper claimed, chosen at random on the Internet using Google and data collected from social websites.

They posted a number of details about him such as his age, sexual preference, the schools he attended, the music he listened to, and his friends and partners over the last few years… It reads like a cross between the people section of a newspaper and a Wikipedia entry.

The persistence of information

All the information was legally obtained, since it was publicly available on the Internet – although they claim that many details were removed at his request. While we’re all aware (to varying degrees) of the trails we leave on the Internet, it is easy to forget that information we thought was transient is still there and can be collected to build a portrait of us and extract information about us.

A conversation we had on a forum five years ago may still be there somewhere; so may unflattering photos posted by friends. Since the emergence of social websites, there have been many articles on the subject and on the impact of leaving too much information on these sites.

But as data becomes better organised and more searchable, yet no easier to remove, we have all the more reason to be careful about what we say on the Net. Let’s not forget Google’s mission to organise the world’s information and make it universally accessible. Information is not transient.

A false sense of privacy

An email is no more secure than a postcard, as two legal secretaries in a Sydney law firm were painfully reminded a few years ago when their incendiary email exchange was forwarded to pretty much everyone in the firm before appearing in overseas newspapers.

But the so-called “Generation Y”, who like to collect hundreds, if not thousands, of friends on Facebook or MySpace and maintain personal blogs, do not seem to mind. In fact, I am surprised by how much information people are willing to share with others — strangers and friends alike.

It seems that a lot of people are lulled into a false sense of privacy, not realising how public the information they publish really is, and how much of it could be used against them.

Identity theft

Given that most of the information behind the typical “security question” can be gleaned from the Internet, we are increasingly vulnerable to identity theft, especially since a lot of people still use basic passwords, and generally the same one, to access all their accounts.

Anyone’s personal information is, as marketers would put it, “at your fingertips”, so new techniques are going to be needed to protect us from identity theft. And I hope that the options will be better than coming up with passwords that include combinations of symbols and numbers which are impossible to remember, or providing even more personal information about ourselves.

The future

All this makes me wonder how this will evolve.

It is not hard to imagine online reputation management software emerging to help people clean up their traces, perhaps optimising their friends and links to improve their online identity, or removing what should not be there.

On the other side, increasingly sophisticated automated online portraits could be built for marketers and recruiters – primitive versions already exist. Identity thieves could even start harvesting photos of us once face recognition becomes more widely available.

So should we learn not to disclose too much information about ourselves, just like we learned not to undress in front of an open window? Or should we get used to watching the neighbour walking around naked?

Afterword: Internet speed

December 22nd, 2008

Following my post on Australia’s Internet filter and looking around for other opinions on the subject, it seems that a lot of people were concerned about the speed of the Internet here and the potential network performance degradation once a filter is in place — especially after the results of the ISP-level Internet content filtering laboratory trials were released in July 2008.

Looking at the 2008 broadband rankings published by the ITIF last June, international comparisons do not place Australia all that well at the moment when it comes to broadband speed. When I visited Japan, France and Finland last year, I noticed that the Internet was significantly faster. Here are the numbers that show why…

[Chart: average download speed in Mbps per country]

Admittedly, Finland and France have amongst the fastest broadband speeds in the world — 21.7 Mbps and 17.6 Mbps respectively; not to mention Japan with a whopping 63.6 Mbps. At these speeds, even if these countries were to put in place the worst ISP-based filter in terms of performance degradation (87%), their broadband speed would still be faster than in Australia (1.7 Mbps).
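
To spell out the arithmetic (a back-of-the-envelope check, assuming the worst-case 87% degradation applied uniformly to these averages): France would drop to 17.6 × (1 − 0.87) ≈ 2.3 Mbps, Finland to 21.7 × 0.13 ≈ 2.8 Mbps, and Japan to 63.6 × 0.13 ≈ 8.3 Mbps; all of them still comfortably above Australia’s unfiltered 1.7 Mbps.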

Hopefully, the National Broadband Network will address the performance issue, but in the meantime we have to deal with sluggish connections, hoping that an Internet filter will not make them even slower.

Using a GA to report differences in XML

December 5th, 2008

A few years ago, we had to implement a simple difference tool for XML.

We had a very specific need and did not want to include anything too sophisticated in our project. We only needed to report the differences between two XML documents in various ways, and in XML format, so that we could highlight them or do something about them.

We had a look at a few commercial libraries such as Delta XML, but at the time, none suited our needs, either because of the licensing terms or because the tool did not report what we were after. So we developed our own, and decided to open source it, just in case someone else might be interested.

This led to the development of DiffX, a Java API for comparing XML documents. Instead of using complex tree algorithms, we decided to tackle the problem differently by viewing an XML document as a sequence of events and analysing the differences.
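
To give a rough idea of what “a sequence of events” means here, the sketch below flattens two documents into comparable event lists using the standard StAX API. It is purely illustrative: the event names and the use of StAX are my own simplification, not DiffX’s actual event model or API.

    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    /** Flattens an XML document into a list of simple event strings (illustrative only). */
    public class XMLEventSequence {

      public static List<String> toEvents(String xml) throws Exception {
        List<String> events = new ArrayList<String>();
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
          switch (reader.next()) {
            case XMLStreamConstants.START_ELEMENT:
              events.add("open:" + reader.getLocalName());
              break;
            case XMLStreamConstants.CHARACTERS:
              if (!reader.isWhiteSpace()) events.add("text:" + reader.getText());
              break;
            case XMLStreamConstants.END_ELEMENT:
              events.add("close:" + reader.getLocalName());
              break;
          }
        }
        return events;
      }

      public static void main(String[] args) throws Exception {
        // Two slightly different documents become two directly comparable sequences.
        System.out.println(toEvents("<p>Hello <b>world</b></p>"));
        // [open:p, text:Hello , open:b, text:world, close:b, close:p]
        System.out.println(toEvents("<p>Hello <i>world</i></p>"));
        // [open:p, text:Hello , open:i, text:world, close:i, close:p]
      }
    }

Once both documents are expressed as flat sequences like these, the comparison reduces to a classic sequence diff, which is what keeps the approach simple.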

The idea worked great for our purpose, and DiffX has been used in production in several software packages since then, including PageSeeder.

The project is currently hosted on Topologi’s website, can be downloaded from SourceForge and is distributed under the very lenient Artistic Licence.

But the algorithm we use is memory-hungry, which makes DiffX unsuitable for large documents, as it builds a large matrix whose dimensions depend on the size of each document.
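
To put rough numbers on this (an illustrative back-of-the-envelope estimate, not a measurement of DiffX itself): a comparison matrix for two documents of 30,000 events each has 30,000 × 30,000 = 900 million cells, which is already in the order of 3.6 GB at 4 bytes per cell. Memory use grows with the product of the two document sizes, so large documents quickly become impractical.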

As it solved our problem, we didn’t give it much thought afterwards and the project fell into neglect. But recently, I stumbled upon JGAP, a good Genetic Algorithm library, and realised that this could be an interesting way of approaching the problem.

So I have decided to resurrect DiffX. Rick Jelliffe’s first reaction was “How fun!”. Hopefully, it will be. In the meantime, I will do some brushing-up of the code to use Java 5, remove deprecated classes, etc… Then, if a GA is indeed suitable for our problem space (and if time permits!), I will try to work on some new algorithms.
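
To sketch how a GA might be applied (this is only a rough illustration of the encoding idea, written against my reading of the JGAP API; the toy event sequences and the fitness function are stand-ins, not anything DiffX actually does): a candidate solution could be an edit script in which each gene picks an operation, and the fitness rewards scripts that align both event sequences with as few edits as possible.

    import java.util.Arrays;
    import java.util.List;
    import org.jgap.Chromosome;
    import org.jgap.Configuration;
    import org.jgap.FitnessFunction;
    import org.jgap.Gene;
    import org.jgap.Genotype;
    import org.jgap.IChromosome;
    import org.jgap.impl.DefaultConfiguration;
    import org.jgap.impl.IntegerGene;

    /** Toy GA encoding of an XML diff: each gene is one edit operation. */
    public class GADiffSketch {

      // Toy event sequences; in practice these would come from parsing the two documents.
      static final List<String> A = Arrays.asList("open:p", "text:Hello", "open:b", "text:world", "close:b", "close:p");
      static final List<String> B = Arrays.asList("open:p", "text:Hello", "open:i", "text:world", "close:i", "close:p");

      /** Rewards edit scripts that consume both sequences with many matches and few edits. */
      static class DiffFitness extends FitnessFunction {
        protected double evaluate(IChromosome c) {
          int i = 0, j = 0, matches = 0, edits = 0;
          for (int g = 0; g < c.size(); g++) {
            int op = (Integer) c.getGene(g).getAllele(); // 0 = match, 1 = delete from A, 2 = insert from B
            if (op == 0 && i < A.size() && j < B.size() && A.get(i).equals(B.get(j))) { matches++; i++; j++; }
            else if (op == 1 && i < A.size()) { edits++; i++; }
            else if (op == 2 && j < B.size()) { edits++; j++; }
            else { edits++; } // wasted operation
          }
          boolean complete = (i == A.size() && j == B.size());
          return Math.max(1, matches * 10 + (complete ? 50 : 0) - edits);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new DefaultConfiguration();
        conf.setFitnessFunction(new DiffFitness());
        // One gene per possible edit step; A.size() + B.size() is an upper bound on the script length.
        Gene[] genes = new Gene[A.size() + B.size()];
        for (int k = 0; k < genes.length; k++) genes[k] = new IntegerGene(conf, 0, 2);
        conf.setSampleChromosome(new Chromosome(conf, genes));
        conf.setPopulationSize(100);

        Genotype population = Genotype.randomInitialGenotype(conf);
        for (int gen = 0; gen < 50; gen++) population.evolve();
        IChromosome best = population.getFittestChromosome();
        System.out.println("Best fitness: " + best.getFitnessValue());
      }
    }

The appeal of this kind of approach is that memory use grows with the population size rather than with the product of the two document sizes; whether it can actually compete with the matrix-based algorithm is precisely the question I would like to explore.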

More to be posted on this…

Australia’s Internet filter

November 15th, 2008

After a year-long trip that took me to countries such as Syria, Egypt and China, where Internet censorship keeps a whole bunch of public servants busy, I was certainly not impressed to learn that Australia was going ahead with an Internet filtering system and that tests were under way. In fact, I originally thought it was a joke! But I wasn’t sure whether I should be amused or concerned.

From my experience of browsing the Web in these countries, the task is futile. For one, censors – computers or humans – are always too late; they simply cannot keep up with the amount of information out there. Even when they do, there are always so many ways around it that I am not sure it is even worth trying. In Syria, it took less than a minute for the guy minding the Internet café in Damascus to give me access to Facebook (blocked at the time) via one of the numerous proxies they use.

If people go to the trouble of looking for “inappropriate content”, they will also be savvy enough to find a proxy or set one up. I bet that as soon as the Internet filter is declared effective, there will be just as many ways to circumvent it posted on the Net. And there will always be ways around it, for the simple reason that secure protocols and cryptography cannot be made illegal – otherwise it would be the end of secure banking and e-commerce, not to mention a serious encroachment upon our privacy.

Adverse effects

Without even going into the ethical aspects, I can think of many other adverse effects to an Internet filter scheme:

  • False positives – how many legitimate sites will risk being blocked?
  • Internet speed, already below the standard of comparable countries – how will this affect the performance of the Internet?
  • The black list, even if kept secret – will it risk becoming a reference list for people looking for such content?
  • Censorship creep – how can we ensure that the black list will not be used for blocking other sites? How do we ensure that the scope of the censorship scheme will not be extended to cover other areas which have nothing to do with protecting children from harmful content?
  • More monitoring is less monitoring – will it drive people accessing “illegal content” to tighten their security and anonymity?
  • Blame shifting – How can someone clear their site if it has been a victim of an attack which inserts links to “illegal content”? Would the same happen if your site shares the same IP as a site that has been blocked?
  • And let’s not forget that there is nothing more appealing for teenagers than something they are not allowed to do or see…

Back when I travelled in Syria, I remember meeting a programmer who also made a living out of software piracy – quite common there due to the US export ban. He explained that most of his work consisted of removing the potential security threats from hacked programs so that he could sell them. Without going at length into computer security, these threats are designed to infect computers so that they propagate links to porn sites, poker sites and the like; legitimate websites are turned into hosts for less legitimate content, thereby shifting the blame and risk onto people who have nothing to do with it.

A friend of mine was recently blackmailed by an attacker who launched a DoS attack and wanted to include hidden porn and online gaming links on my friend’s portal (which serves thousands of blogs). The site was down for a few hours with all the engineers working frantically to repel the attack.

These two examples are certainly not an excuse for not taking security seriously, but as any administrator knows, anyone can be a victim of these kinds of attacks and end up with links to inappropriate content. When you have been battling with an Internet attack, I doubt that you’re in the mood for battling with ACMA to try to get your site removed from the black list.

The Finnish experiment

It is often mentioned that the scheme will cause Australia to join the ranks of North Korea, Burma, Iran, China, Cuba, Belarus and Syria – states which are hardly known for their progressive policies. It is true that when a country finds itself in such company, it is generally a sign of bad policy, even if I trust the Australian government a whole lot more than the Syrian one.

But it is interesting to know that Finland has also experimented with an Internet filter, and the results seem less than conclusive. This article on Finnish Internet censorship by EFFI, a Finnish online civil rights organisation, outlines several of the shortcomings, as does this one. The Finnish scheme seems to have shown several of the side effects that I have listed above: some legitimate sites were blocked; the black list circulated on the Net; and there were plans to include more sites for other reasons (such as hate speech, breach of copyright and online gambling), which were not in the scope of the original law.

It is worth noting that the Finnish scheme does not seem to have led to any arrests in relation to child pornography: since most of the blocked sites are outside Finland, they are beyond the reach of Finnish law. The law, however, as is the case in Australia, is sufficient to prosecute offenders regardless of the presence of an Internet filter.

Little debate, a lot of expense

All that said, I haven’t been able to find much reliable information about the Australian plan apart from the fact that $125.8 million will be dedicated to cyber-safety over the next four years. I received an email from GetUp, but it contained no link to the relevant information. Apart from a couple of articles and opinion pieces from the BBC and ABC, I could not find much detail about the plan either, though a lot of people have commented on the subject in the blogosphere.

I tend to believe that there must be more efficient and creative ways of fighting the distribution of inappropriate content online. No-one denies that the Internet brings new challenges when it comes to access to information, but I am sceptical that ISP filtering is an effective way to address the issue.

It will be interesting to see how this debate evolves in Australia, but I wish there was more information on the scheme and a little more debate in the public sphere, before large sums of money are committed to a plan that is legally tricky, ethically debatable and technically impractical.