\documentclass[11pt,twocolumn]{article}
\setlength{\columnsep}{0.5cm}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[english]{babel}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{natbib}

\title{\vspace{-15mm}
	\fontsize{24pt}{10pt}\selectfont
	\textbf{WikiTeam: collaborative preservation of wikis}
	}
\author{
	\large
	\textsc{Emilio J. Rodríguez-Posada, Federico Leva, Luiz Augusto} \\
	\normalsize	WikiTeam \\
	\normalsize	\{\href{mailto:emijrp@gmail.com}{emijrp}, \href{mailto:nemowiki@gmail.com}{nemowiki}, \href{mailto:lugusto@gmail.com}{lugusto}\}@gmail.com
	\vspace{-5mm}
	}
\date{}


\begin{document}


\twocolumn[
  \begin{@twocolumnfalse}

    \maketitle

\begin{abstract}
  Internet users play an increasingly important role in web content creation. There are initiatives and solutions for the digital preservation of the web, including the well-known Internet Archive, but they are inefficient at archiving user-generated content in social networks and wikis. In this article we explore the problems of wiki preservation and the lack of tools to accomplish this task successfully, and we present and assess WikiTeam, the solution that we have built. WikiTeam is a collaborative effort to develop and run software for the digital preservation of wikis. As of January 2014, we have extracted the texts, histories, images and metadata of more than 4,500 stand-alone wikis and several wikifarms, as well as 24\,TB of Wikimedia Commons files. The preserved content represents a huge collection of datasets from the wikisphere, with incalculable historical and research value.
  \\
  \\
  \textbf{Keywords:} web digital preservation, social web archiving, archiving applications and systems

\end{abstract}

  \end{@twocolumnfalse}
  ]

\section{Introduction}

This is a general introduction to web archiving.

Ideas (write yours):
\begin{itemize}
\item brief description and importance of digital preservation, the Internet Archive and the Wayback Machine
\item user-generated content explosion, social networks, wikis, Archive Team, the Library of Congress archiving Twitter
\item Wikipedia (2001); people start to use MediaWiki for their own wikis; wikifarms
\item wikis are not only text and files, but also interesting metadata and histories. The Internet Archive preserves text and files (not all of them, and not always), but it is inefficient at saving histories and metadata
\item as most wikis are free-licensed, there are no legal issues in preserving this content
\end{itemize}

\section{Digital preservation of wikis}

This section focuses specifically on wikis (an area within web archiving).

Ideas (write yours):
\begin{itemize}
\item lack of public dumps/mirrors
\item lost wikifarms (ScribbleWiki)
\item existing software: Wikitravel scripts, oxygenpump %http://wikitravel.org/en/Wikitravel_talk:Database_dump %http://code.google.com/p/oxygenpump/
\item proposals: Urobe
\item other: manual export via Special:Export, or ad-hoc scripts
\end{itemize}


\section{WikiTeam: digital preservation of the wikisphere}

Here we are.

Ideas (write yours):
\begin{itemize}
\item presentation of WikiTeam
\item achievements (statistics, wikifarms, Wikimedia Commons)
\item single backups and backups in batches (launcher)
\item uploads to the Internet Archive (long-term preservation, BitTorrent webseed)
\item how we generate lists of wikis and reuse existing ones (Pavlo list)
\item WikiApiary partnership
\end{itemize}


\section{Conclusions and future work}

Ideas (write yours):
\begin{itemize}
\item summarizing the main results
\item current issues and possible solutions
\end{itemize}


\bibliographystyle{wink}
\bibliography{wikiteam-2014}

\section*{Acknowledgements}


\section*{License}
This work is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/}{Creative Commons Attribution-ShareAlike 3.0 Unported} license.

\end{document}