Resources to Create Scientific Works
Theses at TU Dortmund
How do I write a Bachelor's/Master's thesis?
- Read and memorize the faculty's general guidelines for theses
- Choose a topic
- Write an exposé (you can use the LaTeX template for theses)
- Ask your supervisor to register the thesis
- Pick up the routing slip (Laufzettel) from Ms. Schiller at the examination office (Prüfungsverwaltung)
- Have your supervisor sign the routing slip, then hand it in there again
- If applicable, register for and give an introductory talk
While writing:
- Attend the seminar for diploma and doctoral students (Diplomanden- und Doktorandenseminar)
- If you need a lot of computing power: besides the large compute nodes for students (e.g., plutonium), you may also apply for an account on one of the chair's servers. There is also the LiDo cluster.
How do I submit the thesis?
- Write the thesis according to the template
- A list of figures, tables, or algorithms is not necessary
- Before submitting, be sure to check the bibliography
- Pick up the cover sheet at the dean's office (Dekanat)
- Print the thesis and bind it together with the cover sheet
- Submit the thesis at the dean's office. Outside the dean's office hours, the thesis can be handed in at the control room (Leitwarte).
See also the chair-wide regulations
General Rules
Spell Checking
- Use automatic spell checking, e.g. aspell!
- Ask your mom or girlfriend/best friend to proofread, if applicable.
You have to check manually:
- duplicated words like 'and and'
- tenses, etc.
Style
- keep sentences as short as possible, as eloquent as necessary
- Write in active form:
- bad: “One creates a tree …”
- bad: “A tree is created for…”
- good: “We create a tree…”
- write output-driven/result-oriented, not like a diary or a logbook
- I want to see what your motivations are and why you write this
- do not talk about unnecessary things that I can look up on Wikipedia (e.g., historic facts, what a red-black tree is…)
- avoid inner clauses:
- bad: 'Exporting data from the main memory to the hard drive is, due to the lower access rates, very inefficient.'
- good: 'Due to the lower access rates, exporting data from the main memory to the hard drive is very inefficient.'
- Footnotes must not contain essential information that is needed to understand the text. Footnotes should be used sparingly.
- When measuring time/space, always add units. Is it O(n) time or bits? If it is space, is it counted in bits, bytes or words?
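A worked example of why the unit matters, assuming 64-bit machine words (an assumption for this illustration only; the word size depends on the machine or model you analyze):

```latex
% Assumption: one machine word = 64 bits
\[
  n \text{ words} = 64n \text{ bits} = 8n \text{ bytes},
  \qquad n = 2^{20}:\quad 8 \cdot 2^{20} \text{ bytes} = 8\,\text{MiB},
\]
\[
  \text{whereas } n = 2^{20}:\quad 2^{20} \text{ bits} = 2^{17} \text{ bytes} = 128\,\text{KiB}.
\]
```

So "O(n) space" can differ by a factor of 64 depending on whether bits or words are meant.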
Notation for Stringology Articles
- σ : alphabet size
- Σ : alphabet
- T : the main text
- n : the length of T
- P : a pattern
- m : the length of P
- ℓ : length, or leaves in trees
- i,j,k for counting variables or positions
- lowercase (Latin or Greek) letters for numeric variables
- uppercase letters for strings, substrings, factors, arrays, and data structures like heaps
- Q for queue, H for heap, T for the main text
- When bounding the domain of an integer, prefer ≤ to <, i.e., instead of 0 < i < 5, write 1 ≤ i ≤ 4.
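A short LaTeX sketch of how these conventions might combine in a definition; the sentence itself is only an illustration, not taken from a particular article:

```latex
Let $T[1..n]$ be a text over an alphabet $\Sigma$ of size $\sigma = |\Sigma|$,
and let $P[1..m]$ be a pattern with $1 \le m \le n$.
We say that $P$ occurs at position $i$ of $T$ if $T[i..i+m-1] = P$,
where $1 \le i \le n - m + 1$ (instead of $0 < i < n - m + 2$).
```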
Typography
- always use the same font family
- prefer suitable highlighting by setting the font in italic, bold, teletype, or sans-serif instead of using quotation marks “”
- highlight keywords when they are introduced, e.g., with \emph{…} or \textbf{\emph{…}} or with color.
- “Hard-coded” letters or example strings are usually put in \texttt{…}
- normal text in the math-environment is put into \text{…}
- If you add a citation reference at the end of the sentence, add it before the full stop. Like: This is always true [4].
- Put function names in a text-environment like \textup{…} or \textsl{…}
- Paragraphs should contain at least two sentences. A sequence of one-sentence paragraphs looks very choppy.
- Add some text between a \section{…} and its first \subsection{…}
Figures, Pictures and Tables
- It is appreciated if you draw pictures on your own. If not, always cite where you copied the material from. There is no exception when the material is public domain, open source, etc.
- Pictures support your explanations, so add them when applicable.
- Use only vector graphics, or if unavailable, use raster images with high resolution
- Check with a grayscale printer whether your pictures still look good
- Your picture descriptions should not contain information about the coloring used
- instead of taking plain red, blue, green, or yellow, use one of the plenty of available color schemes (e.g., Google color or ColorBrewer)
- Captions should be written in a unified form. For instance, start with the title of the figure (i.e., not a whole sentence), optionally followed by full sentences explaining the figure. Be consistent with the full stops in the caption.
- Only tables showing real data (like time/space in experiments) are labeled as “Table”. A table used as a form of visualization is still a “Figure”.
Pseudo Code
- algorithm2e with \usepackage[linesnumbered,ruled,vlined]{algorithm2e} can produce nice, compact pseudo code
- it is possible to add colored syntax highlighting to your pseudo code
- Pseudo code should not replace the description of an algorithm. An algorithm should be described in such a way that it is understandable even if the pseudo code is missing. Pseudo code is only an additional aid for understanding what is going on.
References
- Every reference put in the list of references should appear in your main text. Use them where they apply!
- Each reference needs a minimum set of properties, based on the media:
- Websites: URL, title, authors (if known), the date on which you accessed the website (a formal date)
- Proceedings: authors, title, conference/journal name, year, pages, publisher, series (if available), volume (if available).
- Book: authors, title, year, publisher, edition.
- All references have to be unified, including
- Author names (order of first and family name)
- Full name or abbreviation of a conference
Examples
For a conference article, use inproceedings:
- title: make sure that proper names (e.g., of persons) are capitalized (achieved by putting {} around their first letters)
- booktitle: the conference name or its abbreviation. Long version: “In Proceedings of …”
- if the proceedings appear in a series (like LNCS of Springer), add series and volume
For a journal article, use article:
- journal: the journal name or its abbreviation
- additionally, use the number entry
@inproceedings{lzciss,
  author    = {Johannes Fischer and Tomohiro I and Dominik K{\"{o}}ppl},
  title     = {{L}empel-{Z}iv Computation in Small Space ({LZ-CISS})},
  booktitle = {Proc.\ CPM},
  publisher = {Springer},
  pages     = {172--184},
  series    = {LNCS},
  volume    = {9133},
  year      = {2015}
}

@article{cohen10fast,
  author  = {Hagai Cohen and Ely Porat},
  title   = {Fast Set Intersection and Two-Patterns Matching},
  journal = {Theor.~Comput.~Sci.},
  volume  = {411},
  number  = {40--42},
  pages   = {3795--3800},
  year    = {2010}
}
Please obey these rules strictly. Most free citation services like Google Scholar, DBLP, CiteSeer, etc.,
- omit attributes
- do not write “Proc.\ ” or “Proceedings” for the proceedings of a conference
- add additional attributes like DOI, URLs, month, or the names of the editors
LaTeX
- Citations
- bad 'like \cite{coolGuy}'
- good 'like~\cite{coolGuy}' (the tilde is a non-breaking space that prevents a line break between the word and the citation)
- better '\citet{coolGuy} said that' with the natbib package
- References
- bad 'see Fig. \ref{coolFigure}'
- good 'see Fig.~\ref{coolFigure}' (the tilde prevents a line break between "Fig." and the number)
- better 'see \cref{coolFigure}' with the cleveref package
- Images
- use TikZ to draw your images directly in LaTeX
- use Ipe or Inkscape for easy vector graphics drawing
- Typography
- use math-environment for variables
- The difference between \(A_v\) and \(A_{\text{v}}\) is that the former is a variable *parametrized* by v, whereas the latter is a variable that has a v in its subscript as part of its name.
- If a variable is called foo, then write \(\text{foo}\) in math mode instead of simply \(foo\), because \(foo\) looks like the product of f, o, and o.
- Keywords can/should be highlighted with \emph{.} or \emph{\textbf{.}}
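The typography rules above in a small LaTeX snippet; the names A, v, and foo are just the placeholders used in the text:

```latex
% v is a variable: A_v is one member of a family, parametrized by v
\( A_v \)
% v is only part of the name: upright, not a variable
\( A_{\text{v}} \)
% multi-letter names: typeset upright as one word
\( \text{foo} \)  % good
\( foo \)         % bad: italic f, o, o read like a product of three variables
```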
Git
We can support your thesis with a git repository. Just send me your public SSH key so that you can access your freshly created repository. If you intend to write a thesis about compression, think about using our compression framework. We maintain our framework at the ITMC's repository hosting service. You need to register at this site first before I can add you as a member.
Supplementary Material
About German
By choosing German as your language, you have opted for the harder way of writing your thesis. Besides an abundance of difficult grammar rules (among others, case, verb conjugation, and pronoun declension), it is also quite a challenge to write a scientific document that is based on reference material written purely in English. Below are some peculiarities of the language, adapted to scientific writing:
Handling English technical terms
- Bad: 'das Sparse Suffix Sorting'
- 'Idiotenleerzeichen' (an erroneous space inside a compound word)
- Which grammatical gender do English nouns have?
- Bad: 'die spärliche Suffix Sortierung'
- translates a commonly used English technical term
- Good: 'die Sparse-Suffix-Sortierung (\textit{sparse suffix sorting})'
- compound word ending in a German noun → grammatical gender is known
- technical term in italics in parentheses
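Write precisely and concisely
- bad: 'Die Zahlen stellen die Indices der Suffixe dar.' (The numbers represent the indices of the suffixes.)
- good: 'Die Zahlen sind die Indices der Suffixe.' (The numbers are the indices of the suffixes.)
- Avoid 'das heißt zum Beispiel dass' (that is, for example, that)
Vague expressions
- bad: 'A stellt B dar' (A represents B), when what is actually meant is 'A ist B' (A is B) or 'A ist ein Repräsentant von B' (A is a representative of B). 'A stellt B dar' means that there is a projection π under which π(B) = A.
- good: 'A ist B'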
Common mistakes
- Grammatical gender of 'Suffix' and 'Präfix'
- Definition of 'Reihe'
- Arrays have entries (Einträge), not slots.
- Bibliography with mixed English keywords (apart from the title of the source or of the proceedings, everything must be in the same language, e.g., 'Seiten' instead of 'pages')
Tools
- Duden's online spell checker
Conventions
- Counts from one to twelve are written out in words
Programming
- comments are always in English
- write test-driven
- learn how to benchmark
- statistics like significance, median, arithmetic mean, etc.
- choose the number of runs of an experiment depending on the expected value and the standard deviation of the measurements
- head for large inputs! If your code works with 640 KB, can it also cope with (at least) a single human genome taking approx. 3 GB?
- Learn how to write unit tests and how to log. Do not use the standard output for logging!
- Use large data sets for testing like the Pizza&Chili Corpus
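The statistics mentioned above can be sketched in a few lines of C++. This is a minimal sketch, not a benchmarking framework: all function names are hypothetical, and `runs_needed` uses the common textbook estimate n ≥ (z·s/E)², which assumes roughly normally distributed run times, a standard deviation s estimated from a few pilot runs, a desired margin of error E for the mean, and z = 1.96 for about 95% confidence.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the measured run times (e.g., in milliseconds).
double mean(const std::vector<double>& times) {
  assert(!times.empty());
  return std::accumulate(times.begin(), times.end(), 0.0) / times.size();
}

// Median of the run times. Unlike the mean, it is robust against a few
// outliers, e.g., a run slowed down by another process. Takes a copy to sort.
double median(std::vector<double> times) {
  assert(!times.empty());
  std::sort(times.begin(), times.end());
  const std::size_t mid = times.size() / 2;
  return times.size() % 2 == 1 ? times[mid] : (times[mid - 1] + times[mid]) / 2.0;
}

// Sample standard deviation (divides by n - 1).
double sample_stddev(const std::vector<double>& times) {
  assert(times.size() >= 2);
  const double m = mean(times);
  double sum_sq = 0.0;
  for (double t : times) sum_sq += (t - m) * (t - m);
  return std::sqrt(sum_sq / static_cast<double>(times.size() - 1));
}

// Textbook estimate (assumption: roughly normal run times) of how many runs
// are needed so that the mean lies within +-margin of the expected value
// with the confidence given by z (z = 1.96 for about 95%).
std::size_t runs_needed(double stddev, double margin, double z = 1.96) {
  const double k = z * stddev / margin;
  return static_cast<std::size_t>(std::ceil(k * k));
}
```

For instance, if a few pilot runs yield a standard deviation of 2 ms and you want the mean within ±1 ms, `runs_needed(2.0, 1.0)` returns 16.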
C++
Courses/Material
Tools:
- recommended compilers
- g++ and gdb,
- clang and lldb
- use CMake as a build tool
- find bottlenecks with gprof
- use valgrind to find memory leaks and to analyze memory consumption
- evaluate your memory usage with malloc_count
- use gcov to find dead code (code never executed) and to ensure that your tests cover your source
- for benchmarking your project, use the flags '-O3 -DNDEBUG'; for debugging it, use '-O0 -ggdb' or '-Og -ggdb'. To debug your program together with a library, compile the library with the flags '-O0 -ggdb' as well, i.e., without '-O3'.
- do not use 'cout' for logging! Use glog. For instance, to test for the invariant a < b write the macro DCHECK_LT(a,b).
- write tests with gtest
- benchmark speed with celero
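Why '-DNDEBUG' belongs in the benchmark flags: the standard assert() expands to nothing when NDEBUG is defined (glog's DCHECK macros are likewise active only in debug mode), so invariant checks cost nothing in the benchmark build but still run while debugging. A minimal sketch; `midpoint` and `debug_checks_enabled` are hypothetical helpers for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Reports whether this translation unit was compiled with debug checks.
// With '-O3 -DNDEBUG', NDEBUG is defined and assert() becomes a no-op.
constexpr bool debug_checks_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// The invariant lo < hi is checked only in debug builds, in the same spirit
// as glog's DCHECK_LT(lo, hi). Computing lo + (hi - lo) / 2 avoids the
// overflow that (lo + hi) / 2 could cause for large values.
std::uint64_t midpoint(std::uint64_t lo, std::uint64_t hi) {
  assert(lo < hi);
  return lo + (hi - lo) / 2;
}
```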
Programming Etiquette
- prefer uint64_t to int if you want an unsigned integer taking 64 bits.
- avoid casts (including C-style casts); especially dynamic_cast can take time, since it checks the type at run time
- avoid class hierarchies (like in Java), since downcasting references/pointers takes time
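Why fixed-width 64-bit types matter for large inputs: a text of about 3 GB, like the human genome mentioned above, has about 3·10⁹ positions, which is more than a signed 32-bit integer can hold (at most 2³¹ − 1 ≈ 2.1·10⁹), and `int` is typically 32 bits wide. A minimal sketch; `positions_fit_int32` and `count_char` are hypothetical helpers:

```cpp
#include <cstdint>
#include <limits>
#include <string>

// uint64_t has exactly 64 bits wherever it is provided, whereas int is
// only guaranteed to have at least 16 bits (commonly 32).
static_assert(std::numeric_limits<std::uint64_t>::digits == 64, "uint64_t has 64 bits");

// True if all positions 0..n-1 of a text of length n fit into a signed
// 32-bit integer, whose maximum is 2^31 - 1.
constexpr bool positions_fit_int32(std::uint64_t n) {
  return n <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max()) + 1;
}

// Counts the occurrences of c in text. Index and counter are 64-bit,
// so texts longer than 2^31 - 1 characters cannot overflow them.
std::uint64_t count_char(const std::string& text, char c) {
  std::uint64_t count = 0;
  for (std::uint64_t i = 0; i < text.size(); ++i) {
    if (text[i] == c) ++count;
  }
  return count;
}
```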
Interesting data structure libraries
- Giuseppe Ottaviano's succinct library
- Simon Gog's SDSL lite
- Our lossless compression framework tudocomp
- Nicola Prezza's dynamic bit vector library
Caveats
- std::vector keeps track of two quantities: its capacity (the number of elements it can hold without reallocating, i.e., its physical size) and its size (the number of elements it actually contains). reserve(n) ensures a capacity of at least n without changing the size, whereas resize(n) changes the number of elements. clear() only sets the size to zero and keeps the memory; to free it, call shrink_to_fit() afterwards, which is, however, only a non-binding request.
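A minimal sketch of the difference between size and capacity. The helper names are hypothetical; `release_memory` uses the well-known swap-with-an-empty-temporary idiom, which frees the buffer even on implementations that ignore shrink_to_fit():

```cpp
#include <cstddef>
#include <vector>

// Builds 0, 1, ..., n-1. reserve(n) allocates room for at least n elements
// up front without changing size(), so the push_backs never reallocate.
std::vector<std::size_t> iota_reserved(std::size_t n) {
  std::vector<std::size_t> v;
  v.reserve(n);  // capacity() >= n, size() == 0
  for (std::size_t i = 0; i < n; ++i) v.push_back(i);
  return v;
}

// Frees the buffer of v. clear() alone keeps the capacity, and
// shrink_to_fit() is non-binding. Swapping with an empty temporary hands
// v's buffer to the temporary, which releases it when it is destroyed.
template <typename T>
void release_memory(std::vector<T>& v) {
  std::vector<T>().swap(v);
}
```

Usage: after `auto v = iota_reserved(100); v.clear();`, the vector is empty but still holds its buffer; `v.resize(10)` then creates ten value-initialized elements (zeros), and `release_memory(v)` finally frees the memory.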
Java
- do an introductory course to learn/freshen up your language skills
- learn Maven
Tools for String Analysis
How to prepare slides
- maximize: illustrations, examples, motivation
- minimize/drop: formulae, whole sentences
- I do not recommend using beamer/latex, especially with some standard template.
See also additional tips in German