section

EXECUTE Exercise 4

The proper usage of the script to meet requirements for exercise 4: python sort.py -i INPUT_FILE

Running the script with the above arguments will cause for results to be printed to standard console in the order read from the input file rather than being written to a file and sorted. Using option -o, the output can be written directly to a file (python sort.py -h will also display all the options for running sort.py).

IMPORTANT!

do no use any --sort-fn sorting options with 'ex4' validation to avoid sorting displayed results.

SORTING

We chose to implement the sorting as follows.

A character with a lower ASCII value comes before a character with a higher ASCII value. This was chosen because parts of URLs are case sensitive and we felt it was important to respect that.
If two strings start with the same character, the next character is used to break the tie (repeat as necessary)
If looking at a character past the end of the string, it takes ASCII value 0 (so if you have two strings with the first 5 characters the same, if the first string ends after character 5 and the second doesn't, the first string will always come before the second in the output).

OPTIONS IN MORE DETAIL

We have a python script sort.py in the root directory which takes four arguments.

OPTION 1 - REQUIRED

The first argument, -i, provides the input file name

OPTION 2 - OPTIONAL

The second argument (optional), -o, provides the output file name to write to. If not specified, the results are printed directly to console.

OPTION 3 - OPTIONAL

The third argument (optional), --sort-fn, provides a set of sorting algorithms which can be applied to sort the input file.

mergesort2 (imported)
mergesort
selectionsort2 (imported)
selectionsort
heapsort
nosort
quicksort (imported)
radixsort2 (imported)
radixsort

Each sorting option is associated with the corresponding sort function suggested by the name of the shown option. 'nosort' function (default) is intended to disable sorting and allow ex4 validation to work properly.

OPTION 4 - OPTIONAL

The four argument (optional), --validate-fn, enables the functionality of URL validation and transformation of URLs to canonical form. NOTE! validity of URLs is performed after the URLs have been transformed to the canonical form (output list is in canonical form).

ex4 : (default) runs the output requirement for ex4
all : enables Valid = true|false URLs to pass thorough
valid : enables only Valid = true URLs to pass through
invalid : enables only Valid = false URLs to pass through

STRUCTURE

The algorithms module in this root directory and contains all of the sorting algorithms in their own file. Loading our module also loads all of the individual sorts and allows easy separation of the files while still providing a simple interface. Url validation is also it own module to maintain functionality separation.

TESTING

Unit tests are available in the tests/ folder: python tests/test_algorithms.py

Unit test are included for cover

URL validator
URL canonicalizer
URL comparators (<, >, ==)

OTHER

Using the files we provide, one working command would be: python sort.py -i test_input.txt -o output.txt --sort-fn mergesort You can then test the correctness of the sorting by calling: diff test_output.txt output.txt

The output file is created at the path OUTPUT_FILE relative to the directory from which the script was run. For example, in our sample command, it would be located in the current working directory. If you run the script from .. relative to the sort.py script and provide -o other/directory/output.txt, it will write the file to ../other/directory/output.txt relative to the sort.py script.

ALGORITHM CHOICES

Our choice for the O(n^2) algorithm was selection sort. We felt this was the simplest algorithm to use and was very simple to implement in Python. For the O(n log n) algorithms we included heapsort and mergesort, both of which have a worst case runtime within the specified parameters. Python had built in functions that made heapsort easy to use, and many of us have experience with mergesort and quicksort, but we opted to use the former as quicksort would have been much more complicated. The O(n) algorithm was radix sort, which uses buckets for each possible ASCII value to sort the URLs. Radix sort is a specialty sorting algorithm that takes advantage of the fact that the elements being sorted can be broken up into singular chunks that are limited in number (in this case one of 255 characters). This breaking up into singular chunks results in k passes, where k is the number of chunks in the longest element. If n >> k, the algorithm is essentially O(n). However, this is not always the case so the true worst case of this algorithm is O(kn). For all these algorithms we assume that the URLs use proper ASCII values and do not contain other symbols.

The documentation for our URL validation and canonicalization design may be found at https://github.com/Full-House-UW/section/wiki/URL-Validation-and-Canonicalization.

Group Division:

Corneliu, Brandon - Merge sort
Jeff - Script setup and design
Conor - Radix sort
Janette - Heapsort
Nat - Testing
Shahaf - Selection sort
Shahaf & Janette - Coordination

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
algorithms		algorithms
tests		tests
.gitignore		.gitignore
README.md		README.md
sample_inputs.txt		sample_inputs.txt
sort.py		sort.py
test_input.txt		test_input.txt
test_output.txt		test_output.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

section

EXECUTE Exercise 4

IMPORTANT!

SORTING

OPTIONS IN MORE DETAIL

OPTION 1 - REQUIRED

OPTION 2 - OPTIONAL

OPTION 3 - OPTIONAL

OPTION 4 - OPTIONAL

STRUCTURE

TESTING

OTHER

ALGORITHM CHOICES

About

Releases

Packages

Languages

cmoiceanu/section

Folders and files

Latest commit

History

Repository files navigation

section

EXECUTE Exercise 4

IMPORTANT!

SORTING

OPTIONS IN MORE DETAIL

OPTION 1 - REQUIRED

OPTION 2 - OPTIONAL

OPTION 3 - OPTIONAL

OPTION 4 - OPTIONAL

STRUCTURE

TESTING

OTHER

ALGORITHM CHOICES

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages