Skip to content

cmoiceanu/section

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

section

EXECUTE Exercise 4

The proper usage of the script to meet requirements for exercise 4: python sort.py -i INPUT_FILE

Running the script with the above arguments will cause for results to be printed to standard console in the order read from the input file rather than being written to a file and sorted. Using option -o, the output can be written directly to a file (python sort.py -h will also display all the options for running sort.py).

IMPORTANT!

do no use any --sort-fn sorting options with 'ex4' validation to avoid sorting displayed results.

SORTING

We chose to implement the sorting as follows.

  • A character with a lower ASCII value comes before a character with a higher ASCII value. This was chosen because parts of URLs are case sensitive and we felt it was important to respect that.
  • If two strings start with the same character, the next character is used to break the tie (repeat as necessary)
  • If looking at a character past the end of the string, it takes ASCII value 0 (so if you have two strings with the first 5 characters the same, if the first string ends after character 5 and the second doesn't, the first string will always come before the second in the output).

OPTIONS IN MORE DETAIL

We have a python script sort.py in the root directory which takes four arguments.

OPTION 1 - REQUIRED

The first argument, -i, provides the input file name

OPTION 2 - OPTIONAL

The second argument (optional), -o, provides the output file name to write to. If not specified, the results are printed directly to console.

OPTION 3 - OPTIONAL

The third argument (optional), --sort-fn, provides a set of sorting algorithms which can be applied to sort the input file.

  • mergesort2 (imported)
  • mergesort
  • selectionsort2 (imported)
  • selectionsort
  • heapsort
  • nosort
  • quicksort (imported)
  • radixsort2 (imported)
  • radixsort

Each sorting option is associated with the corresponding sort function suggested by the name of the shown option. 'nosort' function (default) is intended to disable sorting and allow ex4 validation to work properly.

OPTION 4 - OPTIONAL

The four argument (optional), --validate-fn, enables the functionality of URL validation and transformation of URLs to canonical form. NOTE! validity of URLs is performed after the URLs have been transformed to the canonical form (output list is in canonical form).

  • ex4 : (default) runs the output requirement for ex4
  • all : enables Valid = true|false URLs to pass thorough
  • valid : enables only Valid = true URLs to pass through
  • invalid : enables only Valid = false URLs to pass through

STRUCTURE

The algorithms module in this root directory and contains all of the sorting algorithms in their own file. Loading our module also loads all of the individual sorts and allows easy separation of the files while still providing a simple interface. Url validation is also it own module to maintain functionality separation.

TESTING

Unit tests are available in the tests/ folder: python tests/test_algorithms.py

Unit test are included for cover

  • URL validator
  • URL canonicalizer
  • URL comparators (<, >, ==)

OTHER

Using the files we provide, one working command would be: python sort.py -i test_input.txt -o output.txt --sort-fn mergesort You can then test the correctness of the sorting by calling: diff test_output.txt output.txt

The output file is created at the path OUTPUT_FILE relative to the directory from which the script was run. For example, in our sample command, it would be located in the current working directory. If you run the script from .. relative to the sort.py script and provide -o other/directory/output.txt, it will write the file to ../other/directory/output.txt relative to the sort.py script.

ALGORITHM CHOICES

Our choice for the O(n^2) algorithm was selection sort. We felt this was the simplest algorithm to use and was very simple to implement in Python. For the O(n log n) algorithms we included heapsort and mergesort, both of which have a worst case runtime within the specified parameters. Python had built in functions that made heapsort easy to use, and many of us have experience with mergesort and quicksort, but we opted to use the former as quicksort would have been much more complicated. The O(n) algorithm was radix sort, which uses buckets for each possible ASCII value to sort the URLs. Radix sort is a specialty sorting algorithm that takes advantage of the fact that the elements being sorted can be broken up into singular chunks that are limited in number (in this case one of 255 characters). This breaking up into singular chunks results in k passes, where k is the number of chunks in the longest element. If n >> k, the algorithm is essentially O(n). However, this is not always the case so the true worst case of this algorithm is O(kn). For all these algorithms we assume that the URLs use proper ASCII values and do not contain other symbols.

The documentation for our URL validation and canonicalization design may be found at https://github.com/Full-House-UW/section/wiki/URL-Validation-and-Canonicalization.

Group Division:

  • Corneliu, Brandon - Merge sort
  • Jeff - Script setup and design
  • Conor - Radix sort
  • Janette - Heapsort
  • Nat - Testing
  • Shahaf - Selection sort
  • Shahaf & Janette - Coordination

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%