[hanzi-flash] single mode + misc modifications

- indexing changed
- readme updated
- output is now stdout
[day-inator] added script
2024-07-12 16:59:59 +08:00 · 2024-07-12 09:25:40 +08:00
4 changed files with 200 additions and 18 deletions
--- a/day-inator/README.md
+++ b/day-inator/README.md
@@ -0,0 +1,22 @@
 # day-inator
 Script to practice the [Doomsday algorithm](https://www.timeanddate.com/date/doomsday-rule.html).
 You are provided a random date, and are asked to determine its weekday.
 See the script source code for more information.
 Example usage:
 ```
 $ python day_inator.py
 Day-Inator 2000
 ---------------
 Use 1-7 for Mon-Sun. See script docstring for more information.
 2037-11-18
 Guess: 3
 Correct.
 Took 22 seconds with 1 try.
 ```
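For readers unfamiliar with the technique being practiced, here is a minimal sketch of one common formulation of the Doomsday rule for Gregorian dates. It is not part of this commit: the names `ANCHORS`, `DOOMSDAYS`, `doomsday_weekday` and `weekday_of` are hypothetical, and the result uses the same Monday=0 convention as `datetime.date.weekday()`.

```python
import datetime

# Century anchor days, indexed by (year // 100) % 4, with Mon=0 ... Sun=6:
# 1600s/2000s -> Tuesday, 1700s -> Sunday, 1800s -> Friday, 1900s -> Wednesday.
ANCHORS = {0: 1, 1: 6, 2: 4, 3: 2}

# One easy-to-remember doomsday per month (January and February shift in leap years).
DOOMSDAYS = {1: 3, 2: 28, 3: 14, 4: 4, 5: 9, 6: 6, 7: 11, 8: 8, 9: 5, 10: 10, 11: 7, 12: 12}


def is_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def doomsday_weekday(year: int) -> int:
    """Weekday (Mon=0) shared by all doomsdays of the given year."""
    y = year % 100
    return (ANCHORS[(year // 100) % 4] + y // 12 + y % 12 + (y % 12) // 4) % 7


def weekday_of(d: datetime.date) -> int:
    """Weekday (Mon=0) of d, found by counting from the month's doomsday."""
    ref = DOOMSDAYS[d.month]
    if d.month <= 2 and is_leap(d.year):
        ref += 1  # 4 January and 29 February in leap years
    return (doomsday_weekday(d.year) + d.day - ref) % 7


# The README example: 2037-11-18 is a Wednesday, i.e. guess 3.
print(weekday_of(datetime.date(2037, 11, 18)) + 1)
```

For 2037 the doomsday is a Saturday, 7 November is a doomsday, and 18 November falls 11 days later, which lands on Wednesday.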
--- a/day-inator/day_inator.py
+++ b/day-inator/day_inator.py
@@ -0,0 +1,63 @@
 #!/usr/bin/env python
 """Guess the weekday of a random date.
 For more information, search 'Doomsday algorithm'.
 You may pass a date argument as YYYY-MM-DD (leading zeroes required).
 Otherwise, a random date is generated.
 """
 import datetime
 import random
 import sys
 epsilon: int = 365 * 200
 """Maximum variation in days for the date compared to today."""
 r = random.randint(-epsilon, epsilon)
 """Actual variation in days for the date compared to today."""
 d: datetime.date = datetime.date.today() + datetime.timedelta(days=1) * r
 """Date to guess."""
 if len(sys.argv) > 1:
    d = datetime.date.fromisoformat(sys.argv[1])
 def guess(d: datetime.date) -> bool:
    """Prompt for guess.
    Returns True if guess is good.
    """
    try:
        ans: int = int(input("Guess: ")) - 1  # 1-7 (Mon-Sun) -> 0-6; no modulo, so the range check below can fail
        if ans not in range(7):
            raise ValueError
    except ValueError:
        print("Invalid input.")
    else:
        if ans == d.weekday():
            return True
    return False
 print("""
 Day-Inator 2000
 ---------------
 Use 1-7 for Mon-Sun. See script docstring for more information.
 """)
 print(d)
 start = datetime.datetime.now()
 fails: int = 0
 while not guess(d):
    print("Wrong.")
    fails += 1
 print("Correct.")
 dur = (datetime.datetime.now() - start).seconds
 print(f"Took {dur} seconds with {fails + 1} tr{'ies' if fails else 'y'}.")
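The guess handling above reads, validates and compares the input in one place. As a review aid, here is a minimal sketch of the validation step as a pure function. Checking the raw number against 1-7 before converting it to a 0-6 index keeps out-of-range entries from being mistaken for a weekday. The name `parse_guess` is hypothetical and does not appear in the script.

```python
from typing import Optional


def parse_guess(text: str) -> Optional[int]:
    """Convert user input '1'..'7' (Mon-Sun) to a weekday index 0..6.

    Returns None for anything else, so the caller can report invalid
    input separately from a wrong guess.
    """
    try:
        value = int(text.strip())
    except ValueError:
        return None
    if 1 <= value <= 7:
        return value - 1
    return None


print(parse_guess("3"))    # 2
print(parse_guess("8"))    # None
print(parse_guess("wed"))  # None
```

A caller could then compare the returned index with `d.weekday()` only when it is not `None`.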
--- a/hanzi-flash/README.md
+++ b/hanzi-flash/README.md
@@ -1,10 +1,74 @@
 # hanzi-flash
-Generate flashcards for a range of frequent hanzi characters.
+Generate flashcards in CSV format for a range of frequent hanzi words.
-There are multiple sources for words:
+This is based on the
 [hsk CSV](https://github.com/plaktos/hsk_csv)
 repo, including common usage words graded by difficulty.
 These form the vocabulary of the HSK (Hanyu Shuiping Kaoshi, the standardized Chinese proficiency exam).
- [hanziDB CSV](https://github.com/ruddfawcett/hanziDB.csv) (purely frequency based)
+## usage
 - [hsk CSV](https://github.com/plaktos/hsk_csv) (common usage words, graded by difficulty)
-Be careful to keep the filenames as `hanzi_db.csv` and `hsk*.csv` because the script tries to recognize these.
+This script requires the HSK vocabulary in a CSV file.
 The expected format is word, pronunciation in pinyin, and definition.
 You may combine all levels into a single file as such:
 ```
 git clone https://github.com/plaktos/hsk_csv
 cd hsk_csv
 cat hsk*.csv > all_hsk.csv
 ```
 To use the script, put this `all_hsk.csv` file in the same directory, or pass the path explicitly with the `-i/--input` flag.
 CSV output goes to stdout, which can be redirected to a file.
 For example, this generates a flashcard deck for the entire HSK vocabulary:
 ```
 python hanzi_flash.py -i ./all_hsk.csv > output.csv
 ```
 ## ranges
 Each of HSK's 6 levels has a larger vocabulary than the last.
 This script can help you divide the vocabulary into more digestible chunks.
 Specify the `-s/--start` and `-e/--end` options to output only a range of entries; both are 1-based and inclusive.
 For example, the first 50 words:
 ```
 python hanzi_flash.py -s 1 -e 50
 ```
 Or, the next 50:
 ```
 python hanzi_flash.py -s 51 -e 100
 ```
 Once both decks are generated, import them and combine them with your flashcard app's merge feature.
 ## single character mode
 HSK's vocabulary is in words, not in individual characters.
 Pass the `-S/--single` flag to break up the words into characters.
 The flashcard will have a single character, and the answer will be its pronunciations and example words containing it.
 This is intended as a supplement to the regular word flashcard decks.
 Single mode respects the range options above
 and only outputs new, unique characters:
 those that first appear within the given range.
 It also skips any character that is already a word on its own, so its flashcard is not duplicated.
 For example, take the following invocations, with and without single mode:
 ```
 $ python hanzi_flash.py -s 17 -e 19
 电脑,diàn nǎo (computer)
 电视,diàn shì (television)
 电影,diàn yǐng (movie)
 $ python hanzi_flash.py -s 17 -e 19 --single
 脑,nǎo / 电脑
 视,shì / 电视
 影,yǐng / 电影
 ```
 Single mode only picks out the new characters (电 was learned before the given range `17-19`).
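The selection rule described above can be sketched as a small standalone function. This is an illustration under assumptions, not the script's code: `new_characters` is a hypothetical name, rows are `(word, pinyin)` pairs, pinyin is assumed to have one space-separated syllable per character, and the 电话 row is a made-up stand-in for an earlier entry that already contains 电.

```python
def new_characters(rows, start, end):
    """Map each character first seen in rows start..end (1-based, inclusive)
    to a pair (set of pronunciations, set of example words)."""
    seen_before = set()  # characters of multi-character words before the range
    standalone = set()   # characters that are words on their own
    found = {}           # char -> (sounds, words), in order of first appearance
    for i, (word, pron) in enumerate(rows[:end], start=1):
        if len(word) == 1:
            standalone.add(word)
            continue
        for sound, char in zip(pron.lower().split(), word):
            if i < start:
                seen_before.add(char)
            elif char not in seen_before:
                sounds, words = found.setdefault(char, (set(), set()))
                sounds.add(sound)
                words.add(word)
    # Single-character words already have a card in the regular deck.
    return {c: v for c, v in found.items() if c not in standalone}


rows = [
    ("电话", "diàn huà"),   # hypothetical earlier row containing 电
    ("电脑", "diàn nǎo"),
    ("电视", "diàn shì"),
    ("电影", "diàn yǐng"),
]
for char, (sounds, words) in new_characters(rows, 2, 4).items():
    print(char, sounds, words)
```

With the range covering rows 2 to 4, only 脑, 视 and 影 are reported, because 电 was already seen in row 1.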
--- a/hanzi-flash/hanzi_flash.py
+++ b/hanzi-flash/hanzi_flash.py
@@ -2,31 +2,64 @@
 """
 Generate flashcards for a range of frequent hanzi characters.
-Based on https://github.com/ruddfawcett/hanziDB.csv
+See attached README for more information.
 """
 import csv
 import itertools
 import argparse
-import re
+import sys
 from pathlib import Path
 parser = argparse.ArgumentParser()
-parser.add_argument("-s", "--start", default=0, type=int)
+parser.add_argument("-s", "--start", default=1, type=int)
 parser.add_argument("-e", "--end", default=99999999, type=int)
-parser.add_argument("-O", "--output", default="hanzi_flash.csv", type=Path)
+parser.add_argument("-i", "--input", default="all_hsk.csv", type=Path)
-parser.add_argument("-i", "--input", default="hanzi_db.csv", type=Path)
+parser.add_argument(
    "-S",
    "--single",
    action="store_true",
    help="Output unique single characters instead of words.",
 )
 args = parser.parse_args()
-offset = 1
+prev: set[str] = set()
 """Characters from previous single character card decks."""
-fname: str = args.input.stem
+single: set[str] = set()
-if fname.startswith("hsk"):
+"""Already single characters."""
-    offset = 0
+
 uniq: dict[str, set[str]] = {}
 """Character to words mapping."""
 prons: dict[str, set[str]] = {}
 """Character to pronunciations mapping."""
 with open(args.input, encoding="utf-8", newline="") as csv_file:
    reader = csv.reader(csv_file)
-    with open(args.output, "w") as outp_file:
+    writer = csv.writer(sys.stdout)
-        writer = csv.writer(outp_file)
+    start = 0 if args.single else args.start - 1
-        for row in itertools.islice(reader, args.start, args.end + 1):
+    for i, row in enumerate(itertools.islice(reader, start, args.end)):
-            writer.writerow([row[offset], f"{row[offset+1]} ({row[offset+2]})"])
+        word, pron, mean = row[:3]
        if args.single:
            if len(word) > 1:
                for sound, char in zip(pron.lower().split(), word):
                    if i < args.start - 1:
                        prev.add(char)
                    elif char not in prev:
                        if char not in uniq:
                            uniq[char] = set()
                            prons[char] = set()
                        uniq[char].add(word)
                        prons[char].add(sound)
            else:
                single.add(word[0])
        else:
            writer.writerow([word, f"{pron} ({mean})"])
    if args.single:
        for char in uniq:
            if char not in single:
                writer.writerow(
                    [char, f"{', '.join(sorted(prons[char]))} / {' '.join(sorted(uniq[char]))}"]
                )
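The change of the `--start` default from 0 to 1 in this hunk relies on how `itertools.islice` treats its bounds: the stop index is exclusive, so slicing from `start - 1` to `end` yields rows `start` through `end` inclusive when counting from 1. A minimal sketch of that mapping, with `select_rows` as a hypothetical helper name and a made-up word list:

```python
import itertools


def select_rows(rows, start=1, end=99999999):
    """Return rows start..end, counting from 1 and including both ends."""
    return list(itertools.islice(rows, start - 1, end))


words = [f"word{n}" for n in range(1, 201)]  # placeholder data
first = select_rows(words, 1, 50)
second = select_rows(words, 51, 100)
print(first[0], first[-1], len(first))     # word1 word50 50
print(second[0], second[-1], len(second))  # word51 word100 50
```

One consequence of this mapping is that a start of 0 makes the lower bound negative, which `islice` rejects with a `ValueError`.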
Author	SHA1	Message	Date
dogeystamp	46823e7bce	[hanzi-flash] single mode + misc modifications - indexing changed - readme updated - output is now stdout	2024-07-12 16:59:59 +08:00
dogeystamp	bc0376ac6d	[day-inator] added script	2024-07-12 09:25:40 +08:00