How to organize PDF and HTML files into folders so that the interesting one(s) to read can be found when needed.
Using some special characters to manage Folder and Files listing order when viewed from a Windows PC or an iPad. This is a tricky issue :
- # ^ . ¤ =
The Challenge :
Nowadays we just Google whenever we need information about a subject.
Introductory updated information can always be found on Wikipedia.
But what to do with scattered off-line information kept in local PDF or HTML files ? PDF files can be anything from a pamphlet or a handout to a complete e-book.
- So a file naming system and a folder naming system are described that help find the information when needed.
- Introducing a File Content grading system and using empty Folders as Index Cards for a File Grouping system.
Why bother :
Older references once found and read are often removed from the WEB after some time. I am a person who wants to be able to re-read older references to help my memory so I need an off-line system.
And once the number of off-line files passes ~100 then they must be organized in order to be handy. I am way past that limit.
Organizing can be in the form of tagging as used for music. Unfortunately there is no standard tagging format for documents saved within the file that can be used across platforms.
Programs handling tagging of files :
Here is a link to a website with a comprehensive overview of tagging programs ( primarily for Windows ) :
Why not -just- a wiki :
Keeping documents as documents and not as text in wiki systems like Confluence on a web site preserves the layout of the text, including page separation, typeface selections, chapters, bookmarks and so on.
So you can make a proper print out.
The Challenge – part 2
Having settled on managing documents, the first problem arises : how are folder and file names presented on different platforms ? Nowadays we have a number of platforms like :
How do we ensure a common look on the platforms we want to use ? A concept for file naming is proposed that handles this problem.
Text Search using DocFetcher :
Unknown to most people, a PC, Mac or Linux machine can have document search that works almost like Googling. However, it is not available on iOS or Android.
The magic program allowing this is DocFetcher. You select a drive or folder to be indexed and it sets off. Some time later it is done and you can now search for sentences and get a list of documents, including PDF files, where there was a match. You even see text snippets of the contexts where the text was found. And it presents the results in a snap. If you change or add documents the program will quickly update its database. It is a brilliant piece of software.
OneNote 2016 :
Another text management concept called OneNote ( or OneNote 2016 ) is also very useful.
I use OneNote 2016 as the first stop to manage shorter texts typically captured from the WEB, but I find the OneNote concept less suited for longer texts which I prefer to have in a PDF file.
Using the old desktop version has several benefits :
- It allows you to do proper backups. Read the horror story about having no backup when the Microsoft OneNote servers mess up : https://community.spiceworks.com/topic/2246511-onenote-has-a-dark-side-stop-using-onenote-until-you-read-this
- It allows you to export the full Notebook or just a Section or just a Page to a PDF file, which may include automatically generated Bookmarks following some unknown rule.
- The continuously updated OneNote iPad ( and Android ) app handles the older OneNote 2016 files fine so both apps can be used for daily use as preferred.
Preferred File Types :
I have most static content documents in one of these file types :
Documents naturally divided into pages are best viewed using PDF as good PDF viewers have a lot of reading features. This is not so for the other file types. There is an add-on for some WEB browsers called FireShot. This add-on will create a potentially very long PDF page without any page break. The concept does work but most OCR decoders and some PDF readers are not prepared for this and will crash or refuse to view such a file.
WEB pages can be saved to a single HTML file using add-ons. This is kind of a last resort solution as the WEB browser add-ons generating such a file do not always generate a proper copy. Typically some images are missing or the text formatting is flawed. Such an add-on could be Save Page WE or SingleFile, both of which -sometimes- work. But if the web content is instead copied to Word, then Word supports saving the content as PDF as well as DOCX. Word is exceptionally good at managing all the copied text and images in the original layout and Word of course also allows you to do the required editing to manage page breaks !
This is an Age Old Subject. So what new twists can possibly be added to this subject ?
To get started, I will just refer to what others have written, so here is a nice text with a nice text layout :
The Folder and Files Organization Objective :
What they advise to do makes sense. What they end up suggesting is a ( very ) deeply nested folder structure in which each folder level gets more specific/detailed.
My PDF documents cannot each be sharply assigned to just one specific topic, so creating a deeply nested folder structure actually adds confusion instead of clarity.
So in my case I will add additional constraints :
- I don’t want endlessly deep folder nesting. I limit nesting to max two folder overview levels and one level with files. Thus max three folder levels and usually only two. Example shown below.
- I don’t want folders including only one or two files.
- I don’t want folders including 100’s of files.
- Everything should look the same whether viewed on a PC or viewed on an iPad ( or Android ).
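The first three constraints can be checked automatically. Below is a minimal Python sketch, assuming the whole collection lives under one root folder. The function name check_tree and the two file count thresholds are my own illustrative choices, not fixed rules. Empty folders are skipped on purpose, as they are used as separators and index cards.

```python
import os

MAX_DEPTH = 3    # max folder levels below the root, as stated in the constraints
MIN_FILES = 3    # assumed threshold: folders holding only one or two files are flagged
MAX_FILES = 100  # assumed threshold for "100's of files"


def check_tree(root):
    """Return a list of (relative_path, problem) tuples for folders breaking a rule."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # The root itself is level 0, its sub folders are level 1, and so on.
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth > MAX_DEPTH:
            problems.append((rel, "nested too deep"))
        # Folders without files are deliberate, so only non-empty ones are checked.
        if filenames and len(filenames) < MIN_FILES:
            problems.append((rel, "only one or two files"))
        if len(filenames) >= MAX_FILES:
            problems.append((rel, "too many files"))
    return problems
```

Calling check_tree on the root folder returns an empty list when everything is within the limits.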
What I have come up with is :
- A file naming template.
- A concept of how to use folder names as index cards into the content of a folder.
- Use a special character ¤ to separate items when listed on the same line ( file name or folder name ).
The funny character ¤ is a classic character called the “currency sign” ( a generic currency symbol ). It is included in long-established character sets such as ISO/IEC 8859-1 and also in Unicode ( as U+00A4 ). Texts including this character look the same on both a Windows PC and on an iPad. For some strange reason it is hardly ever used for anything, but as it looks good it is the perfect item separator character when making a file name that should include various pieces of information.
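For the curious, the character can be inspected with a few lines of Python. This is only a small sketch using the standard unicodedata module. The names SEP, WINDOWS_FORBIDDEN and describe are made up for the illustration.

```python
import unicodedata

SEP = "\u00a4"  # the ¤ character

# Characters that Windows does not allow in file or folder names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def describe(ch):
    """Return the Unicode name, code point and byte encodings of a character."""
    return {
        "name": unicodedata.name(ch),
        "code_point": f"U+{ord(ch):04X}",
        "latin-1": ch.encode("latin-1"),
        "utf-8": ch.encode("utf-8"),
        "allowed_on_windows": ch not in WINDOWS_FORBIDDEN,
    }


print(describe(SEP))
```

The character occupies a single byte in ISO/IEC 8859-1 and two bytes in UTF-8, and it is not among the characters Windows forbids in names.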
The File Name Template :
I have therefore created my own file naming template using the ¤ character and some spacing surrounding it as :
Title & Version ¤ Grading ¤ Year ¤ Pages ¤ Writer ¤ Publisher
Title is obvious.
Version could be any brief marking, like v2 or 2nd. Don’t waste many characters on that.
Grading is not obvious. I sometimes add some characters to describe my impression of the text. I use these four character selections : C|R E I P|T. C|R means the text is either Cursory or Reference like. E means the text includes Examples. I means the text is Illustrated. P|T means that the text is either Practical or Theoretical. So a text graded as RT will be difficult and slow to read. A text graded as CEIP will be easy, maybe even amusing, to read.
Year is obvious. Use 1978 and not just 78. I considered putting year first as it indicates the relevance of a file. Something written in 1930 is probably not as up-to-date and relevant as something written in 2010. But in the end, and because I have a lot of texts without a date stamp, I settled for making it entry two ( or three ).
Pages is obvious. It conveniently indicates the complexity of the file. Something 3 pages long is probably not as comprehensive as something 300 pages long. HTML files are not separated into pages so there is no length info for such a file. Append a p to the page count, like 10p, and use 1xp for HTML. This avoids any confusion with the year.
Writer is obvious. I don’t always include the Writer as I don’t know the writer anyway.
Publisher is obvious. A book like text may include a Publisher. But it is more relevant for WEB content where naming the company publishing the information could be important.
With all that information to be included in a file name it is important to be as brief as possible, ignoring the Writer and/or Publisher when necessary. Keeping the length shorter than ~ 100 characters is fine.
I am increasingly adding grading to text file names as it adds another dimension to the type and quality of the text.
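The template is regular enough that a file name can be split back into its entries by a small program. Here is a minimal Python sketch of that idea. The function names parse_name and build_name and the matching rules are my own assumptions : a four digit entry is taken as the Year, an entry ending in p as Pages, and a short entry of grading letters right after the title as the Grading. Whatever is left over ( Writer and/or Publisher ) is just collected in a list, as the two cannot be told apart from the name alone.

```python
import re
from pathlib import PurePath

SEP = " \u00a4 "  # the ¤ separator with the surrounding spaces

GRADING = re.compile(r"^[CREIPT\u00b7]{1,4}$")  # e.g. CEIP, RT, ···T
YEAR = re.compile(r"^\d{4}$")                   # e.g. 1978
PAGES = re.compile(r"^(\d+|1x)p$")              # e.g. 10p, or 1xp for HTML


def parse_name(filename):
    """Split a file name built from the template into a dict of entries."""
    path = PurePath(filename)
    fields = [f.strip() for f in path.stem.split("\u00a4")]
    info = {"title": fields[0], "grading": None, "year": None,
            "pages": None, "others": [], "ext": path.suffix}
    for pos, field in enumerate(fields[1:], start=1):
        if pos == 1 and GRADING.match(field):
            info["grading"] = field
        elif info["year"] is None and YEAR.match(field):
            info["year"] = int(field)
        elif info["pages"] is None and PAGES.match(field):
            info["pages"] = field
        else:
            info["others"].append(field)  # Writer and/or Publisher
    return info


def build_name(title, ext, grading=None, year=None, pages=None, others=()):
    """Join the given entries with the ¤ separator, skipping missing ones."""
    parts = [title, grading, str(year) if year else None, pages, *others]
    return SEP.join(p for p in parts if p) + ext
```

For example, build_name("LabVIEW Fundamentals", ".pdf", year=2005, pages="165p", others=["NI"]) gives the name LabVIEW Fundamentals ¤ 2005 ¤ 165p ¤ NI.pdf, and parse_name turns that name back into the separate entries. A two letter Publisher made up of grading letters only would be misread as a Grading, so treat the result as a helper and not as the truth.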
A very dry text would be graded as :
¤ ···T ¤
A very user friendly yet comprehensive text would be graded as :
¤ CEIP ¤
User friendly ? – Texts are written with various objectives. One might be that the writer just wants to send a message. Like a law text. Another purpose would be that the writer cares to explain why and how his message should be of interest to the reader. Like the notes for a law text. You get the idea.
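The grading letters can be translated back to words with a simple lookup. The Python sketch below assumes that the middle dot · is only a placeholder for a letter not used, and that C excludes R and P excludes T, as described for the four character selections. The names GRADES, describe_grading and is_valid_grading are made up for the illustration.

```python
GRADES = {
    "C": "Cursory",
    "R": "Reference like",
    "E": "includes Examples",
    "I": "Illustrated",
    "P": "Practical",
    "T": "Theoretical",
}
PLACEHOLDER = "\u00b7"  # the middle dot used in a grading like ···T


def describe_grading(code):
    """Translate a grading such as CEIP or ···T into words."""
    return [GRADES[ch] for ch in code if ch in GRADES]


def is_valid_grading(code):
    """Check that a grading uses known letters and no contradicting pair."""
    letters = [ch for ch in code if ch != PLACEHOLDER]
    if not letters or any(ch not in GRADES for ch in letters):
        return False
    chosen = set(letters)
    # C and R exclude each other, and so do P and T.
    return (len(chosen) == len(letters)
            and not {"C", "R"} <= chosen
            and not {"P", "T"} <= chosen)
```

So describe_grading("RT") gives Reference like and Theoretical, which matches the difficult and slow text mentioned earlier.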
Using Folders as Index Cards to Information in a Folder :
It is important to notice that an iPad lists folders mixed with files when listing alphabetically. This is an Apple thing and in my opinion quite stupid. I want folders naturally listed in front of files so something must be done to ensure that the iPad also lists folders ahead of files. A little file naming ingenuity makes it possible.
Folders at the Highest nesting level :
At the highest folder nesting level there are only folders. Below is a partial listing of the first entries. The full list includes about 100 folders which is the maximum number of folders I want to scroll through. The content is sometimes re-arranged to keep that limit :
A ¤=¤
AUD= .Acoustics
AUD= .Brüel & Kjær
AUD= .Engineering
AUD= Amplifiers
AUD= Driver Units, Crossover and more
AUD= HI-FI
AUDz Media and Music and more
AUDz Reviews
AUDz Software
AUDz Speakers
B ¤=¤
BASE^ LabVIEW and the NI World
BASE^ Python
BASE= MATLAB & Simulink, Octave, Scilab and HiQ
C ¤=¤
CAD= Electrical
CAD= Math
CAD= Mechanical
D ¤=¤
DSP= .Analog Devices
DSP= Basic Concepts
DSP= Converter Principles
DSP= Digital Communication
DSP= Digital Filters
I try to group content into a few all-encompassing subjects, like AUD ( Audio ), BASE ( important knowledge ), CAD, DSP and so on. I have about 15 such subjects. But it changes as I sometimes re-group the content within the subjects.
One re-grouping was the creation of a Mechatronics subject, which points to concepts such as Robotics, Control Theory, Mechanical Analogies and other sub-subjects. These sub-subjects relate nicely to each other within the Mechatronics subject.
The listing includes these special attributes :
A ¤=¤ : This is just a visual separator ( empty folder ) between each subject. I try to use a single character for clarity.
AUD= .Acoustics : Audio is the subject. The dot in .Acoustics is used for sorting, forcing Acoustics to be listed first, as well as to indicate that this is an important item within the subject. The equal = character indicates that there is an organized sub folder here including files related to the subject.
AUDz Reviews : The z character replacing the equal = character indicates that there is an un-organized sub folder here including a mess of files ( that should be organized ).
BASE^ LabVIEW and the NI World : The presence of the ^ character indicates that this is a sub folder consisting of more subject folders instead of files. This concept is a convenient way to avoid having too many folders at the highest nesting level. The special characters used ensure that a folder linking to a sub folder with additional subjects is listed first.
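These attributes can also be read by a program, for instance to list all the un-organized folders. The Python sketch below is only an illustration. It assumes the subject abbreviation is written in capital letters and is followed directly by one of the characters = z ^ and a space. The names SEPARATOR, FOLDER, MARKERS and classify are made up for the example.

```python
import re

# A separator folder such as "A ¤=¤".
SEPARATOR = re.compile(r"^(?P<letter>\S) \u00a4=\u00a4$")
# A subject folder such as "AUD= .Acoustics" or "BASE^ Python".
FOLDER = re.compile(
    r"^(?P<subject>[A-Z]+)(?P<marker>[=z^]) (?P<dot>\.?)(?P<topic>.+)$"
)

MARKERS = {
    "=": "organized files",
    "z": "un-organized files",
    "^": "more subject folders",
}


def classify(name):
    """Describe a folder name from the highest nesting level."""
    if SEPARATOR.match(name):
        return {"kind": "separator"}
    m = FOLDER.match(name)
    if m is None:
        return {"kind": "unknown"}
    return {
        "kind": "folder",
        "subject": m["subject"],
        "content": MARKERS[m["marker"]],
        "important": m["dot"] == ".",  # the leading dot marks an important item
        "topic": m["topic"],
    }
```

Running classify over the names in the root folder and keeping the entries whose content is un-organized files gives a to-do list of folders waiting to be sorted out.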
Sub folders at the next highest nesting level :
The next highest nesting level may contain :
- Either files for the subject.
- Or it may contain more folders related to the subject to avoid having too many folders at the highest nesting level.
BASE^ LabVIEW and the NI World is an example of a sub folder with additional subject folders instead of files. This is the folder listing :
A= Language, Classic and NXG
B= Actor Framework and OOP
C= Project, SVN, EXE-built and more
D= Digital Signal Processing
E= MathScript, Matlab and HiQ
F= Vision
G= SQL
H= Remote Panels and Computing
I= Sim and Control, Simulink, PID and Fuzzy Logic
J= Kalman Filtering
K= Test Automation
L= Python and LabPython
M= Assorted
N= Toolkits
O= References
P= NI-DAQ, VISA, PXI and more
Q= DLL, CIN and more
R= DMA, Buffers and more
S= myRIO and CompactRIO
T= ELVIS ( look in the Mechatronics chapter )
U= Hardware and more
V= CVI aka LabWindows Language
W= ComponentWorks
X= Measurement Studio
Y= ActiveX, ATL, COM and OLE
Z= Multisim
Ø= Newsletter and more
The full content is listed to show that it may be convenient to add a character in front, like A=, in order to control what is listed first ( most often used or most relevant or whatever ).
But the sub folder could also have looked like the folder listing shown for the highest nesting level. The only rule here is that it should look good to the reader ( me ).
Sub folders at the lowest nesting level :
The lowest nesting level ( either second or third ) holds the actual files related to the subject. Here is a listing :
-1 = Introductions and Tutorials
-3 = Comprehensive Texts
-4 = Fluffy Texts
-5 = Comprehensive Documentation
-A = LabVIEW Technical Resource
#1
#1x = Documentation Resources Index ¤ 2018.html
#1i = Get Start with LabVIEW ¤ 2013 ¤ 89p ¤ NI.pdf
#1x = Introduction to LabVIEW ¤ 2016 ¤ 71p.pptx
#1i = LabVIEW Fundamentals ¤ 2005 ¤ 165p ¤ NI.pdf
#1x = Tips LabVIEW Development ¤ 2007 ¤ 39p.pdf
#3
#3i = LabVIEW - User Manual ¤ 2003 ¤ 349p ¤ NI.pdf
#4
#4i = Best Pract. for BDs and FPs ¤ 2011 ¤ 115p.pdf
#4i = GPOWER XNodes and VIMs ¤ 2016 ¤ 33p.pdf
#4x = SW Eng Tools with LabVIEW - Hands On ¤ 43p.pdf
#4i = LabVIEW - Dev Guidelines ¤ 2003 ¤ 97p ¤ NI.pdf
#4i = LabVIEW - Meas. Manual ¤ 2000 ¤ 358p ¤ NI.pdf
#4i = LabVIEW - Meas. Manual ¤ 2003 ¤ 159p ¤ NI.pdf
#4x = LabVIEW Graph Dev - Hands On ¤ 2006 ¤ 126p.pdf
#4x = What is LV used for ¤ ViewPoint Systems.html
#5
#5i = G Prog Reference Manual ¤ 1998 ¤ 667p ¤ NI.pdf
#5i = Func and VI Ref Manual ¤ 1999 ¤ 609p ¤ NI.pdf
#5i = LabVIEW Version 5.1 Addendum ¤ 1999 ¤ 108p.pdf
#5i = The LabVIEW Style Book ¤ 363p.pdf
#A
#Ax = LabVIEW Technical Resource 1996 Q3 ¤ 24p.pdf
#Ax = LabVIEW Technical Resource 1999 Q3 ¤ 8p.pdf
#Ax = Tech. Res. Introduces Bundled Value Packs.pdf
The listing includes both folders ( the names starting with a minus sign ) and files ( the names starting with # ).
The listing also includes these special attributes :
#1 : This is a visual separator ( empty file ) between each group. I try to use a single character for clarity.
-1 : The minus sign preceding the number or character is important as it controls what an iPad lists first. So the folder names start with this character to ensure they are listed first.
The folders shown first are empty and are only used as a convenient Index Card content overview of the files.
Both the iPad and a computer indicate folders with one type of icon and files with other types of icons, which adds to the ease of content overview.
The numbers ( 0 to 9 ) shown in the folder names are reserved to always read this ( when included ) :
-0 = Recommended Texts
-1 = Introductions and Tutorials
-2 = Brief Concise Texts
-3 = Comprehensive Texts
-4 = Fluffy Texts
-5 = Comprehensive Documentation
-8 = History or Background
-9 = Handbook ( covers everything )
The recommended text is listed first. The other texts are listed in “heavy” order.
The characters ( A to Y ) can be included as needed. Z has a special meaning. It is listed last and indicates that the subject includes one or more zip files that may be convenient to have here :
-Z = ZIPs and more
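As the folder names are fixed for the reserved numbers, the empty Index Card folders can be created by a small script instead of by hand. The Python sketch below looks at the group character following the # in each file name and creates the matching folders. The function names index_cards and create_index_cards are made up, and the titles for the free characters A to Y must be supplied by you, as only you know what they mean in a given subject folder.

```python
import re
from pathlib import Path

# Folder titles for the reserved group characters.
RESERVED = {
    "0": "Recommended Texts",
    "1": "Introductions and Tutorials",
    "2": "Brief Concise Texts",
    "3": "Comprehensive Texts",
    "4": "Fluffy Texts",
    "5": "Comprehensive Documentation",
    "8": "History or Background",
    "9": "Handbook ( covers everything )",
    "Z": "ZIPs and more",
}

GROUP = re.compile(r"^#([0-9A-Z])")  # the group character right after the #


def index_cards(file_names, custom=None):
    """Return the Index Card folder names needed for the given file names."""
    titles = {**RESERVED, **(custom or {})}
    groups = sorted({m.group(1) for n in file_names if (m := GROUP.match(n))})
    return [f"-{g} = {titles.get(g, 'Untitled')}" for g in groups]


def create_index_cards(folder, custom=None):
    """Create the ( empty ) Index Card folders inside the given folder."""
    folder = Path(folder)
    names = [p.name for p in folder.iterdir() if p.is_file()]
    for card in index_cards(names, custom):
        (folder / card).mkdir(exist_ok=True)
```

Calling create_index_cards on a subject folder with custom={"A": "LabVIEW Technical Resource"} would create a -A folder with that title next to the reserved ones. A group without a known title ends up as Untitled, which is a hint to rename it.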
Marking OCR state of a file :
#1i #1x #1z #1 : The characters x, i, z and no-character in a file name indicate the OCR state of the file, meaning whether it can be searched for text phrases.
A PDF file can have one of several states :
- i means that the file contains a text layer that can be searched -and- the text includes a Table of Contents ( ~ bookmark listing ). That is the preferred file quality where text can be found by searching and a content overview is presented by the ToC.
- x is as i except there is no ToC. This kind of file is still searchable but it is difficult to get a quick overview of the content. A ToC should be added but making one is tedious work.
- z is a file that doesn’t include a searchable text layer. A “text” page is just an image. Such a file should be run through an OCR program to include a text layer so that the file as a minimum can be searched.
- no-character is a file that has not been checked for the presence of a text layer. Meaning it should be checked when convenient.
An HTML file is always text and illustrations so it can always be searched. But it never includes a Table of Contents. It is always an x.
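The OCR marking makes it easy to pull out the files that still need attention. Here is a minimal Python sketch, assuming every file name starts with #, a group character, an optional state character and then " = " before the title. A name without the " = " part is taken to be one of the empty separator files. The names PREFIX, STATES, ocr_state and todo are made up for the illustration.

```python
import re

PREFIX = re.compile(
    r"^#(?P<group>[0-9A-Z])(?P<state>[ixz]?)(?: = (?P<rest>.+))?$"
)

STATES = {
    "i": "searchable, with ToC",
    "x": "searchable, no ToC",
    "z": "image only, needs OCR",
    "": "not checked yet",
}


def ocr_state(file_name):
    """Return the OCR state of a file, 'separator', or None if not marked."""
    m = PREFIX.match(file_name)
    if m is None:
        return None
    if m["rest"] is None:
        return "separator"  # e.g. the empty file "#1"
    return STATES[m["state"]]


def todo(file_names):
    """List the files that still need OCR or have not been checked yet."""
    wanted = (STATES["z"], STATES[""])
    return [n for n in file_names if ocr_state(n) in wanted]
```

Feeding todo with the file names of a folder gives the list of files to run through an OCR program or to check when convenient.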
This concludes the description of my preferred off-line file organization. The basic idea is to present the files attractively in my preferred style. It requires some discipline to maintain but as long as the organization can be done on a PC ( using Total Commander ) it is manageable.
Take notice of the use of empty folders as a Content or Index Card listing giving a quick impression of the files’ content within a subject folder. Somewhat like the Index Drawers that could be found in a Library in the old days. That concept can be tweaked as desired.