How to organize PDF, MHT and HTML files into folders so that the interesting one(s) to read can be found when needed.
The Challenge :
Nowadays we just Google whenever we need information about a subject.
Introductory information can always be found on Wikipedia.
But what to do with scattered off-line information kept in local PDF, MHT or HTML files ?
Why bother :
Older references once found and read are often removed from the WEB after some time. I am a person who wants to be able to re-read older references to help my memory so I need an off-line system.
And once the number of off-line files passes ~100 then they must be organized in order to be handy. I am way past that limit.
Why not -just- a wiki :
Keeping documents as documents and not as text in wiki systems like Confluence preserves the layout of the text, including typeface selections, chapters and so on. And you can make a proper print out.
Text Search using DocFetcher :
Unknown to most, a PC, Mac or Linux machine can have document search almost like Googling. However it is not available on iOS or Android. The magic program allowing this is DocFetcher. You select a drive or folder to be indexed and it sets off. Some hours later it is done and you can now search for sentences and get a list of documents, including PDF files, where there was a match. You even see text snippets of the contexts where the text was found. And it presents the results in a snap. If you change or add documents the program will quickly update its database. It is a brilliant piece of software.
OneNote 2016 :
Another text management concept called OneNote ( or OneNote 2016 ) is also very useful.
I use OneNote 2016 as the first stop to manage shorter texts typically captured from the WEB, but I find the OneNote concept less suited for longer texts which I prefer to have in a PDF or MHT file.
Using the old desktop version has several benefits :
- It allows you to do proper backups. Read the horror story about having no backup when the Microsoft OneNote servers mess up : https://community.spiceworks.com/topic/2246511-onenote-has-a-dark-side-stop-using-onenote-until-you-read-this
- It allows you to export the full Notebook or just a Section or just a Page to a PDF file which may include automatically generated Bookmarks after some unknown rule -or- export the content to a MHT file.
- The continuously updated OneNote iPad ( and Android ) app handles the older OneNote 2016 files fine so both apps can be used for daily use as preferred.
Preferred File Types :
I have most static content documents in one of three file types :
Documents naturally divided into pages are best viewed using PDF as good PDF viewers have a lot of reading features. This is not so for the other file types. There is an add-on for some WEB browsers called FireShot. This add-on will create a potentially very long PDF page without any page break. The concept does work but most PDF readers are not prepared for this and will crash or refuse to view such a file.
Content that is not split into pages, like WEB pages, is not always suitable as PDF files as images introduce arbitrary page splits which look silly. The MHT file type is an attractive option as it does not introduce arbitrary page breaks in the text when viewed. MHT ( which originated as a web archive file type ) includes everything, including images and text formatting, in just one file. Unfortunately WEB browsers do not save to MHT. But if the web content is first copied to OneNote then OneNote supports saving the content as MHT as well as PDF and DOCX. OneNote is exceptionally good at managing all the copied text and images in the original layout and OneNote also allows you to do editing ! 2020 addition : Support for MHT is miserable in iOS and Android. MHT files can be displayed properly in Internet Explorer ( on Windows ) and Word ( on Windows only ). MHT files can be converted to PDF using Word so the text layout can be modified to manage the unavoidable page breaks.
WEB pages can be saved to a single HTML file using add-ons. This was kind of a last resort solution as the add-ons generating such a file did not always generate a proper copy. Typically some images would be missing or the text formatting would be flawed. Examples of such add-ons are Save Page WE and SingleFile, both of which -sometimes- work. But if the web content instead is copied to Word then Word supports saving the content as PDF as well as DOCX. Word is exceptionally good at managing all the copied text and images in the original layout and Word of course also allows you to do the required editing to manage page shifts !
This is an Age Old Subject. So what new twists can possibly be added to this subject ?
To get started, I will just refer to what others have written, so here is a nice text with a nice text layout :
The Folder and Files Organization Objective :
What they advise to do makes sense. What they end up suggesting is a ( very ) deeply nested folder structure in which each folder level gets more specific/detailed.
My PDF documents are not sharply dividable into just one specific topic so creating a deeply nested folder structure actually adds confusion instead of clarity.
So in my case I will add additional constraints :
- I don’t want endlessly deep folder nesting. I limit nesting to max two folder overview levels and one level with files. Thus max three folder levels.
- I don’t want folders including only one or two files.
- I don’t want folders including 100’s of files.
- Everything should look the same whether viewed on a PC or viewed on an iPad ( or Android ).
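The first three constraints can be checked mechanically. Below is a minimal Python sketch, not something I claim is part of the original workflow. The function name `check_tree` and the thresholds are assumptions made for illustration: depth is counted in folder levels below the library root, "only one or two files" is taken as fewer than 3, and "100's of files" is taken as more than 100. Empty folders are not reported, since the scheme uses empty folders on purpose.

```python
from pathlib import Path

MAX_DEPTH = 3    # two overview levels plus one level holding files
MIN_FILES = 3    # assumed reading of "only one or two files"
MAX_FILES = 100  # assumed reading of "100's of files"


def check_tree(root):
    """Return (problem, folder) tuples for folders that break the constraints."""
    root = Path(root)
    problems = []
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        # Depth 1 is a folder directly inside the library root.
        depth = len(folder.relative_to(root).parts)
        if depth > MAX_DEPTH:
            problems.append(("too deep", folder))
        n_files = sum(1 for p in folder.iterdir() if p.is_file())
        # Empty folders are allowed: they serve as separators and index cards.
        if 0 < n_files < MIN_FILES:
            problems.append(("too few files", folder))
        elif n_files > MAX_FILES:
            problems.append(("too many files", folder))
    return problems
```

Calling `check_tree` on the library root returns an empty list when every folder respects the limits.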
What I have come up with is :
- A file naming template.
- A concept of how to use folder names as index cards into the content of a folder.
- Use a special character ¤ to separate items when listed on the same line ( file name or folder name ).
The funny character ¤ is a classic character called the “generic currency” sign. It is included in ISO/IEC 8859-1 and therefore also in Unicode ( as U+00A4 ). Texts including this character look the same on both a Windows PC and on an iPad. For some strange reason it is never used for anything, but as it visually looks good it is the perfect item separator character when making a filename which should include various information.
The File Name Template :
I have therefore created my own file naming template using the ¤ character and some spacing surrounding it as :
Title & Version ¤ Grading ¤ Year ¤ Pages ¤ Writer ¤ Publisher
This format is sufficiently general to handle PDF files as well as HTML files.
PDF, MHT and HTML are the three basic file formats I use for documentation. Other documents like Word files or Excel files are converted to PDF as the information is not supposed to be live documents to be modified.
Title is obvious.
Version could be any brief marking, like v2 or 2nd. Don’t waste many characters on that.
Grading is not obvious. I sometimes add some characters to describe my impression of the text. I use these four character selections : C|R E I P|T. C|R means the text is either Cursory or Reference like. E means the text includes Examples. I means the text is Illustrated. P|T means that the text is either Practical or Theoretical. So a text graded as RT will be difficult and slow to read. A text graded as CEIP will be easy, maybe even amusing, to read.
Year is obvious. Use 1978 and not just 78. I considered putting year first as it indicates the relevance of a file. Something written in 1930 is probably not as up-to-date and relevant as something written in 2010. But in the end, because I have a lot of texts without a date stamp, I settled for making it entry two ( or three ).
Pages is obvious. It conveniently indicates the complexity of the file. Something 3 pages long is probably not as comprehensive as something 300 pages long. HTML files are not separated into pages so there is no length info for such a file.
Writer is obvious. I don’t always include the Writer as I don’t know the writer anyway.
Publisher is obvious. A book like text may include a Publisher. But it is more relevant for WEB content where naming the company publishing the information could be important.
With all that information to be included in a file name it is important to be as brief as possible. Leave out the Writer and/or Publisher when necessary. Keeping the length below ~100 characters is fine.
I am increasingly adding grading to text file names as it adds another dimension to the type and quality of the text.
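The template lends itself to a small helper. This is a rough Python sketch under stated assumptions: the names `build_name` and `parse_name` are hypothetical, and the parser uses simple heuristics ( four digits is a Year, digits followed by p is Pages, a combination of the grading letters is a Grading ). A Writer cannot be told apart from a Publisher, so both end up in a list called `other`.

```python
import re
from pathlib import PurePath

SEP = " ¤ "  # the separator including its surrounding spaces
GRADING = re.compile(r"[CR·]?[E·]?[I·]?[PT·]?")  # · is a placeholder position


def build_name(title, ext, grading="", year="", pages="", writer="", publisher=""):
    """Join the non-empty fields in template order and append the extension."""
    fields = [title, grading, str(year), f"{pages}p" if pages else "",
              writer, publisher]
    return SEP.join(f for f in fields if f) + "." + ext


def parse_name(filename):
    """Split a file name on ¤ and guess what each field is (heuristic)."""
    stem = PurePath(filename).stem  # drops the extension only
    title, *rest = [part.strip() for part in stem.split("¤")]
    info = {"title": title, "grading": None, "year": None,
            "pages": None, "other": []}
    for field in rest:
        if not field:
            continue
        if re.fullmatch(r"\d{4}", field):
            info["year"] = int(field)
        elif re.fullmatch(r"\d+p", field):
            info["pages"] = int(field[:-1])
        elif GRADING.fullmatch(field):
            info["grading"] = field
        else:
            info["other"].append(field)  # Writer and/or Publisher
    return info


name = build_name("LabVIEW Fundamentals", "pdf", year=2005, pages=165,
                  publisher="NI")
print(name, len(name) <= 100)
```

A very short Writer or Publisher that happens to consist of grading letters only ( like IT ) would be mistaken for a Grading, so treat the parser as a convenience and not as a validator.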
A very dry text would be graded as :
¤ ···T ¤
A very user friendly yet comprehensive text would be graded as :
¤ CEIP ¤
User friendly ? – Texts are written with various objectives. One might be that the writer just wants to send a message. Like a law text. Another purpose would be that the writer cares to explain why and how his message should be of interest to the reader. Like the notes for a law text. You get the idea.
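The grading letters can be decoded mechanically. The following Python sketch is an illustration only. The function name `decode_grading` is hypothetical, and it assumes that the letters always appear in the order C|R, E, I, P|T and that · marks an unused position.

```python
import re

MEANING = {
    "C": "Cursory", "R": "Reference-like",
    "E": "Examples", "I": "Illustrated",
    "P": "Practical", "T": "Theoretical",
}


def decode_grading(code):
    """Translate a grading such as CEIP or ···T into a list of words."""
    letters = code.replace("·", "")  # drop placeholder dots
    # At most one of C/R, then E, then I, then at most one of P/T.
    if not letters or not re.fullmatch(r"[CR]?E?I?[PT]?", letters):
        raise ValueError(f"not a valid grading: {code!r}")
    return [MEANING[ch] for ch in letters]


print(decode_grading("CEIP"))
print(decode_grading("···T"))
```

A grading that combines mutually exclusive letters, like CR, raises a ValueError.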
Using Folders as Index Cards to Information in a Folder :
It is important to notice that an iPad lists folders mixed with files when listed alphabetically. This is an Apple thing and in my opinion quite stupid. I want folders naturally listed in front of files so something must be done to ensure that the iPad also lists folders ahead of files. A little file naming ingenuity makes it possible.
Folders at the Highest nesting level :
At the highest folder nesting level there are only folders. Below is a listing of the first entries. The full list includes about 100 folders which is the maximum number of folders I want to scroll through. The content is sometimes re-arranged to keep that limit :
A ¤=¤
Audio - .Acoustics ¤
Audio - .Brüel & Kjær ¤
Audio - .Engineering ¤
Audio - Amplifiers ¤
Audio - Driver Units, Crossover and more
Audio - HI-FI ¤
Audio - Media and Music and more
Audio - Reviews
Audio - Software
Audio - Speakers ¤
B ¤=¤
Basic - LabVIEW and the Nat. Inst. world -¤- 27
Basic - MATLAB and Octave ¤
Basic - Python -¤- 10
C ¤=¤
CAD - Electrical ¤
CAD - Math ¤
CAD - Mechanical ¤
D ¤=¤
DSP - .Analog Devices ¤
DSP - Basic Concepts ¤
DSP - Converter Principles ¤
DSP - Digital Communication ¤
DSP - Digital Filters ¤
I try to group content into a few all-encompassing subjects, like Audio, Basic ( important knowledge ), CAD, DSP and so on. I have about 15 such subjects. But it changes as I sometimes re-group the content within the subjects.
My latest re-arrangement was the introduction of the Mechatronics subject, which is an important engineering container concept including items from control theory, DSP, mechanical analogies and other sub-subjects.
The listing includes these special attributes :
A ¤=¤ Which is a visual separator ( empty folder ) between each subject. I try to use a single character for clarity.
Audio - .Acoustics ¤ Where Audio is the subject. The dot or point in .Acoustics is used for sorting, forcing Acoustics to be listed first, as well as to indicate this is an important item within this subject. The final ¤ character indicates that there is an organized sub folder here including files for the subject.
Audio - Reviews The absence of the ¤ character indicates that there is an un-organized sub folder here including a mess of files.
Basic – LabVIEW and the National Instruments world -¤- 27 The presence of the -¤- 27 characters indicates there is an organized nested sub folder including 27 folders. This concept is a way to avoid having too many folders at the highest nesting level.
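The four kinds of folder names can be recognised from the name alone. Here is a minimal Python sketch of that reading of the rules. The function name `classify_folder` and the labels it returns are made up for this illustration.

```python
import re


def classify_folder(name):
    """Return (kind, count) for a folder name at the highest nesting level."""
    name = name.strip()
    # A single character followed by ¤=¤ is an empty separator folder.
    if re.fullmatch(r"\S ¤=¤", name):
        return ("separator", None)
    # A trailing -¤- N announces N organized sub folders.
    nested = re.fullmatch(r".+ -¤- (\d+)", name)
    if nested:
        return ("nested", int(nested.group(1)))
    # A trailing ¤ marks an organized folder holding files.
    if name.endswith("¤"):
        return ("organized", None)
    return ("unorganized", None)


for example in ("A ¤=¤", "Audio - .Acoustics ¤", "Audio - Reviews",
                "Basic - Python -¤- 10"):
    print(example, "->", classify_folder(example))
```

The separator test has to come first, as a separator name also ends with the ¤ character.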
Sub folders at the next highest nesting level :
The next highest nesting level may contain :
- Either files for the subject.
- Or it may contain more folders related to the subject to avoid having too many folders at the highest nesting level.
Basic – LabVIEW and the National Instruments world -¤- 27 indicated 27 sub folders. This is a folder listing :
A - Language, Classic and NXG ¤
B - Actor Framework and OOP ¤
C - Project, SVN, EXE-built and more ¤
D - Digital Signal Processing ¤
E - MathScript, Matlab and HiQ ¤
F - Vision ¤
G - SQL ¤
H - Remote Panels and Computing ¤
I - Sim and Control, Simulink, PID and Fuzzy Logic ¤
J - Kalman Filtering ¤
K - Test Automation ¤
L - Python and LabPython ¤
M - Assorted ¤
N - Toolkits ¤
O - References ¤
P - NI-DAQ, VISA, PXI and more ¤
Q - DLL, CIN and more ¤
R - DMA, Buffers and more ¤
S - myRIO and CompactRIO ¤
T - ELVIS ( look in the Mechatronics chapter )
U - Hardware and more ¤
V - CVI aka LabWindows Language ¤
W - ComponentWorks ¤
X - Measurement Studio ¤
Y - ActiveX, ATL, COM and OLE ¤
Z - Multisim ¤
Ø - Newsletter and more ¤
I listed the full content to show that it may be convenient to add a character in front, like A - in order to control what is listed first ( most often used or most relevant or whatever ).
But the folder could also have looked like the folder listing shown for the highest nesting level. The only rule here is that it should look good to the reader ( me ).
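The count written after -¤- has to be maintained by hand, so it can drift when sub folders are added or removed. The Python sketch below is an assumption about how such a check could look and not a tool I describe elsewhere. The function name `count_mismatches` is hypothetical.

```python
import re
from pathlib import Path


def count_mismatches(top_level):
    """List (name, declared, actual) for folders whose -¤- count is off."""
    mismatches = []
    for folder in Path(top_level).iterdir():
        declared = re.search(r"-¤- (\d+)$", folder.name)
        if folder.is_dir() and declared:
            # Count only the sub folders directly inside this folder.
            actual = sum(1 for p in folder.iterdir() if p.is_dir())
            if actual != int(declared.group(1)):
                mismatches.append((folder.name, int(declared.group(1)), actual))
    return sorted(mismatches)
```

An empty result means every announced count matches the number of sub folders actually present.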
Sub folders at the lowest nesting level :
The lowest nesting level ( either second or third ) holds the actual files related to the subject. Here is a listing :
-1 = Introductions and Tutorials
-3 = Comprehensive Texts
-4 = Fluffy Texts
-5 = Comprehensive Documentation
-6 = LabVIEW Technical Resource
#1
#1 = Documentation Resources Index ¤ 2018.html
#1 = Get Start with LabVIEW ¤ 2013 ¤ 89p ¤ NI.pdf
#1 = Introduction to LabVIEW ¤ 2016 ¤ 71p.pptx
#1 = LabVIEW Fundamentals ¤ 2005 ¤ 165p ¤ NI.pdf
#1 = Tips Labview Development ¤ 2007 ¤ 39p.pdf
#3
#3 = LabVIEW - User Manual ¤ 2003 ¤ 349p ¤ NI.pdf
#4
#4 = Best Pract. for BDs and FPs ¤ 2011 ¤ 115p.pdf
#4 = GPOWER XNodes and VIMs ¤ 2016 ¤ 33p.pdf
#4 = SW Eng Tools with LabVIEW - Hands On ¤ 43p.pdf
#4 = LabVIEW - Dev Guidelines ¤ 2003 ¤ 97p ¤ NI.pdf
#4 = LabVIEW - Meas. Manual ¤ 2000 ¤ 358p ¤ NI.pdf
#4 = LabVIEW - Meas. Manual ¤ 2003 ¤ 159p ¤ NI.pdf
#4 = LabVIEW Graph Dev - Hands On ¤ 2006 ¤ 126p.pdf
#4 = What is LV used for ¤ ViewPoint Systems.html
#5
#5 = G Prog Reference Manual ¤ 1998 ¤ 667p ¤ NI.pdf
#5 = Func and VI Ref Manual ¤ 1999 ¤ 609p ¤ NI.pdf
#5 = LabVIEW Version 5.1 Addendum ¤ 1999 ¤ 108p.pdf
#5 = The LabVIEW Style Book ¤ 363p.pdf
#6
#6 = LabVIEW Technical Resource 1996 Q3 ¤ 24p.pdf
#6 = LabVIEW Technical Resource 1999 Q3 ¤ 8p.pdf
#6- = Tech. Res. Introduces Bundled Value Packs.pdf
The listing includes both folders ( the entries starting with - ) and files ( the entries starting with # ). ( I have shortened some file names to avoid line wrap-around in this post ).
The listing also includes these special attributes :
#1 Which is a visual separator ( empty file ) between each group. I try to use a single character for clarity.
-1 The minus sign preceding the number or character is important as it controls what an iPad lists first. So the folder names start with this character to ensure they are listed first.
The folders shown first are empty and are only used as a convenient Index Card content overview of the files.
Both the iPad and a computer indicate folders with one type of icon and files with other types of icons which adds to the ease of content overview.
The first six numbers ( 0 to 5 ) shown in the folder names are reserved to always read this ( when included ) :
-0 = Recommended Texts
-1 = Introductions and Tutorials
-2 = Brief Concise Texts
-3 = Comprehensive Texts
-4 = Fluffy Texts
-5 = Comprehensive Documentation
The recommended text is listed first. The other texts are listed in “heavy” order.
The remaining numbers ( 6 to 9 ) and characters ( A to Y ) can be included as needed. Z has a special meaning. It is listed last and indicates that the subject includes one or more zip files that may be convenient to have here :
-Z = ZIPs and more
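Creating the empty Index Card folders is repetitive, so it could be scripted. The Python sketch below is only an illustration of the idea. The names `RESERVED` and `add_index_cards` are hypothetical, and it assumes that every file in the folder already carries its # group prefix. Groups 6 to 9 and A to Y have no fixed label, so they are left for manual naming.

```python
from pathlib import Path

# The reserved group keys and their fixed labels.
RESERVED = {
    "0": "Recommended Texts",
    "1": "Introductions and Tutorials",
    "2": "Brief Concise Texts",
    "3": "Comprehensive Texts",
    "4": "Fluffy Texts",
    "5": "Comprehensive Documentation",
    "Z": "ZIPs and more",
}


def add_index_cards(folder):
    """Create a missing '-k = label' folder for each reserved '#k' group in use."""
    folder = Path(folder)
    # The character right after # is the group key of a file.
    keys = {p.name[1] for p in folder.iterdir()
            if p.is_file() and p.name.startswith("#") and len(p.name) > 1}
    created = []
    for key in sorted(keys):
        if key in RESERVED:
            card = folder / f"-{key} = {RESERVED[key]}"
            if not card.exists():
                card.mkdir()
                created.append(card.name)
    return created
```

The function returns the names of the folders it created, so a second run on the same folder returns an empty list.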
This concludes the description of my preferred off-line file organization. The basic idea is to present the files attractively in my preferred style. It requires some discipline to maintain but as long as the organization can be done on a PC ( using Total Commander ) it is manageable.
Take notice of the use of empty folders as a Content or Index Card listing giving a quick impression of the file content within a subject folder. That concept can be tweaked as desired.