How to organize PDF and HTML files into folders so that the interesting one(s) to read can be found when needed.
Using some special characters to manage Folder and Files listing order when viewed from a Windows PC or an iPad. This is a tricky issue :
- # ^ . ¤ =
This is heavy stuff. So be prepared to read this post slowly. Ask yourself the question : Why is this guy writing about these odd details. Could it be, they are important ?
The Challenge :
Nowadays we just Google whenever we need information about a subject.
Introductory updated information can always be found on Wikipedia.
But what to do with scattered off-line information kept in local PDF or HTML files ? PDF files can be anything from a pamphlet or a handout to a complete e-book.
- So a file naming system and a folder naming system are described that help find the information when needed.
- Introducing a File Content grading system and using empty Folders as Index Cards for a File Grouping system.
Why bother :
Older references once found and read are often removed from the WEB after some time. I like to be able to re-read older references to help my memory so I want an off-line system.
And once the number of off-line files passes ~100 then they must be organized in order to be handy. I am way past that limit.
Organizing could be in the form of tagging as used for music. Unfortunately there is no standard format for tags saved within a document file that can be used across platforms. The problem here is my quest for something also usable on a tablet like an iPad or Android. There are plenty of options if one merely wants something for desktops like Windows, see below.
Organizing information can be taken to any desired complexity. Like :
- A cataloging system like the ones used for museum artifacts.
- A Citation Index system as we have used for scientific topics for decades.
But I settled on something simple that doesn’t require any database and works for both tablets and desktops.
Mind you, this topic is really difficult so take your time to understand why I chose this solution.
Programs handling tagging of files :
Here is a link to a website with a comprehensive overview of tagging programs ( primarily for Windows ) :
And here is a PDF file describing what to think about when managing scientific data. The solutions suggested are specific to their needs but the issues described have general use :
Why not -just- a wiki :
Keeping documents as documents and not as text in wiki systems like Confluence on a web site preserves the layout of the text, including page separation, typeface selections, chapters, bookmarks and so on.
So you can make a proper print out.
And equally important : you can just rename and re-group the files into new/other folders to suit your needs.
Notice that such systems are based on access to a server of some kind. It is not self-contained.
Besides wiki concepts we have the Microsoft SharePoint system, which is an attempt to present related files within a managed view generated by a program. As the view is managed you can add searchable metadata, comments and even graphics as needed.
SharePoint is nice but it is hosted on a server, so you need internet access. So again it is not self-contained.
Here is a link to a website with a nice overview of what SharePoint can do related to this subject :
The Challenge – part 2
Having settled on managing documents using just a file system on a disk, the first problem arises : how are folder and file names presented on different platforms ? Nowadays we have a number of platforms like :
How to ensure that we have a common look on the platforms we want to use. A concept for folder naming and file naming is proposed that handles this problem.
Text Search using DocFetcher :
Unknown to most, a PC, Mac or Linux machine can have document search almost like Googling. However it is available on neither iOS nor Android.
The magic program allowing this is DocFetcher. You select a drive or folder to be indexed and it sets off. Some time later it is done and you can search for sentences and get a list of documents, including PDF files, where there was a match. You even see text snippets of the contexts where the text was found. And it presents the results in a snap. If you change or add documents the program will quickly update its database. It is a brilliant piece of software.
OneNote 2016 :
Another text management concept called OneNote ( or OneNote 2016 ) is also very useful.
I use OneNote 2016 as the first stop to manage shorter texts typically captured from the WEB, but I find the OneNote concept less suited for longer texts which I prefer to have in a PDF file.
Using the old desktop version has several benefits :
- It allows you to do proper backups. Read the horror story about having no backup when the Microsoft OneNote server messes up : https://community.spiceworks.com/topic/2246511-onenote-has-a-dark-side-stop-using-onenote-until-you-read-this
- It allows you to export the full Notebook or just a Section or just a Page to a PDF file, which may include automatically generated Bookmarks following some rule unknown to me.
- The continuously updated OneNote iPad ( and Android ) app handles the older OneNote 2016 files fine so both apps can be used for daily use as preferred.
Preferred File Types :
I have most static content documents in one of these file types :
Documents naturally divided into pages are best viewed using PDF as good PDF viewers have a lot of reading features. This is not so for the other file types. There is an add-on for some WEB browsers called FireShot. This add-on will create a potentially very long PDF page without any page break. The concept does work but most OCR decoders and some PDF readers are not prepared for this and will crash or refuse to view such a file.
WEB pages can be saved to a single HTML file using add-ons. This is kind of a last resort solution as the WEB browser add-ons generating such a file do not always generate a proper copy. Typically some images are missing or the text formatting is flawed. Such an add-on could be Save Page WE or SingleFile, both of which -sometimes- work. – But if the web content is instead copied to Word then Word supports saving the content as PDF as well as DOCX.
Word is exceptionally good at managing all the copied text and images in the original layout and Word of course also allows you to do the required editing to manage page shifts !
This is an Age Old Subject. So what new twists can possibly be added to this subject.
To get started, I will just refer to what others have written, so here is a nice text with a nice text layout :
The Folder and Files Organization Objective :
What they advise to do makes sense. What they end up suggesting is a ( very ) deeply nested folder structure in which each folder level gets more specific/detailed.
My PDF documents are not sharply dividable into just one specific topic so creating a deeply nested folder structure actually adds confusion instead of clarity.
So in my case I will add additional constraints :
- I don’t want endlessly deep folder nesting. I limit nesting to max two folder overview levels and one level with files. Thus max three folder levels and usually only two. Example shown below.
- I don’t want folders including only one or two files.
- I don’t want folders including 100’s of files.
- Everything should look the same whether viewed on a PC or viewed on an iPad ( or Android ).
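The limits in the list above can be checked mechanically. Here is a minimal Python sketch of such a check. The function name audit_tree and the exact thresholds ( at most 3 folder levels, at least 3 files, at most 99 files ) are assumptions chosen to match the constraints, not a tool from the original workflow. Empty folders are not reported, since in this system they are used on purpose as separators and index cards.

```python
import os
from pathlib import Path

def audit_tree(root, max_levels=3, min_files=3, max_files=99):
    """Return sorted (relative_folder, problem) tuples for folders breaking the limits."""
    root = Path(root)
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        level = len(rel.parts)              # 0 for the root folder itself
        if level > max_levels:
            problems.append((str(rel), "nested too deep"))
        count = len(filenames)
        # Empty folders are fine: they act as separators or index cards.
        if 0 < count < min_files:
            problems.append((str(rel), "too few files"))
        elif count > max_files:
            problems.append((str(rel), "too many files"))
    return sorted(problems)
```

Calling audit_tree on the top folder of the collection returns a list that can be worked through when re-grouping.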
What I have come up with is :
- A file naming template.
- A concept of how to use folder names as index cards into the content of a folder.
- Use a special character ¤ to separate items when listed on the same line ( file name or folder name ).
The funny character ¤ is a classic character called the “generic currency” sign. It is part of ISO/IEC 8859-1 and is also included in Unicode ( U+00A4 ). Texts including this character look the same on both a Windows PC and on an iPad. For some strange reason it is hardly ever used for anything, but as it looks good visually it is the perfect item separator when making a file name which should include various pieces of information.
The File Name Template :
I have therefore created my own file naming template using the ¤ character and some spacing surrounding it as :
Title & Version ¤ Grading ¤ Year ¤ Pages ¤ Writer ¤ Publisher
Title is obvious.
Version could be any brief marking, like v2 or 2nd. Don’t waste many characters on that.
Grading is not obvious. I sometimes add some characters to describe my impression of the text. I use these four character selections : C|R E I P|T. C|R means the text is either Cursory or Reference like. E means the text includes Examples. I means the text is Illustrated. P|T means that the text is either Practical or Theoretical. So a text graded as RT will be difficult and slow to read. A text graded as CEIP will be easy, maybe even amusing, to read.
Year is obvious. Use 1978 and not just 78. I considered putting year first as it indicates the relevance of a file. Something written in 1930 is probably not as up-to-date and relevant as something written in 2010. But in the end, because I have a lot of texts without any date, I settled for making it entry two ( or three ).
Pages is obvious. It conveniently indicates the complexity of the file. Something 3 pages long is probably not as comprehensive as something 300 pages long. HTML files are not separated into pages so there is no length info for such a file. Append page number with a p, like 10p. And use 1xp for HTML. This avoids any misinterpretation with year.
Writer is obvious. I don’t always include the Writer as I don’t know the writer anyway.
Publisher is obvious. A book like text may include a Publisher. But it is more relevant for WEB content where naming the company publishing the information could be important.
With all that information to be included in a file name it is important to be as brief as possible, ignoring the Writer and/or Publisher when necessary. Keeping the length shorter than ~ 100 characters is fine.
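As an illustration, the template can be built and read back with a few lines of Python. This is only a sketch under stated assumptions : the function names build_name and parse_name are made up for this example, the year is assumed to be a four-digit number starting with 19 or 20, and a grading is only recognized directly after the title. Writer and Publisher cannot be told apart by pattern, so they are returned together as "other".

```python
import os
import re

SEP = " ¤ "  # item separator with surrounding spaces, as in the template

def build_name(title, ext, grading="", year="", pages="", writer="", publisher=""):
    """Join the non-empty fields in template order and append the extension."""
    fields = [title, grading, year, pages, writer, publisher]
    return SEP.join(str(f) for f in fields if f) + "." + ext

def looks_like_grading(part):
    """Up to four characters: grading letters plus non-alphanumeric placeholders."""
    has_letter = any(ch in "CREIPT" for ch in part)
    only_allowed = all(ch in "CREIPT" or not ch.isalnum() for ch in part)
    return len(part) <= 4 and has_letter and only_allowed

def parse_name(filename):
    """Split a file name on the separator and guess what each item is."""
    stem, ext = os.path.splitext(filename)
    parts = [p.strip() for p in stem.split("¤")]
    info = {"title": parts[0], "grading": "", "year": "", "pages": "",
            "other": [], "ext": ext.lstrip(".")}
    for pos, part in enumerate(parts[1:], start=1):
        if re.fullmatch(r"(19|20)\d\d", part):
            info["year"] = part
        elif re.fullmatch(r"(\d+|1x)p", part):
            info["pages"] = part              # e.g. 10p, or 1xp for HTML
        elif pos == 1 and looks_like_grading(part):
            info["grading"] = part
        else:
            info["other"].append(part)        # writer and/or publisher
    return info
```

Because year and pages are recognized by their form ( 1978 versus 10p ), items can be left out of a name without confusing the parser.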
I am increasingly adding grading to text file names as it adds another dimension describing the type and quality of the text.
A very dry text would be graded as :
¤ ···T ¤
A very user friendly yet comprehensive text would be graded as :
¤ CEIP ¤
User friendly ? – Texts are written with various objectives. One might be that the writer just wants to send a message. Like a law text. Another purpose would be that the writer cares to explain why and how his message should be of interest to the reader. Like the notes for a law text. You get the idea.
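The grading letters can be spelled out by a small lookup table. The following Python sketch is an illustration only : the names GRADE_MEANING, decode_grading and is_valid_grading are assumptions for this example. Placeholder characters, like the dots in the dry-text example above, are simply ignored.

```python
GRADE_MEANING = {
    "C": "Cursory", "R": "Reference-like",
    "E": "Examples", "I": "Illustrated",
    "P": "Practical", "T": "Theoretical",
}

def decode_grading(code):
    """Return the meanings of the grading letters, skipping placeholders."""
    return [GRADE_MEANING[ch] for ch in code if ch in GRADE_MEANING]

def is_valid_grading(code):
    """C/R and P/T are either-or choices, and no letter may repeat."""
    letters = [ch for ch in code if ch in GRADE_MEANING]
    if not letters or len(letters) != len(set(letters)):
        return False
    chosen = set(letters)
    return not ({"C", "R"} <= chosen or {"P", "T"} <= chosen)
```

For example decode_grading("RT") gives the Reference-like and Theoretical meanings, which is the slow-to-read combination mentioned earlier.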
Using Folders as Index Cards to Information in a Folder :
It is important to notice that an iPad lists folders mixed with files when listed alphabetically. This is an Apple thing and in my opinion quite stupid. I want folders naturally listed in front of files so something must be done to ensure that the iPad also lists folders ahead of files. A little file naming ingenuity makes it possible.
Folders at the Highest nesting level :
At the highest folder nesting level there are only folders. Below is a partial listing of the first entries. The full list includes about 100 folders, which is the maximum number of folders I want to scroll through. The content is sometimes re-arranged to keep that limit :
-A »»»»»»»»»»»»»»»» all
-A^^ = LabVIEW and the NI World
-A^^ = Python
-C »»»»»»»»»»»»»»»» cad
-C^ = Electrical, Mathematical and Mechanical
-C^ = MATLAB & Simulink, Octave, Scilab and MATRIXx
-D »»»»»»»»»»»»»»»» dsp
-D^ = .Analog Devices
-D^ = .IEEE ASSP Magazine
-D^ = Adaptive Signal Processing and LMS
-D^ = Application Specific
-Dz = Unsorted Concepts
I try to group content into a few all-encompassing subjects, like ALL ( important knowledge for me ), CAD, DSP and so on. I have about 15 such subjects. But it changes as I sometimes re-group the content within the subjects.
One such re-group was the creation of a Mechatronics subject, which points to concepts such as Robotics, Control Theory, Mechanical Analogies and other sub-subjects. These sub-subjects relate nicely to each other within the Mechatronics subject.
The listing includes these special attributes :
-A : Is just a visual separator ( empty folder ) between each subject. I try to use a single character for clarity. All folder names start with a hyphen ( minus sign ) to ensure they are listed first when viewed on an iPad.
-A^^ = LabVIEW and the NI World : The presence of the double ^^ characters indicates that this is a sub folder consisting of more subject folders instead of files. This concept is a convenient way to avoid having too many folders at the highest nesting level. The special characters used ensure that a folder link to a sub folder with additional subjects is listed first.
-D^ = Application Specific : The single ^ character indicates that there is an organized sub folder here including file documents related to the subject. Using that folder character convention it is actually possible to create additional folders in a folder holding files, although that violates my limit of two folder overview levels.
-D^ = .Analog Devices : The dot or point in .Analog Devices is used for sorting, forcing Analog Devices to be listed first, as well as to indicate this is an important item within this subject.
-Dz = Unsorted Concepts : The z character indicates that there is an un-organized sub folder here including a mess of file documents ( that should be organized ).
Sub folders at the next highest nesting level :
The next highest nesting level may contain :
- Either files for the subject.
- Or it may contain more folders related to the subject to avoid having too many folders at the highest nesting level.
-A^^ = LabVIEW and the NI World is an example of a sub folder with additional subject folders instead of files. This is the folder listing :
-A^ = Language, Classic G and NXG
-B^ = LabVIEW CHM Help Files ( there are 300+ )
-C^ = Toolkits, Modules and related. Full listing
-D^ = Release and Upgrade Notes and other Notes
-E^ = Project Explorer, SVN, EXE-built and more
-F^ = Design Patterns - Basic and Advanced Architectures
-H^ = Control Design and Simulation Module, PID and Fuzzy
-H= = Control Engineering also in 'REG^ ..'
-H= = HIL ( using FPGA )
-H= = System Identification Toolkit
-H= = VeriStand
-I^ = System Tools for Large Applications
-I= = Command Line Interface ( Development Management )
-I= = DIAdem
-I= = FlexLogger
-I= = InsightCM
-I= = InstrumentStudio
-I= = SystemLink
-J^ = NI DAQ Board User Manuals ( 1995 .. 1999 )
-J= = More boards and manufacturers in 'MEC^ Basic'
-L^ = Vision Development Module
-L= = Machine Learning is present in 'MEC^ AI'
-L= = NIVision OpenCV Utilities
-L= = OpenVino ( ~ Deep Learning )
-L= = TensorFlow ( ~ Deep Learning )
-N^ = Complex Subjects supported by LabVIEW
-N= = MathScript RT Module
-N= = Python
-N= = Robotics is present in 'MEC^ Robotics'
-O^ = Other Visual Languages
-P^ = Remote Panels, Remote Computing and WEB
-Q^ = PXI, CompactDAQ, NI-DAQmx, NI-DAQ, VISA and more
-Q= = Industrial CAN-bus, FIELD-bus and MOD-bus
-R^ = CompactRIO, myRIO, Single-Board RIO and more
-R= = ELVIS Education Hardware is included here
-R= = FPGA support is included here
-R= = Real-Time Module is included here
-S^ = 3rd Party Hardware related
-T^ = DMA, Buffers, Interrupts and more
-U^ = DLL, CIN, ATL, COM, OLE, OPC and more
-W^ = LabWindows-CVI
-X^ = Measurement Studio ( for Visual Studio dotNET )
-Y^ = MultiSim ( Electronics Workbench MultiSim )
-Z^ = NI Days and other Marketing from NI
The full content is listed to show that it is convenient to add a character in front, like -N(^), in order to control what is listed first ( most often used or most relevant or whatever ).
Also note that additional explanations can be added as one-liners for a subject, like in -N= , and be indented to make the overview easier to read. Creativity is the limit for what can be included. The equal sign = is necessary to force the explanation to be listed below the subject line -N^ , when using an iPad or an Android.
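The folder markers described so far ( visual separator, ^^, ^, z, = and the leading dot ) follow a pattern regular enough to be recognized by a program. The Python sketch below is an illustration only : the function name classify, the wording of the kinds and the two regular expressions are assumptions for this example. Names that fit neither pattern return None.

```python
import re

MARKERS = {
    "^^": "overview folder holding more subject folders",
    "^": "organized folder holding files",
    "z": "unsorted folder",
    "=": "explanatory one-liner",
}

# "-D^ = Application Specific": hyphen, key, marker, " = ", label
MARKED = re.compile(r"^-(?P<key>[A-Z0-9]+)(?P<mark>\^\^|\^|z|=) = (?P<label>.+)$")
# "-D »»»» dsp": hyphen, key, space, then anything not starting with "="
SEPARATOR = re.compile(r"^-(?P<key>[A-Z0-9]+) (?P<label>[^=].*)$")

def classify(name):
    """Return key, kind, label and importance flag for a folder name, or None."""
    m = MARKED.match(name)
    if m:
        label = m.group("label")
        return {"key": m.group("key"),
                "kind": MARKERS[m.group("mark")],
                "label": label.lstrip("."),
                "important": label.startswith(".")}  # leading dot = important
    m = SEPARATOR.match(name)
    if m:
        return {"key": m.group("key"), "kind": "visual separator",
                "label": m.group("label"), "important": False}
    return None
```

Such a classifier could be used to list all unsorted ( z ) folders that still wait to be organized.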
Sub folders at the lowest nesting level :
The lowest nesting level ( either second or third level ) holds the actual files related to the subject. Here is a listing :
-1 = Introductions and Tutorials
-3 = Comprehensive Texts
-4 = Fluffy Texts
-5 = Comprehensive Documentation
-A = LabVIEW Technical Resource
#1
#1i = Get Start with LabVIEW ¤ 2013 ¤ 89p ¤ NI.pdf
#1x = Introduction to LabVIEW ¤ 2016 ¤ 71p.pptx
#1i = LabVIEW Fundamentals ¤ 2005 ¤ 165p ¤ NI.pdf
#1x = Documentation Resources Index ¤ 2018.html
#1x = Tips LabVIEW Development ¤ 2007 ¤ 39p.pdf
#3
#3i = LabVIEW - User Manual ¤ 2003 ¤ 349p ¤ NI.pdf
#4
#4i = Best Pract. for BDs and FPs ¤ 2011 ¤ 115p.pdf
#4i = GPOWER XNodes and VIMs ¤ 2016 ¤ 33p.pdf
#4x = SW Eng Tools with LabVIEW - Hands On ¤ 43p.pdf
#4i = LabVIEW - Dev Guidelines ¤ 2003 ¤ 97p ¤ NI.pdf
#4i = LabVIEW - Meas. Manual ¤ 2000 ¤ 358p ¤ NI.pdf
#4i = LabVIEW - Meas. Manual ¤ 2003 ¤ 159p ¤ NI.pdf
#4x = LabVIEW Graph Dev - Hands On ¤ 2006 ¤ 126p.pdf
#4x = What is LV used for ¤ ViewPoint Systems.html
#5
#5i = Func and VI Ref Manual ¤ 1999 ¤ 609p ¤ NI.pdf
#5i = G Prog Reference Manual ¤ 1998 ¤ 667p ¤ NI.pdf
#5i = LabVIEW Version 5.1 Addendum ¤ 1999 ¤ 108p.pdf
#5i = The LabVIEW Style Book ¤ 363p.pdf
#A
#Ax = LabVIEW Technical Resource 1996 Q3 ¤ 24p.pdf
#Ax = LabVIEW Technical Resource 1999 Q3 ¤ 8p.pdf
#Ax = Tech. Res. Introduces Bundled Value Packs.pdf
The listing includes both folders ( in bold ) and files ( in blue-ish ).
The listing also includes these special attributes :
#1 : A visual separator ( empty file ) between each group. I try to use a single character for clarity.
-1 : The minus sign preceding the number or character for a folder is important as it controls what an iPad lists first. So the folder names always start with this character to ensure they are listed first.
The folders shown first are empty and are only used as a convenient library-like Index Card overview of the files’ content.
Both the iPad and a computer indicate folders with one type of icon and files with other types of icons, which adds to the ease of content overview.
The numbers ( 0 to 9 ) shown in the folder names are reserved to always read this ( when included ) :
-0 = Recommended Texts
-1 = Introductions and Tutorials
-2 = Brief Concise Texts
-3 = Comprehensive Texts
-4 = Fluffy Texts
-5 = Comprehensive Documentation
-7 = Thesis or Dissertation
-8 = History or Background
-9 = Handbook ( covers everything )
The recommended text is listed first. The other texts are listed in “heavy” order.
The characters ( A to Y ) can be included as needed. Z has a special meaning. It is listed last and indicates that the subject includes one or more zip files that may be convenient to have here :
-A = [important sub-subjects as relevant]
-Z = ZIPs and more
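Creating the empty index card folders by hand is repetitive, so it could be scripted. The following Python sketch is only an illustration : the function name make_index_cards and its parameters are assumptions for this example, and the table holds just the reserved numbers listed above ( there is no entry for 6 ). It creates one empty folder per requested number and, optionally, the matching empty separator file such as #1.

```python
from pathlib import Path

RESERVED = {
    0: "Recommended Texts",
    1: "Introductions and Tutorials",
    2: "Brief Concise Texts",
    3: "Comprehensive Texts",
    4: "Fluffy Texts",
    5: "Comprehensive Documentation",
    7: "Thesis or Dissertation",
    8: "History or Background",
    9: "Handbook ( covers everything )",
}

def make_index_cards(subject_dir, digits, with_separators=True):
    """Create empty index card folders (and separator files) in subject_dir."""
    subject = Path(subject_dir)
    subject.mkdir(parents=True, exist_ok=True)
    created = []
    for d in sorted(set(digits)):
        card = subject / f"-{d} = {RESERVED[d]}"   # KeyError for unreserved digits
        card.mkdir(exist_ok=True)
        created.append(card.name)
        if with_separators:
            (subject / f"#{d}").touch()            # empty file as visual separator
    return created
```

Running it twice on the same folder is harmless, as existing folders and files are left in place.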
Marking OCR state of a file :
#1i #1x #1z #1
The characters x, i, z and no-character in a file name indicate the OCR state of the file, meaning whether it can be searched for text phrases.
A PDF file can have one of several states :
- i means that the file contains a text layer that can be searched -and- that it includes a Table of Contents ( ~ bookmark listing ). That is the preferred file quality where text can be found by searching and a content overview is presented by the ToC.
- x is as i except there is no ToC. This kind of file is still searchable but it is difficult to get a quick overview of the content. A ToC should be added but making one is tedious work.
- z is a file that doesn’t include a searchable text layer. A “text” page is just an image. Such a file should be run through an OCR program to add a text layer so that the file as a minimum can be searched.
- no-character is a file that has not been checked for the presence of a text layer. Meaning it should be checked when convenient.
An HTML file is always text and illustrations so it can always be searched. But it never includes a Table of Contents. It is always an x.
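The OCR marker can be read back from a file name to produce a to-do list of files that still need attention. The Python sketch below is an illustration only : the names ocr_state and needs_attention and the regular expression are assumptions for this example. It only reads the marker in the name; it does not inspect the PDF itself.

```python
import re

OCR_STATES = {
    "i": "searchable, with ToC",
    "x": "searchable, no ToC",
    "z": "image only, needs OCR",
    None: "not checked yet",
}

# "#1i = Title ...": hash, group character, optional state, " = ", rest of name
PREFIX = re.compile(r"^#(?P<group>[0-9A-Z])(?P<state>[ixz])? = (?P<rest>.+)$")

def ocr_state(filename):
    """Return (group, state, description), or None for names outside the scheme."""
    m = PREFIX.match(filename)
    if m is None:
        return None          # e.g. the empty separator file "#1"
    state = m.group("state")
    return m.group("group"), state, OCR_STATES[state]

def needs_attention(filenames):
    """List files that need OCR ( z ) or have not been checked ( no character )."""
    todo = []
    for name in filenames:
        result = ocr_state(name)
        if result is not None and result[1] in ("z", None):
            todo.append(name)
    return todo
```

Feeding it the file names of a subject folder gives the files to run through an OCR program or to check when convenient.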
This concludes the description of my preferred off-line file organization. The basic idea is to present the files attractively in my preferred style. It requires some discipline to maintain but as long as the organization can be done on a PC ( using Total Commander ) it is manageable.
Take notice of the use of empty folders as a Content or Index Card listing, giving a quick impression of the files’ content within a subject folder. Somewhat like the Index Drawers that could be found in a Library in the old days. That concept can be tweaked as desired.